2010
INTRODUCTION AND IMPORTANCE
The subject of Statistics, as it seems, is not a new discipline but is as
old as human society itself. The word Statistics seems to have been
derived from the Latin word Status, the Italian word Statista, the German
word Statistik or the French word Statistique, each of which means a
political state. In ancient times the scope of Statistics was primarily limited to
the collection of the following data by the Government for assessing
manpower and framing fiscal policies:
(i) Age-wise and sex-wise population of the country
(ii) Property and wealth of the country
In India, an efficient system of collecting official and administrative
statistics existed over 2000 years ago, in particular during the reign of
Chandragupta Maurya (324-300 B.C.). From Kautilya's Arthashastra it is
known that even before 300 B.C. a very good system of collecting vital
statistics and registering births and deaths was in vogue. During Akbar's
reign (1556-1605 A.D.) Raja Todarmal, the then land and revenue minister,
maintained good records of land and agricultural statistics.
The seventeenth century saw the origin of vital statistics. Captain John
Graunt of London (1620-1674), known as the father of vital statistics, was
the first man to study the statistics of births and deaths. The theoretical
development of so-called modern statistics came during the mid
seventeenth century with the introduction of the Theory of Probability and
the Theory of Games and Chance, the chief contributors being mathematicians
and gamblers of France, Germany and England.
Karl Pearson (1857-1936), the founder of the greatest statistical
laboratory in England (1911), is the pioneer in correlational analysis. His
discovery of the Chi-square test, the first and most important of modern
tests of significance, won for statistics a place as a science. In 1908, the
discovery of Student's t distribution by W.S. Gosset, who wrote under the pen
name of Student, ushered in an era of exact sample tests. Sir Ronald A. Fisher
(1890-1962), known as the father of statistics, placed statistics on a very sound
footing by applying it to various diversified fields such as genetics, biometry,
education, and agriculture.
Biostatistics
2010
Definition of Statistics
Different authors have given different definitions of statistics.
Although no single definition of statistics is satisfactory for the purpose, the
following statement will be useful.
Statistics is the study of methods and procedures for collecting,
classifying, summarizing, and analyzing data and for making scientific
inferences from such data.
Definitions by A.L. Bowley: Statistics are numerical statements of facts in any
department of enquiry placed in relation to each other. "Statistics may be
called the science of counting" is one of the definitions due to Bowley.
Obviously this is an incomplete definition, as it takes into account only the
aspect of collection and ignores other aspects such as analysis, presentation
and interpretation. Bowley gives another definition for statistics, which
states that statistics may be rightly called the science of averages. This definition
is also incomplete: averages play an important role in understanding and
comparing data, but statistics provides many more measures.
Definition by Croxton and Cowden: Statistics may be defined as the science
of collection, presentation, analysis and interpretation of numerical data from
the logical analysis. It is clear that the definition of statistics by Croxton and
Cowden is the most scientific and realistic one.
Definition by Horace Secrist: Statistics may be defined as the aggregate of
facts affected to a marked extent by multiplicity of causes, numerically
expressed, enumerated or estimated according to a reasonable standard of
accuracy, collected in a systematic manner, for a predetermined purpose and
placed in relation to each other.
Development of Biostatistics
Biostatistics is defined as the application of statistical methods to
the problems of biology, including human biology, medicine, and public
health. It is also known as Biometrics or Biometry (literally meaning
biological measurement).
Perhaps the earliest important figure in biostatistical thought was
Adolphe Quetelet (1796-1874), a Belgian astronomer and mathematician,
who in his work combined the theory and practical methods of statistics and
applied them to the problems of biology, medicine and sociology. Francis
Galton (1822-1911) has been called the father of biostatistics and
eugenics, the two subjects that he studied interrelatedly.
Some definitions concerning statistical inference
Unit:
The smallest object or individual that can be investigated, the source
of the basic information. In surveys, the units are often called sampling units;
in experiments, experimental units. (e.g.) individual animals in a farm
Population or Universe:
A very large (possibly infinite) group of units concerning which
scientific inferences are to be made. (e.g.) animals in a farm
Sample:
When a few units are selected from a population, the selection is called a
sample. (e.g.) animals of a particular breed in a farm
Variable:
The quantitative or numerical characteristic of the data is called a
variable. (e.g.) body weight of goats
Continuous variable:
A variable that can potentially take any value within a range is called
a continuous variable. (e.g.) daily milk yield of a cow.
Discrete or discontinuous variable:
If a variable takes only integral values, then it is called a discrete or
discontinuous variable. (e.g.) blood cell count
Attribute:
It refers to the qualitative character of the items chosen. (e.g.) colour
of an animal
Constant:
It is a numerical value which is the same for all the units in the
population. (e.g.) number of chromosomes for sheep
Parameter:
A statistical measure pertaining to a population is called a
parameter. (e.g.) mean, standard deviation of the population
Statistic:
A statistical measure pertaining to a sample is called a statistic.
(e.g.) mean, standard deviation of the sample.
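The distinction between a parameter and a statistic can be illustrated with a minimal sketch. The goat body weights below are hypothetical values invented for illustration, and the choice of which four animals form the sample is likewise an assumption.

```python
import statistics

# Hypothetical body weights (kg) of all 8 goats in a small farm: the population.
population = [20, 22, 25, 27, 30, 32, 35, 37]

# Parameters: measures computed from every unit in the population.
population_mean = statistics.mean(population)
population_sd = statistics.pstdev(population)  # divides by N

# A sample: a few units selected from the population (hypothetical choice).
sample = [22, 27, 30, 37]

# Statistics: the same measures computed from the sample only.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # divides by n - 1

print("Parameter (population mean):", population_mean)
print("Statistic (sample mean):", sample_mean)
```

The sample mean generally differs from the population mean; the statistic is used as an estimate of the parameter.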
Functions of statistics
presents facts in a definite form
simplifies mass of figures
facilitates comparison
helps in formulating hypothesis
helps in testing the hypothesis
helps in prediction
helps in the formulation of suitable policies
Limitations of statistics
Statistics is not suitable to the study of qualitative phenomenon
Statistics does not study individuals
Statistical laws are not exact
Statistics table may be misused
Statistics is only one of the methods of studying a problem
Misuse of Statistics:
The following are some of the common ways in which Statistics can
be misused:
1 Quoting figures without their context
2 Comparing entirely different sets of figures because of some
superficial similarity
3 Enumerating only figures favourable in an argument.
4 Arguing from effect to cause.
5 Generalizing from the part to the whole without any base.
COLLECTION OF DATA
Before starting collection of data, one should take the following into
consideration:
(i) One should have a definite object
(ii) One should have a clear idea about the information to be
collected
A statistical investigation always begins with collection of data. One
can collect the data either by himself or from available records.
The data are of two kinds:
1. Primary data
2. Secondary data
Primary Data:
The data collected by the investigator himself from the sample or
population are called primary data. The source from which one gathers
primary data is called the primary source.
Methods of collecting primary data
Direct personal observation
This method consists in the collection of data personally by the
investigator from the sources concerned. In other words, the investigator has to
go to the field personally to make enquiries and solicit information from the
informants or respondents.
Indirect personal observation
The investigator collects data from a third person (called a witness),
who knows about the data being gathered.
Data collection through agents, local reporters etc.
Here the investigator appoints some person, called an agent, to
collect information on his behalf. In this method the schedule (the name
usually applied to a set of questions which are asked and filled in a face to
face situation with another person), which elicits comprehensive information,
will be framed by the chief investigator with the help of other experts based
on the objective of the survey.
Data collection through questionnaires
The method of sending the questionnaires (a questionnaire is a device for
seeking answers to questions by using a form which the respondent fills in
himself) by post and collecting the replies also by post should be employed if
it is not feasible to appoint enumerators to cover the whole ground. Often a
distinction is made between a schedule and a questionnaire. A schedule is
filled by the interviewer in a face-to-face situation with the informant. A
questionnaire is filled by the informant, who receives and returns it by
post. The questionnaire is mailed to the respondents with a request for quick
response within a specified time. A very polite covering note explaining in
detail the aim and object of collecting the information and also the
operational definitions of various terms and concepts used in the
questionnaire is attached. Respondents are also requested to extend full
co-operation by furnishing the correct replies and returning the questionnaires
duly filled in time. Respondents are also taken into confidence by assuring
them that the information supplied by them in the questionnaire will be kept
strictly confidential. In order to ensure quick and better response the return
postage expenses are usually borne by the investigator by sending a
self-addressed stamped envelope. Preparing a questionnaire is a technical job
and requires a great amount of skill, expertise and practice.
Characteristics of a good questionnaire:
1. Number of questions should be minimum.
2. Questions should be in logical order, moving from easy to more
difficult questions.
3. Questions should be short and simple. Technical terms and vague
expressions capable of different interpretations should be avoided.
4. Questions fetching YES or NO answers are preferable. There may be
some multiple choice questions; questions requiring lengthy answers are to be
avoided.
5. Personal questions and questions which require memory power and
calculations should also be avoided.
6. Questions should enable cross-checking, so that deliberate or unconscious
mistakes can be detected to an extent.
7. Questions should be carefully framed so as to cover the entire scope of
the survey.
8. The wording of the questions should be proper without hurting the
feelings or arousing resentment.
9. As far as possible confidential information should not be sought.
10. Physical appearance should be attractive, and sufficient space should be
provided for answering each question.
Before the actual survey, a pilot survey is conducted. The
questionnaire/schedule is pretested in a pilot survey. A few among the
people from whom actual information is needed are asked to reply. If they
misunderstand a question or find it difficult to answer or do not like its
wording etc., it is to be altered. Further it is to be ensured that every
question fetches the desired answer.
Methods | Merits | Demerits
Direct personal observation | 1. It is very accurate. 2. Intensive details can be collected. | 1. Expensive in terms of time and money. 2. Not suitable when the field of enquiry is large.
Indirect personal observation | 1. It saves time. | 1. Witnesses should possess thorough knowledge of the facts regarding the problem of investigation. 2. Witness must be willing to give information.
Data collection through agents and local reporters etc. | 1. It saves time. | 1. The agents will collect information in their own fashion. 2. Only approximate results can be obtained. 3. It is expensive.
Data collection through questionnaires | 1. Large areas can be covered. 2. It is less expensive. 3. It saves time. | 1. It cannot be used if the informants are illiterate. 2. Response may be poor. 3. Possibility of vague, inaccurate answers.
Secondary data:
The data collected from the available sources like published reports,
documents, journals etc. are called secondary data. The source from which
the secondary data are collected is called the secondary source of data. While
the primary data are collected for a specific purpose, the secondary data are
gathered from sources which were compiled for some other purpose.
Sources of obtaining secondary data
(i) Published Source:
(a) Official Publications of Central Government: (to mention a few)
Directorate of Economics and Statistics, Ministry of
Agriculture and Irrigation
National Sample Survey Organization (NSO), Department of
Statistics, Ministry of Planning
Central Statistical Organization (CSO), Department of Statistics,
Ministry of Planning
(b) Publications of Semi-Government Statistical Organizations:
Statistics Department of Reserve Bank of India, Bombay
Economics Department of Reserve Bank of India, Bombay
The Institute of Economic Growth, Delhi
The Institute of Foreign Trade, New Delhi
(c) Publications of Research Institutes:
Indian Statistical Institute, Calcutta
Indian Council of Agricultural Research, New Delhi
Indian Agricultural Statistics Research Institute, New Delhi
National Council of Educational Research and Training
National Council of Applied Economic Research
(d) Publications of Commercial and Financial Institutions
(e) Reports of various Committees and Commissions appointed
(f) Newspapers and Periodicals
(g) International Publications
United Nations Organizations
(ii) Unpublished Source:
The statistical data need not always be published. There are
various sources of unpublished statistical materials such as the records
maintained by private firms or business enterprises, who may not like to
release their data to any outside agency; the various departments and offices
of the Central and State Governments; the researches carried out by the
individual research scholars in the Universities or research Institutions.
Merits of Secondary data
It saves time, labour and money
Demerits of Secondary data
It may not be very accurate
All the data needed may not be available
It might have been collected by some improper methods and in some
abnormal conditions
CLASSIFICATION OF DATA
Classification of data is the next step after collection of data. It is the
process of arranging data into homogeneous classes according to similarities.
Objectives (uses) of classification
1. To remove unnecessary details
2. To bring out explicitly the significant features in the data
3. To make comparisons and draw inferences
TYPES OF CLASSIFICATION
1. Numerical classification
Classification of data according to quantitative characters. (e.g.)
classification of animals in a farm according to their weight
2. Descriptive classification
Classification according to attributes, i.e. qualitative characters. (e.g.)
classification of animals according to breeds
3. Spatial classification
Classification according to geographical area. (e.g.) district-wise
livestock population in Tamil Nadu
4. Temporal or chronological classification
Classification according to time. (e.g.) livestock population in different
years.
TABULATION
Tabulation is the process of summarizing classified or grouped data in
the form of a table so that it is easily understood and an investigator is
quickly able to locate the desired information. A table is a systematic
arrangement of classified data in columns and rows. Thus, a statistical table
makes it possible for the investigator to present a huge mass of data in a
detailed and orderly form. It facilitates comparison and often reveals certain
patterns in data which are otherwise not obvious. Classification and
tabulation, as a matter of fact, are not two distinct processes. Actually they
go together. Before tabulation, data are classified and then displayed under
different columns and rows of a table.
Advantages of Tabulation
Statistical data arranged in a tabular form serve the following objectives:
1. It simplifies complex data and the data presented are easily
understood.
2. It facilitates comparison of related facts.
3. It facilitates computation of various statistical measures like averages,
dispersion, correlation etc.
4. It presents facts in minimum possible space, and unnecessary
repetitions and explanations are avoided. Moreover, the needed
information can be easily located.
5. Tabulated data are good for references and they make it easier to
present the information in the form of graphs and diagrams.
Preparing a Table
The making of a compact table is itself an art. The table should contain all the
information needed within the smallest possible space. What the purpose of
tabulation is and how the tabulated information is to be used are the main
points to be kept in mind while preparing a statistical table. An ideal table
should consist of the following main parts:
1. Table number
2. Title of the table
3. Captions or column headings
4. Stubs or row designation
5. Body of the table
6. Footnotes
7. Sources of data
A model structure of a table is given below:
Table Number: Title of the Table
Stub Heading | Caption Headings | Total
 | Caption Sub-Headings |
Stub Sub-Headings | Body |
Total | |
Footnotes:
Sources Note:
Requirements of a Good Table
A good statistical table is not merely a careless grouping of columns
and rows but should be such that it summarizes the total information in an
easily accessible form in minimum possible space. Thus while preparing a
table, one must have a clear idea of the information to be presented, the facts
to be compared and the points to be stressed.
Though there is no hard and fast rule for forming a table, a few
general points should be kept in mind:
1. A table should be formed in keeping with the objects of statistical
enquiry.
2. A table should be carefully prepared so that it is easily
understandable.
3. A table should be formed so as to suit the size of the paper. But such
an adjustment should not be at the cost of legibility.
4. If the figures in the table are large, they should be suitably rounded or
approximated. The method of approximation and units of
measurements too should be specified.
5. Rows and columns in a table should be numbered and certain figures
to be stressed may be put in a box or circle or in bold letters.
6. The arrangement of rows and columns should be in a logical and
systematic order. This arrangement may be alphabetical,
chronological or according to size.
7. The rows and columns are separated by single, double or thick lines to
represent various classes and sub-classes used. The corresponding
proportions or percentages should be given in adjoining rows and
columns to enable comparison. A vertical expansion of the table is
generally more convenient than a horizontal one.
8. The averages or totals of different rows should be given at the right of
the table and those of columns at the bottom of the table. Totals for
every sub-class too should be mentioned.
9. In case it is not possible to accommodate all the information in a
single table, it is better to have two or more related tables.
Type of Tables
Tables can be classified according to their purpose, stage of enquiry,
nature of data or number of characteristics used. On the basis of the number
of characteristics, tables may be classified as follows:
1. Simple or one-way table
2. Two-way table
3. Manifold table
Simple or one-way Table
A simple or one-way table is the simplest table which contains data of
one characteristic only. A simple table is easy to construct and simple to
follow.
For example, the blank table given below may be used to show the number of adults in different types of
animals in a locality.
The number of adults in different types of animals in a locality
Type of animal | No. of Adults
 |
Total |
Two-way Table
A table which contains data on two characteristics is called a two-way
table. In such a case, therefore, either the stub or the caption is divided into two
coordinate parts.
In the given table, as an example, the caption may be further divided in respect of sex. This subdivision is
shown in the two-way table, which now contains two characteristics, namely type of animal and sex.
Type of animal | No. of Adults: Male | No. of Adults: Female | Total
 | | |
Total | | |
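How the cells of a two-way table are filled can be sketched with a short count over individual records. The records below are hypothetical, invented only to show the mechanics; each record carries the two characteristics, type of animal and sex.

```python
from collections import Counter

# Hypothetical records: one (type of animal, sex) pair per adult animal.
records = [
    ("Cattle", "Male"), ("Cattle", "Female"), ("Cattle", "Female"),
    ("Goat", "Male"), ("Goat", "Female"), ("Sheep", "Female"),
]

cells = Counter(records)                  # count for each (stub, caption) cell
animals = sorted({animal for animal, _ in records})
sexes = ["Male", "Female"]

# Body of the table, with row totals at the right and column totals at the bottom.
table = {a: {s: cells[(a, s)] for s in sexes} for a in animals}
row_totals = {a: sum(table[a].values()) for a in animals}
col_totals = {s: sum(table[a][s] for a in animals) for s in sexes}
grand_total = sum(row_totals.values())

print("Type of animal | Male | Female | Total")
for a in animals:
    print(f"{a} | {table[a]['Male']} | {table[a]['Female']} | {row_totals[a]}")
print(f"Total | {col_totals['Male']} | {col_totals['Female']} | {grand_total}")
```

Adding a third characteristic to each record (for example marital status) and counting on three-part keys would give a manifold table in the same way.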
Manifold Table
Thus, more and more complex tables can be formed by including
other characteristics.
For example, we may further classify the caption sub-headings in the above table in respect of marital
status, religion, socio-economic status etc. A table which has more than two characteristics of data
is considered a manifold table. For instance, the table shown below shows three characteristics, namely
occupation, sex and marital status.
Occupation | No. of Adults, Male: M | Male: U | Male: Total | Female: M | Female: U | Female: Total | Total
 | | | | | | |
Total | | | | | | |
Footnote: M stands for married and U stands for unmarried.
Manifold tables, though complex, are good in practice as these enable
full information to be incorporated and facilitate analysis of all related facts.
Still, as a normal practice, not more than four characteristics should be
represented in one table to avoid confusion. Other related tables may be
formed to show the remaining characteristics.
FREQUENCY DISTRIBUTION
A frequency distribution is a series in which a number of observations
with similar or closely related values are put in separate bunches or groups,
each group being in order of magnitude in a series. It is simply a table in
which the data are grouped into classes and the number of cases which fall in
each class is recorded. It shows the frequency of occurrence of different
values of a single phenomenon.
A frequency distribution is constructed for three main reasons:
1. To facilitate the analysis of data,
2. To estimate frequencies of the unknown population distribution from
the distribution of sample data, and
3. To facilitate the computation of various statistical measures.
Discrete or Ungrouped Frequency Distribution
In this form of distribution, the frequency refers to a discrete value.
Here the data are presented in a way that exact measurements of units are
clearly indicated. There are definite differences between the variables of
different groups of items. Each class is distinct and separate from the other
class. Non-continuity from one class to another class exists. Examples are
data such as the number of rooms in a house, the number of companies registered in a
country, the number of children in a family, etc.
In this method, the observations are arrayed in a systematic way (in
ascending order of magnitude; this process of arranging the
observations in natural order is called an array). From the arrayed figures the
frequency distribution for each value can be obtained.
E.g. Weights of ten eggs in grams:
51, 58, 49, 52, 55, 61, 59, 55, 45, 48
Array form | 45 | 48 | 49 | 51 | 52 | 55 | 58 | 59 | 61
No. of eggs | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1
This representation, though better than an array, does not condense the
data much and it is quite cumbersome to go through huge data.
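The array and the ungrouped frequency distribution for the ten egg weights can be reproduced with a short sketch. It uses only Python's standard library; the variable names are illustrative.

```python
from collections import Counter

# Weights of ten eggs in grams, as listed in the example.
egg_weights = [51, 58, 49, 52, 55, 61, 59, 55, 45, 48]

# Array: the observations arranged in ascending order of magnitude.
array = sorted(egg_weights)

# Ungrouped frequency distribution: how often each distinct value occurs.
frequency = Counter(array)

print("Weight (g) | No. of eggs")
for weight in sorted(frequency):
    print(f"{weight} | {frequency[weight]}")
```

With only ten observations almost every value has frequency 1, which is why this form condenses the data so little.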
Grouped Frequency Distribution
When the data are grouped into classes of appropriate interval,
showing the number in each class, we get a grouped frequency distribution.
(e.g.) The following is the frequency table showing the distribution of chicks in different weight classes:
Class (weight in grams) | Frequency (no. of chicks)
30-34 | 2
34-38 | 7
38-42 | 8
42-46 | 3
Total | 20
Raw data & grouped data:
The observed data given as such are known as raw data. When the
observed data are grouped into groups or classes, they are known as
grouped data.
Class limits
Class limits are the limits within which the class interval lies. Thus
each class interval has two limits, the upper and the lower limits.
Frequency
Frequency is the number of observations in that class.
Width or length of the class / class interval
Width of the class is the difference between the upper boundary and
lower boundary of the same class. The width of a class is known as the class
interval.
Class mark
The midpoint of the class is called the class mark.
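As a small illustration of these terms, the sketch below computes the width and the class mark of each class in the chick weight table given earlier in this section. The variable names are illustrative.

```python
# Classes (weight in grams) and frequencies from the chick weight table.
classes = [(30, 34), (34, 38), (38, 42), (42, 46)]
frequencies = [2, 7, 8, 3]

# Width of a class = upper limit minus lower limit of the same class.
widths = [upper - lower for lower, upper in classes]

# Class mark = midpoint of the class.
class_marks = [(lower + upper) / 2 for lower, upper in classes]

total = sum(frequencies)

for (lower, upper), width, mark, f in zip(classes, widths, class_marks, frequencies):
    print(f"{lower}-{upper}: width {width}, class mark {mark}, frequency {f}")
print("Total:", total)
```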
Rules to be followed in forming a frequency distribution
# The class intervals should be of equal width and of such size that the
characteristic features of the distribution are displayed.
# Classes should not be too large (or) too small. If too large, it will
involve considerable errors in assuming that the midpoints of the
class intervals are the averages of those classes. If too small, there will be
many classes with zero frequency (or) small frequency. There are
however certain types of data which may require the use of unequal or
varying class intervals. When there is an irregular flow of data and there are wide
fluctuating gaps among the values, varying class intervals are to be
taken; otherwise there may be classes without any
frequency or observations falling in that category.
# The range of the classes should cover the entire range of data and the
classes must be continuous.
# It is convenient to have the midpoint of the class interval be an
integer. As a general rule, the number of classes should be in the range
of 6-16 and never more than 30.
Formation of Class Intervals
First we have to form the class interval. Let L be the lowest value in the data
to be classified and H the highest value. Find the difference,
i.e. difference = H - L
$$\text{width of the class interval} = \frac{H - L}{k}$$
where k is the number of required classes.
The number of required classes can be calculated using the formula
suggested by Sturges' rule:
$$k = 1 + 3.322 \log_{10} n$$
where n is the total number of observations.
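The two formulas above can be sketched as a pair of small functions. The function names are illustrative, and the logarithm is taken to base 10, as is usual for Sturges' rule. The values passed in (lowest 32, highest 64, n = 50) are those of the students' weights example used in this chapter.

```python
import math

def sturges_classes(n):
    """Number of classes k suggested by Sturges' rule: k = 1 + 3.322 log10(n)."""
    return 1 + 3.322 * math.log10(n)

def class_width(lowest, highest, n):
    """Width of the class interval: (H - L) / k."""
    return (highest - lowest) / sturges_classes(n)

k = sturges_classes(50)
width = class_width(32, 64, 50)

print("k =", round(k, 2))
print("width =", round(width, 2))
# In practice k is rounded up to a whole number of classes and the width
# is rounded to a convenient value.
print("classes used:", math.ceil(k), "width used:", round(width))
```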
Choice of the Class Interval:
The following are the different types of class intervals that are
followed.
a | b | c | d | e | f
0-10 | 0 and under 10 | 5 | Less than 20 | 0-10 | 0-9.9
10-20 | 10 and under 20 | 15 | 20-30 | 10-30 | 10-19.9
20-30 | 20 and under 30 | 25 | More than 30 | 30-70 | 20-29.9
In type 'd' the end classes are open. In type 'e' there are unequal class
intervals. In type 'c' the mid points of the class intervals are given. In type 'f'
the class limits are exactly defined. Type 'b' is good, but we are not using it, as
the class limits are not clearly expressed in type 'b'. We often use type 'a'.
The difficulty is where to include 10, 20 etc. We often include 10 in the
second class, 20 in the third class and so on. That is, we define 0 and up to 10 as the
first class, 10 and up to 20 as the second class, and so on. Depending on the need and
situation, the appropriate type of class interval should be chosen.
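The convention for type 'a' classes, where a value equal to a class limit goes into the class that starts at that limit, can be sketched as follows. The function name `class_index` is hypothetical, and the classes 0-10, 10-20 and 20-30 are taken as the illustration.

```python
from bisect import bisect_right

# Lower limits of the type 'a' classes 0-10, 10-20, 20-30.
lower_limits = [0, 10, 20]
upper_limit = 30  # upper limit of the last class (excluded)

def class_index(value):
    """Return 0 for the first class, 1 for the second, and so on.

    Each class includes its lower limit and excludes its upper limit,
    so 10 falls in the second class and 20 in the third.
    """
    if not (lower_limits[0] <= value < upper_limit):
        raise ValueError(f"{value} lies outside the classes")
    return bisect_right(lower_limits, value) - 1

for value in (0, 9.9, 10, 19.9, 20):
    print(value, "-> class number", class_index(value) + 1)
```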
Formation of frequency distribution:
1. Method of Tally Marks
After forming the class intervals, each should be written one below the
other, and for each item in the collected data a stroke is marked against the
class interval in which it falls. Usually after every 4 strokes in a class interval
the 5th item is indicated by making a diagonal line through the previous 4
strokes. The strokes are then counted, and this is called formation of a frequency
distribution by the method of tally marks.
Example: Let us consider the weights in kg of 50 college students.
42 62 46 54 41 37 54 44 32 45 47 50 58 49 51 42 46 37 42 39 54 39 51 58 47 64 43 48 49 48 49 61 41 40
58 49 59 57 57 34 56 38 45 52 46 40 63 41 51 41
Construct a frequency distribution.
Here the size of the class interval as per Sturges' rule is obtained as follows:
$$\text{Size of class interval} = C = \frac{H - L}{k} = \frac{H - L}{1 + 3.322 \log_{10} N} = \frac{64 - 32}{1 + 3.322 \log_{10} 50} = \frac{32}{6.64} \approx 5$$
Thus the number of classes is 7 and the size of each class is 5.
The required frequency distribution is prepared using tally marks as given below:
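Classes (Weight in Kg) | Tally Mark | Frequency
30-35 | || | 2
35-40 | ||||| | 5
40-45 | ||||| ||||| | | 11
45-50 | ||||| ||||| ||| | 13
50-55 | ||||| ||| | 8
55-60 | ||||| || | 7
60-65 | |||| | 4
Total | | 50
Each class includes its lower limit and excludes its upper limit; the frequencies are the counts of the 50 weights listed above.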
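The tally procedure can be sketched in a few lines: each weight adds one stroke to the class in which it falls. The sketch assumes that each class includes its lower limit and excludes its upper limit (so 40 is counted in 40-45), and it uses the 50 weights listed in the example.

```python
# Weights in kg of the 50 college students from the example.
weights = [
    42, 62, 46, 54, 41, 37, 54, 44, 32, 45, 47, 50, 58, 49, 51, 42, 46,
    37, 42, 39, 54, 39, 51, 58, 47, 64, 43, 48, 49, 48, 49, 61, 41, 40,
    58, 49, 59, 57, 57, 34, 56, 38, 45, 52, 46, 40, 63, 41, 51, 41,
]

lowest_limit = 30   # lower limit of the first class
width = 5           # size of each class
n_classes = 7       # classes 30-35, 35-40, ..., 60-65

# One "stroke" per observation, added to the class the value falls in.
frequencies = [0] * n_classes
for w in weights:
    frequencies[(w - lowest_limit) // width] += 1

for i, f in enumerate(frequencies):
    lower = lowest_limit + i * width
    print(f"{lower}-{lower + width} | {'|' * f} | {f}")
print("Total |", sum(frequencies))
```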
2. Array Method
An array is an orderly arrangement of the data by magnitude in
ascending or descending order. First arrange the given data in ascending
order of magnitude.
Then form the class intervals. From the array, count the number of
observations belonging to each class and write it against that class. This
method is not easy when the number of observations is large. We can adopt
this method in cases where the number of observations is less than 30.
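A minimal sketch of the array method is given below. It reuses the ten egg weights from the earlier example; the class limits 45-50, 50-55, 55-60 and 60-65 are an assumption made only for this illustration, with each class including its lower limit and excluding its upper limit.

```python
from bisect import bisect_left

# Ten egg weights in grams (from the earlier example).
data = [51, 58, 49, 52, 55, 61, 59, 55, 45, 48]

# Step 1: form the array (ascending order of magnitude).
array = sorted(data)

# Step 2: form the class intervals (assumed limits for this illustration).
limits = [45, 50, 55, 60, 65]

# Step 3: count the observations of the array that fall in each class.
# bisect_left gives the position of the first value >= a limit, so the
# difference of two positions is the number of values in [lower, upper).
frequencies = [
    bisect_left(array, upper) - bisect_left(array, lower)
    for lower, upper in zip(limits, limits[1:])
]

for (lower, upper), f in zip(zip(limits, limits[1:]), frequencies):
    print(f"{lower}-{upper} | {f}")
```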
PRESENTATION OF DATA
Classification and tabulation reduce the complexity of vast and
complicated statistical data, but still it is not easy to interpret the tabulated
data. Diagrams and graphs catch the eye more easily than tables, which
provide arrays of figures. A glance over a graph or diagram will enable any
layman (without statistical knowledge) to get an idea about the essential
characteristics of the tabulated data without much strain or effort. The
presentation of data in the form of diagrams and graphs is also called visual
presentation of data.
Functions of diagrams & graphs
They attract the attention of a large number of persons.
They carry a bird's eye view impression in the human mind.
They save a lot of valuable time when data are presented in the form of suitable charts
& graphs instead of pages of numerical figures.
They facilitate comparison between two or more sets of data.
Prediction equations can be represented by graphs and these will be
of much use in forecasting.
Limitations of diagrams & graphs
They are approximate indicators. Exact and accurate information
can be obtained only from the original tabular information.
They cannot substitute the tabular information.
They fail to disclose small differences when large figures are involved.
GRAPHICAL REPRESENTATION OF DATA
The discussion of frequency distribution has shown that tabular
presentation of data tends to preserve numerical accuracy while graphic
representation fosters comparison and quick communication of major
features.
The general rules for constructing graphs are as follows:
1. Title and footnotes:
The graphs must bear a concise and self-explanatory title and must
contain appropriate footnotes.
2. Selection of Scale:
Proper care should be taken in the selection of scale so that the graph
is neither too big nor too small. In addition, the scales used on the X-axis and the
Y-axis should be mentioned clearly.
3. Neatness:
The graph should be neat and clear. If a number of diagrams are to be
prepared, it is desirable to number them for the purpose of reference.
4. Attractive:
The graph should be attractive so that it invites the attention of the
reader immediately. To make the diagram attractive, leave a reasonable
margin on all sides of the diagram.
5. False base line:
Generally the vertical scale (i.e. Y-axis) starts from zero. However, if the
minimum value to be portrayed on the Y-axis is large, it would be difficult to
draw the graph. In such cases use is made of what is known as a false base
line. For this, the space between the origin and the minimum value is
reduced by drawing two zigzag horizontal lines across that space.
6. Depiction of more than one variable:
If more than one variable is to be depicted on a graph, they should be
shown by different types of lines. Colours and shades should be used to
exhibit various components of a diagram and a key should be provided.
With the data classified in the form of a grouped frequency distribution,
we can have the following graphical representations:
1. Histogram
2. Frequency polygon
3. Frequency curve
4. Ogive
5. Lorenz curve
The constructions of the above graphical representations are
furnished hereunder.
1. Histogram:
In drawing the histogram of a given grouped frequency distribution, we
first mark off along the X-axis all the classes on a suitable scale. With the
class intervals as bases, draw rectangles whose heights are proportional to
the frequencies in their classes. For equal class intervals, the heights of the
rectangles will be proportional to the frequencies, while for unequal class
intervals, the heights will be proportional to the ratios of the frequencies to
the widths of the classes.
Example 10: Draw a histogram for the following data.
Wages in Rs | Number of Workers
0-50 | 8
50-100 | 16
100-150 | 27
150-200 | 19
200-250 | 10
250-300 | 6
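The rule for the heights of the rectangles can be sketched as follows. The function name `heights` is illustrative. The first call uses the wage classes of Example 10; the second uses a hypothetical regrouping in which the last two classes are merged into 200-300, purely to show how unequal class intervals are handled.

```python
# Wage classes (Rs) and number of workers from Example 10.
classes = [(0, 50), (50, 100), (100, 150), (150, 200), (200, 250), (250, 300)]
workers = [8, 16, 27, 19, 10, 6]

def heights(class_limits, frequencies):
    """Height of each rectangle = frequency / width of its class."""
    return [f / (upper - lower) for (lower, upper), f in zip(class_limits, frequencies)]

# Equal class intervals: heights are proportional to the frequencies.
equal_heights = heights(classes, workers)

# Hypothetical regrouping with unequal intervals: merge 200-250 and 250-300.
merged_classes = classes[:4] + [(200, 300)]
merged_workers = workers[:4] + [workers[4] + workers[5]]
unequal_heights = heights(merged_classes, merged_workers)

for (lower, upper), h in zip(merged_classes, unequal_heights):
    print(f"{lower}-{upper}: height {h:.2f} workers per rupee of class width")
```

In the merged table the class 200-300 holds 16 workers, the same as 50-100, but its rectangle is only half as tall because the class is twice as wide.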
2. Frequency polygon:
Polygon means a figure having many sides. For a
grouped frequency distribution, the abscissae of the points are the mid values of the
classes and the ordinates are the corresponding frequencies. For equal class
intervals the frequency polygon can be obtained by
joining the middle points of the upper sides of the adjacent rectangles of the
histogram by means of straight lines. If the class intervals are of small width
the polygon can be approximated by a smooth curve. This form of graphical
representation also depicts clearly the features of the distribution.
Biostatistics
2010
22
Example: Draw a frequency polygon for the following data.
Weight (in kg) Number of Students
3035 4
3540 7
4045 10
4550 18
5055 14
5560 8
6065 3
3.Frequencycurve:
Constiuction is same as foi fiequency polygon, but the fiequency
cuive can be obtaineu by uiawing a smooth fieehanu cuive thiough the
veitices of fiequency polygon.
Example: Draw a frequency polygon for the following data.
Weight (in kg) Number of Students
3035 4
3540 7
4045 10
4550 18
5055 14
5560 8
6065 3
[Figure: frequency plot for the weight data; X-axis: Weight (in Kg), 30 to 65; Y-axis: No. of students, 0 to 20]
4. Ogive:
This is a cumulative frequency curve. It is obtained by using the
cumulative frequencies instead of the simple frequencies. It runs
more regularly than the ordinary frequency curve. It is particularly useful
for finding the median and quartiles. The mode can also be obtained by finding
the X-value for the steepest part of the curve. This curve is also smoothed out
if necessary. As it assumes the form of an arch, it is called Ogive or Ogee. The
point to be particularly noted in drawing an ogive is that the plotting is to
be done at the upper limits (for a less than ogive) or the lower limits (for a
greater than ogive) of the classes and not at the midpoints, as is done in the
case of the frequency polygon.
Less than Ogive:
If the points are plotted with the upper limits of the classes on the X-axis and
the corresponding cumulative frequencies (less than) on the Y-axis, the figure
formed by joining these points with a smooth freehand curve is known as the
cumulative frequency curve (less than).
Formation of table of (less than and greater than) Ogives:
Example: Draw the Ogives for the following data.
[Figure: frequency plot for the weight data; X-axis: Weight (in Kg), 30 to 65; Y-axis: No. of students, 0 to 20]
Class interval  Frequency
20 - 30          4
30 - 40          6
40 - 50         13
50 - 60         25
60 - 70         32
70 - 80         19
80 - 90          8
90 - 100         3
Solution:
Class limit  Less than Ogive  More than Ogive
 20             0              110
 30             4              106
 40            10              100
 50            23               87
 60            48               62
 70            80               30
 80            99               11
 90           107                3
100           110                0
From the point of intersection of the less than and greater than
cumulative frequency curves, a perpendicular is drawn to the base. The
median value can be ascertained by measuring the length along the X-axis
from the origin to the foot of the perpendicular. In other words, the
median value is the X-value of the point of intersection of the greater than
and less than ogives.
[Figure: less than and greater than ogives; X-axis: Class limit, 0 to 100; Y-axis: Cumulative frequency, 0 to 120]
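The ogive table and the graphical median can be reproduced with a short Python sketch. It assumes the plotted points are joined by straight lines (the notes draw smooth freehand curves), so the crossing point is where the less than cumulative frequency equals N/2. The function names are illustrative only.

```python
from itertools import accumulate

def ogive_table(freqs):
    """Return (less_than, more_than) cumulative frequencies at each class limit."""
    total = sum(freqs)
    less_than = [0] + list(accumulate(freqs))
    more_than = [total - c for c in less_than]
    return less_than, more_than

def median_from_ogives(limits, freqs):
    """X-value where straight-line ogives cross, i.e. where less-than cf = N/2."""
    less_than, _ = ogive_table(freqs)
    half = sum(freqs) / 2
    for i in range(1, len(limits)):
        if less_than[i] >= half:
            x0, x1 = limits[i - 1], limits[i]
            c0, c1 = less_than[i - 1], less_than[i]
            # Linear interpolation inside the class that contains N/2.
            return x0 + (half - c0) * (x1 - x0) / (c1 - c0)

# Data of the ogive example: class limits 20..100 and class frequencies.
limits = [20, 30, 40, 50, 60, 70, 80, 90, 100]
freqs = [4, 6, 13, 25, 32, 19, 8, 3]
less_than, more_than = ogive_table(freqs)
median = median_from_ogives(limits, freqs)

for row in zip(limits, less_than, more_than):
    print(row)
print("median:", median)
```

The two cumulative columns match the solution table, and the crossing point falls in the 60 - 70 class.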
5. Lorenz Curve
The Lorenz curve is a graphical method of studying dispersion. It was
introduced by Max O. Lorenz, a great economist and statistician, to study
the distribution of wealth and income. It is also used to study the variability
in the distribution of profits, wages, revenue, etc. It is specially used to study
the degree of inequality in the distribution of income and wealth between
countries or between different periods. The cumulative percentages of
one variable are plotted against the cumulative percentages of the
other variable, and the Lorenz curve is drawn through these points.
DIAGRAMMATIC REPRESENTATION OF DATA
Significance:
Even after proper classification by the process of tabulation, a mass of
data will not lend itself to a ready grasp of the information contained in it.
The data, if expressed diagrammatically, will have a better visual effect, and
this is apt to create interest even in a casual observer. The visual effect is
found to be lasting, and this enables the observer to hold in mind a large
number of facts without much strain or effort.
Essential requisites of a good diagram:
1. A diagram should be well planned out.
2. It should be drawn with utmost care.
3. It should be neat and drawn to scale.
4. It should have a good visual effect.
5. The design of the diagram should be simple but impressive.
6. It should involve less time, labour and cost and should have maximum
utility.
Bar Diagram:
The bar diagram is the simplest of statistical diagrams. It consists of a series
of bars of equal width (all horizontal or all vertical) standing on a common base
line, at equal intervals, the lengths of these bars being proportional to the
magnitude of the variables that they represent.
Component Bar Diagram:
In certain cases, the variable is capable of being subdivided. Then the
bars are divided into parts and marked in different colours or in some other
distinct ways to show the component parts.
Percentage Bar Diagram:
When the component parts are expressed as percentages of the
whole, the resulting bar diagram is called a percentage bar diagram. In this
case all bars are of equal length.
[Figure: bar diagram; X-axis: Year (1995, 2000, 2005, 2010); Y-axis: Average milk yield (Kg)]
[Figure: component bar diagram; X-axis: Year (1995, 2000, 2005, 2010); Y-axis: Average milk yield (Kg); key: Goat, Buffalo, Cow]
Multiple Bar Diagram:
Bar diagrams may sometimes be superimposed or placed in
juxtaposition for comparative purposes.
Pie Diagram:
Instead of presenting the variables by bars (or rectangles), they can be
represented by circles whose areas are proportional to the values of the
variables. This presentation is known as a pie diagram. The component parts
of the different variables can then be represented by sectors of these circles.
[Figure: percentage bar diagram; X-axis: Year (1995, 2000, 2005, 2010); Y-axis: Average milk yield, 0% to 100%; key: Goat, Buffalo, Cow]
[Figure: multiple bar diagram; X-axis: Year (1995, 2000, 2005, 2010); Y-axis: Average milk yield (Kg); key: Cow, Buffalo, Goat]
[Figure: pie diagram titled Population; sector values 8.2, 3.2, 1.4, 1.2; key: Cow, Buffalo, Goat, Others]
Percentage Pie Diagram:
If the component parts are expressed as percentages of the whole,
the variables can be represented by circles of equal radii, each of which is
divided into sectors showing the component parts. This representation may
be called the percentage pie diagram.
Pictograph:
These consist of actual pictures representing the variable, the size of
the pictures being proportional to the value of the variable.
[Figure: percentage pie diagrams for 1995 and 2005; key: Cow, Buffalo, Goat, Others]
[Figure: pictograph; X-axis: Year (2005, 2006, 2007, 2008); Y-axis: Average no. of computers; key: Computer]
Statistical Maps:
Statistical maps are widely used to represent spatial distributions such
as areas under different crops, population, its density etc. On the map, the
magnitude of the data is shown
a. by points, dots or crosses
b. by writing actual figures
c. by using different colours
and it is indicated by the key shown in one of the corners of the map.
SUMMATION NOTATIONS
If there are n observations and their values are given by $x_1, x_2, \ldots, x_n$,
then the sum of the observations, $x_1 + x_2 + \cdots + x_n$, can be written using
the symbol $\Sigma$ (read as sigma) as follows:
$$x_1 + x_2 + \cdots + x_n = \sum_{i=1}^{n} x_i$$
which means the sum of the $x_i$'s, i taking values from 1 to n.
When no ambiguity is likely to arise, the suffix i can be removed
and we can write $x_1 + x_2 + \cdots + x_n = \sum x$.
Using the same notation, $\sum f_i x_i = f_1 x_1 + f_2 x_2 + \cdots + f_n x_n$,
which can be written as $\sum fx$.
For a constant b,
$$\sum_{i=1}^{n} b x_i = b \sum_{i=1}^{n} x_i$$
which can be written as
$$\sum bx = bx_1 + bx_2 + \cdots + bx_n = b(x_1 + x_2 + \cdots + x_n) = b \sum x$$
MEASURES OF CENTRAL TENDENCY / MEASURES OF LOCATION / AVERAGES
A statistical average condenses a frequency distribution or raw data
and presents it in one single representative number. Thus a single expression
representing the whole group is selected which may convey a fairly adequate
idea about the whole group. This single expression in statistics is known as
the average. Such a value can be neither the smallest one nor the largest one
but lies somewhere between them; hence an average is usually
referred to as a measure of central tendency. It is located at a point around
which most of the other values tend to cluster and therefore it is also termed
a measure of location. It is considered a measure of description because
it describes the main characteristics of the data.
Objectives of averaging or need for calculating averages:
1. Describing the distribution in a concise manner
An average condenses the mass of data into a single value and this
enables us to form an idea about the entire distribution.
2. Comparing two or more distributions
When averages for two or more distributions are calculated, the task
of comparison becomes easy.
3. In computing other statistical measures
Complete study of a distribution requires calculation of various
statistical measures like dispersion, skewness, kurtosis etc. The computation
of many of these measures requires as a first step the computation of an
average value.
4. In carrying out other statistical analysis
To compare the mean performance of two distributions, the mean
values have to be computed first.
Average is a general term. There are different types of averages.
Types of averages
1. Arithmetic Mean (AM) or Mean or Common Mean
2. Geometric Mean (GM)
3. Harmonic Mean (HM)
4. Median
5. Mode
1. Arithmetic Mean (AM)
For ungrouped / raw data:
The Arithmetic Mean is the value arrived at by dividing the sum of
observations by the total number of observations. The mean of a population
is denoted by $\mu$ (read as 'mu'), whereas for the sample it is denoted by
$\bar{x}$ (read as 'x bar'). The total number of observations is denoted by 'N'
for the population and by 'n' for the sample.
If we denote the 'n' observations in a series by $x_1, x_2, x_3, \ldots, x_n$, then
$$\text{Arithmetic Mean} = \frac{\text{Sum of the values of the items in the series}}{\text{Total number of items}}$$
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{\sum x}{n}$$
Example:
The following are the body weight (Kg) of 10 Merino Rams. Calculate A.M.
52, 58, 65, 70, 65, 55, 74, 68, 75, 80
A.M = (52 + 58 + 65 + 70 + 65 + 55 + 74 + 68 + 75 + 80) / 10
= 662 / 10 = 66.2 Kg.
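The same calculation can be written as a minimal Python sketch; the function name `arithmetic_mean` is an illustrative choice, not part of the notes.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

# Body weights (Kg) of the 10 Merino Rams from the example above.
body_weights = [52, 58, 65, 70, 65, 55, 74, 68, 75, 80]
am = arithmetic_mean(body_weights)
print(round(am, 1))  # 66.2
```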
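Grouped Data:
In the case of a frequency distribution, if the class marks of
the 'k' classes are denoted by $x_1, x_2, x_3, \ldots, x_k$ and the corresponding
frequencies by $f_1, f_2, f_3, \ldots, f_k$, then the mean of the data is
$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{\sum f_i x_i}{N}$$
Where
$f_i$ = frequency of the i-th class
$x_i$ = mid value of the i-th class
k = no. of classes
$N = \sum_{i=1}^{k} f_i$
Shortcut Method (grouped data):
$$\bar{x} = A + \left[\frac{\sum_{i=1}^{k} f_i d_i}{N} \times C\right] = A + \left[\frac{\sum f_i d_i}{N} \times C\right], \qquad \text{put } d_i = \frac{x_i - A}{C}$$
Where
A = arbitrary point (assumed mean)
$f_i$ = frequency of the i-th class
$x_i$ = mid value of the i-th class
C = class interval
$N = f_1 + f_2 + \cdots + f_k$
For example, with $\sum f_i x_i = 147$ and N = 50,
$$\bar{x} = \frac{\sum f_i x_i}{N} = \frac{147}{50} = 2.94 \text{ Kg}$$
Shortcut Method (grouped data), with A = 2.9, $\sum f_i d_i = 10$ and C = 0.2:
$$\bar{x} = A + \left[\frac{\sum f_i d_i}{N} \times C\right] = 2.9 + \left[\frac{10}{50} \times 0.2\right] = 2.94 \text{ Kg}$$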
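A small Python sketch can compare the direct and shortcut methods. The data used here are an assumption: they are the birth weight distribution of Nilagiri lambs tabulated in the median example of these notes, which happens to give the same totals (sum of fx = 147, N = 50). Function names are illustrative.

```python
def grouped_mean(mid_values, freqs):
    """Direct method: sum(f * x) / N."""
    n_total = sum(freqs)
    return sum(f * x for f, x in zip(freqs, mid_values)) / n_total

def grouped_mean_shortcut(mid_values, freqs, assumed_mean, class_width):
    """Shortcut method: A + (sum(f * d) / N) * C, with d = (x - A) / C."""
    n_total = sum(freqs)
    sum_fd = sum(f * (x - assumed_mean) / class_width
                 for f, x in zip(freqs, mid_values))
    return assumed_mean + (sum_fd / n_total) * class_width

# Assumed data: class mid values (Kg) and frequencies, classes 1.8-2.0 ... 3.8-4.0.
mid_values = [1.9, 2.1, 2.3, 2.5, 2.7, 2.9, 3.1, 3.3, 3.5, 3.7, 3.9]
freqs = [2, 1, 2, 3, 9, 11, 11, 4, 4, 1, 2]

direct = grouped_mean(mid_values, freqs)
shortcut = grouped_mean_shortcut(mid_values, freqs, assumed_mean=2.9, class_width=0.2)
print(round(direct, 2), round(shortcut, 2))
```

Both methods give the same mean; the shortcut only changes the origin and scale to keep hand arithmetic small.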
COMBINED MEAN
If $\bar{x}_1$ is the mean of the first group of $n_1$ items and $\bar{x}_2$ is the
mean of the second group of $n_2$ items, then the combined mean of the two
groups is
$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$
Where
$$\bar{x}_1 = \frac{\text{Sum of all the observations in the first group}}{n_1}$$
$n_1 \bar{x}_1$ = Sum of all the observations in the first group
$$\bar{x}_2 = \frac{\text{Sum of all the observations in the second group}}{n_2}$$
$n_2 \bar{x}_2$ = Sum of all the observations in the second group
$n_1 \bar{x}_1 + n_2 \bar{x}_2$ = Sum of all the observations in the two groups of size $n_1 + n_2$
$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$
Extending the above result, if $\bar{x}_i$ is the mean of the i-th group of $n_i$
observations (i = 1, 2, ..., k), then
$$\bar{x} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i} = \frac{\sum n \bar{x}}{\sum n}$$
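As a rough sketch, the combined mean formula translates directly into Python. The second group below (5 animals with mean 60.0 Kg) is hypothetical; the first group reuses the 10 Merino Rams with mean 66.2 Kg from the arithmetic mean example.

```python
def combined_mean(sizes, means):
    """sum(n_i * mean_i) / sum(n_i) for any number of groups."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

# Group 1: 10 rams, mean 66.2 Kg (from the notes).
# Group 2: 5 animals, mean 60.0 Kg (hypothetical, for illustration only).
sizes = [10, 5]
means = [66.2, 60.0]
cm = combined_mean(sizes, means)
print(round(cm, 2))
```

Note that the result is not the simple average of 66.2 and 60.0, because the larger group carries more weight.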
WEIGHTED ARITHMETIC MEAN:
In computing the simple AM, it was assumed that all the items are of equal
importance. This may not always be true. When items vary in importance
they must be assigned weights in proportion to their relative importance.
Thus, a weighted mean is the mean of weighted items. In calculating the
weighted A.M., each item is multiplied by its weight and the products so
derived are summed up. This total is divided by the total of the weights (and
not by the number of items) to get the weighted mean.
Symbolically, if $x_1, x_2, \ldots, x_n$ are the different items with weights
$w_1, w_2, \ldots, w_n$ respectively, then the weighted mean is given by
$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum w_i x_i}{\sum w_i}$$
In fact, the AM of grouped data is a weighted average of the class marks
(i.e. the middle values of the class intervals), whose weights are the respective
frequencies. The weighted mean is a much better measure than the simple mean
when the items in a series are not equally important.
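A minimal sketch of the weighted mean follows. The values and weights are hypothetical and chosen only to show that the weighted and simple means can differ.

```python
def weighted_mean(values, weights):
    """sum(w * x) / sum(w); the divisor is the total weight, not the item count."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

values = [70, 80, 90]   # hypothetical items
weights = [1, 2, 3]     # hypothetical weights
wm = weighted_mean(values, weights)
simple = sum(values) / len(values)
print(round(wm, 2), simple)
```

Setting all weights equal reduces the weighted mean to the simple AM.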
2. Geometric mean (GM)
The geometric mean is the n-th root of the product of n items of a series. If
$x_1, x_2, \ldots, x_n$ are the n observations in a series, then GM is given by:
$$GM = \sqrt[n]{x_1 \, x_2 \cdots x_n} = (x_1 \, x_2 \cdots x_n)^{1/n}$$
To simplify the above by taking logarithms on both sides,
$$\log GM = \log (x_1 \, x_2 \cdots x_n)^{1/n}$$
$$\log GM = \frac{1}{n}(\log x_1 + \log x_2 + \cdots + \log x_n)$$
$$\log GM = \frac{\sum \log x}{n}$$
Therefore,
$$GM = \text{antilog}\left(\frac{\sum \log x}{n}\right)$$
Thus, geometric mean is the antilogarithm of the arithmetic
mean of the logarithmic values. The logarithm of the geometric mean is the
arithmetic mean of the logarithmic values.
In the case of a frequency distribution (grouped data), GM is given by
$$GM = \left(x_1^{f_1} \, x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N}$$
Where
N = total frequency = $\sum f_i$
$x_i$ is the midpoint of the i-th class with frequency $f_i$
Simplifying by taking logarithms on both sides,
$$\log GM = \frac{1}{N} \log\left(x_1^{f_1} \, x_2^{f_2} \cdots x_n^{f_n}\right)$$
$$\log GM = \frac{1}{N}(f_1 \log x_1 + f_2 \log x_2 + \cdots + f_n \log x_n)$$
$$\log GM = \frac{\sum f_i \log x_i}{N}$$
Therefore,
$$GM = \text{antilog}\left(\frac{\sum f_i \log x_i}{N}\right)$$
Where
$x_i$ is the mid value of the class whose frequency is $f_i$
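The log and antilog route to the geometric mean can be sketched as below, using base-10 logarithms as in hand calculation. The sample values are hypothetical, and all values must be positive for the logarithm to exist.

```python
import math

def geometric_mean(values):
    """Antilog of the arithmetic mean of the logarithms (values must be > 0)."""
    mean_log = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_log

def grouped_geometric_mean(mid_values, freqs):
    """Antilog of sum(f * log x) / N for a frequency distribution."""
    n_total = sum(freqs)
    mean_log = sum(f * math.log10(x) for f, x in zip(freqs, mid_values)) / n_total
    return 10 ** mean_log

gm = geometric_mean([2, 8])                     # hypothetical values
ggm = grouped_geometric_mean([2, 8], [3, 1])    # 2 occurs three times, 8 once
print(round(gm, 4), round(ggm, 4))
```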
3. Harmonic mean (HM)
The harmonic mean is the total number of items of a variable divided by
the sum of the reciprocals of the items. If $x_1, x_2, \ldots, x_n$ are the n
observations and HM represents the harmonic mean, then
$$HM = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum \frac{1}{x_i}}$$
$$HM = \frac{1}{\left[\frac{\sum \frac{1}{x}}{n}\right]} = \frac{1}{\text{A.M. of the reciprocals}}$$
Harmonic mean is the reciprocal of the arithmetic mean of the
reciprocal values.
In the case of a frequency distribution, HM is obtained by using the
formula,
$$HM = \frac{f_1 + f_2 + \cdots + f_n}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \cdots + \frac{f_n}{x_n}} = \frac{N}{\sum \frac{f_i}{x_i}}$$
Where
$x_i$ is the mid value of the class whose frequency is $f_i$
N is the total frequency.
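A short Python sketch of both forms of the harmonic mean is given here. The two values 40 and 60 are hypothetical (for instance, two rates); no value may be zero.

```python
def harmonic_mean(values):
    """n divided by the sum of reciprocals (no value may be zero)."""
    return len(values) / sum(1 / x for x in values)

def grouped_harmonic_mean(mid_values, freqs):
    """N divided by sum(f / x) for a frequency distribution."""
    return sum(freqs) / sum(f / x for f, x in zip(freqs, mid_values))

hm = harmonic_mean([40, 60])                    # hypothetical values
ghm = grouped_harmonic_mean([40, 60], [1, 1])   # same data in grouped form
print(round(hm, 2), round(ghm, 2))
```

The result, 48, is below the arithmetic mean of 50, because the harmonic mean gives more weight to the smaller item.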
4. Median
It is the value which has an equal number of observations on either
side when the items are arranged in ascending or descending order of
magnitude. The median divides the series into two equal parts; one part will
consist of all values less than the median and the other part of all values
greater than the median.
For ungrouped (raw) data:
Case a:
When n is odd, then
Median = size of the $\left(\frac{n+1}{2}\right)^{th}$ item after arranging the data in
ascending or descending order of magnitude
E.g. Find the median value of body weight of Merino Rams
52, 58, 65, 70, 65, 55, 74, 68, 75
Here n = 9 (odd)
First, arrange the body weights in ascending order of magnitude
52, 55, 58, 65, 65, 68, 70, 74, 75
Median term = ((n + 1) / 2)th term in the array
i.e. ((9 + 1) / 2)th term = 5th term in the array
Median = 65 Kg. (5th term in the array)
Case b:
When n is even, then
Median = average of the $\left(\frac{n}{2}\right)^{th}$ and $\left(\frac{n}{2} + 1\right)^{th}$ items in the array
E.g. To find the median value of body weight of 10 Merino Rams:
52, 58, 65, 70, 65, 55, 74, 68, 75, 80
First arrange the body weights in ascending order of magnitude
52, 55, 58, 65, 65, 68, 70, 74, 75, 80
Here n = 10 (even)
Median = average of the (n/2)th and ((n/2) + 1)th items in the array
= average of the (10/2)th and ((10/2) + 1)th terms in the array
= average of the 5th and 6th terms in the array
= (65 + 68) / 2 = 133 / 2 = 66.5 Kg.
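The two cases can be combined in one small function. This is a sketch that follows the rule above literally; Python's standard library also offers `statistics.median`, which gives the same results for these data.

```python
def median(values):
    """Median of raw data: middle item (n odd) or mean of the two middle items (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # (n + 1) / 2-th item in 1-based counting.
        return ordered[(n + 1) // 2 - 1]
    # Average of the (n / 2)-th and (n / 2 + 1)-th items in 1-based counting.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

median_odd = median([52, 58, 65, 70, 65, 55, 74, 68, 75])
median_even = median([52, 58, 65, 70, 65, 55, 74, 68, 75, 80])
print(median_odd, median_even)  # 65 66.5
```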
Grouped Data:
In the case of a frequency distribution, the median is the value which has
an equal number of frequencies on either side, i.e. which corresponds to
the cumulative frequency of N/2. It is obtained by
$$\text{Median} = l + \left[\frac{C}{f}\left(\frac{N}{2} - m\right)\right]$$
Where
l = lower limit of the median class
f = frequency of the median class
C = class interval
m = cumulative frequency of the class preceding the median class
Median class = class whose cumulative frequency just exceeds N/2
Note 1: Median class is the class corresponding to the cumulative frequency
equal to or just greater than N/2.
2: Median can be computed using the Ogive. It is the x-coordinate of the point
of intersection of the less than and greater than cumulative frequency
curves.
e.g. Find the median for the following frequency distribution of birth weight of Nilagiri lambs
Classes (birth weight)  Frequency (fi)  Cumulative Frequency
1.8 - 2.0                2               2
2.0 - 2.2                1               3
2.2 - 2.4                2               5
2.4 - 2.6                3               8
2.6 - 2.8                9              17
2.8 - 3.0               11              28
3.0 - 3.2               11              39
3.2 - 3.4                4              43
3.4 - 3.6                4              47
3.6 - 3.8                1              48
3.8 - 4.0                2              50
Total                   50
Median class = the class whose cumulative frequency just exceeds N/2
= the class whose cumulative frequency just exceeds 50/2 i.e. 25
= 2.8 - 3.0
l = 2.8; c = 0.2; f = 11; N/2 = 50/2 = 25; m = 17
Median = 2.8 + (0.2 / 11) (25 - 17)
= 2.945 Kg.
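The grouped median formula can be checked with a brief Python sketch. It assumes equal class widths, and the function name `grouped_median` is illustrative.

```python
def grouped_median(lower_limits, class_width, freqs):
    """Median = l + (C / f) * (N / 2 - m) for equal-width classes."""
    half = sum(freqs) / 2
    cumulative = 0  # cumulative frequency of the classes before the current one (m)
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= half:
            # Current class is the median class.
            return lower + (class_width / f) * (half - cumulative)
        cumulative += f

# Birth weight (Kg) of Nilagiri lambs: lower class limits and frequencies.
lower_limits = [1.8, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.6, 3.8]
freqs = [2, 1, 2, 3, 9, 11, 11, 4, 4, 1, 2]
med = grouped_median(lower_limits, 0.2, freqs)
print(round(med, 3))  # 2.945
```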
5. Mode
It is the size of the most frequent item in a large set of data. Thus the
mode is the value of the variable which occurs most frequently or repeats
itself the greatest number of times.
E.g. Find the mode for the body weight of Merino Rams
52, 58, 65, 70, 65, 55, 74, 68, 75, 80
FORM UNGROUPED FREQUENCY DISTRIBUTION:
Value of the variable 52 55 58 65 68 70 74 75 80
Frequency 1 1 1 2 1 1 1 1 1
Mode = 65 Kg.
In the case of grouped data, the mode can be calculated by
$$\text{Mode} = l + \left[\frac{f_2}{f_1 + f_2} \times C\right]$$
Where
l = lower limit of the modal class
$f_1$ = frequency of the class which precedes (comes earlier than) the modal class
$f_2$ = frequency of the class which succeeds (comes after) the modal class
C = class interval
Modal class = the class which has the maximum frequency.
E.g. Calculate mode for the following frequency distribution of birth weight of Nilagiri lambs
Classes (birth weight)  Frequency (fi)
1.8 - 2.0                2
2.0 - 2.2                1
2.2 - 2.4                2
2.4 - 2.6                3
2.6 - 2.8                9
2.8 - 3.0               12
3.0 - 3.2               10
3.2 - 3.4                4
3.4 - 3.6                4
3.6 - 3.8                1
3.8 - 4.0                2
Total                   50
Modal class: 2.8 - 3.0
l = 2.8; f1 = 9; f2 = 10; c = 0.2
Mode = 2.8 + (10/19)(0.2) = 2.905 Kg.
Note: 1. The mode can be computed from the histogram. It is the x-coordinate of
the point of intersection of the two diagonals drawn from the top corners of
the modal class to the top corners of the pre- and post-modal classes.
2. As a first approximation, the midpoint of the modal class may be taken
as the value of the mode, which is called the crude mode.
3. In a moderately asymmetrical distribution, mean - mode = 3 (mean -
median) (approximately), i.e. Mode = 3 median - 2 mean
(approximately). This is the empirical mode.
4. A distribution can have more than one mode. If it has one
mode, it is called a unimodal distribution; if it has two modes, it is
called a bimodal distribution; if it has three modes, it is called a
trimodal distribution; if it has more than three modes, it is called a
multimodal or polymodal distribution.
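A small Python sketch of the grouped mode formula used in these notes follows. It assumes equal class widths and that the modal class is neither the first nor the last class; the function name is illustrative.

```python
def grouped_mode(lower_limits, class_width, freqs):
    """Mode = l + (f2 / (f1 + f2)) * C, the formula used in these notes."""
    i = freqs.index(max(freqs))  # position of the modal class
    if i == 0 or i == len(freqs) - 1:
        raise ValueError("modal class needs a class on both sides")
    f1 = freqs[i - 1]  # frequency of the class preceding the modal class
    f2 = freqs[i + 1]  # frequency of the class succeeding the modal class
    return lower_limits[i] + (f2 / (f1 + f2)) * class_width

# Birth weight (Kg) of Nilagiri lambs from the mode example.
lower_limits = [1.8, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.6, 3.8]
freqs = [2, 1, 2, 3, 9, 12, 10, 4, 4, 1, 2]
mode = grouped_mode(lower_limits, 0.2, freqs)
print(round(mode, 3))  # 2.905
```

The value lies slightly above the midpoint of the modal class (the crude mode, 2.9) because the succeeding class is more frequent than the preceding one.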
Properties of arithmetic mean
1. The sum of the deviations of the items from the mean is equal to zero.
2. The sum of the squared deviations from the mean is smaller than the
sum of the squared deviations of the items from any other value, i.e.
$\sum (x_i - \bar{x})^2$ is minimum.
3. The product of the mean with the number of observations gives the total
of the original data, i.e. $\sum x_i = n\bar{x}$.
4. If $\bar{x}_1, \bar{x}_2$ are the means of two groups with $n_1$ and $n_2$
observations respectively, the mean of the combined group $\bar{x}$ is
given by
$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$
5. AM ≥ GM ≥ HM (for positive values)
6. When all the values are equal, AM = GM = HM
7. i. for a symmetrical distribution, AM = median = mode
ii. for a positively skewed distribution, AM > median > mode (short tail
on the left)
iii. for a negatively skewed distribution, AM < median < mode (short tail
on the right)
Properties of geometric mean
1. GM will be zero if one or more of the values are zero.
2. GM ≤ AM, GM ≥ HM
Properties of harmonic mean
1. HM ≤ GM ≤ AM
Properties of Median
1. Mean deviation taken about the median as the origin is the minimum.
2. i. If the distribution is symmetrical, median = mean = mode.
ii. Median < mean and median > mode, if the distribution is positively
skewed.
iii. Median > mean and median < mode, if the distribution is negatively
skewed.
Properties of Mode
1. i. If the distribution is symmetrical, mode = median = mean.
ii. If the distribution is positively skewed, mode < median < mean.
iii. If the distribution is negatively skewed, mode > median > mean.
2. If the distribution is moderately asymmetrical, then Mode = 3 median -
2 mean (approximately).
Choice of an average
The selection of an average is a difficult one. It should be done after
giving consideration to the nature and type of enquiry taken up and also the
object of the statistical investigation. No one average can be good for all
purposes, as different forms of averages have different characteristics. Thus
in selecting an average the chief characteristics and limitations of the various
averages must be considered. Most of the averages suffer from one limitation
or the other and they have their own advantages and disadvantages.
An ideal average should possess the following qualities:
1. It should be rigidly defined
2. It should be based on all observations
3. It should be simple to understand and easy to calculate
4. It should have minimum influence of extreme values
5. It should possess sampling stability
6. It should be capable of further algebraic treatment
7. It should not be affected by open end classes
Let us see how the different averages satisfy these qualities:
Arithmetic mean
- It is rigidly defined, based on all observations, simple to understand
and easy to calculate and is capable of further algebraic treatment. It
possesses sampling stability to some extent.
- It is affected much by extreme values and also by open end classes.
Geometric mean
- It is rigidly defined, based on all observations and is capable of further
algebraic treatment.
- It is neither simple to understand nor easy to calculate as it involves
logarithms, and it does not possess sampling stability. It is affected to
some extent by extreme values. It is affected by open end classes.
Harmonic mean
- It is rigidly defined, based on all observations and is capable of further
algebraic treatment.
- It is neither simple to understand nor easy to calculate as it involves
reciprocals, and it does not possess sampling stability. It is affected to
some extent by extreme values. It is affected by open end classes.
Median
- It is simple to understand and easy to calculate and is not affected by
extreme values. It can be calculated for distributions with open end
classes.
- It is not rigidly defined, not based on all observations, does not
possess sampling stability and is not capable of algebraic treatment.
Mode
- It is simple to understand and easy to calculate and is not affected by
extreme values and open end classes (normally).
- It is not rigidly defined, not based on all observations, does not
possess sampling stability and is not capable of algebraic treatment.
Quality                                        Arithmetic  Geometric  Harmonic  Median  Mode
                                               mean        mean       mean
1. Rigidly defined                             Yes         Yes        Yes       No      No
2. Based on all observations                   Yes         Yes        Yes       No      No
3. Simple to understand and easy to calculate  Yes         No         No        Yes     Yes
4. Minimum influence of extreme values         No          Partly     Partly    Yes     Yes
5. Sampling stability                          Partly      No         No        No      No
6. Further algebraic treatment                 Yes         Yes        Yes       No      No
7. Not affected by open end classes            No          No         No        Yes     Yes
(Partly = to some extent)
Thus we see that the qualities essential for a good average are
satisfied in varying degrees by the different measures of central tendency that
have been seen. It is obvious that AM possesses the above properties more
than any other type of average. It is the most popular device in practice.
Hence it is called the common average. Though the median and mode are more
easily computed than the others, they are indeterminate in many cases and are
not capable of algebraic manipulation.
Situations where different averages are used
- AM is generally applicable for all sorts of data. It should be used when the
distribution is reasonably symmetrical and further statistical analysis is
to be carried out, such as the computation of the standard deviation etc.,
and also when algebraic manipulation is to follow subsequently.
- GM is used when it is desired to give more weight to small items and less
weight to large items, and in the case of ratios, percentages and
growth of microorganisms.
- HM is used in averaging certain types of ratios and rates and problems
involving time. It gives more weight to small items.
- The median is to be used when the attributes of the data are not directly
measurable. As it can be easily located by mere inspection, it can be
calculated when the data are incomplete. Use the median when the
distribution is highly skewed and the extreme items may have distorting
effects on the mean.
- Mode can be used to know the most typical value or the most common
item. It is also used when the quickest estimate of centrality is required.
MEASURES OF DISPERSION
The measures of central tendency indicate only the central position.
But they have their own limitations and do not throw light on the formation
of the series of data. Sometimes they may offer misleading results too. For
e.g. consider the following three series.
Series A 100, 100, 100, 100
Series B 100, 106, 98, 92, 93, 109, 102
Series C 1, 79, 220
They have the same mean, 100. Hence we may conclude that
these series are alike in nature. But a close examination shall reveal that the
distributions differ widely from one another. In one distribution, the values
may be closely packed and in the other they may be widely scattered. Such a
variation is called scatter, spread or dispersion. Hence an average is more
meaningful when it is examined in the light of dispersion. When dispersion is
not significant, the average appears to be a true representative figure of
the series, and when dispersion is significant, it implies that the average is far
from being a true representative figure. The measurement of the scattering
of items in a distribution about the average is called a measure of variation or
dispersion. Measures of dispersion also enable comparison of two or more
distributions with regard to their variability or consistency.
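A quick Python check of the three series shows the point numerically. This sketch uses the standard library `statistics` module; `pstdev` is the population standard deviation (divisor n), which matches the definition of standard deviation used later in these notes.

```python
from statistics import mean, pstdev

series = {
    "A": [100, 100, 100, 100],
    "B": [100, 106, 98, 92, 93, 109, 102],
    "C": [1, 79, 220],
}

# For each series: (mean, range, standard deviation).
summary = {
    name: (mean(values), max(values) - min(values), pstdev(values))
    for name, values in series.items()
}

for name, (m, r, sd) in summary.items():
    print(f"Series {name}: mean = {m}, range = {r}, SD = {sd:.2f}")
```

All three means are 100, while the ranges are 0, 17 and 219, so the average alone hides how differently the series are spread.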
Objectives of measures of dispersion
- To determine the reliability of an average
Study of dispersion helps us in understanding how far an
average is representative of the mass of data.
- To serve as a basis for control of the variability
It helps us to determine the nature and causes of dispersion
with a view to controlling variability.
- To compare two or more series with regard to their variability
Measures of dispersion help in comparing the variability of two
or more series.
Absolute and relative dispersion
When dispersion is expressed in terms of the original units of the series, for
e.g., weight in kgs, income in rupees etc., it is called absolute dispersion. If
dispersion is expressed in terms of a pure number, free from units of
measurement, then the dispersion is relative dispersion.
A relative measure of dispersion is an absolute measure of dispersion
divided by an average.
Different measures of dispersion
1. Range
It is the difference between the highest and lowest values in the raw
data. For grouped data, the range is the difference between the upper
limit of the last class and the lower limit of the first class. It is a very simple
measure of dispersion. It is useful in the study of variation in money rates and
rates of exchange, weather forecasts etc.
The relative measure of dispersion for range is the ratio of range (R.R),
which is given by
$$RR = \frac{H - L}{H + L}$$
where H and L are the highest and lowest values.
2. Quartile Deviation (QD)
It is also known as the semi-interquartile range. It is based on quartiles,
which are points which divide the data into four equal parts. The lower or
first quartile ($Q_1$) divides the lower half of the distribution into two equal
parts, i.e., it is the value below which 25% of the observations lie and above
which 75% of the observations lie. Similarly, the upper or third quartile ($Q_3$)
divides the upper half of the distribution into two equal parts, i.e. it is the
value below which 75% of the observations lie and above which 25% of the
observations lie. The difference $Q_3 - Q_1$ is called the interquartile range
and QD is given by $(Q_3 - Q_1)/2$.
For grouped data, $Q_1$ is the value which corresponds to the cumulative
frequency of N/4 and $Q_3$ is the value which corresponds to the cumulative
frequency of 3N/4.
QD is used in the case of open end distributions.
Formula to compute QD
In the case of raw data, after arranging the data in ascending
order,
$$Q_1 = \text{Size of the } \left(\frac{n+1}{4}\right)^{th} \text{ term}$$
$$Q_3 = \text{Size of the } \left(\frac{3(n+1)}{4}\right)^{th} \text{ term}$$
Then,
$$QD = \frac{Q_3 - Q_1}{2}$$
In the case of a frequency distribution or grouped data,
$$Q_1 = L_1 + \left[\frac{C}{f_1}\left(\frac{N}{4} - m_1\right)\right]$$
Where
$L_1$ is the lower boundary of the first quartile class,
$m_1$ is the cumulative frequency of the class preceding the first quartile class,
$f_1$ is the frequency in the first quartile class and
C is the width of the class interval
$$Q_3 = L_3 + \left[\frac{C}{f_3}\left(\frac{3N}{4} - m_3\right)\right]$$
Where
$L_3$ is the lower boundary of the third quartile class,
$m_3$ is the cumulative frequency of the class preceding the third quartile class,
$f_3$ is the frequency in the third quartile class and
C is the width of the class interval
Then,
$$QD = \frac{Q_3 - Q_1}{2}$$
The relative measure of QD is known as the quartile coefficient of
dispersion (QC).
$$QC \text{ of dispersion} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
Note:
The second quartile ($Q_2$) is the median.
$Q_1, Q_2, Q_3$ are three quartiles which divide the series into 4 equal parts.
We have 9 deciles which divide the entire range into 10 equal parts and
they are denoted by $D_1, D_2, \ldots, D_9$.
We have 99 percentiles which divide the entire range into 100 equal
parts and they are denoted by $P_1, P_2, \ldots, P_{99}$.
In the case of a symmetrical distribution, $(Q_3 + Q_1)/2 = \text{Median}$ and therefore,
$$QC \text{ of dispersion} = \frac{Q_3 - Q_1}{2 \, \text{Median}}$$
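The grouped quartile formulas can be sketched in Python as below. The data are an assumption for illustration: the birth weight distribution of Nilagiri lambs from the median example, with equal class widths of 0.2 Kg. The helper `grouped_quantile` is a made-up name covering Q1 (fraction 0.25) and Q3 (fraction 0.75).

```python
def grouped_quantile(lower_limits, class_width, freqs, fraction):
    """L + (C / f) * (fraction * N - m) for equal-width classes."""
    target = fraction * sum(freqs)
    cumulative = 0  # cumulative frequency of the preceding classes (m)
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= target:
            return lower + (class_width / f) * (target - cumulative)
        cumulative += f

# Assumed data: birth weight (Kg) of Nilagiri lambs.
lower_limits = [1.8, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.6, 3.8]
freqs = [2, 1, 2, 3, 9, 11, 11, 4, 4, 1, 2]

q1 = grouped_quantile(lower_limits, 0.2, freqs, 0.25)
q3 = grouped_quantile(lower_limits, 0.2, freqs, 0.75)
qd = (q3 - q1) / 2                 # quartile deviation
qc = (q3 - q1) / (q3 + q1)         # quartile coefficient of dispersion
print(round(q1, 3), round(q3, 3), round(qd, 3), round(qc, 3))
```

Calling the same helper with a fraction of 0.5 returns the grouped median, since the second quartile is the median.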
3. Mean deviation (MD)
Mean deviation or average deviation in a series is the AM of the
deviations of the various items from an average (mean, median or mode) of
the series, taking all deviations as positive.
For raw data,
$$\text{MD about mean} = \frac{\sum |x_i - \bar{x}|}{n}$$
$$\text{MD about an average } A = \frac{\sum |x_i - A|}{n}$$
Where
A is Mean or Median or Mode
For grouped data,
$$\text{MD about an average } A = \frac{\sum f_i |x_i - A|}{N}$$
Where
A is Mean or Median or Mode
$x_i$ is the midpoint of the i-th class with frequency $f_i$
The relative measure of Mean Deviation is known as the mean coefficient
of dispersion or coefficient of mean deviation and is obtained by dividing the
MD by the average from which it is computed.
$$\text{Coefficient of MD about an average } A = \frac{\text{MD about an average } A}{A}$$
$$\text{Coefficient of MD about an average } A \text{ in \%} = \frac{\text{MD about an average } A}{A} \times 100$$
Note: In actual practice, MD is calculated either from the mean or the median,
but the mode is not used as its value is indeterminate. However, the median is
preferred to the mean because the mean deviation from the median is
minimum.
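A minimal Python sketch of the mean deviation for raw data is shown here, applied to the body weights of the 10 Merino Rams used earlier in these notes. The function name is illustrative.

```python
def mean_deviation(values, about):
    """Arithmetic mean of the absolute deviations from the chosen average."""
    return sum(abs(x - about) for x in values) / len(values)

body_weights = [52, 58, 65, 70, 65, 55, 74, 68, 75, 80]
mean_value = sum(body_weights) / len(body_weights)   # 66.2 Kg
median_value = 66.5                                   # from the median example

md_mean = mean_deviation(body_weights, mean_value)
md_median = mean_deviation(body_weights, median_value)
coeff_md_mean = md_mean / mean_value * 100            # coefficient of MD in %

print(round(md_mean, 2), round(md_median, 2), round(coeff_md_mean, 2))
```

For these particular data the two mean deviations coincide, because the mean and the median both fall between the two middle observations; in general the mean deviation about the median is never larger than that about the mean.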
4. Standard deviation (SD)
Karl Pearson introduced the concept of standard deviation in 1893. It
is the most important measure of dispersion and is widely used in many
statistical formulae. Standard deviation is also called Root Mean Square
Deviation. The reason is that it is the square root of the mean of the squared
deviations from the arithmetic mean. It provides accurate results. The square
of the standard deviation is called the variance.
It is defined as the positive square root of the arithmetic mean of the
squares of the deviations of the given observations from their arithmetic mean.
The standard deviation is denoted by the Greek letter $\sigma$ (sigma).
For raw data
$$\text{Standard Deviation} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$
$$\text{Standard Deviation} = \sqrt{\frac{1}{n} \sum (x_i - \bar{x})^2} = \sqrt{\frac{1}{n}\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right]}$$
Eg. The following are the crimps of the greece fleece yield of Nilagiri breed. Find out S.D
6, 5, 3, 4, 2, 3
Xi    Xi^2
6     36
5     25
3      9
4     16
2      4
3      9
Sum of Xi = 23    Sum of Xi^2 = 99
$$\sigma = \sqrt{\frac{1}{n}\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right]}$$
$$\sigma = \sqrt{\frac{1}{6}\left[99 - \frac{(23)^2}{6}\right]}$$
= 1.34 no.
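Both forms of the raw-data formula can be sketched in Python and compared on the crimp data above. Function names are illustrative; the divisor is n, as in the definition given in these notes.

```python
import math

def sd_definition(values):
    """Square root of the mean of squared deviations from the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

def sd_computational(values):
    """Equivalent working formula using sum(x) and sum(x^2)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / n)

crimps = [6, 5, 3, 4, 2, 3]
sd1 = sd_definition(crimps)
sd2 = sd_computational(crimps)
print(round(sd1, 2), round(sd2, 2))  # 1.34 1.34
```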
For grouped data (with Sheppard's correction)
$\sigma = \sqrt{\dfrac{1}{N}\sum_{i=1}^{k} f_i (x_i - \bar{x})^2 - \dfrac{C^2}{12}}$
To simplify the above,
$\sigma = \sqrt{\dfrac{1}{N}\left[\sum f_i x_i^2 - \dfrac{\left(\sum f_i x_i\right)^2}{N}\right] - \dfrac{C^2}{12}}$
where
$f_i$ = frequency of the i-th class
$x_i$ = mid value of the i-th class
C = class interval
k = number of classes
$N = \sum f_i$
$C^2/12$ = Sheppard's correction factor
Eg. Calculate the S.D. for the following frequency distribution of birth weight of Nilagiri lambs.
Classes      Mid value (x_i)   Frequency (f_i)   (x_i - x̄)^2   f_i (x_i - x̄)^2
1.8 - 2.0    1.9                2                1.0816         2.1632
2.0 - 2.2    2.1                1                0.7056         0.7056
2.2 - 2.4    2.3                2                0.4096         0.8192
2.4 - 2.6    2.5                3                0.1936         0.5808
2.6 - 2.8    2.7                9                0.0576         0.5184
2.8 - 3.0    2.9               11                0.0016         0.0176
3.0 - 3.2    3.1               11                0.0256         0.2816
3.2 - 3.4    3.3                4                0.1296         0.5184
3.4 - 3.6    3.5                4                0.3136         1.2544
3.6 - 3.8    3.7                1                0.5776         0.5776
3.8 - 4.0    3.9                2                0.9216         1.8432
Total                          50                               9.2800
Here $\bar{x} = \sum f_i x_i / N = 147/50 = 2.94$ and C = 0.2, so $C^2/12 = 0.0033$.
$\sigma = \sqrt{\dfrac{1}{N}\sum f_i (x_i - \bar{x})^2 - \dfrac{C^2}{12}}$ (with Sheppard's correction)
$= \sqrt{\dfrac{9.28}{50} - 0.0033}$
= 0.427 kg.
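The grouped-data calculation can be reproduced with the following Python sketch. The function name and the `sheppard` flag are assumptions made for this illustration; the mid values and frequencies are those of the birth-weight table.

```python
import math

def grouped_sd(midpoints, freqs, class_width, sheppard=True):
    """SD of grouped data; optionally subtracts C^2/12 from the variance."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    variance = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n
    if sheppard:
        variance -= class_width ** 2 / 12  # Sheppard's correction
    return math.sqrt(variance)

midpoints = [1.9, 2.1, 2.3, 2.5, 2.7, 2.9, 3.1, 3.3, 3.5, 3.7, 3.9]
freqs = [2, 1, 2, 3, 9, 11, 11, 4, 4, 1, 2]

sd_corrected = grouped_sd(midpoints, freqs, class_width=0.2)
sd_uncorrected = grouped_sd(midpoints, freqs, class_width=0.2, sheppard=False)
print(round(sd_corrected, 3))    # 0.427
print(round(sd_uncorrected, 3))  # 0.431
```

The correction is small here because the class width (0.2 kg) is narrow relative to the spread of the data.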
Short-cut method
$\sigma = \sqrt{\dfrac{C^2}{N}\left[\sum f_i d_i^2 - \dfrac{\left(\sum f_i d_i\right)^2}{N}\right] - \dfrac{C^2}{12}}$
where
$f_i$ = frequency of the i-th class
$x_i$ = mid value of the i-th class
C = class interval
$N = \sum f_i$ = total number of observations
A is an arbitrary point (assumed mean or provisional mean).
Put $d_i = \dfrac{x_i - A}{C}$
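As a rough check, the short-cut formula can be coded directly and applied to the birth-weight distribution of Nilagiri lambs. The function name and the choice A = 2.9 are assumptions for this sketch; any other assumed mean should give the same result.

```python
import math

def grouped_sd_shortcut(midpoints, freqs, class_width, assumed_mean):
    """Short-cut SD using step deviations d_i = (x_i - A) / C."""
    n = sum(freqs)
    d = [(x - assumed_mean) / class_width for x in midpoints]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d))
    variance = class_width ** 2 / n * (sum_fd2 - sum_fd ** 2 / n)
    variance -= class_width ** 2 / 12  # Sheppard's correction
    return math.sqrt(variance)

midpoints = [1.9, 2.1, 2.3, 2.5, 2.7, 2.9, 3.1, 3.3, 3.5, 3.7, 3.9]
freqs = [2, 1, 2, 3, 9, 11, 11, 4, 4, 1, 2]

sd_a = grouped_sd_shortcut(midpoints, freqs, 0.2, assumed_mean=2.9)
sd_b = grouped_sd_shortcut(midpoints, freqs, 0.2, assumed_mean=2.5)
print(round(sd_a, 3), round(sd_b, 3))
```

With A = 2.9 the step deviations run from -5 to 5, giving Σfd = 10 and Σfd² = 234, which keeps the hand arithmetic in small whole numbers.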
Sheppard's correction
In computing the standard deviation, a grouping error may sometimes occur on account of grouping the data into different classes. As a statistical adjustment for this grouping error, Sheppard suggested a correction value to be deducted from the variance of the grouped data, which is given by
$\dfrac{C^2}{12} = \dfrac{\text{square of width of class interval}}{12}$
Coefficient of Variation
The standard deviation is an absolute measure of dispersion. It is expressed in terms of the units in which the original figures are collected and stated. The standard deviation of heights of students cannot be compared with the standard deviation of weights of students, as the two are expressed in different units, i.e. heights in centimetres and weights in kilograms. Therefore the standard deviation must be converted into a relative measure of dispersion for the purpose of comparison. This relative measure is known as the coefficient of variation.
The coefficient of variation is obtained by dividing the standard deviation by the mean and multiplying by 100. Symbolically,
Coefficient of variation (CV or COV, in %) = $\dfrac{\sigma}{\bar{x}} \times 100$
If we want to compare the variability of two or more series, we can use the C.V. The series or group of data for which the C.V. is greater is more variable, less stable, less uniform, less consistent or less homogeneous. If the C.V. is smaller, the group is less variable, more stable, more uniform, more consistent or more homogeneous.
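A minimal Python sketch of this comparison follows. The height and weight figures are hypothetical values invented for illustration, and the function name is an assumption.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """CV in percent: standard deviation divided by mean, times 100."""
    return pstdev(values) / mean(values) * 100

heights_cm = [150, 160, 165, 170, 175]  # hypothetical data
weights_kg = [45, 55, 60, 70, 80]       # hypothetical data

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)
print(round(cv_height, 1), round(cv_weight, 1))
```

Because the CV is unit-free, the two results can be compared directly even though one series is in centimetres and the other in kilograms; in this made-up sample the weights are the more variable series.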
Variance
The square of the standard deviation is called the variance. It is the mean square deviation, i.e. the sum of the squared deviations of the individual observations from the mean, divided by the number of observations. It is denoted by $\sigma^2$.
Standard error (SE)
The mean of a random sample may be taken as representative of the population mean. The difference between the sample mean and the population mean is due to sampling and is called the sampling error. The standard error is defined as the SD of the means of different samples taken from the population.
If we study only one sample, then
$SE(\bar{x}) = \dfrac{\sigma}{\sqrt{n}}$
where
n is the size of the sample and
σ is the standard deviation of the sample.
Probable error (PE)
$PE \approx \dfrac{2}{3}\,SD$
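The standard error and probable error relations can be sketched as follows. The function names and the sample values are assumptions for illustration, and the 2/3 factor is the approximation given in the text.

```python
import math
from statistics import pstdev

def standard_error(values):
    """SE of the mean from a single sample: SD / sqrt(n)."""
    return pstdev(values) / math.sqrt(len(values))

def probable_error(sd):
    """Probable error, approximated as two-thirds of the SD."""
    return 2 / 3 * sd

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, SD = 2
print(round(standard_error(sample), 4))
print(round(probable_error(pstdev(sample)), 4))
```

The sketch uses the divide-by-n standard deviation, in line with the definition of SD used throughout this section.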
Properties of QD
1. $QD \approx \dfrac{2}{3}\,SD \approx PE$
2. Mean ± QD will cover 50% of the cases.
Properties of MD
1. $MD \approx \dfrac{4}{5}\,SD$
2. MD about the median as the origin is the minimum.
Properties of SD
1. SD is greater than MD, QD and PE.
2. The mean square deviation will be minimum if the deviations are taken from the AM as the origin.
3. Mean ± 1 SD will cover 68.27% of the items.
   Mean ± 2 SD will cover 95.45% of the items.
   Mean ± 3 SD will cover 99.73% of the items.
4. By adding or subtracting a constant from all the observations, SD is unaltered.
5. If $\bar{x}_1$, $\bar{x}_2$ are the means of two samples of sizes $n_1$, $n_2$ respectively, with SDs $\sigma_1$, $\sigma_2$, then the combined SD (σ) is given by
$(n_1 + n_2)\,\sigma^2 = n_1\sigma_1^2 + n_2\sigma_2^2 + n_1 d_1^2 + n_2 d_2^2$
where $\bar{x}$ is the combined mean of the two samples and $d_1 = \bar{x}_1 - \bar{x}$; $d_2 = \bar{x}_2 - \bar{x}$.
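The combined SD formula in property 5 can be checked numerically. In this sketch the function name and the two small samples are assumptions; the combined mean is taken as the size-weighted mean of the two sample means.

```python
import math
from statistics import mean, pstdev

def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined SD of two samples from their sizes, means and SDs."""
    combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1 = mean1 - combined_mean
    d2 = mean2 - combined_mean
    total = n1 * sd1 ** 2 + n2 * sd2 ** 2 + n1 * d1 ** 2 + n2 * d2 ** 2
    return math.sqrt(total / (n1 + n2))

a = [6, 5, 3, 4, 2, 3]  # hypothetical sample 1
b = [8, 9, 7, 10]       # hypothetical sample 2

sd_from_formula = combined_sd(len(a), mean(a), pstdev(a),
                              len(b), mean(b), pstdev(b))
sd_from_pooled_data = pstdev(a + b)
print(round(sd_from_formula, 4), round(sd_from_pooled_data, 4))
```

The two printed values agree, which shows that the formula recovers the SD of the pooled observations without needing the raw data.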
Merits and demerits of dispersion measures
The essential requisites of a good measure of dispersion are the same as those of averages.

Dispersion measure: Range
Merits:
✓ It is easy to calculate and simple to understand.
Demerits:
✗ It is not rigidly defined.
✗ It is not based on all observations. It is affected much by extreme values and open-end classes.
✗ It does not possess sampling stability.
✗ It is not amenable to further mathematical treatment.

Dispersion measure: Quartile Deviation
Merits:
✓ It is easy to calculate and simple to understand.
✓ It is not affected by extreme items and open-end classes.
Demerits:
✗ It is not rigidly defined.
✗ It is not based on all observations. It does not possess sampling stability.
✗ It is not amenable to further mathematical treatment.

Dispersion measure: Mean Deviation
Merits:
✓ It is based on all observations.
✓ It is rigidly defined.
✓ It is easy to calculate and simple to understand.
Demerits:
✗ It is affected much by extreme values and open-end classes.
✗ It does not possess sampling stability.
✗ It is not amenable to further mathematical treatment.

Dispersion measure: Standard Deviation
Merits:
✓ It has a rigid formula.
✓ It is based on all observations.
✓ It is capable of further algebraic treatment.
✓ It is less affected by sampling fluctuations.
Demerits:
✗ It is affected much by extreme values and open-end classes.

Requisite                                        Range   QD   MD   SD
1. Rigidly defined                                 ✗     ✗    ✓    ✓
2. Based on all observations                       ✗     ✗    ✓    ✓
3. Simple to understand and easy to calculate      ✓     ✓    ✓    ✓
4. Minimum influence of extreme values             ✗     ✓    ✗    ✗
5. Sampling stability                              ✗     ✗    ✗    ✓
6. Further algebraic treatment                     ✗     ✗    ✗    ✓
7. Not affected by open-end classes                ✗     ✓    ✗    ✗
Choice of dispersion measure
We see that the standard deviation satisfies more of the ideal qualities than the other measures of dispersion. It is the most reliable and a better summary descriptive measure. Among the other measures, the range is unstable and its value depends upon the extreme items in the data. The range fails to consider the central tendency of the data, whereas the quartile deviation excludes half of the items from consideration. The mean deviation suffers from the mathematical defect of neglecting algebraic signs (particularly negative signs). The standard deviation is free from all these defects to a great extent and is the most useful and most popularly used measure of dispersion.
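MOMENTS
Moments can be defined as the arithmetic mean of various powers of deviations taken from the mean of a distribution. These moments are known as central moments. The first four moments about the arithmetic mean (central moments) are defined below, for an individual series and for a discrete series.
First moment about the mean, $\mu_1$: $\dfrac{\sum (x - \bar{x})}{n} = 0$ (individual series); $\dfrac{\sum f(x - \bar{x})}{N} = 0$ (discrete series)
Second moment about the mean, $\mu_2$: $\dfrac{\sum (x - \bar{x})^2}{n} = \sigma^2$ (individual series); $\dfrac{\sum f(x - \bar{x})^2}{N} = \sigma^2$ (discrete series)
Third moment about the mean, $\mu_3$: $\dfrac{\sum (x - \bar{x})^3}{n}$ (individual series); $\dfrac{\sum f(x - \bar{x})^3}{N}$ (discrete series)
Fourth moment about the mean, $\mu_4$: $\dfrac{\sum (x - \bar{x})^4}{n}$ (individual series); $\dfrac{\sum f(x - \bar{x})^4}{N}$ (discrete series)
r-th moment about the mean, $\mu_r$: $\dfrac{\sum (x - \bar{x})^r}{n}$ (individual series); $\dfrac{\sum f(x - \bar{x})^r}{N}$ (discrete series)
If the mean is a fractional value, it becomes a difficult task to work out the moments. In such cases, we can calculate moments about a working origin and then change them into moments about the actual mean. The moments about an origin are known as raw moments.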
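The central moments defined above can be computed with one small function. This is a sketch under stated assumptions: the function name, the optional `freqs` argument (used for a discrete series) and the sample values are all illustrative.

```python
def central_moment(values, r, freqs=None):
    """r-th moment about the mean; pass freqs for a discrete series."""
    if freqs is None:
        freqs = [1] * len(values)  # individual series: each value counted once
    n = sum(freqs)
    m = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - m) ** r for x, f in zip(values, freqs)) / n

data = [1, 2, 3, 4, 5]  # illustrative values only
for r in range(1, 5):
    print(r, central_moment(data, r))
```

For this small symmetric series the first and third moments are zero, the second moment equals the variance (2.0), and the fourth moment is 6.8.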
SKEWNESS
The measures of central tendency indicate the central position of a frequency distribution, and the measures of dispersion indicate the extent to which the items cluster around or scatter away from the central value. But none of these measures indicates the form or type of the distribution.
Skewness refers to the lack of symmetry or departure from symmetry. We study skewness to have an idea about the shape of the curve which we can draw with the help of the given data. Symmetry means that the number of values above the mode and below the mode is the same. If in a distribution mean = median = mode, then that distribution is known as a symmetrical distribution. If in a distribution mean ≠ median ≠ mode, then it is not a symmetrical distribution; it is called a skewed distribution, and it could be either positively skewed or negatively skewed.
In a symmetrical distribution the two tails of the curve are of equal size; in an asymmetrical distribution one tail of the curve is longer than the other. A distribution is said to be skewed in the direction of the excess tail. Thus if the right tail is longer than the left, the distribution is positively skewed; if the left tail is longer than the right, the distribution is negatively skewed.
Consider the following two distributions:
Class     Frequency        Class     Frequency
0-5       10               0-5       10
5-10      30               5-10      40
10-15     60               10-15     30
15-20     60               15-20     90
20-25     30               20-25     20
25-30     10               25-30     10
The above two distributions have the same mean (= 15) and SD (≈ 6), yet they are not identical distributions. The distribution on the left-hand side (LHS) is symmetrical, whereas the distribution on the right-hand side (RHS) is asymmetrical, or skewed.
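The claim about the two distributions can be verified with a short Python sketch. It uses class mid values and, as an extra check not carried out in the text, the third moment about the mean, which is zero for a symmetrical distribution. The function and variable names are assumptions.

```python
import math

midpoints = [2.5, 7.5, 12.5, 17.5, 22.5, 27.5]
lhs = [10, 30, 60, 60, 30, 10]  # left-hand distribution
rhs = [10, 40, 30, 90, 20, 10]  # right-hand distribution

def summary(midpoints, freqs):
    """Return (mean, SD, third central moment) of a grouped distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    var = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n
    mu3 = sum(f * (x - mean) ** 3 for x, f in zip(midpoints, freqs)) / n
    return mean, math.sqrt(var), mu3

lhs_stats = summary(midpoints, lhs)
rhs_stats = summary(midpoints, rhs)
print(lhs_stats)
print(rhs_stats)
```

Both distributions give a mean of 15 and an SD of about 6.02, but the third moment is 0 on the left and -37.5 on the right, so only the shape distinguishes them.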
Measures of skewness
The important measures of skewness are
1. Karl Pearson's coefficient of skewness
2. Bowley's coefficient of skewness
3. Measure of skewness based on moments
1. Karl Pearson's coefficient of skewness
According to Karl Pearson, the absolute measure of skewness = mean - mode. This measure is not suitable for making a valid comparison of the skewness in two or more distributions, because the unit of measurement may be different in different series. To avoid this difficulty, use the relative measure of skewness called Karl Pearson's coefficient of skewness, given by:
Karl Pearson's coefficient of skewness = $\dfrac{\text{Mean} - \text{Mode}}{\text{S.D.}}$
In case the mode is ill defined, the coefficient can be determined by the formula:
Coefficient of skewness = $\dfrac{3\,(\text{Mean} - \text{Median})}{\text{S.D.}}$
2. Bowley's coefficient of skewness
In Karl Pearson's method of measuring skewness, the whole of the series is needed. Prof. Bowley has suggested a formula based on the relative position of the quartiles. In a symmetrical distribution, the quartiles are equidistant from the value of the median, i.e. Median - Q1 = Q3 - Median. But in a skewed distribution, the quartiles will not be equidistant from the median. Hence Bowley has suggested the following formula:
Bowley's coefficient of skewness (sk) = $\dfrac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$
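3. Measure of skewness based on moments
The measure of skewness based on moments is denoted by $\beta_1$ and is given by:
$\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}$
Since $\beta_1$ as defined cannot itself be negative, the direction of skewness is taken from the sign of $\mu_3$: if $\mu_3$ is negative, then $\beta_1$ is taken as negative.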
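The three measures of skewness can be written as small Python functions. This is a sketch under stated assumptions: the function names are illustrative, the summary values passed in are hypothetical, and `beta1` returns the unsigned ratio together with the sign of the third moment so that the direction of skewness is explicit.

```python
def pearson_skewness(mean, mode, sd):
    """Karl Pearson's coefficient: (mean - mode) / SD."""
    return (mean - mode) / sd

def pearson_skewness_median(mean, median, sd):
    """Alternative form when the mode is ill defined: 3(mean - median) / SD."""
    return 3 * (mean - median) / sd

def bowley_skewness(q1, median, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2 Median) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)

def beta1(values):
    """Return (mu3^2 / mu2^3, sign of mu3) for raw data."""
    n = len(values)
    m = sum(values) / n
    mu2 = sum((x - m) ** 2 for x in values) / n
    mu3 = sum((x - m) ** 3 for x in values) / n
    sign = (mu3 > 0) - (mu3 < 0)
    return mu3 ** 2 / mu2 ** 3, sign

print(pearson_skewness_median(15, 16, 6))  # hypothetical summary values
print(bowley_skewness(2, 3, 6))            # hypothetical quartiles
print(beta1([1, 2, 2, 3, 10]))             # hypothetical right-skewed data
```

A coefficient of zero from any of the three indicates symmetry; positive and negative values indicate positive and negative skewness respectively.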
KURTOSIS
The expression kurtosis is used to describe the peakedness of a curve. The three measures (central tendency, dispersion and skewness) describe the characteristics of frequency distributions, but they do not give a complete picture of a distribution.
As far as the measurement of shape is concerned, we have two characteristics: skewness, which refers to the asymmetry of a series, and kurtosis, which measures peakedness relative to a normal curve. Frequency curves show different degrees of flatness or peakedness, and this characteristic of a frequency curve is termed kurtosis. Measures of kurtosis describe the shape of the top of a frequency curve: they tell us the extent to which a distribution is more peaked or more flat-topped than the normal curve. The normal curve, which is symmetrical and bell-shaped, is designated as Mesokurtic. If a curve is relatively narrower and more peaked at the top, it is designated as Leptokurtic. If the frequency curve is flatter than the normal curve, it is designated as Platykurtic.
[Figure: frequency curves labelled L = Leptokurtic, M = Mesokurtic, P = Platykurtic]
Measure of Kurtosis
The measure of kurtosis of a frequency distribution based on moments is denoted by $\beta_2$ and is given by
$\beta_2 = \dfrac{\mu_4}{\mu_2^2}$
If $\beta_2 = 3$, the distribution is said to be normal and the curve is Mesokurtic.
If $\beta_2 > 3$, the distribution is said to be more peaked and the curve is Leptokurtic.
If $\beta_2 < 3$, the distribution is said to be flat-topped and the curve is Platykurtic.
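The kurtosis measure and its three-way classification can be sketched as below. The function names, the tolerance used to decide when β2 counts as equal to 3, and the sample data are assumptions for illustration.

```python
def beta2(values):
    """Moment measure of kurtosis: mu4 / mu2^2."""
    n = len(values)
    m = sum(values) / n
    mu2 = sum((x - m) ** 2 for x in values) / n
    mu4 = sum((x - m) ** 4 for x in values) / n
    return mu4 / mu2 ** 2

def classify_kurtosis(b2, tol=1e-9):
    """Name the curve type from beta2 (tol is an illustrative tolerance)."""
    if abs(b2 - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if b2 > 3 else "platykurtic"

data = [1, 2, 3, 4, 5]  # illustrative values only
b2 = beta2(data)
print(b2, classify_kurtosis(b2))
```

For these five evenly spaced values β2 is 1.7, so the distribution is flatter than the normal curve.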
CORRELATION
People often use the idea of correlation in everyday life without knowing that they are doing so. For example, when parents advise their children to work hard so that they may get good marks, they are correlating good marks with hard work.
The study of the characteristics of only one variable, such as height, weight, age, marks or wages, is known as univariate analysis. The statistical analysis of the relationship between two variables is known as bivariate analysis. Sometimes the variables may be interrelated, for example feed intake and weight of animals. The nature and strength of the relationship may be examined by correlation and regression analysis.
Meaning of correlation
Correlation refers to the relationship between two or more variables, e.g. the relation between the heights of father and son, yield and rainfall, wage and price index, shares and debentures. Correlation is a statistical analysis which measures and analyses the degree or extent to which two variables fluctuate with reference to each other. The word relationship is important: it indicates that there is some connection between the variables, and correlation measures the closeness of that relationship. Correlation does not indicate a cause and effect relationship. Price and supply, and income and expenditure, are correlated.
Definitions
"Correlation analysis attempts to determine the degree of relationship between variables." (Ya-Lun Chou)
"Correlation is an analysis of the covariation between two or more variables." (A.M. Tuttle)
Correlation expresses the interdependence of two sets of variables upon each other. One variable may be called the subject (independent) and the other the relative (dependent) variable. The relative variable is measured in terms of the subject.
Uses of correlation
1. It is used in physical and social sciences.
2. It is useful for economists to study the relationship between variables like price, quantity etc. Businessmen estimate costs, sales, price etc. using correlation.
3. It is helpful in measuring the degree of relationship between variables like income and expenditure, price and supply, supply and demand etc.
4. Sampling error can be calculated.
5. It is the basis for the concept of regression.
Types of Correlation:
Correlation is classified into various types. The most important ones are
i) Positive and negative.
ii) Linear and non-linear.
iii) Partial and total.
iv) Simple and multiple.
Positive and Negative Correlation
This depends upon the direction of change of the variables. If the two variables tend to move together in the same direction, i.e. an increase in the value of one variable is accompanied by an increase in the value of the other, or a decrease in the value of one variable is accompanied by a decrease in the value of the other, then the correlation is called positive or direct correlation. Price and supply, height and weight, and yield and rainfall are some examples of positive correlation.
If the two variables tend to move in opposite directions, so that an increase (or decrease) in the value of one variable is accompanied by a decrease (or increase) in the value of the other variable, then the correlation is called negative or inverse correlation. Price and demand, and yield of crop and price, are examples of negative correlation.
Linear and Non-linear correlation
If the ratio of change between the two variables is constant, then there will be linear correlation between them. Consider the following.
X    2    4    6    8   10   12
Y    3    6    9   12   15   18
Here the ratio of change between the two variables is the same. If we plot these points on a graph we get a straight line, and the functional relationship is represented by y = a + bx, where a and b are constants. If the amount of change in one variable does not bear a constant ratio to the amount of change in the other, then the relation is called curvilinear or non-linear correlation. The graph will be a curve.
Simple and Multiple correlations
When we study only two variables, the relationship is a simple correlation. Examples are feed intake and growth of animals, birth weight and number of piglets, and demand and price. In a multiple correlation we study more than two variables simultaneously. The relationship of milk yield with first lactation period, food supplied, age etc. is an example of multiple correlation.
Partial and total correlation
The study of two variables excluding some other variables is called partial correlation. An example is the correlation between the weight of broilers and feed intake, assuming other factors like area provided, labour used, medicinal cost etc. to be constant.
If there is no relationship between the two variables, they are said to be independent or uncorrelated.
Real and Spurious correlation
When there is a real correlation between two variables, it may be that a change in one variable is the cause of the change in the other. There is covariation based on logical relationships and causation.
Sometimes, even if two variables are independent of each other, there may be a high degree of correlation between them. Such a correlation indicates a relationship with no logical basis, e.g. rainfall in Tamil Nadu and yield in Karnataka, or cattle number and number of human illiterates. Such a correlation is called spurious or nonsense correlation.
Computation of correlation
When there exists some relationship between two variables, we have to measure the degree of relationship. This measure is called the measure of correlation or the correlation coefficient, and it is denoted by r.
Covariation:
The covariation between the variables x and y is defined as
$Cov(x, y) = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{n}$
where $\bar{x}$, $\bar{y}$ are respectively the means of x and y, and n is the number of pairs of observations.
Methods of studying correlation
1. Scatter diagram
2. Correlation graph
3. Karl Pearson's coefficient of correlation
4. Concurrent deviation method
5. Rank method
Scatter diagram
A scatter diagram (also called a scattergram, scatterplot or dot diagram) is a chart prepared to represent graphically the relationship between two variables. Take one variable on the horizontal axis and the other on the vertical axis and, after choosing a suitable scale, mark a point corresponding to each pair of the given observations. The figure which contains the collection of dots or points is called a scatter diagram. The way in which the dots lie on the scatter diagram shows the type of correlation. If the dots show some trend, either upward or downward, the two variables are correlated. If the dots do not show any trend, there is an absence of correlation between the two variables.
Merits
1. It is the simplest and an attractive method of finding the nature of correlation between two variables.
2. It is a non-mathematical method of studying correlation. It is easy to understand.
3. It is not affected by extreme items.
4. It is the first step in finding out the relation between two variables.
5. We can have a rough idea at a glance whether the correlation is positive or negative.
Demerits
1. By this method we cannot get the exact degree of correlation between the two variables.
Correlation graph
In this method, curves are plotted for the data on the two variables. By examining the direction and closeness of the two curves so drawn, we can infer whether or not the variables are related. If both curves drawn on the graph move in the same direction (either upward or downward), the correlation is said to be positive. On the other hand, if the curves move in opposite directions, the correlation is said to be negative.
This method is normally used for time series data. However, like the scatter diagram, this method does not offer any numerical value for the coefficient of correlation.
Karl Pearson's coefficient of correlation
Karl Pearson, a great biometrician and statistician, suggested a mathematical method for measuring the magnitude of the linear relationship between two variables. It is the most widely used method in practice and is known as the Pearsonian coefficient of correlation. It is denoted by r. It is also called the product moment formula. It is given by
$r = \dfrac{Cov(x, y)}{\sigma_x\,\sigma_y}$
where $\sigma_x$, $\sigma_y$ are the standard deviations of x and y. Equivalently,
$r = \dfrac{\sum XY}{\sqrt{\sum X^2\,\sum Y^2}}$
where $X = (x - \bar{x})$ and $Y = (y - \bar{y})$.
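The product moment formula can be sketched in Python using deviations from the means. The function name is an assumption; the data are the X and Y values from the linear correlation table earlier in this chapter, for which a perfect positive correlation is expected.

```python
import math

def pearson_r(xs, ys):
    """Product moment correlation: sum(XY) / sqrt(sum(X^2) * sum(Y^2))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

x = [2, 4, 6, 8, 10, 12]
y = [3, 6, 9, 12, 15, 18]
r = pearson_r(x, y)
print(r)
```

Dividing the covariance by the two standard deviations gives the same value, because the factor 1/n cancels between numerator and denominator.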
Concurrent deviation method
This method of studying correlation is the simplest of all the methods. What is to be found in this method is the direction of change of the x and y variables. The stepwise procedure is:
Step i. Find out the direction of change of the x variable, i.e. as compared with the first value, whether the second value is increasing, decreasing or constant. If it is increasing, put a + sign; if it is decreasing, put a - sign; and if it is constant, put zero. Similarly, as compared with the second value, find out whether the third value is increasing, decreasing or constant. Repeat the same process for the other values also. Denote the column as Dx.
Step ii. In the same way, find out the direction of change of the y variable and denote this column as Dy.
Step iii. Multiply Dx by Dy and determine the value of c, the number of concurrent deviations, i.e. the number of positive signs obtained after multiplying Dx by Dy.
Step iv. Then apply the formula
$r = \pm\sqrt{\pm\dfrac{2c - n}{n}}$
Here n is the number of pairs of deviations compared, which is one less than the number of observations. The sign, both inside and outside the square root, is taken as that of (2c - n).
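The four steps of the concurrent deviation method can be sketched as follows. The function names and the sample series are assumptions; n is taken as the number of deviation pairs, i.e. one less than the number of observations.

```python
import math

def directions(series):
    """+1, -1 or 0 for each value compared with the one before it."""
    return [(b > a) - (b < a) for a, b in zip(series, series[1:])]

def concurrent_deviation_r(xs, ys):
    """r = +/- sqrt(+/- (2c - n) / n), sign taken from (2c - n)."""
    dx, dy = directions(xs), directions(ys)
    n = len(dx)                                       # number of deviation pairs
    c = sum(1 for a, b in zip(dx, dy) if a * b > 0)   # concurrent deviations
    inner = (2 * c - n) / n
    sign = 1 if inner >= 0 else -1
    return sign * math.sqrt(abs(inner))

xs = [1, 2, 3, 4, 5]  # hypothetical series
ys = [2, 3, 5, 4, 6]  # hypothetical series
print(round(concurrent_deviation_r(xs, ys), 4))
```

In this made-up example three of the four deviation pairs move together, so c = 3, n = 4 and r is the square root of 0.5.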
Rank method
This method is used when no assumption about the parameters of the population is made. It is based on ranks. It is useful for studying qualitative attributes like honesty, colour, beauty, intelligence, character, morality etc. The individuals in the group can be arranged in order, thereby obtaining for each individual a number showing his or her rank in the group. This method was developed by Edward Spearman in 1904. It is defined as
$r = 1 - \dfrac{6\sum D^2}{n^3 - n}$
where
r = rank correlation coefficient,
D is the difference in the ranking of the two series x and y,
and n is the number of paired observations.
The value of r lies between -1 and +1. If r = +1, there is complete agreement in the order of the ranks and the direction of the ranks is also the same. If r = -1, there is complete disagreement in the order of the ranks and they are in opposite directions.
Computation for tied observations: There may be two or more items having equal values. In such a case the same rank is to be given to each of them, and the ranking is said to be tied. In such circumstances an average rank is given to each tied item. If the ranks are tied, it is required to apply a correction factor, which is $\dfrac{1}{12}(m^3 - m)$ for each group of tied items. The formula then becomes
$r = 1 - \dfrac{6\left[\sum D^2 + \dfrac{1}{12}(m^3 - m) + \dfrac{1}{12}(m^3 - m) + \cdots\right]}{n^3 - n}$
where m is the number of items whose ranks are common; the correction term is repeated as many times as there are groups of tied observations.
When the number of paired observations exceeds 30, it is very difficult to rank them and hence, unless the ranks are given, it is better to avoid this method. This measure is also called Spearman's rank correlation coefficient.
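A compact Python sketch of the rank method, including average ranks and the correction for ties, is given below. The function names and sample values are assumptions; rank 1 is given to the largest value, which is one common convention, and the result does not change if both series are instead ranked from the smallest.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = largest value; tied values share the average of their ranks."""
    positions = {}
    for pos, v in enumerate(sorted(values, reverse=True), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def spearman_r(xs, ys):
    """Rank correlation with the tie correction added to sum(D^2)."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)

print(spearman_r([10, 20, 20, 30], [1, 2, 3, 4]))  # hypothetical data with one tie
```

When there are no ties the correction is zero and the function reduces to the basic formula.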
Probable Error
PE of r = $0.6745\,\dfrac{1 - r^2}{\sqrt{n}}$
r ± PE denote the limits of correlation in the population.
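The probable error and the resulting limits can be computed as in this sketch. The function names and the values r = 0.8, n = 64 are assumptions chosen only to make the arithmetic easy to follow.

```python
import math

def probable_error_r(r, n):
    """Probable error of the correlation coefficient: 0.6745 (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

def correlation_limits(r, n):
    """Lower and upper limits r - PE and r + PE."""
    pe = probable_error_r(r, n)
    return r - pe, r + pe

pe = probable_error_r(0.8, 64)  # hypothetical r and sample size
print(round(pe, 4))
print(tuple(round(v, 4) for v in correlation_limits(0.8, 64)))
```

A larger sample or a stronger correlation both shrink the probable error, which narrows the limits.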
Coefficient of determination
The square of the correlation coefficient ($r^2$) is defined as the coefficient of determination. If $r^2$ = 0.85, it implies that 85% of the variation in the dependent variable is due to the independent variable studied.
Coefficient of non-determination
$(1 - r^2)$ is defined as the coefficient of non-determination. If $1 - r^2$ = 0.24, we infer that 24% of the variation in the dependent variable is due to other variables which are not studied. This is called the unexplained variation.
Properties of Correlation coefficient
1. The correlation coefficient lies between -1 and +1, i.e. $-1 \le r \le +1$.
2. r is independent of change of origin and scale.
3. It is a pure number, independent of units of measurement.
4. Independent variables are uncorrelated, but the converse is not true.
5. The correlation coefficient is the geometric mean of the two regression coefficients.
6. The correlation coefficient of x and y is symmetric: $r_{xy} = r_{yx}$.
Limitations:
1. The correlation coefficient assumes a linear relationship, regardless of whether the assumption is correct or not.
2. Extreme items of the variables unduly influence the correlation coefficient.
3. The existence of correlation does not necessarily indicate a cause and effect relation.
Interpretation:
The following table helps in interpreting the value of r and its degree:
Degree       Value of r
             Positive          Negative
Perfect      +1                -1
Strong       0.75 to 1.00      -0.75 to -1.00
Moderate     0.25 to 0.75      -0.25 to -0.75
Weak         0 to 0.25         0 to -0.25
Absence      0                 0
REGRESSION
The meaning of regression is the act of returning or going back. This term was introduced by Francis Galton when he studied the relationship between the heights of fathers and sons.
After knowing the relationship between two variables, we may be interested in estimating (predicting) the value of one variable given the value of another. The variable predicted on the basis of other variables is called the dependent or explained variable, and the other the independent or predicting variable. The prediction is based on the average relationship derived statistically by regression analysis. The equation, linear or otherwise, is called the regression equation or the explaining equation. In a study where data on age and weight of animals are available, age could be considered as the independent variable and weight as the dependent variable. It means that weight regresses on age.
Definition
Regression is the measure of the average relationship between two or more variables in terms of the original units of the data.
Types of Regression
Regression analysis can be classified into:
a. Simple and Multiple
b. Linear and Non-linear
c. Total and Partial
a) Simple and Multiple
In the case of a simple relationship only two variables are considered, for example the influence of advertising expenditure on sales turnover. In the case of multiple relationships, more than two variables are involved. Here, while one variable is the dependent variable, the remaining variables are independent ones. For example, the weight of an animal (y) may depend on its age (x) and its feed intake (z). Then the functional relationship can be expressed as y = f(x, z).
b) Linear and Non-linear
Linear relationships are based on a straight-line trend, the equation of which has no power higher than one. Remember that a linear relationship can be either simple or multiple. Normally a linear relationship is taken into account because, besides its simplicity, it has a better predictive value; a linear trend can be easily projected into the future. In the case of a non-linear relationship, curved trend lines are derived. The equations of these are parabolic.
c) Total and Partial
In the case of total relationships, all the important variables are considered. Normally, they take the form of multiple relationships because most economic and business phenomena are affected by a multiplicity of causes. In the case of a partial relationship, one or more variables are considered, but not all, thus excluding the influence of those not found relevant for a given purpose.
Linear Regression Equation
If two variables have a linear relationship, then as the independent variable (X) changes, the dependent variable (Y) also changes. If the different values of X and Y are plotted, then two straight lines of best fit can be made to pass through the plotted points. These two lines are known as regression lines. These regression lines are based on two equations known as regression equations, which show the best estimate of one variable for a known value of the other. The equations are linear.
The linear regression equation of Y on X is
Y = a + bX ... (1)
and of X on Y is
X = a + bY ... (2)
From (1) we can estimate Y for a known value of X.
From (2) we can estimate X for a known value of Y.
Methods of Regression Analysis
A line fitted by the method of least squares is known as the line of best fit. The various methods can be represented in the form of the chart given below:
Graphic Method
Scatter Diagram
Under this method the points are plotted on graph paper, representing the various pairs of values of the concerned variables. These points give a picture of a scatter diagram with several points spread over it. A regression line may be drawn between these points, either freehand or with a ruler, in such a way that the sum of the squares of the vertical or the horizontal distances (as the case may be) between the points and the line of regression so drawn is the least. In other words, it should be drawn faithfully as the line of best fit, leaving an equal number of points on both sides in such a manner that the sum of the squares of the distances is the least.
Algebraic Methods
(i) Regression Equation:
The two regression equations are
for X on Y: X = a + bY
and for Y on X: Y = a + bX
where X, Y are variables, and a, b are constants whose values are to be determined.
For the equation X = a + bY, the normal equations are
$\sum X = na + b\sum Y$ and
$\sum XY = a\sum Y + b\sum Y^2$
For the equation Y = a + bX, the normal equations are
$\sum Y = na + b\sum X$ and
$\sum XY = a\sum X + b\sum X^2$
From these normal equations the values of a and b can be determined.
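Solving the pair of normal equations for a and b can be sketched with Cramer's rule. The function name is an assumption; the data are the X and Y values from the linear correlation table earlier in this chapter, where Y is exactly 1.5 times X.

```python
def fit_line(xs, ys):
    """Solve the normal equations of ys = a + b * xs for (a, b).

    sum(y)  = n * a + b * sum(x)
    sum(xy) = a * sum(x) + b * sum(x^2)
    """
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    det = n * sum_x2 - sum_x ** 2          # determinant of the 2x2 system
    a = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b

x = [2, 4, 6, 8, 10, 12]
y = [3, 6, 9, 12, 15, 18]
print(fit_line(x, y))   # regression of Y on X
print(fit_line(y, x))   # regression of X on Y: swap the roles of the variables
```

Swapping the two arguments gives the normal equations for X = a + bY, so the same function covers both regression lines.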
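(ii) Regression Coefficient:
The regression coefficient of Y on X is $b_1 = b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x}$
The regression coefficient of X on Y is $b_2 = b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y}$
If the deviations are taken from the respective means of x and y,
$b_1 = b_{yx} = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \dfrac{\sum xy}{\sum x^2}$ and
$b_2 = b_{xy} = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2} = \dfrac{\sum xy}{\sum y^2}$
where $x = (X - \bar{X})$, $y = (Y - \bar{Y})$.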
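The deviation form of the two regression coefficients can be sketched as below. The function name and the five pairs of values are assumptions made for illustration.

```python
import math

def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy) using deviations from the respective means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sum_x2, sum_xy / sum_y2

xs = [1, 2, 3, 4, 5]  # hypothetical values of X
ys = [2, 4, 5, 4, 5]  # hypothetical values of Y
b_yx, b_xy = regression_coefficients(xs, ys)
r = math.sqrt(b_yx * b_xy)  # both coefficients are positive here
print(b_yx, b_xy, round(r, 4))
```

Here b_yx = 0.6 and b_xy = 1.0, so their product gives r² = 0.6, and their arithmetic mean (0.8) is not less than r (about 0.775).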
Properties of regression coefficient
1. Both regression coefficients must have the same sign, i.e. both will be either positive or negative.
2. The correlation coefficient is the geometric mean of the regression coefficients, i.e. $r = \pm\sqrt{b_1 b_2}$.
3. The correlation coefficient will have the same sign as that of the regression coefficients.
4. If one regression coefficient is greater than unity, then the other regression coefficient must be less than unity.
5. Regression coefficients are independent of origin but not of scale.
6. The arithmetic mean of $b_1$ and $b_2$ is equal to or greater than the coefficient of correlation. Symbolically, $\dfrac{b_1 + b_2}{2} \ge r$.
7. If r = 0, the variables are uncorrelated and the lines of regression become perpendicular to each other.
8. If r = ±1, the two lines of regression either coincide or are parallel to each other.
9. The angle between the two regression lines is $\theta = \tan^{-1}\left[\dfrac{m_1 - m_2}{1 + m_1 m_2}\right]$, where $m_1$ and $m_2$ are the slopes of the regression lines of X on Y and Y on X respectively.
10. The angle between the regression lines indicates the degree of dependence between the variables.
Uses of Regression Analysis:
1. Regression analysis helps in establishing a functional relationship between two or more variables.
2. Since most of the problems of economic analysis are based on cause and effect relationships, regression analysis is a highly valuable tool in economic and business research.
3. Regression analysis predicts the values of dependent variables from the values of independent variables.
4. We can calculate the coefficient of correlation (r) and the coefficient of determination ($r^2$) with the help of the regression coefficients.
5. Regression analysis is widely used in the statistical analysis of demand curves, supply curves, production functions, cost functions, consumption functions etc.
Difference between Correlation and Regression
1. Correlation: Correlation is the relationship between two or more variables which vary in sympathy with one another, in the same or the opposite direction.
   Regression: Regression means going back; it is a mathematical measure showing the average relationship between two variables.
2. Correlation: Both the variables X and Y are random variables.
   Regression: Here X is a random variable and Y is a fixed variable. Sometimes both the variables may be random variables.
3. Correlation: It finds out the degree of relationship between two variables and not the cause and effect of the variables.
   Regression: It indicates the cause and effect relationship between the variables and establishes a functional relationship.
4. Correlation: It is used for testing and verifying the relation between two variables and gives limited information.
   Regression: Besides verification, it is used for the prediction of one value in relation to the other given value.
5. Correlation: The coefficient of correlation is a relative measure. The range of the relationship lies between -1 and +1.
   Regression: The regression coefficient is an absolute figure. If we know the value of the independent variable, we can find the value of the dependent variable.
6. Correlation: There may be spurious correlation between two variables.
   Regression: In regression there is no such spurious regression.
7. Correlation: It has limited application, because it is confined only to a linear relationship between the variables.
   Regression: It has wider application, as it studies linear and non-linear relationships between the variables.
8. Correlation: It is not very useful for further mathematical treatment.
   Regression: It is widely used for further mathematical treatment.
9. Correlation: If the coefficient of correlation is positive, then the two variables are positively correlated, and vice versa.
   Regression: The regression coefficient explains that the decrease in one variable is associated with the increase in the other variable.
PROBABILITY THEORY AND PROBABILITY DISTRIBUTION
The theory of probability has its origin in the games of chance related
to gambling, such as tossing a coin, throwing a die, drawing cards from a
pack of cards etc. Jerome Cardan, an Italian mathematician, wrote a book on
games of chance which was published in 1663. Starting with games of
chance, probability has become one of the basic tools of statistics. The
knowledge of probability theory makes it possible to interpret statistical
results, since many statistical procedures involve conclusions based on
samples.
A few definitions and basic concepts
Event: Any phenomenon occurring in nature.
Random experiment: A random experiment is one whose results depend on
chance, that is, the result cannot be predicted. Tossing of coins and throwing of
dice are some examples of random experiments.
Trial: Performing a random experiment is called a trial.
Outcomes: The results of a random experiment are called its outcomes.
When two coins are tossed the possible outcomes are HH, HT, TH, TT.
Sample space: Each conceivable outcome of an experiment is called a sample
point. The totality of all sample points is called a sample space and is denoted
by S. For example, when a coin is tossed, the sample space is S = {H, T}. H and
T are the sample points of the sample space S.
Equally likely events: Two or more events are said to be equally likely if
each one of them has an equal chance of occurring. For example, in tossing
a coin, the event of getting a head and the event of getting a tail are equally
likely events.
Mutually exclusive events: Two or more events are said to be mutually
exclusive when the occurrence of any one event excludes the occurrence of
the other events. Mutually exclusive events cannot occur simultaneously. For
example, when a coin is tossed, either the head or the tail will come up.
Therefore the occurrence of the head completely excludes the occurrence of
the tail. Thus getting a head and getting a tail in tossing a coin are mutually
exclusive events.
Exhaustive events: Events are said to be exhaustive when their totality
includes all the possible outcomes of a random experiment. For example,
while throwing a die, the possible outcomes are {1, 2, 3, 4, 5, 6} and hence the
number of exhaustive cases is 6.
Complementary events: The event "A occurs" and the event "A does not
occur" are called complementary events to each other. The event "A does not
occur" is denoted by $A'$ or $\bar{A}$ or $A^c$. An event and its complement are
mutually exclusive. For example, in throwing a die, the event of getting odd
numbers is {1, 3, 5} and getting even numbers is {2, 4, 6}. These two events
are mutually exclusive and complementary to each other.
Independent events: Events are said to be independent if the occurrence of
one does not affect the others. In the experiment of tossing a fair coin, the
occurrence of the event "head" in the first toss is independent of the
occurrence of the event "head" in the second toss, third toss and subsequent
tosses.
Simple and compound events: When two or more events occur together,
their happening is described as a compound event, while if only one event
takes place at a time, it is called a simple event.
Definitions of Probability
There are two types of probability. They are Mathematical probability
and Statistical probability.
Mathematical Probability (or a priori probability)
If the probability of an event can be calculated even before the actual
happening of the event, that is, even before conducting the experiment, it is
called Mathematical probability.
If a random experiment results in n exhaustive, mutually
exclusive and equally likely cases, out of which m are favourable to the
occurrence of an event A, then the ratio m/n is called the probability of
occurrence of event A. It is denoted by P(A) and is given by
$$P(A) = \frac{m}{n} = \frac{\text{Number of cases favourable to the event } A}{\text{Total number of exhaustive cases}}$$
Mathematical probability is often called classical probability or a
priori probability because, if we keep using the examples of tossing a fair
coin, dice etc., we can state the answer in advance (prior), without tossing the
coins or rolling the dice.
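As a minimal sketch of the classical definition, the ratio m/n can be obtained by enumerating a finite sample space of equally likely outcomes. The helper name `classical_probability` is hypothetical; exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

def classical_probability(sample_space, is_favourable):
    """P(A) = m / n for n equally likely, mutually exclusive, exhaustive cases."""
    favourable = [outcome for outcome in sample_space if is_favourable(outcome)]
    return Fraction(len(favourable), len(sample_space))

# Throwing a die: probability of an odd number
die = [1, 2, 3, 4, 5, 6]
p_odd = classical_probability(die, lambda face: face % 2 == 1)

# Tossing two coins: probability of at least one head
two_coins = ["HH", "HT", "TH", "TT"]
p_at_least_one_head = classical_probability(two_coins, lambda toss: "H" in toss)

print(p_odd, p_at_least_one_head)
```

The sketch only applies when every outcome can be listed and all outcomes are equally likely, which are exactly the conditions the classical definition requires.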
The above definition of probability is widely used, but it cannot be
applied under the following situations:
(1) If it is not possible to enumerate all the possible outcomes for an
experiment.
(2) If the sample points (outcomes) are not mutually independent.
(3) If the total number of outcomes is infinite.
(4) If each and every outcome is not equally likely.
Some of the drawbacks of classical probability are removed in another
definition given below.
Statistical Probability (or a posteriori probability)
If the probability of an event can be determined only after the actual
happening of the event, it is called Statistical probability.
If an event occurs m times out of n, its relative frequency is m/n.
In the limiting case, when n becomes sufficiently large, it corresponds
to a number which is called the probability of that event.
In symbols,
$$P(A) = \lim_{n \to \infty} \frac{m}{n}$$
The above definition of probability involves a concept which has a
long-term consequence. This approach was initiated by the mathematician
von Mises.
If a coin is tossed 10 times we may get 6 heads and 4 tails, or 4 heads
and 6 tails, or any other result. In these cases the observed relative frequency
of heads is not 0.5, the value we consider in Mathematical probability.
However, if the experiment is carried out a large number of times we
should expect an approximately equal number of heads and tails, and we can see
that the probability of getting a head approaches 0.5. The Statistical probability
calculated by conducting an actual experiment is also called a posteriori
probability or empirical probability.
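The coin-tossing argument can be sketched as a small simulation. It assumes a pseudo-random generator stands in for a fair coin; the function name and the seed are arbitrary choices for illustration, not part of the original text.

```python
import random

def relative_frequency_of_heads(n_tosses, seed=0):
    """Toss a simulated fair coin n_tosses times and return m / n for heads."""
    rng = random.Random(seed)  # seeded so that the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# The relative frequency is erratic for few tosses and settles near 0.5
# as the number of tosses grows.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

The exact values printed depend on the seed; only the tendency towards 0.5 for large n is the point of the sketch.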
Axiomatic approach to probability
The modern approach to probability is purely axiomatic and it is
based on set theory. The axiomatic approach to probability was
introduced by the Russian mathematician A.N. Kolmogorov in the year 1933.
Axioms of probability
Let S be a sample space, A be an event in S and P(A) be the
probability of A, satisfying the following axioms:
(1) The probability of any event ranges from zero to one, i.e. $0 \le P(A) \le 1$
(2) The probability of the entire space is 1, i.e. $P(S) = 1$
(3) If $A_1, A_2, \dots$ is a sequence of mutually exclusive events in S, then
$P(A_1 \cup A_2 \cup \dots) = P(A_1) + P(A_2) + \dots$
Interpretation of statistical statements in terms of set theory
$S$ : Sample space
$\bar{A}$ : A does not occur
$A \cup \bar{A} = S$
$A \cap B = \phi$ : A and B are mutually exclusive.
$A \cup B$ : Event A occurs or B occurs or both A and B occur.
(At least one of the events A or B occurs)
$A \cap B$ : Both the events A and B occur.
$\bar{A} \cap \bar{B}$ : Neither A nor B occurs
$A \cap \bar{B}$ : Event A occurs and B does not occur
$\bar{A} \cap B$ : Event A does not occur and B occurs.
Addition theorem on probabilities
We shall discuss the addition theorem on probabilities for mutually
exclusive events and for events that are not mutually exclusive.
Addition theorem on probabilities for mutually exclusive events
If two events A and B are mutually exclusive, the probability of the
occurrence of either A or B is the sum of the individual probabilities of A and B, i.e.
$P(A \cup B) = P(A) + P(B)$. This is clearly stated in the axioms of probability.
Addition theorem on probabilities for not mutually exclusive events
If two events A and B are not mutually exclusive, the probability of the
event that either A or B or both occur is given as $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
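Both forms of the addition theorem can be checked by enumeration on a single throw of a die. This is a minimal sketch; the events chosen and the helper `prob` are hypothetical illustrations.

```python
from fractions import Fraction

SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}  # one throw of a fair die

def prob(event):
    """Classical probability of an event given as a set of outcomes."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))

even = {2, 4, 6}
odd = {1, 3, 5}
greater_than_three = {4, 5, 6}

# Mutually exclusive events: P(A or B) = P(A) + P(B)
exclusive_lhs = prob(even | odd)
exclusive_rhs = prob(even) + prob(odd)

# Not mutually exclusive: P(A or B) = P(A) + P(B) - P(A and B)
general_lhs = prob(even | greater_than_three)
general_rhs = (prob(even) + prob(greater_than_three)
               - prob(even & greater_than_three))

print(exclusive_lhs, exclusive_rhs, general_lhs, general_rhs)
```

Subtracting the probability of the intersection removes the outcomes 4 and 6, which would otherwise be counted twice.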
Compound events
The joint occurrence of two or more events is called a compound
event. Thus compound events imply the simultaneous occurrence of two or
more simple events.
For example, in tossing two fair coins simultaneously, the event of
getting at least one head is a compound event as it consists of the joint
occurrence of two simple events, namely,
Event A = one head appears, i.e. A = {HT, TH} and
Event B = two heads appear, i.e. B = {HH}
Similarly, if a bag contains 6 white and 6 red balls and we make a
draw of 2 balls at random, then the events that both are white or one is
white and one is red are compound events.
If p is the probability of the event, the probability that it will occur in
exactly x out of n cases is ${}^{n}C_{x}\, q^{n-x} p^{x}$, where q = 1 - p.
The compound events may be further classified as
(1) Independent events
(2) Dependent events
Independent events
If two or more events occur in such a way that the occurrence of one
does not affect the occurrence of another, they are said to be independent
events.
For example, if a coin is tossed twice, the result of the second throw
would in no way be affected by the result of the first throw. Similarly, suppose a
bag contains 5 white and 7 red balls and two balls are drawn one by one
in such a way that the first ball is replaced before the second one is drawn. In
this situation, the two events "the first ball is white" and "the second ball is red"
will be independent, since the composition of the balls in the bag remains
unchanged before the second draw is made.
Dependent events
If the occurrence of one event influences the occurrence of the other,
then the second event is said to be dependent on the first.
In the above example, if we do not replace the first ball drawn, this
will change the composition of balls in the bag while making the second draw,
and therefore the event of drawing a red ball in the second draw will depend on
the event (first ball is red or white) occurring in the first draw.
Similarly, if a person draws a card from a full pack and does not replace
it, the result of the draw made afterwards will be dependent on the first
draw.
Multiplication theorem on probabilities
We shall discuss the multiplication theorem on probabilities for both
independent and dependent events.
Multiplication theorem on probabilities for independent events
If two events A and B are independent, the probability that both of
them occur is equal to the product of their individual probabilities, i.e.
$P(A \cap B) = P(A) \cdot P(B)$
Proof:
Out of $n_1$ possible cases let $m_1$ cases be favourable for the occurrence of the
event A.
$$\therefore P(A) = \frac{m_1}{n_1}$$
Out of $n_2$ possible cases, let $m_2$ cases be favourable for the occurrence of the
event B.
$$\therefore P(B) = \frac{m_2}{n_2}$$
Each of the $n_1$ possible cases can be associated with each of the $n_2$ possible
cases.
Therefore the total number of outcomes of the two events = $n_1 n_2$
Similarly, each of the $m_1$ favourable cases can be associated with each
of the $m_2$ favourable cases.
So the total number of favourable cases for the event "A and B" is $m_1 m_2$
$$\therefore P(A \cap B) = \frac{m_1 m_2}{n_1 n_2} = \frac{m_1}{n_1} \cdot \frac{m_2}{n_2} = P(A) \cdot P(B)$$
Note:
The theorem can be extended to three or more independent events. If
A, B, C, ... are independent events, then
$P(A \cap B \cap C \cap \dots) = P(A) \cdot P(B) \cdot P(C) \dots$
If A and B are independent then the complements of A and B are also
independent, i.e. $P(\bar{A} \cap \bar{B}) = P(\bar{A}) \cdot P(\bar{B})$
Multiplication theorem for dependent events
If A and B are two dependent events, i.e. the occurrence of one event is
affected by the occurrence of the other event, then the probability that both A
and B will occur is $P(A \cap B) = P(A) \cdot P(B/A)$.
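The two multiplication theorems can be compared on a bag of balls by counting ordered pairs of draws. This is a minimal sketch assuming an illustrative bag of 5 white and 7 red balls; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product

# Illustrative bag: 5 white (W) and 7 red (R) balls
bag = ["W"] * 5 + ["R"] * 7

def prob_white_then_red(ordered_draws):
    """Share of equally likely ordered pairs with first ball white, second red."""
    draws = list(ordered_draws)
    favourable = sum(1 for first, second in draws
                     if first == "W" and second == "R")
    return Fraction(favourable, len(draws))

# With replacement the draws are independent: P(A and B) = P(A) * P(B)
with_replacement = prob_white_then_red(product(bag, repeat=2))

# Without replacement the second draw depends on the first:
# P(A and B) = P(A) * P(B/A)
without_replacement = prob_white_then_red(permutations(bag, 2))

print(with_replacement, without_replacement)
```

With replacement there are 12 x 12 ordered pairs, while without replacement there are only 12 x 11, which is why the conditional probability P(B/A) = 7/11 replaces P(B) = 7/12.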
Probability distribution
The probability distribution shows how the set of all possible
mutually exclusive events is distributed. The probability distribution can be
regarded as the theoretical equivalent of an empirical relative frequency
distribution, with its own mean and variance. A probability distribution
comprises all the values that the random variable can take, with their
associated probabilities.
Binomial distribution or Bernoullian distribution
It is a probability distribution expressing the probability of one set of
alternatives, i.e. success or failure. It was discovered by the Swiss mathematician
James Bernoulli (1654-1705). It is developed under the following
assumptions:
1. An experiment is performed under the same conditions for a fixed
number of trials, say n
2. In each such trial, there are only two possible outcomes of the
experiment, success or failure
3. The probability of success, denoted by p, remains constant
from trial to trial and the probability of failure is denoted by q = 1 - p
4. The trials are independent
In a series of n independent trials, if p is the constant probability of
success at a single trial, then the variate x, i.e. the number of successes at these
n trials, is said to follow the binomial distribution. The variate takes values from
0 to n (all integers), and the probabilities of getting 0, 1, 2, ..., n successes at these n
trials are
$q^n,\ {}^{n}C_{1} q^{n-1} p,\ {}^{n}C_{2} q^{n-2} p^2,\ \dots,\ {}^{n}C_{x} q^{n-x} p^x,\ \dots,\ p^n$
respectively, which are the respective terms of the binomial expansion $(q+p)^n$.
Suppose we have N sets of n trials. The number of sets in which we
will have 0, 1, 2, ..., n successes will be given by the successive terms of the binomial
expansion $N(q+p)^n$. Thus, we classify the sets according to the number of
successes which they contain and we get a frequency distribution which is
known as the binomial distribution.
No. of successes | 0 | 1 | 2 | ... | x | ... | n
--- | --- | --- | --- | --- | --- | --- | ---
Frequency | $Nq^n$ | $N\,{}^{n}C_{1} q^{n-1} p$ | $N\,{}^{n}C_{2} q^{n-2} p^2$ | ... | $N\,{}^{n}C_{x} q^{n-x} p^x$ | ... | $Np^n$
Properties of Binomial distribution
1. It is a distribution of a discontinuous or discrete variate
2. It has two parameters (constants). They are n and p, where n denotes the
number of independent trials and p denotes the constant probability of
success at a single trial
3. It takes values from 0 to n (all integers), i.e. 0, 1, 2, ..., n
4. Its mean is np; variance is npq, where q = 1 - p; S.D. = $\sqrt{npq}$; variance is
always less than mean
5. The different frequencies are different terms of the binomial expansion
$N(q+p)^n$
6. It is symmetrical when p = q = 1/2
7. When n is large and p is small such that np is constant, the binomial
distribution tends to a Poisson distribution
8. When n is large and p = q = 1/2, the binomial tends to become a normal
distribution
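The terms of $N(q+p)^n$ and the mean and variance stated above can be checked numerically. This is a minimal sketch; the values n = 4, p = 1/2 and N = 16 are assumptions chosen so that the expected frequencies come out as whole numbers.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n trials: nCx * q^(n-x) * p^x."""
    q = 1 - p
    return comb(n, x) * q ** (n - x) * p ** x

def expected_frequencies(N, n, p):
    """Successive terms of N(q + p)^n for x = 0, 1, ..., n."""
    return [N * binomial_pmf(x, n, p) for x in range(n + 1)]

n, p, N = 4, 0.5, 16  # illustrative values
freqs = expected_frequencies(N, n, p)
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
variance = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))

print(freqs)            # expected number of sets with 0, 1, ..., 4 successes
print(mean, variance)   # to be compared with np and npq
```

Because p = q here, the frequencies are symmetrical about x = 2, and the variance npq is smaller than the mean np.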
Poisson distribution
The Poisson distribution was discovered by the French mathematician-cum-
physicist Simeon Denis Poisson in 1837. The Poisson distribution is also a
discrete distribution. He derived it as a limiting case of the Binomial distribution.
For n trials the binomial distribution is $(q+p)^n$; the Poisson distribution arises
when n is large and p is small such that np = m is constant. It is a distribution of
rare events. It is also called the law of improbable events. The equation is:
$$P(X = x) = \frac{e^{-m} m^x}{x!}$$
and the expected frequency of x successes is
$$N \cdot P(X = x) = \frac{N e^{-m} m^x}{x!}$$
where m is the mean, N is the total frequency and x is the number of successes.
Properties of Poisson distribution
1. It has one parameter m
2. It is a distribution of a discrete variate
3. It takes values from 0 to $\infty$
4. Mean is equal to variance (both are m)
5. Skewness ($\beta_1$) is 1/m
6. Kurtosis ($\beta_2$) is 3 + 1/m
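The limiting relation between the binomial and Poisson distributions can be illustrated numerically. The sketch below assumes n = 1000 and p = 0.002 (so that m = np = 2) as arbitrary example values; the function names are hypothetical.

```python
from math import comb, exp, factorial

def poisson_pmf(x, m):
    """P(X = x) = e^(-m) * m^x / x! for a Poisson variate with mean m."""
    return exp(-m) * m ** x / factorial(x)

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * q^(n-x) * p^x for a binomial variate."""
    return comb(n, x) * (1 - p) ** (n - x) * p ** x

# Large n, small p, with np held at m = 2 (illustrative values)
n, p = 1000, 0.002
m = n * p

# Largest difference between the two probabilities for x = 0, 1, ..., 10
largest_gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, m))
                  for x in range(11))

# Mean and variance of the Poisson distribution, summed over enough terms
# for the neglected tail to be negligible
poisson_mean = sum(x * poisson_pmf(x, m) for x in range(60))
poisson_variance = sum((x - poisson_mean) ** 2 * poisson_pmf(x, m)
                       for x in range(60))

print(largest_gap, poisson_mean, poisson_variance)
```

The gap between the two sets of probabilities is small for these values, and the computed mean and variance both come out at m.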
Normal Distribution or Normal Probability Distribution or Gaussian
Distribution
In the preceding sections we have discussed the discrete distributions,
the Binomial and Poisson distributions. In this section we deal with the most
important continuous distribution, known as the normal probability distribution
or simply the normal distribution. It is important for the reason that it plays a
vital role in theoretical and applied statistics. The normal distribution
was first discovered by De Moivre (English mathematician) in 1733 as a
limiting case of the binomial distribution. Later it was applied in natural and
social science by Laplace (French mathematician) in 1777. The normal
distribution is also known as the Gaussian distribution in honour of Karl
Friedrich Gauss (1809).
A continuous random variable X is said to follow a normal distribution
with mean $\mu$ and standard deviation $\sigma$ if its probability density function is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad -\infty < x < \infty$$
Note:
The mean $\mu$ and standard deviation $\sigma$ are called the parameters of the
Normal distribution. The normal distribution is expressed by $X \sim N(\mu, \sigma^2)$
Condition of Normal Distribution
i) The normal distribution is a limiting form of the binomial distribution under
the following conditions.
1. n, the number of trials, is indefinitely large, i.e. $n \to \infty$, and
2. Neither p nor q is very small.
ii) The normal distribution can also be obtained as a limiting form of the Poisson
distribution with parameter $m \to \infty$
iii) The constants of the normal distribution are mean = $\mu$, variance = $\sigma^2$, standard
deviation = $\sigma$.
Properties of Normal distribution
1. It is a distribution of continuous variates
2. The variate takes values from $-\infty$ to $+\infty$
3. It is a symmetrical distribution; mean = median = mode
4. The shape of the curve is bell shaped. The ends of the curve tail off
asymptotically to the base
5. It has two parameters $\mu$ and $\sigma^2$
6. The first and the third quartiles are equidistant from the median
7. M.D. = $\frac{4}{5}\sigma$ or $0.7979\,\sigma$
8. i. Mean $\pm \frac{2}{3}\sigma$ covers 50% of the observations
ii. Mean $\pm \sigma$ covers 68.27% of the observations
iii. Mean $\pm 2\sigma$ covers 95.45% of the observations
iv. Mean $\pm 3\sigma$ covers 99.73% of the observations
9. All odd moments about the mean = 0
10. Skewness is zero
11. Kurtosis is 3. It is mesokurtic.
A frequency curve is leptokurtic if kurtosis > 3
A frequency curve is platykurtic if kurtosis < 3
12. If mean = 0 and S.D. = 1 then the normal distribution is a standard normal
distribution and its equation is $y = \frac{N}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}$, N being the total frequency
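The coverage figures in property 8 and the mean deviation in property 7 can be reproduced with the error function from Python's standard library. This is a minimal sketch; it relies on the standard identity that, for a normal variate, the proportion lying within k standard deviations of the mean is erf(k / sqrt(2)). The function names are hypothetical.

```python
import math

def coverage_within(k):
    """Proportion of a normal distribution lying within mean +/- k * sigma."""
    return math.erf(k / math.sqrt(2))

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Mean deviation about the mean, as a multiple of sigma
mean_deviation_ratio = math.sqrt(2 / math.pi)

for k in (2 / 3, 1, 2, 3):
    print(round(k, 4), round(100 * coverage_within(k), 2), "% of observations")
print(round(mean_deviation_ratio, 4))
```

The value for k = 2/3 comes out slightly under 50%; the figure in the list is a convenient rounding.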
THEORY OF SAMPLING
Sampling is very often used in our daily life. For example, while
purchasing food grains from a shop we usually examine a handful from the
bag to assess the quality of the commodity.
A Population or Universe is a complete set of all possible observations
of the type which is to be investigated. The total number of livestock in a state
or country and the total number of houses in a village or town are some examples of
a population. Sometimes it is possible and practical to examine every person or
item in the population we wish to describe. We call this a complete
enumeration, or census. We use sampling when it is not possible to
measure every item in the population.
A population is said to be finite if it consists of a finite number of units.
The number of animals in a farm and the number of workers in a poultry farm are
examples of finite populations. The total number of units in a population is
called the population size. A population is said to be infinite if it has an infinite
number of units, for example the number of stars in the sky, the number of
birds in the forests etc.
Census Method
Information on a population can be collected in two ways: the census
method and the sample method. In the census method every element of the
population is included in the investigation. For example, if we study the
average milk yield of the cows of a particular village or area, and if there are
1000 cows in that area, we must study the milk yield of all 1000 cows. In this
method no cow is left out, as each cow is a unit.
Livestock census of India
The livestock population census of our country is taken at five-year
intervals. The livestock population of different species is worked out on the
basis of the livestock census, which was conducted quinquennially by the
Directorate of Economics and Statistics in the Department of Agriculture and
Cooperation, Ministry of Agriculture. The census work is carried out by
different agencies in different States and is coordinated by the Directorate of
Economics and Statistics. From the 17th Livestock Census the work has been
transferred to the Department of Animal Husbandry & Dairying, Ministry of
Agriculture. The first livestock census was taken in 1919-1920 and the latest
census was taken in 2007.
Merits and limitations of Census method
Merits
1. The data are collected from each and every item of the population.
2. The results are more accurate and reliable, because every item of the
universe is covered.
3. Intensive study is possible.
4. The data collected may be used for various surveys, analyses etc.
Limitations:
1. It requires a large number of enumerators and it is a costly method.
2. It requires more money, labour, time, energy etc.
3. It is not possible in some circumstances where the universe is infinite.
Sampling
Although the theory of sampling has been developed only recently, the
practice is not new. Usually, the population is too large for the researcher to
attempt to survey all of its members. A small but carefully chosen sample can be
used to represent the population. The sample reflects the characteristics of the
population from which it is drawn. Statisticians use the word sample to
describe a portion chosen from the population. A finite subset of statistical
individuals defined in a population is called a sample. The number of units in
a sample is called the sample size. The constituents of a population which
are the individuals to be sampled from the population, and which cannot be further
subdivided for the purpose of sampling at a time, are called sampling
units. For example, to know the average milk yield of cows, the milk yield of
each farm owner's cow is a sampling unit. For adopting any sampling procedure it
is essential to have a list identifying each sampling unit by a number. Such a
list or map is called a sampling frame. A list of householders, a list of villages in
a district, a list of farmers etc. are a few examples of sampling frames.
Reasons for selecting a sample
Sampling is inevitable in the following situations:
1. When complete enumeration is practically impossible because the
population is infinite.
2. When the results are required in a short time.
3. When the area of survey is wide.
4. When resources for the survey are limited, particularly in respect of
money and trained persons.
5. When the item or unit is destroyed under investigation.
Parameters and statistics
We can describe samples and populations by using measures such as
the mean, median, mode and standard deviation. When these terms describe
the characteristics of a population, they are called parameters. When they
describe the characteristics of a sample, they are called statistics. A
parameter is a characteristic of a population and a statistic is a characteristic
of a sample. Since samples are subsets of the population, statistics provide
estimates of the parameters. That is, when the parameters are unknown, they
are estimated from the values of the statistics. In general, we use Greek or
capital letters for population parameters and lower case Roman letters to
denote sample statistics. ($N$, $\mu$, $\sigma$ are the standard symbols for the size, mean and
S.D. of the population. $n$, $\bar{x}$, $s$ are the standard symbols for the size, mean and S.D. of
the sample respectively.)
Principles of Sampling
Samples have to provide good estimates. The following principles tell
us that the sample methods provide such good estimates.
1. Principle of statistical regularity:
A moderately large number of units chosen at random from a
large group are almost sure, on the average, to possess the
characteristics of the large group.
2. Principle of Inertia of large numbers:
Other things being equal, as the sample size increases, the
results tend to be more accurate and reliable.
3. Principle of Validity:
This states that the sampling methods provide valid estimates
about the population units (parameters).
4. Principle of Optimization:
This principle takes into account the desirability of obtaining a
sampling design which gives optimum results. This minimizes the risk
or loss of the sampling design. The foremost purpose of sampling is to
gather maximum information about the population under
consideration at minimum cost, time and human power. This is best
achieved when the sample contains all the properties of the
population.
Sampling errors and non-sampling errors
The two types of errors in a sample survey are sampling errors and
non-sampling errors.
1. Sampling errors:
Although a sample is a part of the population, it cannot generally be
expected to supply full information about the population. So there may be, in
most cases, a difference between statistics and parameters. The discrepancy
between a parameter and its estimate due to the sampling process is known
as sampling error.
2. Non-sampling errors:
In all surveys some errors may occur during the collection of actual
information. These errors are called non-sampling errors.
Advantages of Sampling
There are many advantages of sampling methods over the census method.
They are as follows:
1. Sampling saves time and labour.
2. It results in a reduction of cost in terms of money and man-hours.
3. Sampling can give greater accuracy of results.
4. It has greater scope.
5. It has greater adaptability.
6. If the population is too large, or hypothetical, or destroyable, sampling
is the only method to be used.
Limitations of Sampling
The limitations of sampling are given below:
1. Sampling is to be done by qualified and experienced persons.
Otherwise, the information will be unreliable.
2. The sample method may sometimes give extreme values instead of
mixed values.
3. There is the possibility of sampling errors. A census survey is free from
sampling error.
Estimate: the value of a population parameter obtained from the sample.
Estimator: the rule or statistic which is used to estimate the value of the
population parameter. There are two types of estimate:
Point estimate: a single value which is used to estimate the population
parameter.
Interval estimate: an interval within which the population parameter is expected
to lie. It is also called fiducial limits or confidence interval.
Properties of best estimator
1. Unbiased estimator: If $\bar{Y}$ is the population mean and $\bar{y}$ is its estimate,
and if the mean of all possible estimates is equal to $\bar{Y}$, then the
estimator is said to be unbiased. In other words, if $E(\bar{y}) = \bar{Y}$, then $\bar{y}$ is
an unbiased estimate of $\bar{Y}$.
2. Consistent estimator: When the sample size n is increased indefinitely,
the probability that the estimate is close to the parameter will be close to 1.
For a finite population, when n approaches N the probability that the estimate
is close to the parameter will be close to 1.
3. Minimum variance estimator: If there are two or more estimators of a
parameter, the one whose variance is less than the variance of every other
estimator is called the minimum variance estimator.
4. Best estimator: If an estimator is unbiased and consistent, it is called the
best estimator.
5. Linear estimator: An estimator of the form
$k_1 y_1 + k_2 y_2 + k_3 y_3 + \dots + k_n y_n$, where $k_1, k_2, \dots, k_n$ are
constants, is called a linear estimator.
6. Efficient estimator: An unbiased, minimum variance estimator is called an
efficient estimator.
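Unbiasedness of the sample mean can be checked by listing every possible sample from a small population. This is a minimal sketch assuming simple random sampling without replacement; the four population values are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def mean(values):
    """Exact arithmetic mean of a sequence of numbers."""
    values = list(values)
    return Fraction(sum(values), len(values))

def mean_of_all_sample_means(population, sample_size):
    """Average the sample mean over every possible sample of the given size."""
    sample_means = [mean(sample)
                    for sample in combinations(population, sample_size)]
    return mean(sample_means)

population = [2, 4, 6, 8]  # hypothetical values of a small finite population
population_mean = mean(population)
expected_sample_mean = mean_of_all_sample_means(population, 2)

print(population_mean, expected_sample_mean)
```

Individual sample means range from 3 to 7, but their average over all six possible samples equals the population mean, which is what the unbiasedness condition requires.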
Uses of standard error
1. The standard error of the mean indicates the average variation of sample
means from the population mean. The standard error is used as an
instrument or basis for testing hypotheses.
2. The magnitude of the standard error gives us an idea about the reliability
of the sample. The greater the standard error, the greater the departure
of the actual value from the expected value and hence the greater the
unreliability (the lesser the reliability) of the sample. The reciprocal of
the standard error is taken as a measure of reliability or precision of the
sample. That is, $\text{Precision} = \frac{1}{\text{S.E.}}$, where $\text{Standard Error} = \frac{\text{S.D.}}{\sqrt{n}}$
As n increases, S.E. decreases and hence precision increases. In large
samples, the sampling distribution of a statistic approximates the normal
distribution.
3. The standard error helps us to determine the limits within which the
population parameters are expected to lie.
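The relation between sample size, standard error and precision can be sketched as follows. The milk-yield figures are hypothetical, and the sample S.D. is computed with the n - 1 divisor used by `statistics.stdev`, which is an assumption of this sketch rather than something the text specifies.

```python
import math
import statistics

def standard_error(sd, n):
    """Standard error of the mean: S.D. / sqrt(n)."""
    return sd / math.sqrt(n)

def precision(sd, n):
    """Reciprocal of the standard error."""
    return 1 / standard_error(sd, n)

# Hypothetical daily milk yields (litres) of a sample of 8 cows
yields = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 10.0, 13.0]
sd = statistics.stdev(yields)  # sample S.D. with divisor n - 1 (assumption)
se = standard_error(sd, len(yields))
print(round(sd, 3), round(se, 3), round(precision(sd, len(yields)), 3))

# For a fixed S.D., quadrupling n halves the standard error
for n in (25, 100, 400):
    print(n, standard_error(2.0, n), precision(2.0, n))
```

Halving the standard error requires four times as many observations, because n enters only through its square root.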
Types of Sampling
The technique of selecting a sample is of fundamental importance in
sampling theory and it depends upon the nature of the investigation. The
sampling procedures which are commonly used may be classified as
1. Probability sampling.
2. Non-probability sampling.
3. Mixed sampling.
Probability sampling (Random sampling)
A probability sample is one where the selection of units from the
population is made according to known probabilities. Examples: simple
random sampling, probability proportional to size sampling etc.
Non-Probability sampling
It is the one where discretion is used to select representative units
from the population, or to infer that a sample is representative of the
population. This method is called judgment or purposive sampling. This
method is mainly used for opinion surveys. A common type of judgment
sample used in surveys is the quota sample. This method is not used in general
because of the prejudice and bias of the enumerator. However, if the enumerator
is experienced and expert, this method may yield valuable results. For
example, in a market research survey of the performance of a new car,
the sample consisted of all purchasers of the new car.
Mixed Sampling
Here samples are selected partly according to some probability and
partly according to a fixed sampling rule; they are termed mixed samples
and the technique of selecting such samples is known as mixed sampling.
Methods of selection of samples
1. Simple random sampling.
2. Stratified random sampling.
3. Systematic random sampling.
1. Simple random sampling
A simple random sample from a finite population is a sample selected
such that each possible sample combination has an equal probability of being
chosen. It is also called unrestricted random sampling. Simple random
sampling may be with or without replacement.
Simple random sampling without replacement
In this method the population elements can enter the sample only
once, i.e. a unit once selected is not returned to the population before the
next draw.
Simple random sampling with replacement
In this method the population units may enter the sample more than
once, because each selected unit is returned to the population before the next draw.
Methods of selection of a simple random sample
The following are some methods of selection of a simple random
sample.
a) Lottery Method:
This is the most popular and simplest method. In this method all the
items of the population are numbered on separate slips of paper of the same size,
shape and colour. They are folded and mixed up in a container. The required
number of slips is selected at random for the desired sample size. For
example, if we want to select 5 students out of 50 students, then we must
write the names or roll numbers of all the 50 students on slips and
mix them. Then we make a random selection of 5 students. This method is
mostly used in lottery draws. If the universe is infinite this method is
inapplicable.
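The lottery method can be mimicked with Python's `random` module. This is a minimal sketch in which roll numbers 1 to 50 stand in for the slips of paper; the function names and the seed are hypothetical choices made for illustration.

```python
import random

def lottery_sample(units, sample_size, seed=None):
    """Simple random sample WITHOUT replacement: a slip drawn is not put back."""
    rng = random.Random(seed)
    return rng.sample(units, sample_size)

def lottery_sample_with_replacement(units, sample_size, seed=None):
    """Simple random sample WITH replacement: each slip is returned before the next draw."""
    rng = random.Random(seed)
    return [rng.choice(units) for _ in range(sample_size)]

roll_numbers = list(range(1, 51))                # 50 students, numbered 1 to 50
chosen = lottery_sample(roll_numbers, 5, seed=2010)
print(chosen)                                    # five distinct roll numbers
print(lottery_sample_with_replacement(roll_numbers, 5, seed=2010))
```

A fixed seed is used only so that a run can be repeated; in an actual survey the draw would be left unseeded.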
b) Table of Random numbers:
As the lottery method cannot be used when the population is infinite,
the alternative method is that of using a table of random numbers. There
are several standard tables of random numbers. Three of them are:
1. Tippett's table
2. Fisher and Yates' table
3. Kendall and Smith's table
A random number table is so constructed that all digits 0 to 9 appear
independently of each other with equal frequency. If we have to select a sample
from a population of size N = 100, then the digits can be combined three by
three to give the numbers from 001 to 100.
Procedure to select a sample using a random number table
Units of the population from which a sample is required are assigned
numbers with an equal number of digits. When the size of the population is less than
a thousand, the three-digit numbers 000, 001, 002, ..., 999 are assigned. We may
start at any place and may go on in any direction, such as column-wise or
row-wise, in a random number table. But consecutive numbers are to be
used. On the basis of the size of the population and the random number table
available with us, we proceed according to our convenience. If any random
number is greater than the population size N, then N can be subtracted from
the random number drawn. This can be repeated until the number is less
than or equal to N.
Example 1:
In an area there are 500 families. Using the following extract from a table of random numbers
select a sample of 15 families to find out the standard of living of those families in that area.
4652 3819 8431 2150 2352 2472 0043 3488
9031 7617 1220 4129 7148 1943 4890 1749
2030 2327 7353 6007 9410 9179 2722 8445
0641 1489 0828 0385 8488 0422 7209 4950
Solution:
In the above random number table we can start from any row or column and read three-digit
numbers continuously, row-wise or column-wise.
Now we start from the third row, the numbers are:
203 023 277 353 600 794 109 179
272 284 450 641 148 908 280
Since some numbers are greater than 500, we subtract 500 from those numbers and we rewrite
the selected numbers as follows:
203 023 277 353 100 294 109 179
272 284 450 141 148 408 280
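The reading rule used in Example 1 can be sketched in Python as follows. This is a minimal sketch, not part of the original text: the function name `sample_from_digit_table` is hypothetical, and skipping 000 and repeated numbers is an assumption the text does not state.

```python
def sample_from_digit_table(rows, population_size, sample_size, width=3):
    """Read `width`-digit numbers in order from rows of a random number table."""
    # Join the table rows into one continuous stream of digits.
    digits = "".join("".join(row.split()) for row in rows)
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        value = int(digits[start:start + width])
        # Subtract N repeatedly until the number is less than or equal to N.
        while value > population_size:
            value -= population_size
        # Assumption (not in the text): skip 000 and numbers already chosen.
        if value == 0 or value in chosen:
            continue
        chosen.append(value)
        if len(chosen) == sample_size:
            break
    return chosen


rows = [
    "2030 2327 7353 6007 9410 9179 2722 8445",  # third row of the extract
    "0641 1489 0828 0385 8488 0422 7209 4950",  # fourth row of the extract
]
selected = sample_from_digit_table(rows, population_size=500, sample_size=15)
print(selected)
```

With the third and fourth rows of the extract, the function returns the same 15 family numbers as the worked solution.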
c) Random number selection using calculators or computers
Random numbers can be generated through a scientific calculator or a
computer. Each press of the key gives a new random number. The way
of selecting the sample is similar to that of using a random number table.
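As an illustration of computer-generated selection, the sketch below draws a simple random sample without replacement using Python's standard `random` module. The function name, the seed value and the population of 500 numbered units are assumptions made for the example only.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw `sample_size` distinct unit numbers from 1..population_size."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(range(1, population_size + 1), sample_size)


# Hypothetical use: 15 families out of 500, as in Example 1.
families = simple_random_sample(500, 15, seed=2010)
print(sorted(families))
```

Unlike the table method, no subtraction step is needed here because the generator is asked directly for numbers in the range 1 to N.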
Merits of using random numbers
Merits:
1. Personal bias is eliminated as the selection depends solely on chance.
2. A random sample is in general a representative sample for a
homogeneous population.
3. There is no need for a thorough knowledge of the units of the
population.
4. The accuracy of a sample can be tested by examining another sample
from the same universe when the universe is unknown.
5. This method is also used in other methods of sampling.
Limitations:
1. Preparing lots or using random number tables is tedious when the
population is large.
2. When there is a large difference between the units of the population,
a simple random sample may not be a representative sample.
3. The size of the sample required under this method is more than that
required by stratified random sampling.
4. It is generally seen that the units of a simple random sample lie apart
geographically. The cost and time of collection of data are more.
2. Stratified Random Sampling
Of all the methods of sampling, the procedure commonly used in
surveys is stratified sampling. This technique is mainly used to reduce the
population heterogeneity and to increase the efficiency of the estimates.
Stratification means division into groups. In this method the population is
divided into a number of subgroups or strata. The strata should be so formed
that each stratum is as homogeneous as possible. Then from each stratum
a simple random sample may be selected and these are combined together to
form the required sample from the population.
Types of Stratified Sampling
There are two types of stratified sampling. They are proportional and
non-proportional. In proportional sampling, proportionate
representation is given to the subgroups or strata. If the number of items in a
stratum is large, the sample from it will have a higher size and vice versa. The
population size is denoted by N and the sample size is denoted by n. The sample
size is allocated to each stratum in such a way that the sampling fraction is a
constant for each stratum. That is given by n/N = c. So in this method each
stratum is represented according to its size. In non-proportional sampling,
equal representation is given to all the strata regardless of their
size in the population.
Example 2:
A sample of 50 students is to be drawn from a population consisting of 500 students belonging to
two institutions A and B. The number of students in the institution A is 200 and the institution B is 300. How
will you draw the sample using proportional allocation?
Solution:
There are two strata in this case with sizes N1 = 200 and N2 = 300 and the total population N = N1 +
N2 = 500
The sample size is 50.
If n1 and n2 are the sample sizes,
$$n_1 = \frac{n}{N} \times N_1 = \frac{50}{500} \times 200 = 20$$
$$n_2 = \frac{n}{N} \times N_2 = \frac{50}{500} \times 300 = 30$$
The sample sizes are 20 from A and 30 from B. Then the units from each institution are to be
selected by simple random sampling.
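The allocation in Example 2 can be checked with a short Python sketch. The function name `proportional_allocation` is hypothetical, and the sketch does not round: when n/N times a stratum size is not a whole number, some rounding rule (not discussed in the text) would be needed.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Allocate the sample so that n_i / N_i equals n / N in every stratum."""
    total = sum(strata_sizes.values())
    return {name: sample_size * size / total
            for name, size in strata_sizes.items()}


# Institutions A and B from Example 2.
allocation = proportional_allocation({"A": 200, "B": 300}, sample_size=50)
print(allocation)
```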
Merits and limitations of stratified sampling
Merits:
1. It is more representative.
2. It ensures greater accuracy.
3. It is easy to administer as the universe is subdivided.
4. Greater geographical concentration reduces time and expenses.
5. When the original population is badly skewed, this method is
appropriate.
6. For a non-homogeneous population, it may yield good results.
Limitations:
1. Dividing the population into homogeneous strata is difficult and requires
more money, time and statistical experience.
2. Improper stratification leads to bias. If the different strata overlap,
such a sample will not be a representative one.
3. Systematic Sampling
This method is widely employed because of its ease and convenience.
Systematic sampling is a frequently used method of sampling when a complete
list of the population is available. It is also called quasi-random
sampling.
Selection procedure
The whole sample selection is based on just a random start. A procedure
in which the first unit is selected with the help of random numbers and the
rest get selected automatically according to some predesigned pattern is known
as systematic sampling. With systematic random sampling every K-th element
in the frame is selected for the sample, with the starting point among the first
K elements determined at random.
For example, if we want to select a sample of 50 students from 500
students under this method, every K-th item is picked up from the sampling
frame and K is called the sampling interval.
$$\text{Sampling interval, } K = \frac{N}{n} = \frac{\text{Population size}}{\text{Sample size}}$$
$$K = \frac{500}{50} = 10$$
K = 10 is the sampling interval. A systematic sample consists in selecting
a random number, say $i \le K$, and every K-th unit subsequently. Suppose the
random number i is 5, then we select 5, 15, 25, 35, 45, ... The random
number i is called the random start. The technique will generate K systematic
samples with equal probability.
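The selection procedure above can be sketched in Python as follows. The function name `systematic_sample` is hypothetical, and the sketch assumes that N is an exact multiple of n so that K = N/n is a whole number.

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select every K-th unit after a random start i, where 1 <= i <= K."""
    k = population_size // sample_size  # sampling interval K = N / n
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start
    return [start + j * k for j in range(sample_size)]


# The example from the text: N = 500, n = 50, random start i = 5.
units = systematic_sample(500, 50, start=5)
print(units[:5])
```

With the random start fixed at 5, the first selected units are 5, 15, 25, 35 and 45, as in the text.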
Merits:
1. This method is simple and convenient.
2. Time and work are much reduced.
3. If proper care is taken the result will be accurate.
4. It can be used in an infinite population.
Limitations:
1. Systematic sampling may not represent the whole population.
2. There is a chance of personal bias of the investigators.
Systematic sampling is preferably used when the information is to be
collected from trees in a forest, houses in blocks, entries in a register which
are in a serial order, etc.
TEST OF SIGNIFICANCE
It is a statistical procedure followed to test the significant difference
between a statistic and the parameter or between any two statistics, i.e.
between sample mean and population mean or between two sample means.
The theory of testing hypotheses was first originated by Neyman and Pearson
in 1928.
Hypothesis: Any statement made about the population.
Null hypothesis: There is no significant difference between statistic and
parameter or between any two statistics. It is usually denoted by $H_0$. The null
hypothesis is never proved. It is either accepted or rejected at some level of
significance. Usually we will have two levels of significance. Five per cent
level of significance means that we may go wrong in 5 out of 100 occasions.
At 1% level of significance, the risk of wrongly rejecting a true null
hypothesis is reduced, i.e. we may go wrong in one out of 100 occasions.
Alternate hypothesis: A statement contrary to the null hypothesis is the alternate
hypothesis and is denoted by $H_1$.
Level of significance: In testing a given hypothesis, the maximum probability
with which we would be willing to take risk is called the level of significance of
the test. This probability, often denoted by $\alpha$, is generally specified before
samples are drawn. The levels of significance usually employed in testing of
significance are 0.05 (or 5%) and 0.01 (or 1%). If, for example, a 0.05 or 5%
level of significance is chosen in deriving a test of hypothesis, then there are
about 5 chances in 100 that we would reject the hypothesis when it should
be accepted, i.e. we are about 95% confident that we made the right
decision. In such a case we say that the hypothesis has been rejected at 5%
level of significance, which means that we could be wrong with probability
0.05.
Degrees of freedom: The number of observations which are free to move or
free to vary. The number of degrees of freedom is the number of independent
observations in a sample of data that are available to estimate a parameter of
the population from which that sample is drawn.
Critical Value(s): The critical value(s) for a hypothesis test is a threshold to
which the value of the test statistic in a sample is compared to determine
whether or not the null hypothesis is rejected.
The critical value (significant value) for any hypothesis test depends on
the significance level at which the test is carried out, and whether the test is
one-sided or two-sided.
Critical Region: The critical region CR, or rejection region RR, is a set of
values of the test statistic for which the null hypothesis is rejected in a
hypothesis test. That is, the sample space for the test statistic is partitioned
into two regions; one region (the critical region) will lead us to reject the null
hypothesis $H_0$, the other will not. So, if the observed value of the test statistic
is a member of the critical region, we conclude "Reject $H_0$"; if it is not a
member of the critical region then we conclude "Do not reject $H_0$".
Different tests of significance
Parametric test
A distribution will be attached to the test. The various parametric tests are
1. Normal deviate test or large sample test or Z test
2. Student's t test or small sample test
3. Chi-square test
4. Variance ratio test or F-test
Non-Parametric test
It will be free from distribution and it is called a distribution-free
method.
In any test, we arrive at one of four types of decisions:
# Null hypothesis may be true but we reject it by our test: Type I error
(error of the first kind, an $\alpha$ error, or a false positive). The false positive
rate is equal to the significance level. Type I error is often considered to
be more serious.
# Null hypothesis may be false but we accept it: Type II error (error of the
second kind, a $\beta$ error, or a false negative). The "power" (or the
"sensitivity") of the test is equal to 1 minus $\beta$.
# Null hypothesis is true and we accept it: correct decision.
# Null hypothesis is false and we reject it: correct decision.
Note:
Large Sample: A sample is large when it consists of more than 30
items.
Small Sample: A sample is small when it consists of 30 or less than 30
items.
Statistical procedure followed in any test for significance
1. Null hypothesis: Set up the null hypothesis $H_0$.
2. Alternative hypothesis: Set up the alternative hypothesis $H_1$, which is
complementary to $H_0$ and which will indicate whether a one-tailed (right or
left tailed) or two-tailed test is to be applied.
3. Level of significance: Choose an appropriate level of significance ($\alpha$);
$\alpha$ is fixed in advance.
4. Test statistic (or test criterion):
$$\text{Test statistic} = \frac{\text{Statistic} - \text{Parameter}}{\text{S.E. of difference}} = \frac{\text{Difference in the value of two parameters}}{\text{S.E. of difference}}$$
5. Inference: We compare the computed value (in absolute value) with
the significant value (critical value) to take decisions.
LARGE SAMPLE TEST
The procedure which decides whether a certain hypothesis is true or false is
called the test of hypothesis (or test of significance). The tests of significance
used for problems of large samples are different from those used in the case of
small samples as the assumptions used in both cases are different. The
following assumptions are made for problems dealing with large samples:
i. Almost all the sampling distributions asymptotically follow the normal distribution.
ii. The sample values are approximately close to the population values.
1. To test the significant difference between sample mean and
population mean
Step 1: $H_0: \mu = \mu_0$: No significant difference between sample mean and
population mean
$H_1: \mu \neq \mu_0$ ($\mu > \mu_0$ or $\mu < \mu_0$)
Step 2: The test statistic or Z statistic is given by
$$Z = \frac{\text{Sample mean} - \text{Population mean}}{\text{S.E. of difference}} = \frac{\bar{x} - \mu}{\text{S.E.}(\bar{x} - \mu)} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
Where
$\bar{x}$ = mean of sample;
$\mu$ = mean of population;
$\sigma$ = SD of population and
n = size of sample
When $\sigma$ is not known, replace it by s, which is the S.D. of the sample. Then,
$$Z = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
Step 3: Conclusion
i. If $|Z| < 1.96$, we say that Z is not significant and $H_0$ is accepted. We denote
this by Z = ( )^NS.
ii. If $|Z| > 1.96$, Z is significant and $H_0$ is rejected. We denote this by Z = ( )*.
iii. If $|Z| > 2.58$, Z is highly significant and $H_0$ is rejected. We denote this as
Z = ( )**.
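The three steps can be sketched in Python as follows. The function names and the numbers (sample mean 52, hypothesised mean 50, SD 8, n = 64) are made up for illustration and are not from the text; the cut-offs 1.96 and 2.58 are the ones given in Step 3.

```python
import math


def z_one_sample(sample_mean, population_mean, sd, n):
    """Z = (x_bar - mu) / (sd / sqrt(n)) for a large sample."""
    return (sample_mean - population_mean) / (sd / math.sqrt(n))


def significance_label(z):
    """Apply the decision rule of Step 3 to a computed Z value."""
    if abs(z) > 2.58:
        return "highly significant (**)"
    if abs(z) > 1.96:
        return "significant (*)"
    return "not significant (NS)"


# Hypothetical data, for illustration only.
z = z_one_sample(sample_mean=52.0, population_mean=50.0, sd=8.0, n=64)
label = significance_label(z)
print(z, label)
```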
2 (a). To test the significant difference between two sample means,
when they are taken from different populations
$H_0: \mu_1 = \mu_2$: There is no significant difference between two sample means.
$H_1: \mu_1 \neq \mu_2$ ($\mu_1 > \mu_2$ or $\mu_1 < \mu_2$)
Test statistic
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\text{S.E.}(\bar{x}_1 - \bar{x}_2)}$$
Where $\bar{x}_1$, $\bar{x}_2$ are sample means with sizes $n_1$ and $n_2$ respectively.
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
Where $\sigma_1$, $\sigma_2$ are the SDs of the populations.
When $\sigma_1$, $\sigma_2$ are not available, $s_1$, $s_2$ can be used:
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
Where $\bar{x}_1$, $\bar{x}_2$ are the sample means of sizes $n_1$ and $n_2$ with SDs $s_1$, $s_2$
respectively.
Conclusion: as in the previous test
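2 (b). To test the significant difference between the sample means,
when they are taken from the same population
$H_0$: There is no significant difference between the two sample means
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
When $\sigma$ (SD of the population) is not known, replace $\sigma^2$ by
$$\sigma^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}$$
Substituting this estimate gives
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_2} + \dfrac{s_2^2}{n_1}}}$$
Conclusion: as in the previous test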
3. To test significant difference between sample proportion and
population proportion
If n is the number of trials, p is the proportion of success out of
n trials, and P is the expected proportion of success, then p follows the normal
distribution when n is large.
$H_0: P = P_0$: There is no significant difference between p and P.
$H_1: P \neq P_0$ ($P > P_0$ or $P < P_0$)
Test statistic
$$Z = \frac{p - P}{\text{S.E.}(p - P)} = \frac{p - P}{\sqrt{\dfrac{PQ}{n}}}$$
Where $Q = 1 - P$
Conclusion: as in the previous test
4. To test the significant difference between two proportions
$H_0: P_1 = P_2 = P$ (say): There is no significant difference between the two
proportions
$H_1: P_1 \neq P_2$ ($P_1 > P_2$ or $P_1 < P_2$)
Let $p_1$, $p_2$ be the observed proportions of success out of $n_1$, $n_2$ trials respectively.
$$Z = \frac{p_1 - p_2}{\text{S.E.}(p_1 - p_2)}$$
$$Z = \frac{p_1 - p_2}{\sqrt{\dfrac{P_1 Q_1}{n_1} + \dfrac{P_2 Q_2}{n_2}}}$$
Where $Q_1 = 1 - P_1$, $Q_2 = 1 - P_2$
(or)
$$Z = \frac{p_1 - p_2}{\sqrt{\hat{P}\hat{Q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
Where $\hat{P} = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$ and $\hat{Q} = 1 - \hat{P}$
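The pooled form of the two-proportion test can be sketched in Python as below. The function name and the figures (60% successes out of 100 trials against 40% out of 100) are hypothetical and serve only to show the arithmetic.

```python
import math


def z_two_proportions(p1, n1, p2, n2):
    """Z for the difference of two proportions using the pooled estimate."""
    p_hat = (n1 * p1 + n2 * p2) / (n1 + n2)  # pooled proportion
    q_hat = 1 - p_hat
    se = math.sqrt(p_hat * q_hat * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# Hypothetical proportions, for illustration only.
z = z_two_proportions(p1=0.6, n1=100, p2=0.4, n2=100)
print(round(z, 3))
```

Here the pooled proportion is 0.5 and Z is about 2.83, which exceeds 2.58, so by the rule of the previous tests the difference would be declared highly significant.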
5. Test of significance of an observed correlation coefficient
Let r be the sample correlation coefficient and $\rho$ be the population
correlation coefficient.
$H_0$: There is no significant difference between the correlation coefficients of the
sample and the population.
$$Z = \frac{r - \rho}{\text{S.E.}(r - \rho)} = \frac{r - \rho}{\dfrac{1 - \rho^2}{\sqrt{n}}}$$
Conclusion: as in the previous test.
Note: In this case, we are not interested in r being significantly different
from $\rho$, but in whether there is a significant correlation
between the variables studied, i.e. whether r is significantly different from 0
(zero).
Hence, in the above formula, put $\rho = 0$:
$$Z = \frac{r - 0}{\dfrac{1 - 0}{\sqrt{n}}} = r\sqrt{n}$$
If Z is not significant, then we conclude that r is not significant, i.e.
there is no correlation between the variables studied. If Z is significant, then
there is significant correlation and we denote this as r = ( )*. If Z is highly
significant, then we say that r is highly significant and r = ( )**. In both
cases, the variables studied are related.
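A minimal Python sketch of the large-sample test Z = r√n follows. The function name and the values r = 0.3 and n = 100 are illustrative assumptions, not data from the text.

```python
import math


def z_for_correlation(r, n):
    """Large-sample test of H0: rho = 0, using Z = r * sqrt(n)."""
    return r * math.sqrt(n)


# Hypothetical sample: r = 0.3 observed on n = 100 pairs.
z = z_for_correlation(0.3, 100)
print(z)
```

With these values Z is 3, which is greater than 2.58, so r would be reported as highly significant.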
SMALL SAMPLE TEST
Student's t-distribution was published in 1908 by William Sealy Gosset
under the pen name Student. When the sample size is large and the variates are
normally distributed, the normal test is employed to test the significance of
differences. With small samples, when the degrees of freedom are less than
30, the test statistic no longer follows the normal distribution. Hence we make
use of the t-distribution and the t-test of significance. Tables have been
constructed giving t at different probability levels for various degrees of freedom.
Assumptions for Student's t-test
1. The parent population from which the sample is drawn is
normal.
2. The sample observations are random and independent.
3. The population standard deviation $\sigma$ is not known.
Properties of t-distribution
1. The t-distribution ranges from $-\infty$ to $+\infty$ just as does a normal
distribution.
2. Like the normal distribution, the t-distribution is also symmetrical and has a
mean of zero.
3. The t-distribution has a greater dispersion than the standard normal
distribution.
4. As the sample size approaches 30, the t-distribution approaches the
normal distribution.
1. To test the significance of the sample mean from the population mean
Step 1: $H_0: \mu = \mu_0$: No significant difference between sample mean and
population mean
$H_1: \mu \neq \mu_0$ ($\mu > \mu_0$ or $\mu < \mu_0$)
Step 2: The test statistic is given by
$$t = \frac{\text{Sample mean} - \text{Population mean}}{\text{S.E. of difference}} = \frac{\bar{x} - \mu}{\text{S.E.}(\bar{x} - \mu)} = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
which follows Student's t-distribution with (n - 1) d.f.
$\bar{x}$ = mean of the sample of size n with SD s, with d.f. = (n - 1)
$\mu$ = population mean
Step 3: Conclusion
i. If $|t|$ < table value of t for (n - 1) d.f. at 5% level, t is non-significant and
$H_0$ is accepted. We denote this as t = ( )^N.S.
ii. If $|t|$ > table value of t for (n - 1) d.f. at 5% level, t is significant. $H_0$ is
rejected. We denote this as t = ( )*.
iii. If $|t|$ > table value of t for (n - 1) d.f. at 1% level, t is highly significant.
$H_0$ is rejected. We denote this as t = ( )**.
2. To test the significant difference between two sample means when
the sizes are less than 30 and they are dependent (Paired t-test)
Two samples are dependent when they have some common factor
linking the observations in the two samples.
Step 1: $H_0: \mu_1 = \mu_2$: There is no significant difference between the two sample
means
$H_1: \mu_1 \neq \mu_2$ ($\mu_1 > \mu_2$ or $\mu_1 < \mu_2$)
Step 2: The test statistic is given by
$$t = \frac{\bar{d} - 0}{\text{S.E. of } \bar{d}} = \frac{\bar{d}}{s/\sqrt{n}}$$
which follows Student's t-distribution with (n - 1) d.f.
Where d = difference in the observations of the two samples
s = SD of d
Conclusion: As in the previous test
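The paired t statistic can be computed with Python's standard `statistics` module, as sketched below. The before/after values are invented for illustration, the function name is hypothetical, and the sketch assumes s is the sample SD of the differences with divisor n - 1 (which is what `statistics.stdev` computes); the text itself only says "SD of d".

```python
import math
import statistics


def paired_t(before, after):
    """Return (t, d.f.) for the paired t-test on two dependent samples."""
    d = [a - b for a, b in zip(after, before)]  # pairwise differences
    n = len(d)
    d_bar = statistics.mean(d)
    s = statistics.stdev(d)  # assumption: SD with divisor n - 1
    return d_bar / (s / math.sqrt(n)), n - 1


# Hypothetical measurements on the same four units before and after treatment.
before = [10, 12, 9, 11]
after = [12, 15, 10, 14]
t, df = paired_t(before, after)
print(round(t, 3), df)
```

The computed t is then compared with the table value of t for n - 1 = 3 d.f.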
3. To test the significant difference between the two samples, when the
samples are independent (Non-Paired or Unpaired t-test)
By independence of the two samples, we mean that there is no
relationship between the individuals contributing to the samples. Thus, samples
drawn from different populations or from different parts of the same population
will be independent.
Step 1: $H_0: \mu_1 = \mu_2$: There is no significant difference between the two sample
means
$H_1: \mu_1 \neq \mu_2$ ($\mu_1 > \mu_2$ or $\mu_1 < \mu_2$)
Step 2: The test statistic t is given by
$$t = \frac{\text{Difference in the means of the samples}}{\text{S.E. of difference}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where O refers to the observed frequencies and E refers to the
expected frequencies.
Note: If $\chi^2$ is zero, it means that the observed and expected frequencies
coincide with each other. The greater the discrepancy between the observed
and expected frequencies, the greater is the value of $\chi^2$.
Properties of Chi-square distribution
1. The mean of the $\chi^2$ distribution is equal to the number of degrees of
freedom (n).
2. The variance of the $\chi^2$ distribution is equal to 2n.
3. The median of the $\chi^2$ distribution divides the area of the curve into two
equal parts, each part being 0.5.
4. The mode of the $\chi^2$ distribution is equal to (n - 2).
5. Since Chi-square values are always positive, the Chi-square curve is always
positively skewed.
6. Since Chi-square values increase with the increase in the degrees of
freedom, there is a new Chi-square distribution with every increase in
the number of degrees of freedom.
7. The lowest value of Chi-square is zero and the highest value is infinity, i.e.
$\chi^2 \ge 0$.
8. When $\chi_1^2$ and $\chi_2^2$ are two independent $\chi^2$ variates
with $n_1$ and $n_2$ degrees of freedom, their sum $\chi_1^2 + \chi_2^2$ will follow
the $\chi^2$ distribution with $(n_1 + n_2)$ degrees of freedom.
9. When n (d.f.) > 30, the distribution of $\sqrt{2\chi^2}$ approximately follows the
normal distribution. The mean of the distribution of $\sqrt{2\chi^2}$ is $\sqrt{2n - 1}$ and
the standard deviation is equal to 1.
Conditions for applying the $\chi^2$ test:
The following conditions should be satisfied before applying the $\chi^2$ test.
1. N, the total frequency, should be reasonably large, say greater than 50.
2. No theoretical cell frequency should be less than 5. If it is less than 5,
the frequencies should be pooled together in order to make it 5 or more
than 5.
3. Each of the observations which make up the sample for this test must
be independent of each other.
4. The $\chi^2$ test is wholly dependent on degrees of freedom.
Testing the Goodness of fit (Binomial and Poisson Distribution)
Karl Pearson in 1900 developed a test for testing the significance of
the discrepancy between experimental values and the theoretical values
obtained under some theory or hypothesis. This test is performed to test
whether the deviation of the observed frequencies in given data from the
expected frequencies is due to real causes or due to chance. In other words,
this is a test to decide whether the observed frequencies are in accordance
with the expected frequencies within statistical limits. This test is used to decide
whether the given data have a good fit with one of the known forms of
distributions, viz., Normal, Binomial or Poisson. This test is also used to test
whether the observed number of progenies in a genetic experiment fits the
Mendelian laws of heredity. The test statistic for goodness of fit is given by
$$\chi^2 = \sum \left[\frac{(O_i - E_i)^2}{E_i}\right]$$
which follows the $\chi^2$ distribution with d.f. = No. of classes - 1.
In this case,
$H_0$: The fit is good, or there is no significant difference between observed and
expected frequency.
Conclusion
1. If calculated $\chi^2$ is less than table $\chi^2$ for the respective degrees of
freedom at 5% level, $\chi^2$ is not significant, which is denoted by $\chi^2$ = ( )^N.S, i.e.
$H_0$ is accepted. The fit is good.
2. If calculated $\chi^2$ is > table $\chi^2$ for the respective d.f. at 5% level, $\chi^2$ is
significant and denoted by $\chi^2$ = ( )*. $H_0$ is rejected. The fit is not good,
or the observed frequencies are not according to theory.
3. If calculated $\chi^2$ is > table $\chi^2$ for the respective d.f. at 1% level, $\chi^2$ is
highly significant and denoted by $\chi^2$ = ( )**. $H_0$ is rejected. The fit is
not good, or the observed frequencies are not according to theory.
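As a sketch of the goodness-of-fit calculation, the Python code below tests illustrative progeny counts against a 9:3:3:1 Mendelian ratio. The counts, the function name and the constant name are assumptions made for the example; 7.815 is the standard 5% table value of Chi-square for 3 d.f.

```python
def chi_square_gof(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Illustrative counts in four phenotype classes, tested against 9:3:3:1.
observed = [315, 108, 101, 32]
ratio = [9, 3, 3, 1]
total = sum(observed)
expected = [total * r / sum(ratio) for r in ratio]

chi2 = chi_square_gof(observed, expected)
df = len(observed) - 1          # d.f. = number of classes - 1
TABLE_CHI2_5PCT_3DF = 7.815     # 5% table value for 3 d.f.
print(round(chi2, 3), df, chi2 < TABLE_CHI2_5PCT_3DF)
```

The calculated value (about 0.47) is below the table value, so for these counts the fit would be judged good.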
Chi-square test of independence
This is performed when the data are presented in the form of a
contingency table. A table giving the simultaneous classification of the body
of data in two different ways is called a contingency table. If there are r
rows and c columns, the table is said to be an r x c contingency table. The $\chi^2$
test is applied to test whether the factors classified are independent or not,
i.e. whether the 2 factors are associated or not. In the contingency table both
factors may be qualitative, or one qualitative and the other quantitative, or both
quantitative. The degrees of freedom for an r x c contingency table are (r - 1)(c - 1).
Application of the $\chi^2$ statistic in a 2 x 2 contingency table

              Level I     Level II    Total
Level I          a           b        a + b
Level II         c           d        c + d
Total          a + c       b + d      a + b + c + d = N

Step 1: $H_0$: The factors are independent
Step 2: $\chi^2$ is given by
$$\chi^2 = \frac{(a + b + c + d)(ad - bc)^2}{(a + b)(c + d)(a + c)(b + d)} \quad \text{with d.f.} = (2 - 1)(2 - 1) = 1$$
$$\chi^2 = \frac{N(ad - bc)^2}{(a + b)(c + d)(a + c)(b + d)} \quad \text{with d.f.} = 1$$
Conclusion: as in the case of the $\chi^2$ test of goodness of fit (with the respective d.f.).
The above formula is applicable when all of a, b, c, d are greater than 5.
Yates' correction
In a 2 x 2 contingency table, the number of d.f. is (2 - 1)(2 - 1) = 1. If any
one of the theoretical cell frequencies is less than 5, the use of the pooling method
will result in d.f. = 0, which is meaningless. In this case we apply a correction
given by F. Yates (1934), usually known as Yates' correction for
continuity, which is as follows:
$$\chi^2 = \frac{N\left(|ad - bc| - \dfrac{N}{2}\right)^2}{(a + b)(c + d)(a + c)(b + d)} \quad \text{with d.f.} = 1$$
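The shortcut formula for a 2 x 2 table, with and without Yates' correction, can be sketched in Python as below. The function name and the cell counts a = 2, b = 8, c = 8, d = 2 are hypothetical and chosen only to keep the arithmetic simple.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square for a 2 x 2 table; optionally with Yates' correction."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = diff - n / 2  # continuity correction: |ad - bc| - N/2
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical table with small cell counts.
uncorrected = chi_square_2x2(2, 8, 8, 2)
corrected = chi_square_2x2(2, 8, 8, 2, yates=True)
print(uncorrected, corrected)
```

For this table the uncorrected value is 7.2 and the corrected value is 5.0; the correction always reduces the statistic, which makes the test more conservative.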
In an (r x c) contingency table, the expected value (E) in the i-th row and j-th
column is calculated by
$$E_{ij} = \frac{R_i C_j}{N}$$
Where
$R_i$ = sum of all the values in the i-th row
$C_j$ = sum of all the values in the j-th column
N = grand total, i.e. the sum of all the values in the given contingency
table
Then,
$$\chi^2 = \sum \left[\frac{(O_i - E_i)^2}{E_i}\right] \quad \text{with d.f.} = (r - 1)(c - 1)$$
Conclusion: as in the previous test (with the respective d.f.)
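For a general r x c table the expected frequencies and the test statistic can be computed as in the following Python sketch. The function name and the example table are assumptions for illustration.

```python
def chi_square_independence(table):
    """Return (chi-square, d.f.) for an r x c table of observed frequencies."""
    row_totals = [sum(row) for row in table]          # R_i
    col_totals = [sum(col) for col in zip(*table)]    # C_j
    n = sum(row_totals)                               # grand total N
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E = R_i * C_j / N
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical 2 x 2 table of observed frequencies.
chi2, df = chi_square_independence([[20, 30], [30, 20]])
print(chi2, df)
```

For this 2 x 2 table every expected frequency is 25 and the result, 4.0 with 1 d.f., agrees with the shortcut formula N(ad - bc)^2 / ((a + b)(c + d)(a + c)(b + d)).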
F-TEST or VARIANCE RATIO TEST
Suppose we are interested in testing whether two normal
populations have the same variance or not. Let $x_1, x_2, x_3, \ldots, x_{n_1}$ be a random
sample of size $n_1$ from the first population with variance $\sigma_1^2$ and
$y_1, y_2, y_3, \ldots, y_{n_2}$ be a random sample of size $n_2$ from the second population
with variance $\sigma_2^2$. Obviously the two samples are independent.
Null hypothesis:
$H_0: \sigma_1^2 = \sigma_2^2 = \sigma^2$, i.e. the population variances are the same. In other
words, $H_0$ is that the two independent estimates of the common population
variance do not differ significantly.
The F statistic is given by
$$F = \frac{\text{Estimated variance of first sample}}{\text{Estimated variance of second sample}} = \frac{S_1^2}{S_2^2} \quad \text{with d.f.} = (n_1 - 1), (n_2 - 1)$$
$S_1^2$ and $S_2^2$ are the variances of the samples of sizes $n_1$, $n_2$ respectively.
It should be noted that the numerator is always greater than the
denominator in the F ratio:
$$F = \frac{\text{Larger variance}}{\text{Smaller variance}}$$
Conclusion:
i. If cal F < tab F for d.f. = $(n_1 - 1), (n_2 - 1)$ at 5% level, F is not significant.
$H_0$ is accepted.
ii. If cal F > tab F for d.f. = $(n_1 - 1), (n_2 - 1)$ at 5% level, F is significant,
F = ( )*. $H_0$ is rejected.
iii. If cal F > tab F for d.f. = $(n_1 - 1), (n_2 - 1)$ at 1% level, F is highly
significant, F = ( )**. $H_0$ is rejected.
Since the F test is based on the ratio of variances it is also known as the
variance ratio test. The ratio of two variances follows a distribution called
the F distribution, named after the famous statistician R.A. Fisher.
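A minimal Python sketch of the variance ratio follows. The function name and the two samples are invented for illustration; the sketch assumes the estimated variances use the divisor n - 1 (as `statistics.variance` does) and it orders the degrees of freedom to match whichever sample supplies the larger variance.

```python
import statistics


def variance_ratio(x, y):
    """Return (F, (numerator d.f., denominator d.f.)) with larger variance on top."""
    var_x = statistics.variance(x)  # estimated variance, divisor n - 1
    var_y = statistics.variance(y)
    if var_x >= var_y:
        return var_x / var_y, (len(x) - 1, len(y) - 1)
    return var_y / var_x, (len(y) - 1, len(x) - 1)


# Hypothetical samples, for illustration only.
f_value, df = variance_ratio([1, 2, 3, 4], [2, 4, 6, 8])
print(f_value, df)
```

The calculated F is then compared with the table value of F for the two degrees of freedom at the 5% and 1% levels.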
ANALYSIS OF VARIANCE
The analysis of variance is a powerful statistical tool for tests of
significance. The term Analysis of Variance was introduced by Prof. R.A.
Fisher to deal with problems in agricultural research. The test of significance
based on the t-distribution is an adequate procedure only for testing the
significance of the difference between two sample means. In a situation
where we have three or more samples to consider at a time, an alternative
procedure is needed for testing the hypothesis that all the samples are drawn
from the same population, i.e. they have the same mean. For example, five
fertilizers are applied to four plots each of wheat and the yield of wheat on each
of the plots is given. We may be interested in finding out whether the effect of
these fertilizers on the yields is significantly different or, in other words,
whether the samples have come from the same normal population. The
answer to this problem is provided by the technique of analysis of variance.
Thus the basic purpose of the analysis of variance is to test the homogeneity of
several means.
Variation is inherent in nature. The total variation in any set of
numerical data is due to a number of causes which may be classified as:
i. Assignable causes and
ii. Chance causes
The variation due to assignable causes can be detected and measured,
whereas the variation due to chance causes is beyond the control of the human
hand and cannot be traced separately.
Definition
According to R.A. Fisher, Analysis of Variance (ANOVA) is the
"separation of variance ascribable to one group of causes from the variance
ascribable to other group". By this technique the total variation in the sample
data is expressed as the sum of its non-negative components, where each of
these components is a measure of the variation due to some specific
independent source or factor or cause.
Assumptions
For the validity of the F-test in ANOVA the following assumptions are
made.
i. The observations are independent.
ii. The parent population from which the observations are taken is normal.
iii. Various treatment and environmental effects are additive in nature.
DESIGN OF EXPERIMENTS
Introduction
Designing an experiment means planning an experiment so that the
information collected will be relevant to the problem under investigation.
The design of an experiment is the complete sequence of steps taken
before experimenting to ensure that the appropriate data will be obtained in
a way that furnishes an objective analysis leading to valid inferences with
respect to the stated problem. The purpose of any experimental design is
to provide maximum information relevant to the problem under
investigation.
Definition
Treatment: What we apply on the subject of investigation is called the
treatment, e.g. application of feed to animals, application of fertilizer to an
agricultural plot, etc.
Experimental material and experimental unit: The individual or group of
individuals that will be subjected to a treatment is called the experimental
unit and the collection of such units is the experimental material.
Response: Outcome of an experiment, i.e. the treatment effect available from
the experimental units.
Experimental error: It is the unit-to-unit variation within the same
treatment group. This is a measure of variation due to uncontrollable causes.
It describes the failure of two identically treated experimental units to yield
identical results.
Basic Principles of Experimental Designs
1. Randomization
2. Replication
3. Local Control
1. Randomization: It is a device for eliminating bias, i.e. by randomly
assigning treatments to the experimental units we avoid personal bias. It
involves giving equal chances to all the experimental units to be subjected
to different treatments. It can be done by the use of a random number table
or by drawing lots. The randomization procedure is different for different
designs.
Purposes served by randomization:
o It avoids personal bias
o It makes the test valid
2. Replication: By replication we mean the number of experimental units
receiving a particular treatment. If an experiment has equal replication
for all the treatments studied, then the design is called an "equi-replicated
design". If a design has unequal replication for different treatments
then it is known as a design with unequal replication.
Purposes served by replication:
o It provides an estimate of experimental error
o It enables us to obtain a more precise estimate of the mean
effect of any factor, since precision is $1/\text{S.E.}$ and
$\text{S.E.} = \text{S.D.}/\sqrt{n}$.
As n increases, precision increases. The more the replication,
the more the precision will be.
3. Local control: It refers to the balancing, blocking and grouping of the
experimental units that is employed in the experiment. It refers to the
skillful, logical way of grouping the experimental units in such a manner
that there is more uniformity within the same group and there is greater
variability between different groups.
Purposes served by local control:
o To make the experimental design more efficient
o To make the test procedure more powerful
Criteria for making blocks
Characters that have influence on the response value before the
commencement of the experiment are considered as the criteria for making
blocks. For example, in milk yield studies, the stage of lactation can be considered
as a criterion for making blocks, and in a weight gain study we can consider the
initial weight or initial age or breed or sex as criteria for making blocks.
In the completely randomized design (CRD), there is no local control
applied, while in the randomized block design (RBD) local control is applied in
one direction with one criterion and in the Latin square design (LSD) in two
directions with two criteria.
Besides these three principles of experimental designs, there are the
auxiliary variable and the control that should be considered while
experimenting.
Auxiliary variable: Some characters are not altered by the treatments
applied but may have influence on the characters under study. Such
characters can be utilised to improve the precision of the estimate and the
efficiency of designs. They are so chosen that the collection of information
about them does not involve cost and labour.
Control: When no treatment is applied over a group of experimental units we
consider these units to constitute a control group. The purpose of this control
is to make effective comparison. Whenever an experiment is conducted to
make a recommendation of a new treatment, it is better to include a control
group as one of the treatment groups.
COMPLETELY RANDOMISED DESIGN (CRD)
This is the simplest of all experimental designs. It is the design in which
the treatments are assigned completely at random to the experimental units,
or vice versa; i.e. it imposes no restrictions on the allocation of
treatments to the experimental units. CRD is preferred when all the
experimental units considered for the experiment are known to be
homogeneous. Any number of experimental units and treatments can be used in
this design. As the design is highly flexible and simple, the CRD is widely
used. Analysis is simple, even if certain values are missing. The
experimental units are allotted to the different treatments by using a
random number table or by the lottery method.
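The random allotment described above can be sketched in Python. This is only an illustration under stated assumptions: the helper name `allocate_crd`, the unit numbers, the ration labels and the seed are all made up for the example, and a pseudo-random shuffle stands in for the random number table or the lottery.

```python
import random


def allocate_crd(units, treatments, seed=None):
    """Assign units to treatments completely at random (hypothetical helper)."""
    rng = random.Random(seed)   # seeded generator stands in for the lottery draw
    shuffled = list(units)
    rng.shuffle(shuffled)       # every ordering of the units is equally likely
    allocation = {t: [] for t in treatments}
    # Deal the shuffled units out to the treatments in turn.
    for position, unit in enumerate(shuffled):
        allocation[treatments[position % len(treatments)]].append(unit)
    return allocation


# Twelve animals (numbered 1 to 12) allotted to three rations A, B and C.
plan = allocate_crd(range(1, 13), ["A", "B", "C"], seed=1)
for ration, animals in plan.items():
    print(ration, sorted(animals))
```

Dealing the shuffled units out in turn gives equal group sizes here; the CRD itself does not require equal sizes, so this is a convenience of the sketch rather than a rule of the design.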
Collection and analysis of data
After the experimental units have been randomized over the different
treatments, initial recordings (if any) are noted against each experimental
unit. For example, in a weight gain study, initial weights are to be
recorded. Then the experimental units are subjected to their respective
treatments and, after the experimental period, the response values are
observed. The data are tabulated as follows.
Data of response values
Let us suppose that $N$ observations $x_{ij}$ $(i = 1, 2, \ldots, k;\ j = 1, 2, \ldots, n_i)$ of
a random variable $X$ are grouped, on some basis, into $k$ classes of sizes
$n_1, n_2, \ldots, n_k$ respectively, where
$$N = \sum_{i=1}^{k} n_i,$$
as exhibited below.
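$$
\begin{array}{cccc|c|c}
 & & & & \text{Total} & \text{Mean} \\
\hline
x_{11} & x_{12} & \cdots & x_{1n_1} & T_{1.} & \bar{x}_{1.} \\
x_{21} & x_{22} & \cdots & x_{2n_2} & T_{2.} & \bar{x}_{2.} \\
\vdots & \vdots & & \vdots & \vdots & \vdots \\
x_{i1} & x_{i2} & \cdots & x_{in_i} & T_{i.} & \bar{x}_{i.} \\
\vdots & \vdots & & \vdots & \vdots & \vdots \\
x_{k1} & x_{k2} & \cdots & x_{kn_k} & T_{k.} & \bar{x}_{k.} \\
\hline
 & & & & G &
\end{array}
$$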
Let us consider the effect of $k$ different rations on the milk yield of $N$
cows (of the same breed and stock) divided into $k$ classes of sizes
$n_1, n_2, \ldots, n_k$ respectively, where $N = \sum_{i=1}^{k} n_i$.
Hence the sources of variation are
i. Effect of the rations
ii. Error due to chance, produced by numerous causes that are not
detected and identified.
Stepwise procedure
Step 1: $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ (There is no significant difference between
the treatments)
Alternative hypothesis $H_1$: not all the $\mu_i$ are equal $(i = 1, 2, \ldots, k)$
Step 2: Test statistic
The various sums of squares are obtained as follows.
a) Find the sum of the values of all the $N$ items of the given data. Let this
grand total be represented by $G$. Then the correction factor is
$$\text{C.F.} = \frac{G^2}{N}$$
b) Find the sum of squares of all the individual items $x_{ij}$; then the
total sum of squares (TSS) is
$$\text{TSS} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij}^2 - \text{C.F.}$$
c) Find the squares of all the class totals (or treatment totals) $T_{i.}$
$(i = 1, 2, \ldots, k)$, each divided by its number of observations; then the
sum of squares between the classes or between the treatments (SST) is
$$\text{SST} = \sum_{i=1}^{k} \frac{T_{i.}^2}{n_i} - \text{C.F.}$$
where $n_i$ $(i = 1, 2, \ldots, k)$ is the number of observations in the $i$th class, i.e.
the number of observations receiving the $i$th treatment.
d) Find the sum of squares within the classes, or sum of squares due to
error (SSE), by subtraction:
$$\text{SSE} = \text{TSS} - \text{SST}$$
Step 3: Degrees of freedom (d.f.):
The degrees of freedom for the total sum of squares (TSS) are $(N - 1)$. The
degrees of freedom for SST are $(k - 1)$ and the degrees of freedom for SSE
are $(N - k)$.
Step 4: Mean sum of squares:
The mean sum of squares for treatments is $\text{MST} = \dfrac{\text{SST}}{k - 1}$ and the mean sum of
squares for error is $\text{MSE} = \dfrac{\text{SSE}}{N - k}$.
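Steps 2 to 4 for the CRD can be traced numerically with a short Python sketch. The function name `crd_anova` and the milk yield figures are invented for illustration; the arithmetic follows the correction factor, TSS, SST, SSE and mean squares as defined in the steps above, and unequal class sizes are allowed.

```python
def crd_anova(groups):
    """Sums of squares and mean squares for a CRD (one list of values per treatment)."""
    k = len(groups)                           # number of treatments (classes)
    n_total = sum(len(g) for g in groups)     # N
    grand_total = sum(sum(g) for g in groups) # G
    cf = grand_total ** 2 / n_total           # correction factor G^2 / N
    tss = sum(x * x for g in groups for x in g) - cf
    sst = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    sse = tss - sst                           # error SS by subtraction
    return {
        "CF": cf, "TSS": tss, "SST": sst, "SSE": sse,
        "df_total": n_total - 1, "df_treat": k - 1, "df_error": n_total - k,
        "MST": sst / (k - 1), "MSE": sse / (n_total - k),
    }


# Made-up yields for three rations with 3, 4 and 3 cows respectively.
result = crd_anova([[10, 12, 11], [14, 15, 16, 15], [9, 8, 10]])
for name, value in result.items():
    print(name, round(value, 4))
```

For these figures G = 120 and N = 10, so C.F. = 1440, TSS = 72, SST = 66 and SSE = 6, which can be checked by hand against the formulas.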
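$$
\begin{array}{c|cccccc|c|c}
 & 1 & 2 & \cdots & j & \cdots & b & \text{Total} & \text{Mean} \\
\hline
1 & x_{11} & x_{12} & \cdots & x_{1j} & \cdots & x_{1b} & T_{1.} & \bar{x}_{1.} \\
2 & x_{21} & x_{22} & \cdots & x_{2j} & \cdots & x_{2b} & T_{2.} & \bar{x}_{2.} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots & \vdots \\
i & x_{i1} & x_{i2} & \cdots & x_{ij} & \cdots & x_{ib} & T_{i.} & \bar{x}_{i.} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots & \vdots \\
k & x_{k1} & x_{k2} & \cdots & x_{kj} & \cdots & x_{kb} & T_{k.} & \bar{x}_{k.} \\
\hline
\text{Total} & T_{.1} & T_{.2} & \cdots & T_{.j} & \cdots & T_{.b} & G & \\
\text{Mean} & \bar{x}_{.1} & \bar{x}_{.2} & \cdots & \bar{x}_{.j} & \cdots & \bar{x}_{.b} & &
\end{array}
$$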
Let the suffix $i$ refer to the treatments (rations) and $j$ refer to the
varieties (breed of the cow); then the milk yields $x_{ij}$
$(i = 1, 2, \ldots, k;\ j = 1, 2, \ldots, b)$ of $N = b \times k$ cows furnish the data for the
comparison of the treatments (rations).
The total variation in the observations $x_{ij}$ can be split into the following
three components:
i. The variation between the treatments (rations)
ii. The variation between the varieties (breed and stock)
iii. The inherent variation within the observations of treatments and
within the observations of varieties.
The first two types of variation are due to assignable causes, which can be
detected and controlled by human endeavour, and the third type of variation
is due to chance causes, which are beyond human control.
Stepwise procedure
Step 1: $H_0: \mu_{1.} = \mu_{2.} = \cdots = \mu_{k.}$ (There is no significant difference between
the treatments)
$H_0: \mu_{.1} = \mu_{.2} = \cdots = \mu_{.b}$ (There is no significant difference between
the blocks or varieties (breed and stock))
Step 2: Test statistic
The various sums of squares are obtained as follows.
a) Find the sum of the values of all the $N = k \times b$ items of the given data.
Let this grand total be represented by $G$. Then the correction factor is
$$\text{C.F.} = \frac{G^2}{N}$$
b) Find the sum of squares of all the individual items $x_{ij}$; then the
total sum of squares (TSS) is
$$\text{TSS} = \sum_{i=1}^{k} \sum_{j=1}^{b} x_{ij}^2 - \text{C.F.}$$
c) Find the sum of squares of all the treatment (ration) totals, i.e., the sum
of squares of the row totals in the $k \times b$ two-way table. Then the sum of
squares between treatments, or sum of squares between rows, is
$$\text{SST} = \sum_{i=1}^{k} \frac{T_{i.}^2}{b} - \text{C.F.}$$
where $b$ is the number of observations in each row.
d) Find the sum of squares of all the block (variety) totals, i.e., the sum of
squares of the column totals in the $k \times b$ two-way table. Then the sum of
squares between varieties or blocks, or sum of squares between columns, is
$$\text{SSV or SSB} = \sum_{j=1}^{b} \frac{T_{.j}^2}{k} - \text{C.F.}$$
where $k$ is the number of treatments in each column.
e) Find the sum of squares due to error (SSE) by subtraction:
$$\text{SSE} = \text{TSS} - \text{SST} - \text{SSB}$$
Step 3: Degrees of freedom (d.f.):
i. The degrees of freedom for the total sum of squares (TSS) are $(N - 1) = (bk - 1)$.
ii. The degrees of freedom for the sum of squares between treatments are $k - 1$.
iii. The degrees of freedom for the sum of squares between blocks or varieties
are $b - 1$.
iv. The degrees of freedom for the error sum of squares are $(k - 1)(b - 1)$.
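Step 4: Mean sum of squares:
i. The mean sum of squares for treatments is $\text{MST} = \dfrac{\text{SST}}{k - 1}$
$$
\begin{array}{l|c|c|c|c}
\text{Source of variation} & \text{d.f.} & \text{S.S.} & \text{M.S.S.} & F \\
\hline
\text{Between treatments} & k - 1 & \text{SST} & \text{MST} = \dfrac{\text{SST}}{k - 1} & F_T = \dfrac{\text{MST}}{\text{MSE}} \\
\text{Between blocks} & b - 1 & \text{SSB} & \text{MSB} = \dfrac{\text{SSB}}{b - 1} & F_B = \dfrac{\text{MSB}}{\text{MSE}} \\
\text{Error} & (b - 1)(k - 1) & \text{SSE} & \text{MSE} = \dfrac{\text{SSE}}{(b - 1)(k - 1)} & \\
\hline
\text{Total} & N - 1 & \text{TSS} & &
\end{array}
$$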
Calculation of variance ratio:
The variance ratio $F$ is the ratio of the greater variance to the smaller
variance; thus
$$F_T = \frac{\text{variance between the treatments}}{\text{variance within the treatments (error variance)}} = \frac{\text{MST}}{\text{MSE}}$$
$$F_B = \frac{\text{variance between the blocks}}{\text{variance within the treatments (error variance)}} = \frac{\text{MSB}}{\text{MSE}}$$
If the variance within the treatments is more than the variance between
the treatments (or blocks), then the numerator and denominator should be
interchanged and the degrees of freedom adjusted accordingly.
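The RBD computations from the correction factor through the two variance ratios can be sketched as follows. The function name `rbd_anova` and the 3 × 4 table of yields are assumptions made for illustration; rows are treatments and columns are blocks, as in the two-way table above.

```python
def rbd_anova(table):
    """ANOVA quantities for an RBD; table[i][j] = treatment i (row), block j (column)."""
    k = len(table)              # number of treatments
    b = len(table[0])           # number of blocks
    n = k * b
    grand_total = sum(sum(row) for row in table)
    cf = grand_total ** 2 / n
    tss = sum(x * x for row in table for x in row) - cf
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(b)]
    sst = sum(t * t for t in row_totals) / b - cf   # between treatments
    ssb = sum(t * t for t in col_totals) / k - cf   # between blocks
    sse = tss - sst - ssb                           # error, by subtraction
    mst = sst / (k - 1)
    msb = ssb / (b - 1)
    mse = sse / ((k - 1) * (b - 1))
    return {"TSS": tss, "SST": sst, "SSB": ssb, "SSE": sse,
            "MST": mst, "MSB": msb, "MSE": mse,
            "F_T": mst / mse, "F_B": msb / mse}


# Made-up milk yields: 3 rations (rows) tried in each of 4 blocks (columns).
yields = [
    [10, 12, 11, 13],
    [14, 15, 13, 16],
    [9, 11, 10, 12],
]
anova = rbd_anova(yields)
for name, value in anova.items():
    print(name, round(value, 4))
```

The calculated F values are then compared with the table values of F for the appropriate degrees of freedom; the sketch deliberately stops short of that lookup.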
Step 6: Critical value of F or table value of F:
i. The critical value of $F$ (table value of $F$) for between treatments is
obtained from the F table for $\{k - 1,\ (k - 1)(b - 1)\}$ d.f. at the 5% (1%) level of
significance.
ii. The critical value of $F$ (table value of $F$) for between blocks is
obtained from the F table for $\{b - 1,\ (k - 1)(b - 1)\}$ d.f. at the 5% (1%) level of
significance.
Step 7: Interpretation
Between Treatments:
i. If calculated $F_T$ < table value of $F$ for $(k - 1), (k - 1)(b - 1)$ d.f. at the 5% level, $F$
is not significant. $H_0$ is accepted. All the treatments are alike.
ii. If calculated $F_T$ > table value of $F$ for $(k - 1), (k - 1)(b - 1)$ d.f. at the 5% level, $F$ is
significant, F = ( )*. $H_0$ for treatments is rejected.
iii. If calculated $F_T$ > table value of $F$ for $(k - 1), (k - 1)(b - 1)$ d.f. at the 1% level, $F$ is highly
significant, F = ( )**. $H_0$ for treatments is rejected.
Between Blocks:
i. If calculated $F_B$ < table value of $F$ for $(b - 1), (k - 1)(b - 1)$ d.f. at the 5% level, $F$
is not significant. $H_0$ is accepted. All the blocks are alike.
ii. If calculated $F_B$ > table value of $F$ for $(b - 1), (k - 1)(b - 1)$ d.f. at the 5% level, $F$ is
significant, F = ( )*. $H_0$ for blocks is rejected.
iii. If calculated $F_B$ > table value of $F$ for $(b - 1), (k - 1)(b - 1)$ d.f. at the 1% level, $F$ is highly
significant, F = ( )**. $H_0$ for blocks is rejected.
Step 8: Critical difference
If $F$ is significant or highly significant, the critical difference between
treatment means is to be worked out. The critical difference between any two
treatment means is defined as the least significant difference between them:
the difference between two treatment means must exceed it for them to be
declared significantly different.
Critical difference between any two treatment means at the 5% level
= Standard error of the difference between the treatment means × table
value of $t$ for error d.f. at the 5% level.
Critical difference between any two treatment means at the 1% level
= Standard error of the difference between the treatment means × table
value of $t$ for error d.f. at the 1% level.
In the last two cases we have to calculate the critical difference between
any two treatment means at the 5% (1%) level:
$$\text{C.D.} = \sqrt{\frac{2\,\text{MSE}}{b}} \times \text{table value of } t \text{ for error d.f. at 5\% (1\%)}$$
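The critical difference formula can be applied as in the sketch below. All numbers are hypothetical: an error mean square of 0.5, b = 4 replications, and treatment means for three rations. The t value 2.447 is the two-sided 5% table value for 6 error d.f. and is passed in by hand, since the sketch assumes the value is read from a printed t table.

```python
import math
from itertools import combinations


def critical_difference(mse, replications, t_table):
    """C.D. = sqrt(2 * MSE / b) * table value of t for the error d.f."""
    return math.sqrt(2 * mse / replications) * t_table


mse = 0.5          # hypothetical error mean square
b = 4              # replications (blocks) per treatment
t_5pct = 2.447     # table t for 6 error d.f. at the 5% level (two-sided)

cd = critical_difference(mse, b, t_5pct)

# Hypothetical treatment means; a pair differs significantly if the gap exceeds C.D.
means = {"A": 11.5, "B": 14.5, "C": 10.5}
significant = {
    (p, q): abs(means[p] - means[q]) > cd
    for p, q in combinations(means, 2)
}
print(round(cd, 4), significant)
```

With these figures the standard error of the difference is 0.5, so C.D. is about 1.22: rations A and C (difference 1.0) are not declared different, while B differs from both.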
Step 9:
After this, write the treatment means $\bar{x}_{1.}, \bar{x}_{2.}, \ldots, \bar{x}_{k.}$ in array form.
A bar chart representation is then made to show the significant differences
among the treatments.
Advantages of RBD
1. This is a simple design with one local control for more efficient utilization
of the available experimental units. RBD takes into account and
eliminates the assignable source of variation among the experimental
units by grouping the more homogeneous units together.
2. This reduces the experimental error and the test of significance becomes
more efficient.
3. Any number of treatments and any number of replications may be
included, but each treatment should have the same number of replications.
Disadvantages
1. When the data from some experimental units are missing, the missing
plot technique has to be used.
2. If many observations are missing, this design is less convenient than
CRD.
BIOLOGICAL ASSAYS
Purpose of biological assay
Biological assays are methods for the estimation of the nature,
constitution, or potency of a material (or of a process) by means of the
reaction that follows its application to living matter.
Qualitative Assays: These do not present any statistical problems. We shall
not consider them here.
Quantitative Assays: These provide numerical assessment of some property of
the material to be assayed, and pose statistical problems.
Definition: An assay is a form of biological experiment; but the interest lies
in comparing the potencies of treatments on an agreed scale, instead of in
comparing the magnitude of effects of different treatments.
Structure of a Biological Assay
The typical bioassay involves a stimulus (for example, a vitamin, a
drug, a fungicide) applied to a subject (for example, an animal, a piece of
animal tissue, a plant, a bacterial culture). The intensity of the stimulus is
varied by the experimenter by using various doses. Application of the
stimulus is followed by a change in some measurable characteristic of the
subject, the magnitude of the change being dependent upon the dose. A
measurement of this characteristic is the response of the subject: for
example, the weight of the whole subject or of some particular organ, an
analytical value such as blood sugar content or bone ash percentage, or even
a simple record of occurrence or non-occurrence of a certain muscular
contraction, recovery from symptoms of a dietary deficiency, or death.
Types of Bioassays
Three main types (other than qualitative assays) are:
1. DIRECT ASSAYS;
2. INDIRECT ASSAYS based upon quantitative responses;
3. INDIRECT ASSAYS based on quantal responses (all-or-nothing).
Direct Assays
In these assays, the doses of the standard and test preparations that are
sufficient to produce a specified response can be directly measured. The
ratio between these doses estimates the potency of the test preparation
relative to the standard. If $Z_S$ and $Z_T$ are the doses of the standard and
test preparations producing the same effect, then the relative potency is
given by
$$\rho = \frac{Z_S}{Z_T}$$
Thus, in such assays, the response must be clear-cut and easily
recognized, and the exact dose must be measurable without time lag or any
other difficulty.
When the two preparations in an assay contain the same effective ingredient
which produces the response, the assay is called an ANALYTICAL DILUTION
ASSAY. An assay with two preparations which have a common effect but do
not contain the same effective ingredient is called a COMPARATIVE
DILUTION ASSAY.
In direct assays, doses of the standard and test preparations are
administered to randomly selected identical subjects. The administration of
the stimulus is stopped as soon as the preassigned response has occurred. In
direct assays the tolerance doses, the doses below which no response occurs
for the standard and test preparations, are measured directly as soon as the
response has occurred.
The tolerance will generally vary considerably from subject to subject.
Hence, a number of trials are required to estimate the average tolerance. For
obtaining the needed data, the common designs of experiments are used.
Once the data are obtained, the averages of the tolerances, $\bar{x}_S$ and $\bar{x}_T$, for the
standard and test preparations respectively are calculated. The estimate of
the relative potency is then obtained as
$$R = \frac{\bar{x}_S}{\bar{x}_T}$$
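As a small numerical illustration of this estimate, the sketch below averages made-up tolerance doses for the two preparations and takes their ratio. The function name `relative_potency` and the data are assumptions for the example only.

```python
from statistics import mean


def relative_potency(tolerances_standard, tolerances_test):
    """R = mean tolerance of the standard / mean tolerance of the test preparation."""
    return mean(tolerances_standard) / mean(tolerances_test)


# Hypothetical tolerance doses (same units) measured on four subjects each.
standard = [2.0, 2.4, 2.2, 2.6]
test_prep = [1.1, 1.2, 1.0, 1.3]

R = relative_potency(standard, test_prep)
print(round(R, 3))
```

Here the test preparation needs about half the dose of the standard to produce the response, so it is estimated to be about twice as potent.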
Fiducial limits for the estimate of relative potency can be calculated
using a theorem called FIELLER'S THEOREM.
Indirect Assays Based on Quantitative Responses
In this type of assay, specified doses of the stimuli are given and the
responses are recorded. The responses may be a change in weight, a change in
analytical value, time of survival of the subject and the like. The relationship
between the dose and the response, drawn as a frequency curve, is known as the
DOSE RESPONSE CURVE. This curve, which is usually sigmoidal, is first
ascertained. Next, the dose corresponding to a given response is
obtained from the relation. Such results are obtained for both the standard
and test preparations. Finally, the relative potency is estimated.
Indirect assays based on quantitative responses are of two types:
(1) parallel line assays, and (2) slope ratio assays.
Parallel line assays
Parallel line assays are those in which the relationship between
the quantitative response and the log dose is linear. The lines for the
standard and test preparations must then be parallel.
A parallel line assay in which the standard and test preparations have
an equal number of doses and an equal number of subjects for each dose is
called a symmetrical parallel line assay. Otherwise, it is called an
asymmetrical parallel line assay.
In a symmetrical parallel line assay there are $k$ doses of each of the
standard and test preparations, so in all there are $2k$ doses in the assay. To
each of the doses $n$ subjects are allotted at random. Hence, it is called a
$2k$ point symmetrical parallel line assay. The constant $k$ may be 2, 3, 4, etc.
When $k = 2$, we have the 4 point parallel line assay. Similarly, we have the
6 point parallel line assay, the 8 point parallel line assay, etc. Among
these, the 4 point and 6 point assays are used very commonly. The most
popular design is the 4 point assay.
Slope Ratio Assays
When the response is linearly related to the dose $x$ raised to the power $i$,
the assay is known as a slope ratio assay. The value of $i$ is usually taken as
1, which is adequate and very commonly used. The equations for the lines of
the standard and test preparations will be
$$y_S = \alpha + \beta_S x_S$$
$$y_T = \alpha + \beta_T x_T$$
The two lines converge at zero dose. The relative potency is, therefore,
estimated from the ratio of the slopes (regression coefficients) of the fitted
lines. Hence the name slope ratio assays.
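The dose response relationships for the two preparations may be
expressed as a single multiple linear equation,
$$y = \alpha + \beta_S x_S + \beta_T x_T$$
in which $x_T = 0$ for subjects given the standard and $x_S = 0$ for subjects
given the test preparation. The estimates of $\beta_S$ and $\beta_T$ are $b_S$ and $b_T$,
respectively. Since a dose $x_T$ of the test preparation acts like a dose
$R\,x_T$ of the standard, $\beta_T = R\,\beta_S$, and the estimate of the potency of the
test preparation relative to the standard is
$$R = \frac{b_T}{b_S}$$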
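A minimal sketch of fitting the common-intercept equation by least squares is given below, in plain Python so that the normal equations are visible. The helper names `solve_linear` and `fit_slope_ratio`, and the assay figures, are assumptions for illustration. The sketch takes the potency of the test preparation relative to the standard as the slope of the test line divided by the slope of the standard line, which is the usual orientation for slope ratio assays.

```python
def solve_linear(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution


def fit_slope_ratio(standard, test, blanks=()):
    """Fit y = a + b_S*x_S + b_T*x_T; inputs are (dose, response) pairs."""
    rows = [(1.0, x, 0.0, y) for x, y in standard]    # x_T = 0 for the standard
    rows += [(1.0, 0.0, x, y) for x, y in test]       # x_S = 0 for the test
    rows += [(1.0, 0.0, 0.0, y) for y in blanks]      # zero-dose responses
    # Normal equations (X'X) beta = X'y for the three coefficients.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * r[3] for r in rows) for i in range(3)]
    a, b_s, b_t = solve_linear(xtx, xty)
    return a, b_s, b_t, b_t / b_s


# Made-up 5 point assay: one blank and two doses of each preparation.
a, b_s, b_t, potency = fit_slope_ratio(
    standard=[(1.0, 3.0), (2.0, 5.0)],
    test=[(1.0, 4.0), (2.0, 7.0)],
    blanks=[1.0],
)
print(round(a, 3), round(b_s, 3), round(b_t, 3), round(potency, 3))
```

The invented responses lie exactly on lines with intercept 1 and slopes 2 and 3, so the fit recovers those values and a potency ratio of 1.5.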
As in the case of parallel line assays, the symmetrical designs are ideal
for the slope ratio assays. Hence, we shall consider here only assays with
$(2k + 1)$ doses. We may have 3 point, 5 point, 7 point, etc., assays. The
5 point assay is the most efficient design.
Indirect Assays Based on Quantal Responses
In many biological assays the responses are qualitative in nature; for
example, in the assay of insecticides the response is mortality of insects.
Such qualitative responses are also known as all-or-none responses. The
assays in which the responses are qualitative are known as QUANTAL RESPONSE
ASSAYS.
In the quantal response assay, the subject is given a predetermined
dose of the preparation under test and is observed to see whether or not a
specified response occurs. Thus, quantal response assays are closely related
to direct assays. In this type of assay, the strength of a preparation is
characterized by the median tolerance, i.e. the dose that induces a response
in 50% of the subjects. If the response is mortality it is called the median
lethal dose and is denoted by $LD_{50}$. If the response is not mortality, it may
be called the median effective dose ($ED_{50}$), median knock down dose
($KD_{50}$), median antifeeding dose ($AD_{50}$) and the like. The most commonly
used measurement is $LD_{50}$. The ratio $LD_{50}/ED_{50}$ is called the
Therapeutic Index.
Methods of Estimating $LD_{50}$
When the number of subjects used is relatively small and the doses
are fairly close, $LD_{50}$ may be estimated using the Dragstedt-Behrens method.
The estimate is given by
$$\log LD_{50} = \log A + \left[\frac{50 - a}{b - a}\right] \log 2$$
where
$A$ = dose corresponding to the percentage mortality immediately
below 50% mortality,
$a$ = observed mortality (%) immediately below 50% mortality, and
$b$ = observed mortality (%) immediately above 50% mortality.
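The formula can be evaluated directly, as in this sketch. The function name `ld50_interpolated` and the figures are made up for illustration. The factor log 2 is read here as the log of the ratio between successive doses, which assumes that each dose is double the previous one; that reading is an interpretation, not something the formula states.

```python
import math


def ld50_interpolated(dose_below, pct_below, pct_above):
    """log LD50 = log A + [(50 - a) / (b - a)] * log 2, as given in the text."""
    fraction = (50 - pct_below) / (pct_above - pct_below)
    log_ld50 = math.log10(dose_below) + fraction * math.log10(2)
    return 10 ** log_ld50


# Hypothetical series: 30% mortality at a dose of 4 units, 70% at the next dose.
ld50 = ld50_interpolated(dose_below=4.0, pct_below=30.0, pct_above=70.0)
print(round(ld50, 3))
```

Since 50% lies halfway between 30% and 70%, the estimate falls halfway between the two doses on the log scale, at 4 × √2, about 5.66 units.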
Spearman-Karber Method
Another simple method of estimation of $LD_{50}$ is the Spearman-Karber
method. It is used when the log doses are equally spaced. Suppose that the
log doses are denoted by $x_1, x_2, \ldots, x_k$. If the log doses are equispaced, then
$$x_{i+1} - x_i = d \quad \text{for all } i$$
The $LD_{50}$ is estimated as
$$\log LD_{50} = x_k + \frac{d}{2} - d \sum_{i=1}^{k} p_i$$
where
$x_k$ = the highest log dose,
$p_i$ = proportion of response for the $i$th dose.
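A short sketch of the Spearman-Karber calculation follows. The function name `spearman_karber`, the equal-spacing check and the data are assumptions for the example; the result is on whatever log scale the doses are expressed in.

```python
def spearman_karber(log_doses, proportions, tol=1e-9):
    """log LD50 = x_k + d/2 - d * sum(p_i) for equally spaced log doses."""
    if len(log_doses) != len(proportions) or len(log_doses) < 2:
        raise ValueError("need one proportion per log dose, and at least two doses")
    d = log_doses[1] - log_doses[0]
    spacings = [later - earlier for earlier, later in zip(log_doses, log_doses[1:])]
    if any(abs(s - d) > tol for s in spacings):
        raise ValueError("log doses must be equally spaced")
    return log_doses[-1] + d / 2 - d * sum(proportions)


# Hypothetical assay: five equally spaced log doses and the proportion responding.
log_doses = [1.0, 2.0, 3.0, 4.0, 5.0]
proportions = [0.0, 0.2, 0.5, 0.8, 1.0]

log_ld50 = spearman_karber(log_doses, proportions)
print(log_ld50)
```

The invented proportions are symmetric about the middle dose, so the estimate comes out at the middle log dose, 3.0.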
VITAL STATISTICS
Vital statistics are statistics about events happening in our daily life,
e.g. births and deaths. The subject is defined as that branch of biometry
which deals with data and laws of human mortality, morbidity and demography.
Arthur Newsholme gave two definitions of vital statistics:
1. The branch of biometry which deals with data and the laws of human
mortality, morbidity and demography.
2. Vital statistics may be interpreted in two ways: in a broader sense it
refers to all types of population statistics collected by whatever mode,
while in a narrower sense it refers only to the statistics derived from
the registration of births, deaths and marriages.
According to Benjamin:
Vital statistics are conventional numerical records of marriages,
births, sickness and deaths by which the health and growth of a community
may be studied.
Uses of vital statistics
From the above description it is amply clear that vital statistics are
useful to individuals, various agencies, medical sciences, business
communities, planners, policy makers and researchers.
Collection of Vital Statistics
There are five methods usually adopted for collecting data for vital
statistics.
1. Registration method
2. Census enumeration method
3. Survey method
4. Sample registration system
5. Analytical method
Registration Method
This method is a perpetual process which entails the registration of
vital events such as births, marriages, deaths, etc. A large number of
countries have adopted this system, which is legally binding on each
individual. Registration is done with the proper authorities as appointed by
the government of a country. In India, registration of births and deaths is
compulsory by legislation through an act known as The Registration of
Births and Deaths Act, 1969. The registration method is easy in operation
and very effective. Yet it suffers from the lacuna that a large number of
births and deaths are not reported to the registration office, as the law has
not been enforced effectively.
Census Enumeration Method
A census presents a comprehensive profile of a country's population.
Census operations are conducted in almost all countries at intervals of ten
years. In a census, the enumeration of every individual of all habitational
areas is carried out at a specific time. The information available is for the
census year only. Hence the data fail to provide vital statistics for the
other years.
Survey Method
Ad hoc surveys are conducted in areas or regions where the system of
recording births and deaths has not been functioning properly and
efficiently. The survey records make vital statistics available for that region.
Sample Registration System (SRS)
In organizing the post-enumeration check of a census, the continuing
sample registration of births and deaths in a reasonably large sample
covering all parts of the state or country has been taken as a useful source
for recording information. In India, the sample registration system has been
in operation since June 1967 in rural areas and since July 1968 in urban areas.
Under this scheme census blocks are selected by a random sampling method
in rural and urban areas separately, as sampling units. The sample covers
less than one percent of the rural and urban population. The sample
registration system is a continuous process for estimating the vital rates and
is very effective despite certain lacunae.
Analytical Methods
It is generally not possible to conduct ad hoc surveys to assess the
population at any period in between two census years. The population
estimates at a given time can be obtained without ad hoc surveys by
mathematical methods. The estimates are based on the assumption that the
population grows at a constant rate during the inter-censal years.
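One way to make the constant-rate assumption concrete is sketched below. It assumes geometric growth, i.e. the same proportional increase every year between the two censuses; that choice, the function name `intercensal_estimate` and the population figures are illustrative assumptions, and an arithmetic (straight-line) interpolation would be an equally simple alternative.

```python
def intercensal_estimate(pop_first, pop_second, years_between, years_elapsed):
    """Estimate the population some years after the first census,
    assuming the same proportional growth in every inter-censal year."""
    annual_factor = (pop_second / pop_first) ** (1 / years_between)
    return pop_first * annual_factor ** years_elapsed


# Hypothetical region: 1,000,000 at one census and 1,210,000 ten years later.
estimate = intercensal_estimate(1_000_000, 1_210_000, years_between=10, years_elapsed=5)
print(round(estimate))
```

For these figures the mid-period estimate is about 1,100,000, slightly below the straight-line midpoint of 1,105,000, because geometric growth adds fewer persons in the earlier years.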
Rate of vital event
The rate of a vital event is mostly expressed per thousand (‰) persons.
The fundamental formula is
$$\text{Rate of vital event} = \frac{\text{No. of cases of the event under consideration}}{\text{Total population exposed to the risk of the event}} \times 1000$$
Measures of Mortality
The study of mortality can be made through the following death rates.
Crude Death Rate (C.D.R.): This is the simplest type of death rate and is
defined as the number of deaths in a specific community or region in a given
period, preferably on a yearly basis, per thousand persons. The formula is
$$\text{C.D.R.} = \frac{\text{No. of deaths in a specific area in the given period}}{\text{Total population of the area in that period}} \times 1000$$
or
$$\text{C.D.R.} = \frac{\text{No. of deaths in a year}}{\text{Annual mean population}} \times 1000$$
Specific Death Rate: Here the death rates are calculated for a section of the
population exclusively. For instance, the death rates for males and females
are calculated separately, or the death rates for persons belonging to age
groups, say 0 to 5 years, 5 to 15 years, 50 to 60 years, etc., are calculated.
The formula for the specific death rate is
$$\text{Specific Death Rate} = \frac{\text{No. of deaths in a specified section of the population in the given period}}{\text{Mean population of the specified section in the given period}} \times 1000$$
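The crude and specific death rates share the same arithmetic, as the sketch below shows. The helper name `rate_per_thousand` and all counts are invented for illustration.

```python
def rate_per_thousand(events, population_at_risk):
    """Number of events per 1000 persons exposed to the risk."""
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    return events / population_at_risk * 1000


# Hypothetical town: 450 deaths in a year, annual mean population 60,000.
cdr = rate_per_thousand(450, 60_000)

# Specific death rates for two sections of the same (made-up) population.
male_rate = rate_per_thousand(260, 31_000)
female_rate = rate_per_thousand(190, 29_000)

print(round(cdr, 2), round(male_rate, 2), round(female_rate, 2))
```

The crude rate here is 7.5 per thousand; the sex-specific rates split the same 450 deaths over the male and female populations and so bracket the crude rate.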
Standardized Death Rate (Sr.D.R.): When it is required to compare the
death rates of two regions or communities, heterogeneity factors have to be
removed, so the population has to be standardized. For this, the populations
in the various categories or groups of one region are taken as standard and
death rates are calculated on the basis of this population classification alone.
The formula is
$$\text{Sr.D.R.} = \frac{\sum_x m_x P_x^s}{\sum_x P_x^s} \times 1000$$
where
$P_x^s$ = standard population for group $x$,
$m_x$ = death rate for group $x$ in the original population, i.e. the S.D.R. for a
region (taken per person, so that the factor 1000 expresses the result per
thousand),
$\sum_x$ = summation over all groups.
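The effect of standardization can be seen in a small sketch. The function name `standardized_death_rate`, the three age groups, the standard population and the group death rates are all hypothetical; the group rates are entered per person so that the factor 1000 converts the result to a rate per thousand.

```python
def standardized_death_rate(death_rates, standard_population):
    """Weighted mean of group death rates (per person), weighted by the
    standard population of each group, expressed per 1000."""
    if len(death_rates) != len(standard_population):
        raise ValueError("need one death rate per population group")
    weighted = sum(m * p for m, p in zip(death_rates, standard_population))
    return weighted / sum(standard_population) * 1000


# Hypothetical standard population in three age groups (young, adult, old).
standard_pop = [20_000, 50_000, 30_000]

# Made-up group death rates (deaths per person per year) for two regions.
region_a = [0.004, 0.002, 0.030]
region_b = [0.005, 0.003, 0.025]

rate_a = standardized_death_rate(region_a, standard_pop)
rate_b = standardized_death_rate(region_b, standard_pop)
print(round(rate_a, 2), round(rate_b, 2))
```

Because both regions are weighted by the same standard population, the difference between 10.8 and 10.0 per thousand reflects the group death rates only, not differences in age structure.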
Measures of Fertility
Fertility is used in relation to the actual production of children or
occurrences of births, especially live births.
Crude birth rate
$$\text{C.B.R.} = \frac{\text{Total number of live births in the given region during a given period}}{\text{Total population of the given region during the given period}} \times 1000$$
General fertility rate
$$\text{G.F.R.} = \frac{\text{No. of births occurring among the population of a given geographic area}}{\text{Mid-year total female population in the reproductive age in the given geographic area}} \times 1000$$
Specific fertility rate
$$\text{S.F.R.} = \frac{\text{No. of births to the female population of the specific section in a given period}}{\text{Mid-year total female population in the specific section}} \times 1000$$
Age specific fertility rate
$$\text{A.S.F.R.} = \frac{\text{No. of births to females in the age group } (x, x + n) \text{ in the given geographic region during a period}}{\text{Total female population of age group } (x, x + n)} \times 1000$$
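The fertility rates differ only in which population forms the denominator, which the following sketch illustrates. The helper name `per_thousand`, the age groups and every count are invented for the example.

```python
def per_thousand(births, population):
    """Births per 1000 of the chosen reference population."""
    return births / population * 1000


# Hypothetical region: live births and mid-year female population by age group.
births_by_age = {"15-19": 100, "20-24": 800, "25-29": 750, "30-34": 450,
                 "35-39": 200, "40-44": 80, "45-49": 20}
women_by_age = {"15-19": 4000, "20-24": 4000, "25-29": 4000, "30-34": 3500,
                "35-39": 3500, "40-44": 3000, "45-49": 3000}
total_population = 100_000

total_births = sum(births_by_age.values())
cbr = per_thousand(total_births, total_population)            # crude birth rate
gfr = per_thousand(total_births, sum(women_by_age.values()))  # general fertility rate
asfr = {age: per_thousand(births_by_age[age], women_by_age[age])
        for age in births_by_age}                             # age specific rates

print(round(cbr, 1), round(gfr, 1))
print({age: round(rate, 1) for age, rate in asfr.items()})
```

With 2,400 births, the crude birth rate is 24 per thousand of the whole population, while the general fertility rate is 96 per thousand women of reproductive age.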
LIFE TABLE
Concept: A life table exhibits the numbers living and dying at each age, on the
basis of the experience of a cohort. It also gives the probabilities of dying
and of living separately. The probability of dying manifests the mortality
rate. The life table is mainly based on the assumption that the cohort
experiences the age specific mortality of the population under consideration.
Vital Index
This is an index which takes into account the two most vital events,
namely, births and deaths. For a specific period and region, the vital index is
given by the formula
$$\text{Vital index} = \frac{\text{Total births}}{\text{Total deaths}}$$
The vital index may be equal to 1, less than 1 or greater than 1. The
value 1 indicates stagnation in population growth. A value greater than 1
indicates an expected increase in population, whereas a value less than
1 is indicative of a decline in the population.
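The index and its reading can be sketched in a few lines. The function names `vital_index` and `interpret` and the counts are assumptions for illustration.

```python
def vital_index(total_births, total_deaths):
    """Ratio of total births to total deaths for a period and region."""
    return total_births / total_deaths


def interpret(index):
    """Read the index against the value 1, as described in the text."""
    if index > 1:
        return "expected increase in population"
    if index < 1:
        return "decline in population"
    return "stagnation in population growth"


# Hypothetical region and year: 2,400 births and 1,500 deaths.
index = vital_index(2_400, 1_500)
print(round(index, 2), "->", interpret(index))
```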