LECTURE NOTE
INTRODUCTION
Definition
Statistics is the science of learning from experience, especially experience that arrives a little bit at a time. This century has seen statistical techniques become the analytic methods of choice in biomedical science, genetic studies, epidemiology, agricultural science and other areas. Statistics implies both statistical data and statistical method. When it means statistical data, it refers to numerical descriptions of quantitative aspects of things. These descriptions could be in the form of counts or measurements. Thus the statistics of the students of a faculty of science include counts of the number of students, such as males and females, married and unmarried, or postgraduates and undergraduates. They may also include such measurements as their height, weight and IQ (Intelligence Quotient). Statistics is broadly divided into descriptive and inferential statistics.
Population, Sample and Model
The data in medical, biomedical, nutritional or agricultural studies are generally based on individual observations. They are observations or measurements taken on the smallest sampling unit. These smallest sampling units are frequently, but not necessarily, also individuals in the biological sense.
Population: The population, or universe, is well defined in the science of statistics. Though the biological definition of the term "population" is the totality of individuals of a given species at a given time and in a given area, a population in statistics always means the totality of the individual observations about which inferences are to be made. A population may refer to variables of a concrete collection of objects or creatures, such as the weights or tail lengths of all albino rats, anthropometric measurements and haemoglobin or serum protein levels of adults, and the nutrient contents of varieties of foods.
Samples: A sample is a part of the population. A large number of samples may be taken from the same population, and still not all members may be covered. Inferences drawn from the sample refer to the defined population from which the sample or samples are drawn. For example, the drop of blood examined in the laboratory is a sample from the "population" of all blood in the body.
Model: Inductive inference is based on the assumption that the values in the population under study are scattered according to a certain pattern. This pattern is modelled by a probability distribution.
For example, "the heights of students in the Faculty of Science, Olabisi Onabanjo University, follow the normal distribution with mean 120 cm and standard deviation 10 cm" is the specification of a probability (or stochastic) model.
Fitting a probability model to the values of a certain population is done by specifying the probability distribution of the underlying random variable. The following are the reasons for fitting a probability model:
a. It may be used to describe the population.
b. It may be used to predict some future value.
c. Usually, probability models are fitted as a first step towards taking one among a set of several possible actions.
d. Sometimes the value of the parameter may be of independent interest.
Data Collection
Data form the bedrock on which statistical analysis relies. Data collection is an activity aimed at getting information to satisfy some decisions or objectives. The process of collecting data varies and depends upon the kind of data to be collected.
Sources of Data
Basically, there are two major sources of data, namely primary and secondary sources.
Primary Sources
This refers to the statistical data or information which the investigator originates himself for the purpose of the enquiry at hand. Examples are censuses, surveys and experiments.
Advantages
i. It allows detailed and accurate information to be collected.
ii. It is more reliable.
iii. The method of data collection and the level of accuracy are known.
Disadvantages
i. Often time consuming.
ii. More expensive.
Secondary Sources
This refers to those statistical data which are not originated by the investigator himself, but which he obtains from someone else's records or from some organization, either in published or unpublished form. Examples include publications of the Federal Office of Statistics (FOS), Central Bank of Nigeria (CBN), National Population Commission (NPC), World Health Organization (WHO), etc.
Advantages
i. Not expensive.
ii. Not time consuming.
iii. Very easy to collect, especially in a computerized organization.
Disadvantages
i. The information may be misleading.
ii. It may not allow detailed and accurate information to be collected.
Method of Data Collection
Three methods of collecting data are:
i. Postal questionnaires
ii. Personal interviews
iii. Telephone interviews
Questionnaires
A questionnaire contains a sequence of questions relevant to the data or information being sought. It is a formal set of prepared questions to be answered by the respondent. Questionnaires usually have two parts. Part one is the classification section; it contains details of the respondent such as sex, age, marital status, occupation, state of origin, etc. The second part relates to the subject matter of the enquiry.
Types of questionnaire
a. Close-End Questionnaire: this is a questionnaire designed in such a way that respondents are limited to stated alternatives or options, thereby not permitting further or additional explanation. It is also called a structured questionnaire.
b. Open-End Questionnaire: this is an unstructured questionnaire design which leaves the respondents free to make whatever reply they choose; that is, the respondents are not in any way restricted to options.
Quality of a Good Questionnaire
1. Questions should be simple and easily understood.
2. They should be in logical sequence.
3. The questionnaire should be short and unambiguous.
4. Questions should not offend, frighten or be leading (tele-guiding). Questions that may arouse the resentment of the respondents should be avoided.
5. Questions should not require calculations to be made.
6. Questions should admit a precise answer such as "Yes" or "No".
7. Questions that rely too much on memory should be avoided, since some people forget events too soon.
Editing
This is a way of checking the answered questionnaires to correct some of the mistakes. The returned questionnaires, filled in by the informants or by enumerators, should be scrutinized at an early stage with a view to detecting errors, omissions and inconsistencies. The work of editing requires skill and scientific impartiality of a high degree. The four types of editing are editing for consistency, uniformity, completeness and accuracy.
Coding
The responses in the edited questionnaires are then translated into numerical terms in order to facilitate analysis. This is done by setting out a list of codes for the possible responses to the questions.
Tabulation and Classification
This is the act of arranging facts and figures in the form of tables or lists. In order to make the data easily understandable, the first task of the statistician is to condense and simplify them in such a manner that irrelevant details are eliminated and their significant features stand out prominently. The procedure adopted for this purpose is known as the method of classification and tabulation.
Data Presentation
This is the representation of data in an appropriate form, through charts, diagrams or graphs, in order to make comparison and understanding easy. No matter how informative and well designed a statistical table is, as a medium for conveying to the reader an immediate and clear impression of its content it is inferior to a good chart, diagram or graph. The most popular charts, diagrams and graphs are pie charts, bar diagrams (bar charts and histograms) and graphs (frequency polygons and ogives).
Pie charts
A pie chart is simply a circle divided into sections. The circle represents the total of the data being presented, and each section is drawn proportional to its relative size. The main advantage of a pie chart is that it is easy to understand.
Example
An investigation of the marital status of the staff of an institution reveals the following:
Marital status    No. of staff
Single            35
Married           130
Widowed           25
Divorced          10
Draw a pie chart using the above information.
Solution
The total number of staff in the institution is
35 + 130 + 25 + 10 = 200
The angle corresponding to each status is found thus:
$$\text{Single} = \frac{35}{200} \times 360^\circ = 63^\circ$$
$$\text{Married} = \frac{130}{200} \times 360^\circ = 234^\circ$$
$$\text{Widowed} = \frac{25}{200} \times 360^\circ = 45^\circ$$
$$\text{Divorced} = \frac{10}{200} \times 360^\circ = 18^\circ$$
Thus, the pie chart is:
[Figure: pie chart, "Total Number of Staff in the Institution". Married 234° (65%), Single 63° (18%), Widowed 45° (13%), Divorced 18° (5%).]
Observation: the chart clearly shows that the majority of the staff in the institution are married.
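The sector angles in this example can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are illustrative and not part of the original notes.

```python
# Staff counts from the marital-status example.
counts = {"Single": 35, "Married": 130, "Widowed": 25, "Divorced": 10}

total = sum(counts.values())

# Each sector's angle is its share of the total, scaled to 360 degrees.
angles = {status: n / total * 360 for status, n in counts.items()}
# The percentage share is the same fraction scaled to 100.
shares = {status: n / total * 100 for status, n in counts.items()}

for status in counts:
    print(f"{status:<9}{angles[status]:7.1f} degrees {shares[status]:6.1f} %")
```

The angles must add up to 360 degrees, which is a quick check on the hand calculation.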
Bar Charts
Bar charts can be simple, multiple or component in nature. A simple bar chart comprises a number of equally spaced rectangles.
A multiple bar chart is usually used in the comparison of two or more attributes.
A component bar chart comprises bars which are subdivided into components.
Example
Represent the data used above in a bar chart.
Solution:
[Figure: bar chart of the number of staff by marital status.]
Example
The sex distribution of staff in five departments of the Faculty of Science is given below:
S/No   Department             Male   Female   Total
i.     Chemical Science       25     15       40
ii.    Mathematical Science   65     30       95
iii.   Biological Science     45     40       85
iv.    Physics                35     15       50
v.     Earth Science          30     10       40
       Total                  200    110      310
Present the above information on a
i.) Multiple Bar Chart
ii.) Component Bar Chart.
Solution:
i.) [Figure: multiple bar chart with separate Male and Female bars for each department.]
ii.) [Figure: component bar chart with one bar per department, subdivided into Male and Female.]
HISTOGRAMS
Histograms and bar charts look alike in presentation, but while the bars of a bar chart are usually not joined, those of a histogram are usually joined. Furthermore, while the bar chart attaches importance only to the heights of its bars, the histogram attaches importance to both the heights and the widths.
Example
Obtain the histogram of the data in the example above.
Solution
[Figure: histogram of the number of staff.]
Descriptive Statistics
Statistics is concerned with variability. It is of interest to know how to describe it, how to measure it, and how to reach sensible conclusions from the results of experiments and comparative studies.
Descriptive statistics deals with the classification of data and the drawing of histograms, diagrams and graphs, such as line graphs, bar graphs and pictograms, that correspond to the frequency distributions that result after the data have been classified. It also includes the computation of sample means, medians and modes, and the computation of ranges, mean absolute deviations and variances.
Variable, Variation and Distribution
The results of an experiment or comparative study can always be presented as a set of measurements on each of a group of units. For example, the units may be animals of a particular species, patients with a particular disease, or families living in a particular housing estate. A general term for any feature of the unit which is observed or measured is a variable. Thus the weight of an animal and the presence or absence of a symptom in a patient are variables.
Variations
There are two main types of variation: variation between units and variation within units. Variation between units is universal in any scientific investigation. Variation within units is seen when observations are made over a period of time. Variation is best described by the relative frequencies with which different observed values occur.
Distribution
The variation between observations is best described by distributions. The way in which the relative frequencies of the observed values of a variable are displayed depends to some extent on the scale on which the variable takes its values. Variables can be on a qualitative scale, consisting of values like red, white or black, or the presence or absence of a disease. Qualitative variables are also called attributes. These variables are not capable of being described numerically. Examples are sex, religion, nationality, and colour of the eye or skin. These characteristics are called "attributes", "attributive variates" or "descriptive characteristics".
The second type of variable takes values on a quantitative scale, for which a comparison of magnitude is involved. Examples of quantitative variables are height, weight, haemoglobin, and the calorie and nutrient content of foods.
Measure of Central Tendency
Classification and tabulation of data are helpful in reducing and understanding the bulk of a large mass of data, but they are descriptive. To be more precise, the data should be expressed in numerical terms. So the need arises to find a constant which will be representative of a group of data. This is a measure of how the data are centrally placed; it is also called a measure of location. There are three possible measures of location, namely the mean, the median and the mode. The mean can itself be divided into three kinds: the arithmetic, geometric and harmonic means. By careful observation of data, it can be noticed that the observations tend to cluster around a central value. This is called the central tendency of the group, and the central value is known as the average.
Essentials of a Good Average
Since an "average" is to represent the statistical data and is also used for purposes of comparison, it must possess the following properties:
i. It must be rigidly defined, and not left to the mere estimation of the observer.
ii. The average must be based on all values given in the distribution.
iii. It should be easily understandable.
iv. It should be capable of being calculated with reasonable ease and rapidity.
v. It should be as little affected as possible by fluctuations of sampling.
vi. It should be such that it lends itself readily to algebraic treatment.
The Arithmetic Mean
The arithmetic mean of a series is obtained by adding the values of all observations and dividing the total by the number of observations. This is what is generally called the average. In symbols, if $x_1, x_2, \ldots, x_n$ are $n$ observed values, then the mean is given by:
$$\bar{X} = \frac{\text{Total of all individual values}}{\text{sample size}} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$$
Example: The gains in weight of 6 albino rats over a period of 5 days are 5, 6, 4, 4, 4, 7.
The arithmetic mean, or mean, is
$$\bar{x} = \frac{5 + 6 + 4 + 4 + 4 + 7}{6} = \frac{30}{6} = 5.0$$
Mean of Grouped Data
Three methods of calculation are the long method, the assumed mean method and the coding method.
Long method
$$\bar{X} = \frac{\sum fx}{\sum f}$$
Assumed mean method
$$\bar{X} = A + \frac{\sum fd}{\sum f}$$
where
A = a guessed or assumed mean
d = X - A are the deviations from the assumed mean.
Coding method
$$\bar{X} = A + \left( \frac{\sum fu}{\sum f} \right) c$$
where
A is an appropriately chosen value of x
c is the common class size
u = ..., -3, -2, -1, 0, 1, 2, 3, ...
Example 2.2
The weights in kg of a collection of 40 students in the Faculty of Science of O.O.U. are given below:
59, 53, 66, 55, 57, 65, 48, 59, 51, 58, 52, 68, 60, 70, 71, 55, 70, 64, 54, 67, 62, 53, 49, 56, 63, 48, 57, 61, 58, 55, 50, 55, 61, 52, 54, 65, 56, 50, 62, 60
Calculate the mean using:
a. The long method
b. The assumed mean 61
c. The coding method
Solution:
Weights (kg)   f    x    fx     d = x - A = x - 61   fd     u    fu
48–52          8    50   400    -11                  -88    -2   -16
53–57          12   55   660    -6                   -72    -1   -12
58–62          10   60   600    -1                   -10    0    0
63–67          6    65   390    4                    24     1    6
68–72          4    70   280    9                    36     2    8
Total          40        2330                        -110        -14
a. Long method
$$\bar{X} = \frac{\sum fx}{\sum f} = \frac{2330}{40} = 58.25$$
b. Assumed mean 61
$$\bar{X} = A + \frac{\sum fd}{\sum f} = 61 + \left( \frac{-110}{40} \right) = 61 - 2.75 = 58.25$$
c. Coding method
$$\bar{X} = A + \left( \frac{\sum fu}{\sum f} \right) c$$
Here c = 5. A is the value of x corresponding to u = 0; for an odd number of classes we choose u = 0 at the centre, so A = 60.
Thus,
$$\bar{X} = 60 + \left( \frac{-14}{40} \right) 5 = 60 - 1.75 = 58.25$$
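The three methods can be compared side by side in code. The sketch below uses the class midpoints and frequencies from the table in Example 2.2; the variable names are illustrative assumptions, not notation from the notes.

```python
# Class midpoints (x) and frequencies (f) from Example 2.2.
midpoints = [50, 55, 60, 65, 70]
freqs = [8, 12, 10, 6, 4]
n = sum(freqs)

# a. Long method: sum(f*x) / sum(f).
long_mean = sum(f * x for f, x in zip(freqs, midpoints)) / n

# b. Assumed mean method with A = 61: A + sum(f*d) / sum(f), d = x - A.
A = 61
assumed_mean = A + sum(f * (x - A) for f, x in zip(freqs, midpoints)) / n

# c. Coding method with A = 60 (the central midpoint) and class size c = 5:
#    u = (x - A) / c, mean = A + (sum(f*u) / sum(f)) * c.
A_coded, c = 60, 5
u = [(x - A_coded) // c for x in midpoints]
coded_mean = A_coded + sum(f * ui for f, ui in zip(freqs, u)) / n * c

print(long_mean, assumed_mean, coded_mean)
```

All three routes give the same value because the assumed mean and coding methods only shift and rescale the midpoints before averaging.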
The Geometric Mean (GM):
If the observations, instead of being added, are multiplied, the geometric mean is the nth root of the product. In algebraic symbols, the geometric mean of n observations $x_1, x_2, x_3, \ldots, x_n$ is given by the formula:
$$GM = \sqrt[n]{x_1 x_2 x_3 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n}$$
For its logarithmic calculation, the relationship used is
$$\log GM = \frac{\log x_1 + \log x_2 + \log x_3 + \cdots + \log x_n}{n} = \frac{\sum \log x_i}{n}$$
= simple arithmetic mean of the logarithms of the individual values.
The anti-logarithm of this log mean is the geometric mean. The geometric mean is preferable to the arithmetic mean if the series of observations contains one or more unusually large values.
Example
The intakes of baby milk food observed in fifteen children in one day are provided below:
101   114   109   135   122
184   196   185   217   198
148   233   227   336   253
Calculate the geometric mean.
Solution:
n = 15
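The logarithmic method described above can be applied to these fifteen intakes as follows. This is a minimal sketch, not the worked solution from the notes; the variable names are illustrative.

```python
import math

# Baby milk food intakes of the fifteen children.
intakes = [101, 114, 109, 135, 122,
           184, 196, 185, 217, 198,
           148, 233, 227, 336, 253]
n = len(intakes)

# Arithmetic mean of the base-10 logarithms of the observations.
log_mean = sum(math.log10(x) for x in intakes) / n

# The anti-logarithm of the log mean is the geometric mean.
gm = 10 ** log_mean

# Arithmetic mean, for comparison.
am = sum(intakes) / n

print(f"log mean = {log_mean:.5f}, GM = {gm:.2f}, AM = {am:.2f}")
```

The geometric mean comes out below the arithmetic mean, since it is less affected by the single large intake of 336.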
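The Harmonic Mean (HM):
It is the reciprocal of the arithmetic mean of the reciprocals of the observations. For individual values $x_1, x_2, \ldots, x_n$, the harmonic mean (HM) is
$$HM = \frac{1}{\frac{1}{n} \sum \frac{1}{x_i}} = \frac{n}{\sum \frac{1}{x_i}} = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}$$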
Example
For the numerical values 1, 2, 3, 4, 5, calculate and compare the AM, GM and HM.
Solution
Arithmetic Mean (AM):
$$\bar{x} = \frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3.0$$
With logarithms, the calculations for GM are:
$$\log GM = \log (1 \times 2 \times 3 \times 4 \times 5)^{1/5} = \frac{1}{5} (\log 1 + \log 2 + \log 3 + \log 4 + \log 5)$$
$$= \frac{1}{5} (0 + 0.30103 + 0.47712 + 0.60206 + 0.69897)$$
$$= \frac{1}{5} (2.07918) = 0.415836$$
GM = anti-log (0.415836) = 2.60517, i.e. approximately 2.605.
Harmonic Mean (HM):
$$HM = \frac{1}{\frac{1}{5} \left( \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} \right)} = \frac{5}{2.2833} = 2.190$$
Therefore, AM is the highest, followed by GM and then HM.
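The comparison of the three means can be reproduced directly from their definitions. The following sketch uses plain Python (it relies on `math.prod`, available from Python 3.8); the names are illustrative.

```python
import math

values = [1, 2, 3, 4, 5]
n = len(values)

# Arithmetic mean: sum of the values divided by their number.
am = sum(values) / n

# Geometric mean: nth root of the product of the values.
gm = math.prod(values) ** (1 / n)

# Harmonic mean: reciprocal of the arithmetic mean of the reciprocals.
hm = n / sum(1 / x for x in values)

print(f"AM = {am:.3f}, GM = {gm:.3f}, HM = {hm:.3f}")
```

For any set of positive values that are not all equal, the ordering AM > GM > HM holds, which is what the example illustrates.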
The Mode
This is the value or number that has the highest frequency in a distribution. The mode may not exist, and even when it does exist it may not be unique.
For example:
5, 2, 4, 7, 5, 3 has mode 5 (unimodal)
2, 6, 3, 4, 3, 2, 5 has two modes, 2 and 3 (bimodal)
4, 7, 2, 1, 3 has no mode
The mode can be obtained both graphically and by calculation. For grouped data, we use the histogram to estimate the mode graphically, while by calculation we use the formula
$$\text{Mode} = L + \left[ \frac{f_m - f_a}{2f_m - f_a - f_b} \right] C$$
where
L = lower class boundary of the modal class
$f_m$ = frequency of the modal class
$f_a$ = frequency of the class above the modal class in the table (the preceding class)
$f_b$ = frequency of the class below the modal class in the table (the following class)
C = size of the modal class interval.
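THE MEDIAN
If a set of data is arranged in order of magnitude, the middle value, which divides the set into two equal groups, is the median. Generally, for N data values,
$$\text{Median} = \left[ \frac{N + 1}{2} \right]^{\text{th}} \text{ item}$$
For example, find the median of the following sets of data:
a. 3, 6, 2, 4, 3
b. 2, 5, 3, 4, 8, 3
Solution
a. Arrangement in order: 2, 3, 3, 4, 6
Here N = 5
$$\text{Median} = \left[ \frac{N + 1}{2} \right]^{\text{th}} \text{ item} = \left[ \frac{5 + 1}{2} \right]^{\text{th}} \text{ item} = \text{the 3rd item} = 3$$
b. Arrangement in order: 2, 3, 3, 4, 5, 8
Here N = 6
$$\text{Median} = \left[ \frac{6 + 1}{2} \right]^{\text{th}} \text{ item} = 3.5^{\text{th}} \text{ item}$$
This is interpreted as
$$\frac{\text{3rd item} + \text{4th item}}{2} = \frac{3 + 4}{2} = 3.5$$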
Median of Grouped Data
The median can be obtained graphically from the cumulative frequency curve (ogive) or by calculation using the formula
$$\text{Median} = L + \left[ \frac{\frac{N}{2} - F}{f} \right] C$$
where
L = value of the lower class boundary of the median class
F = cumulative frequency of the class just above (preceding) the one containing the median
f = frequency of the median class
C = size of the median class interval
Example: Using the data given in the example above,
i. Construct the histogram and from it estimate the mode of the distribution.
ii. Calculate the mode and compare your answer with the estimated value in (i) above.
iii. Construct the cumulative frequency curve and from it estimate the median.
iv. Calculate the median and compare your results.
Solution
i. Histogram
[Figure: histogram of the weight distribution.]
The mode is approximately 56.
ii.
$$\text{Mode} = L + \left[ \frac{f_m - f_a}{2f_m - f_a - f_b} \right] C$$
The modal class is 53–57.
Hence L = 52.5, $f_m$ = 12, $f_a$ = 8, $f_b$ = 10 and C = 5.
Thus
$$\text{Mode} = 52.5 + \left[ \frac{12 - 8}{2(12) - 8 - 10} \right] 5 = 52.5 + 3.33 = 55.83$$
Comparison: graphical value = 56, calculated value = 55.83.
These values agree approximately.
iii.
Weight (kg)   Frequency (f)   Cumulative frequency
48–52         8               8
53–57         12              20
58–62         10              30
63–67         6               36
68–72         4               40
Total         40
[Figure: cumulative frequency curve (ogive) of the weight distribution.]
Estimated median = 57.5
iv.
$$\text{Median} = L + \left[ \frac{\frac{N}{2} - F}{f} \right] C$$
$$\frac{N}{2} = \frac{40}{2} = 20$$
i.e. the median is the 20th value. From the cumulative frequency distribution table, the 20th item falls within the class 53–57. Thus the median class is 53–57; hence L = 52.5, F = 8, f = 12 and C = 5.
$$\text{Median} = 52.5 + \left[ \frac{20 - 8}{12} \right] 5 = 57.5$$
Comparison: the graphical and calculated values are equal.
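The two grouped-data formulas used in this example can be sketched in code as follows. The lower class boundaries and frequencies are those of the weight distribution; the variable names are illustrative assumptions.

```python
# Lower class boundaries and frequencies for the classes 48-52, ..., 68-72.
lower_bounds = [47.5, 52.5, 57.5, 62.5, 67.5]
freqs = [8, 12, 10, 6, 4]
c = 5  # common class size

# Mode: L + [(fm - fa) / (2*fm - fa - fb)] * C for the modal class.
m = freqs.index(max(freqs))
fm = freqs[m]
fa = freqs[m - 1] if m > 0 else 0               # class preceding the modal class
fb = freqs[m + 1] if m < len(freqs) - 1 else 0  # class following the modal class
mode = lower_bounds[m] + (fm - fa) / (2 * fm - fa - fb) * c

# Median: L + [(N/2 - F) / f] * C for the class containing the (N/2)th item.
N = sum(freqs)
cumulative = 0  # F: cumulative frequency before the current class
for lower, f in zip(lower_bounds, freqs):
    if cumulative + f >= N / 2:
        median = lower + (N / 2 - cumulative) / f * c
        break
    cumulative += f

print(f"mode = {mode:.2f}, median = {median:.2f}")
```

The loop stops at the first class whose cumulative frequency reaches N/2, which is the median class as read off the cumulative frequency table.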
Measures of Variation and Dispersion
While studying the frequency distribution of a variable, it is important to know how the frequencies are clustered around, or scattered away from, the measures of average or central tendency. Two distributions may centre around the same point, i.e. have the same arithmetic mean, but differ in variation from the arithmetic mean. Such variation is called dispersion, spread or variability. The degree to which numerical data tend to spread about an average value is called the variation or dispersion of the data. The various measures of variation are the range, quartile deviation, mean deviation, standard deviation, variance and standard error.
i. The Range
The range is the difference between the largest and smallest items of the sample of observations. For the sample of observations 5, 6, 7, 8 and 9, the range is 9 – 5 = 4, i.e. maximum value – minimum value. It depends only on the two extreme values.
ii. Quartile Deviation
The quartile deviation, or semi-interquartile range, Q is given by the formula
$$Q = \frac{1}{2} (Q_3 - Q_1)$$
where $Q_1$ and $Q_3$ are the first and third quartiles respectively. The quartile deviation is better than the range, since it is calculated using the first and third quartile values rather than the two extremes.
iii. Mean Deviation
The mean deviation is the arithmetic mean of the absolute values of the deviations from some average such as the mean, median or mode.
$$\text{Mean deviation} = \frac{\sum f_i \, |x_i - \bar{x}|}{N} \quad \text{for grouped data}$$
$$\text{Mean deviation} = \frac{\sum |x_i - \bar{x}|}{N} \quad \text{for ungrouped data}$$
where
$f_i$ = the frequency of the ith class interval
$x_i$ = the mid value of the ith class interval, or the ith individual value
$\bar{x}$ = the arithmetic mean
N = the number of observations, or N = $\sum f_i$
iv. Standard Deviation
This is the most commonly used measure of variation or dispersion. It takes into account all the values of the variable. The standard deviation is defined as the square root of the arithmetic mean of the squared deviations of the individual values from their arithmetic mean. The formula for large samples is
$$SD^2 = \frac{1}{n} \sum (x_i - \bar{x})^2$$
where
$x_i$ = the ith individual value
$\bar{x}$ = the arithmetic mean
n = sample size
For small samples, the formula is
$$SD^2 = \frac{1}{n - 1} \sum (x_i - \bar{x})^2 = \frac{1}{n - 1} [SS - CF]$$
where
SS = sum of squares = $\sum x_i^2$
CF = correction factor = $\dfrac{(\sum x_i)^2}{n}$
For grouped data the formula is
$$SD^2 = \frac{1}{n - 1} \sum f_i (x_i - \bar{x})^2$$
$$SD = \sqrt{\frac{1}{n - 1} \sum f_i (x_i - \bar{x})^2}$$
$$SD = \sqrt{\frac{1}{n - 1} \left[ \sum f_i x_i^2 - \frac{(\sum f_i x_i)^2}{n} \right]}$$
where
$f_i$ = the frequency of the ith class interval
$x_i$ = the mid value of the ith class interval
$\bar{x}$ = the arithmetic mean
n = sample size
Example
Ungrouped data, for the values 5, 6, 7, 8, 9:
$$AM = \bar{x} = \frac{35}{5} = 7$$
$$SS = \sum x_i^2 = 5^2 + 6^2 + 7^2 + 8^2 + 9^2 = 255$$
$$CF = \frac{(\sum x_i)^2}{n} = \frac{35^2}{5} = 245$$
$$SD = \sqrt{\frac{1}{n - 1} (SS - CF)} = \sqrt{\frac{1}{4} (255 - 245)} = 1.58$$
Grouped data, using the frequency distribution of the weights (kg) of 30 adults below:
Class interval of weights (kg)   Mid value (xi)   Frequency (fi)   Cumulative frequency   fi xi
45–50                            47.5             2                2                      95.0
50–55                            52.5             3                5                      157.5
55–60                            57.5             6                11                     345.0
60–65                            62.5             4                15                     250.0
65–70                            67.5             6                21                     405.0
70–75                            72.5             4                25                     290.0
75–80                            77.5             5                30                     387.5
Total                                             30                                      1930.0
N = Σfi = 30
$$\sum_{i=1}^{7} f_i x_i = 1930$$
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{1930}{30} = 64.33$$
$$\sum f_i x_i^2 = 126637.50$$
$$\frac{(\sum f_i x_i)^2}{n} = \frac{3724900}{30} = 124163.33$$
$$SD = \sqrt{\frac{1}{n - 1} \left[ \sum f_i x_i^2 - \frac{(\sum f_i x_i)^2}{n} \right]} = \sqrt{\frac{1}{29} [126637.50 - 124163.33]} = 9.24$$
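Both standard deviation calculations can be reproduced with the sum-of-squares and correction-factor formulas. The sketch below follows the notation SS and CF from the text; the other variable names are illustrative.

```python
import math

# Ungrouped data: SD = sqrt((SS - CF) / (n - 1)).
values = [5, 6, 7, 8, 9]
n = len(values)
ss = sum(x * x for x in values)   # sum of squares
cf = sum(values) ** 2 / n         # correction factor
sd_ungrouped = math.sqrt((ss - cf) / (n - 1))

# Grouped data: class mid values and frequencies of the weight table.
mid_values = [47.5, 52.5, 57.5, 62.5, 67.5, 72.5, 77.5]
freqs = [2, 3, 6, 4, 6, 4, 5]
N = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, mid_values))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, mid_values))
mean_grouped = sum_fx / N
sd_grouped = math.sqrt((sum_fx2 - sum_fx ** 2 / N) / (N - 1))

print(f"ungrouped SD = {sd_ungrouped:.2f}")
print(f"grouped mean = {mean_grouped:.2f}, grouped SD = {sd_grouped:.2f}")
```

Using the computational form avoids having to subtract the mean from every observation first, which is why it is preferred for hand calculation.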
v. Standard Error
The standard deviation of mean values is known as the standard error. It is used to compare means with one another.
$$\text{Standard Error (SE)} = \frac{\text{Standard deviation}}{\sqrt{\text{sample size}}} = \frac{SD}{\sqrt{n}}$$
vi. Coefficient of Variation
To compare the variability of two series which differ widely in their averages, or which are measured in different units, a relative measure of dispersion is used, known as the coefficient of variation or dispersion. The formula is
$$\text{Coefficient of variation (CV)} = \frac{\text{Standard deviation}}{\text{mean}} \times 100$$
When the variability of two series is compared, the series having the greater CV is said to have more variation than the other, and the series with the lower CV is said to be more homogeneous than the other.
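Example 2.6
Using the table in Example 2.5, with mean = 64.33 and SD = 9.24:
$$SE = \frac{SD}{\sqrt{n}} = \frac{9.24}{\sqrt{30}} = 1.69$$
$$CV = 100 \times \frac{9.24}{64.33} = 14.36$$
For the ungrouped data 5, 6, 7, 8, 9, with mean = 7 and SD = 1.58:
$$CV = 100 \times \frac{1.58}{7} = 22.57$$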
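These two measures are simple enough to wrap in small helper functions. The function names below are hypothetical and chosen only for this illustration.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100

# Grouped weight data: mean 64.33, SD 9.24, n = 30.
se_grouped = standard_error(9.24, 30)
cv_grouped = coefficient_of_variation(9.24, 64.33)

# Ungrouped data 5, 6, 7, 8, 9: mean 7, SD 1.58.
cv_ungrouped = coefficient_of_variation(1.58, 7)

print(f"SE = {se_grouped:.2f}, CV = {cv_grouped:.2f}, CV (ungrouped) = {cv_ungrouped:.2f}")
```

On this comparison the ungrouped series has the larger CV, so it is relatively more variable even though its standard deviation is smaller in absolute terms.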
vii. Variance
The variance is measured in the square of the units in which the variable X is measured.
The formula for the variance is:
$$\text{Variance} = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{\sum x_i^2 - n\bar{x}^2}{n}$$
A better estimate of the population variance is obtained by using the divisor (n - 1) instead of n.
$$\text{Estimated variance} = S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
$$\text{Estimated standard deviation} = S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
The characteristics of the sample and the population are denoted as follows:
                     Sample    Population
Number               n         N
Mean                 x̄         μ
Variance             S²        σ²
Standard deviation   S         σ
S² will be a representative, unbiased estimate of the population variance σ² only if (n - 1) is used in the denominator of S².
Example
The body surface areas of fifteen children are given. Calculate the mean, variance, standard deviation and standard error.
Body Surface Area
196   101   184   227   253
185   217   122   336   148
114   135   233   198   109
Solution
$$\text{Mean} = \frac{\sum x}{n} = \frac{2758}{15} = 183.9$$
$$\text{Variance} = S^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{58499.7}{14} = 4178.55$$
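The example also asks for the standard deviation and standard error, which follow from the variance by the formulas given earlier. The sketch below computes all four quantities. It assumes the eighth observation is 122, since that is the value that reproduces the stated total of 2758 and the stated sum of squared deviations of 58499.7.

```python
import math

# Body surface areas of the fifteen children
# (eighth value taken as 122, an assumption that matches the stated total 2758).
areas = [196, 101, 184, 227, 253,
         185, 217, 122, 336, 148,
         114, 135, 233, 198, 109]
n = len(areas)

total = sum(areas)
mean = total / n

# Sum of squared deviations via the computational form: sum(x^2) - (sum(x))^2 / n.
sum_sq_dev = sum(x * x for x in areas) - total ** 2 / n

variance = sum_sq_dev / (n - 1)   # estimated variance S^2
sd = math.sqrt(variance)          # estimated standard deviation S
se = sd / math.sqrt(n)            # standard error of the mean

print(f"mean = {mean:.1f}, variance = {variance:.2f}, SD = {sd:.2f}, SE = {se:.2f}")
```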
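$$\alpha = \frac{\sum y}{n} - \beta \frac{\sum x}{n} = \bar{y} - \beta \bar{x}$$
$$= \frac{811}{12} - 0.4764 \times \frac{800}{12}$$
$$= 35.8233$$
The regression equation of Y on X is given as Y = 35.823 + 0.476X
(ii) The regression line of X on Y is given by
$$X = \alpha + \beta Y$$
$$\beta = \frac{n \sum xy - \sum x \sum y}{n \sum y^2 - (\sum y)^2} = \frac{12(54107) - (800)(811)}{12(54849) - (811)^2}$$
$$\beta = 1.036$$
$$\alpha = \frac{\sum x}{n} - \beta \frac{\sum y}{n} = \frac{800}{12} - 1.036 \times \frac{811}{12} = -3.38$$
The regression equation of X on Y is given as
X = -3.38 + 1.036Y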
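The slope and intercept of the regression of X on Y can be recomputed from the summary totals quoted in the working. This is a sketch using only those totals (n = 12, Σx = 800, Σy = 811, Σxy = 54107, Σy² = 54849); the variable names are illustrative.

```python
# Summary totals quoted in the worked solution.
n = 12
sum_x, sum_y = 800, 811
sum_xy, sum_y2 = 54107, 54849

# Slope of the regression of X on Y.
beta = (n * sum_xy - sum_x * sum_y) / (n * sum_y2 - sum_y ** 2)

# Intercept: mean of x minus slope times mean of y.
alpha = sum_x / n - beta * sum_y / n

print(f"X = {alpha:.2f} + {beta:.3f} Y")
```

Carrying the unrounded slope into the intercept gives -3.38; substituting the rounded value 1.036 by hand gives about -3.35, so small differences in the last digit are only rounding.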
CORRELATION ANALYSIS
We have dealt with the problem of regression, or estimation of one variable (the dependent variable) from one or more related variables (the independent variables). We shall now consider the degree of relationship that exists between variables: correlation analysis.
Correlation analysis is a technique for estimating the closeness or degree of relationship between two or more variables. Correlation is the degree of association between two or more variables. The relationship may be positive, that is, an increase in one variable is accompanied by an increase in the other, or negative, when a decrease in one variable is accompanied by an increase in the other. The patterns of correlation are: perfect positive correlation when r = 1, perfect negative correlation when r = -1, positive correlation when r > 0, negative correlation when r < 0, and no correlation when r = 0.
The correlation coefficient, or coefficient of correlation, denoted by r, is a measure of the strength of the linear relationship between two variables. Two measures of correlation are:
(i) Karl Pearson's product moment correlation coefficient (r)
(ii) Spearman's rank correlation coefficient (R)
PRODUCT MOMENT CORRELATION COEFFICIENT
Karl Pearson's product moment correlation coefficient is denoted by r and given by:
$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}}$$
where -1 ≤ r ≤ 1.
It should be noted that the higher the magnitude of r, the stronger the association.
EXAMPLE
The table below gives the weight of the heart (x) and the weight of the kidneys (y) in a random sample of 12 adult males between the ages of 25 and 55 years.
Male no   Heart weight (X)   Kidney weight (Y)
1         11.50              11.25
2         9.50               11.75
3         13.00              11.75
4         15.50              12.50
5         12.50              12.50
6         11.50              12.75
7         9.00               9.50
8         11.50              10.75
9         9.25               11.00
10        9.75               9.50
11        14.25              13.00
12        10.75              12.00
Calculate the coefficient of correlation.
Solution:
$$r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left[ \sum x^2 - \frac{(\sum x)^2}{n} \right] \left[ \sum y^2 - \frac{(\sum y)^2}{n} \right]}}$$
Σx = 138.00, Σy = 138.25, Σxy = 1608.12
Σx² = 1632.75, Σy² = 1607.81
$$r = \frac{1608.12 - \frac{138.00 \times 138.25}{12}}{\sqrt{\left( 1632.75 - \frac{(138.00)^2}{12} \right) \left( 1607.81 - \frac{(138.25)^2}{12} \right)}}$$
r = 0.70 (to 2 decimal places)
There is a significant relationship between heart weight and kidney weight.
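The product moment coefficient for this example can be checked from the raw data. The sketch below assumes the heart weight of male no. 12 is 10.75, since that value reproduces the stated totals Σx = 138.00 and Σx² = 1632.75; the variable names are illustrative.

```python
import math

# Heart weights (x) and kidney weights (y) of the 12 adult males.
# The twelfth heart weight is taken as 10.75 (assumption matching the stated totals).
heart = [11.50, 9.50, 13.00, 15.50, 12.50, 11.50,
         9.00, 11.50, 9.25, 9.75, 14.25, 10.75]
kidney = [11.25, 11.75, 11.75, 12.50, 12.50, 12.75,
          9.50, 10.75, 11.00, 9.50, 13.00, 12.00]
n = len(heart)

sum_x, sum_y = sum(heart), sum(kidney)
sum_xy = sum(x * y for x, y in zip(heart, kidney))
sum_x2 = sum(x * x for x in heart)
sum_y2 = sum(y * y for y in kidney)

# Pearson's r from the computational formula.
numerator = sum_xy - sum_x * sum_y / n
denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
r = numerator / denominator

print(f"r = {r:.2f}")
```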
SPEARMAN RANK CORRELATION COEFFICIENT
When the variables do not follow the normal distribution and one desires to assess the relationship, the correlation coefficient known as the Spearman rank correlation coefficient is used. The variables are ranked according to magnitude, and the correlation between the ranks of the variables x and y is obtained. The symbol used is R, and the formula is:
$$R = 1 - \frac{6 \sum d_i^2}{n (n^2 - 1)}$$
where d is the difference between the ranks given to the variables of each pair and n is the number of pairs studied. The procedure was developed by Spearman; hence it is known as the Spearman rank correlation coefficient. Its value also ranges from -1 to 1.
EXAMPLE
From the table below, calculate the Spearman rank correlation between smoking and cancer.
Individual ranks
Individual                                 1    2    3    4    5    6    7    8    9    10
Grades of smoking (x)                      1    2    3    4    5    6    7    8    9    10
Severity of cancer (y)                     1    2    3    4    5    6    7    8    9    10
d = difference between the ranks of x, y   -1   1    -1   1    -1   -1   2    -1   1    0
d²                                         1    1    1    1    1    1    4    1    1    0
$$\sum d^2 = 1 + 1 + 1 + 1 + 1 + 1 + 4 + 1 + 1 + 0 = 12$$
$$R = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6(12)}{10(10^2 - 1)} = 1 - \frac{6(12)}{10(99)}$$
= 1 - 0.073
= 0.927
Severity of cancer and grades of smoking are positively correlated.
EXAMPLE
Calculate the rank correlation coefficient between the corresponding values of X and Y given in the table below.
X   22   24   25   16   28   19
Y   48   42   40   38   47   45
Solution
The ranking is in ascending order of magnitude.
X    Y    RX   RY   d    d²
22   48   3    6    -3   9
24   42   4    3    1    1
25   40   5    2    3    9
16   38   1    1    0    0
28   47   6    5    1    1
19   45   2    4    -2   4
Total                    24
$$R = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6(24)}{6(36 - 1)}$$
= 1 - 0.6857
= 0.3143
= 0.31
There is a low, or weak, positive correlation between the two variables.
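The ranking and the formula for R can be sketched in code as follows. The helper functions are hypothetical names for this illustration, and the ranking helper assumes there are no tied values, as is the case in this example.

```python
def ranks_ascending(values):
    """Rank each value in ascending order (1 = smallest); assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_r(x, y):
    """Spearman rank correlation: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    rx, ry = ranks_ascending(x), ranks_ascending(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    n = len(x)
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

X = [22, 24, 25, 16, 28, 19]
Y = [48, 42, 40, 38, 47, 45]

print(ranks_ascending(X), ranks_ascending(Y))
print(round(spearman_r(X, Y), 4))
```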
TIE IN RANKS
Often, two or more values of a variable are equal. In such cases, we assign to each of the tied observations the mean of the ranks which they jointly occupy. For example, if the 5th and 6th largest values of a variable are equal, we assign to each the rank
$$\frac{5 + 6}{2} = 5.5$$
and if the fifth, sixth and seventh largest values of a variable are the same, we assign each the rank
$$\frac{5 + 6 + 7}{3} = 6$$
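EXAMPLE
The table given below shows the respective weights X and Y (in kg) of 12 fathers and their eldest sons.
Father (X)   66   64   68   65   69   63   71   67   69   68   70   72
Sons (Y)     69   67   69   66   70   67   69   66   72   68   69   71
Calculate the coefficient of rank correlation and comment on the degree of correlation between the fathers' weights and those of their sons.
Solution
X    Y    RX     RY     d = RX - RY   d²
66   69   4      7.5    -3.5          12.25
64   67   2      3.5    -1.5          2.25
68   69   6.5    7.5    -1.0          1.00
65   66   3      1.5    1.5           2.25
69   70   8.5    10     -1.5          2.25
63   67   1      3.5    -2.5          6.25
71   69   11     7.5    3.5           12.25
67   66   5      1.5    3.5           12.25
69   72   8.5    12     -3.5          12.25
68   68   6.5    5      1.5           2.25
70   69   10     7.5    2.5           6.25
72   71   12     11     1.0           1.00
Total                                 72.50
$$R = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6(72.50)}{12(144 - 1)} = 1 - \frac{72.50}{2(143)}$$
= 1 - 0.2535
= 0.7465
= 0.75
Comment: There is a fairly high positive correlation between the fathers' weights and those of their eldest sons.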
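The tie rule can be implemented by giving every value the mean of the rank positions it occupies in the sorted list. The sketch below applies it to the fathers-and-sons data, ranking in ascending order; the function name is a hypothetical choice for this illustration.

```python
def average_ranks(values):
    """Ascending ranks, with tied values sharing the mean of their positions."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1   # first rank position occupied by v
        count = ordered.count(v)       # number of observations tied at v
        # Mean of the positions first, first + 1, ..., first + count - 1.
        ranks.append(first + (count - 1) / 2)
    return ranks

fathers = [66, 64, 68, 65, 69, 63, 71, 67, 69, 68, 70, 72]
sons = [69, 67, 69, 66, 70, 67, 69, 66, 72, 68, 69, 71]

rx, ry = average_ranks(fathers), average_ranks(sons)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
n = len(fathers)
R = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(f"sum of d^2 = {sum_d2}, R = {R:.4f}")
```

For instance, the four sons weighing 69 kg occupy positions 6 to 9, so each receives the rank 7.5.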
ELEMENTS OF PROBABILITY
Probability concepts are the foundations of statistics. An understanding of the concepts of probability helps in interpreting statistics skilfully. Probability is a term applied to events that are not certain. It is the study of random or non-deterministic experiments. So probability is defined as the ratio of favourable events to the total number of events. Briefly, the interpretation of probabilities can be summarized as follows:
i. Probabilities are numbers between 0 and 1, inclusive, that reflect the chances of a particular physical event occurring.
ii. Probabilities near 1 indicate that the event involved is expected to occur.
iii. Probabilities near ½ indicate that the event is just as likely to occur as not.
The above properties are guidelines for interpreting probabilities once these numbers are available, but they do not indicate how to actually go about assigning probabilities to events. Three methods are commonly used: the classical approach, the relative frequency approach and the personal or subjective approach.
The Classical Approach
This method can be used whenever the possible outcomes of the experiment are equally likely. In this case, the probability of the occurrence of event A is given by:
$$P[A] = \frac{n(A)}{n(S)} = \frac{\text{number of ways A can occur}}{\text{number of ways the experiment can proceed}}$$
where S is the sample space and A ⊂ S.
Its main drawback is that it is not always applicable; it requires that the possible outcomes be equally likely. Its main advantage is that, when applicable, the probability obtained is exact.
Example
What is the probability that a child born to a couple, each with genes for both brown and blue eyes, will be brown-eyed?
Solution
We note that since the child receives one gene from each parent, the possibilities for the child are (brown, blue), (blue, brown), (blue, blue) and (brown, brown), where the first member of each pair represents the gene received from the father. Since each parent is just as likely to contribute a gene for brown eyes as for blue eyes, all four possibilities are equally likely.
Since the gene for brown eyes is dominant, three of the four possibilities lead to a brown-eyed child. Hence, the probability that the child is brown-eyed is ¾ = 0.75.
Example
What is the probability of drawing an ace at random from a well shuffled deck of 52 playing cards?
Solution
There are 4 aces in a deck of 52 cards, that is, x = 4 and n = 52.
$$\text{Hence, probability of an ace} = \frac{x}{n} = \frac{4}{52} = \frac{1}{13}$$
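Both classical examples amount to counting favourable outcomes among equally likely ones. The sketch below enumerates the eye-colour sample space with `itertools.product` and keeps the results as exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Each parent contributes one gene, brown or blue, with equal likelihood.
genes = ["brown", "blue"]
sample_space = list(product(genes, repeat=2))  # (father's gene, mother's gene)

# Brown is dominant: a child is brown-eyed if at least one gene is brown.
favourable = [pair for pair in sample_space if "brown" in pair]
p_brown_eyed = Fraction(len(favourable), len(sample_space))

# Drawing an ace: 4 favourable cards out of 52 equally likely cards.
p_ace = Fraction(4, 52)

print(sample_space)
print(p_brown_eyed, p_ace)
```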
The Relative Frequency Approach
This method can be used in any situation in which the experiment can be repeated many times and the results observed. The approximate probability of the occurrence of event A, denoted P(A), is then given by:
$$P[A] = \frac{n(A)}{N} = \frac{\text{number of times event A occurred}}{\text{number of times the experiment was run}}$$
The disadvantage of this method is that the experiment cannot be a one-shot situation; it must be repeatable. The advantage of this approach is that it is usually more accurate, because it is based on actual observation rather than personal opinion.
Thus, for a large number of trials, the approximate probability obtained by using the relative frequency approach is usually quite accurate.
Example
A researcher is developing a new drug to be used in desensitizing patients to bee stings. Of 200 subjects tested, 180 showed a lessening in the severity of symptoms upon being stung after the treatment was administered. It is natural to assume, then, that the probability of this occurring in another patient receiving the treatment is at least approximately
$$\frac{180}{200} = 0.90$$
On the basis of this study, the drug is reported to be 90% effective in lessening the reaction of sensitive patients to stings.
Example
If 1,000 tosses of a coin result in 520 heads, then the relative frequency of heads is
$$\frac{520}{1000} = 0.52$$
The Subjective or Personal Approach
This is the probability assigned to an event based on subjective or personal experience, information and belief.
Hence, probabilities are interpreted as the strength of one's belief in the occurrence of an event.
Some Basic Definitions
Experiment: this refers to any process of observation or measurement whose result we may not be able to predict.
Outcome: this refers to a result obtained from an experiment.
Sample point: this is an outcome in the sample space.
Sample space: this refers to the collection of all possible outcomes of an experiment.
Event: this refers to any subset of a sample space.
Axioms of probability
1. Let S denote the sample space of an experiment. Then P[S] = 1.
2. P[A] ≥ 0 for every event A.
3. Let A1, A2, A3, … be a sequence of mutually exclusive events. Then P[A1 ∪ A2 ∪ A3 ∪ …] = P[A1] + P[A2] + P[A3] + …
Axiom 1 states a fact that most people would regard as obvious, namely that the probability assigned to a sure, or certain, event is 1.
Axiom 2 ensures that probabilities can never be negative.
Axiom 3 is called the property of countable additivity.
Probability Laws
1. If A and A′ are complementary events in a sample space S, then
P(A′) = 1 – P(A)
Complementary events: two events A and A′ are said to be complementary if they are mutually exclusive and together make up the whole sample space, so that
P(A) + P(A′) = 1
Mutually exclusive events: two events A and B are said to be mutually exclusive if the occurrence of one event excludes or prevents the occurrence of the other event.
2. P(∅) = 0 for any sample space S.
S and ∅ are mutually exclusive and S ∪ ∅ = S, so
P(S) = P(S ∪ ∅)
= P(S) + P(∅)
P(∅) = P(S) – P(S) = 0
3. If A and B are events in a sample space S and A ⊂ B, then P(A) ≤ P(B).
4. 0 ≤ P(A) ≤ 1 for any event A.
5. Addition rule: if A and B are any two events in a sample space S, then
P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
= P(A) + P(B) when P(A ∩ B) = 0
6. Multiplication rule:
The probability that independent events will occur jointly is the product of the probabilities of the individual events. If A and B are independent events, then
P(A ∩ B) = P(A) P(B)
This rule generalizes to an arbitrary number of independent events.
Example
What is the probability that a card drawn at random from a well-shuffled standard pack will be either a spade or a club?
Solution
Number of spades S = 13, number of clubs C = 13, number of cards n = 52.
\[ P(s) = P(\text{spade}) = \frac{13}{52} = \frac{1}{4} \]
\[ P(c) = P(\text{club}) = \frac{13}{52} = \frac{1}{4} \]
The outcomes are mutually exclusive; therefore P(s or c) = P(s) + P(c).
Hence P(s or c) = ¼ + ¼ = ½
Example 3.6
Find the probability of getting three heads in three random tosses of a balanced coin.
Solution
The probability of a head on each toss is ½.
Since the tosses are independent, multiplying the three probabilities gives
½ × ½ × ½ = 1/8
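The two examples above can be checked by brute-force enumeration. The following is a minimal sketch, assuming equally likely outcomes; the helper name `probability` and the way the deck is encoded are illustrative choices, not part of the notes.

```python
from fractions import Fraction
from itertools import product

SUITS = ("spade", "club", "heart", "diamond")
RANKS = ("A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K")
DECK = list(product(RANKS, SUITS))  # 52 equally likely cards


def probability(event, sample_space):
    """Classical probability: favourable outcomes / all outcomes."""
    favourable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favourable, len(sample_space))


# Addition rule for mutually exclusive events (spade or club).
p_spade = probability(lambda card: card[1] == "spade", DECK)
p_club = probability(lambda card: card[1] == "club", DECK)
p_spade_or_club = probability(lambda card: card[1] in ("spade", "club"), DECK)

# Multiplication rule for independent events (three heads in three tosses).
TOSSES = list(product("HT", repeat=3))  # 8 equally likely sequences
p_three_heads = probability(lambda seq: seq == ("H", "H", "H"), TOSSES)

print(p_spade_or_club, p_spade + p_club)   # both 1/2
print(p_three_heads, Fraction(1, 2) ** 3)  # both 1/8
```

Exact fractions are used so that the enumerated probability and the value given by the rule can be compared without rounding error.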
7. Conditional Probability
Given a sample space S, let A be a non-empty proper subset of S, i.e. A ≠ ∅ and A ⊂ S. The probability of an event B happening given that an event A has taken place is denoted by P(B/A) and is defined as:
\[ P(B/A) = \frac{P(A \cap B)}{P(A)} \]
If A and B are any two events in a sample space S and P(A) ≠ 0, the conditional probability of B given A is:
\[ P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{P(\text{both events})}{P(\text{given event})} \]
Likewise, the conditional probability of A given B, with P(B) ≠ 0, is:
\[ P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{P(\text{both events})}{P(\text{given event})} \]
Example
It is estimated that 15% of the adult population has hypertension, but that 75% of all adults feel that personally they do not have this problem. It is also estimated that 6% of the population has hypertension but does not think that the disease is present. If an adult patient reports thinking that he or she does not have hypertension, what is the probability that the disease is, in fact, present?
Solution
Let A denote the event that the patient does not feel that the disease is present and B the event that the disease is present. We are given that P(A) = 0.75, P(B) = 0.15 and P(A ∩ B) = 0.06.
We are asked to find:
\[ P(B/A) = \frac{P(\text{both})}{P(\text{given})} = \frac{P(A \cap B)}{P(A)} = \frac{0.06}{0.75} = 0.08 \]
There is an 8% chance that a patient who expresses the opinion that she or he has no problem with hypertension does, in fact, have the disease.
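The hypertension calculation can be reproduced in a few lines. This is a minimal sketch; the function name `conditional_probability` is an illustrative assumption, and the inputs are the three figures quoted in the example.

```python
def conditional_probability(p_both, p_given):
    """Return P(B/A) = P(A and B) / P(A); P(A) must be positive."""
    if p_given <= 0:
        raise ValueError("the conditioning event must have positive probability")
    return p_both / p_given


p_a = 0.75        # P(A): patient feels free of hypertension
p_a_and_b = 0.06  # P(A and B): hypertension present but not suspected

p_b_given_a = conditional_probability(p_a_and_b, p_a)
print(round(p_b_given_a, 2))
```

The guard on `p_given` mirrors the condition P(A) ≠ 0 in the definition.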
Bayes' Theorem
This theorem was formulated by the Reverend Thomas Bayes (1761). It deals with conditional probability. Bayes' theorem is used to find P(A/B) when the available information is not directly compatible with that required in the definition of conditional probability. That is, it is used to find P[A/B] when P[A ∩ B] and P[B] are not immediately available.
Theorem:
Bayes' theorem is much easier to use in practical problems than to state formally.
Example
The blood type distribution in Olabisi Onabanjo University is type A, 41%; type B, 9%; type AB, 4%; and type O, 46%. It is estimated that during an investigation, 4% of inductees with type O blood were typed as having type A; 88% of those with type A blood were correctly typed; 4% with type B blood were typed as A; and 10% with type AB were typed as A. One student was wounded and brought to surgery. He was typed as having type A blood. What is the probability that this is his true blood type?
Solution
Let
A1 = he has type A blood
A2 = he has type B blood
A3 = he has type AB blood
A4 = he has type O blood
B: it is typed as type A.
We want to find P[A1/B].
We are given that
P[A1] = 0.41    P[B/A1] = 0.88
P[A2] = 0.09    P[B/A2] = 0.04
P[A3] = 0.04    P[B/A3] = 0.10
P[A4] = 0.46    P[B/A4] = 0.04
By Bayes' theorem
\[ P[A_1/B] = \frac{P[B/A_1]\,P[A_1]}{\sum_{i=1}^{4} P[B/A_i]\,P[A_i]} \]
\[ = \frac{(0.88)(0.41)}{(0.88)(0.41) + (0.04)(0.09) + (0.10)(0.04) + (0.04)(0.46)} = \frac{0.3608}{0.3868} \approx 0.93 \]
Practically speaking, this means that there is a 93% chance that the blood type is A if it has been typed as A, and there is a 7% chance that it has been mistyped as A when it is actually some other type.
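The blood-typing calculation follows a pattern that is easy to mechanise: multiply each prior by its likelihood, then divide by the total. The sketch below assumes the priors and likelihoods quoted in the example; the function name `posterior` and the dictionary keys are illustrative.

```python
def posterior(priors, likelihoods):
    """Bayes' theorem over a partition A_1..A_k.

    priors[name]      = P[A_i]
    likelihoods[name] = P[B / A_i]
    Returns P[A_i / B] for every A_i.
    """
    joint = {name: priors[name] * likelihoods[name] for name in priors}
    total = sum(joint.values())  # P[B], by the law of total probability
    return {name: value / total for name, value in joint.items()}


priors = {"A": 0.41, "B": 0.09, "AB": 0.04, "O": 0.46}
typed_as_a = {"A": 0.88, "B": 0.04, "AB": 0.10, "O": 0.04}

result = posterior(priors, typed_as_a)
print(round(result["A"], 2))  # probability the true type is A
```

The same call also returns the posterior probabilities of the other three blood types, which together account for the remaining 7%.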
FACTORIALS
Factorial is a special multiplication operator. The factorial sign "!" indicates a special repeated multiplication which is used frequently in statistical applications.
Examples
3! = 3 × 2 × 1 = 6
4! = 4 × 3 × 2 × 1 = 24
In general, n! = n × (n−1) × (n−2) × … × 3 × 2 × 1
where n is a positive integer.
The operator "Π" is used to indicate a multiplication of a series of numbers.
The operator "Σ" is used to indicate a summation of a series of numbers.
\[ \prod_{i=1}^{5} Y_i = Y_1 \times Y_2 \times Y_3 \times Y_4 \times Y_5 \]
\[ \sum_{i=1}^{5} Y_i = Y_1 + Y_2 + Y_3 + Y_4 + Y_5 \]
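As a quick illustration of the three operators, the sketch below uses Python's standard library; the list `y` is a made-up series chosen only for the demonstration.

```python
import math

# Factorials from the examples above.
print(math.factorial(3))  # 3 x 2 x 1
print(math.factorial(4))  # 4 x 3 x 2 x 1

# Hypothetical series Y1..Y5, used only to contrast product and sum.
y = [1, 2, 3, 4, 5]
product_of_y = math.prod(y)  # the "Π" operator
sum_of_y = sum(y)            # the "Σ" operator
print(product_of_y, sum_of_y)
```

With this particular series the product equals 5!, which shows that a factorial is a product over consecutive integers.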
PERMUTATION
If r objects are selected from a set of n objects, any particular arrangement (order) of these objects is called a permutation.
The number of permutations of r objects selected from a set of n distinct objects is
\[ {}^{n}P_{r} = \frac{n!}{(n-r)!} \]
Example
Find the number of ways of arranging the letters of the word CHEMISTRY if:
a. All the letters are to be taken at a time
b. Four of the letters are to be taken at a time
Solution
a. Required number of arrangements = n!
9! = 362880
b. Required number of arrangements = nPr
\[ {}^{9}P_{4} = \frac{9!}{(9-4)!} = \frac{9!}{5!} = \frac{362880}{120} = 3024 \]
Notes
i. 0! = 1 and nPn = n!
ii. The number of permutations of n objects of which n1 are of one kind, n2 of a second kind, …, nk of a kth kind is
\[ \frac{n!}{n_1!\, n_2! \cdots n_k!} \]
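The CHEMISTRY example and note (ii) can be verified as follows. This is a minimal sketch: `math.perm` is the standard-library function for nPr, while `arrangements_with_repeats` and the word STATISTICS are illustrative additions, not taken from the notes.

```python
import math
from collections import Counter


def arrangements_with_repeats(word):
    """n! / (n1! n2! ... nk!) for a word whose letters may repeat."""
    counts = Counter(word)
    denominator = math.prod(math.factorial(c) for c in counts.values())
    return math.factorial(len(word)) // denominator


word = "CHEMISTRY"                    # 9 distinct letters
all_letters = math.perm(len(word))    # 9P9 = 9!
four_letters = math.perm(len(word), 4)  # 9P4 = 9!/5!
print(all_letters, four_letters)

# Hypothetical word with repeated letters (S x3, T x3, I x2, A, C).
print(arrangements_with_repeats("STATISTICS"))
```

When every letter is distinct, as in CHEMISTRY, the repeated-letter formula reduces to n!.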
COMBINATION
This deals with the number of ways in which r objects can be selected from a set of n objects. The number of ways in which r objects can be selected from a set of n distinct objects is written \(\binom{n}{r}\) or nCr and is given by:
\[ {}^{n}C_{r} = \frac{n!}{r!\,(n-r)!} \]
Example
In how many ways can a person select three items from a list of 7 such items?
Solution
Here n = 7 and r = 3.
Number of possible selections = nCr
\[ {}^{7}C_{3} = \frac{7!}{3!\,(7-3)!} = \frac{7!}{3!\,4!} = \frac{7 \times 6 \times 5}{3 \times 2 \times 1} = 35 \]
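The selection count can be checked both with the standard-library function `math.comb` and by listing the selections explicitly. The item labels below are placeholders invented for the sketch.

```python
import math
from itertools import combinations

n, r = 7, 3
by_formula = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))
by_library = math.comb(n, r)

# Enumerate every unordered selection of 3 items from 7 placeholder items.
items = ["item1", "item2", "item3", "item4", "item5", "item6", "item7"]
by_enumeration = len(list(combinations(items, r)))

print(by_formula, by_library, by_enumeration)
```

Dividing the permutation count 7P3 = 210 by 3! = 6 gives the same answer, because each selection can be arranged in 3! orders.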
Mathematically speaking, an event which is impossible, for example an animal giving birth to a human child, has probability zero, and an event which is certain to occur, for example death, has probability unity. If the live birth of octuplets to a woman is not known to have occurred in the history of a community, the statistical probability of such an event is zero in that community. But this does not mean that the event is an impossibility. No probability can be negative, nor can it exceed one. In simple terms, if malnutrition is present in 8 percent of children in a population, the probability that a randomly picked child would have that condition is 0.08. Thus, probability measures the likelihood of the event and in a way is the complement of uncertainty.
Such quantification of uncertainties has proved immensely useful in the effective management of health conditions, both at individual level and at community level. Knowing that the probability of developing coronary artery disease in senior executives is, say, 3 times higher than in clerks provides us with a scientific basis to give appropriate advice or to institute an intervention at individual level, and to plan and execute preventive measures to combat the problem in the target group. If the analysis of records shows that 90 percent of a large number of patients with abdominal tuberculosis (TB) came with complaints of pain in the abdomen, vomiting and constipation of long duration, then
P(pain, vomiting, constipation / abdominal TB) = 0.90
Such probabilities, which are restricted to a specific group, are called conditional probabilities. The illustrations given above are some biological and health-specific examples of probability.
EXPERIMENTAL DESIGN
Definition of Terms
Randomization: This is the allocation of treatments to units such that the probability that a particular treatment will be allocated to a particular unit is the same for all treatments. That is, both the allocation of the experimental material and the order in which the individual trials of the experiment are to be performed are randomly determined. Statistical methods require that the observations (or errors) be independently and identically distributed random variables, and randomization makes this assumption valid. Thus, randomization removes bias and allows the application of probability concepts.
Replication: This is a complete repetition of the basic experiment. It provides an estimate of the magnitude of the experimental error and a more precise measure of treatment effects.
Reduction of Random Variation
The third basic principle is the use of techniques of experimental design for the reduction of random variation, also called local control of variability or error control. This refers to the way in which the experimental units in a particular design are balanced, blocked and grouped. Local control is necessary to increase the efficiency of the experiment. The commonly used terms are experiment, treatment, experimental unit, experimental error, grouping, blocking, factors, balancing and precision.
Experiment: It is a means of getting an answer to the question that the experimenter has in mind. This may be to decide which of several pain-relieving drugs is most effective or whether they are equally effective. Similarly, the effectiveness of various types of diets on the growth status of children or albino rats can be assessed. For assessing the effectiveness of the treatments, the experiment should have one group to serve as a control.
Treatment: This means the experimental conditions which are imposed on an experimental unit in a particular experiment. In a dietary or medical experiment, the different diets or medicines are the treatments. In an agricultural experiment, the different varieties of a crop or different manures will be the treatments.
Experimental unit: An experimental unit is the material to which the treatment is applied and on which the variable under study is measured. In a feeding experiment on cows or albino rats, the whole cow or albino rat is the experimental unit.
Experimental Error: We usually come across variation in the measurements made on different experimental units even when they get the same treatment. A part of this variation is systematic and can be explained, whereas the remainder is taken to be of the random type. The unexplained random part of the variation is termed the experimental error.
Grouping: This is the placement of homogeneous experimental units into different groups to which separate treatments may be assigned.
Blocking: This is the assignment of the experimental units to blocks in such a manner that the units within any particular block are as homogeneous as possible.
Factors: A factor is a possible cause of response or variation. Factors include age, sex, variety, etc. It may be observed that treatments are often different combinations of the levels of one or more factors.
Balancing: This is the assignment of the treatment combinations to the experimental units in such a way that a balanced or symmetric configuration is obtained.
One-way ANOVA is used when we wish to test the equality of k population means. The procedure is based on the assumptions that each of the k groups of observations is a random sample from a normal distribution and that the population variance σ² is constant among the groups. ANOVA models provide an appropriate estimate to facilitate comparison of several means.
The statistical model for the one-way classification of ANOVA is
\[ X_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, k; \quad j = 1, 2, \ldots, n \]
where
X_ij = observation from the jth unit receiving treatment i
μ = overall or grand mean
α_i = ith treatment effect
ε_ij = random error, with ε_ij ~ NID(0, σ²)
The treatment means and the grand mean are
\[ \bar{X}_{i.} = \frac{1}{n} \sum_{j=1}^{n} X_{ij}, \qquad i = 1, 2, \ldots, k \]
\[ \bar{X}_{..} = \frac{1}{k} \sum_{i=1}^{k} \bar{X}_{i.} = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n} X_{ij}}{kn} \]
Sum of Squares identity
ANOVA is the partitioning of total variability into component parts.
Total sum of squares (TSS): The TSS is defined as the sum of the squares of the deviations from the grand mean.
\[ TSS = \sum_{i=1}^{k} \sum_{j=1}^{n} (X_{ij} - \bar{X}_{..})^2 \]
It is a measure of the dispersion of all the variates about the grand mean. Its degrees of freedom (df) = kn − 1. It can be shown that the TSS, SS total or total variation can be partitioned into two parts:
\[ TSS = \sum_{i=1}^{k} \sum_{j=1}^{n} (X_{ij} - \bar{X}_{..})^2 = \underbrace{\sum_{i=1}^{k} \sum_{j=1}^{n} (X_{ij} - \bar{X}_{i.})^2}_{WSS} + \underbrace{n \sum_{i=1}^{k} (\bar{X}_{i.} - \bar{X}_{..})^2}_{BSS} \]
That is, total sum of squares = within-treatment sum of squares + between-treatment sum of squares.
Within Sum of Squares (WSS): WSS, or the sum of squares due to error (residual error), is defined as the sum of the squared deviations of X_ij (the original observations) from the treatment means. It represents the experimental error of the given experiment; its degrees of freedom are k(n − 1). It is also denoted by SSE.
Between Sum of Squares (BSS): It is defined as the sum of the squared deviations of the treatment means about the grand mean. The less the samples differ from each other, the smaller the BSS or treatment sum of squares (SSTr).
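For easy computation, we can use the following:
\[ TSS = \sum_{i=1}^{k} \sum_{j=1}^{n} (X_{ij} - \bar{X}_{..})^2 = \sum_{i=1}^{k} \sum_{j=1}^{n} X_{ij}^2 - \frac{T^2}{nk} \]
where
\[ T^2 = \Big[ \sum_{i=1}^{k} \sum_{j=1}^{n} X_{ij} \Big]^2 \]
\[ X_{i.} = \sum_{j=1}^{n} X_{ij}, \qquad \bar{X}_{i.} = \frac{X_{i.}}{n} \]
\[ X_{..} = \sum_{i=1}^{k} \sum_{j=1}^{n} X_{ij}, \qquad \bar{X}_{..} = \frac{X_{..}}{N} \]
where N = nk = total number of observations.
\[ BSS = n \sum_{i=1}^{k} (\bar{X}_{i.} - \bar{X}_{..})^2 = \frac{1}{n} \sum_{i=1}^{k} T_{i.}^2 - \frac{T^2}{nk} \]
where T_i. = sum of observations in treatment group i.
WSS = TSS – BSS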
One-way ANOVA Table (Equal observations)
Given the model X_ij = μ + α_i + ε_ij,
to test the hypothesis
H0: α1 = α2 = … = αk
H1: at least two αi's are not equal.
Test statistic: F from the ANOVA table below
ANOVA Table
Source               SS     df         MS                     F
Between treatments   BSS    k − 1      A = BSS/(k − 1)        A/B
Within treatments    WSS    k(n − 1)   B = WSS/[k(n − 1)]
The critical value is F(1−α), v1, v2, where the degrees of freedom are v1 = k − 1 and v2 = k(n − 1), and α is the significance level.
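Example 8.1
Given the following five treatments A, B, C, D and E with three observations each, perform an analysis of variance to test whether the treatment effects are the same or not, and compute the coefficient of variation to determine its precision, at the 5% level of significance.
A    B    C    D    E
3    5    7    6    4
2    8    8    8    9
4    8    6    7    5
Solution:
Test of Hypothesis
H0: α1 = α2 = … = α5
H1: at least two αi's are not equal.
               A    B    C    D    E
               3    5    7    6    4
               2    8    8    8    9
               4    8    6    7    5
Ti. (total)    9    21   21   21   18
X̄i. (mean)     3    7    7    7    6
Here n = 3, k = 5, T = 90 and the sum of the squared observations is 602, so
\[ TSS = \sum_{i=1}^{k} \sum_{j=1}^{n} X_{ij}^2 - \frac{T^2}{nk} = 602 - \frac{(90)^2}{3 \times 5} = 602 - 540 = 62 \]
\[ BSS = \frac{1}{n} \sum_{i=1}^{k} T_{i.}^2 - \frac{T^2}{nk} = \frac{1}{3} \left[ 9^2 + 21^2 + 21^2 + 21^2 + 18^2 \right] - \frac{(90)^2}{3 \times 5} = 576 - 540 = 36 \]
WSS = TSS – BSS = 62 – 36 = 26
ANOVA TABLE
Source of Variation   SS    df    MS             F
Between treatments    36    4     36/4 = 9       9/2.6 = 3.462
Within treatments     26    10    26/10 = 2.6
Total                 62    14
Test statistic = Fcal = 3.462
Critical value = F(1−α), 4, 10 = F0.95, 4, 10 = 3.48
Decision: Since Fcal (3.462) < Ftab (3.48), we do not reject H0 and conclude that the treatment mean effects in the five treatments are equal, that is, there is no significant difference between the treatment means in the five treatments at the 5% level.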
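The arithmetic of Example 8.1 can be reproduced with the computational formulas for TSS and BSS. The sketch below assumes equal group sizes and takes the critical value 3.48 from the F table as quoted in the example rather than computing it. The coefficient of variation is computed with the usual definition, 100 × √(within mean square) / grand mean, which is an assumption since the notes do not spell it out.

```python
import math


def one_way_anova(groups):
    """One-way ANOVA for k groups with n observations each."""
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    observations = [x for g in groups for x in g]
    grand_total = sum(observations)
    correction = grand_total ** 2 / (n * k)             # T^2 / nk
    tss = sum(x * x for x in observations) - correction
    bss = sum(sum(g) ** 2 for g in groups) / n - correction
    wss = tss - bss
    ms_between = bss / (k - 1)
    ms_within = wss / (k * (n - 1))
    grand_mean = grand_total / (n * k)
    return {
        "TSS": tss, "BSS": bss, "WSS": wss,
        "MSB": ms_between, "MSW": ms_within,
        "F": ms_between / ms_within,
        # Assumed definition of the coefficient of variation (percent).
        "CV": 100 * math.sqrt(ms_within) / grand_mean,
    }


treatments = {
    "A": [3, 2, 4], "B": [5, 8, 8], "C": [7, 8, 6],
    "D": [6, 8, 7], "E": [4, 9, 5],
}
table = one_way_anova(list(treatments.values()))
F_CRITICAL = 3.48  # tabulated F(0.95; 4, 10), as quoted in the example

print({name: round(value, 3) for name, value in table.items()})
print("reject H0" if table["F"] > F_CRITICAL else "do not reject H0")
```

Because the computed F falls just below the tabulated value, the decision is sensitive to rounding, which is a good reason to carry the F ratio to three decimal places.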
Statistical Hypothesis
The most frequent application of statistics is to test scientific hypotheses. Results of experiments and investigations are usually not clear-cut and, therefore, need statistical tests to support decisions between alternative hypotheses. A statistical test examines a set of sample data and, on the basis of an expected distribution of the data, leads to a decision on whether to accept the hypothesis or to reject that hypothesis and accept an alternative one. The nature of the tests varies with the data and the hypothesis, but the same general philosophy of hypothesis testing is common to all tests. A statistical hypothesis is an assumption or statement, which may or may not be true, concerning one or more populations.
A statistical hypothesis (or inference) is a statement about the parameters or form of a population. A test of a statistical hypothesis is a criterion which specifies for what sample results the hypothesis is to be accepted or rejected. The hypothesis which is to be tested is generally called the null hypothesis, denoted by H0, and the hypothesis against which it is to be tested is called the alternative hypothesis, denoted by H1.
Type I and Type II Errors
A type I error has been committed if we reject the null hypothesis when it is true, and a type II error has been committed if we accept the null hypothesis when it is false.
The following table summarizes the various situations that can arise when testing H0 against H1:
               Accept H0        Accept H1
H0 is true     No error         Type I error
H1 is true     Type II error    No error
The probabilities of committing type I and type II errors are written as α and β, respectively. α is called the size, or level of significance, of the test and (1 − β) is called the power of the test; (1 − β) is the probability of rejecting the null hypothesis (H0) when it is false. The region such that if the sample point falls in it we reject H0 is called the critical region. When the primary concern of a test is to see whether the null hypothesis can be rejected, such a test is called a test of significance. In that case, the quantity α is called the level of significance at which the test is being conducted.
One and Two Tailed Test
A test of any statistical hypothesis where the alternative is one-sided, such as:
H0: μ = μ0     or     H0: μ = μ0
H1: μ > μ0            H1: μ < μ0
is called a one-tailed test. The critical region for H1: μ > μ0 lies entirely in the right tail, while the critical region for H1: μ < μ0 lies entirely in the left tail.
A test of any statistical hypothesis where the alternative is two-sided, such as:
H0: μ = μ0
H1: μ ≠ μ0
is called a two-tailed test; values in both tails of the distribution constitute the critical region.
TEST PROCEDURE AND STEPS
The general steps involved in applying any test of significance are:
i. Identify the type of problem and the question to be answered.
ii. State the null hypothesis (H0) and the appropriate alternative hypothesis (H1).
iii. Select the appropriate test and calculate the test criterion based on the type of test.
iv. Fix the level of significance.
v. Decide, from the value of the test criterion, whether to reject or accept the hypothesis.
vi. Draw the conclusion (or inference) on the basis of the level of significance, deciding whether the difference observed is due to chance or due to some other known factors.
'P' Values
'P' values are used to assess the degree of dissimilarity between two or more sets of measurements, or between one set of measurements and a standard. The 'P' value is actually a probability, usually the probability of obtaining a result as extreme as or more extreme than the one observed if the dissimilarity is entirely due to variation in measurements or in subject response, that is, if it is the result of chance alone.
'P' values measure the strength of evidence in scientific studies by indicating the probability that a result at least as extreme as the one observed would occur by chance.
'P' values are derived from statistical tests that depend on the size and direction of the effect. 'P' values should be considered in making decisions about the usefulness of a treatment.
One popular approach is to indicate only that the 'P' value is smaller than 0.05 (P < 0.05) or smaller than 0.01 (P < 0.01). When the 'P' value is between 0.05 and 0.01, the result is usually called statistically significant; values less than 0.01 or 0.005 are taken to be very highly significant.
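To make the idea of "a result at least as extreme as the one observed" concrete, the sketch below computes a 'P' value for a test statistic Z that is assumed to follow the standard normal distribution when chance alone is at work. That distributional assumption and the function names are illustrative; the upper-tail area is obtained from the complementary error function `math.erfc` in the standard library.

```python
import math


def upper_tail_area(z):
    """P(Z >= z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def p_value(z, two_tailed=True):
    """Probability of a result at least as extreme as z under chance alone."""
    tail = upper_tail_area(abs(z))
    return 2 * tail if two_tailed else tail


def describe(p):
    """Label a 'P' value using the thresholds quoted above."""
    if p < 0.01:
        return "very highly significant"
    if p < 0.05:
        return "statistically significant"
    return "not significant"


for z in (1.0, 2.0, 3.0):
    p = p_value(z)
    print(z, round(p, 4), describe(p))
```

A two-tailed 'P' value doubles the single-tail area because departures in either direction count as "extreme".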
TESTS CONCERNING THE MEAN (FOR LARGE SAMPLE)
We will assume that the sampling distribution of the sample estimates will be approximately normal and that the variance is known. Hence, for large samples (n ≥ 30), we can use the normal probability distribution for testing a hypothesized value of the population mean.
The test statistic is
\[ Z = \frac{\bar{X} - \mu}{S.E.(\bar{X})} \]
where
X̄ is the sample mean
μ is the population mean
S.E.(X̄) is the standard error of the sample mean,
\[ S.E.(\bar{X}) = \frac{\sigma}{\sqrt{n}} \]
where
σ is the population standard deviation (usually known)
n is the sample size.
We then compare the modulus of Z, that is |Z|, to its tabulated value at the given level of significance, usually 5% or 1%. The corresponding values of Z for both one-tailed and two-tailed tests are tabulated below:
Level of significance    One-tailed    Two-tailed
5% (or 0.05)             1.64          1.96
1% (or 0.01)             2.33          2.58
Decision
i. If Z calculated is less than Z tabulated, then there is no reason to reject the null hypothesis H0.
ii. If Z calculated is more than Z tabulated, then we reject the null hypothesis H0 and accept the alternative hypothesis H1.
Example
A bottling company which bottles a soft drink claims that the liquid content is 35 cl with standard deviation 0.75 cl. A researcher randomly collects 50 bottles, measures their contents and gets a mean of 34.2 cl. Test at the 0.01 level of significance whether the bottling company has been cheating its consumers.
Solution
μ = 35 cl
σ = 0.75 cl
n = 50
X̄ = 34.2 cl
α = 0.01 (1%)
H0: μ = 35, that is, the company has not been cheating the consumers.
H1: μ < 35, that is, the company has been cheating the consumers.
The test statistic is
\[ Z = \frac{(\bar{X} - \mu)\sqrt{n}}{\sigma} = \frac{(34.2 - 35)\sqrt{50}}{0.75} = \frac{-0.8 \times 7.0711}{0.75} = -7.54 \]
Thus, |Z| = |−7.54| = 7.54
At the 0.01 level of significance the Z tabulated value (one-tailed) is 2.33.
Decision: The Z calculated value 7.54 is greater than the Z tabulated value 2.33. We reject H0 and accept H1.
Conclusion: There is a significant difference between the population and sample means. Hence, the bottling company has been cheating its consumers.
Example
The mean height from a random sample of size 100 is 64 cm. The standard deviation is known to be 3 cm. Test the statement that the mean height of the population is 67 cm at the 5% level of significance.
Solution
X̄ = 64 cm
σ = 3 cm
μ = 67 cm
n = 100
α = 0.05 level of significance
H0: μ = 67 cm
H1: μ ≠ 67 cm
Test statistic
\[ Z = \frac{(\bar{X} - \mu)\sqrt{n}}{\sigma} = \frac{(64 - 67)\sqrt{100}}{3} = -10 \]
Thus, |Z| = |−10| = 10
At the 0.05 level of significance the Z tabulated value (two-tailed) is 1.96.
Decision: Since Zcal > Ztab, we reject H0 and accept H1.
Conclusion: The mean height of the population could not be 67 cm.
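Both large-sample examples follow the same recipe, which the sketch below encodes. The function names are illustrative, and the critical values are the tabulated ones from this section, hard-coded rather than computed.

```python
import math

# Tabulated critical values of Z: (tails, alpha) -> value.
Z_TABLE = {
    ("one", 0.05): 1.64, ("two", 0.05): 1.96,
    ("one", 0.01): 2.33, ("two", 0.01): 2.58,
}


def z_statistic(sample_mean, mu0, sigma, n):
    """Z = (sample mean - hypothesised mean) * sqrt(n) / sigma."""
    return (sample_mean - mu0) * math.sqrt(n) / sigma


def reject_null(z, tails, alpha):
    """Reject H0 when |Z| exceeds the tabulated critical value."""
    return abs(z) > Z_TABLE[(tails, alpha)]


# Bottling example: one-tailed test at the 1% level.
z_bottles = z_statistic(34.2, 35, 0.75, 50)
print(round(z_bottles, 2), reject_null(z_bottles, "one", 0.01))

# Height example: two-tailed test at the 5% level.
z_height = z_statistic(64, 67, 3, 100)
print(round(z_height, 2), reject_null(z_height, "two", 0.05))
```

Comparing |Z| with the table follows the decision rule in the notes; for a one-tailed test the sign of Z should also point in the direction stated by H1, as it does in the bottling example.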
TEST CONCERNING THE MEANS (SMALL SAMPLES)
There are situations in real-life experiments, such as testing the efficacy of a newly produced drug, where it is impracticable to get a large sample and yet tests of significance still have to be carried out. When we do not know the value of the population standard deviation and the sample size is small (n < 30), we shall assume again that the population we are sampling from has roughly the shape of a normal distribution. The test statistic is:
\[ t = \frac{\bar{X} - \mu}{S / \sqrt{n}} = \frac{(\bar{X} - \mu)\sqrt{n}}{S} \]
whose sampling distribution is the t distribution with n − 1 degrees of freedom. S is the sample standard deviation. As with large samples, we compare it with its tabulated value at a given level of significance, and then draw our conclusions.
Example
Suppose that we want to test, on the basis of a random sample of size n = 5, whether or not the fat content of a certain kind of ice cream exceeds 12 percent. What can we conclude about the null hypothesis μ = 12 percent at the 0.01 level of significance, if the sample has mean X̄ = 12.7 percent and standard deviation S = 0.38 percent?
Solution
Hypothesis
H0: μ = 12%
H1: μ > 12%
α = 0.01
n = 5
d.f. = n – 1 = 4, so the tabulated value is t0.01, 4
Test statistic
\[ t = \frac{\bar{X} - \mu}{S / \sqrt{n}} = \frac{12.7 - 12}{0.38 / \sqrt{5}} = \frac{0.7}{0.1699} = 4.12 \]
t0.01, 4 = 3.747
Decision: Since tcal (4.12) > ttab (3.747), we reject H0.
Conclusion: Therefore, the fat content of the given kind of ice cream exceeds 12 percent.
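The ice cream example can be checked numerically. In this sketch the function name is illustrative and the one-tailed critical value for 4 degrees of freedom at the 0.01 level, 3.747, is hard-coded from standard t tables rather than computed.

```python
import math


def t_statistic(sample_mean, mu0, sample_sd, n):
    """t = (sample mean - hypothesised mean) / (S / sqrt(n))."""
    return (sample_mean - mu0) / (sample_sd / math.sqrt(n))


T_CRITICAL_001_DF4 = 3.747  # one-tailed t(0.01, 4) from standard tables

t_cal = t_statistic(12.7, 12, 0.38, 5)
print(round(t_cal, 2))
print("reject H0" if t_cal > T_CRITICAL_001_DF4 else "do not reject H0")
```

Because H1 is one-sided (μ > 12), the signed value of t is compared with the upper-tail critical value rather than |t|.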
Example
The lifetimes of electric bulbs for a random sample of 10 from a large consignment gave the following data:
Item    Life in 1,000 hrs (X)    X − X̄    (X − X̄)²
1       4.2                      −0.1     0.01
2       4.0                      −0.3     0.09
3       3.9                      −0.4     0.16
4       4.1                      −0.2     0.04
5       5.2                      0.9      0.81
6       3.8                      −0.5     0.25
7       3.9                      −0.4     0.16
8       4.3                      0        0
9       4.4                      0.1      0.01
10      5.6                      1.3      1.69
Can we accept the hypothesis that the average lifetime of bulbs is 4,000 hours at the 5% level of significance?
Solution:
Hypothesis
H0: μ = 4,000 hours
H1: μ ≠ 4,000 hours
α = 0.05 level of significance
Since n = 10, d.f. = n − 1 = 10 – 1 = 9, and the tabulated value for the two-tailed test is
\[ t_{\alpha/2,\,(n-1)} = t_{0.025,\,9} \]
\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{4.2 + 4.0 + \cdots + 5.6}{10} = \frac{43.4}{10} \]
X̄ ≈ 4.3 (to one decimal place, as used in the table)
\[ S^2 = \frac{\sum_{i=1}^{10} (X_i - \bar{X})^2}{n - 1} \]