
Statistics and Probability
for Engineering Applications
With Microsoft Excel
by
W. J. DeCoursey
College of Engineering,
University of Saskatchewan
Saskatoon
Amsterdam Boston London New York Oxford Paris
San Diego San Francisco Singapore Sydney Tokyo
Newnes is an imprint of Elsevier Science.
Copyright 2003, Elsevier Science (USA). All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or
transmitted in any form or by any means, electronic, mechanical, photocopying,
recording, or otherwise, without the prior written permission of the
publisher.
Recognizing the importance of preserving what has been written,
Elsevier Science prints its books on acid-free paper whenever possible.
Library of Congress Cataloging-in-Publication Data
ISBN: 0-7506-7618-3
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
The publisher offers special discounts on bulk orders of this book.
For information, please contact:
Manager of Special Sales
Elsevier Science
225 Wildwood Avenue
Woburn, MA 01801-2041
Tel: 781-904-2500
Fax: 781-904-2620
For information on all Newnes publications available, contact our World Wide
Web home page at: http://www.newnespress.com
10 9 8 7 6 5 4 3 2 1
Printed in the United States of America
Contents
Preface
What's on the CD-ROM?
List of Symbols
1. Introduction: Probability and Statistics
    1.1 Some Important Terms
    1.2 What does this book contain?
2. Basic Probability
    2.1 Fundamental Concepts
    2.2 Basic Rules of Combining Probabilities
        2.2.1 Addition Rule
        2.2.2 Multiplication Rule
    2.3 Permutations and Combinations
    2.4 More Complex Problems: Bayes' Rule
3. Descriptive Statistics: Summary Numbers
    3.1 Central Location
    3.2 Variability or Spread of the Data
    3.3 Quartiles, Deciles, Percentiles, and Quantiles
    3.4 Using a Computer to Calculate Summary Numbers
4. Grouped Frequencies and Graphical Descriptions
    4.1 Stem-and-Leaf Displays
    4.2 Box Plots
    4.3 Frequency Graphs of Discrete Data
    4.4 Continuous Data: Grouped Frequency
    4.5 Use of Computers
5. Probability Distributions of Discrete Variables
    5.1 Probability Functions and Distribution Functions
        (a) Probability Functions
        (b) Cumulative Distribution Functions
    5.2 Expectation and Variance
        (a) Expectation of a Random Variable
        (b) Variance of a Discrete Random Variable
        (c) More Complex Problems
    5.3 Binomial Distribution
        (a) Illustration of the Binomial Distribution
        (b) Generalization of Results
        (c) Application of the Binomial Distribution
        (d) Shape of the Binomial Distribution
        (e) Expected Mean and Standard Deviation
        (f) Use of Computers
        (g) Relation of Proportion to the Binomial Distribution
        (h) Nested Binomial Distributions
        (i) Extension: Multinomial Distributions
    5.4 Poisson Distribution
        (a) Calculation of Poisson Probabilities
        (b) Mean and Variance for the Poisson Distribution
        (c) Approximation to the Binomial Distribution
        (d) Use of Computers
    5.5 Extension: Other Discrete Distributions
    5.6 Relation Between Probability Distributions and Frequency Distributions
        (a) Comparisons of a Probability Distribution with Corresponding Simulated Frequency Distributions
        (b) Fitting a Binomial Distribution
        (c) Fitting a Poisson Distribution
6. Probability Distributions of Continuous Variables
    6.1 Probability from the Probability Density Function
    6.2 Expected Value and Variance
    6.3 Extension: Useful Continuous Distributions
    6.4 Extension: Reliability
7. The Normal Distribution
    7.1 Characteristics
    7.2 Probability from the Probability Density Function
    7.3 Using Tables for the Normal Distribution
    7.4 Using the Computer
    7.5 Fitting the Normal Distribution to Frequency Data
    7.6 Normal Approximation to a Binomial Distribution
    7.7 Fitting the Normal Distribution to Cumulative Frequency Data
    7.8 Transformation of Variables to Give a Normal Distribution
8. Sampling and Combination of Variables
    8.1 Sampling
    8.2 Linear Combination of Independent Variables
    8.3 Variance of Sample Means
    8.4 Shape of Distribution of Sample Means: Central Limit Theorem
9. Statistical Inferences for the Mean
    9.1 Inferences for the Mean when Variance Is Known
        9.1.1 Test of Hypothesis
        9.1.2 Confidence Interval
    9.2 Inferences for the Mean when Variance Is Estimated from a Sample
        9.2.1 Confidence Interval Using the t-distribution
        9.2.2 Test of Significance: Comparing a Sample Mean to a Population Mean
        9.2.3 Comparison of Sample Means Using Unpaired Samples
        9.2.4 Comparison of Paired Samples
10. Statistical Inferences for Variance and Proportion
    10.1 Inferences for Variance
        10.1.1 Comparing a Sample Variance with a Population Variance
        10.1.2 Comparing Two Sample Variances
    10.2 Inferences for Proportion
        10.2.1 Proportion and the Binomial Distribution
        10.2.2 Test of Hypothesis for Proportion
        10.2.3 Confidence Interval for Proportion
        10.2.4 Extension
11. Introduction to Design of Experiments
    11.1 Experimentation vs. Use of Routine Operating Data
    11.2 Scale of Experimentation
    11.3 One-factor-at-a-time vs. Factorial Design
    11.4 Replication
    11.5 Bias Due to Interfering Factors
        (a) Some Examples of Interfering Factors
        (b) Preventing Bias by Randomization
        (c) Obtaining Random Numbers Using Excel
        (d) Preventing Bias by Blocking
    11.6 Fractional Factorial Designs
12. Introduction to Analysis of Variance
    12.1 One-way Analysis of Variance
    12.2 Two-way Analysis of Variance
    12.3 Analysis of Randomized Block Design
    12.4 Concluding Remarks
13. Chi-squared Test for Frequency Distributions
    13.1 Calculation of the Chi-squared Function
    13.2 Case of Equal Probabilities
    13.3 Goodness of Fit
    13.4 Contingency Tables
14. Regression and Correlation
    14.1 Simple Linear Regression
    14.2 Assumptions and Graphical Checks
    14.3 Statistical Inferences
    14.4 Other Forms with Single Input or Regressor
    14.5 Correlation
    14.6 Extension: Introduction to Multiple Linear Regression
15. Sources of Further Information
    15.1 Useful Reference Books
    15.2 List of Selected References
Appendices
    Appendix A: Tables
    Appendix B: Some Properties of Excel Useful During the Learning Process
    Appendix C: Functions Useful Once the Fundamentals Are Understood
    Appendix D: Answers to Some of the Problems
Engineering Problem-Solver Index
Index
Preface
This book has been written to meet the needs of two different groups of readers. On
one hand, it is suitable for practicing engineers in industry who need a better
understanding or a practical review of probability and statistics. On the other hand, this
book is eminently suitable as a textbook on statistics and probability for engineering
students.

Areas of practical knowledge based on the fundamentals of probability and
statistics are developed using a logical and understandable approach which appeals to
the reader's experience and previous knowledge rather than to rigorous mathematical
development. The only prerequisites for this book are a good knowledge of algebra
and a first course in calculus. The book includes many solved problems showing
applications in all branches of engineering, and the reader should pay close attention
to them in each section. The book can be used profitably either for private study or in
a class.

Some material in earlier chapters is needed when the reader comes to some of the
later sections of this book. Chapter 1 is a brief introduction to probability and
statistics and their treatment in this work. Sections 2.1 and 2.2 of Chapter 2 on Basic
Probability present topics that provide a foundation for later development, and so do
sections 3.1 and 3.2 of Chapter 3 on Descriptive Statistics. Section 4.4, which
discusses representing data for a continuous variable in the form of grouped
frequency tables and their graphical equivalents, is used frequently in later chapters.
Mathematical expectation and the variance of a random variable are introduced in
section 5.2. The normal distribution is discussed in Chapter 7 and used extensively in
later discussions. The standard error of the mean and the Central Limit Theorem of
Chapter 8 are important topics for later chapters. Chapter 9 develops the very useful
ideas of statistical inference, and these are applied further in the rest of the book. A
short statement of prerequisites is given at the beginning of each chapter, and the
reader is advised to make sure that he or she is familiar with the prerequisite material.

This book contains more than enough material for a one-semester or one-quarter
course for engineering students, so an instructor can choose which topics to include.
Sections on use of the computer can be left for later individual study or class study if
so desired, but readers will find these sections using Excel very useful. In my opinion
a course on probability and statistics for undergraduate engineering students should
include at least the following topics: introduction (Chapter 1), basic probability
(sections 2.1 and 2.2), descriptive statistics (sections 3.1 and 3.2), grouped frequency
(section 4.4), basics of random variables (sections 5.1 and 5.2), the binomial
distribution (section 5.3) (not absolutely essential), the normal distribution (sections 7.1, 7.2,
7.3), variance of sample means and the Central Limit Theorem (from Chapter 8),
statistical inferences for the mean (Chapter 9), and regression and correlation (from
Chapter 14). A number of other topics are very desirable, but the instructor or reader
can choose among them.

It is a pleasure to thank a number of people who have made contributions to this
book in one way or another. The book grew out of teaching a section of a general
engineering course at the University of Saskatchewan in Saskatoon, and my approach
was affected by discussions with the other instructors. Many of the examples and the
problems for readers to solve were first suggested by colleagues, including Roy
Billinton, Bill Stolte, Richard Burton, Don Norum, Ernie Barber, Madan Gupta,
George Sofko, Dennis O'Shaughnessy, Mo Sachdev, Joe Mathews, Victor Pollak,
A. B. Bhattacharya, and D. R. Budney. Discussions with Dennis O'Shaughnessy have
been helpful in clarifying my ideas concerning the paired t-test and blocking.
Example 7.11 is based on measurements done by Richard Evitts. Colleagues were
very generous in reading and commenting on drafts of various chapters of the book;
these include Bill Stolte, Don Norum, Shehab Sokhansanj, and particularly Richard
Burton. Bill Stolte has provided useful comments after using preliminary versions of
the book in class. Karen Burlock typed the first version of Chapter 7. I thank all of
these for their contributions. Whatever errors remain in the book are, of course, my
own responsibility.

I am grateful to my editor, Carol S. Lewis, for all her contributions in preparing
this book for publication. Thank you, Carol!

W. J. DeCoursey
Department of Chemical Engineering
College of Engineering
University of Saskatchewan
Saskatoon, SK, Canada
S7N 5A9
What's on the CD-ROM?
Included on the accompanying CD-ROM:
• a fully searchable eBook version of the text in Adobe pdf form
• data sets to accompany the examples in the text
• in the "Extras" folder, useful statistical software tools developed by the
  Statistical Engineering Division, National Institute of Standards and
  Technology (NIST). Once again, you are cautioned not to apply any
  technique blindly without first understanding its assumptions, limitations, and
  area of application.
Refer to the Read-Me file on the CD-ROM for more detailed information on
these files and applications.
List of Symbols
$\bar{A}$ or $A'$    complement of $A$
$A \cap B$    intersection of $A$ and $B$
$A \cup B$    union of $A$ and $B$
$B \mid A$    conditional probability
$E(X)$    expectation of random variable $X$
$f(x)$    probability density function
$f_i$    frequency of result $x_i$
$i$    order number
$n$    number of trials
${}_nC_r$    number of combinations of $n$ items taken $r$ at a time
${}_nP_r$    number of permutations of $n$ items taken $r$ at a time
$p$    probability of "success" in a single trial
$\hat{p}$    estimated proportion
$p(x_i)$    probability of result $x_i$
$\Pr[\ldots]$    probability of stated outcome or event
$q$    probability of "no success" in a single trial
$Q(f)$    quantile larger than a fraction $f$ of a distribution
$s$    estimate of standard deviation from a sample
$s^2$    estimate of variance from a sample
$s_c^2$    combined or pooled estimate of variance
$s_{y|x}^2$    estimated variance around a regression line
$t$    interval of time or space. Also the independent variable of the
       t-distribution.
$X$ (capital letter)    a random variable
$x$ (lower case)    a particular value of a random variable
$\bar{x}$    arithmetic mean or mean of a sample
$z$    ratio between $(x - \mu)$ and $\sigma$ for the normal distribution
$\alpha$    regression coefficient
$\beta$    regression coefficient
$\lambda$    mean rate of occurrence per unit time or space
$\mu$    mean of a population
$\sigma$    standard deviation of population
$\sigma_{\bar{x}}$    standard error of the mean
$\sigma^2$    variance of population
CHAPTER 1
Introduction: Probability and Statistics
Probability and statistics are concerned with events which occur by chance. Examples
include occurrence of accidents, errors of measurements, production of defective and
nondefective items from a production line, and various games of chance, such as
drawing a card from a well-mixed deck, flipping a coin, or throwing a symmetrical
six-sided die. In each case we may have some knowledge of the likelihood of various
possible results, but we cannot predict with any certainty the outcome of any
particular trial. Probability and statistics are used throughout engineering. In electrical
engineering, signals and noise are analyzed by means of probability theory. Civil,
mechanical, and industrial engineers use statistics and probability to test and account
for variations in materials and goods. Chemical engineers use probability and
statistics to assess experimental data and control and improve chemical processes. It is
essential for today's engineer to master these tools.
1.1 Some Important Terms
(a) Probability is an area of study which involves predicting the relative
likelihood of various outcomes. It is a mathematical area which has developed
over the past three or four centuries. One of the early uses was to calculate
the odds of various gambling games. Its usefulness for describing errors of
scientific and engineering measurements was soon realized. Engineers study
probability for its many practical uses, ranging from quality control and
quality assurance to communication theory in electrical engineering.
Engineering measurements are often analyzed using statistics, as we shall see
later in this book, and a good knowledge of probability is needed in order to
understand statistics.
(b) Statistics is a word with a variety of meanings. To the man in the street it most
often means simply a collection of numbers, such as the number of people
living in a country or city, a stock exchange index, or the rate of inflation.
These all come under the heading of descriptive statistics, in which items are
counted or measured and the results are combined in various ways to give
useful results. That type of statistics certainly has its uses in engineering, and
we will deal with it later, but another type of statistics will engage our
attention in this book to a much greater extent. That is inferential statistics or
statistical inference. For example, it is often not practical to measure all the
items produced by a process. Instead, we very frequently take a sample and
measure the relevant quantity on each member of the sample. We infer
something about all the items of interest from our knowledge of the sample.
A particular characteristic of all the items we are interested in constitutes a
population. Measurements of the diameter of all possible bolts as they come
off a production process would make up a particular population. A sample is
a chosen part of the population in question, say the measured diameters of
twelve bolts chosen to be representative of all the bolts made under certain
conditions. We need to know how reliable the information inferred about
the population is on the basis of our measurements of the sample. Perhaps we
can say that "nineteen times out of twenty" the error will be less than a
certain stated limit.
(c) Chance is a necessary part of any process to be described by probability
or statistics. Sometimes that element of chance is due partly or even perhaps
entirely to our lack of knowledge of the details of the process. For example,
if we had complete knowledge of the composition of every part of the raw
materials used to make bolts, and of the physical processes and conditions in
their manufacture, in principle we could predict the diameter of each bolt.
But in practice we generally lack that complete knowledge, so the diameter
of the next bolt to be produced is an unknown quantity described by a
random variation. Under these conditions the distribution of diameters can be
described by probability and statistics. If we want to improve the quality of
those bolts and to make them more uniform, we will have to look into the
causes of the variation and make changes in the raw materials or the
production process. But even after that, there will very likely be a random variation
in diameter that can be described statistically.

Relations which involve chance are called probabilistic or stochastic
relations. These are contrasted with deterministic relations, in which there is no
element of chance. For example, Ohm's Law and Newton's Second Law
involve no element of chance, so they are deterministic. However,
measurements based on either of these laws do involve elements of chance, so
relations between the measured quantities are probabilistic.
(d) Another term which requires some discussion is randomness. A random
action cannot be predicted and so is due to chance. A random sample is one
in which every member of the population has an equal likelihood of
appearing. Just which items appear in the sample is determined completely by
chance. If some items are more likely to appear in the sample than others,
then the sample is not random.
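
The ideas of population, sample, and random sampling in (b) and (d) can be illustrated with a short computer sketch. The one below is written in Python rather than Excel purely for illustration. The population of 10,000 bolt diameters, its mean of 10.0 mm, and its standard deviation of 0.05 mm are hypothetical values, not data from this book. The sketch draws a random sample of twelve bolts, each bolt being equally likely to be chosen, and compares the sample mean with the population mean.

```python
import random
import statistics

# Hypothetical population: diameters (mm) of 10,000 bolts. The mean of 10.0 mm
# and standard deviation of 0.05 mm are made-up values for illustration only.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [rng.gauss(10.0, 0.05) for _ in range(10_000)]

# A random sample: every bolt in the population is equally likely to be chosen.
sample = rng.sample(population, 12)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.4f} mm")
print(f"sample mean:     {sample_mean:.4f} mm")
print(f"difference:      {sample_mean - population_mean:+.4f} mm")
```

Changing the seed gives a different sample and a different sample mean. Describing how large that difference is likely to be is the task of statistical inference.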
1.2 What does this book contain?
We will start with the basics of probability and then cover descriptive statistics. Then
various probability distributions will be investigated. The second half of the book
will be concerned mostly with statistical inference, including relations between two
or more variables, and there will be introductory chapters on design and analysis of
experiments. Solved problem examples and problems for the reader to solve will be
important throughout the book. The great majority of the problems are directly
applied to engineering, involving many different branches of engineering. They show
how statistics and probability can be applied by professional engineers.
Some books on probability and statistics use rigorous definitions and many
derivations. Experience of teaching probability and statistics to engineering students has led
the writer of this book to the opinion that a rigorous approach is not the best plan.
Therefore, this book approaches probability and statistics without great mathematical
rigor. Each new concept is described clearly but briefly in an introductory section. In a
number of cases a new concept can be made more understandable by relating it to
previous topics. Then the focus shifts to examples. The reader is presented with
carefully chosen examples to deepen his or her understanding, both of the basic ideas and
of how they are used. In a few cases mathematical derivations are presented. This is
done where, in the opinion of the author, the derivations help the reader to understand
the concepts or their limits of usefulness. In some other cases relationships are verified
by numerical examples. In still others there are no derivations or verifications, but the
reader's confidence is built by comparisons with other relationships or with everyday
experience. The aim of this book is to help develop in the reader's mind a clear
understanding of the ideas of probability and statistics and of the ways in which they are
used in practice. The reader must keep the assumptions of each calculation clearly in
mind as he or she works through the problems. As in many other areas of engineering,
it is essential for the reader to do many problems and to understand them thoroughly.
This book includes a number of computer examples and computer exercises
which can be done using Microsoft Excel. Computer exercises are included
because statistical calculations from experimental data usually require many repetitive
calculations. The digital computer is well suited to this situation. Therefore a book
on probability and statistics would be incomplete nowadays if it did not include
exercises to be done using a computer. The use of computers for statistical
calculations is introduced in sections 3.4 and 4.5.
There is a danger, however, that the reader may obtain only an incomplete
understanding of probability and statistics if the fundamentals are neglected in favor
of extensive computer exercises. The reader should certainly perform several of the
more basic problems in each section before doing the ones which are marked as
computer problems. Of course, even the more basic problems can be performed using
a spreadsheet rather than a pocket calculator, and that is often desirable. Even if a
spreadsheet is used, some of the simpler problems which do not require repetitive
calculations should be done first. The computer problems are intended to help the
reader apply the fundamental ideas in conjunction with the computer: they are not
"black-box" problems for which the computer (really that means the original
programmer) does the thinking. The strong advice of many generations of engineering
instructors applies here: always show your work!
Microsoft Excel has been chosen as the software to be used with this book for two
reasons. First, Excel is used as a general spreadsheet by many engineers and
engineering students. Thus, many readers of this book will already be familiar with Excel,
so very little further time will be required for them to learn to apply Excel to
probability and statistics. On the other hand, the reader who is not already familiar with
Excel will find that the modest investment of time required to become reasonably
adept at Excel will pay dividends in other areas of engineering. Excel is a very
useful tool.
The second reason for choosing to use Excel in this book is that current versions
of Excel include a good number of special functions for probability and statistics.
Version 4.0 and later versions give at least fifty functions in the Statistical category,
and we will find many of them useful in connection with this book. Some of these
functions give probabilities for various situations, while others help to summarize
masses of data, and still others take the place of statistical tables. The reader is
warned, however, that some of these special functions fall in the category of
"black-box" solutions and so are not useful until the reader understands the fundamentals
thoroughly.
Although the various versions of Excel all contain tools for performing
calculations for probability and statistics, some of the detailed procedures have been
modified from one version to the next. The detailed procedures in this book are
generally compatible with Excel 2000. Thus, if a reader is using a different version,
some modifications will likely be needed. However, those modifications will not
usually be very difficult.
Some sections of the book have been labelled as Extensions. These are very brief
sections which introduce related topics not covered in detail in the present volume. For
example, the binomial distribution of section 5.3 is covered in detail, but subsection
5.3(i) is a brief extension to the multinomial distribution.
The book includes a large number of engineering applications among the solved
problems and problems for the reader to solve. Thus, Chapter 5 contains applications
of the binomial distribution to some sampling schemes for quality control, and
Chapters 7 and 9 contain applications of the normal distribution to such continuous
variables as burning time for electric lamps before failure, strength of steel bars, and
pH of solutions in chemical processes. Chapter 14 includes examples touching on the
relationship between the shear resistance of soils and normal stress.
The general plan of the book is as follows. We will start with the basics of
probability and then descriptive statistics. Then various probability distributions will
be investigated. The second half of the book will be concerned mostly with statistical
inference, including relations between two or more variables, and there will be
introductory chapters on design and analysis of experiments. Solved problem
examples and problems for the reader to solve will be important throughout the book.
A preliminary version of this book appeared in 1997 and has been used in
second- and third-year courses for students in several branches of engineering at the
University of Saskatchewan for five years. Some revisions and corrections were made
each year in the light of comments from instructors and the results of a questionnaire
for students. More complete revisions of the text, including upgrading the references
for Excel to Excel 2000, were performed in 2000-2001 and 2002.
CHAPTER 2
Basic Probability
Prerequisite: A good knowledge of algebra.
In this chapter we examine the basic ideas and approaches to probability and its
calculation. We look at calculating the probabilities of combined events. Under some
circumstances probabilities can be found by using counting theory involving
permutations and combinations. The same ideas can be applied to somewhat more complex
situations, some of which will be examined in this chapter.
2.1 Fundamental Concepts
(a) Probability as a specific term is a measure of the likelihood that a particular
event will occur. Just how likely is it that the outcome of a trial will meet a
particular requirement? If we are certain that an event will occur, its probability
is 1 or 100%. If it certainly will not occur, its probability is zero. The first
situation corresponds to an event which occurs in every trial, whereas the second
corresponds to an event which never occurs. At this point we might be tempted to
say that probability is given by relative frequency, the fraction of all the trials in a
particular experiment that give an outcome meeting the stated requirements. But
in general that would not be right. Why? Because the outcome of each trial is
determined by chance. Say we toss a fair coin, one which is just as likely to give
heads as tails. It is entirely possible that six tosses of the coin would give six
heads or six tails, or anything in between, so the relative frequency of heads
would vary from zero to one. If it is just as likely that an event will occur as that
it will not occur, its true probability is 0.5 or 50%. But the experiment might
well result in relative frequencies all the way from zero to one. Then the relative
frequency from a small number of trials gives a very unreliable indication of
probability. In section 5.3 we will see how to make more quantitative
calculations concerning the probabilities of various outcomes when coins are tossed
randomly or similar trials are made. If we were able to make an infinite number
of trials, then probability would indeed be given by the relative frequency of the
event.

As an illustration, suppose the weather man on TV says that for a particular
region the probability of precipitation tomorrow is 40%. Let us consider 100
days which have the same set of relevant conditions as prevailed at the time of
the forecast. According to the prediction, precipitation the next day would occur
at any point in the region in about 40 of the 100 trials. (This is what the weather
man predicts, but we all know that the weather man is not always right!)
(b) Although we cannot make an infinite number of trials, in practice we can make a
moderate number of trials, and that will give some useful information. The
relative frequency of a particular event, or the proportion of trials giving
outcomes which meet certain requirements, will give an estimate of the probability
of that event. The larger the number of trials, the more reliable that estimate will
be. This is the empirical or frequency approach to probability. (Remember that
"empirical" means based on observation or experience.)
Example 2.1
260 bolts are examined as they are produced. Five of them are found to be defective.
On the basis of this information, estimate the probability that a bolt will be defective.
Answer: The probability of a defective bolt is approximately equal to the relative
frequency, which is 5 / 260 = 0.019.
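
The frequency approach, and the earlier caution about small numbers of trials, can be checked numerically. The sketch below is written in Python rather than Excel purely for illustration. It first repeats the estimate of Example 2.1, then simulates tosses of a fair coin for increasing numbers of trials. The function name `relative_frequency`, the seed, and the chosen numbers of trials are assumptions made for this illustration.

```python
import random


def relative_frequency(n_trials, p_event=0.5, seed=0):
    """Simulate n_trials independent trials and return the fraction in which
    an event of true probability p_event occurred."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    hits = sum(1 for _ in range(n_trials) if rng.random() < p_event)
    return hits / n_trials


# Example 2.1: estimate of probability from an observed relative frequency.
defective, examined = 5, 260
estimate = defective / examined
print(round(estimate, 3))  # 0.019

# Relative frequency of heads for a fair coin (true probability 0.5).
for n in (6, 60, 600, 6000, 60000):
    print(n, relative_frequency(n))
```

With only six tosses the relative frequency can be far from 0.5. As the number of trials grows it settles close to the true probability, which is the sense in which a larger number of trials gives a more reliable estimate.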
(c) Another type of probability is the subjective estimate, based on a person's
experience. To illustrate this, say a geological engineer examines extensive
geological information on a particular property. He chooses the best site to drill
an oil well, and he states that on the basis of his previous experience he estimates
that the probability the well will be successful is 30%. (Another experienced
geological engineer using the same information might well come to a different
estimate.) This, then, is a subjective estimate of probability. The executives of the
company can use this estimate to decide whether to drill the well.
(d) Athirdapproachispossibleincertaincases.Thisincludesvariousgambling
games,suchastossinganunbiasedcoin;drawingacoloredballfromanumber
ofballs,identicalexceptforcolor,whichareputintoabagandthoroughly
mixed;throwinganunbiaseddie;ordrawingacardfromawell-shuffleddeckof
cards.Ineachofthesecaseswecansaybeforethetrialthatanumberofpossible
resultsareequallylikely.Thisistheclassicaloraprioriapproach.Thephrase
aprioricomesfromLatinwordsmeaningcomingfromwhatwasknown
before.Thisapproachisoftensimpletovisualize,sogivingabetterunderstand-
ingofprobability.Insomecasesitcanbeapplieddirectlyinengineering.
Example 2.2
Three nuts with metric threads have been accidentally mixed with twelve nuts
with U.S. threads. To a person taking nuts from a bucket, all fifteen nuts seem
to be the same. One nut is chosen randomly. What is the probability that it will
be metric?
Answer: There are fifteen ways of choosing one nut, and they are equally likely.
Three of these equally likely outcomes give a metric nut. Then the probability
of choosing a metric nut must be 3/15, or 20%.
Example 2.3
Two fair coins are tossed. What is the probability of getting one heads and one
tails?
Answer: For a fair or unbiased coin, for each toss of each coin
Pr[heads] = Pr[tails] = 1/2
This assumes that all other possibilities are excluded: if a coin is lost that
toss will be eliminated. The possibility that a coin will stand on edge after
tossing can be neglected.
There are two possible results of tossing the first coin. These are heads (H)
and tails (T), and they are equally likely. Whether the result of tossing the
first coin is heads or tails, there are two possible results of tossing the
second coin. Again, these are heads (H) and tails (T), and they are equally
likely. The possible outcomes of tossing the two coins are HH, HT, TH, and TT.
Since the results H and T for the first coin are equally likely, and the results
H and T for the second coin are equally likely, the four outcomes of tossing the
two coins must be equally likely. These relationships are conveniently
summarized in the following tree diagram, Figure 2.1, in which each branch point
(or node) represents a point of decision where two or more results are possible.
[Figure 2.1: Simple Tree Diagram. The first coin branches to H or T, each with
Pr = 1/2; each of these branches splits again for the second coin into H or T,
each with Pr = 1/2. The four outcomes are HH, HT, TH, and TT.]
Since there are four equally likely outcomes, the probability of each is 1/4.
Both HT and TH correspond to getting one heads and one tails, so two of the four
equally likely outcomes give this result. Then the probability of getting one
heads and one tails must be 2/4 = 1/2 or 0.5.
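The classical approach amounts to counting equally likely outcomes, which is easy to check by enumeration. The sketch below is illustrative only: the helper name classical_probability and the list representations are assumptions made for this example. It lists the four outcomes of Example 2.3 and also repeats the count of Example 2.2.

```python
from fractions import Fraction
from itertools import product


def classical_probability(event, outcomes):
    """Favourable outcomes divided by all outcomes, assumed equally likely."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))


# Example 2.3: every outcome of tossing two fair coins.
outcomes = ["".join(toss) for toss in product("HT", repeat=2)]
print(outcomes)

# Event: exactly one head (and therefore exactly one tail).
one_of_each = classical_probability(lambda o: o.count("H") == 1, outcomes)
print(one_of_each)

# Example 2.2: three metric nuts mixed with twelve U.S. nuts.
nuts = ["metric"] * 3 + ["US"] * 12
pr_metric = classical_probability(lambda nut: nut == "metric", nuts)
print(pr_metric, float(pr_metric))
```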
In the study of probability an event is a set of possible outcomes which meets
stated requirements. If a six-sided cube (called a die) is tossed, we define the
outcome as the number of dots on the face which is upward when the die comes to
rest. The possible outcomes are 1, 2, 3, 4, 5, and 6. We might call each of
these outcomes a separate event: for example, the number of dots on the upturned
face is 5. On the other hand, we might choose an event as those outcomes which
are even, or those evenly divisible by three. In Example 2.3 the event of
interest is getting one heads and one tails from the toss of two fair coins.
(e) Remember that the probability of an event which is certain is 1, and the
probability of an impossible event is 0. Then no probability can be more than 1
or less than 0. If we calculate a probability and obtain a result less than 0 or
greater than 1, we know we must have made a mistake. If we can write down
probabilities for all possible results, the sum of all these probabilities must
be 1, and this should be used as a check whenever possible.
Sometimes some basic requirements for probability are called the axioms of
probability. These are that a probability must be between 0 and 1, and the
simple addition rule which we will see in part (a) of section 2.2.1. These
axioms are then used to derive theoretical relations for probability.
(f) An alternative quantity, which gives the same information as the
probability, is called the fair odds. This originated in betting on gambling
games. If the game is to be fair (in the sense that no player has any advantage
in the long run), each player should expect that he or she will neither win nor
lose any money if the game continues for a very large number of trials. Then if
the probabilities of various outcomes are not equal, the amounts bet on them
should compensate. The fair odds in favor of a result represent the ratio of the
amount which should be bet for that particular result to the amount which should
be bet against that result, in order to give fairness as described above. Say
the probability of success in a particular situation is 3/5, so the probability
of failure is 1 - 3/5 = 2/5. Then to make the game fair, for every three dollars
bet on success, two dollars should be bet against it, so that the expected gain
of the person betting on success is (3/5)(2) - (2/5)(3) = 0. Then we say that
the odds in favor of success are 3 to 2, and the odds against success are 2 to
3. To reason in the other direction, take another example in which the fair odds
in favor of success are 4 to 3, so the fair odds
against success are 3 to 4. Then
Pr[success] = 4/(4 + 3) = 4/7 = 0.571.
In general, if Pr[success] = p, Pr[failure] = 1 - p, then the fair odds in favor
of success are p/(1 - p) to 1, and the fair odds against success are (1 - p)/p
to 1. These are the relations which we use to relate probabilities to the fair
odds.
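The relations between probability and fair odds can be written as two small conversions. This is a minimal sketch under stated assumptions: the function names are invented for the illustration, and the probability is assumed to lie strictly between 0 and 1 so that neither ratio divides by zero.

```python
from fractions import Fraction


def odds_in_favor(p):
    """Fair odds in favor, as a pair (a, b) meaning 'a to b', for 0 < p < 1."""
    ratio = Fraction(p) / (1 - Fraction(p))  # p / (1 - p), reduced to lowest terms
    return ratio.numerator, ratio.denominator


def odds_against(p):
    """Fair odds against: the odds in favor with the two numbers swapped."""
    a, b = odds_in_favor(p)
    return b, a


def probability_from_odds_in_favor(a, b):
    """Probability of success when the fair odds in favor are a to b."""
    return Fraction(a, a + b)


# Probability 3/5 corresponds to odds of 3 to 2 in favor, 2 to 3 against.
print(odds_in_favor(Fraction(3, 5)), odds_against(Fraction(3, 5)))

# Odds of 4 to 3 in favor correspond to a probability of 4/7.
p = probability_from_odds_in_favor(4, 3)
print(p, round(float(p), 3))
```

Exact fractions are used so that odds such as 3 to 2 come out as whole numbers rather than rounded decimals.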
Note for Calculation: How many figures?
How many figures should be quoted in the answer to a problem? That depends on
how precise the initial data were and how precise the method of calculation is,
as well as how the results will be used subsequently. It is important to quote
enough figures so that no useful information is lost. On the other hand, quoting
too many figures will give a false impression of the precision, and there is no
point in quoting digits which do not provide useful information.
Calculations involving probability usually are not very precise: there are often
approximations. In this book probabilities as answers should be given to not
more than three significant figures, i.e., three figures other than a zero that
indicates or emphasizes the location of a decimal point. Thus, "0.019" contains
two significant figures, while "0.571" contains three significant figures. In
some cases, as in Example 2.1, fewer figures should be quoted because of
imprecise initial data or approximations inherent in the calculation.
It is important not to round off figures before the final calculation. That
would introduce extra error unnecessarily. Carry more figures in intermediate
calculations, and then at the end reduce the number of figures in the answer to
a reasonable number.
Problems
1. A bag contains 6 red balls, 5 yellow balls and 3 green balls. A ball is drawn
at random. What is the probability that the ball is: (a) green, (b) not yellow,
(c) red or yellow?
2. A pilot plant has produced metallurgical batches which are summarized as
follows:
                      Low strength    High strength
Low in impurities          2               27
High in impurities        12                4
If these results are representative of full-scale production, find estimated
probabilities that a production batch will be:
i) low in impurities
ii) high strength
iii) both high in impurities and high strength
iv) both high in impurities and low strength
3. If the numbers of dots on the upward faces of two standard six-sided dice
give the score for that throw, what is the probability of making a score of 7 in
one throw of a pair of fair dice?
4. In each of the following cases determine a decimal value for the probability
of the event:
a) the fair odds against a successful oil well are 10-to-1.
b) the fair odds that a bid will succeed are 1-to-6.
5. Two nuts having U.S. coarse threads and three nuts having U.S. fine threads
are mixed accidentally with four nuts having metric threads. The nuts are
otherwise identical. A nut is chosen at random.
a) What is the probability it has U.S. coarse threads?
b) What is the probability that its threads are not metric?
c) If the first nut has U.S. coarse threads, what is the probability that a
second nut chosen at random has metric threads?
d) If you are repairing a car engine and accidentally replace one type of nut
with another when you put the engine back together, very briefly, what may be
the consequences?
6. (a) How many different positive three-digit whole numbers can be formed from
the four digits 2, 6, 7, and 9 if any digit can be repeated?
(b) How many different positive whole numbers less than 1000 can be formed from
2, 6, 7, 9 if any digit can be repeated?
(c) How many numbers in part (b) are less than 680 (i.e. up to 679)?
(d) What is the probability that a positive whole number less than 1000, chosen
at random from 2, 6, 7, 9 and allowing any digit to be repeated, will be less
than 680?
7. Answer question 6 again for the case where the digits 2, 6, 7, 9 cannot be
repeated.
8. For each of the following, determine (i) the probability of each event, (ii)
the fair odds against each event, and (iii) the fair odds in favour of each
event:
(a) a five appears in the toss of a fair six-sided die.
(b) a red jack appears in the draw of a single card from a well-shuffled 52-card
bridge deck.
2.2 Basic Rules of Combining Probabilities
The basic rules or laws of combining probabilities must be consistent with the
fundamental concepts.
2.2.1 Addition Rule
This can be divided into two parts, depending upon whether there is overlap
between the events being combined.
(a) If the events are mutually exclusive, there is no overlap: if one event
occurs, other events cannot occur. In that case the probability of occurrence of
one or another of more than one event is the sum of the probabilities of the
separate events. For example, if I throw a fair six-sided die the probability of
any one face coming up is the same as the probability of any other face, or
one-sixth. There is no overlap among these six possibilities. Then Pr[6] = 1/6,
Pr[4] = 1/6, so Pr[6 or 4] is 1/6 + 1/6 = 1/3. This, then, is the probability of
obtaining a six or a four on throwing one die. Notice that it is consistent with
the classical approach to probability: of six equally likely results, two give
the result which was specified. The Addition Rule corresponds to a logical "or"
and gives a sum of separate probabilities.
Often we can divide all possible outcomes into two groups without overlap. If
one group of outcomes is event A, the other group is called the complement of A
and is written Ā or A′. Since A and A′ together include all possible results,
the sum of Pr[A] and Pr[A′] must be 1. If Pr[A′] is more easily calculated than
Pr[A], the best approach to calculating Pr[A] may be by first calculating
Pr[A′].
Example 2.4
A sample of four electronic components is taken from the output of a production
line. The probabilities of the various outcomes are calculated to be:
Pr[0 defectives] = 0.6561, Pr[1 defective] = 0.2916, Pr[2 defectives] = 0.0486,
Pr[3 defectives] = 0.0036, Pr[4 defectives] = 0.0001. What is the probability of
at least one defective?
Answer: It would be perfectly correct to calculate as follows:
Pr[at least one defective] = Pr[1 defective] + Pr[2 defectives]
                             + Pr[3 defectives] + Pr[4 defectives]
                           = 0.2916 + 0.0486 + 0.0036 + 0.0001 = 0.3439.
but it is easier to calculate instead:
Pr[at least one defective] = 1 - Pr[0 defectives]
                           = 1 - 0.6561
                           = 0.3439 or 0.344.
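Both routes of Example 2.4 can be checked in a few lines. In this sketch the dictionary name and layout are assumptions made for the illustration; the probabilities are those given in the example.

```python
# Probabilities of 0 to 4 defectives in the sample, from Example 2.4.
pr_defectives = {0: 0.6561, 1: 0.2916, 2: 0.0486, 3: 0.0036, 4: 0.0001}

# Check: probabilities of all possible results must add up to 1.
total = sum(pr_defectives.values())

# Direct route: add the mutually exclusive outcomes with one or more defectives.
direct = sum(p for count, p in pr_defectives.items() if count >= 1)

# Shorter route: use the complement, "no defectives".
via_complement = 1 - pr_defectives[0]

print(round(total, 4), round(direct, 4), round(via_complement, 4))
```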
(b) If the events are not mutually exclusive, there can be overlap between them.
This can be visualized using a Venn diagram. The probability of overlap must be
subtracted from the sum of probabilities of the separate events (i.e., we must
not count the same area on the Venn Diagram twice).
[Figure 2.2: Venn Diagram. Two overlapping circles, A and B, inside a rectangle;
the overlap is labeled A ∩ B.]
The circle marked A represents the probability (or frequency) of event A, the
circle marked B represents the probability (or frequency) of event B, and the
whole rectangle represents all possibilities, so a probability of one or the
total frequency. The set consisting of all possible outcomes of a particular
experiment is called the sample space of that experiment. Thus, the rectangle on
the Venn diagram corresponds to the sample space. An event, such as A or B, is
any subset of a sample space. In solving a problem we must be very clear just
what total group of events we are concerned with, that is, just what is the
relevant sample space.
Set notation is useful:
Pr[A ∪ B] = Pr[occurrence of A or B or both], the union of the two events
A and B.
Pr[A ∩ B] = Pr[occurrence of both A and B], the intersection of events
A and B.
Then in Figure 2.2, the intersection A ∩ B represents the overlap between events
A and B.
Figure 2.3 shows Venn diagrams representing intersection, union, and complement.
The cross-hatched area of Figure 2.3(a) represents event A. The cross-hatched
area on Figure 2.3(b) shows the intersection of events A and B. The union of
events A and B is shown on part (c) of the diagram. The cross-hatched area of
part (d) represents the complement of event A.
[Figure 2.3: Set Relations on Venn Diagrams. Panels: (a) Event A,
(b) Intersection, (c) Union, (d) Complement.]
If the events being considered are not mutually exclusive, and so there may be
overlap between them, the Addition Rule becomes
Pr[A ∪ B] = Pr[A] + Pr[B] - Pr[A ∩ B]    (2.1)
In words, the probability of A or B or both is the sum of the probabilities of A
and of B, less the probability of the overlap between A and B. The overlap is
the intersection between A and B.
Example 2.5
If one card is drawn from a well-shuffled bridge deck of 52 playing cards (13 of
each suit), what is the probability that the card is a queen or a heart? Notice
that a card can be both a queen and a heart. Then a queen of hearts (or
queen ∩ heart) overlaps the two categories.
Answer: Pr[queen] = 4/52.
Pr[heart] = 13/52.
Pr[queen ∩ heart] = 1/52.
These quantities are shown on the Venn diagram of Figure 2.4:
[Figure 2.4: Venn Diagram for Queen of Hearts. Two overlapping circles, queen
and heart; the intersection or overlap is the queen of hearts.]
Then Pr[queen ∪ heart] = Pr[queen] + Pr[heart] - Pr[queen ∩ heart]
                       = 4/52 + 13/52 - 1/52 = 16/52
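Equation 2.1 can be confirmed for Example 2.5 by building the deck and counting. This is an illustrative sketch: the card representation as (rank, suit) pairs and the helper pr are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(ranks, suits))  # 52 equally likely (rank, suit) cards

queens = {card for card in deck if card[0] == "Q"}
hearts = {card for card in deck if card[1] == "hearts"}


def pr(event):
    """Classical probability of an event given as a subset of the deck."""
    return Fraction(len(event), len(deck))


# Addition rule (equation 2.1): subtract the overlap so it is not counted twice.
by_rule = pr(queens) + pr(hearts) - pr(queens & hearts)

# Direct count of the union, for comparison.
by_count = pr(queens | hearts)

print(by_rule, by_count)
```

Python reduces the fraction, so 16/52 is printed in lowest terms as 4/13.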
The simple addition law, sometimes equation 2.1, and the definitions of
intersections and unions can be used with Venn diagrams to solve problems
involving three events with both single and double overlaps. This usually
requires us to apply some form of the addition law several times. Often an
appropriate approach is to find the frequency or probability corresponding to a
series of simple areas on the diagram, each one representing either a part of
only one event without overlap (such as A ∩ B′ ∩ C′) or only a clearly defined
overlap (such as A ∩ B ∩ C′).
Example 2.6
The class registrations of 120 students are analyzed. It is found that:
30 of the students do not take any of Applied Mechanics, Chemistry, or
Computers.
15 of them take only Applied Mechanics.
25 of them take Chemistry and Computers but not Applied Mechanics.
20 of them take Applied Mechanics and Computers but not Chemistry.
10 of them take all three of Applied Mechanics, Chemistry, and Computers.
A total of 45 of them take Chemistry.
5 of them take only Chemistry.
a) How many of the students take Applied Mechanics and Chemistry but not
Computers?
b) How many of the students take only Computers?
c) What is the total number of students taking Computers?
d) If a student is chosen at random from those who take neither Chemistry nor
Computers, what is the probability that he or she does not take Applied
Mechanics either?
e) If one of the students who take at least two of the three courses is chosen
at random, what is the probability that he or she takes all three courses?
Answer: Let's abbreviate the courses as AM, Chem, and Comp.
The number of items in the sample space, which is the total number of items
under consideration, is often marked just above the upper right-hand corner of
the rectangle. In this example that number is 120. Then the Venn diagram
incorporating the given information for this problem is shown below. Two of the
simple areas on the diagram correspond to unknown numbers. One of these is
(AM ∩ Chem ∩ Comp′), which is taken by x students. The other is
(AM′ ∩ Chem′ ∩ Comp), so only Computers but not the other courses, and that is
taken by y students.
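In terms of quantities corresponding to simple areas on the Venn diagram, the
given information that a total of 45 of the students take Chemistry requires
that
x + 10 + 25 + 5 = 45
Then x = 5.
[Figure 2.5: Venn Diagram for Class Registrations. Three overlapping circles,
AM, Chem, and Comp, inside a rectangle marked 120. Regions: AM only, 15; Chem
only, 5; AM and Comp but not Chem, 20; all three, 10; Chem and Comp but not AM,
25; none of the three, 30; AM and Chem but not Comp, x; Comp only, y.]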
Let n(...) be the number of students who take a specified course or combination
of courses. Then from the total number of students and the number who do not
take any of the three courses we have
n(AM ∪ Chem ∪ Comp) = 120 - 30 = 90
But from the Venn diagram and the knowledge of the total taking Chemistry we
have
n(AM ∪ Chem ∪ Comp) = n(Chem) + n(AM ∩ Chem′ ∩ Comp′)
                      + n(AM ∩ Chem′ ∩ Comp) + n(AM′ ∩ Chem′ ∩ Comp)
                    = 45 + 15 + 20 + y
                    = 80 + y
Then y = 90 - 80 = 10.
Now we can answer the specific questions.
a) The number of students who take Applied Mechanics and Chemistry but not
Computers is 5.
b) The number of students who take only Computers is 10.
c) The total number of students taking Computers is 10 + 20 + 10 + 25 = 65.
d) The number of students taking neither Chemistry nor Computers is
15 + 30 = 45. Of these, the number who do not take Applied Mechanics is 30.
Then if a student is chosen randomly from those who take neither Chemistry nor
Computers, the probability that he or she does not take Applied Mechanics either
is 30/45 = 2/3.
e) The number of students who take at least two of the three courses is
n(AM ∩ Chem ∩ Comp′) + n(AM ∩ Chem′ ∩ Comp) + n(AM′ ∩ Chem ∩ Comp)
+ n(AM ∩ Chem ∩ Comp)
= 5 + 20 + 25 + 10
= 60
Of these, the number who take all three courses is 10. If a student taking at
least two courses is chosen randomly, the probability that he or she takes all
three courses is 10/60 = 1/6.
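The bookkeeping of Example 2.6 can be retraced with the counts for the simple areas of the Venn diagram. The sketch below is only an illustration: the variable names are assumptions, and each line mirrors one step of the answer above.

```python
from fractions import Fraction

# Given counts for Example 2.6 (AM = Applied Mechanics).
total_students = 120
none_of_three = 30
am_only = 15
chem_and_comp_only = 25
am_and_comp_only = 20
all_three = 10
chem_total = 45
chem_only = 5

# x: AM and Chem but not Comp, from the total taking Chemistry.
x = chem_total - (all_three + chem_and_comp_only + chem_only)

# y: Comp only, from the number taking at least one course.
at_least_one = total_students - none_of_three
y = at_least_one - (chem_total + am_only + am_and_comp_only)

# c) total taking Computers.
comp_total = y + am_and_comp_only + all_three + chem_and_comp_only

# d) reduced sample space: students taking neither Chem nor Comp.
neither_chem_nor_comp = am_only + none_of_three
pr_d = Fraction(none_of_three, neither_chem_nor_comp)

# e) reduced sample space: students taking at least two courses.
at_least_two = x + am_and_comp_only + chem_and_comp_only + all_three
pr_e = Fraction(all_three, at_least_two)

print(x, y, comp_total, pr_d, pr_e)
```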
2.2.2 Multiplication Rule
(a) The basic idea for calculating the number of choices can be described as
follows: Say there are n1 possible results from one operation. For each one of
these, there are n2 possible results from a second operation. Then there are
(n1 × n2) possible outcomes of the two operations together. In general, the
numbers of possible results are given by products of the number of choices at
each step. Probabilities can be found by taking ratios of possible results.
Example 2.7
In one case a byte is defined as a sequence of 8 bits. Each bit can be either
zero or one. How many different bytes are possible?
Answer: We have 2 choices for each bit and a sequence of 8 bits. Then the number
of possible results is (2)^8 = 256.
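The counting rule can be checked by listing every possibility. In this sketch the function name count_outcomes is an assumption made for the illustration; math.prod requires Python 3.8 or later.

```python
from itertools import product
from math import prod


def count_outcomes(*choices_per_step):
    """Number of combined outcomes: the product of the choices at each step."""
    return prod(choices_per_step)


# Example 2.7: two choices for each of 8 bits.
by_rule = count_outcomes(*[2] * 8)

# The same count by explicit enumeration of every 8-bit pattern.
all_bytes = ["".join(bits) for bits in product("01", repeat=8)]

print(by_rule, len(all_bytes), all_bytes[:3])
```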
(b) The simplest form of the Multiplication Rule for probabilities is as
follows: If the events are independent, then the occurrence of one event does
not affect the probability of occurrence of another event. In that case the
probability of occurrence of more than one event together is the product of the
probabilities of the separate events. (This is consistent with the basic idea of
counting stated above.) If A and B are two separate events that are independent
of one another, the probability of occurrence of both A and B together is given
by:
Pr[A ∩ B] = Pr[A] × Pr[B]    (2.2)
Example 2.8
If a player throws two fair dice, the probability of a double one (one on the
first die and one on the second die) is (1/6)(1/6) = 1/36. These events are
independent because the result from one die has no effect at all on the result
from the other die. (Note that "die" is the singular word, and "dice" is
plural.)
(c) If the events are not independent, one event affects the probability for the
other event. In this case conditional probability must be used. The conditional
probability of B given that A occurs, or on condition that A occurs, is written
Pr[B | A]. This is read as the probability of B given A, or the probability of B
on condition that A occurs. Conditional probability can be found by considering
only those events which meet the condition, which in this case is that A occurs.
Among these events, the probability that B occurs is given by the conditional
probability, Pr[B | A]. In the reduced sample space consisting of outcomes for
which A occurs, the probability of event B is Pr[B | A]. The probabilities
calculated in parts (d) and (e) of Example 2.6 were conditional probabilities.
The multiplication rule for the occurrence of both A and B together when they
are not independent is the product of the probability of one event and the
conditional probability of the other:
Pr[A ∩ B] = Pr[A] × Pr[B | A] = Pr[B] × Pr[A | B]    (2.3)
This implies that conditional probability can be obtained by
Pr[B | A] = Pr[A ∩ B] / Pr[A]    (2.4)
or
Pr[A | B] = Pr[A ∩ B] / Pr[B]    (2.5)
These relations are often very useful.
Example 2.9
Four of the light bulbs in a box of ten bulbs are burnt out or otherwise
defective. If two bulbs are selected at random without replacement and tested,
(i) what is the probability that exactly one defective bulb is found? (ii) What
is the probability that exactly two defective bulbs are found?
Answer: A tree diagram is very useful in problems involving the multiplication
rule. Let us use the symbols D1 for a defective first bulb, D2 for a defective
second bulb, G1 for a good first bulb, and G2 for a good second bulb.
At the beginning the box contains four bulbs which are defective and six which
are good. Then the probability that the first bulb will be defective is 4/10 and
the probability that it will be good is 6/10. This is shown in the partial tree
diagram of Figure 2.6.
[Figure 2.6: First Bulb. Two branches: Pr[D1] = 4/10 leading to D1, and
Pr[G1] = 6/10 leading to G1.]
Probabilities for the second bulb vary, depending on what was the result for the
first bulb, and so are given by conditional probabilities. These relations for
the second bulb are shown in Figure 2.7.
[Figure 2.7: Second Bulb. From D1: Pr[D2 | D1] = 3/9 leading to D2, and
Pr[G2 | D1] = 6/9 leading to G2. From G1: Pr[D2 | G1] = 4/9 leading to D2, and
Pr[G2 | G1] = 5/9 leading to G2.]
If the first bulb was defective, the box will then contain three defective bulbs
and six good ones, so the conditional probability of obtaining a defective bulb
on the second draw is 3/9, and the conditional probability of obtaining a good
bulb is 6/9.
If the first bulb was good the box will contain four defective bulbs and five
good ones, so the conditional probability of obtaining a defective bulb on the
second draw is 4/9, and the conditional probability of obtaining a good bulb is
5/9. Notice that these arguments hold only when the bulbs are selected without
replacement; if the chosen bulbs had been replaced in the box and mixed well
before another bulb was chosen, the relevant probabilities would be different.
Now let us combine the separate probabilities.
The probability of getting two defective bulbs must be (4/10)(3/9) = 12/90, the
probability of getting a defective bulb on the first draw and a good bulb on the
second draw is (4/10)(6/9) = 24/90, the probability of getting a good bulb on
the first draw and a defective bulb on the second draw is (6/10)(4/9) = 24/90,
and the probability of getting two good bulbs is (6/10)(5/9) = 30/90. In symbols
we have:
Pr[D1 ∩ D2] = Pr[D1] × Pr[D2 | D1] = (4/10)(3/9) = 12/90
Pr[D1 ∩ G2] = Pr[D1] × Pr[G2 | D1] = (4/10)(6/9) = 24/90
Pr[G1 ∩ D2] = Pr[G1] × Pr[D2 | G1] = (6/10)(4/9) = 24/90
Pr[G1 ∩ G2] = Pr[G1] × Pr[G2 | G1] = (6/10)(5/9) = 30/90
Notice that both D1 ∩ G2 and G1 ∩ D2 correspond to obtaining 1 good bulb and 1
defective bulb.
The complete tree diagram is shown in Figure 2.8.
[Figure 2.8: Complete Tree Diagram]
First Bulb           Second Bulb              Event                  Probability
Pr[D1] = 4/10, D1    Pr[D2 | D1] = 3/9, D2    2 defective bulbs      12/90
Pr[D1] = 4/10, D1    Pr[G2 | D1] = 6/9, G2    1 good, 1 defective    24/90
Pr[G1] = 6/10, G1    Pr[D2 | G1] = 4/9, D2    1 good, 1 defective    24/90
Pr[G1] = 6/10, G1    Pr[G2 | G1] = 5/9, G2    2 good bulbs           30/90
Notice that all the probabilities of events add up to one, as they must:
(12 + 24 + 24 + 30)/90 = 1
Now we have to answer the specific questions which were asked:
i) Pr[exactly one defective bulb is found] = Pr[D1 ∩ G2] + Pr[G1 ∩ D2]
   = (24 + 24)/90 = 48/90 = 0.533.
The first term corresponds to getting first a defective bulb and then a good
bulb, and the second term corresponds to getting first a good bulb and then a
defective bulb.
ii) Pr[exactly two defective bulbs are found] = Pr[D1 ∩ D2] = 12/90 = 0.133.
There is only one path which will give this result.
Notice that testing could continue until either all 4 defective bulbs or all 6
good bulbs are found.
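The tree-diagram results of Example 2.9 can be cross-checked by listing every ordered pair of bulbs drawn without replacement. This is a sketch under stated assumptions: the bulbs are labeled by position in a list, the helper pr is invented for the illustration, and all ordered pairs are taken as equally likely.

```python
from fractions import Fraction
from itertools import permutations

bulbs = ["D"] * 4 + ["G"] * 6  # four defective, six good

# Every ordered pair of distinct bulbs: 10 x 9 = 90 equally likely draws.
draws = [(bulbs[i], bulbs[j]) for i, j in permutations(range(len(bulbs)), 2)]


def pr(event):
    """Probability of an event (a test on a draw) by counting draws."""
    return Fraction(sum(1 for d in draws if event(d)), len(draws))


pr_exactly_one = pr(lambda d: d.count("D") == 1)
pr_exactly_two = pr(lambda d: d.count("D") == 2)

# The same results from the multiplication rule (equation 2.3).
rule_one = Fraction(4, 10) * Fraction(6, 9) + Fraction(6, 10) * Fraction(4, 9)
rule_two = Fraction(4, 10) * Fraction(3, 9)

# Conditional probability from equation 2.4: Pr[D2 | D1] = Pr[D1 and D2] / Pr[D1].
pr_d1 = pr(lambda d: d[0] == "D")
pr_d2_given_d1 = pr_exactly_two / pr_d1

print(pr_exactly_one, pr_exactly_two, pr_d2_given_d1)
```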
Example 2.10
A fair six-sided die is tossed twice. What is the probability that a five will
occur at least once?
Answer: Note that this problem includes the possibility of obtaining two fives.
On any one toss, the probability of a five is 1/6, and the probability of no
fives is 5/6. This problem will be solved in several ways.
[Figure 2.9: Tree Diagram for Two Tosses. On the first toss, Pr[a 5] = 1/6 and
Pr[no 5] = 5/6; from each of these branches, the second toss again gives
Pr[a 5] = 1/6 and Pr[no 5] = 5/6.]
First solution (considering all possibilities using a tree diagram):
Pr[5 on the first toss ∩ 5 on the second toss] = (1/6)(1/6) = 1/36
Pr[5 on the first toss ∩ no 5 on the second toss] = (1/6)(5/6) = 5/36
Pr[no 5 on the first toss ∩ 5 on the second toss] = (5/6)(1/6) = 5/36
Pr[no 5 on the first toss ∩ no 5 on the second toss] = (5/6)(5/6) = 25/36
Total of all probabilities (as a check) = 1
Then Pr[at least one five in two tosses] = 1/36 + 5/36 + 5/36 = 11/36
Second solution (using conditional probability):
The probability of at least one five is given by:
Pr[5 on the first toss] × Pr[at least one 5 in two tosses | 5 on the first toss]
+ Pr[no 5 on the first toss] × Pr[at least one 5 in two tosses | no 5 on the
first toss].
But Pr[5 on the first toss] = Pr[5 on any one toss] = 1/6
and Pr[at least one 5 in two tosses | 5 on the first toss] = 1 (a dead
certainty!)
Also Pr[no 5 on the first toss] = Pr[no 5 on any toss] = 5/6,
and Pr[at least one 5 in two tosses | no 5 on the first toss]
= Pr[5 on the second toss] = 1/6.
Then Pr[at least one 5 in two tosses] = (1/6)(1) + (5/6)(1/6) = 11/36
Third solution (using the addition rule, eq. 2.1):
Pr[at least one 5 in two tosses]
= Pr[(5 on the first toss) ∪ (5 on the second toss)]
= Pr[5 on the first toss] + Pr[5 on the second toss]
  - Pr[(5 on the first toss) ∩ (5 on the second toss)]
= 1/6 + 1/6 - (1/6)(1/6)
= 6/36 + 6/36 - 1/36
= 11/36
Fourth solution: Look at the sample space (i.e., consider all possible
outcomes). Let's use a matrix notation where each entry gives first the result
of the first toss and then the result of the second toss, as follows:
1,1   1,2   1,3   1,4   1,5   1,6
2,1   2,2   2,3   2,4   2,5   2,6
3,1   3,2   3,3   3,4   3,5   3,6
4,1   4,2   4,3   4,4   4,5   4,6
5,1   5,2   5,3   5,4   5,5   5,6
6,1   6,2   6,3   6,4   6,5   6,6
Figure 2.10: Sample Space of Two Tosses
In the fifth row the result of the first toss is a 5, and in the fifth column
the result of the second toss is a 5. This row and this column have been shaded
and represent the part of the sample space which meets the requirements of the
problem. This area contains 11 entries, whereas the whole sample space contains
36 entries, so Pr[at least one 5 in two tosses] = 11/36.
Fifth solution (and the fastest): The probability of no fives in two tosses is
(5/6)(5/6) = 25/36
Because the only alternative to no fives is at least one five,
Pr[at least one 5 in two tosses] = 1 - 25/36 = 11/36
Before we start to calculate we should consider whether another method may give
a faster correct result!
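The agreement among the solutions of Example 2.10 can be confirmed by listing the sample space. The sketch below is illustrative only; the variable names are assumptions. It compares the direct count (fourth solution), the addition rule (third solution), and the complement (fifth solution).

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first toss, second toss) results.
sample_space = list(product(range(1, 7), repeat=2))

# Fourth solution: count the entries containing at least one 5.
with_a_five = [toss for toss in sample_space if 5 in toss]
by_count = Fraction(len(with_a_five), len(sample_space))

# Third solution: addition rule, with the overlap (5, 5) subtracted once.
sixth = Fraction(1, 6)
by_addition = sixth + sixth - sixth * sixth

# Fifth solution: complement of "no five on either toss".
by_complement = 1 - Fraction(5, 6) ** 2

print(len(with_a_five), by_count, by_addition, by_complement)
```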
Example 2.11
A class of engineering students consists of 45 people. What is the probability
that no two students have birthdays on the same day, not considering the year of
birth? To simplify the calculation, assume that there are 365 days in the year
and that births are equally likely on all of them. Then what is the probability
that some members of the class have birthdays on the same day?
Answer: The first person in the class states his birthday. The probability that
the second person has a different birthday is 364/365, and the probability that
the third person has a different birthday than either of them is 363/365. We can
continue this calculation until the birthdays of all 45 people have been
considered. Then the probability that no two students in the class have the same
birthday is
(1)(364/365)(363/365)(362/365) ... ((365 - i + 1)/365) ... ((365 - 45 + 1)/365)
= 0.059.
(The multiplication was done using a spreadsheet.) Then the probability that at
least one pair of students have birthdays on the same day is 1 - 0.059 = 0.941.
In fact, some days of the year have higher frequencies of births than others, so
the probability that at least one pair of students would have birthdays on the
same day is somewhat larger than 0.941.
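The product in Example 2.11 is tedious by hand but short as a loop. The text did the multiplication with a spreadsheet; the Python sketch below is an alternative illustration, with the function name an assumption, and with the same simplifying assumption of 365 equally likely birthdays.

```python
def pr_all_birthdays_differ(people, days=365):
    """Probability that no two of `people` share a birthday.

    Multiplies the factors (days - i + 1) / days for i = 1 .. people,
    assuming every day is equally likely and ignoring leap years.
    """
    probability = 1.0
    for i in range(1, people + 1):
        probability *= (days - i + 1) / days
    return probability


no_match = pr_all_birthdays_differ(45)
at_least_one_match = 1 - no_match

print(round(no_match, 3), round(at_least_one_match, 3))  # about 0.059 and 0.941
```

The same function can be reused for other class sizes, such as the group of eight students in one of the problems at the end of this section.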
The following example is a little more complex, but it involves the same
approach. Because this case uses the multiplication rule, tree diagrams are very
helpful.
Example 2.12
An oil company is bidding for the rights to drill a well in field A and a well
in field B. The probability it will drill a well in field A is 40%. If it does,
the probability the well will be successful is 45%. The probability it will
drill a well in field B is 30%. If it does, the probability the well will be
successful is 55%. Calculate each of the following probabilities:
a) probability of a successful well in field A,
b) probability of a successful well in field B,
c) probability of both a successful well in field A and a successful well in
field B,
d) probability of at least one successful well in the two fields together,
e) probability of no successful well in field A,
f) probability of no successful well in field B,
g) probability of no successful well in the two fields together (calculate by
two methods),
h) probability of exactly one successful well in the two fields together.
Show a check involving the probability calculated in part h.
Answer:
For Field A:
Branches                                   Result     Probability
Pr[well] = 0.40, Pr[success] = 0.45        success    (0.40)(0.45) = 0.18
Pr[well] = 0.40, Pr[failure] = 0.55        failure    (0.40)(0.55) = 0.22
Pr[no well] = 0.60                         no well    0.60
Total                                                 1.00
Figure 2.11: Tree Diagram for Field A
a) Then Pr[a successful well in field A]
   = Pr[a well in A] × Pr[success | well in A]
   = (0.40)(0.45)
   = 0.18 (using equation 2.3)
For Field B:
Branches                                   Result     Probability
Pr[well] = 0.30, Pr[success] = 0.55        success    (0.30)(0.55) = 0.165
Pr[well] = 0.30, Pr[failure] = 0.45        failure    (0.30)(0.45) = 0.135
Pr[no well] = 0.70                         no well    0.70
Total                                                 1.00
Figure 2.12: Tree Diagram for Field B
b) Then Pr[a successful well in field B]
   = Pr[a well in B] × Pr[success | well in B]
   = (0.30)(0.55)
   = 0.165 (using equation 2.3)
c) Pr[both a successful well in field A and a successful well in field B]
   = Pr[a successful well in field A] × Pr[a successful well in field B]
   = (0.18)(0.165)
   = 0.0297 (using equation 2.2, since probability of success in one field is
   not affected by results in the other field)
d) Pr[at least one successful well in the two fields]
   = Pr[(successful well in field A) ∪ (successful well in field B)]
   = Pr[successful well in field A] + Pr[successful well in field B]
     - Pr[both successful]
   = 0.18 + 0.165 - 0.0297
   = 0.3153 or 0.315 (using equation 2.1)
e) Pr[no successful well in field A]
   = Pr[no well in field A] + Pr[unsuccessful well in field A]
   = Pr[no well in field A] + Pr[well in field A] × Pr[failure | well in A]
   = 0.60 + (0.40)(0.55)
   = 0.60 + 0.22
   = 0.82 (using equation 2.3 and the simple addition rule)
f) Pr[no successful well in field B]
   = Pr[no well in field B] + Pr[unsuccessful well in field B]
   = Pr[no well in field B] + Pr[well in field B] × Pr[failure | well in B]
   = 0.70 + (0.30)(0.45)
   = 0.70 + 0.135
   = 0.835 (using equation 2.3 and the simple addition rule)
g) Pr[no successful well in the two fields] can be calculated in two ways. One
method uses the requirement that probabilities of all possible results must add
up to 1. This gives:
Pr[no successful well in the two fields]
   = 1 - Pr[at least one successful well in the two fields]
   = 1 - 0.3153
   = 0.6847 or 0.685
The second method uses equation 2.2:
Pr[no successful well in the two fields]
   = Pr[no successful well in field A] × Pr[no successful well in field B]
   = (0.82)(0.835)
   = 0.6847 or 0.685
h) Pr[exactly one successful well in the two fields]
   = Pr[(successful well in A) ∩ (no successful well in B)]
     + Pr[(no successful well in A) ∩ (successful well in B)]
   = (0.18)(0.835) + (0.82)(0.165)
   = 0.1503 + 0.1353
   = 0.2856 or 0.286 (using equation 2.2 and the simple addition rule)
Check: For the two fields together,
Pr[two successful wells]        = 0.0297 (from part c)
Pr[exactly one successful well] = 0.2856 (from part h)
Pr[no successful wells]         = 0.6847 (from part g)
Total (check)                   = 1.0000
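The chain of calculations in Example 2.12 can be repeated in a few lines, including the final check. This sketch is an illustration only: the variable names are assumptions, and, as in the example, results in the two fields are taken to be independent.

```python
# Given probabilities for Example 2.12.
pr_well_a, pr_success_given_well_a = 0.40, 0.45
pr_well_b, pr_success_given_well_b = 0.30, 0.55

# a), b): multiplication rule with conditional probability (equation 2.3).
success_a = pr_well_a * pr_success_given_well_a
success_b = pr_well_b * pr_success_given_well_b

# c): independent fields (equation 2.2).
both = success_a * success_b

# d): addition rule (equation 2.1).
at_least_one = success_a + success_b - both

# e), f): complements of a) and b).
no_success_a = 1 - success_a
no_success_b = 1 - success_b

# g): by two methods.
none_by_complement = 1 - at_least_one
none_by_product = no_success_a * no_success_b

# h): exactly one successful well.
exactly_one = success_a * no_success_b + no_success_a * success_b

# Check: two, exactly one, and no successful wells cover all possibilities.
check_total = both + exactly_one + none_by_product

for name, value in [("both", both), ("at least one", at_least_one),
                    ("none", none_by_product), ("exactly one", exactly_one),
                    ("check total", check_total)]:
    print(f"{name}: {value:.4f}")
```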
Problems
1. Past records show that 4 of 135 parts are defective in length, 3 of 141 are
defective in width, and 2 of 347 are defective in both. Use these figures to
estimate probabilities of the individual events assuming that defects occur
independently in length and width.
a) What is the probability that a part produced under the same conditions will
be defective in length or width or both?
b) What is the probability that a part will have neither defect?
c) What are the fair odds against a defect (in length or width or both)?
2. In a group of 72 students, 14 take neither English nor chemistry, 42 take
English and 38 take chemistry. What is the probability that a student chosen at
random from this group takes:
a) both English and chemistry?
b) chemistry but not English?
3. A random sample of 250 students entering the university included 120 females,
of whom 20 belonged to a minority group, 65 had averages over 80%, and 10 fit
both categories. Among the 250 students, a total of 105 people in the sample had
averages over 80%, and a total of 40 belonged to the minority group. Fifteen
males in the minority group had averages over 80%.
i) How many of those not in the minority group had averages over 80%?
ii) Given a person was a male from the minority group, what is the probability
he had an average over 80%?
iii) What is the probability that a person selected at random was male, did not
come from the minority group, and had an average less than 80%?
4. Two hundred students were sampled in the College of Arts and Science. It was
found that: 137 take math, 50 take history, 124 take English, 33 take math and
history, 29 take history and English, 92 take math and English, 18 take math,
history and English. Find the probability that a student selected at random out
of the 200 takes neither math nor history nor English.
5. Among a group of 60 engineering students, 24 take math and 29 take physics.
Also 10 take both physics and statistics, 13 take both math and physics, 11 take
math and statistics, and 8 take all three subjects, while 7 take none of the
three.
a) How many students take statistics?
b) What is the probability that a student selected at random takes all three,
given he takes statistics?
6. Of 65 students, 10 take neither math nor physics, 50 take math, and 40 take
physics. What are the fair odds that a student chosen at random from this group
of 65 takes (i) both math and physics? (ii) math but not physics?
7. 16 parts are examined for defects. It is found that 10 are good, 4 have minor
defects, and 2 have major defects. Two parts are chosen at random from the 16
without replacement, that is, the first part chosen is not returned to the mix
before the second part is chosen. Notice, then, that there will be only 15
possible choices for the second part.
a) What is the probability that both are good?
b) What is the probability that exactly one part has a major defect?
8. There are two roads between towns A and B. There are three roads between
towns B and C. John goes from town A to town C. How many different routes can he
travel?
9. A hiker leaves point A shown in Figure 2.13 below, choosing at random one
path from AB, AC, AD, and AE. At each subsequent junction she chooses another
path at random, but she does not immediately return on the path she has just
taken.
a) What are the odds that she arrives at point X?
b) You meet the hiker at point X. What is the probability that the hiker came
via point C or E?
[Figure 2.13: Paths for Hiker. Labeled points: A; B, C, D, E; F, G; and
Y, Z, X, W, V, U.]
10. Theprobabilitythatacertaintypeofmissilewillhitthetargetonanyonefiring
is0.80.Howmanymissilesshouldbefiredsothatthereisatleast98%probabil-
ityofhittingthetargetatlestonce?
11. Towinadailydoubleatahorseraceyoumustpickthewinninghorsesinthe
firsttworaces.Ifthehorsesyoupickhavefairoddsagainstof3:2and5:1,what
arethefairoddsinfavorofyourwinningthedailydouble?
12. Ahockeyteamwinswithaprobabilityof0.6andloseswithaprobabilityof0.3.
Theteamplaysthreegamesovertheweekend.Findtheprobabilitythattheteam:
a) wins all three games.
b) wins at least twice and doesn't lose.
c) wins one game, loses one, and ties one (in any order).
13. To encourage his son's promising tennis career, a father offers the son a prize if
he wins (at least) two tennis sets in a row in a three-set series. The series is to be
played with the father and the club champion alternately, so in the order
father-champion-father or champion-father-champion. The champion is a better player
than the father. Which series should the son choose if Pr[son beating the champion]
= 0.4, and Pr[son beating his father] = 0.8? What is the probability of the
son winning a prize for each of the two alternatives?
14. Three balls are drawn one after the other from a bag containing 6 red balls, 5
yellow balls and 3 green balls. What is the probability that all three balls are
yellow if:
a) the ball is replaced after each draw and the contents are well mixed?
b) the ball is not replaced after each draw?
15. When buying a dozen eggs, Mrs. Murphy always inspects 3 eggs for cracks; if
one or more of these eggs has a crack, she does not buy the carton. Assuming
that each subset of 3 eggs has an equal probability of being selected, what is
the probability that Mrs. Murphy will buy a carton which has 5 eggs with
cracks?
16. Of 20 light bulbs, 3 are defective. Five bulbs are chosen at random. (a) Use the
rules of probability to find the probability that none are defective. (b) What is the
probability that at least one is defective?
17. Of flights from Saskatoon to Winnipeg, 89.5% leave on time and arrive on time,
3.5% leave on time and arrive late, 1.5% leave late and arrive on time, and 5.5%
leave late and arrive late. What is the probability that, given a flight leaves on
time, it will arrive late? What is the probability that, given a flight leaves late, it
will arrive on time?
18. Eight engineering students are studying together. What is the probability that at
least two students of this group have the same birthday, not considering the year
of birth? Simplify the calculations by assuming that there are 365 days in the
year and that all are equally likely to be birthdays.
19. The probabilities of the monthly snowfall exceeding 10 cm at a particular location
in the months of December, January, and February are 0.2, 0.4, and 0.6,
respectively. For a particular winter:
a) What is the probability that snowfall will be less than 10 cm in all three of
the months of December, January and February?
b) What is the probability of receiving at least 10 cm snowfall in at least 2 of
the 3 months?
c) Given that the snowfall exceeded 10 cm in each of only two months, what is
the probability that the two months were consecutive?
20. A circuit consists of two components, A and B, connected as shown below.
[Figure 2.14: Circuit Diagram (Input, components A and B, Output)]
Each component can fail (i) to an open circuit mode or
(ii) to a short circuit mode.
The probabilities of the components failing to these modes in a year are:
              Probability of failing to
Component     Open Circuit Mode     Short Circuit Mode
A             0.100                 0.150
B             0.200                 0.100
The circuit fails to perform its intended function if (i) the component in at least
one branch fails to the short circuit mode, or if (ii) both components fail to the
open circuit mode.
Calculate the probability that the circuit will function adequately at the end of a
two-year period.
21. Ten married couples are in a room.
a) If two people are chosen at random find the probability that (i) one is male
and one is female, (ii) they are married to each other.
b) If 4 people are chosen at random, find the probability that 2 married couples
are chosen.
c) If the 20 people are randomly divided into ten pairs, find the probability that
each pair is a married couple.
22. A box contains three coins, two of them fair and one two-headed. A coin is
selected at random and tossed. If heads appears the coin is tossed again; if tails
appears then another coin is selected from the two remaining coins and tossed.
a) Find the probability that heads appears twice.
b) Find the probability that tails appears twice.
23. The probability of precipitation tomorrow is 0.30, and the probability of
precipitation the next day is 0.40.
a) Use these figures to find the probability there will be no precipitation during
the two days. State any assumption. What is the probability there will be
some precipitation in the next two days?
b) Why is this calculation not strictly correct? If figures were available, how
could the probability of no precipitation during the next two days be
calculated more accurately? Show this calculation in symbols.
2.3 Permutations and Combinations
Permutations and combinations give us quick, algebraic methods of counting. They
are used in probability problems for two purposes: to count the number of equally
likely possible results for the classical approach to probability, and to count the
number of different arrangements of the same items to give a multiplying factor.
(a) Each separate arrangement of all or part of a set of items is called a permutation. The
number of permutations is the number of different arrangements in which items can
be placed. Notice that if the order of the items is changed, the arrangement is different,
so we have a different permutation. Say we have a total of n items to be arranged,
and we can choose r of those items at a time, where $r \le n$. The number of
permutations of n items chosen r at a time is written ${}_nP_r$. For permutations we
consider both the identity of the items and their order.
Let us think for a minute about the number of choices we have at each step
along the way. If there are n distinguishable items, we have n choices for the first
item. Having made that choice, we have $(n-1)$ choices for the second item, then
$(n-2)$ choices for the third item, and so on until we come to the r th item, for
which we have $(n-r+1)$ choices. Then the total number of choices is given by
the product $(n)(n-1)(n-2)(n-3)\cdots(n-r+1)$. But remember that we have a
short-hand notation for a related product, $(n)(n-1)(n-2)(n-3)\cdots(3)(2)(1) = n!$,
which is called n factorial or factorial n. Similarly, $r! = (r)(r-1)(r-2)(r-3)\cdots(3)(2)(1)$,
and $(n-r)! = (n-r)(n-r-1)(n-r-2)\cdots(3)(2)(1)$. Then the total
number of choices, which is called the number of permutations of n items taken
r at a time, is
$$ {}_nP_r = \frac{n!}{(n-r)!} = \frac{(n)(n-1)(n-2)\cdots(2)(1)}{(n-r)(n-r-1)\cdots(3)(2)(1)} \tag{2.6} $$
By definition, $0! = 1$. Then the number of choices of n items taken n at a time is
${}_nP_n = n!$.
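The step-by-step counting argument behind equation 2.6 can be sketched in Python. This is only an illustrative sketch: the function name `permutations_count` is our own choice, not something from the text.

```python
from math import factorial

def permutations_count(n: int, r: int) -> int:
    """Number of ordered arrangements of r items chosen from n distinct items."""
    if not 0 <= r <= n:
        raise ValueError("require 0 <= r <= n")
    count = 1
    # One factor per choice step: n, n-1, ..., n-r+1.
    for k in range(n, n - r, -1):
        count *= k
    return count

# The product of r factors agrees with the factorial form n!/(n-r)!.
print(permutations_count(10, 4) == factorial(10) // factorial(10 - 4))  # True
print(permutations_count(3, 3))  # 6, which is 3!
```

With r = 0 the loop body never runs, so the function returns 1, in line with the convention 0! = 1.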
Example 2.13
An engineer in technical sales must visit plants in Vancouver, Toronto, and
Winnipeg. How many different sequences or orders of visiting these three plants
are possible?
Answer: The number of different sequences is equal to ${}_3P_3 = 3! = 6$ different
permutations. This can be verified by the following tree diagram:
First     Second     Third     Route
V         T          W         VTW
V         W          T         VWT
T         V          W         TVW
T         W          V         TWV
W         T          V         WTV
W         V          T         WVT
Figure 2.15: Tree Diagram for Visits to Plants
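The six routes of the tree diagram can also be listed by machine. The sketch below uses Python's standard `itertools.permutations`; the single-letter labels V, T, and W are the abbreviations used in the figure.

```python
from itertools import permutations

plants = ["V", "T", "W"]  # Vancouver, Toronto, Winnipeg

# Each permutation of the three plants is one possible order of visits.
routes = ["".join(order) for order in permutations(plants)]

print(len(routes))  # 6
print(routes)       # ['VTW', 'VWT', 'TVW', 'TWV', 'WVT', 'WTV']
```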
(b) The calculation of permutations is modified if some of the items cannot be
distinguished from one another. We speak of this as calculation of the
number of permutations into classes. We have already seen that if n items are
all different, the number of permutations taken n at a time is $n!$. However, if
some of them are indistinguishable from one another, the number of possible
permutations is reduced. If $n_1$ items are the same, and the remaining $(n - n_1)$
items are the same of a different class, the number of permutations can be
shown to be $\dfrac{n!}{n_1!\,(n-n_1)!}$. The numerator, $n!$, would be the number of permutations
of n distinguishable items taken n at a time. But $n_1$ of these items are
indistinguishable, so reducing the number of permutations by a factor $\dfrac{1}{n_1!}$,
and another $(n - n_1)$ items are not distinguishable from one another, so reducing
the number of permutations by another factor $\dfrac{1}{(n-n_1)!}$. If we have a total of
n items, of which $n_1$ are the same of one class, $n_2$ are the same of a second class,
and $n_3$ are the same of a third class, such that $n_1 + n_2 + n_3 = n$, the number
of permutations is $\dfrac{n!}{n_1!\,n_2!\,n_3!}$. This could be extended to further classes.
Example 2.14
A machinist produces 22 items during a shift. Three of the 22 items are defective and
the rest are not defective. In how many different orders can the 22 items be arranged
if all the defective items are considered identical and all the nondefective items are
identical of a different class?
Answer: The number of ways of arranging 3 defective items and
19 nondefective items is
$$ \frac{22!}{(3!)(19!)} = \frac{(22)(21)(20)}{(3)(2)(1)} = 1540. $$
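The count of permutations into classes is easy to check numerically. The following is a minimal sketch; the function name `permutations_into_classes` is a label of our own choosing, and the function simply evaluates $n!/(n_1!\,n_2!\cdots)$ for whatever class sizes are passed in.

```python
from math import factorial

def permutations_into_classes(*class_sizes: int) -> int:
    """Distinct orderings of n items split into classes of identical items."""
    n = sum(class_sizes)
    result = factorial(n)
    # Divide out the arrangements within each class of identical items.
    for size in class_sizes:
        result //= factorial(size)
    return result

# Example 2.14: 3 defective and 19 nondefective items.
print(permutations_into_classes(3, 19))  # 1540
```

When every class has size 1 the function reduces to n!, as expected for fully distinguishable items.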
Another modification of calculation of permutations gives circular permutations.
If n items are arranged in a circle, the arrangement doesn't change if every item is
moved by one place to the left or to the right. Therefore in this situation one item can
be placed at random, and all the other items are placed in relation to the first item.
Thus, the number of permutations of n distinct items arranged in a circle is $(n-1)!$.
The principal use of permutations in probability is as a multiplying factor that
gives the number of ways in which a given set of items can be arranged.
(c) Combinations are similar to permutations, but with the important difference
that combinations take no account of order. Thus, AB and BA are different
permutations but the same combination of letters. Then the number of
permutations must be larger than the number of combinations, and the ratio
between them must be the number of ways the chosen items can be arranged.
Say on an examination we have to do any eight questions out of ten. The
number of permutations of questions would be ${}_{10}P_8 = \dfrac{10!}{2!}$. Remember
that the number of ways in which eight items can be arranged is $8!$, so the
number of combinations must be reduced by the factor $\dfrac{1}{8!}$. Then the number
of combinations of 10 distinguishable items taken 8 at a time is
$\left(\dfrac{10!}{2!}\right)\left(\dfrac{1}{8!}\right)$. In
general, the number of combinations of n items taken r at a time is
$$ {}_nC_r = \frac{{}_nP_r}{r!} = \frac{n!}{(n-r)!\,r!} \tag{2.7} $$
${}_nC_r$ gives the number of equally likely ways of choosing r items from a group of
n distinguishable items. That can be used with the classical approach to probability.
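The relation between permutations and combinations in equation 2.7 can be illustrated with a short Python sketch. The function name `combinations_count` is our own; it divides the number of permutations by the r! orderings of the chosen items.

```python
from math import factorial

def combinations_count(n: int, r: int) -> int:
    """Number of ways of choosing r items from n distinct items, ignoring order."""
    if not 0 <= r <= n:
        raise ValueError("require 0 <= r <= n")
    permutations = factorial(n) // factorial(n - r)  # nPr, equation 2.6
    return permutations // factorial(r)              # divide by the r! orderings

# The examination example: any eight questions out of ten.
print(combinations_count(10, 8))  # 45
```

Choosing 8 questions to answer is the same as choosing 2 to omit, so `combinations_count(10, 2)` gives the same value.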
Example 2.15
Four card players cut for the deal. That is, each player removes from the top of a
well-shuffled 52-card deck as many cards as he or she chooses. He then turns them
over to expose the bottom card of his cut. He or she retains the cut card. The
highest card will win, with the ace high. If the first player draws a nine, what is then
his probability of winning without a recut for a tie?
Answer: For the first player to win, each of the other players must draw an eight or
lower. Then Pr[win] = Pr[other three players all get eight or lower].
There are (4)(7) = 28 cards left in the deck below nine after the first player's
draw, and there are 52 − 1 = 51 cards left in total. The number of combinations of
three cards from 51 cards is ${}_{51}C_3$, all of which are equally likely. Of these, the number
of combinations which will result in a win for the first player is the number of
combinations of three items from 28 items, which is ${}_{28}C_3$.
The probability that the first player will win is
$$ \frac{{}_{28}C_3}{{}_{51}C_3} = \frac{3276}{20825} = 0.157. $$
Like many other problems, this one can be done in more than one way. A solution
by the multiplication rule using conditional probability is as follows:
Pr[player #2 gets eight or lower | player #1 drew a nine] = $\dfrac{28}{51}$
If that happens, Pr[player #3 gets eight or lower]
= Pr[third player gets eight or lower | first player drew a nine and second player
drew eight or lower]
= $\dfrac{27}{50}$
If that happens, Pr[player #4 gets eight or lower]
= Pr[fourth player gets eight or lower | first player drew a nine and both second
and third players drew eight or lower]
= $\dfrac{26}{49}$
The probability that the first player will win is
$$ \left(\frac{28}{51}\right)\left(\frac{27}{50}\right)\left(\frac{26}{49}\right) = 0.157. $$
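The two routes to the answer of Example 2.15 (counting combinations, and the multiplication rule) can be compared with exact fractions. This sketch assumes Python 3.8 or later for `math.comb`; the variable names are our own.

```python
from fractions import Fraction
from math import comb

# Route 1: favourable combinations over all equally likely combinations.
by_counting = Fraction(comb(28, 3), comb(51, 3))

# Route 2: multiplication rule with conditional probabilities.
by_multiplication = Fraction(28, 51) * Fraction(27, 50) * Fraction(26, 49)

print(by_counting == by_multiplication)  # True
print(round(float(by_counting), 3))      # 0.157
```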
Problems
1. A bench can seat 4 people. How many seating arrangements can be made from a
group of 10 people?
2. How many distinct permutations can be formed from all the letters of each of the
following words: (a) them, (b) unusual?
3. A student is to answer 7 out of 9 questions on a midterm test.
i) How many examination selections has he?
ii) How many if the first 3 questions are compulsory?
iii) How many if he must answer at least 4 of the first 5 questions?
4. Four light bulbs are selected at random without replacement from 16 bulbs, of
which 7 are defective. Find the probability that
a) none are defective.
b) exactly one is defective.
c) at least one is defective.
5. Of 20 light bulbs, 3 are defective. Five bulbs are chosen at random.
a) Use permutations or combinations to find the probability that none are
defective.
b) What is the probability that at least one is defective?
(This is a modification of problem 16 of the previous set.)
6. A box contains 18 light bulbs. Of these, four are defective. Five bulbs are chosen
at random.
a) Use permutations or combinations to find the probability that none are
defective.
b) What is the probability that exactly one of the chosen bulbs is defective?
c) What is the probability that at least one of the chosen bulbs is defective?
7. How many different sums of money can be obtained by choosing two coins from
a box containing a nickel, a dime, a quarter, a fifty-cent piece, and a dollar coin?
Is this a problem in permutations or in combinations?
8. If three balls are drawn at random from a bag containing 6 red balls, 4 white
balls, and 8 blue balls, what is the probability that all three are red? Use
permutations or combinations.
9. In a poker hand consisting of five cards, what is the probability of holding:
a) two aces and two kings?
b) five spades?
c) A, K, Q, J, 10 of the same suit?
10. In how many ways can a group of 7 persons arrange themselves
a) in a row,
b) around a circular table?
11. In how many ways can a committee of 3 people be selected from 8 people?
12. In playing poker, five cards are dealt to a player. What is the probability of being
dealt (i) four-of-a-kind? (ii) a full house (three-of-a-kind and a pair)?
13. A hockey club has 7 forwards, 5 defensemen, and 3 goalies. Each can play only
in his designated subgroup. A coach chooses a team of 3 forwards, 2 defense,
and 1 goalie.
a) How many different hockey teams can the coach assemble if position within
the subgroup is not considered?
b) Players A, B and C prefer to play left forward, center, and right defense,
respectively. What is the probability that these three players will play on the same team
in their preferred positions if the coach assembles the team at random?
14. A shipment of 17 radios includes 5 radios that are defective. The receiver
samples 6 radios at random. What is the probability that exactly 3 of the radios
selected are defective? Solve the problem
a) using a probability tree diagram
b) using permutations and combinations.
15. Three married couples have purchased theater tickets and are seated in a row
consisting of just six seats. If they take their seats in a completely random
fashion, what is the probability that
a) Jim and Paula (husband and wife) sit in the two seats on the far left?
b) Jim and Paula end up sitting next to one another?
2.4 More Complex Problems: Bayes' Rule
More complex problems can be treated in much the same manner. You must read the
question very carefully. If the problem involves the multiplication rule, a tree diagram
is almost always very strongly recommended.
Example 2.16
A company produces machine components which pass through an automatic testing
machine. 5% of the components entering the testing machine are defective. However,
the machine is not entirely reliable. If a component is defective there is 4% probability
that it will not be rejected. If a component is not defective there is 7% probability
that it will be rejected.
a) What fraction of all the components are rejected?
b) What fraction of the components rejected are actually not defective?
c) What fraction of those not rejected are defective?
Answer: Let D represent a defective component, and G a good component.
Let R represent a rejected component, and A an accepted component.
Part (a) can be answered directly using a tree diagram.
D, Pr[D] = 0.05:    R, Pr[R|D] = 0.96;    A, Pr[A|D] = 0.04
G, Pr[G] = 0.95:    R, Pr[R|G] = 0.07;    A, Pr[A|G] = 0.93
Figure 2.16: Testing Sequences
Now we can calculate the probabilities of the various combined events:
$\Pr[D \cap R] = \Pr[D]\,\Pr[R \mid D] = (0.05)(0.96) = 0.0480$    Rejected
$\Pr[D \cap A] = \Pr[D]\,\Pr[A \mid D] = (0.05)(0.04) = 0.0020$    Accepted
$\Pr[G \cap R] = \Pr[G]\,\Pr[R \mid G] = (0.95)(0.07) = 0.0665$    Rejected
$\Pr[G \cap A] = \Pr[G]\,\Pr[A \mid G] = (0.95)(0.93) = 0.8835$    Accepted
Total = 1.0000 (Check)
Because all possibilities have been considered and there is no overlap among
them, we see that the rejected area is composed of only two possibilities, so the
probability of rejection is the sum of the probabilities of two intersections. The same
can be said of the accepted area.
Then $\Pr[R] = \Pr[D \cap R] + \Pr[G \cap R] = 0.0480 + 0.0665 = 0.1145$
and $\Pr[A] = \Pr[D \cap A] + \Pr[G \cap A] = 0.0020 + 0.8835 = 0.8855$
a) The answer to part (a) is that in the long run the fraction rejected will be the
probability of rejection, 0.1145 or (with rounding) 0.114 or 11.4%.
Now we can calculate the required quantities to answer parts (b) and (c) using
conditional probabilities in the opposite order, so in a sense applying them
backwards.
b) Fraction of components rejected which are not defective
= probability that a component is good, given that it was rejected
= $\Pr[G \mid R] = \dfrac{\Pr[G \cap R]}{\Pr[R]} = \dfrac{0.0665}{0.1145} = 0.58$ or 58%.
c) Fraction of components passed which are actually defective
= probability that a component is defective, given that it was passed
Using equation 2.4, this is $\Pr[D \mid A] = \dfrac{\Pr[D \cap A]}{\Pr[A]} = \dfrac{0.0020}{0.8855} = 0.0023$ or 0.23%.
(Note that $\Pr[G \mid R] \ne \Pr[R \mid G]$, and $\Pr[D \mid A] \ne \Pr[A \mid D]$.)
Thus the fraction of defective components in the stream which is passed seems to
be acceptably small, but the fraction of non-defective components in the stream
which is rejected is unacceptably large. In practice, something would have to be done
about that.
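The arithmetic of Example 2.16 can be reproduced in a few lines of Python. This is a sketch only: the variable names are our own, and the three input probabilities are the ones stated in the example.

```python
# Inputs stated in Example 2.16.
p_d = 0.05          # Pr[D]: component is defective
p_r_given_d = 0.96  # Pr[R|D]: a defective component is rejected
p_r_given_g = 0.07  # Pr[R|G]: a good component is rejected

p_g = 1 - p_d       # Pr[G]

# Multiplication rule: probabilities of the four combined events.
p_d_and_r = p_d * p_r_given_d
p_d_and_a = p_d * (1 - p_r_given_d)
p_g_and_r = p_g * p_r_given_g
p_g_and_a = p_g * (1 - p_r_given_g)

# Addition rule: total probabilities of rejection and acceptance.
p_r = p_d_and_r + p_g_and_r
p_a = p_d_and_a + p_g_and_a

# Conditional probabilities taken in the opposite direction.
p_g_given_r = p_g_and_r / p_r
p_d_given_a = p_d_and_a / p_a

print(round(p_r, 4), round(p_g_given_r, 2), round(p_d_given_a, 4))
# 0.1145 0.58 0.0023
```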
Note two points here about the calculation. First, to obtain answers to parts (b)
and (c) of this problem we have applied conditional probability in two directions,
first forward in the tree diagram, then backward. Both are legitimate applications of
Equation 2.3 or 2.4. Second, we can go from the idea of the sample space, consisting
of all possible results, to the reduced sample space, consisting of those outcomes
which meet a particular condition. Here for Pr[D|A] the reduced sample space
consists of all outcomes for which the component is not rejected. The conditional
probability is the probability that an item in the reduced sample space will satisfy the
requirement that the component is defective, or the long-run fraction of the items in
the reduced sample space that satisfy the new requirement.
Bayes' Theorem or Rule is the name given to the use of conditional probabilities
in both directions, with combination of all the intersections involving a particular
event to give the probability of that event. The Bayesian approach can be summarized
as follows:
First, apply the multiplication rule with conditional probability forward
along the tree diagram:
$$ \Pr[A \cap B] = \Pr[A]\,\Pr[B \mid A] \tag{2.3a} $$
Second, apply the addition rule to reconstruct the probability of a particular
event as a reduced sample space:
$$ \Pr[B] = \Pr[A \cap B] + \Pr[\bar{A} \cap B] \tag{2.8} $$
where $\bar{A}$ represents not A, the absence of A or complement of A.
Third, apply the relation for conditional probability, in the opposite direction
on the tree diagram from the first step, using this reduced sample space:
$$ \Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]} \tag{2.5} $$
Bayes' Rule should always be used with a tree diagram. Thus, for Example 2.16
we have:
D, Pr[D] = 0.05:    R, Pr[R|D] = 0.96;    A, Pr[A|D] = 0.04
G, Pr[G] = 0.95:    R, Pr[R|G] = 0.07;    A, Pr[A|G] = 0.93
Figure 2.17: Tree Diagram for Bayes' Rule
The steps corresponding to the reasoning behind Bayes' Rule for this tree diagram
are:
First, $\Pr[D \cap R] = \Pr[D]\,\Pr[R \mid D]$, and so on, corresponding to equation 2.3a.
Then, $\Pr[R] = \Pr[D \cap R] + \Pr[G \cap R]$, and similarly for Pr[A], corresponding to
equation 2.8.
Then, $\Pr[G \mid R] = \dfrac{\Pr[G \cap R]}{\Pr[R]}$, and similarly for Pr[D|A], corresponding to
equation 2.5.
An important use of Bayes' Rule is in modifying earlier estimates of probability
with later observed data.
Here is another example of the use of Bayes' Rule:
Example 2.17
A man has three identical jewelry boxes, each with two identical drawers. In the first
box both drawers contain gold watches. In the second box both drawers contain silver
watches. In the third box one drawer contains a gold watch, and the other drawer
contains a silver watch. The man wants to wear a gold watch. If he selects a box at
random, opens a drawer at random, and finds a silver watch, what is the probability
that the other drawer in that box contains a gold watch?
Answer: (It is interesting at this point to guess what the right answer will be! Try it.)
If G stands for a gold watch and S stands for a silver watch, the three boxes and their
contents can be shown as follows:
Box     1     2     3
        G     S     G
        G     S     S
Figure 2.18: Jewelry Boxes
If the selected box contains both a silver watch and a gold watch, it must be Box 3.
Then we need to calculate the probability that the man chose Box 3 on condition
that he found a silver watch, $\Pr[B_3 \mid S]$, where $B_3$ stands for Box 3 and similar
notations apply for other boxes. We start with a tree diagram and apply conditional
probabilities along the tree.
Box 1, $\Pr[B_1] = 1/3$:    S, $\Pr[S \mid B_1] = 0$;      G, $\Pr[G \mid B_1] = 1$
Box 2, $\Pr[B_2] = 1/3$:    S, $\Pr[S \mid B_2] = 1$;      G, $\Pr[G \mid B_2] = 0$
Box 3, $\Pr[B_3] = 1/3$:    S, $\Pr[S \mid B_3] = 1/2$;    G, $\Pr[G \mid B_3] = 1/2$
Figure 2.19: Tree Diagram for Jewelry Boxes
Using equation 2.5, $\Pr[S \cap B_i] = \Pr[B_i]\,\Pr[S \mid B_i]$, and similarly
$\Pr[G \cap B_i] = \Pr[B_i]\,\Pr[G \mid B_i]$, so we have:
i, Box No.    $\Pr[S \cap B_i]$                  $\Pr[G \cap B_i]$
1             $(1/3)(0) = 0$                     $(1/3)(1) = 1/3$
2             $(1/3)(1) = 1/3$                   $(1/3)(0) = 0$
3             $(1/3)(1/2) = 1/6$                 $(1/3)(1/2) = 1/6$
Total         $1/2$                              $1/2$
Then $\Pr[S] = \sum_{i=1}^{3} \Pr[S \cap B_i] = 0 + \dfrac{1}{3} + \dfrac{1}{6} = \dfrac{1}{2}$
and $\Pr[G] = \sum_{i=1}^{3} \Pr[G \cap B_i] = \dfrac{1}{3} + 0 + \dfrac{1}{6} = \dfrac{1}{2}$
Total = 1 (check)
Then we have $\Pr[B_3 \mid S] = \dfrac{\Pr[B_3 \cap S]}{\Pr[S]} = \dfrac{1/6}{1/2} = \dfrac{1}{3}$
Then the probability that the other drawer contains a gold watch is $\dfrac{1}{3}$.
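The jewelry-box calculation follows the same three steps for any number of branches, which suggests a small general-purpose sketch. The function name `posterior` and the dictionary keys are our own illustrative choices; exact fractions are used so the result can be compared directly with the table.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Pr[hypothesis | observation] for each hypothesis, by Bayes' Rule."""
    # Multiplication rule: probability of each intersection.
    joints = {h: priors[h] * likelihoods[h] for h in priors}
    # Addition rule: total probability of the observation.
    total = sum(joints.values())
    # Conditional probability in the opposite direction.
    return {h: joint / total for h, joint in joints.items()}

# Example 2.17: each box is equally likely; the observation is a silver watch.
priors = {"Box 1": Fraction(1, 3), "Box 2": Fraction(1, 3), "Box 3": Fraction(1, 3)}
pr_silver = {"Box 1": Fraction(0), "Box 2": Fraction(1), "Box 3": Fraction(1, 2)}

result = posterior(priors, pr_silver)
print(result["Box 3"])  # 1/3
```

The same call also shows that Box 2 is the more likely source of the silver watch, with probability 2/3.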
Other relatively complex problems will be encountered when the concepts of
basic probability are combined with other ideas or distributions in later chapters.
Problems
1. Three different machines M1, M2, and M3 are used to produce similar electronic
components. Machines M1, M2, and M3 produce 20%, 30% and 50% of the
components respectively. It is known that the probabilities that the machines
produce defective components are 1% for M1, 2% for M2, and 3% for M3. If a
component is selected randomly from a large batch, and that component is
defective, find the probability that it was produced: (a) by M2, and (b) by M3.
2. A flood forecaster issues a flood warning under two conditions only: (i) if fall
rainfall exceeds 10 cm and winter snowfall is between 15 and 20 cm, or (ii) if
winter snowfall exceeds 20 cm regardless of fall rainfall. The probability of fall
rainfall exceeding 10 cm is 0.10, while the probabilities of winter snowfall
exceeding 15 and 20 cm are 0.15 and 0.05 respectively.
a) What is the probability that he will issue a warning any given spring?
b) Given that he issues a warning, what is the probability that fall rainfall was
greater than 10 cm?
3. A certain company has two car assembly plants, A and B. Plant A produces twice
as many cars as plant B. Plant A uses engines and transmissions from a subsidiary
plant which produces 10% defective engines and 2% defective
transmissions. Plant B uses engines and transmissions from another source where
8% of the engines and 4% of the transmissions are defective. Car transmissions
and engines at each plant are installed independently.
a) What is the probability that a car chosen at random will have a good engine?
b) What is the probability that a car from plant A has a defective engine, or a
defective transmission, or both?
c) What is the probability that a car which has a good transmission and a
defective engine was assembled at plant B?
4. It is known that of the articles produced by a factory, 20% come from Machine
A, 30% from Machine B, and 50% from Machine C. The percentages of satisfactory
articles among those produced are 95% for A, 85% for B and 90% for C. An
article is chosen at random.
a) What is the probability that it is satisfactory?
b) Assuming that the article is satisfactory, what is the probability that it was
produced by Machine A?
5. Of the feed material for a manufacturing plant, 85% is satisfactory, and the rest is
not. If it is satisfactory, the probability it will pass Test A is 92%. If it is not
satisfactory, the probability it will pass Test A is 9.5%. If it passes Test A it goes
on to Test B; 99% will pass Test B if the material is satisfactory, and 16% will
pass Test B if the material is not satisfactory. If it fails Test A it goes on to Test
C; 82% will pass Test C if the material is satisfactory, but only 3% will pass Test
C if the material is not satisfactory. Material is accepted if it passes both Test A
and Test B. Material is rejected if it fails both Test A and Test C. Material is
reprocessed if it fails Test B or passes Test C.
a) What percentage of the feed material is accepted?
b) What percentage of the feed material is reprocessed?
c) What percentage of the material which is reprocessed was satisfactory?
6. In a small isolated town in Northern Saskatchewan, 90% of the Cola consumed
by the townspeople is purchased from the General Store, while the rest is
purchased from other vendors. Records show 60% of all the bottles sold are
returned. According to a special study, a bottle purchased at the General Store is
four times as likely to be returned as a bottle purchased elsewhere.
a) Calculate the probability that a person buying a bottle of Cola from the
General Store will return the empty bottle.
b) If a Cola bottle is found lying in the street, what is the probability that it was
not purchased at the General Store?
7. Three road construction firms, X, Y and Z, bid for a certain contract. From past
experience, it is estimated that the probability that X will be awarded the contract
is 0.40, while for Y and Z the probabilities are 0.35 and 0.25. If X does receive
the contract, the probability that the work will be satisfactorily completed on
time is 0.75. For Y and Z these probabilities are 0.80 and 0.70.
a) What is the probability that Y will be awarded the contract and complete the
work satisfactorily?
b) What is the probability that the work will be completed satisfactorily?
c) It turns out that the work was done satisfactorily. What is the probability that
Y was awarded the contract?
8. Two service stations compete with one another. The odds are 3 to 1 that a motorist
will go to station A rather than station B. Given that a motorist goes to station
B, the probability that he will be asked whether he wants his oil checked is 0.76.
A survey indicates that of the motorists who are asked whether they want the oil
checked, 79% went to station A. Given that a motorist goes to station A, what is
the probability that he will be asked whether he wants his oil checked?
9. A machining process produces 98.6% good components. The rest are defective.
Each component passes through a pneumatic gauging system. 96% of the defective
components are rejected by the gauging system, but 5% of the good
components are rejected also. All components rejected by the gauging system
pass through a tester. The tester accepts 98% of the good components and 12% of
the defective components which reach it. The components which are accepted by
the tester go a second time through the gauging system, which now accepts 92%
of the good components and 6% of the defective components which pass through
it. The total reject stream consists of components rejected by the tester and
components rejected by the second pass through the gauging system. The total
accepted stream consists of components accepted by the gauging system in either
pass.
a) What percentage of all the components are rejected?
b) What percentage of the total reject stream was accepted by the tester?
c) What percentage of the total reject stream are not defective?
CHAPTER
3
Descriptive Statistics: Summary Numbers
Prerequisite: A good knowledge of algebra.
The purpose of descriptive statistics is to present a mass of data in a more
understandable form. We may summarize the data in numbers as (a) some form of average,
or in some cases a proportion, (b) some measure of variability or spread, and (c)
quantities such as quartiles or percentiles, which divide the data so that certain
percentages of the data are above or below these marks. Furthermore, we may choose
to describe the data by various graphical displays or by the bar graphs called
histograms, which show the distribution of data among various intervals of the varying
quantity. It is often necessary or desirable to consider the data in groups and
determine the frequency for each group. This chapter will be concerned with various
summary numbers, and the next chapter will consider grouped frequency and
graphical descriptions.
Use of a computer can make treatment of massive sets of data much easier, so
computer calculations in this area will be considered in detail. However, it is
necessary to have the fundamentals of descriptive statistics clearly in mind when using the
computer, so the ideas and relations of descriptive statistics will be developed first for
pencil-and-paper calculations with a pocket calculator. Then computer methods will
be introduced and illustrated with examples.
First, consider describing a set of data by summary numbers. These will include
measures of a central location, such as the arithmetic mean, markers such as quartiles
or percentiles, and measures of variability or spread, such as the standard deviation.
3.1 Central Location
Various averages are used to indicate a central value of a set of data. Some of these
are referred to as means.
(a) Arithmetic Mean
Of these averages, the most common and familiar is the arithmetic mean, defined by
$$ \bar{x} \text{ or } \mu = \frac{1}{N}\sum_{i=1}^{N} x_i \tag{3.1} $$
If we refer to a quantity as a "mean" without any specific modifier, the arithmetic
mean is implied. In equation 3.1 $\bar{x}$ is the mean of a sample, and $\mu$ is the mean of a
population, but both means are calculated in the same way.
The arithmetic mean is affected by all of the data, not just any selection of it.
This is a good characteristic in most cases, but it is undesirable if some of the data
are grossly in error, such as "outliers" that are appreciably larger or smaller than they
should be. The arithmetic mean is simple to calculate. It is usually the best single
average to use, especially if the distribution is approximately symmetrical and
contains no outliers.
If some results occur more than once, it is convenient to take frequencies into
account. If $f_i$ stands for the frequency of result $x_i$, equation 3.1 becomes
$$ \bar{x} \text{ or } \mu = \frac{\sum x_i f_i}{\sum f_i} \tag{3.2} $$
This is in exactly the same form as the expression for the x-coordinate of the center
of mass of a system of N particles:
$$ x_{\text{C of M}} = \frac{\sum x_i m_i}{\sum m_i} \tag{3.3} $$
Just as the mass of particle i, $m_i$, is used as the weighting factor in equation 3.3,
the frequency, $f_i$, is used as the weighting factor in equation 3.2.
Notice that from equation 3.1
$$ N\bar{x} - \sum_{i=1}^{N} x_i = 0 $$
so
$$ \sum_{i=1}^{N} \left(x_i - \bar{x}\right) = 0 $$
In words, the sum of all the deviations from the mean is equal to zero.
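We can also write equation 3.2 as
$$ \bar{x} \text{ or } \mu = \sum_{j=1}^{N} x_j \left[\frac{f_j}{\sum_{\text{all } i} f_i}\right] \tag{3.2a} $$
The quantity $\mu$ in this expression is the mean of a population. The quantity
$\dfrac{f_j}{\sum_{i=1}^{n} f_i}$ is the relative frequency of $x_j$.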
To illustrate, suppose we toss two coins 15 times. The possible number of heads
on each toss is 0, 1, or 2. Suppose we find no heads 3 times, one head 7 times, and
two heads 5 times. Then the mean number of heads per trial using equation 3.2 is
$$ \bar{x} = \frac{(0)(3) + (1)(7) + (2)(5)}{3 + 7 + 5} = \frac{17}{15} = 1.13 $$
The same result can be obtained using equation 3.2a.
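The coin-tossing illustration can be checked with a short Python sketch that evaluates the frequency-weighted mean both ways: as a ratio of sums, and as a sum of values times relative frequencies. The function names are our own.

```python
def frequency_mean(values, frequencies):
    """Mean of results weighted by how often each occurs (ratio of sums)."""
    return sum(x * f for x, f in zip(values, frequencies)) / sum(frequencies)

def relative_frequency_mean(values, frequencies):
    """Same mean, written as a sum of value times relative frequency."""
    total = sum(frequencies)
    return sum(x * (f / total) for x, f in zip(values, frequencies))

heads = [0, 1, 2]   # possible numbers of heads per toss of two coins
counts = [3, 7, 5]  # how many times each result was observed

print(round(frequency_mean(heads, counts), 2))           # 1.13
print(round(relative_frequency_mean(heads, counts), 2))  # 1.13
```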
(b) Other Means
We must not think that the arithmetic mean is the only important mean. The geometric
mean, logarithmic mean, and harmonic mean are all important in some areas of
engineering. The geometric mean is defined as the nth root of the product of n
observations:
$$ \text{geometric mean} = \sqrt[n]{x_1 x_2 x_3 \cdots x_n} \tag{3.4} $$
or, in terms of frequencies,
$$ \text{geometric mean} = \left[(x_1)^{f_1} (x_2)^{f_2} (x_3)^{f_3} \cdots (x_n)^{f_n}\right]^{1/\sum f_i} $$
Now taking logarithms of both sides,
$$ \log(\text{geometric mean}) = \frac{\sum f_i \log x_i}{\sum f_i} \tag{3.5} $$
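Working through logarithms is also the convenient way to compute a geometric mean on a computer, since it avoids forming a very large product. The sketch below assumes all observations are positive; the function name and the optional `frequencies` argument are our own.

```python
import math

def geometric_mean(values, frequencies=None):
    """Geometric mean of positive values, computed via mean of logarithms."""
    if frequencies is None:
        frequencies = [1] * len(values)  # each observation counted once
    # Frequency-weighted mean of the logarithms of the observations.
    log_mean = sum(f * math.log(x) for x, f in zip(values, frequencies)) / sum(frequencies)
    return math.exp(log_mean)

print(round(geometric_mean([2, 8]), 6))         # 4.0
print(round(geometric_mean([1, 10, 100]), 6))   # 10.0
print(round(geometric_mean([2, 8], [3, 1]), 6)) # same as the mean of 2, 2, 2, 8
```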
The logarithmic mean of two numbers is given by the difference between the
two numbers, divided by the difference of their natural logarithms, or
$$ \frac{x_2 - x_1}{\ln x_2 - \ln x_1}. $$
It is used particularly in heat transfer and mass transfer.
The harmonic mean involves inverses, i.e., one divided by each of the quantities.
The harmonic mean is the inverse of the arithmetic mean of all the inverses, so
$$ \text{harmonic mean} = \frac{1}{\dfrac{1}{n}\left(\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_n}\right)} $$
In this book we will not be concerned further with logarithmic or harmonic
means.
(c) Median
Another representative quantity, quite different from a mean, is the median. If all the
items with which we are concerned are sorted in order of increasing magnitude (size),
from the smallest to the largest, then the median is the middle item. Consider the five
items: 12, 13, 21, 27, 31. Then 21 is the median. If the number of items is even, the
median is given by the arithmetic mean of the two middle items. Consider the six
items: 12, 13, 21, 27, 31, 33. The median is (21 + 27)/2 = 24. If we interpret an
item that is right at the median as being half above and half below, then in all cases
the median is the value exceeded by 50% of the observations.
One desirable property of the median is that it is not much affected by outliers. If
the first numerical example in the previous paragraph is modified by replacing 31 by
131, the median is unchanged, whereas the arithmetic mean is changed appreciably.
But along with this advantage goes the disadvantage that changing the size of any
item without changing its position in the order of magnitude often has no effect on
the median, so some information is lost. If a distribution of items is very asymmetrical
so that there are many more items larger than the arithmetic mean than smaller
(or vice-versa), the median may be a more useful representative quantity than the
arithmetic mean. Consider the seven items: 1, 1, 2, 3, 4, 9, 10. The median is 3, with
as many items smaller than it as larger. The mean is 4.29, with five items smaller
than it, but only two items larger.
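The numerical examples for the median can be reproduced with Python's standard `statistics` module. The sketch below uses the data sets quoted in the text; only the variable names are our own.

```python
from statistics import mean, median

five_items = [12, 13, 21, 27, 31]
with_outlier = [12, 13, 21, 27, 131]  # 31 replaced by 131
six_items = [12, 13, 21, 27, 31, 33]
seven_items = [1, 1, 2, 3, 4, 9, 10]

print(median(five_items), median(with_outlier))  # 21 21
print(mean(five_items), mean(with_outlier))      # 20.8 40.8
print(median(six_items))                         # 24.0
print(median(seven_items), round(mean(seven_items), 2))  # 3 4.29
```

The outlier leaves the median at 21 but nearly doubles the arithmetic mean, which is the contrast described above.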
(d) Mode
If the frequency varies from one item to another, the mode is the value which appears
most frequently. As some of you may know, the word "mode" means "fashion" in
French. Then we might think of the mode as the most "fashionable" item. In the case
of continuous variables the frequency depends upon how many digits are quoted, so
the mode is more usefully considered as the midpoint of the class with the largest
frequency (see the grouped frequency approach in section 4.4). Using that
interpretation, the mode is affected somewhat by the class width, but this influence is
usually not very great.
3.2 Variability or Spread of the Data
The following groups all have the same mean, 4.25:
Group A: 2, 3, 4, 8
Group B: 1, 2, 4, 10
Group C: 0, 1, 5, 11
These data are shown graphically in Figure 3.1.
[Figure 3.1: Comparison of Groups (Groups A, B, and C each plotted on a scale from 0 to 12)]
It is clear that Group B is more variable (shows a larger spread in the numbers) than
Group A, and Group C is more variable than Group B. But we need a quantitative
measure of this variability.
(a) Sample Range
One simple measure of variability is the sample range, the difference between the
smallest item and the largest item in each sample. For Group A the sample range is 6,
for Group B it is 9, and for Group C it is 11. For small samples all of the same size,
the sample range is a useful quantity. However, it is not a good indicator if the
sample size varies, because the sample range tends to increase with increasing
sample size. Its other major drawback is that it depends on only two items in each
sample, the smallest and the largest, so it does not make use of all the data. This
disadvantage becomes more serious as the sample size increases. Because of its
simplicity, the sample range is used frequently in quality control when the sample
size is constant; simplicity is particularly desirable in this case so that people do not
need much education to apply the test.
(b) Interquartile Range
The interquartile range is the difference between the upper quartile and the lower
quartile, which will be described in section 3.3. It is used fairly frequently as a
measure of variability, particularly in the Box Plot, which will be described in the
next chapter. It is used less than some alternatives because it is not related to any of
the important theoretical distributions.
(c) Mean Deviation from the Mean

The mean deviation from the mean, defined as $\sum_{i=1}^{N} (x_i - \bar{x})/N$, where $\bar{x} = \sum_{i=1}^{N} x_i / N$, is useless because it is always zero. This follows from the discussion of the sum of deviations from the mean in section 3.1(a).
(d) Mean Absolute Deviation from the Mean

However, the mean absolute deviation from the mean, defined as

$$\frac{\sum_{i=1}^{N} \left| x_i - \bar{x} \right|}{N}$$

is used frequently by engineers to show the variability of their data, although it is usually not the best choice. Its advantage is that it is simpler to calculate than the main alternative, the standard deviation, which will be discussed below. For Groups A, B, and C the mean absolute deviation is as follows:

Group A: (2.25 + 1.25 + 0.25 + 3.75)/4 = 7.5/4 = 1.875.
Group B: (3.25 + 2.25 + 0.25 + 5.75)/4 = 11.5/4 = 2.875.
Group C: (4.25 + 3.25 + 0.75 + 6.75)/4 = 15/4 = 3.75.

Its disadvantage is that it is not simply related to the parameters of theoretical distributions. For that reason its routine use is not recommended.
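The contrast between sections (c) and (d) can be seen numerically. The sketch below, in Python, is an illustration only; the function names are hypothetical and are not taken from the book or from any library.

```python
# Groups A, B and C from the beginning of section 3.2.
groups = {"A": [2, 3, 4, 8], "B": [1, 2, 4, 10], "C": [0, 1, 5, 11]}

def mean_deviation(values):
    """Mean of the signed deviations from the mean (zero, up to rounding)."""
    mean = sum(values) / len(values)
    return sum(x - mean for x in values) / len(values)

def mean_absolute_deviation(values):
    """Mean of the absolute deviations from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

for name, values in groups.items():
    print(name, mean_deviation(values), mean_absolute_deviation(values))
```

For these three groups the signed deviations cancel exactly, while the absolute deviations reproduce 1.875, 2.875, and 3.75.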
(e) Variance

The variance is one of the most important descriptions of variability for engineers. It is defined as
Chapter3
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \tag{3.6}$$

In words it is the mean of the squares of the deviations of each measurement from the mean of the population. Since the squares of both positive and negative real numbers are positive, the variance cannot be negative; it is zero only if every measurement equals the mean. The symbol $\mu$ stands for the mean of the entire population, and $\sigma^2$ stands for the variance of the population. (Remember that in Chapter 1 we defined the population as a particular characteristic of all the items in which we are interested, such as the diameters of all the bolts produced under normal operating conditions.) Notice that variance is defined in terms of the population mean, $\mu$. When we calculate the results from a sample (i.e., a part of the population) we do not usually know the population mean, so we must find a way to use the sample mean, which we can calculate. Notice also that the variance has units of the quantity squared, for example m$^2$ or s$^2$ if the original quantity was measured in meters or seconds, respectively. We will find later that the variance is an important parameter in probability distributions used widely in practice.
(f) Standard Deviation

The standard deviation is extremely important. It is defined as the square root of the variance:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \tag{3.7}$$

Thus, it has the same units as the original data and is representative of the deviations from the mean. Because of the squaring, it gives more weight to larger deviations than to smaller ones. Since the variance is the mean square of the deviations from the population mean, the standard deviation is the root-mean-square deviation from the population mean. Root-mean-square quantities are also important in describing the alternating current of electricity. An analogy can be drawn between the standard deviation and the radius of gyration encountered in applied mechanics.
(g) Estimation of Variance and Standard Deviation from a Sample

The definitions of equations 3.6 and 3.7 can be applied directly if we have data for the complete population. But usually we have data for only a sample taken from the population. We want to infer from the data for the sample the parameters for the population. It can be shown that the sample mean, $\bar{x}$, is an unbiased estimate of the population mean, $\mu$. This means that if very large random samples were taken from the population, the sample mean would be a good approximation of the population mean, with no systematic error but with a random error which tends to become smaller as the sample size increases.
However, if we simply substitute $\bar{x}$ for $\mu$ in equations 3.6 and 3.7, there will be a systematic error or bias. This procedure would underestimate the variance and standard deviation of the population. This is because the sum of squares of deviations from the sample mean, $\bar{x}$, is smaller than the sum of squares of deviations from any other constant value, including $\mu$. $\bar{x}$ is an unbiased estimate of $\mu$, but in general $\bar{x} \neq \mu$, so just substituting $\bar{x}$ for $\mu$ in equations 3.6 and 3.7 would tend to give estimates of variance and standard deviation that are too small. To illustrate this, consider the four numbers 11, 13, 10, and 14 as a sample. Their sample mean is 12. They might well come from a population of mean 13. Then the sum of squares of deviations from the population mean is

$$\sum_i (x_i - \mu)^2 = (11-13)^2 + (13-13)^2 + (10-13)^2 + (14-13)^2 = 2^2 + 0^2 + 3^2 + 1^2 = 14,$$

whereas

$$\sum_i (x_i - \bar{x})^2 = (11-12)^2 + (13-12)^2 + (10-12)^2 + (14-12)^2 = 1^2 + 1^2 + 2^2 + 2^2 = 10.$$

Thus, $\dfrac{\sum_i (x_i - \bar{x})^2}{N}$ would underestimate the variance.
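The claim that the sample mean gives the smallest possible sum of squared deviations can be checked numerically for the four numbers above. The following Python sketch is only an illustration: the helper `sum_sq_dev` and the grid of candidate constants (8.0 to 16.0 in steps of 0.1) are assumptions made here, and a grid search is of course not a proof.

```python
sample = [11, 13, 10, 14]

def sum_sq_dev(values, center):
    """Sum of squared deviations of the values from a chosen constant."""
    return sum((x - center) ** 2 for x in values)

sample_mean = sum(sample) / len(sample)                # 12.0
about_sample_mean = sum_sq_dev(sample, sample_mean)    # deviations from x-bar
about_assumed_mu = sum_sq_dev(sample, 13)              # deviations from mu = 13

# Scan candidate constants from 8.0 to 16.0 and keep the one with the
# smallest sum of squares.
candidates = [c / 10 for c in range(80, 161)]
best = min(candidates, key=lambda c: sum_sq_dev(sample, c))

print(about_sample_mean, about_assumed_mu, best)
```

The two sums reproduce the values 10 and 14 found above, and the best constant on the grid is the sample mean, 12.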
The estimate of variance obtained using the sample mean in place of the population mean can be made unbiased by multiplying by the factor $\left(\dfrac{N}{N-1}\right)$. This is called Bessel's correction. The estimate of $\sigma^2$ is given the symbol $s^2$ and is called the variance estimated from a sample, or more briefly the sample variance. Sometimes this estimate will be high, sometimes it will be low, but in the long run it will show no bias if samples are taken randomly. The result of Bessel's correction is that we have

$$s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N-1} \tag{3.8}$$

The standard deviation is always the square root of the corresponding variance, so s is called the sample standard deviation. It is the estimate from a sample of the standard deviation of the population from which the sample came. The sample standard deviation is given by

$$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N-1}} \tag{3.9}$$
Equations 3.8 and 3.9 (or their equivalents) should be used to calculate the variance and standard deviation from a sample unless the population mean is known. If the population mean is known, as when we know all the members of the population, we should use equations 3.6 and 3.7 directly. Notice that when N is very large, Bessel's correction becomes approximately 1, so then it might be neglected. However, to avoid error we should always use equations 3.8 and 3.9 (or their equivalents) unless the population mean is known accurately.
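Many software tools provide both forms of the calculation, so it matters which one is called. As an illustration outside the Excel methods used in this book, the Python standard library's `statistics` module offers `pvariance` (division by N) and `variance` (division by N - 1). The sketch below applies both to Group A of section 3.2; treating Group A as a complete population in the first call is an assumption made only to show the difference.

```python
import math
import statistics

group_a = [2, 3, 4, 8]
n = len(group_a)

pop_var = statistics.pvariance(group_a)   # divides by N, as in equation 3.6
samp_var = statistics.variance(group_a)   # divides by N - 1, as in equation 3.8
samp_sd = statistics.stdev(group_a)       # square root of samp_var, equation 3.9

# Bessel's correction, N/(N - 1), converts one variance into the other.
assert math.isclose(samp_var, pop_var * n / (n - 1))

print(pop_var, round(samp_var, 3), round(samp_sd, 3))
```

With only four items the correction factor is 4/3, so the choice of divisor changes the variance by a third; for large N the two results nearly coincide.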
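(h) Method for Faster Calculation

A modification of equations 3.6 to 3.9 makes calculation of variance and standard deviation faster. In most cases in this book we have omitted derivations, but this case is an exception because the algebra is simple and may be helpful.

Equations 3.8 and 3.9 include the expression

$$\sum (x_i - \bar{x})^2 = \sum x_i^2 - 2\bar{x} \sum x_i + N\bar{x}^2$$

But by definition $\bar{x} = \dfrac{\sum x_i}{N}$.

Then we have

$$\sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{2\left(\sum x_i\right)^2}{N} + \frac{N\left(\sum x_i\right)^2}{N^2} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N} \tag{3.10}$$

Notice that $\sum x_i^2$ means we should square all the x's and then add them up. On the other hand, $\left(\sum x_i\right)^2$ means we should add up all the x's and square the result. They are not the same.

An alternative to equation 3.10 is

$$\sum (x_i - \bar{x})^2 = \sum x_i^2 - N\left(\bar{x}\right)^2 \tag{3.10a}$$

Then we have

$$s^2 = \frac{\sum_{i=1}^{N} x_i^2 - \dfrac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N-1} = \frac{\sum_{i=1}^{N} x_i^2 - N\left(\bar{x}\right)^2}{N-1} \tag{3.11}$$

It is often convenient to use equation 3.11 in the form for frequencies:

$$s^2 = \frac{\sum f_i x_i^2 - \left(\sum f_i x_i\right)^2 / \left(\sum f_i\right)}{\left(\sum f_i\right) - 1} \tag{3.12}$$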
Equations 3.6 and 3.7 include $\sum_{i=1}^{N} (x_i - \mu)^2$, where for a complete population $\mu = \dfrac{1}{N}\sum_{i=1}^{N} x_i$. Then similar expressions to equations 3.10 to 3.12 (but dividing by N instead of (N - 1)) apply for cases where the complete population is known.
The modified equations such as equation 3.11 or 3.12 should be used for calculation of variance (and the square root of one of them should be used for calculation of standard deviation) by hand or using a good pocket calculator, because this approach involves fewer arithmetic operations and so is faster. However, some thought is required if a digital computer is used. That is because some computers carry relatively few significant figures in the calculation. Since in equation 3.11 the quantities $\sum_{i=1}^{N} x_i^2$ and $\left(\sum_{i=1}^{N} x_i\right)^2 / N$ or $N\bar{x}^2$ are of similar magnitudes, the differences in equation 3.11 may involve catastrophic loss of significance because of rounding of figures in the computation. Most present-day computers and calculators, however, carry enough significant figures so that this loss of significance is not usually a serious problem, but the possibility of such a difficulty should be considered. It can often be avoided by subtracting a constant quantity from each number, an operation which does not change the variance or standard deviation. For example, the variance of 3617.8, 3629.6, and 3624.9 is exactly the same as the variance of 17.8, 29.6, and 24.9. However, the number of figures in the squared terms is much smaller in the second case, so the possibility of loss of significance is greatly reduced. Then in general, fewer figures are required to calculate variance by subtracting the mean from each of the values, then squaring, adding, and dividing by N - 1 (i.e., using equation 3.8 directly), but this adds to the number of arithmetic operations and so requires more time for calculations. If the calculating device carries enough significant figures to allow 3.11 or 3.12 to be used, that is the preferred method.

Microsoft Excel carries a precision of about 15 decimal digits in each numerical quantity. Statistical calculations seldom require greater precision in any final answer than four or five decimal digits, so loss of significance is very seldom a problem if Excel is being used. A comparison to verify that statement in a particular case will be included in Example 4.4.
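The loss of significance described above can be provoked deliberately. The following Python sketch uses ordinary double-precision floating point, which also carries roughly 15 to 16 decimal digits. The function names are hypothetical, and the offset of one billion in `huge` is an artificial assumption chosen here to strain that precision; it is not a value from the text.

```python
def variance_definition(values):
    """Sample variance by equation 3.8: subtract the mean, then square."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def variance_shortcut(values):
    """Sample variance by equation 3.11: sum of squares minus (sum)^2 / N."""
    n = len(values)
    return (sum(x * x for x in values) - sum(values) ** 2 / n) / (n - 1)

small = [17.8, 29.6, 24.9]
text_data = [3617.8, 3629.6, 3624.9]     # the values quoted in the text
huge = [1.0e9 + x for x in small]        # hypothetical offset, for illustration only

for label, data in (("small", small), ("text", text_data), ("huge", huge)):
    print(label, variance_definition(data), variance_shortcut(data))
```

For the first two data sets both formulas agree with the true sample variance, 35.29, to many figures. For the artificially offset data the squared terms are near 10^18, where adjacent double-precision numbers are hundreds of units apart, so the shortcut formula cannot resolve a variance of about 35, while the definition still can. Subtracting a constant from each number first, as the text recommends, restores the shortcut.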
(i) Illustration of Calculation

Now let us return to an example of calculations using the groups of numbers listed at the beginning of section 3.2.

Example 3.1

The numbers were as follows:

Group A: 2, 3, 4, 8
Group B: 1, 2, 4, 10
Group C: 0, 1, 5, 11

Find the sample variance and the sample standard deviation of each group of numbers. Use both equation 3.8 and equation 3.11 to check that they give the same result.

Answer: Since the mean of Group A (and also of the other groups) is 4.25, the sample variance of Group A using the basic definition, equation 3.8, is

$$[(2-4.25)^2 + (3-4.25)^2 + (4-4.25)^2 + (8-4.25)^2]/(4-1) = [5.0625 + 1.5625 + 0.0625 + 14.0625]/3 = 20.75/3 = 6.917,$$

so the sample standard deviation is $\sqrt{6.917} = 2.630$.

The variance of Group A calculated by equation 3.11 is

$$[2^2 + 3^2 + 4^2 + 8^2 - (4)(4.25)^2]/(4-1) = [4 + 9 + 16 + 64 - 72.25]/3 = 6.917$$

(again). We can see that the advantage of equation 3.11 is greater when the mean is not a simple integer.

Using equation 3.11 on Group B gives

$$[1^2 + 2^2 + 4^2 + 10^2 - (4)(4.25)^2]/(4-1) = [1 + 4 + 16 + 100 - 72.25]/3 = 48.75/3 = 16.25$$

for the sample variance, so the sample standard deviation is 4.031.

Using equation 3.11 on Group C gives

$$[0^2 + 1^2 + 5^2 + 11^2 - (4)(4.25)^2]/(4-1) = [0 + 1 + 25 + 121 - 72.25]/3 = 74.75/3 = 24.917$$

for the variance, so the standard deviation is 4.992.
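The agreement between equations 3.8 and 3.11 in Example 3.1 is exact, not merely approximate, and that can be confirmed with rational arithmetic. The sketch below uses Python's `fractions.Fraction` so that no rounding occurs; the function names `var_eq_3_8` and `var_eq_3_11` are labels invented here for the two equations.

```python
from fractions import Fraction

groups = {"A": [2, 3, 4, 8], "B": [1, 2, 4, 10], "C": [0, 1, 5, 11]}

def var_eq_3_8(values):
    """Equation 3.8: sum of squared deviations from the mean, over N - 1."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def var_eq_3_11(values):
    """Equation 3.11: (sum of squares - N * mean^2), over N - 1."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return (sum(x * x for x in values) - n * mean ** 2) / (n - 1)

for name, values in groups.items():
    v8, v11 = var_eq_3_8(values), var_eq_3_11(values)
    assert v8 == v11                      # identical in exact arithmetic
    print(name, v8, round(float(v8), 3), round(float(v8) ** 0.5, 3))
```

The exact variances are 83/12, 65/4, and 299/12, which round to the values 6.917, 16.25, and 24.917 found by hand.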
(j) Coefficient of Variation

A dimensionless quantity, the coefficient of variation is the ratio between the standard deviation and the mean for the same set of data, expressed as a percentage. This can be either $(\sigma/\mu)$ or $(s/\bar{x})$, whichever is appropriate, multiplied by 100%.
(k) Illustration: An Anecdote

A brief story may help the reader to see why variability is often important. Some years ago a company was producing nickel powder, which varied considerably in particle size. A metallurgical engineer in technical sales was given the task of developing new customers in the alloy steel industry for the powder. Some potential buyers said they would pay a premium price for a product that was more closely sized. After some discussion with the management of the plant, specifications for three new products were developed: fine powder, medium powder, and coarse powder. An order was obtained for fine powder. Although the specifications for this fine powder were within the size range of powder which had been produced in the past, the engineers in the plant found that very little of the powder produced at their best guess of the optimum conditions would satisfy the specifications. Thus, the mean size of the specification was satisfactory, but the specified variability was not satisfactory from the point of view of production. To make production of fine powder more practical, it was necessary to change the specifications for fine powder to correspond to a larger standard deviation. When this was done, the plant could produce fine powder much more easily (but the customer was not willing to pay such a large premium for it!).
3.3 Quartiles, Deciles, Percentiles, and Quantiles

Quartiles, deciles, and percentiles divide a frequency distribution into a number of parts containing equal frequencies. The items are first put into order of increasing magnitude. Quartiles divide the range of values into four parts, each containing one quarter of the values. Again, if an item comes exactly on a dividing line, half of it is counted in the group above and half is counted below. Similarly, deciles divide into ten parts, each containing one tenth of the total frequency, and percentiles divide into a hundred parts, each containing one hundredth of the total frequency. If we think again about the median, it is the second or middle quartile, the fifth decile, and the fiftieth percentile. If a quartile, decile, or percentile falls between two items in order of size, for our purposes the value halfway between the two items will be used. Other conventions are also common, but the effect of different choices is usually not important. Remember that we are dealing with a quantity which varies randomly, so another sample would likely show a different quartile or decile or percentile.

For example, if the items after being put in order are 1, 2, 2, 3, 5, 6, 6, 7, 8, a total of nine items, the first or lower quartile is (2 + 2)/2 = 2, the median is 5, and the upper or third quartile is (6 + 7)/2 = 6.5.
Example 3.2

To start a program to improve the quality of production in a factory, all the products coming off a production line, under what we have reason to believe are normal operating conditions, are examined and classified as "good" products or "defective" products. The number of defective products in each successive group of six is counted. The results for 60 groups, so for 360 products, are shown in Table 3.1. Find the mean, median, mode, first quartile, third quartile, eighth decile, ninth decile, proportion defective in the sample, first estimate of probability that an item will be defective, sample variance, sample standard deviation, and coefficient of variation.

Table 3.1: Numbers of Defectives in Groups of Six Items

1  0  0  0  0  0  0  0  0  0  0  0
0  0  0  0  1  0  0  0  0  0  1  0
0  1  0  0  1  0  0  0  0  2  0  0
0  0  0  0  2  0  0  1  0  0  1  0
1  0  0  0  0  1  0  0  1  0  0  0
Answer: The data in Table 3.1 can be summarized in terms of frequencies. If $x_i$ represents the number of defectives in a group of six products and $f_i$ represents the frequency of that occurrence, Table 3.2 is a summary of Table 3.1.

Table 3.2: Frequencies for Numbers of Defectives

Number of defectives, x_i     Frequency, f_i
0                             48
1                             10
2                              2
>2                             0

Then the mean number of defectives in a group of six products is

$$\frac{(48)(0) + (10)(1) + (2)(2)}{48 + 10 + 2} = \frac{14}{60} = 0.233$$

Notice that the mean is not necessarily a possible member of the set: in this case the mean is a fraction, whereas each number of defectives must be a whole number.

Among a total of 60 values, the median is the value between the 30th and 31st values in order of increasing magnitude, so (0 + 0)/2 = 0.

The mode is the most frequent value, so 0.

The lower or first quartile is the value between the 15th and 16th values in order of size, thus between 0 and 0, so 0. The upper or third quartile is the value between the 45th and 46th values in order of size, thus between 0 and 0, so again 0. The eighth decile is the value larger than the 48th item and smaller than the 49th item, so between 0 and 1, or 0.5. The ninth decile is the value between the 54th and the 55th values, so between 1 and 1, so 1.

We have 14 defective products in a sample of 360 items, so the proportion defective in this sample is 14/360 = 0.0389 or 0.039. As we have seen from section 2.1, proportion or relative frequency gives an estimate of probability. Then we can estimate the probability that an item, chosen randomly from the population from which the sample came, will be defective. For this sample that first estimate of the probability that a randomly chosen item in the population will be defective is 0.039. This estimate is not very precise, but it would get better if the size of the sample were increased.

Now let us calculate the sample variance and standard deviation using equation 3.12:

$$\sum f_i x_i^2 = (48)(0)^2 + (10)(1)^2 + (2)(2)^2 = 18$$

$$\sum f_i x_i = (48)(0) + (10)(1) + (2)(2) = 14$$

$$\sum f_i = 48 + 10 + 2 = 60$$

Then from equation 3.12,

$$s^2 = \frac{\sum f_i x_i^2 - \left(\sum f_i x_i\right)^2 / \left(\sum f_i\right)}{\left(\sum f_i\right) - 1},$$

which gives

$$s^2 = \frac{18 - (14)^2/60}{60 - 1} = 0.2497,$$

so s = 0.4997 or 0.500.

The coefficient of variation is $\left(\dfrac{s}{\bar{x}}\right)(100\%) = \left(\dfrac{0.4997}{0.2333}\right)(100\%) = 214\%$.
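The grouped calculation of Example 3.2 translates directly into a few lines of code. The Python sketch below works from the frequency table rather than the raw data; the dictionary `freq` and the other variable names are assumptions made for this illustration.

```python
import math

# Frequency table from Table 3.2: number of defectives -> number of groups.
freq = {0: 48, 1: 10, 2: 2}
items_per_group = 6

n = sum(freq.values())                               # total number of groups
sum_fx = sum(f * x for x, f in freq.items())         # sum of f_i * x_i
sum_fx2 = sum(f * x * x for x, f in freq.items())    # sum of f_i * x_i^2

mean = sum_fx / n
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)     # equation 3.12
std_dev = math.sqrt(variance)
coeff_var = 100 * std_dev / mean                     # in per cent
mode = max(freq, key=freq.get)                       # value with largest frequency
proportion_defective = sum_fx / (n * items_per_group)

print(round(mean, 3), round(variance, 4), round(std_dev, 4),
      round(coeff_var), mode, round(proportion_defective, 4))
```

The printed values match the hand calculation: a mean of 0.233, a variance of 0.2497, a standard deviation of 0.4997, a coefficient of variation of 214%, a mode of 0, and a proportion defective of 0.0389.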
The general term for a parameter which divides a frequency distribution into parts containing stated proportions of a distribution is a quantile. The symbol Q(f) is used for the quantile, which is larger than a fraction f of a distribution. Then a lower quartile is Q(0.25) or Q(1/4), and an upper quartile is Q(0.75).

In fact, if items are sorted in order of increasing magnitude, from the smallest to the largest, each item can be considered some sort of quantile, on a dividing line so that half of the item is above the line and half below. Then the ith item of a total of n items is a quantile larger than (i - 0.5) items of the n, so the $\left(\dfrac{i-0.5}{n}\right)$ quantile or $Q\left(\dfrac{i-0.5}{n}\right)$. Say the sorted items are 1, 4, 5, 6, 7, 8, 9, a total of seven items. Think of each one as being exactly on a dividing line, so half above and half below the line. Then the second item, 4, is larger than one-and-a-half items of the seven, so we can call it the $\dfrac{1.5}{7}$ quantile or Q(0.21). Similarly, 5 is larger than two-and-a-half items of the seven, so it is the $\dfrac{2.5}{7}$ quantile or Q(0.36). For purposes of illustration we are using small sets of numbers, but quantiles are useful in practice principally to characterize large sets of data.

Since proportion from a set of data gives an estimate of the corresponding probability, the quantile $Q\left(\dfrac{i-0.5}{n}\right)$ gives an estimate of the probability that a variable is smaller than the ith item in order of increasing magnitude. If an item is repeated, we have two separate estimates of this probability.

We can also use the general relation to find various quantiles. If we have a total of n items, then $Q\left(\dfrac{i-0.5}{n}\right)$ will be given by the ith item, even if i is not an integer. Consider again the seven items which are 1, 4, 5, 6, 7, 8, 9. The median, $Q\left(\dfrac{1}{2}\right)$, would be the item for which $\dfrac{i-0.5}{7} = \dfrac{1}{2}$, so $i = (7)\left(\dfrac{1}{2}\right) + 0.5 = 4$; that is, the fourth item, which is 6. That agrees with the definition given in section 3.1. Now, what is the first or lower quartile? This would be a value larger than one quarter of the items, or Q(0.25). Then $\dfrac{i-0.5}{7} = \dfrac{1}{4}$, so $i = (7)\left(\dfrac{1}{4}\right) + 0.5 = 2.25$. Since this is a fraction, the first quartile would be between the second and third items in order of magnitude, so between 4 and 5. Then by our convention we would take the first quartile as 4.5. Similarly, for the third quartile, Q(0.75), we have $\dfrac{i-0.5}{7} = \dfrac{3}{4}$, so i = 5.75, and the third quartile is between the 5th and 6th items in order of magnitude (7 and 8) and so is taken as (7 + 8)/2 = 7.5.
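The general relation can be wrapped in a small function. The following Python sketch is one possible reading of the convention described above: the order number is i = nf + 0.5, and when i is not a whole number the value halfway between the two neighbouring items is taken (not a proportional interpolation). The function name `quantile` and the rounding guard are choices made here, not something prescribed by the text.

```python
import math

def quantile(items, f):
    """Return Q(f) using the order number i = n*f + 0.5 (counting from 1).

    If i is a whole number, the i-th item is returned; otherwise the value
    halfway between the two neighbouring items is used.
    """
    ordered = sorted(items)
    n = len(ordered)
    i = round(n * f + 0.5, 9)          # rounding guards against float noise
    if not 1 <= i <= n:
        raise ValueError("f is outside the range this sample can resolve")
    below = math.floor(i)
    if i == below:
        return ordered[below - 1]
    return (ordered[below - 1] + ordered[below]) / 2

seven = [1, 4, 5, 6, 7, 8, 9]
print(quantile(seven, 0.25), quantile(seven, 0.5), quantile(seven, 0.75))
```

For the seven items this gives 4.5, 6, and 7.5, and for the nine items 1, 2, 2, 3, 5, 6, 6, 7, 8 used earlier in this section it gives 2, 5, and 6.5, in agreement with the hand calculations.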
Example 3.3

Consider the sample consisting of the following nine results:
2.3, 7.2, 3.7, 4.6, 5.0, 7.0, 3.7, 4.9, 4.2.

a) Find the median of this set of results by two different methods.
b) Find the lower quartile.
c) Find the upper quartile.
d) Estimate the probability that an item, from the population from which this sample came, would be less than 4.9.
e) Estimate the probability that an item from that population would be less than 3.7.

Answer: The first step is to sort the data in order of increasing magnitude, giving the following table:

i       1     2     3     4     5     6     7     8     9
x(i)    2.3   3.7   3.7   4.2   4.6   4.9   5.0   7.0   7.2

a) The basic definition of the median as the middle item after sorting in order of increasing magnitude gives x(5) = 4.6. Putting $\dfrac{i-0.5}{9} = 0.5$ gives i = (9)(0.5) + 0.5 = 5, so again the median is x(5) = 4.6.

b) The lower quartile is obtained by putting $\dfrac{i-0.5}{9} = 0.25$, which gives i = (9)(0.25) + 0.5 = 2.75. Since this is a fraction, the lower quartile is $\dfrac{x(2) + x(3)}{2} = \dfrac{3.7 + 3.7}{2} = 3.7$.

c) The upper quartile is obtained by putting $\dfrac{i-0.5}{9} = 0.75$, which gives i = (9)(0.75) + 0.5 = 7.25. Since this is again a fraction, the upper quartile is $\dfrac{x(7) + x(8)}{2} = \dfrac{5 + 7}{2} = 6$.

d) Probabilities of values smaller than the various items can be estimated as the corresponding fractions. 4.9 is the 6th item of the 9 items in order of increasing magnitude, and $\dfrac{6-0.5}{9} = 0.61$. Then the probability that an item, from the population from which this sample came, would be less than 4.9 is estimated to be 0.61.

e) 3.7 is the item of order both 2 and 3, so we have two estimates of the probability that an item from the same population would be less than 3.7. These are $\dfrac{2-0.5}{9}$ and $\dfrac{3-0.5}{9}$, or 0.17 and 0.28.
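Parts (d) and (e) of Example 3.3 amount to evaluating (i - 0.5)/n for every position at which a value occurs in the sorted data. A short Python sketch of that idea follows; the helper name `estimates_below` is hypothetical.

```python
data = [2.3, 7.2, 3.7, 4.6, 5.0, 7.0, 3.7, 4.9, 4.2]
ordered = sorted(data)
n = len(ordered)

def estimates_below(value):
    """Estimates of P(X < value), one for each occurrence of value in the sample."""
    return [(i - 0.5) / n
            for i, x in enumerate(ordered, start=1) if x == value]

print([round(p, 2) for p in estimates_below(4.9)])   # [0.61]
print([round(p, 2) for p in estimates_below(3.7)])   # [0.17, 0.28]
```

A repeated value, such as 3.7 here, returns one estimate per occurrence, which is the situation described in part (e).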
3.4 Using a Computer to Calculate Summary Numbers

A personal computer, either a PC or a Mac, is very frequently used with a spreadsheet to calculate the summary numbers we have been discussing. One of the spreadsheets used most frequently by engineers is Microsoft Excel, which includes a good number of statistical functions. Excel will be used in the computer methods discussed in this book.

Using a computer can certainly reduce the labor of characterizing a large set of data. In this section we will illustrate using a computer to calculate useful summary numbers from sets of data which might come from engineering experiments or measurements. The instructions will assume the reader is already reasonably familiar with Microsoft Excel; if not, he or she should refer to a reference book on Excel; a number are available at most bookstores. Some of the main techniques useful in statistical calculations and recommended for use during the learning process are discussed briefly in Appendix B. Calculations involving formulas, functions, sorting, and summing are among the computer techniques most useful during both the learning process and subsequent applications, so they and simple techniques for producing graphs are discussed in that appendix. Furthermore, in Appendix C there is a brief listing of methods which are useful in practice for Excel once the concepts are thoroughly understood, but they should not be used during the learning process.

The Help feature on Excel is very useful and convenient. Access to it can be obtained in various ways, depending on the version of Excel which is being used. There is usually a Help menu, and sometimes there is a Help tool (marked by an arrow and a question mark, or just a question mark).

Further discussion and examples of the use of computers in statistical calculations will be found in section 4.5, Chapter 4. Some probability functions which can be evaluated using Excel will be discussed in later chapters.
Example 3.4

The numbers given at the beginning of section 3.2 were as follows:

Group A: 2, 3, 4, 8
Group B: 1, 2, 4, 10
Group C: 0, 1, 5, 11

Find the sample variance and the sample standard deviation of each group of numbers. Use both equation 3.8 and equation 3.11 to check that they give the same result. This example is mostly the same as Example 3.1, but now it will be done using Excel.

Answer:

Table 3.3: Excel Worksheet for Example 3.4

Row  A                 B                    C          D          E
 1                                          Group A    Group B    Group C
 2   Entries                                2          1          0
 3                                          3          2          1
 4                                          4          4          5
 5                                          8          10         11
 6   Sum                                    17         17         17
 7   Arith. Mean       C6/4=, etc.          4.25       4.25       4.25
 8   Deviations        C2-$C$7=, etc.       -2.25      -3.25      -4.25
 9                     C3-$C$7=, etc.       -1.25      -2.25      -3.25
10                     C4-$C$7=, etc.       -0.25      -0.25      0.75
11                     C5-$C$7=, etc.       3.75       5.75       6.75
12
13   Deviations Sqd    C8^2=, etc.          5.0625     10.5625    18.0625
14                                          1.5625     5.0625     10.5625
15                                          0.0625     0.0625     0.5625
16                                          14.0625    33.0625    45.5625
17   Sum Devn Sqd      Sums                 20.75      48.75      74.75
18   Variance          C17/3=, etc.         6.917      16.25      24.92
19
20   Entries Sqd       C2^2=, etc.          4          1          0
21                                          9          4          1
22                                          16         16         25
23                                          64         100        121
24   Sum Entries Sqd   Sums                 93         121        147
25   Correction        4*C7^2=, etc.        72.25      72.25      72.25
26   Corrected Sum     C24-C25=, etc.       20.75      48.75      74.75
27   Variance          C26/3=, etc.         6.917      16.25      24.92
28   Std Dev, s        SQRT(C27)=, etc.     2.630      4.031      4.992
The worksheet is shown in Table 3.3. The letters A, B, C, etc. across the top are the column references, and the numbers 1, 2, 3, etc. on the left-hand side are the row references. The headings for Groups A, B, and C were placed in columns C, D, and E of row 1. Names of quantities were placed in column A. Statements of formulas are given in column B. The individual entries or values were placed in cells C2:E5, that is, rows 2 to 5 of columns C to E. Cell C6 was selected, and the AutoSum tool (see section (d) of Appendix B) was used to find the sum of the entries in Group A. The sums of the entries in the other two groups were found similarly. Note that the AutoSum tool may not choose the right set of cells to be summed in cell E6. Cell C7 was selected, and the formula =C6/4 was typed into it and entered, giving the result 4.25. Then the formula in cell C7 was copied, then pasted into cell D7 (to appear as =D6/4 because relative references were used) and entered; the same content was pasted into cell E7 as =E6/4 and entered. Again both results were 4.25.

According to equation 3.8 the sample variance is given by $s^2 = \dfrac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N-1}$. Deviations from the arithmetic means were calculated in rows 8 to 11. Cell C8 was selected, and the formula =C2-$C$7 was typed into it and entered, giving the result -2.25. Notice that now, although the reference C2 is relative, the reference $C$7 is absolute. Then when the formula in cell C8 was copied, then pasted into cell C9, the formula became =C3-$C$7; the formula was entered, giving the result -1.25. Pasting the formula into cells C10 and C11 and entering gave the results -0.25 and (+)3.75. Similarly, the formula =D2-$D$7 was entered in cell D8 and copied to cells D9, D10, D11 and entered in each case. A similar formula was entered in cell E8, copied separately to cells E9, E10, E11, and entered in each.

Deviations were squared in rows 13 to 16. The formula =C8^2 in cell C13 was copied to cells D13 and E13, and similar operations were carried out in cells C14:E14, C15:E15, and C16:E16. The squared deviations were summed using the AutoSum tool in cells C17:E17, but we have to be careful again with the sum in cell E17. Then variances are the quantities in cells C17:E17 divided in each case by 4 - 1 = 3. Therefore the formula =C17/3 was entered in cell C18, then copied to cell D18 and modified to =D17/3 before being entered, and similarly for cell E18. As the quantities in cells C18:E18 were answers to specific questions, they were put in bold type by choosing the Bold tool (marked with B) on the standard tool bar. Furthermore, they were put in a format with three decimal places by choosing the Format menu, the Number format, Number, then writing in the code 0.000 before choosing OK or Return. This gave the answers according to equation 3.8.

According to equation 3.11 the sample variance is given by $s^2 = \dfrac{\sum_{i=1}^{N} x_i^2 - N\left(\bar{x}\right)^2}{N-1}$. Squares of entries were placed in cells C20:E23 by entering =C2^2 in cell C20, copying, then pasting in cells D20 and E20, and repeating with modifications in C21:E21, C22:E22, and C23:E23. The squares of entries were summed using the AutoSum tool in cells C24, D24, and E24. Four times the squares of the arithmetic means, 4*C7^2, 4*D7^2, and 4*E7^2, were entered in cells C25, D25, and E25 respectively. These quantities were subtracted from the sums of squares of entries by entering =C24-C25 in cell C26, and corresponding quantities in cells D26 and E26. Then values of variance according to equation 3.11 were found in cells C27, D27, and E27. These also were put in bold type and formatted for three decimal places.

Finally, standard deviations were found in cells C28, D28, and E28 by taking the square roots of the variances in cells C27, D27, and E27. As answers, these also were put in bold type and formatted for three decimals.

The results verify that equations 3.8 and 3.11 give the same results, but equation 3.11 generally involves fewer arithmetic operations.

Using Excel on a computer can save a good deal of time if the data set is large, but if as here the data set is small, hand calculations are probably quicker. Results of experimental studies often give very big data sets, so computer calculations are very often advantageous.
Example 3.5

To start a program to improve the quality of production in a factory, all the items coming off a production line, under what we have reason to believe are normal operating conditions, are examined and classified as "good" items or "defective" items. The number of defective items in each successive group of six is counted. The results for 60 groups, 360 items, are shown in Table 3.4. Find the mean, median, mode, first quartile, third quartile, eighth decile, ninth decile, proportion defective in the sample, first estimate of probability that an item will be defective, sample variance, sample standard deviation, and coefficient of variation.

Table 3.4: Numbers of Defectives in Groups of Six Items

1  0  0  0  0  0  0  0  0  0  0  0
0  0  0  0  1  0  0  0  0  0  1  0
0  1  0  0  1  0  0  0  0  2  0  0
0  0  0  0  2  0  0  1  0  0  1  0
1  0  0  0  0  1  0  0  1  0  0  0
This is the same as Example 3.2, but now we will use Excel.

Answer: The data of Table 3.4 were entered in column A of an Excel worksheet; extracts are shown in Table 3.5. These data were copied to column B, then sorted in ascending order as described in section (c) of Appendix B. The order numbers were obtained in column C using the AutoFill feature with the fill handle, as also described in that section of Appendix B. Rows 3 to 62 show part of the discrete data of Example 3.2 after sorting and numbering on Microsoft Excel.
Table 3.5: Extracts of Worksheet for Example 3.5

Row  A                 B                                 C                   D                      E
 1   Numbers of Defective Items
 2   Unsorted          Sorted                            Order No.
 3   1                 0                                 1
 4   0                 0                                 2
 5   0                 0                                 3
..   ..                ..                                ..
49   1                 0                                 47
50   0                 0                                 48
51   1                 1                                 49
52   0                 1                                 50
..   ..                ..                                ..
60   0                 1                                 58
61   0                 2                                 59
62   0                 2                                 60

64   Number            Frequency
65   xi                fi                                xi*fi               xi^2*fi
66                                                       A67*B67=, etc.      A67^2*B67=, etc.
67   0                 48                                0                   0
68   1                 10                                10                  10
69   2                 2                                 4                   8
70   Total=SUM         60                                14                  18
71
72   xbar=             C70/B70=                                                                     0.233
73   s^2=              (D70-(C70^2/B70))/(B70-1)=                                                   0.250
74   s=                SQRT(E73)=                                                                   0.500
75   Coeff. of var.=   E74/E72=                                                                     214%
With the sorted data in column B of Table 3.5 and the order numbers in column C, it is easy to pick off the frequencies of various numbers of defectives. Thus, the number of groups containing zero defectives is 48, the number containing one defective is 58 - 48 = 10, and the number containing two defectives is 60 - 58 = 2. The resulting numbers of defectives and the frequency of each were marked in cells A64:B69. The mode is the number of defectives with the largest frequency, so it is 0 in this example. Products $x_i f_i$ and $x_i^2 f_i$ were found in cells C65:D69. The formulas were entered in the form for relative references in cells C67 and D67, so copying them one and two lines below gave appropriate products. Then the AutoSum tool (marked $\Sigma$) on the standard tool bar was used to sum the columns for each of $f_i$, $x_i f_i$, and $x_i^2 f_i$ and enter the results in row 70. The sum of the calculated frequencies should check with the total number of groups, which is 60 in this case. Then from equation 3.2,

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{14}{60} = 0.233$$

in cell E72. From equation 3.12,

$$s^2 = \frac{\sum f_i x_i^2 - \left(\sum f_i x_i\right)^2 / \sum f_i}{\sum f_i - 1} = \frac{18 - (14)^2/60}{60 - 1} = 0.250$$

in cell E73, and the sample standard deviation, s, is found in cell E74, with a result of 0.500. The coefficient of variation is given in cell E75 as 214%. Of course, all quantities must be clearly labeled on the spreadsheet. Labels are shown in rows 1, 2, 64, 65, 70, and 72 to 75, and explanations are given in rows 66 and 72 to 75.
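For readers who also work outside a spreadsheet, the sort-and-count steps of Example 3.5 have close analogues in other tools. The sketch below uses only the Python standard library (`collections.Counter` and the `statistics` module) and is offered as a cross-check on the worksheet, not as the method of this book; the variable names are assumptions.

```python
from collections import Counter
import statistics

# Table 3.4, read row by row: defectives in each successive group of six items.
defectives = [
    1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,
    0, 1, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0,
    0, 0, 0, 0, 2, 0, 0, 1, 0, 0, 1, 0,
    1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0,
]

freq = Counter(defectives)      # plays the role of rows 67 to 69 of the worksheet
ordered = sorted(defectives)    # plays the role of the sorted column B

print(sorted(freq.items()))                        # [(0, 48), (1, 10), (2, 2)]
print(round(statistics.mean(defectives), 3))       # 0.233
print(round(statistics.variance(defectives), 3))   # 0.25
print(round(statistics.stdev(defectives), 3))      # 0.5
print(statistics.mode(defectives), ordered[47], ordered[48])   # 0 0 1
```

The last line shows the 48th and 49th sorted values (0 and 1), which is why the eighth decile falls halfway between them at 0.5.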
Problems

1. The same dimension was measured on each of six successive parts as they came off a production line. The results were 21.14 mm, 21.87 mm, 21.53 mm, 21.37 mm, 21.61 mm and 21.93 mm. Calculate the mean and median.
2. For the measurements given in problem 1 above, find the variance, standard deviation, and coefficient of variation
a) considering this set of values as a complete population, and
b) considering this set of values as a sample of all possible measurements of this dimension.
3. Four items in a sequence were measured as 50, 160, 100, and 400 mm. Find their arithmetic mean, geometric mean, and median.
4. The temperature in a chemical reactor was measured every half hour under the same conditions. The results were 78.1°C, 79.2°C, 78.9°C, 80.2°C, 78.3°C, 78.8°C, 79.4°C. Calculate the mean, median, lower quartile, and upper quartile.
5. For the temperatures of problem 4, calculate the variance, standard deviation, and coefficient of variation
a) considering this set of values as a complete population, and
b) considering this set of values as a sample of all possible measurements of the temperature under these conditions.
6. The times to perform a particular step in a production process were measured repeatedly. The times were 20.3 s, 19.2 s, 21.5 s, 20.7 s, 22.1 s, 19.9 s, 21.2 s, 20.6 s. Calculate the arithmetic mean, geometric mean, median, lower quartile, and upper quartile.
7. For the times of problem 6, calculate the variance, standard deviation, and coefficient of variation
a) considering this set of values as a complete population, and
b) considering this set of values as a sample of all possible measurements of the times for this step in the process.
8. The numbers of defective items in successive groups of fifteen items were counted as they came off a production line. The results can be summarized as follows:

No. of Defectives    Frequency
0                    57
1                    57
2                    18
3                     5
4                     3
>4                    0

a) Calculate the mean number of defectives in a group of fifteen items.
b) Calculate the variance and standard deviation of the number of defectives in a group. Take the given data as a sample.
c) Find the median, lower quartile, upper quartile, ninth decile, and 95th percentile.
d) On the basis of these data estimate the probability that the next item produced will be defective.
9. Electrical components were examined as they came off a production line. The number of defective items in each group of eighteen components was recorded. The results can be summarized as follows:

No. of Defectives    Frequency
0                    94
1                    52
2                    19
3                     3
>3                    0

a) Calculate the mean number of defectives in a group of 18 components.
b) Taking the given data as a sample, calculate the variance and standard deviation of the number of defectives in a group.
c) Find the median, lower quartile, upper quartile, and 95th percentile.
d) On the basis of these data, estimate the probability that the next component produced will be defective.
Computer Problems

Use MS Excel in solving the following problems:

C10. The numbers of defective items in successive groups of fifteen items were counted as they came off a production line. The results can be summarized as follows:

No. of Defectives    Frequency
0                    57
1                    57
2                    18
3                     5
4                     3
>4                    0

a) Calculate the mean number of defectives in a group of fifteen items.
b) Calculate the variance and standard deviation of the number of defectives in a group. Take the given data as a sample.
c) Find the median, lower quartile, upper quartile, ninth decile, and 95th percentile.
d) On the basis of these data estimate the probability that the next item produced will be defective.

This is the same as Problem 8, but now it is to be solved using Excel.

C11. Electrical components were examined as they came off a production line. The number of defective items in each group of eighteen components was recorded. The results can be summarized as follows:

No. of Defectives    Frequency
0                    94
1                    52
2                    19
3                     3
>3                    0

a) Calculate the mean number of defectives in a group of 18 components.
b) Taking the given data as a sample, calculate the variance and standard deviation of the number of defectives in a group.
c) Find the median, lower quartile, upper quartile, and 95th percentile.
d) On the basis of these data, estimate the probability that the next component produced will be defective.

This is the same as Problem 9, but now it is to be solved using Excel.
CHAPTER 4

Grouped Frequencies and Graphical Descriptions

Prerequisite: A good knowledge of algebra.

Like Chapter 3, this chapter considers some aspects of descriptive statistics. In this chapter we will be concerned with stem-and-leaf displays, box plots, graphs for simple sets of discrete data, grouped frequency distributions, and histograms and cumulative distribution diagrams.
4.1 Stem-and-LeafDisplays
Thesesimpledisplaysareparticularlysuitableforexploratoryanalysisoffairlysmall
setsofdata.Thebasicideaswillbedevelopedwithanexample.
Example4.1
Datahavebeenobtainedonthelivesofbatteriesofaparticulartypeinanindustrialapp-
lication.Table4.1showsthelivesof36batteriesrecordedtothenearesttenthofayear.
Table 4.1: Battery Lives, years
4.1 5.2 2.8 4.9 5.6 4.0 4.1 4.3 5.4
4.5 6.1 3.7 2.3 4.5 4.9 5.6 4.3 3.9
3.2 5.0 4.8 3.7 4.6 5.5 1.8 5.1 4.2
6.3 3.3 5.8 4.4 4.8 3.0 4.3 4.7 5.1
For these data we choose stems which are the main magnitudes. In this case the
digit before the decimal point is a reasonable choice: 1, 2, 3, 4, 5, 6. Now we go through the
data and put each leaf, in this case the digit after the decimal point, on its corresponding
stem. The decimal point is not usually shown. The result can be seen in Table 4.2. The
number of leaves on each stem can be counted and shown under the heading of Frequency.
Table 4.2: Stem-and-Leaf Display
Stem Leaf Frequency
1 8 1
2 83 2
3 792730 6
4 1901355938624837 16
5 264605181 9
6 13 2
From the list of leaves on each stem we have an immediate visual indication of
the relative numbers. We can see whether or not the distribution is approximately
symmetrical, and we may get a preliminary indication of whether any particular
theoretical distribution may fit the data. We will see some theoretical distributions
later in this book, and we will find that some of the distributions we encounter in this
chapter can be represented well by theoretical distributions.
We may want to sort the leaves on each stem in order of magnitude to give more
detail and facilitate finding parameters which depend on the order. The result of
sorting by magnitude is shown in Table 4.3.
Table 4.3: Sorted Stem-and-Leaf Display
Stem Leaf Frequency
1 8 1
2 38 2
3 023779 6
4 0112333455678899 16
5 011245668 9
6 13 2
Another possibility is to double the number of stems (or multiply them further),
especially if the number of data is large in relation to the initial number of stems.
Stem a might have leaves from 0 to 4, and stem b might have leaves from 5 to 9.
The result without sorting is shown in Table 4.4.
Table 4.4: Stem-and-Leaf Plot with Double Leaf
Stem Leaf Frequency
1b 8 1
2a 3 1
2b 8 1
3a 230 3
3b 797 3
4a 10133243 8
4b 95598687 8
5a 24011 5
5b 6658 4
6a 13 2
Of course, we might both double the number of stems and sort the leaves on each
stem. In other cases it might be more appropriate to show two significant figures on
each leaf, with appropriate separation between leaves. There are many possible
variations.
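The construction of Tables 4.2 and 4.3 can be sketched in a few lines of Python. This is only an illustration, not part of the book's Excel workflow: the function names `stem_and_leaf` and `format_display` are hypothetical, and the sketch assumes every value is recorded to exactly one decimal place, as in Table 4.1.

```python
from collections import defaultdict

# Battery lives in years, copied row by row from Table 4.1.
battery_lives = [
    4.1, 5.2, 2.8, 4.9, 5.6, 4.0, 4.1, 4.3, 5.4,
    4.5, 6.1, 3.7, 2.3, 4.5, 4.9, 5.6, 4.3, 3.9,
    3.2, 5.0, 4.8, 3.7, 4.6, 5.5, 1.8, 5.1, 4.2,
    6.3, 3.3, 5.8, 4.4, 4.8, 3.0, 4.3, 4.7, 5.1,
]


def stem_and_leaf(values, sort_leaves=False):
    """Return {stem: [leaves]} for values recorded to one decimal place."""
    display = defaultdict(list)
    for value in values:
        tenths = round(value * 10)       # 4.1 -> 41; rounding guards against float error
        stem, leaf = divmod(tenths, 10)  # 41 -> stem 4, leaf 1
        display[stem].append(leaf)
    if sort_leaves:
        for leaves in display.values():
            leaves.sort()
    return dict(sorted(display.items()))


def format_display(display):
    """One text line per stem: stem, leaves, frequency."""
    return [
        f"{stem}  {''.join(str(leaf) for leaf in leaves)}  {len(leaves)}"
        for stem, leaves in display.items()
    ]


unsorted_display = stem_and_leaf(battery_lives)                  # as in Table 4.2
sorted_display = stem_and_leaf(battery_lives, sort_leaves=True)  # as in Table 4.3
for line in format_display(sorted_display):
    print(line)
```

Leaves are appended in the order the data are read, so the unsorted display depends on reading Table 4.1 row by row; sorting removes that dependence.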
4.2 Box Plots
A box plot, or box-and-whisker plot, is a graphical device for displaying certain
characteristics of a frequency distribution. A narrow box extends from the lower
quartile to the upper quartile. Thus the length of the box represents the interquartile
range, a measure of variability. The median is marked by a line extending across the
box. The smallest value in the distribution and the largest value are marked, and each
is joined to the box by a straight line, the whisker. Thus, the whiskers represent the
full range of the data.
Figure 4.1 is a box plot for the data of Table 4.1 on the life of batteries under
industrial conditions. The labels, smallest, largest, median, and quartiles, are
usually omitted.
Figure 4.1: Box Plot for Life of Battery
[Box plot. Horizontal axis: Battery Life, years, from 0 to 8. Labeled points: Smallest, Quartiles, Median, Largest.]
Box plots are particularly suitable for comparing sets of data, such as before and
after modifications were made in the production process. Figure 4.2 shows a
comparison of the box plot of Figure 4.1 with a box plot for similar data under modified
production conditions, both for the same sample size. Although the median has not
changed very much, we can see that the sample range and the interquartile range for
modified conditions are considerably smaller.
Figure 4.2: Comparison of Box Plots
[Two box plots, labeled Modified conditions and Initial conditions. Horizontal axis: Battery Life, years, from 0 to 8.]
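The quantities that a box plot displays can be computed directly. The sketch below does so for the battery lives of Table 4.1, using Python's standard `statistics` module rather than the book's tools. It is an illustration under one stated assumption: the `"inclusive"` quartile method is only one of several conventions in use, so the quartiles it returns may differ slightly from those given by the rule used elsewhere in this book.

```python
import statistics

# Battery lives in years, from Table 4.1.
battery_lives = [
    4.1, 5.2, 2.8, 4.9, 5.6, 4.0, 4.1, 4.3, 5.4,
    4.5, 6.1, 3.7, 2.3, 4.5, 4.9, 5.6, 4.3, 3.9,
    3.2, 5.0, 4.8, 3.7, 4.6, 5.5, 1.8, 5.1, 4.2,
    6.3, 3.3, 5.8, 4.4, 4.8, 3.0, 4.3, 4.7, 5.1,
]

ordered = sorted(battery_lives)
smallest, largest = ordered[0], ordered[-1]  # ends of the whiskers
median = statistics.median(ordered)          # line across the box

# Assumption: "inclusive" interpolation; other quartile conventions exist.
lower_quartile, _, upper_quartile = statistics.quantiles(
    ordered, n=4, method="inclusive"
)
interquartile_range = upper_quartile - lower_quartile  # length of the box

print(f"smallest={smallest}, lower quartile={lower_quartile:.3f}, "
      f"median={median}, upper quartile={upper_quartile:.3f}, largest={largest}")
print(f"interquartile range={interquartile_range:.3f}, range={largest - smallest:.1f}")
```

Comparing two samples, as in Figure 4.2, amounts to computing these five values for each sample and drawing the boxes on a common axis.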
4.3 Frequency Graphs of Discrete Data
Example 3.2 concerned the number of defective items in successive samples of six
items each. The data were summarized in Table 3.2, which is reproduced below.
Table 3.2: Frequencies for Numbers of Defectives
Number of defectives, x_i    Frequency, f_i
0                            48
1                            10
2                            2
>2                           0
These data can be shown graphically in a very simple form because they involve
discrete data, as opposed to continuous data, and only a few different values. The
variate is discrete in the sense that only certain values are possible: in this case the
number of defective items in a group of six must be an integer rather than a fraction.
The number of defective items in each group of this example is only 0, 1, or 2. The
frequencies of these numbers are shown above. The corresponding frequency graph is
shown in Figure 4.3. The isolated spikes correspond to the discrete character of the
variate.
Figure 4.3: Distribution of Numbers of Defectives in Groups of Six Items
[Spike graph titled Number of Defectives in Six Items. Horizontal axis: No. of Defectives (0, 1, 2). Vertical axis: Frequency, from 0 to 50.]
If the number of different values is very large, it may be desirable to use the
grouped frequency approach, as discussed below for continuous data.
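As a small illustration of how such a discrete frequency graph is built, the following Python sketch takes the frequencies of Table 3.2, converts them to relative frequencies, and prints a crude text version of the spikes. The dictionary layout and the scale of one `#` per two groups are arbitrary choices made for this sketch.

```python
# Frequencies of the number of defectives per group of six items (Table 3.2).
frequencies = {0: 48, 1: 10, 2: 2}

total_groups = sum(frequencies.values())
relative_frequencies = {x: f / total_groups for x, f in frequencies.items()}

# Text "spikes": one '#' for every two groups (arbitrary scale for display only).
for defectives, frequency in frequencies.items():
    spike = "#" * (frequency // 2)
    print(f"{defectives} | {spike} {frequency} "
          f"({relative_frequencies[defectives]:.3f})")
```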
4.4 Continuous Data: Grouped Frequency
If the variate is continuous, any value at all in an appropriate range is possible.
Between any two possible values, there are an infinite number of other possible
values, although measuring devices are not able to distinguish some of them from
one another. Measurements will be recorded to only a certain number of significant
figures. Even to this number of figures, there will usually be a large number of
possible values. If the number of possible values of the variate is large, too many
occur on a table or graph for easy comprehension. We can make the data easier to
comprehend by dividing the variate into intervals or classes and counting the
frequency of occurrence for each class. This is called the grouped frequency approach.
Thus, frequency grouping is used to make the distribution more easily
understood. The width of each class (the difference between its lower boundary and its
upper boundary) should be constant from one class to another (there are exceptions to
this statement, but we will omit them from this book). The number of classes should
be from seven to twenty, depending chiefly on the size of the population or sample
being represented. If the number of classes is too large, the result is too detailed and
it is hard to see an underlying pattern. If the number of classes is too small, there is
appreciable loss of information, and the pattern may be obscured. An empirical
relation which gives an approximate value of the appropriate number of classes is
Sturges' Rule:
$$\text{number of class intervals} \approx 1 + 3.3 \log_{10} N \tag{4.1}$$
where N is the total number of observations in the sample or population.
The procedure is to start with the range, the difference between the largest and
the smallest items in the set of observations. Then the constant class width is given
approximately by dividing the range by the approximate number of class intervals
from equation 4.1. Round off the class width to a convenient number (remember that
there is nothing sacred or exact about Sturges' Rule!).
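The procedure just described is easy to express in code. The Python sketch below evaluates equation 4.1 and the approximate class width; the function names are hypothetical, and the inputs (121 observations ranging from 3.21 to 3.57, and 120 observations ranging from 3 to 5312) are taken from Examples 4.2 and 4.3 later in this section. Choosing the final convenient class width remains a matter of judgment.

```python
import math


def sturges_class_count(n_observations):
    """Approximate number of class intervals from equation 4.1."""
    return 1 + 3.3 * math.log10(n_observations)


def approximate_class_width(largest, smallest, n_observations):
    """Range divided by the approximate number of class intervals."""
    return (largest - smallest) / sturges_class_count(n_observations)


# Thicknesses of Example 4.2: 121 values between 3.21 mm and 3.57 mm.
classes_thickness = sturges_class_count(121)
width_thickness = approximate_class_width(3.57, 3.21, 121)

# Times to failure of Example 4.3: 120 values between 3 h and 5312 h.
classes_failure = sturges_class_count(120)
width_failure = approximate_class_width(5312, 3, 120)

print(f"thickness: {classes_thickness:.2f} classes, width about {width_thickness:.4f} mm")
print(f"failure times: {classes_failure:.2f} classes, width about {width_failure:.0f} h")
```

The results are then rounded to convenient widths by hand, 0.05 mm and 600 h in the examples.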
The class boundaries must be clear with no gaps and no overlaps. For problems
in this book choose the class boundaries halfway between possible magnitudes. This
gives a definite and fair boundary. For example, if the observations are recorded to
one decimal place, the boundaries should end in five in the second decimal place. If
2.4 and 2.5 are possible observations, a class boundary might be chosen as 2.45. The
smallest class boundary should be chosen at a convenient value a little smaller than
the smallest item in the set of observations.
Each class midpoint is halfway between the corresponding class boundaries.
Then the number of items in each class should be tallied and shown as class
frequency in a table called a grouped frequency table. The relative frequency is the
class frequency divided by the total of all the class frequencies, which should agree
with the total number of items in the set of observations. The cumulative frequency is
the total of all class frequencies smaller than a class boundary. The class boundary
rather than class midpoint must be used for finding cumulative frequency because we
can see from the table how many items are smaller than a class boundary, but we
cannot know how many items are smaller than a class midpoint unless we go back to
the original data. The relative cumulative frequency is the fraction (or percentage) of
the total number of items smaller than the corresponding upper class boundary.
Let us consider an example.
Example 4.2
The thickness of a particular metal part of an optical instrument was measured on
121 successive items as they came off a production line under what was believed to
be normal conditions. The results are shown in Table 4.5.
Table 4.5: Thicknesses of Metal Parts, mm
3.40 3.21 3.26 3.37 3.40 3.35 3.40 3.48 3.30 3.38 3.27
3.35 3.28 3.39 3.44 3.29 3.38 3.38 3.40 3.38 3.44 3.29
3.37 3.41 3.45 3.44 3.35 3.35 3.46 3.31 3.33 3.47 3.33
3.37 3.31 3.51 3.36 3.32 3.33 3.43 3.39 3.39 3.28 3.33
3.25 3.28 3.30 3.41 3.39 3.33 3.27 3.34 3.33 3.42 3.35
3.34 3.32 3.42 3.31 3.38 3.44 3.37 3.35 3.57 3.41 3.28
3.49 3.26 3.44 3.46 3.32 3.36 3.41 3.39 3.38 3.26 3.37
3.28 3.35 3.36 3.34 3.42 3.38 3.39 3.51 3.44 3.39 3.36
3.35 3.42 3.34 3.36 3.42 3.38 3.46 3.34 3.37 3.39 3.42
3.37 3.33 3.39 3.30 3.35 3.38 3.38 3.27 3.31 3.32 3.45
3.49 3.45 3.38 3.41 3.35 3.39 3.24 3.35 3.34 3.37 3.37
Thickness is a continuous variable, since any number at all in the appropriate
range is a possible value. The data in Table 4.5 are given to two decimal places, but it
would be possible to measure to greater or lesser precision. The number of possible
results is infinite. The mass of numbers in Table 4.5 is very difficult to comprehend.
Let us apply the methods of this section to this set of data.
Applying equation 3.1 to the numbers in Table 4.5 gives a mean of 407.59/121 =
3.3685 or 3.369 mm. (We will see later that the mean of a large group of numbers is
considerably more precise than the individual numbers, so quoting the mean to more
significant figures is justified.) Since the data constitute a sample of all the
thicknesses of parts coming off the production line under the same conditions, this is a
sample mean, so $\bar{x}$ = 3.369 mm. Then the appropriate relation to calculate the
variance is equation 3.11:
$$s^2 = \frac{\sum_{i=1}^{N} x_i^2 - \dfrac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N - 1}$$
$$s^2 = \frac{1373.4471 - (407.59)^2/121}{120} = \frac{1373.4471 - 1372.971968}{120} = \frac{0.475132}{120} = 0.003959 \text{ mm}^2$$
and the sample standard deviation is $\sqrt{0.003959}$ = 0.0629 mm. The coefficient of
variation is $\left(\dfrac{s}{\bar{x}}\right)(100\%)$ = (0.0629/3.369)(100%) = 1.87%.
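The arithmetic above can be checked with a short Python sketch that starts from the two sums quoted in the text, 407.59 for the thicknesses and 1373.4471 for their squares, and applies the same formulas. The variable names are chosen for this illustration only.

```python
import math

n = 121               # number of thickness measurements
sum_x = 407.59        # sum of thicknesses, mm (from the text)
sum_x_sq = 1373.4471  # sum of squared thicknesses, mm^2 (from the text)

mean = sum_x / n
# Computational form of the sample variance: (sum x^2 - (sum x)^2 / N) / (N - 1)
variance = (sum_x_sq - sum_x**2 / n) / (n - 1)
std_dev = math.sqrt(variance)
coeff_of_variation_pct = 100 * std_dev / mean

print(f"mean = {mean:.4f} mm")
print(f"variance = {variance:.6f} mm^2")
print(f"standard deviation = {std_dev:.4f} mm")
print(f"coefficient of variation = {coeff_of_variation_pct:.2f}%")
```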
Note for Calculation: Avoiding Loss of Significance
Whenever calculations involve taking the difference of two quantities of
similar magnitude, we must remember to make sure that enough significant
figures are carried to give the desired accuracy in the result. In Example 4.2
above, the calculation of variance by equation 3.11 requires us to subtract
1372.971968 from 1373.4471, giving 0.475132. If the numbers being
subtracted had been rounded to four figures as 1373.0 from 1373.4, the
calculated result would have been 0.4. This would have been 16% in error.
To avoid such loss of significance, carry as many significant figures as
possible in intermediate results. Do not round the numbers to a reasonable
number of figures until a final result has been obtained. If a calculator is being
used, leave intermediate results in the memory of the calculator. Similarly, if a
spreadsheet is being used, do not reduce the number of figures, except perhaps
for purposes of displaying a reasonable number of figures in a final result.
If the calculating device being used does not provide enough significant
figures, it is often possible to reduce the number of required figures by
subtracting a constant value from each figure. For instance, in Example 4.2 we
could subtract 3 from each of the numbers in Table 4.5. This would not affect
the final variance or standard deviation, but it would make the largest number
0.57 instead of 3.57, giving a square of 0.3249 instead of 12.7449, so requiring
four figures instead of six at this point. The required number of figures in other
quantities would be reduced similarly. However, most modern computing
devices can easily retain enough figures so that this step is not required.
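Both points of this note can be demonstrated numerically. The Python sketch below first repeats the subtraction with the two quantities rounded to one decimal place (1373.4 and 1373.0), then checks that subtracting a constant leaves the variance unchanged. For brevity the second check uses only the first row of Table 4.5, which is an assumption of this sketch and not a calculation from the book.

```python
import math
import statistics

# Part 1: loss of significance when intermediate results are rounded.
sum_x_sq = 1373.4471      # sum of squares from Example 4.2
correction = 1372.971968  # (407.59)^2 / 121, as quoted in the note

full_difference = sum_x_sq - correction                          # about 0.475132
rounded_difference = round(sum_x_sq, 1) - round(correction, 1)   # 1373.4 - 1373.0
error_pct = 100 * (full_difference - rounded_difference) / full_difference
print(f"full: {full_difference:.6f}, rounded: {rounded_difference:.1f}, "
      f"error: {error_pct:.0f}%")

# Part 2: subtracting a constant does not change the variance.
# Assumption: only the first row of Table 4.5 is used, to keep the sketch short.
first_row = [3.40, 3.21, 3.26, 3.37, 3.40, 3.35, 3.40, 3.48, 3.30, 3.38, 3.27]
shifted = [x - 3 for x in first_row]

variance_original = statistics.variance(first_row)
variance_shifted = statistics.variance(shifted)
print(f"variance before shift: {variance_original:.6f}, after: {variance_shifted:.6f}")
print("equal within float tolerance:",
      math.isclose(variance_original, variance_shifted, rel_tol=1e-9))
```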
The median of the 121 numbers in Table 4.5 is the 61st number in order of
magnitude. This is 3.37 mm. The fifth percentile is between the 6th and 7th items in order
of magnitude, so (3.26 + 3.27)/2 = 3.265 mm. The ninth decile is between the 108th
and 109th numbers in increasing order of magnitude, so (3.44 + 3.45)/2 = 3.445 mm.
Now let us apply the grouped frequency approach to the numbers in Table 4.5.
The largest item in the table is 3.57, and the smallest is 3.21, so the range is 0.36.
The number of class intervals according to Sturges' Rule should be approximately
1 + (3.3)(log10 121) = 7.87. Then the class width should be approximately 0.36/7.87
= 0.0457. Let us choose a convenient class width of 0.05. The thicknesses are stated
to two decimal places, so the class boundaries should end in five in the third decimal.
Let us choose the smallest class boundary, then, as 3.195. The resulting grouped
frequency table is shown in Table 4.6.
Table 4.6: Grouped Frequency Table for Thicknesses
Lower Class    Upper Class    Class          Class       Relative    Cumulative   Tally Marks
Boundary, mm   Boundary, mm   Midpoint, mm   Frequency   Frequency   Frequency
3.195          3.245          3.220          2           0.017       2            ||
3.245          3.295          3.270          14          0.116       16           ||||| ||||| ||||
3.295          3.345          3.320          24          0.198       40           ||||| ||||| ||||| ||||| ||||
3.345          3.395          3.370          46          0.380       86           ||||| ||||| ||||| ||||| ||||| ||||| ||||| ||||| ||||| |
3.395          3.445          3.420          22          0.182       108          ||||| ||||| ||||| ||||| ||
3.445          3.495          3.470          10          0.083       118          ||||| |||||
3.495          3.545          3.520          2           0.017       120          ||
3.545          3.595          3.570          1           0.008       121          |
Total                                        121         1.000
In this table the class frequency is obtained by counting the tally marks for each
class. This becomes easier if we divide the tally marks into groups of five as shown in
Table 4.6. The relative frequency is simply the class frequency divided by the total
number of items in the table, i.e. the total frequency, which is 121 in this case. The
cumulative frequency is obtained by adding together all the class frequencies for
classes with values smaller than the current upper class boundary. Thus, in the third line
of Table 4.6, the cumulative frequency of 40 is the sum of the class frequencies 2, 14
and 24. The corresponding relative cumulative frequency would be 40/121 = 0.331, or
33.1%. The cumulative frequency in the last line must be equal to the total frequency.
From Table 4.6 the mode is given by the class midpoint of the class with the
largest class frequency, 3.370 mm. The mean, median and mode, 3.369, 3.37 and
3.370 mm, are in close agreement. This indicates that the distribution is
approximately symmetrical.
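The columns of Table 4.6 can be reproduced from the raw data of Table 4.5 with a short Python sketch. It is an illustration only: the values are converted to integer thousandths of a millimetre so that comparisons with the class boundaries (3.195, 3.245, ...) are exact, which is a design choice of this sketch rather than anything prescribed by the book.

```python
from itertools import accumulate

# Thicknesses in mm, copied row by row from Table 4.5.
thickness_mm = [
    3.40, 3.21, 3.26, 3.37, 3.40, 3.35, 3.40, 3.48, 3.30, 3.38, 3.27,
    3.35, 3.28, 3.39, 3.44, 3.29, 3.38, 3.38, 3.40, 3.38, 3.44, 3.29,
    3.37, 3.41, 3.45, 3.44, 3.35, 3.35, 3.46, 3.31, 3.33, 3.47, 3.33,
    3.37, 3.31, 3.51, 3.36, 3.32, 3.33, 3.43, 3.39, 3.39, 3.28, 3.33,
    3.25, 3.28, 3.30, 3.41, 3.39, 3.33, 3.27, 3.34, 3.33, 3.42, 3.35,
    3.34, 3.32, 3.42, 3.31, 3.38, 3.44, 3.37, 3.35, 3.57, 3.41, 3.28,
    3.49, 3.26, 3.44, 3.46, 3.32, 3.36, 3.41, 3.39, 3.38, 3.26, 3.37,
    3.28, 3.35, 3.36, 3.34, 3.42, 3.38, 3.39, 3.51, 3.44, 3.39, 3.36,
    3.35, 3.42, 3.34, 3.36, 3.42, 3.38, 3.46, 3.34, 3.37, 3.39, 3.42,
    3.37, 3.33, 3.39, 3.30, 3.35, 3.38, 3.38, 3.27, 3.31, 3.32, 3.45,
    3.49, 3.45, 3.38, 3.41, 3.35, 3.39, 3.24, 3.35, 3.34, 3.37, 3.37,
]

# Work in integer thousandths of a mm so boundary comparisons are exact.
values = [round(x * 1000) for x in thickness_mm]
lowest_boundary, class_width, n_classes = 3195, 50, 8
lower_boundaries = [lowest_boundary + i * class_width for i in range(n_classes)]

class_frequencies = [
    sum(1 for v in values if lower < v < lower + class_width)
    for lower in lower_boundaries
]
total = sum(class_frequencies)
relative_frequencies = [f / total for f in class_frequencies]
cumulative_frequencies = list(accumulate(class_frequencies))
midpoints_mm = [(lower + class_width / 2) / 1000 for lower in lower_boundaries]

# Mode from grouped data: midpoint of the class with the largest frequency.
mode_mm = midpoints_mm[class_frequencies.index(max(class_frequencies))]

for lower, mid, f, rel, cum in zip(lower_boundaries, midpoints_mm,
                                   class_frequencies, relative_frequencies,
                                   cumulative_frequencies):
    upper = lower + class_width
    print(f"{lower / 1000:.3f}  {upper / 1000:.3f}  {mid:.3f}  {f:3d}  {rel:.3f}  {cum:3d}")
print(f"total = {total}, mode = {mode_mm:.3f} mm")
```

Because the boundaries lie halfway between possible measurements, no value can fall exactly on a boundary, so the strict inequalities lose nothing.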
Graphical representations of grouped frequency distributions are usually more
readily understood than the corresponding tables. Some of the main characteristics of
the data can be seen in histograms and cumulative frequency diagrams. A histogram is
a bar graph in which the class frequency or relative class frequency is plotted against
values of the quantity being studied, so the height of the bar indicates the class
frequency or relative class frequency. Class midpoints are plotted along the horizontal axis.
In principle, a histogram for continuous data should have the bars touching one another,
and that should be done for problems in this book. However, the bars are often shown
separated, and some computer software does not allow the bars to touch one another.
The histogram for the data of Table 4.5 is shown in Figure 4.4 for a class width
of 0.05 mm as already calculated. Relative class frequency is shown on the
right-hand scale.
Figure 4.4: Histogram for Class Width of 0.05 mm
[Histogram titled Thickness of Part. Horizontal axis: Thickness, mm, with class midpoints 3.220 to 3.570. Left-hand scale: Class Frequency per Class Width of 0.05 mm, from 0 to 50. Right-hand scale: Relative Class Frequency, from 0 to 0.413.]
Histograms for class widths of 0.03 mm and 0.10 mm are shown in Figures 4.5
and 4.6 for comparison.
Figure 4.5: Histogram for Class Width of 0.03 mm
[Histogram titled Thickness of Part. Horizontal axis: Thickness, mm, with class midpoints 3.21 to 3.57. Vertical axis: Class Frequency per Class Width of 0.03 mm, from 0 to 30.]
Figure 4.6: Histogram for Class Width of 0.10 mm
[Histogram titled Thickness of Part. Horizontal axis: Thickness, mm, with class midpoints 3.245 to 3.545. Vertical axis: Class Frequency per Class Width of 0.10 mm, from 0 to 80.]
Of these three, the class width of 0.05 mm in Figure 4.4 seems most satisfactory
(in agreement with Sturges' Rule).
Cumulative frequencies are shown in the last column of Table 4.6. A cumulative
frequency diagram is a plot of cumulative frequency vs. the upper class boundary,
with successive points joined by straight lines. A cumulative frequency diagram for
the thicknesses of Table 4.5 is shown in Figure 4.7.
Figure 4.7: Cumulative Frequency Diagram for Thickness
[Line graph titled Cumulative Frequency Diagram. Horizontal axis: Thickness, mm, from 3.1 to 3.6. Vertical axis: Cumulative Frequency, from 0 to 140.]
The cumulative frequency diagram of Figure 4.7 could be changed into a relative
cumulative frequency diagram by a change of scale for the ordinate.
Example 4.3
A sample of 120 electrical components was tested by operating each component
continuously until it failed. The time to the nearest hour at which each component
failed was recorded. The results are shown in Table 4.7.
Table 4.7: Times to Failure of Electrical Components, hours
1347 33 1544 1295 1541 14 2813 727 3385 2960
2075 215 346 153 735 1452 2422 1160 2297 594
2242 977 1096 965 315 209 1269 447 1550 317
3391 709 3416 151 2390 644 1585 3066 17 933
1945 844 1829 1279 1027 5 372 869 535 635
932 61 3253 47 4732 120 523 174 2366 323
1296 755 28 305 710 1075 74 1765 1274 180
1104 248 863 1908 2052 1036 359 202 1459 3
916 2344 581 1913 2230 1126 22 1562 219 166
678 1977 167 573 186 804 6 637 316 159
983 1490 877 152 2096 185 53 39 3997 310
1878 1952 5312 4042 4825 639 1989 132 432 1413
Once again, frequency grouping is needed to make sense of this mass of data.
When the data are sorted in order of increasing magnitude, the largest value is found to
be 5312 hours and the smallest is 3 hours. Then the range is 5312 - 3 = 5309 hours.
There are 120 data points. Then applying Sturges' Rule, equation 4.1 indicates that the
number of class intervals should be approximately 1 + 3.3 log10 120 = 7.86. Then the
class width should be approximately 5309/7.86 = 675 hours. A more convenient class
width is 600 hours. Since times to failure are stated to the nearest hour, each class
boundary should be a number ending in 0.5. The smallest class boundary must be
somewhat less than the smallest value, 3. Then a convenient choice of the smallest
class boundary is 0.5 hours. The resulting grouped frequency table is shown in Table
4.8. The corresponding histogram is Figure 4.8, and the cumulative frequency diagram
(last column of Table 4.8 vs. upper class boundary) is Figure 4.9.
Table 4.8: Grouped Frequency Table for Failure Times
Lower Class    Upper Class    Class         Class       Relative    Cumulative   Tally Marks
Boundary, h    Boundary, h    Midpoint, h   Frequency   Frequency   Frequency
0.5            600.5          300.5         46          0.383       46           ||||| ||||| ||||| ||||| ||||| ||||| ||||| ||||| ||||| |
600.5          1200.5         900.5         28          0.233       74           ||||| ||||| ||||| ||||| ||||| |||
1200.5         1800.5         1500.5        16          0.133       90           ||||| ||||| ||||| |
1800.5         2400.5         2100.5        17          0.142       107          ||||| ||||| ||||| ||
2400.5         3000.5         2700.5        3           0.025       110          |||
3000.5         3600.5         3300.5        5           0.042       115          |||||
3600.5         4200.5         3900.5        2           0.017       117          ||
4200.5         4800.5         4500.5        1           0.008       118          |
4800.5         5400.5         5100.5        2           0.017       120          ||
Total                                       120         1.000
Figure 4.8: Histogram of Times to Failure for Electrical Components
[Histogram titled Failure Times of Components. Horizontal axis: Times to Failure, h, with class midpoints 300.5 to 5100.5. Vertical axis: Class Frequency per Class Width of 600 h, from 0 to 50.]
Figure 4.9: Cumulative Frequency Diagram for Time to Failure
[Line graph titled Cumulative Frequency Diagram. Horizontal axis: Hours to Failure, from 0 to 6000. Vertical axis: Cumulative Frequency, from 0 to 140.]
Figures 4.4 and 4.8 are both histograms for continuous data, but their shapes are
quite different. Figure 4.4 is approximately symmetrical, whereas Figure 4.8 is
strongly skewed to the right (i.e., the tail to the right is very long, whereas no tail to
the left is evident in Figure 4.8). Correspondingly, the cumulative frequency diagram
of Figure 4.7 is s-shaped, with its slope first increasing and then decreasing, whereas
the cumulative frequency diagram of Figure 4.9 shows the slope generally decreasing
over its full length.
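The link between the two kinds of diagram can be seen numerically: between successive class boundaries, the slope of a cumulative frequency diagram is the class frequency divided by the class width. The Python sketch below, using the class frequencies of Tables 4.6 and 4.8, accumulates the frequencies and locates the class with the steepest slope. The helper names are hypothetical.

```python
from itertools import accumulate

thickness_frequencies = [2, 14, 24, 46, 22, 10, 2, 1]   # Table 4.6, class width 0.05 mm
failure_frequencies = [46, 28, 16, 17, 3, 5, 2, 1, 2]   # Table 4.8, class width 600 h


def cumulative(frequencies):
    """Cumulative frequency at each upper class boundary."""
    return list(accumulate(frequencies))


def steepest_class(frequencies):
    """Index of the class where the cumulative diagram rises fastest."""
    return frequencies.index(max(frequencies))


thickness_cumulative = cumulative(thickness_frequencies)
failure_cumulative = cumulative(failure_frequencies)

print("thickness:", thickness_cumulative,
      "steepest in class", steepest_class(thickness_frequencies) + 1,
      "of", len(thickness_frequencies))
print("failure times:", failure_cumulative,
      "steepest in class", steepest_class(failure_frequencies) + 1,
      "of", len(failure_frequencies))
```

For the thicknesses the steepest rise is in a middle class, which gives the s-shape; for the failure times it is in the first class, so the slope starts at its maximum and generally falls from there.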
Now the mean, median and mode for the data of Table 4.7 (corresponding to
Figures 4.8 and 4.9) will be calculated and compared. The mean is $\sum x_i / N$ = 140742/120
= 1173 hours. The median is the average of the two middle items in order of
magnitude, 869 and 877, so 873 hours. The mode according to Table 4.8 is the midpoint of
the class with the largest frequency, 300.5 hours, but of course the value would vary a
little if the class width or starting class boundary were changed. Since Figure 4.8
shows that the distribution is very asymmetrical or skewed, it is not surprising that
the mean, median and mode are so widely different.
The variance is given by equation 3.11,
$$s^2 = \frac{\sum_{i=1}^{N} x_i^2 - \dfrac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N - 1}$$
$$= \frac{317{,}324{,}464 - (140{,}742)^2/120}{119} = \frac{317{,}324{,}464 - 165{,}069{,}254.7}{119} = 1{,}279{,}455 \text{ h}^2$$
and so the estimate of the standard deviation based on this sample is s =
$\sqrt{1{,}279{,}455}$ = 1131 hours. The coefficient of variation is
$\left(\dfrac{s}{\bar{x}}\right)(100\%)$ = (1131/1173)(100%) = 96.4%.
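These summary statistics can be recomputed from the raw data of Table 4.7 with Python's standard `statistics` module. The sketch is offered as a cross-check under the assumption that the values below are transcribed correctly from the table; it is not the book's own procedure.

```python
import statistics

# Times to failure in hours, copied row by row from Table 4.7.
failure_times_h = [
    1347, 33, 1544, 1295, 1541, 14, 2813, 727, 3385, 2960,
    2075, 215, 346, 153, 735, 1452, 2422, 1160, 2297, 594,
    2242, 977, 1096, 965, 315, 209, 1269, 447, 1550, 317,
    3391, 709, 3416, 151, 2390, 644, 1585, 3066, 17, 933,
    1945, 844, 1829, 1279, 1027, 5, 372, 869, 535, 635,
    932, 61, 3253, 47, 4732, 120, 523, 174, 2366, 323,
    1296, 755, 28, 305, 710, 1075, 74, 1765, 1274, 180,
    1104, 248, 863, 1908, 2052, 1036, 359, 202, 1459, 3,
    916, 2344, 581, 1913, 2230, 1126, 22, 1562, 219, 166,
    678, 1977, 167, 573, 186, 804, 6, 637, 316, 159,
    983, 1490, 877, 152, 2096, 185, 53, 39, 3997, 310,
    1878, 1952, 5312, 4042, 4825, 639, 1989, 132, 432, 1413,
]

mean_h = statistics.mean(failure_times_h)
median_h = statistics.median(failure_times_h)   # average of the 60th and 61st items
std_dev_h = statistics.stdev(failure_times_h)   # sample standard deviation (N - 1)
coeff_of_variation_pct = 100 * std_dev_h / mean_h

print(f"mean = {mean_h:.0f} h, median = {median_h:.0f} h")
print(f"standard deviation = {std_dev_h:.0f} h, "
      f"coefficient of variation = {coeff_of_variation_pct:.1f}%")
```

The mean lying well above the median is the numerical counterpart of the long right-hand tail seen in Figure 4.8.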
4.5 Use of Computers
In this section the techniques illustrated in section 3.4 will be applied to further
examples. Further techniques, including production of graphs, will be shown. Once
again, the reader is referred to brief discussions of some Excel techniques for
statistical data in Appendix B.
Example 4.4
The thickness of a particular metal part of an optical instrument was measured on
121 successive items as they came off a production line under what was believed to
be normal conditions. The results were shown in Table 4.5. Find the mean thickness,
sample variance, sample standard deviation, coefficient of variation, median, fifth
percentile, and ninth decile. Use Sturges' Rule in choosing a suitable class width for
a grouped frequency distribution. Construct the resulting histogram and cumulative
frequency diagram. Use the Excel spreadsheet in solving this problem, and check that
rounding errors cause no appreciable loss of significance.
Answer: This is essentially the same problem as in Example 4.2, but now it will be
solved using Microsoft Excel.
First the thicknesses were transferred from Table 4.5 to column B of a new work
sheet. These data were sorted by increasing (ascending) thickness using the Sort
command on the Data menu for later use in finding quantiles. Extracts of the work
sheet are shown in Table 4.9. Notice again that each quantity must be clearly labeled.
Table 4.9: Extracts of Work Sheet for Example 4.4
Row   A                 B                  C              D            E           F
1     In column C       Thickness, xi mm   dev=xi-xbar    dev^2        xi*xi       Order no.
2     deviation=        3.21               -0.158512397   0.02512618   10.3041     1
3     B2:B122-B124      3.24               -0.128512397   0.01651544   10.4976     2
4                       3.25               -0.118512397   0.01404519   10.5625     3
5                       3.26               -0.108512397   0.01177494   10.6276     4
..                      ..                 ..             ..           ..          ..
119                     3.49               0.121487603    0.01475924   12.1801     118
120                     3.51               0.141487603    0.02001874   12.3201     119
121                     3.51               0.141487603    0.02001874   12.3201     120
122                     3.57               0.201487603    0.04059725   12.7449     121
123   Totals            407.59             6.66134E-14    0.47513223   1373.4471
124   xbar, B123/121=   3.368512397        s^2=           D123/120=    0.003959
125                                        s^2= (E123-B123^2/121)/120= 0.003959
126                                        diff =         E124-E125=   1.21E-15
127                                        s=SQRT(E125)=  0.062924
128                     s/xbar= D127/B124=                1.87%

Row   A             B             C          D           E           F
132   Lower Class   Upper Class   Class      Class       Relative    Cumulative
133   Boundary      Boundary      Midpoint   Frequency   Class       Class
134   mm            mm            mm                     Frequency   Frequency
135                 3.195                    0                       0
136   3.195         3.245         3.22       2           0.017       2
137   3.245         3.295         3.27       14          0.116       16
138   3.295         3.345         3.32       24          0.198       40
139   3.345         3.395         3.37       46          0.380       86
140   3.395         3.445         3.42       22          0.182       108
141   3.445         3.495         3.47       10          0.083       118
142   3.495         3.545         3.52       2           0.017       120
143   3.545         3.595         3.57       1           0.0083      121
144   3.595         3.645         3.62       0
145   Total                                  121
In cells, with the corresponding explanations (same column):
A137:A144   A136:A143+0.05=
B136:B144   A136:A144+0.05=
C136:C144   (A136:A144+B136:B144)/2=
D136:D144   Frequency(B1:B122,B136:B144)=
E136:E144   D136:D144/D145=
F136:F144   F135:F143+D136:D144
Quantities in rows 2 to 122 were added using the Autosum tool; totals were
placed in row 123. This gave a total thickness of 407.59 mm in cell B123 for the 121
items. Then the mean thickness, $\bar{x}$, was found in cell B124 to be 3.3685 mm. Next,
deviations from the mean, $x_i - \bar{x}$, were found in column C using an array formula
(which does a group of similar calculations together; see explanation in section (b)
of Appendix B). The deviations calculated in this way were squared by the array
formula =(C2:C122)^2, entered in cells D2:D122. (Remember that entering an array
formula requires us to press more than one key simultaneously. See Appendix B.)
Then the sample variance was found using equation 3.8 in cell E124 by dividing the
sum of squares of deviations by 120. This gave 0.003959 mm^2. Notice that this
method of calculation of variance requires more arithmetic steps than the alternative
method, which will be used in the next paragraph. The first method is used in this
example to provide a comparison giving a check on round-off errors, but the other
method should be used unless such a comparison is required.
The squares of individual thicknesses, $(x_i)^2$, were found in cells E2:E122 by the
array formula =B2^2. According to equation 3.11, the variance estimated from the
sample is $s^2 = \left(\sum x_i^2 - \left(\sum x_i\right)^2 / N\right) / (N - 1)$, where in this case N, the number of data
points, is 121. Then in cell E125 the sample variance is calculated as 0.003959 mm^2,
which agrees with the previous value. The sample standard deviation was found in
cell D127, taking the square root of the variance. This gave 0.0629 mm. The
coefficient of variation (from cell D128) is 1.87%, which was formatted as a percentage
using the Format menu.
Now we can obtain some indications of error due to round-off in Microsoft
Excel. In cell C123 the sum of all 121 deviations from the sample mean is shown as
6.66E-14, whereas it should be zero. This is consistent with the statement that
Excel stores values to a precision of about 15 decimal digits. The difference between
the value of the sample variance in cell E124 and the value of the same quantity in
cell E125 was calculated by the appropriate formula, =E124-E125, and entered in
cell E126. It is 1.21E-15, again consistent with the statement regarding the
precision of numbers calculated and stored in Excel. As these errors are very small in
comparison to the quantities calculated, rounding errors are negligible.
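The same kind of round-off check can be made in any environment that uses double-precision floating point, which carries roughly 15 to 17 significant decimal digits. The Python sketch below computes the sample variance by both methods and reports the discrepancy. As an assumption made to keep the sketch short, it uses the 36 battery lives of Table 4.1 as a stand-in for the 121 thicknesses, so the residuals will not equal the Excel values quoted above, though they should be similarly tiny.

```python
# Battery lives in years, from Table 4.1 (stand-in data for this illustration).
battery_lives = [
    4.1, 5.2, 2.8, 4.9, 5.6, 4.0, 4.1, 4.3, 5.4,
    4.5, 6.1, 3.7, 2.3, 4.5, 4.9, 5.6, 4.3, 3.9,
    3.2, 5.0, 4.8, 3.7, 4.6, 5.5, 1.8, 5.1, 4.2,
    6.3, 3.3, 5.8, 4.4, 4.8, 3.0, 4.3, 4.7, 5.1,
]

n = len(battery_lives)
mean = sum(battery_lives) / n

# Method 1: sum of squared deviations from the mean, divided by N - 1.
deviations = [x - mean for x in battery_lives]
sum_of_deviations = sum(deviations)  # zero in exact arithmetic
variance_from_deviations = sum(d * d for d in deviations) / (n - 1)

# Method 2: computational form, (sum x^2 - (sum x)^2 / N) / (N - 1).
sum_x = sum(battery_lives)
sum_x_sq = sum(x * x for x in battery_lives)
variance_from_sums = (sum_x_sq - sum_x**2 / n) / (n - 1)

difference = variance_from_deviations - variance_from_sums

print(f"sum of deviations = {sum_of_deviations:.3e}")
print(f"variance (deviations) = {variance_from_deviations:.6f}")
print(f"variance (sums)       = {variance_from_sums:.6f}")
print(f"difference = {difference:.3e}")
```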
The order numbers from 1 to 121 were entered in cells F2:F122. After the first
two numbers were entered, the fill handle was dragged to produce the series. From
the order numbers in cells F2:F122 and the thicknesses in cells B2:B122, numbers to
calculate the median (order number 61, so in cell B62), fifth percentile (between order
numbers 6 and 7, cells B7 and B8), and ninth decile (between order numbers 108 and
109, cells B109 and B110) were read. Then the median is 3.37 mm, the fifth
percentile is (3.26 + 3.27)/2 = 3.265 mm, and the ninth decile is (3.44 + 3.45)/2 = 3.445 mm.
For the class width and the smallest class boundary for the grouped frequency
table the reasoning is the same as in Example 4.2. The largest thickness, in cell
B122, is 3.57 mm, and the smallest thickness, in cell B2, is 3.21 mm, so the range is
3.57 - 3.21 = 0.36 mm. Since there are 121 items, the number of class intervals
according to Sturges' Rule should be approximately 1 + (3.3)(log10 121) = 7.87. This
calls for a class width of approximately 0.36/7.87 = 0.0457 mm, and we choose a
convenient value of 0.05 mm. The smallest class boundary should be a little smaller
than the smallest thickness and halfway between possible values of the thickness,
which was measured to two decimal places. Then the smallest class boundary was
chosen as 3.195 mm.
Column headings for the grouped frequency table were entered in cells
A132:F134. The smallest class boundary, 3.195 mm, was entered in cell A136. To
obtain an extra class of zero frequency for the cumulative frequency distribution,
3.195 was entered also in cell B135, and zero was entered in cell D135. For a class
width of 0.05 mm the next lower class boundary of 3.245 was entered in cell A137,
and the fill handle was dragged to 3.595 in cell A144. Upper class boundaries were
entered in cells B136:B144 by the array function =A136:A144+0.05. Class
midpoints were entered in cells C136:C144 by the array function =(A136:A144+
B136:B144)/2.
A saving in time can be obtained at this point by using one of Excel's built-in
functions (see section (e) of Appendix B). Class frequencies were entered in cells
D135:D144 by the array formula =FREQUENCY(B2:B122,B135:B143), where the
cells B2:B122 contain the data array (thickness in mm in this case) and the cells
B135:B143 contain the corresponding upper class boundaries. For further
information, from the Help menu select Microsoft Excel Help, and then the Frequency
worksheet function. Note that the number of cells in D135:D144 is ten, one more
than the number of cells in B135:B143. The last item in column D (cell D144) is 0
and represents the frequency above the largest effective upper class boundary, 3.595
mm. The class frequencies in cells D135:D144 agree with the values given in Table
4.6. The total frequency was found in cell D145 using the Autosum tool. It is 121, as
before. Relative class frequencies in cells E136:E143 were found using the array
formula =D136:D143/121. Again the results agree with previous results. The first
cumulative frequency in cell F135 is the same as the corresponding class frequency,
so it is given by =D135. Cumulative class frequencies in cells F136:F143 were found
by the array formula =F135:F142+D136:D143. They can be checked by comparison
with the largest order numbers in the upper part of Table 4.9 corresponding to a
thickness less than an upper class boundary. For example, the largest order number
corresponding to a thickness less than the upper class boundary 3.495 is 118. Minor
changes, such as centering, were made in formatting cells A132:F145. Instead of the
function Frequency, the function Histogram can be used if it is available.
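The counting convention described here, one count per upper class boundary plus one extra count for values above the largest boundary, can be mimicked outside Excel. The Python sketch below is an analogue written for illustration, not Excel's implementation: the function name `frequency_counts` is hypothetical, it assumes the boundaries are given in ascending order, and it uses the battery lives of Table 4.1 with boundaries chosen halfway between possible values.

```python
from bisect import bisect_left


def frequency_counts(data, upper_boundaries):
    """Count values per class in the style described for Excel's FREQUENCY.

    counts[i] is the number of values greater than upper_boundaries[i - 1]
    and less than or equal to upper_boundaries[i]; the final extra element
    counts values above the largest boundary.
    Assumption: upper_boundaries is sorted in ascending order.
    """
    counts = [0] * (len(upper_boundaries) + 1)
    for value in data:
        # Index of the first boundary that is >= value.
        counts[bisect_left(upper_boundaries, value)] += 1
    return counts


# Battery lives in years, from Table 4.1.
battery_lives = [
    4.1, 5.2, 2.8, 4.9, 5.6, 4.0, 4.1, 4.3, 5.4,
    4.5, 6.1, 3.7, 2.3, 4.5, 4.9, 5.6, 4.3, 3.9,
    3.2, 5.0, 4.8, 3.7, 4.6, 5.5, 1.8, 5.1, 4.2,
    6.3, 3.3, 5.8, 4.4, 4.8, 3.0, 4.3, 4.7, 5.1,
]
boundaries = [1.95, 2.95, 3.95, 4.95, 5.95]

counts = frequency_counts(battery_lives, boundaries)
print(counts)
```

With these boundaries the first five counts reproduce the frequencies of stems 1 to 5 in Table 4.2, and the extra final count holds the two values above 5.95.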
To produce the histogram, the class midpoints (cells D133:D141) and the class
frequencies (cells E133:E141) were selected; from the Insert menu, Chart was
selected. The Chart Wizard guided choices for the chart. A simple column chart
was chosen with data series in columns, x-axis titled Thickness, mm, y-axis titled
Class frequency, and no legend. The chart was opened as a new sheet titled
Example 4.4.
The chart was modified by selecting it and opening the Chart menu. One
modification was of the font size for the titles of axes. The x-axis title was chosen, and from
the Format menu the Selected Axis Title was chosen, then the font size was changed
from 10 point to 12 point. The y-axis title was modified similarly. To make the bars
of the histogram touch one another without gaps, a bar was clicked and from the
Format menu the Selected Data Series was chosen; the Option tab was clicked, and
then the gap width was reduced to zero. This left the histogram in solid black. To
remedy this, the bars were double-clicked: the screen for Format Data Point appeared
with the Patterns tab, and the Fill Effects bar was clicked. A suitable diagonal pattern
was selected for the fill of each bar, with the diagonals sloping in different directions
on adjacent bars. The final histogram is very similar to Figure 4.4, differing from it
mainly as a result of using different software, CA-Cricket Graph III vs. Excel.
To obtain the cumulative frequency diagram, first the upper class boundaries,
cells B135:B144, were selected. Then the corresponding cumulative class
frequencies, cells F135:F144, were selected while holding down Ctrl in Excel for Windows
or Command in Excel for the Macintosh, because this is a nonadjacent selection to
be added to the selection of class boundaries. Then from the Insert menu, Chart was
clicked. A simple line chart was chosen with horizontal grids. The data series are in
columns, the first column contains x-axis labels, and the first row gives the first data
point. A choice was made to have no legend. The chart title was chosen to be
Cumulative Frequency Diagram. The title for the x-axis was chosen to be Thickness,
mm. The title for the y-axis was chosen to be Cumulative Frequency. The result is
essentially the same as Figure 4.7.
Example 4.5
A sample of 120 electrical components was tested by operating each component
continuously until it failed. The time to the nearest hour at which each component
failed was recorded. The results were shown in Table 4.7. Calculate the mean,
median, mode, variance, standard deviation, and coefficient of variation for these
data. Prepare a grouped frequency table from which a histogram and cumulative
frequency diagram could be prepared. Calculate using Excel.
Answer: This is a repeat of most of Example 4.3, but using Excel.
The times to failure, t_i hours, were entered in column B, rows 3 to 122, of a new
work sheet. They were sorted from the smallest to the largest using the Sort
command on the Data menu. The work sheet must include headings, labels, and
explanations. Extracts of the work sheet are shown in Table 4.10. This is similar to
the work sheet of Example 4.4, which was shown in Table 4.9.
Table 4.10: Extracts from Work Sheet, Example 4.5
Row   Time, ti h   ti^2            Order No.
2                  (B3:B122)^2=
3     3            9               1
4     5            25              2
..    ..           ..              ..
61    863          744769          59
62    869          755161          60
63    877          769129          61
64    916          839056          62
..    ..           ..              ..
120   4732         22391824        118
121   4825         23280625        119
122   5312         28217344        120
123   Sums         140742          317324464
124   Mean, tbar=
125   B123/120=    1172.85
126   s^2= (C123-B123*B123/120)/(120-1)=   1.28E6
127   s= SQRT(E126)=                       1130
128   c.v.= s/xbar= E127/B125=             96%

Row   A             B             C          D           E           F
131   Lower Class   Upper Class   Class      Class       Relative    Cumulative
132   Boundary      Boundary      Midpoint   Frequency   Class       Class
133   h             h             h                      Frequency   Frequency
134                 0.5                      0                       0
135   0.5           600.5         300.5      46          0.383333    46
136   600.5         1200.5        900.5      28          0.233333    74
137   1200.5        1800.5        1500.5     16          0.133333    90
138   1800.5        2400.5        2100.5     17          0.141667    107
139   2400.5        3000.5        2700.5     3           0.025       110
140   3000.5        3600.5        3300.5     5           0.041667    115
141   3600.5        4200.5        3900.5     2           0.016667    117
142   4200.5        4800.5        4500.5     1           0.008333    118
143   4800.5        5400.5        5100.5     2           0.016667    120
144   Total                                  120
In cells, with the corresponding explanations (same column):
A136:A143   A135:A142+600=
B135:B143   A135:A143+600=
C135:C143   (A135:A143+B135:B143)/2=
D134:D143   Frequency(B3:B122,B135:B143)=
F135:F143   D134:D142+D135:D143=
In cells E135:E143 the explanation is D135:D143/D144.
Appendix C lists some functions which should not be used during the learning
process but are useful shortcuts once the reader has learned the fundamentals
thoroughly.
Concluding Comment
In this chapter and the one before, we have seen several types of frequency
distributions from numerical data. In the next few chapters we will encounter theoretical
probability distributions, and some of these will be found to represent satisfactorily
some of the frequency distributions of these chapters.
Problems
1. The daily emissions of sulfur dioxide from an industrial plant in tonnes/day were
as follows:
4.2 6.7 5.4 5.7 4.9 4.6 5.8 5.2 4.1 6.2
5.5 4.9 5.1 5.6 5.9 6.8 5.8 4.8 5.3 5.7
a) Prepare a stem-and-leaf display for these data.
b) Prepare a box plot for these data.
2. A semi-commercial test plant produced the following daily outputs in tonnes/day:
1.3 2.5 1.8 1.4 3.2 1.9 1.3 2.8 1.1 1.7
1.4 3.0 1.6 1.2 2.3 2.9 1.1 1.7 2.0 1.4
a) Prepare a stem-and-leaf display for these data.
b) Prepare a box plot for these data.
3. Over a period of 60 days the percentage relative humidity in a vegetable storage
building was measured. Mean daily values were recorded as shown below:
60 63 64 71 67 73 79 80 83 81
86 90 96 98 98 99 89 80 77 78
71 79 74 84 85 82 90 78 79 79
78 80 82 83 86 81 80 76 66 74
81 86 84 72 79 72 84 79 76 79
74 66 84 78 91 81 64 76 78 82
a) Make a stem-and-leaf display with at least five stems for these data. Show
the leaves sorted in order of increasing magnitude on each stem.
b) Make a frequency table for the data, with a maximum bound of 100.5%
relative humidity (since no relative humidity can be more than 100%). Use
Sturges' Rule to approximate the number of classes.
c) Draw a frequency histogram for these data.
d) Draw a relative cumulative frequency diagram.
e) Find the median, lower quartile, and upper quartile.
f) Find the arithmetic mean of these data.
g) Find the mode of these data from the grouped frequency distribution.
h) Draw a box plot for these data.
i) Estimate from these data the probability that the mean daily relative
humidity under these conditions is less than 85%.
4. A random sample was taken of the thickness of insulation in transformer
windings, and the following thicknesses (in millimeters) were recorded:
18 21 22 29 25 31 37 38 41 39
44 48 54 56 56 57 47 38 35 36
29 37 32 42 43 40 48 36 37 37
36 38 40 41 44 39 38 34 24 32
39 44 42 30 37 30 42 37 34 37
32 24 42 36 49 39 23 34 36 40
a) Make a stem-and-leaf display for these data. Show at least five stems. Sort
the data on each stem in order of increasing magnitude.
b) Estimate from these data the percentage of all the windings that received
more than 30 mm of insulation but less than 50 mm.
c) Find the median, lower quartile, and ninth decile of these data.
d) Make a frequency table for the data. Use Sturges' Rule.
e) Draw a frequency histogram.
f) Add and label an axis for relative frequency.
g) Draw a cumulative frequency graph.
h) Find the mode.
i) Show a box plot of these data.
5. The following scores represent the final examination grades for an elementary
statistics course:
23 60 79 32 57 74 52 70 82 36
80 77 81 95 41 65 92 85 55 76
52 10 64 75 78 25 80 98 81 67
41 71 83 54 64 72 88 62 74 43
60 78 89 76 84 48 84 90 15 79
34 67 17 82 69 74 63 80 85 61
a) Make a stem-and-leaf display for these data. Show at least five stems. Sort
the data on each stem in order of increasing magnitude.
b) Find the median, lower quartile, and upper quartile of these data.
c) What fraction of the class received scores which were less than 65?
d) Make a frequency table, starting the first class interval at a lower class
boundary of 9.5. Use Sturges' Rule.
e) Draw a frequency histogram.
f) Draw a relative frequency histogram on the same x-axis.
g) Draw a cumulative frequency diagram.
h) Find the mode.
i) Show a box plot of these data.
Computer Problems
Use MS Excel in solving the following problems:
C6. For the data given in Problem 3:
a) Sort the given data and find the largest and smallest values.
b) Make a frequency table, starting the first class interval at a lower bound of
59.5% relative humidity. Use Sturges' Rule to approximate the number of
classes.
c) Find the median, lower quartile, eighth decile, and 95th percentile.
d) Find the arithmetic mean and the mode.
e) Find the variance and standard deviation of these data taken as a complete
population, using both a basic definition and a method for faster calculation.
f) From the calculations of part (e) check or verify in two ways the statement
that Excel stores numbers to a precision of about fifteen decimal places.
C7. For the data given in Problem 4, perform the same calculations and
determinations as in Problem C6. Choose a reasonable lower boundary for the smallest class.
C8. For the data given in Problem 5:
a) Sort the data and find the largest and smallest values.
b) Find the median, upper quartile, ninth decile, and 90th percentile.
c) Make a frequency table. Use Sturges' Rule to approximate the number of
classes.
d) Find the arithmetic mean and mode.
e) Find the variance of the data taken as a sample.
CHAPTER 5
Probability Distributions of Discrete Variables
For this chapter the reader should have a solid understanding
of sections 2.1, 2.2, 3.1, and 3.2.
We saw in Chapters 3 and 4 some frequency distributions for discrete and continuous
variates. Examples included frequencies of various numbers of defective items in
samples taken from production lines, and frequencies of various classes of
thicknesses of items produced industrially.
Now we want to look at the probabilities of various possible results. If we know
enough about the probability distributions, we can calculate the probability of each
result. For instance, we can calculate the probability of each possible number of
defective items in a sample of fixed size. From that we might calculate the
probability of finding (for example) three or more defective items in a sample of 18 items.
That might be useful in assessing the implications for quality control of finding three
defectives in such a sample. Similarly, if we know enough about the probability
distribution we can calculate the probabilities of parts which are thicker than
appropriate limits.
The number of defective items in a sample of 18 items is a real number
expressing a result determined by chance. We can't predict the number of defective items in
the next sample, but we may be able to calculate some probabilities. The probability
of any particular number of defective items would be a function of the parameters of
the problem. A quantity such as this is called a random variable.
The distinction between a discrete and a continuous random variable is the same
as the distinction between a discrete and a continuous frequency distribution: only
certain results are possible for a discrete random variable, but any of an infinite
number of results within a certain range are possible for a continuous random
variable. The random variable describing the number of defective items in a sample of 18
parts is discrete because the number of defective items in this case must be either
zero or a positive whole number no more than 18, and not any other number between
zero and 18. Another example of a discrete random variable is the number of failures
in an electronic device in its first five years of operation. On the other hand, the time
between successive failures of an electronic device is a continuous random variable
because there are an infinite number of possible results between any two possible
results that we may choose (even though practical measurement devices may not be
able to distinguish some of them from one another because they report results to a
finite number of figures). Another example of a continuous random variable is a
measurement of the diameter of a part as it comes from a production line. We cannot
predict any particular value of the random variable but, with sufficient data of the
type discussed in Chapter 4, we may be able to find the probability of a result in a
particular interval.
This chapter is concerned with discrete variables, and the next chapter is
concerned with cases where the variable is continuous. Both types of variables are
fundamental to some of the applications discussed in later chapters. In this chapter
we will start with a general discussion of discrete random variables and their
probability and distribution functions. Then we will look at the idea of mathematical
expectation, or the mean of a probability distribution, and the concept of the variance
of a probability distribution. After that, we will look in detail at two important
discrete probability distributions, the Binomial Distribution and the Poisson
Distribution.
5.1 Probability Functions and Distribution Functions
(a) Probability Functions
Say the possible values of a discrete random variable, X, are $x_0, x_1, x_2, \ldots, x_k$, and the
corresponding probabilities are $p(x_0), p(x_1), p(x_2), \ldots, p(x_k)$. Then for any choice of i,
$$p(x_i) \ge 0, \quad \text{and} \quad \sum_{i=0}^{k} p(x_i) = 1,$$
where k is the maximum possible value of i. Then $p(x_i)$ is
a probability function, also called a probability mass function. An alternative
notation is that the probability function of X is written $\Pr[X = x_i]$. In many cases $p(x_i)$ (or
$\Pr[X = x_i]$) and $x_i$ are related by an algebraic function, but in other cases the relation
is shown in the form of a table. The relation can be represented by isolated spikes on
a bar graph, as shown for example in Figure 5.1. By convention the random
variable is represented by a capital letter (for example, X), and particular
values are represented by lower-case letters (for example, $x$, $x_i$, $x_0$).
[Figure 5.1: Example of a Probability Function for a Discrete Random Variable.
Bar graph of Probability, p(x), against Value, x.]
(b) Cumulative Distribution Functions
Cumulative probabilities, $\Pr[X \le x]$, where X still represents the random variable and
x now represents an upper limit, are found by adding individual probabilities.
$$\Pr[X \le x] = \sum_{x_i \le x} p(x_i) \qquad (5.1)$$
where $p(x_i)$ is an individual probability function. For example, if $x_i$ can be only zero
or a positive integer,
$$\Pr[X \le 3] = p(0) + p(1) + p(2) + p(3)$$
The functional relationship between the cumulative probability and the upper
limit, x, is called the cumulative distribution function, or the probability distribution
function.
Note that since $\Pr[X \le 2] = p(0) + p(1) + p(2)$,
we have $p(3) = \Pr[X \le 3] - \Pr[X \le 2]$.
In general,
$$p(x_i) = \Pr[X \le x_i] - \Pr[X \le x_{i-1}] \qquad (5.2)$$
As an illustration, consider the random variable that represents the number of
heads obtained on tossing five fair coins. The probability of obtaining heads on any
one coin is 1/2. The probability function and cumulative distribution are given by the
binomial distribution, which will be considered in detail in section 5.3. The
probability function of possible results is shown in Table 5.1 and Figure 5.2.

Table 5.1: Probability Function for Tossing Coins

r, no. of heads    Probability, p(r)
0                  1/32
1                  5/32
2                  10/32
3                  10/32
4                  5/32
5                  1/32
Total              1

[Figure 5.2: Probability Function for Results of Tossing Five Fair Coins.
Bar graph of p(r) against Number of heads, r.]
The corresponding cumulative distribution function is shown in Figure 5.3. The
graph of the cumulative distribution function for a discrete random variable is a
stepped function because there can be no change in the cumulative probability
between possible values of the variable.
Using this cumulative distribution function with equation 5.2,
$$p(3) = \Pr[R \le 3] - \Pr[R \le 2] = \frac{26}{32} - \frac{16}{32} = \frac{10}{32} = 0.3125.$$
[Figure 5.3: Cumulative Distribution for Tossing Five Fair Coins.
Stepped graph of $\Pr[R \le r]$ against r, number of heads.]
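The table of probabilities and the cumulative distribution for the five-coin illustration can be generated numerically. The following is a minimal sketch in Python (not part of the Excel workflow of this book); it assumes the counting formula for the number of ways of obtaining r heads, which anticipates the binomial distribution of section 5.3, and the variable names are illustrative.

```python
from fractions import Fraction
from itertools import accumulate
from math import comb

n_coins = 5
# p(r): number of ways to choose r heads out of 5, times (1/2)^5
pmf = [Fraction(comb(n_coins, r), 2 ** n_coins) for r in range(n_coins + 1)]
# Pr[R <= r]: running sum of the individual probabilities (equation 5.1)
cdf = list(accumulate(pmf))

for r, (p, c) in enumerate(zip(pmf, cdf)):
    print(r, p, c)          # fractions are printed in lowest terms

# Equation 5.2: an individual probability is a difference of cumulative values
p3 = cdf[3] - cdf[2]
print(float(p3))
```

Exact fractions are used so that the cumulative sum reaches exactly 1 with no rounding error.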
5.2 Expectation and Variance
(a) Expectation of a Random Variable
The mathematical expectation or expected value of a random variable is an arithmetic
mean that we can expect to closely approximate the mean result from a very long
series of trials, if a particular probability function is followed. The expected value is
the mean of all possible results for an infinite number of trials. We must know the
complete probability function in order to calculate the expectation. The expectation
of a random variable X is denoted by $E(X)$ or $\mu_X$ or $\mu$. The last two symbols indicate
that the expectation or expected value is the mean value of the distribution of the
random variable.
Let us go back to the empirical approach to probability. The probability of a
particular result would be given to a good approximation by the relative frequency of
that result from an extremely large number of trials:
$$\Pr[x_i] \approx \frac{f(x_i)}{\sum_{\text{all } i} f(x_i)} \qquad (5.3)$$
If the number of trials became infinite, this relation would become exact.
We also have from equation 3.2a that
$$\bar{x} = \sum_{j=1}^{N} x_j \left[ \frac{f(x_j)}{\sum_{\text{all } i} f(x_i)} \right] \qquad (5.4)$$
The factor within square brackets in equation 5.4 is the relative frequency for factor j.
Then for an infinite number of trials we have, using equation 5.3, that
$$\mu_X = E(X) = \sum_{\text{all } x_i} (x_i) \Pr[x_i] \qquad (5.5)$$
In words, the expectation or the mean value of the random variable X is given by the
sum, for all possible outcomes, of the products given by multiplying each outcome
by its probability. If we repeated an experiment a very large number of times, the
arithmetic mean of the results would closely approximate the expected value if the
stated probability distribution was followed. These relations apply, as written, to
discrete random variables, but a similar relation will be found in section 6.2 for a
continuous random variable. Equation 5.5 will be used from this point on to calculate
expectation of a discrete random variable.
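The statement that the arithmetic mean of many repetitions approaches the expected value can be tried numerically. The following is a minimal sketch in Python, using the five-coin experiment introduced in section 5.1; the seed and the number of trials are arbitrary assumptions chosen only to make the run repeatable.

```python
import random

random.seed(1)          # fixed seed so the run is repeatable
n_trials = 100_000      # a "very large" number of repetitions (arbitrary choice)

total_heads = 0
for _ in range(n_trials):
    # one trial: toss five fair coins and count the heads
    total_heads += sum(random.random() < 0.5 for _ in range(5))

sample_mean = total_heads / n_trials
print(sample_mean)
```

The printed mean should lie close to (5)(1/2) = 2.5, the value that equation 5.5 gives for this experiment, and it tends to move closer as n_trials is increased.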
The relation for the expected value can be illustrated for the random variable, R,
which was shown in Figures 5.2 and 5.3. It is the number of heads obtained on
tossing five fair coins.
$$\mu_R = E(R) = (0)\frac{1}{32} + (1)\frac{5}{32} + (2)\frac{10}{32} + (3)\frac{10}{32} + (4)\frac{5}{32} + (5)\frac{1}{32} = \frac{80}{32} = 2.500$$
Notice that, like the arithmetic mean, the expected value is not necessarily a possible
result from a single trial.
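Equation 5.5 translates directly into a short function. The following is a minimal sketch in Python, assuming the probabilities of Table 5.1; the function name `expectation` is illustrative.

```python
from fractions import Fraction

# Table 5.1: probabilities of 0..5 heads on tossing five fair coins
pmf = {0: Fraction(1, 32), 1: Fraction(5, 32), 2: Fraction(10, 32),
       3: Fraction(10, 32), 4: Fraction(5, 32), 5: Fraction(1, 32)}

def expectation(pmf):
    """Equation 5.5: sum of each outcome times its probability."""
    return sum(x * p for x, p in pmf.items())

mean_r = expectation(pmf)
print(mean_r, float(mean_r))
print(mean_r in pmf)    # is the expected value itself a possible outcome?
```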
Example 5.1
The probability that a thirty-year-old man will survive a fixed length of time is 0.995.
The probability that he will die during this time is therefore 1 - 0.995 = 0.005. An
insurance company will sell him a $20,000 life insurance policy for this length of
time for a premium of $200.00. What is the expected gain for the insurance
company?
Answer: If the man lives through the fixed length of time, the company's gain will
be $200.00. The probability of this is 0.995. On the other hand, if the man dies
during this time, the company's gain will be +$200.00 - $20,000.00 = -$19,800.00.
The probability of this is 0.005.
Using the working expression, equation 5.5, the expected gain for the company is
E(X) = ($200.00)(0.995) + (-$19,800.00)(0.005)
= $199.00 - $99.00 = $100.00
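The same calculation can be laid out as a list of (gain, probability) pairs. The following is a minimal sketch in Python of Example 5.1; the variable names are illustrative.

```python
# (gain to the company in dollars, probability) for the two outcomes
outcomes = [
    (200.00, 0.995),              # man survives: company keeps the premium
    (200.00 - 20000.00, 0.005),   # man dies: premium less the payout
]

# Equation 5.5: sum of each gain times its probability
expected_gain = sum(gain * prob for gain, prob in outcomes)
print(round(expected_gain, 2))
```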
The idea of fair odds was introduced in section 2.1(f) as an alternative expression
giving the same information as probability. It is easy to show from expectation that
the relations given in that section are correct. If the probability of "success" in a
particular trial is p and the only possible results are "success" and "failure," the
probability of failure must be 1 - p. If the process is completely fair, the
expectation of gain for any individual must be zero. If the wager for "success" is $1, and the
wager against "success" is $A, the individual's gain in the case of success is $A
and his gain in the case of loss is -$1. Then we must have
$$(p)(\$A) + (1 - p)(-\$1) = 0$$
$$(p)(\$A) = (1 - p)(\$1)$$
$$\frac{\$A}{\$1} = \frac{1 - p}{p}$$
The ratio of one wager to the other is called the odds. Then the fair odds against
"success" must be $\frac{1 - p}{p}$ to 1. Similarly, the fair odds for "success" must be $p/(1 - p)$
to 1.
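The zero-expectation condition for fair odds can be checked for any probability. The following is a minimal sketch in Python; the probability of 1/4 is a hypothetical value chosen only for illustration, and the function names are not from the text.

```python
from fractions import Fraction

def fair_odds_against(p):
    """Fair odds against success, (1 - p)/p, expressed 'to 1'."""
    return (1 - p) / p

def expected_gain(p, stake_for, stake_against):
    """Expected gain of the person who wagers stake_for on success."""
    return p * stake_against + (1 - p) * (-stake_for)

p = Fraction(1, 4)              # hypothetical probability of success
odds = fair_odds_against(p)     # dollars wagered against each $1 wagered for
print(odds, expected_gain(p, 1, odds))
```

With p = 1/4 the fair odds against success come out as 3 to 1, and the expected gain at those odds is zero.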
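(b) Variance of a Discrete Random Variable
The variance was defined for the frequency distribution of a population by
equation 3.6 as $\sum_{i=1}^{N} (x_i - \mu)^2 / N$, that is, the mean value of $(x_i - \mu)^2$. Since the
quantity corresponding to the mean for a probability distribution is the expectation,
the variance of a discrete random variable must be
$$\sigma_X^2 = E\left[(X - \mu_X)^2\right] = \sum_{\text{all } x_i} (x_i - \mu_X)^2 \Pr[x_i] \qquad (5.6)$$
An alternative form, like the one found in equation 3.10a, is faster to calculate. It
is obtained as follows:
$$E\left[(X - \mu_X)^2\right] = E\left[X^2 - 2(\mu_X)(X) + \mu_X^2\right] = E\left[X^2\right] - 2\mu_X E[X] + \mu_X^2$$
But $E[X] = \mu_X$. Then
$$E\left[(X - \mu_X)^2\right] = E\left[X^2\right] - 2\mu_X^2 + \mu_X^2$$
or
$$\sigma_X^2 = E\left[X^2\right] - \mu_X^2 \qquad (5.7)$$
where
$$E\left[X^2\right] = \sum_{\text{all } x_i} x_i^2 \Pr[x_i] \qquad (5.8)$$
The standard deviation is always simply the square root of the corresponding
variance. Then
$$\sigma_X = \sqrt{E\left[(X - \mu_X)^2\right]} = \sqrt{E\left[X^2\right] - \left(E[X]\right)^2}$$
Let us continue with the previous illustration for the random variable, R, given by
the number of heads obtained on tossing five fair coins.
From the previous calculation, $\mu_R = E(R) = 2.500$.
From equation 5.8,
$$E(R^2) = (0)^2\frac{1}{32} + (1)^2\frac{5}{32} + (2)^2\frac{10}{32} + (3)^2\frac{10}{32} + (4)^2\frac{5}{32} + (5)^2\frac{1}{32} = \frac{240}{32} = 7.500$$
Then $\sigma_R^2 = E(R^2) - \mu_R^2 = 7.500 - (2.500)^2 = 1.25$
and $\sigma_R = \sqrt{1.25} = 1.118$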
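The definition (equation 5.6) and the shortcut (equations 5.7 and 5.8) can be compared on the five-coin probabilities. The following is a minimal sketch in Python, assuming the values of Table 5.1; the variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# Table 5.1: probabilities of 0..5 heads on tossing five fair coins
pmf = {0: Fraction(1, 32), 1: Fraction(5, 32), 2: Fraction(10, 32),
       3: Fraction(10, 32), 4: Fraction(5, 32), 5: Fraction(1, 32)}

mean = sum(x * p for x, p in pmf.items())
# Definition (equation 5.6): expected squared deviation from the mean
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())
# Shortcut (equations 5.7 and 5.8): E[X^2] minus the squared mean
mean_of_squares = sum(x ** 2 * p for x, p in pmf.items())
var_shortcut = mean_of_squares - mean ** 2

print(mean_of_squares, var_definition, var_shortcut)
print(round(sqrt(var_shortcut), 3))     # standard deviation
```

Because exact fractions are used, the two forms of the variance agree exactly rather than only to rounding error.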
Example 5.2
A probability function is given by p(0) = 0.3164, p(1) = 0.4219, p(2) = 0.2109, p(3)
= 0.0469, and p(4) = 0.0039. Find its mean and variance.
Answer: The mean or expected value is
(0)(0.3164) + (1)(0.4219) + (2)(0.2109) + (3)(0.0469) + (4)(0.0039) = 1.000.
The variance is
$(0)^2(0.3164) + (1)^2(0.4219) + (2)^2(0.2109) + (3)^2(0.0469) + (4)^2(0.0039) - (1.000)^2$
= 1.750 - 1.000 = 0.750.
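The arithmetic of Example 5.2 can be verified in a few lines. The following is a minimal sketch in Python using the shortcut form of equation 5.7; it also checks that the given probabilities add to 1, which any probability function must satisfy.

```python
pmf = {0: 0.3164, 1: 0.4219, 2: 0.2109, 3: 0.0469, 4: 0.0039}

total = sum(pmf.values())                     # should be 1 for a valid pmf
mean = sum(x * p for x, p in pmf.items())     # equation 5.5
variance = sum(x * x * p for x, p in pmf.items()) - mean ** 2   # equation 5.7

print(round(total, 4), round(mean, 3), round(variance, 3))
```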
Problems
1. The probabilities of various numbers of failures in a mechanical test are as
follows:
Pr[0 failures] = 0.21, Pr[1 failure] = 0.43, Pr[2 failures] = 0.28, Pr[3 failures] =
0.08, Pr[more than 3 failures] = 0.
(a) Show this probability function as a graph.
(b) Sketch a graph of the corresponding cumulative distribution function.
(c) What is the expected number of failures, that is, the mathematical
expectation of the number of failures?
2. Three items are selected at random without replacement from a box containing
ten items, of which four are defective. Calculate the probability distribution for
the number of defectives in the sample. What is the expected number of
defectives in the sample?
3. An experiment was conducted wherein three balls were drawn at random from a
barrel containing two blue balls, three red balls, and five green balls.
a) Find the mean and variance of the probability distribution of the number of
green balls chosen.
b) What is the probability that all the balls will have the same color?
4. A modified version of the game of Yahtzee has been developed and consists of
throwing three dice once. The points associated with the possible results are as
follows:
Result             Points
Three of a kind    500
A pair             100
All different      50
a) Find the probability distribution of the number of points.
b) Find the expected value of the number of points.
c) Find the standard deviation of the number of points.
5. A discrete random variable, X, has three possible results with the following
probabilities:
Pr[X = 1] = 1/6
Pr[X = 2] = 1/3
Pr[X = 3] = 1/2
No other results can occur.
(a) Sketch a graph of the probability function.
(b) What is the mean or expected value of this random variable?
(c) What are the variance and standard deviation of this random variable?
6. i) Find the probability that, when 5 fair six-sided dice are rolled, the result is:
a) 5-of-a-kind (all 5 numbers the same);
b) 4-of-a-kind (4 numbers the same and 1 different);
c) a full house (3 of one number, 2 of another number);
d) 3-of-a-kind (the other 2 numbers being different from one another);
e) a single pair;
f) two pairs;
g) all 5 numbers different.
Check that all above probabilities add to 1.
ii) The players agree to take turns rolling the dice and to collect according to a
payout scheme. If the payouts are $1000 for 5-of-a-kind, $40 for 4-of-a-kind, $20
for a full house, $5 for 3-of-a-kind, $2 for a pair and $4 for two pair, what is the
expected value on a single roll of 5 dice?
7. A local body shop is run by four employees. However, with such a small staff,
absenteeism creates many difficulties financially. If only one employee is absent,
the day's total income is reduced by 50%, and if more than one is absent, the
shop is closed for that day. When all four are working, an income of $1000 per
day can be realized. The shop's expenses are $600 per day when opened and
$400 per day when closed. If, on the average, one particular employee misses ten
of 100 days and the remaining three miss five of 100 days each, what is the
expected daily profit for the company? Assume all absences are independent.
8. A factory produces 3 diesel-generator sets per week. At the end of each week, the
sets are tested. If the sets are acceptable, they are shipped to purchasers. The
probability that a set proves to be acceptable is 0.70. The second possibility is
that minor adjustments can be made so that a set will become acceptable for
shipping; this has a probability of 0.20. The third possible outcome is that the set
has to go to the diagnostic shop for major adjustment and be shipped at a later
date; this has a probability of 0.10. Outcomes for different sets are independent
of one another.
(a) Find the probability of each possible number of sets, for one week's
production, which are acceptable without any adjustment.
(b) What is the expected number of sets which are tested and found to be
acceptable without adjustment?
(c) What is the cumulative probability distribution for the number of sets which
are tested and found to be acceptable without adjustment? Sketch the
corresponding graph.
9. [Figure 5.4: Series-Parallel System. Between the input and the output, one branch
contains components A and B in series (probabilities 0.9 and 0.8); the other branch
contains components C and D in series (probabilities 0.7 and 0.6).]
A system consists of two branches in parallel, each branch having two
components. The probabilities of successful operation of components A, B, C, and D
are 0.9, 0.8, 0.7, and 0.6, as shown above. If a component fails, the output from
its branch is zero. If only one branch operates, the output is 50%. Of course, if
both branches operate, the output is 100%.
a) Find the probability of zero output.
b) Find the expected percentage output.
10. For constant rate of input, the rate of output of a system is determined by
whether A, B, and C operate, as shown below.
[Figure 5.5: Parallel Components, then Series. Components A, B, and C between
the input and the output.]
The probabilities that components A, B, and C operate are as follows:
Pr[A] = 0.70, Pr[B] = 0.60, Pr[C] = 0.90.
If all of A, B, and C operate, the system output is 100. If both A and C operate
but not B, or both B and C but not A, the system output is 80. If both A and B
fail, the system output is 0. If C fails, the system output is 0.
a) Find the probability of each possible output.
b) Find the expected output.
(c) More Complex Problems
Now let us look at two more complex examples. To solve them we will need to use
our knowledge of basic probability as well as knowledge of expected values. We will
have to read each problem very carefully. In the great majority of cases, a tree
diagram will be very desirable.
Example 5.3
A manufacturer has two expansion options available to him. The profits of the
expansions depend on the cost of energy. The fair odds are 3:2 in favor of energy
costs being greater than 8¢/kwh. The manufacturer is twice as likely to choose option
1 as option 2, regardless of circumstances.
If the cost of energy is less than 8¢/kwh, then expansion option 1 will yield returns
of +$150,000, $0, and -$50,000 with probabilities of 60%, 20%, and 20%,
respectively. Under those conditions, expansion option 2 will yield returns of +$100,000,
+$20,000, and -$20,000 with probabilities of 70%, 10%, and 20%, respectively.
If the cost of energy is greater than 8¢/kwh, then option 1 will yield returns of
+$100,000, $0, and -$50,000 with probabilities of 60%, 20%, and 20%, respectively,
while option 2 will yield returns of +$80,000, $0, and -$50,000 with probabilities of
70%, 10%, and 20%, respectively.
a) What is the probability that option 2 will be pursued and that energy prices
will exceed 8¢/kwh?
b) What is the manufacturer's expected return from expansion?
c) Given that several years later the expansion yielded a return greater than
zero, what is the probability that option 2 was chosen?
Answer: The first step will be to draw a tree diagram. (See Figure 5.6.)
a) $\Pr[(\text{option 2}) \cap (\text{energy} > 8¢/\text{kwh})]$
$= (\Pr[\text{energy} > 8¢/\text{kwh}]) \times (\Pr[(\text{option 2}) \mid (\text{energy} > 8¢/\text{kwh})])$
$= \left(\frac{3}{5}\right)\left(\frac{1}{3}\right) = \frac{1}{5}$ or 0.200.
b) Expected return $= \sum_{\text{all possibilities}} (\text{return for each possibility})(\Pr[\text{that return}])$
= [(0.16)(150) + (0.05333)(0) + (0.05333)(-50) + (0.09333)(+100) +
+ (0.01333)(+20) + (0.02667)(-20) + (0.24)(+100) + (0.08)(0) +
+ (0.08)(-50) + (0.14)(+80) + (0.02)(0) + (0.04)(-50)] thousand dollars
= 59.6 thousand dollars
= $59,600.

Figure 5.6: Expansion Options (tree diagram)

Energy cost           Option              Return             Probability    Combined
(probability)         (probability)       (thousand $)       of return      probability
< $0.08/kwh (0.4)     Option 1 (0.667)    +150               0.6            0.16
                                          0                  0.2            0.0533
                                          -50                0.2            0.0533
                      Option 2 (0.333)    +100               0.7            0.0933
                                          +20                0.1            0.0133
                                          -20                0.2            0.0267
> $0.08/kwh (0.6)     Option 1 (0.667)    +100               0.6            0.24
                                          0                  0.2            0.08
                                          -50                0.2            0.08
                      Option 2 (0.333)    +80                0.7            0.14
                                          0                  0.1            0.02
                                          -50                0.2            0.04
Check: Sum of probabilities = 1.

c) $$\Pr[\text{option 2} \mid (\text{return} > 0)] = \frac{\Pr[(\text{option 2}) \cap (\text{return} > 0)]}{\Pr[\text{return} > 0]}$$
$$= \frac{\Pr[(\text{option 2}) \cap (\text{return} > 0)]}{\Pr[(\text{option 2}) \cap (\text{return} > 0)] + \Pr[(\text{option 1}) \cap (\text{return} > 0)]}$$
$$= \frac{0.09333 + 0.01333 + 0.14}{(0.09333 + 0.01333 + 0.14) + (0.16 + 0.24)} = \frac{0.2467}{0.2467 + 0.4000}$$
= 0.381 or 38.1%.
(Note that part (c) involves Bayesian probability.)
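A tree diagram of this kind can also be enumerated by program, one leaf per combination of energy cost, option, and return. The following is a minimal sketch in Python for Example 5.3, assuming only the probabilities and returns stated in the problem; the dictionary layout and names are illustrative choices.

```python
from fractions import Fraction

p_energy = {"low": Fraction(2, 5), "high": Fraction(3, 5)}   # odds 3:2 for high
p_option = {1: Fraction(2, 3), 2: Fraction(1, 3)}            # option 1 twice as likely
# (return in thousand dollars, probability in percent) for each energy/option pair
returns = {
    ("low", 1):  [(150, 60), (0, 20), (-50, 20)],
    ("low", 2):  [(100, 70), (20, 10), (-20, 20)],
    ("high", 1): [(100, 60), (0, 20), (-50, 20)],
    ("high", 2): [(80, 70), (0, 10), (-50, 20)],
}

# One entry per leaf of the tree: (option, return, combined probability)
leaves = [
    (opt, ret, p_energy[e] * p_option[opt] * Fraction(pct, 100))
    for (e, opt), branch in returns.items()
    for ret, pct in branch
]

p_a = p_energy["high"] * p_option[2]                          # part (a)
expected_return = sum(ret * p for _, ret, p in leaves)        # part (b)
p_positive = sum(p for _, ret, p in leaves if ret > 0)
p_opt2_and_positive = sum(p for opt, ret, p in leaves if opt == 2 and ret > 0)
p_c = p_opt2_and_positive / p_positive                        # part (c)

print(float(p_a), float(expected_return), round(float(p_c), 3))
```

Keeping the leaves in one list makes the check that the combined probabilities add to 1 a single line, and the conditional probability of part (c) is just a ratio of two filtered sums.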
Example 5.4
A flood forecaster issues a flood warning under two conditions only:
i) Winter snowfall exceeds 20 cm regardless of fall rainfall; or
ii) Fall rainfall exceeds 10 cm and winter snowfall is between 15 and 20 cm.
The probability of winter snowfall exceeding 20 cm is 0.05. The probability of
winter snowfall between 15 and 20 cm is 0.10. The probability of fall rainfall
exceeding 10 cm is 0.10.
a) What is the probability that the forecaster will issue a warning any given
spring?
b) Given that he issues a warning, what is the probability that winter snowfall
was greater than 20 cm?
c) The probability of flooding is 0.75 for condition (i) above, 0.60 for condition
(ii) above, and 0.05 for conditions where no flooding is anticipated. If the
cost of a flood after a warning is $100,000, a flood with no warning is
$1,000,000, no flood after a warning is $200,000, and zero for no warning
and no flood, what is the expected cost in any given year?
Answer: Again, the first step is to draw a tree diagram using the given information.
Figure 5.7: Flood Probabilities (tree diagram)

Winter snowfall: < 15 cm (probability 0.85); between 15 cm and 20 cm (0.10); > 20 cm (0.05).
Fall rainfall, for snowfall between 15 cm and 20 cm: < 10 cm (0.90); > 10 cm (0.10).
No warning: no flood (0.95), flood (0.05).
Flood warning, condition (ii): no flood (0.40), flood (0.60).
Flood warning, condition (i): no flood (0.25), flood (0.75).

Result                    Probability                                   Cost
no warning, no flood      (0.85)(0.95) + (0.1)(0.9)(0.95) = 0.893       $0
no warning, a flood       (0.85)(0.05) + (0.1)(0.9)(0.05) = 0.047       $1,000,000
a warning, no flood       (0.10)(0.10)(0.40) = 0.004                    $200,000
a warning, a flood        (0.10)(0.10)(0.60) = 0.006                    $100,000
a warning, no flood       (0.05)(0.25) = 0.0125                         $200,000
a warning, a flood        (0.05)(0.75) = 0.0375                         $100,000
If Pr[winter snowfall > 20 cm] = 0.05 and Pr[15 cm < winter snowfall < 20 cm] =
0.10, then Pr[winter snowfall < 15 cm] = 1 - 0.05 - 0.10 = 0.85.
If Pr[fall rainfall > 10 cm] = 0.10, then Pr[fall rainfall < 10 cm] = 1 - 0.10 = 0.90.
a) Using the tree diagram, Pr[warning] = 0.05 + (0.10)(0.10) = 0.05 + 0.01 =
0.06.
b) $$\Pr[\text{winter snowfall} > 20\ \text{cm} \mid \text{warning}] = \frac{\Pr[(\text{winter snowfall} > 20\ \text{cm}) \cap \text{warning}]}{\Pr[\text{warning}]}$$
$$= \frac{\Pr[\text{winter snowfall} > 20\ \text{cm}]}{\Pr[\text{warning}]} = \frac{0.05}{0.06} = 0.83$$
(Notice that this calculation used Bayes' Rule.)
c) In order to calculate expected costs, we will need probabilities of each
combination of warning or no warning and flood or no flood. These are
shown as the probabilities in Figure 5.7. We should apply a check on
these calculations: do the probabilities add up to 1?
0.893 + 0.047 + 0.004 + 0.006 + 0.0125 + 0.0375 = 1.000 (check).
Now using equation 5.5, the expected cost in any given year is
($100,000)(0.0375) + ($200,000)(0.0125) + ($100,000)(0.006) +
($200,000)(0.004) + ($1,000,000)(0.047) + ($0)(0.893) = $54,650
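The flood example can be evaluated directly from the branch probabilities given in the problem statement. The following is a minimal sketch in Python; the dictionary keys and variable names are illustrative. With the probability (0.10)(0.10)(0.40) = 0.004 from Figure 5.7 for a warning under condition (ii) followed by no flood, the expected cost evaluates to $54,650.

```python
# Branch probabilities from the problem statement
p_snow = {"<15": 0.85, "15-20": 0.10, ">20": 0.05}
p_rain_high = 0.10                        # fall rainfall exceeds 10 cm
p_flood = {"i": 0.75, "ii": 0.60, "none": 0.05}
cost = {("warning", "flood"): 100_000, ("warning", "no flood"): 200_000,
        ("no warning", "flood"): 1_000_000, ("no warning", "no flood"): 0}

# Probability of each warning condition
p_cond = {
    "i": p_snow[">20"],
    "ii": p_snow["15-20"] * p_rain_high,
    "none": p_snow["<15"] + p_snow["15-20"] * (1 - p_rain_high),
}

p_warning = p_cond["i"] + p_cond["ii"]               # part (a)
p_snow_given_warning = p_cond["i"] / p_warning       # part (b)

expected_cost = 0.0                                  # part (c)
for cond, p in p_cond.items():
    warned = "warning" if cond != "none" else "no warning"
    expected_cost += p * p_flood[cond] * cost[(warned, "flood")]
    expected_cost += p * (1 - p_flood[cond]) * cost[(warned, "no flood")]

print(round(p_warning, 3), round(p_snow_given_warning, 2), round(expected_cost))
```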
Problems
1. Every student in a certain program of studies takes all three of courses A, B, and
C. The average enrollment in the program is 50 students.
Past history shows that on the average:
(1) 5 students in course A receive marks of at least 75%.
(2) 7.5 students in course B receive marks of at least 75%.
(3) 6 students in course C receive marks of at least 75%.
(4) 80% of students who receive marks of at least 75% in course A also do so in
course B.
(5) 50% of students who receive marks of at least 75% in course B also do so in
course C.
(6) 60% of students who receive marks of at least 75% in course C also do so in
course A.
(7) 10 students receive marks of at least 75% in one or more of these classes.
A sponsor gives a scholarship of $500 to anyone who receives a mark of at
least 75% in all three courses. What can the sponsor expect to pay on
average?
2. A box contains a fair coin and a two-headed coin. A coin is selected at random
and tossed. If heads appears, the other coin is tossed; if tails appears, the same
coin is tossed.
a) Find the probability that heads appears on the second toss.
b) Find the expected number of heads from the two tosses.
c) If heads appeared on the first toss, find the probability that it also appeared
on the second toss.
3. A box contains two red and two green balls. A contestant in a game show selects
a ball at random. If the ball is green, he receives no prize for the draw and puts
the ball on one side. If the ball is red, he receives $1000 and puts the ball back in
the box. The game is over when both green balls are drawn or after three draws,
whichever comes first.
a) What is the probability of the contestant receiving no prize at all?
b) What is the expected prize?
c) If the game lasts for three draws, what is the probability that a green ball was
selected on the first draw?
4. The probabilities of the monthly snowfall exceeding 10 cm at a particular
location in the months of December, January and February are 0.20, 0.40 and 0.60,
respectively. For a particular winter:
a) What is the probability of not receiving 10 cm of snowfall in any of the
months of December, January and February in a particular winter?
b) What is the probability of receiving at least 10 cm snowfall in a month, in at
least two of the three months of that winter?
c) Given that the snowfall exceeded 10 cm in each of only two months, what is
the probability that the two months were consecutive?
d) Find the expected number of months in which monthly snowfall does not
exceed 10 cm.
5. The probability that Jim will hit a target on a certain range is 25% for any one
shot, regardless of what happened on the previous shot or shots. He fires four
shots.
a) What is the probability that Jim will hit the target exactly twice?
b) What is the probability that he will hit the target at least once?
c) Find the expected number of hits on the target.
d) If five persons who are equally good marksmen as Jim shoot at five targets,
what is the probability that exactly two targets are hit at least once?
6. Three boxes containing red, white and blue balls are used in an experiment. Box
#1 contains two red, three white and five blue balls; Box #2 contains one red and
three white balls; and Box #3 contains three red, one white and three blue balls.
The experiment consists of drawing a ball at random from Box #1 and placing it
with the other balls in Box #2, then drawing a ball at random from Box #2 and
placing it in Box #3.
a) Draw the probability distribution of the number of red balls in Box #3 at the
end of the experiment.
b) What is the expected number of red balls in Box #3 at the end of the experiment?
c) Given that at the end of the experiment there are three red balls in Box #3,
what is the probability that a white ball was picked from Box #1?
d) After the experiment is completed, a ball is drawn from Box #3. What is the
probability that the ball is white?
7. Two octahedral dice with faces marked 1 through 8 are constructed to be out of
balance so that the 8 is 1.5 times as probable as the 2 through 7, and the sum of
the probabilities of the 1 and the 8 equals that of the other pairs on opposing
faces, i.e. the 2 and 7, the 3 and 6, and the 4 and 5.
a) Find the probability distribution and the mean and variance of the number
that can show up on one roll of the two dice.
b) Find the probabilities of getting between 5 and 9 (inclusive) on at least 3 out
of 10 rolls of the two dice.
c) Find the probability of getting one occurrence of between 2 and 4, five
occurrences of between 5 and 9, and four occurrences of between 9 and 16,
in 10 rolls of the two dice.
All ranges of numbers are inclusive.
8. A panel of people is assembled to test the ability to correctly distinguish an
improved product from an older product. The panelists are chosen from a
population consisting of 20% rural and 80% urban people. Two-thirds of the
population are younger than 30 years of age, while one-third are older. The
probability that the urban panelists under 30 years of age will correctly identify
the improved product is 12%, while for older urban panelists, the probability
increases to 45%. Regardless of age, rural panelists are twice as likely as urban
panelists to correctly identify the improved product.
a) What is the probability that any one panelist chosen at random from this
population will correctly identify the improved product?
b) For a panel of 10 persons, what is the expected number of panelists who will
correctly identify the improved product?
c) If a panelist has correctly identified the improved product, what is the
probability that the panelist is under 30 years of age?
d) If a panelist is under 30 years of age, what is the probability that the panelist
will correctly identify the improved product?
9. Certain devices are received at an assembly plant in batches of 50. The sampling
scheme used to test all batches has been set up in the following way. One of the
50 devices is chosen randomly and tested. If it is defective, all the remaining 49
items in that batch are returned to the supplier for individual testing; if the tested
device is not defective, another device is chosen randomly and tested. If the
second item is not defective, the complete batch is accepted without any more
testing; if the second device is defective, a third device is chosen randomly and
tested. If the third device is not defective, the complete batch is accepted without
any more testing, but the one defective device is replaced by the supplier. If the
third device is defective, all remaining 47 items in that batch are returned to the
supplier for individual testing.
The receiver pays for all initial single-item tests. However, whenever the
remaining devices in a batch are returned to the supplier for individual tests, the
costs of this extra testing are paid by the supplier. If a batch is returned to the
supplier, the superintendent must ensure that the receiver is sent 50 items which
have been tested and shown to be good. Assume that the superintendent accepts
the results of the receiver's tests. Each device is worth $60.00 and the cost of
testing is $10.00 per device.
Consider a batch which contains 12 defective items and 38 good items.
a) What is the probability that the batch will be accepted?
b) What is the expected cost to the supplier of the testing and of replacing
defectives?
c) Of the 12 defective items in the batch, find the expected number which will
be accepted.
10. An oil refinery has a problem with air pollution. In any one year the probability
of escape of SO₂ is 23%, and probability of escape of a sticky oil is 16%. Escape
of SO₂ and escape of the oil will not occur at the same time. If the wind direction
is right, the SO₂ or oil will blow away from the city and no damage will result.
The probability of this is 55%. Otherwise, an escape of SO₂ will result in damage
claims of $80,000, an escape of oil will result in damage claims of $45,000, and
there will be possibility of a fine. If the pollutant is SO₂, under these conditions
there is 90% probability of a fine, which will be $150,000. If the pollutant is oil,
the probability of a fine depends on whether the oil affects a prominent
politician's house or not. If oil causes damage, the probability it will affect his
house is 5%. If it affects his house, the probability of a fine is 96%. If it does not
affect his house, the probability of a fine is 65%. If there is a fine for pollution by
oil, it is $175,000. Answer the following questions for the next year.
a) What is the probability there will be damage claims for escape of SO₂?
b) What is the probability there will be damage claims for escape of oil?
c) What is the probability of a $150,000 fine?
d) What is the expected cost for damages and fines?
11. A mining company is planning strategy with respect to its operations. It has the
option of developing 3 properties, but only in a given sequence of A, B, and C.
The probability of A being successful and yielding a net profit of $1.5 million is
0.7, and the probability of its failing and causing a loss of $0.5 million is 0.3. If
A is successful, B has 0.6 probability of being successful and producing a gain of
$1.2 million, and 0.4 probability of being a failure and causing a loss of $0.75
million. If A is a failure, B has 0.4 probability of being a success with a gain of
$1 million, and 0.6 probability of being a failure with a loss of $1.8 million. If
both A and B are failures, then the company will not proceed with C. If both A
and B are successes, C will be a success with probability of 0.9 and a gain of
$2.5 million, or a failure with probability of 0.1 and a loss of $1.5 million. If
either A or B is a failure (but not both) then C is attempted. In that case, the
probability of success of C would be 0.3 but a gain of $5 million would result;
failure of C, probability 0.7, would result in a loss of $0.8 million. The company
decides to proceed with this strategy.
a) What is the expected gain or loss?
b) Given that A is a failure, what is the expected total gain from projects B and C?
c) Given that there is a net loss for all three (or two) projects taken together,
what is the probability that B was a failure?
5.3 Binomial Distribution
This important distribution applies in some cases to repeated trials where there are
only two possible outcomes: heads or tails, success or failure, defective item or good
item, or many other possible pairs. The probability of each outcome can be calculated
using the multiplication rule, perhaps with a tree diagram, but it is usually
much faster and more convenient to use a general formula.
The requirements for using the binomial distribution are as follows:
• The outcome is determined completely by chance.
• There are only two possible outcomes.
• All trials have the same probability for a particular outcome in a single trial.
  That is, the probability in a subsequent trial is independent of the outcome of
  a previous trial. Let this constant probability for a single trial be p.
• The number of trials, n, must be fixed, regardless of the outcome of each trial.
(a) Illustration of the Binomial Distribution
All items from a production line are tested as they are produced. Each item is classified
as either defective (D) or good (G). There are no other possible outcomes. Pr[D]
= 0.100, Pr[G] = 1 − Pr[D] = 0.900. Let us consider all the possible results for a
sample consisting of three items, calculating their probabilities from basic principles
using the multiplication rule of section 2.2.2.

Outcome    Probability of that Outcome
GGG        \((0.900)^3 = 0.729\)
DGG        \((0.100)(0.900)^2 = 0.081\)
GDG        \((0.900)(0.100)(0.900) = 0.081\)
GGD        \((0.900)^2(0.100) = 0.081\)
DDG        \((0.100)^2(0.900) = 0.009\)
GDD        \((0.900)(0.100)^2 = 0.009\)
DGD        \((0.100)(0.900)(0.100) = 0.009\)
DDD        \((0.100)^3 = 0.001\)
Total      = 1.000 (Check)
Notice that the outcome containing three good items appeared once, and so did
the outcome containing three defective items. The outcome containing two good
items and one defective appeared three times, which is the number of permutations of
two items of one class and one item of another class. The outcome containing one
good item and two defectives also appeared three times (as DDG, GDD, and DGD);
again, this is the number of permutations of one item of one class and two items of
another class.
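The enumeration above can be reproduced with a short script. This is only a sketch: the function names `outcome_probabilities` and `group_by_defectives` are illustrative choices, not part of the text, and the single-item probabilities are the ones used in the illustration.

```python
from itertools import product

# Single-item probabilities from the illustration: Pr[D] = 0.100, Pr[G] = 0.900.
P = {"D": 0.100, "G": 0.900}

def outcome_probabilities(n_items=3):
    """Map each ordered outcome, such as 'GDG', to its probability."""
    probs = {}
    for outcome in product("GD", repeat=n_items):
        prob = 1.0
        for item in outcome:
            prob *= P[item]  # multiplication rule for independent trials
        probs["".join(outcome)] = prob
    return probs

def group_by_defectives(probs):
    """For each number of defectives, count the orderings and total their probability."""
    grouped = {}
    for outcome, prob in probs.items():
        d = outcome.count("D")
        count, total = grouped.get(d, (0, 0.0))
        grouped[d] = (count + 1, total + prob)
    return grouped

probs = outcome_probabilities()
grouped = group_by_defectives(probs)
for d in sorted(grouped):
    count, total = grouped[d]
    print(f"{d} defective: {count} ordering(s), total probability {total:.3f}")
```

Grouping by the number of defectives makes the counts of one, three, three, and one orderings visible directly, which is the pattern the next part generalizes.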
(b) Generalization of Results
Now we'll develop more general results. Let the probability that an item is defective
be p. Let the probability that an item is good be q, such that q = 1 − p. Notice that the
definitions of p and q can be interchanged, and other terms such as "success" and
"failure" can be used instead (and often are). Let the fixed number of trials be n. The
probability that all n items are defective is \(p^{n}\). The probability that exactly r items are
defective and (n − r) items are good, in any one sequence, is \(p^{r} q^{(n-r)}\). But r defective
items and (n − r) good items can be arranged in various ways. How many different
orders are possible? This is the number of permutations into two classes, consisting
of r defective items and (n − r) good items, respectively. From section 2.2.3 this
number of permutations is given by \(\frac{n!}{r!\,(n-r)!}\). But this is exactly the expression for
the number of combinations of n items taken r at a time, \({}_{n}C_{r}\). Then the general
expression for the probability of exactly r defective items (or successes, heads, etc.)
in any order in n trials must be \(p^{r} q^{(n-r)}\) multiplied by \({}_{n}C_{r}\), or

\[ \Pr[R = r] = {}_{n}C_{r}\; p^{r} q^{(n-r)} \qquad (5.9) \]

The left-hand side of this equation should be read as the probability that exactly r
items are defective (or successes, heads, etc.).
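Equation 5.9 translates almost directly into code. The following is a minimal sketch, assuming Python 3.8 or later for `math.comb`; the function name `binomial_pmf` is an illustrative choice rather than anything defined in the text.

```python
from math import comb

def binomial_pmf(r, n, p):
    """Pr[R = r] for n independent trials, each with success probability p (equation 5.9)."""
    if not 0 <= r <= n:
        return 0.0  # impossible counts have zero probability
    q = 1.0 - p
    return comb(n, r) * p**r * q**(n - r)

# Three items with Pr[D] = 0.100, as in the illustration of part (a).
three_item_probs = [binomial_pmf(r, 3, 0.100) for r in range(4)]
for r, prob in enumerate(three_item_probs):
    print(f"Pr[{r} defective] = {prob:.3f}")
```

With n = 3 and p = 0.100 the four values match the grouped totals of the table in part (a): 0.729, 0.243 (three outcomes of 0.081), 0.027 (three outcomes of 0.009), and 0.001.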
The name given to this discrete probability distribution is the binomial distribution.
This name arises because the expression for probability in equation 5.9 is the
same as the (r+1)th term in the binomial expansion of \((q + p)^{n}\).
Tables of cumulative binomial probabilities are found in many reference books.
Individual binomial probabilities, like those given in equation 5.9, are found from
cumulative binomial probabilities by subtraction using equation 5.2. Both individual
and cumulative probabilities can be calculated also using computer software such as
Excel. That will be discussed briefly in section 5.3(f).
(c) Application of the Binomial Distribution
The binomial distribution is often used in quality control of items manufactured by a
production line when each item is classified as either defective or nondefective. To
meet the requirements of the binomial distribution the probability that an item is
defective must be constant. This condition is not met by sampling without replacement
from a small batch because, as we have seen from Example 2.7, in that case the
probability that the second item drawn will be defective depends on whether the first
item drawn was defective or not, and so on. The condition of constant probability is
met to an acceptable approximation if the total number of trials is much less than the
batch size, that is, for a sufficiently small sample from a large enough batch. Then the
probability of a defect (or success, etc.) on a single trial will be approximately constant.
The condition is met for sampling item by item from continuous production
under constant conditions. It is also met for sampling from a small batch if each item
which is removed as a specimen is returned to the batch and mixed thoroughly with
the other items, once it has been examined and classified as defective or good. This,
however, is not often a practical procedure: if we know that an item is defective, we
should not mix it with other items of production. Indeed, sometimes we can't,
because the test procedure may destroy the sample.
Example 5.5
On the basis of past experience, the probability that a certain electrical component
will be satisfactory is 0.98. The components are sampled item by item from continuous
production. In a sample of five components, what are the probabilities of finding
(a) zero, (b) exactly one, (c) exactly two, (d) two or more defectives?
Answer: The requirements of the binomial distribution are met.
n = 5, p = 0.98, q = 0.02, where p is taken to be the probability that an item will be
satisfactory, and so q is the probability that an item will be defective.
(a) Pr[0 defectives] = \((0.98)^{5}\) = 0.9039 or 0.904.
(b) Pr[1 defective] = \({}_{5}C_{1}\,(0.98)^{4}(0.02)^{1} = (5)(0.98)^{4}(0.02)^{1}\) = 0.0922 or 0.092.
(c) Pr[2 defectives] = \({}_{5}C_{2}\,(0.98)^{3}(0.02)^{2} = \frac{(5)(4)}{2}(0.98)^{3}(0.02)^{2}\) = 0.0038.
(d) Pr[2 or more defectives] = 1 − Pr[0 def.] − Pr[1 def.]
    = 1 − 0.9039 − 0.0922
    = 0.0038 (carrying unrounded intermediate values).
Example 5.6
A company is considering drilling four oil wells. The probability of success for each
well is 0.40, independent of the results for any other well. The cost of each well is
$200,000. Each well that is successful will be worth $600,000.
a) What is the probability that one or more wells will be successful?
b) What is the expected number of successes?
c) What is the expected gain?
d) What will be the gain if only one well is successful?
e) Considering all possible results, what is the probability of a loss rather than a gain?
f) What is the standard deviation of the number of successes?
Answer: The binomial distribution applies. Let us start by calculating the probability
of each possible result. We use n = 4, p = 0.40, q = 0.60.

No. of Successes    Probability
0                   \((1)(0.40)^{0}(0.60)^{4} = 0.1296\)
1                   \((4)(0.40)^{1}(0.60)^{3} = 0.3456\)
2                   \(\frac{(4)(3)}{2}(0.40)^{2}(0.60)^{2} = 0.3456\)
3                   \((4)(0.40)^{3}(0.60)^{1} = 0.1536\)
4                   \((1)(0.40)^{4}(0.60)^{0} = 0.0256\)
Total               = 1.000 (check)

(Notice that \({}_{n}C_{r} = {}_{n}C_{(n-r)}\).)
Now we can answer the specific questions.
a) Pr[one or more successful wells] = 1 − Pr[no successful wells]
   = 1 − 0.1296
   = 0.8704 or 0.870.
b) Expected number of successes = (1)(0.3456) + (2)(0.3456) + (3)(0.1536) + (4)(0.0256)
   = 1.600.
c) Expected gain = (1.6)($600,000) − (4)($200,000) = $160,000.
d) If only one well is successful, gain = (1)($600,000) − (4)($200,000)
   = −$200,000 (so a loss).
e) There will be a loss if 0 or 1 well is successful, so the probability of a loss is
   (0.1296 + 0.3456) = 0.4752 or 0.475.
f) Using equation 4.3, \(\sigma_x^{2} = E(X^{2}) - \mu_x^{2}\),
   where \(E(X^{2}) = (0.3456)(1)^{2} + (0.3456)(2)^{2} + (0.1536)(3)^{2} + (0.0256)(4)^{2} = 3.5200\),
   so \(\sigma_x^{2} = 3.5200 - (1.600)^{2} = 0.9600\).
   The standard deviation of the number of successes is \(\sqrt{0.9600} = 0.980\).
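The arithmetic of Example 5.6 can be checked with a few lines of code. This is a sketch under the example's stated values (n = 4, p = 0.40, $200,000 per well, $600,000 per success); the variable and function names are illustrative only.

```python
from math import comb, sqrt

def binomial_pmf(r, n, p):
    """Pr[R = r] from equation 5.9."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 4, 0.40
cost_per_well = 200_000
value_per_success = 600_000

pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]

# Net gain for each possible number of successful wells.
gain = [r * value_per_success - n * cost_per_well for r in range(n + 1)]

prob_at_least_one = 1 - pmf[0]                                   # part a
mean = sum(r * pr for r, pr in enumerate(pmf))                   # part b
expected_gain = sum(g * pr for g, pr in zip(gain, pmf))          # part c
gain_one_success = gain[1]                                       # part d
prob_loss = sum(pr for g, pr in zip(gain, pmf) if g < 0)         # part e
std_dev = sqrt(sum(r**2 * pr for r, pr in enumerate(pmf)) - mean**2)  # part f

print(f"a) {prob_at_least_one:.4f}  b) {mean:.3f}  c) {expected_gain:,.0f}")
print(f"d) {gain_one_success:,}  e) {prob_loss:.4f}  f) {std_dev:.3f}")
```

Computing the expected gain from the full distribution, rather than from the expected number of successes, gives the same result here because the gain is a linear function of the number of successes.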
(d) Shape of the Binomial Distribution
[Figure 5.8: Effect of Varying Probability of Success in a Single Trial
when the Number of Trials is 5. Panels: (a) p = 0.05, (b) p = 0.5, (c) p = 0.95;
each panel plots Pr[R = r] against r for r = 0 to 5.]
Figure 5.8 compares the shapes of the distributions for p equal to 0.05, 0.50, and
0.95, all for n equal to 5. When p is close to zero or one, the distribution is very
skewed, and the distribution for p equal to \(p_{1}\) is the mirror image of the distribution
for p equal to \((1 - p_{1})\). When p is equal to 0.500, the distribution is symmetrical.
[Figure 5.9: Effect of Varying Number of Trials when the Probability of Success Is 0.35.
Panels: (a) n = 10, (b) n = 20; each panel plots Pr[R = r] against r.]
Figure 5.9 compares the shape of the distributions for n equal to 10 and 20, both
for p equal to 0.35. At this intermediate value of p, the distribution is rather skewed
for small numbers of trials, but it becomes more symmetrical and bell-shaped as n
increases.
(e) Expected Mean and Standard Deviation
For any discrete random variable, equation 5.5 gives that the expected mean is
\(\mu = E(R) = \sum (r)\,\Pr[R = r]\), or the sum of (number of successes)(probability of
that number of successes) for all possible results.
For the binomial distribution, from equation 5.9 the probability of r successes
in n trials is given by

\[ \Pr[R = r] = {}_{n}C_{r}\,(1-p)^{n-r}\, p^{r} \]

Then

\[ \mu = \sum_{r=0}^{n} (r)\,\Pr[R = r] = \sum_{r=0}^{n} (r)\left({}_{n}C_{r}\right)(1-p)^{(n-r)}\, p^{r} \]

If the algebra is followed through, the result is

\[ \mu = np \qquad (5.10) \]

Thus, the mean value of the binomial distribution is the product of the number of
trials and the probability of success in a single trial. This seems to be intuitively
correct.
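The text leaves the algebra for the mean to the reader. One common route, sketched here rather than taken from the text, uses the identity \(r\,{}_{n}C_{r} = n\,{}_{n-1}C_{r-1}\) and the substitution s = r − 1 (the index s is introduced only for this sketch), with q = 1 − p:

\[
\begin{aligned}
\mu &= \sum_{r=0}^{n} r\,{}_{n}C_{r}\, p^{r} q^{(n-r)}
     = \sum_{r=1}^{n} n\,{}_{n-1}C_{r-1}\, p^{r} q^{(n-r)} \\
    &= np \sum_{s=0}^{n-1} {}_{n-1}C_{s}\, p^{s} q^{(n-1-s)}
     = np\,(q + p)^{n-1} = np
\end{aligned}
\]

The r = 0 term drops out because it is multiplied by zero, and the last sum is the full binomial expansion of \((q + p)^{n-1}\), which equals 1.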
From equation 5.6, for any discrete probability distribution,

\[ \sigma^{2} = E(r^{2}) - \mu^{2} = \sum_{r=0}^{n} (r^{2})\,\Pr[R = r] - \mu^{2} \]

Substituting for the probability for the binomial distribution and following through
the algebra gives

\[ \sigma^{2} = np(1-p) \]

or

\[ \sigma^{2} = npq \qquad (5.11) \]

The standard deviation is always given by the square root of the corresponding
variance, so the standard deviation for the binomial distribution is

\[ \sigma = \sqrt{npq} \qquad (5.12) \]
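A numerical check can stand in for the algebra. The sketch below sums over the whole distribution and compares the results with np and npq; the three (n, p) pairs are values that appear elsewhere in this section, and the function names are illustrative.

```python
from math import comb

def binomial_pmf(r, n, p):
    """Pr[R = r] from equation 5.9."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def moments_by_summation(n, p):
    """Mean and variance found by summing over all possible r (no shortcut formulas)."""
    pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]
    mean = sum(r * pr for r, pr in enumerate(pmf))
    variance = sum(r**2 * pr for r, pr in enumerate(pmf)) - mean**2
    return mean, variance

cases = [(4, 0.40), (12, 0.20), (20, 0.35)]
results = {}
for n, p in cases:
    mean, variance = moments_by_summation(n, p)
    results[(n, p)] = (mean, variance)
    print(f"n={n}, p={p}: mean {mean:.4f} vs np {n * p:.4f}; "
          f"variance {variance:.4f} vs npq {n * p * (1 - p):.4f}")
```

Agreement for a few cases does not prove equations 5.10 and 5.11, but it is a quick way to catch an error in either the formulas or the summation.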
Example 5.7
Calculate the expected number of successes and the standard deviation of the number
of successes for Example 5.6 and compare with the results of parts b and f of that
example.
Answer: Binomial distribution with n = 4, p = 0.4, q = 0.6.
Then the expected number of successes from equation 5.10 is np = (4)(0.400) =
1.60. This agrees with the result of part b of Example 5.6.
The standard deviation of the number of successes from equation 5.12 is
\(\sqrt{(4)(0.400)(0.600)} = \sqrt{0.960} = 0.980\). This agrees with the result of part f of
Example 5.6.
Example 5.8
Twelve doughnuts sampled from a manufacturing process are weighed each day. The
probability that a sample will have no doughnuts weighing less than the design
weight is 6.872%.
a) What is the probability that a sample of twelve doughnuts contains exactly
three doughnuts weighing less than the design weight?
b) What is the probability that the sample contains more than three doughnuts
weighing less than the design weight?
c) In a sample of twelve doughnuts, what is the expected number of doughnuts
weighing less than the design weight?
Answer: In 12 doughnuts Pr[0 doughnuts < design weight] = 0.06872.
Assuming that Pr[a single doughnut < design weight] is the same for all doughnuts
and that weights of doughnuts vary randomly, the binomial distribution will apply.
Let this probability that a single doughnut will weigh less than the design weight be p.
Then \((1-p)^{12} = 0.06872\).
Taking the twelfth root, \(1 - p = (0.06872)^{1/12} = 0.8000\).
Then Pr[a doughnut < design weight] = 1 − 0.8000 = 0.2000. Then p = 0.2, and
n = 12.
a) Pr[exactly 3 doughnuts in 12 are below design weight] = \({}_{12}C_{3}\,(1-p)^{9}\, p^{3}\)
   \(= \frac{(12)(11)(10)}{(3)(2)}(0.8000)^{9}(0.2000)^{3}\)
   = 0.2362 or 23.6%.
b) Number less than design weight    Probability
   0                                 \((0.8)^{12} = 0.0687\)
   1                                 \({}_{12}C_{1}\,(0.8)^{11}(0.2)^{1} = 0.2062\)
   2                                 \({}_{12}C_{2}\,(0.8)^{10}(0.2)^{2} = 0.2835\)
   3                                 \({}_{12}C_{3}\,(0.8)^{9}(0.2)^{3} = 0.2362\)
   Sum                               0.7946
   Therefore, Pr[more than three doughnuts are below design weight]
   = 1 − (Pr[R = 0] + Pr[R = 1] + Pr[R = 2] + Pr[R = 3])
   = 1 − 0.7946
   = 0.2054 or 0.205 = 20.5%.
c) Expected number of doughnuts below the design weight is (n)(p) =
   (12)(0.200) = 2.4.
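Example 5.8 works backward from a known Pr[R = 0] to the single-trial probability. A sketch of the same steps in code, using only the values stated in the example (the variable names are illustrative):

```python
from math import comb

n = 12
prob_none_below = 0.06872  # Pr[no doughnut in the sample is below design weight]

# Pr[R = 0] = (1 - p)^n, so the single-doughnut probability follows from the nth root.
p = 1 - prob_none_below ** (1 / n)

def pmf(r):
    """Pr[exactly r doughnuts below design weight], equation 5.9."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

exactly_three = pmf(3)                                # part a
more_than_three = 1 - sum(pmf(r) for r in range(4))   # part b: complement of r = 0..3
expected_below = n * p                                # part c: equation 5.10

print(f"p = {p:.4f}")
print(f"a) {exactly_three:.4f}  b) {more_than_three:.4f}  c) {expected_below:.2f}")
```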
(f) Use of Computers
If a computer with suitable software is available, calculations for the binomial
distribution can be done easily. If Excel is available, the function BINOMDIST will
be found to be very useful. There is not usually a great advantage to use of a computer
if only individual terms of the distribution are required, as equation 5.9 is
convenient for that purpose. But if cumulative expressions are required, such as the
probability of six or fewer occurrences, the computer can greatly reduce the amount
of labor required.
The parameters required by the Excel function BINOMDIST are r, n, p, and an
indication of whether a cumulative expression or an individual term is required. As in
the earlier part of this section, r is the number of successes in a total of n trials, and
p is the probability of success in each trial. The fourth parameter should be entered
as TRUE if the cumulative distribution function is required, giving the probability of
at most r successes; the fourth parameter should be entered as FALSE if the
required quantity is the individual probability, the probability of exactly r successes.
For example, if we want the probability of six or fewer successes in a total
of 12 trials when the probability of success in a single trial is 0.245, the parameters
for Excel in the function BINOMDIST are 6, 12, 0.245, TRUE. The function returns
the corresponding probability, which is 0.9873.
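For readers without Excel, the same calculation can be sketched in Python. The function below is a hypothetical stand-in, named `binom_dist` here for convenience; it mirrors the four parameters described above but is not the Excel implementation.

```python
from math import comb

def binom_dist(r, n, p, cumulative):
    """Hypothetical analogue of BINOMDIST(r, n, p, cumulative).

    cumulative=True  -> probability of at most r successes
    cumulative=False -> probability of exactly r successes
    """
    def term(k):
        return comb(n, k) * p**k * (1 - p)**(n - k)

    if cumulative:
        return sum(term(k) for k in range(r + 1))
    return term(r)

at_most_six = binom_dist(6, 12, 0.245, True)
exactly_six = binom_dist(6, 12, 0.245, False)
print(f"Pr[R <= 6] = {at_most_six:.4f}")
print(f"Pr[R = 6]  = {exactly_six:.4f}")
```

The cumulative value should agree with the 0.9873 quoted in the text for the parameters 6, 12, 0.245, TRUE.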
(g) Relation of Proportion to the Binomial Distribution
Assuming that the only alternative to a rejected item is an accepted item, the sample
size is fixed and independent of the results, and the probability of rejection is constant
and independent of other factors such as previous results, we have seen that the
number of rejects in a sample of size n is governed by the binomial distribution. If
the probability that an item will be rejected is p, the probability that there will be
exactly x rejects in the sample is \({}_{n}C_{x}\, p^{x}(1-p)^{(n-x)}\). The mean number of rejects will
be np, and the variance of the number of rejects will be np(1 − p).
We can look at the sample from a somewhat different viewpoint, focusing on the
proportion of rejects rather than their number. The ratio \(\frac{x}{n}\) is an unbiased estimate
of p, the proportion of rejects in the population, and we use the symbol \(\hat{p}\) for this
estimate. The probability that the estimate of proportion from the sample will be
\(\hat{p} = \frac{x}{n}\) is the same as the probability that there will be exactly x rejected items in a
sample of size n, and that is \({}_{n}C_{x}\, p^{x}(1-p)^{(n-x)}\). If we associate the number 1 with each
rejected item and the number 0 with each item which is not rejected, then x, the
number of rejected items, can be interpreted as the sum of the zeros and ones for a
sample of size n. Then \(\hat{p} = \frac{x}{n}\) is a sample mean. Since n is a constant, in the whole
population the mean proportion rejected is

\[ \mu_{\hat{p}} = E(\hat{P}) = E\!\left(\frac{X}{n}\right) = \frac{np}{n} = p \qquad (5.13) \]

This seems reasonable.
Similarly, using the relations for variance of a variable multiplied or divided by a
constant that will be discussed in section 8.2, we find that the variance of the proportion
rejected is

\[ \sigma_{\hat{p}}^{2} = \sigma_{X/n}^{2} = \frac{\sigma_{X}^{2}}{n^{2}} = \frac{np(1-p)}{n^{2}} = \frac{p(1-p)}{n} \qquad (5.14) \]
Example 5.9
The true proportion of defective items in a continuous stream is 0.0100. A random
sample of size 400 is taken.
(a) Calculate the probabilities that the sample will give sample estimates of the
proportion defective of 0/400, 1/400, 2/400, 3/400, 4/400, and 5/400,
respectively.
(b) Calculate the standard deviation of the proportion defective.
Answer:
(a) p = 0.01, n = 400
\(\Pr[\hat{p} = 0] = \Pr[0\ \text{defective items}] = {}_{400}C_{0}\,(0.01)^{0}(0.99)^{400} = (1)(1)(0.01795) = 0.0180\)
\(\Pr[\hat{p} = \tfrac{1}{400} = 0.00250] = {}_{400}C_{1}\,(0.01)^{1}(0.99)^{399} = (400)(0.01)(0.01813) = 0.0725\)
\(\Pr[\hat{p} = \tfrac{2}{400} = 0.00500] = {}_{400}C_{2}\,(0.01)^{2}(0.99)^{398} = \frac{(400)(399)}{2}(0.01)^{2}(0.99)^{398} = 0.1462\)
\(\Pr[\hat{p} = \tfrac{3}{400} = 0.00750] = {}_{400}C_{3}\,(0.01)^{3}(0.99)^{397} = \frac{(400)(399)(398)}{(3)(2)}(0.01)^{3}(0.99)^{397} = 0.1959\)
\(\Pr[\hat{p} = \tfrac{4}{400} = 0.01000] = {}_{400}C_{4}\,(0.01)^{4}(0.99)^{396} = \frac{(400)(399)(398)(397)}{(4)(3)(2)}(0.01)^{4}(0.99)^{396} = 0.1964\)
\(\Pr[\hat{p} = \tfrac{5}{400} = 0.01250] = {}_{400}C_{5}\,(0.01)^{5}(0.99)^{395} = \frac{(400)(399)(398)(397)(396)}{(5)(4)(3)(2)}(0.01)^{5}(0.99)^{395} = 0.1571\)
Thus, the probability that the sample will give an estimate of the
proportion defective that agrees exactly with the true proportion (0.01) is less
than 20%, and the probability of getting any one of the three estimates, 0.0075
or 0.01 or 0.0125, is less than 55%.
Calculations of probabilities of sample estimates can be continued.
The results are shown in Figure 5.10.
[Figure 5.10: Probabilities of Estimates When True Proportion Is 0.0100.
Probability is plotted against the estimated proportion defective, from 0 to 0.035.]
We see that there can be a wide range of estimates from a sample, even when the
sample size is as large as 400.
(b) The standard deviation of the proportion defective is given, according to
equation 5.14, by

\[ \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(0.01)(0.99)}{400}} = 0.004975 \]

The standard deviation is nearly half of the true proportion defective. Again, this
indicates that an estimate from a sample of this size will not be very reliable.
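The probabilities in Example 5.9 involve large combinations and small powers, so they are convenient to check by computer. A minimal sketch with the example's values (the names `estimate_probs` and `sd_proportion` are illustrative):

```python
from math import comb, sqrt

n, p = 400, 0.01

# Probability that the sample contains exactly x defectives,
# i.e. that the sample estimate of the proportion is x / n.
estimate_probs = {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(6)}

for x, prob in estimate_probs.items():
    print(f"estimate {x}/{n} = {x / n:.5f}: probability {prob:.4f}")

# Standard deviation of the proportion defective, equation 5.14.
sd_proportion = sqrt(p * (1 - p) / n)
print(f"standard deviation of proportion = {sd_proportion:.6f}")

# Probability of one of the three estimates nearest the true proportion.
near_true = estimate_probs[3] + estimate_probs[4] + estimate_probs[5]
print(f"Pr[estimate is 0.0075, 0.01 or 0.0125] = {near_true:.4f}")
```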
(h) Nested Binomial Distributions
These are situations in which one binomial distribution is enclosed within another
binomial distribution.
Example 5.10
A boiler containing eight welds is manufactured in a small shop. When the boiler is
completed, each weld is checked by an inspector. If more than one weld is defective
on a single boiler, the person who made that boiler is reported to the foreman.
a) If 9.0% of all welds made by Joe Smith are defective, what percentage of all
boilers made by him will have more than one defective weld?
b) Over a long period of time how many times will Joe Smith be reported to the
foreman for each 15 boilers he makes?
c) If Joe makes 15 boilers in a shift, what is the probability that he will be
reported for more than two of these 15 boilers?
Answer: a) The probabilities of various numbers of defective welds on a single
boiler are given by the binomial distribution with n = 8, p = 0.090, q = 1 − 0.090 =
0.910.
The probability of exactly r defective welds on a boiler is given by
\(\Pr[R = r] = {}_{8}C_{r}\,(0.910)^{(8-r)}(0.090)^{r}\).
More than one defective weld corresponds to all results except zero defective welds
and one defective weld.
\(\Pr[R = 0] = (1)(0.910)^{8}(0.090)^{0} = 0.4703\)
\(\Pr[R = 1] = (8)(0.910)^{7}(0.090)^{1} = 0.3721\)
(Four figures are being carried in intermediate results, and final answers will be
shown to three figures.)
Pr[more than one defective weld in a single boiler] = 1 − 0.4703 − 0.3721 = 0.1577.
Then 15.8% of boilers made by Joe will have more than one defective weld.
b) Now the problem shifts to the outer binomial problem for the number of times
Joe will be reported to the foreman for each 15 boilers he makes. Then n = 15,
p = Pr[being reported for 1 boiler] = 0.1577, and q = 1 − p = 0.8423. (Notice that the
value of p, the probability of too many defects in a single boiler in the outer binomial
distribution, is given by the result of calculations for the inner binomial distribution.)
Under these conditions the expected number of times Joe will be reported to the
foreman is μ = np = (15)(0.1577) = 2.37.
c) As in part b, this corresponds to a binomial problem with n = 15, p = 0.1577,
q = 0.8423.
In general, \(\Pr[R = r] = {}_{15}C_{r}\,(0.8423)^{(15-r)}(0.1577)^{r}\)
Then specifically, \(\Pr[R = 0] = (1)(0.8423)^{15}(0.1577)^{0} = 0.0762\)
\(\Pr[R = 1] = (15)(0.8423)^{14}(0.1577)^{1} = 0.2141\)
\(\Pr[R = 2] = \frac{(15)(14)}{2}(0.8423)^{13}(0.1577)^{2} = 0.2805\)
The probability that Joe will be reported to the foreman for more than two of the 15
boilers he makes in a shift is 1 − 0.0762 − 0.2141 − 0.2805 = 0.429 or 42.9%.
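The nesting in Example 5.10 maps naturally onto two calls to the same function, with the result of the inner calculation passed to the outer one as its single-trial probability. A sketch using the example's numbers (function and variable names are illustrative):

```python
from math import comb

def binomial_pmf(r, n, p):
    """Pr[R = r] from equation 5.9."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def prob_more_than(k, n, p):
    """Pr[R > k], found as the complement of Pr[R <= k]."""
    return 1 - sum(binomial_pmf(r, n, p) for r in range(k + 1))

# Inner distribution: defective welds on one boiler (8 welds, 9.0% defective).
p_reported = prob_more_than(1, 8, 0.090)

# Outer distribution: boilers out of 15 for which Joe is reported.
expected_reports = 15 * p_reported
p_more_than_two = prob_more_than(2, 15, p_reported)

print(f"a) Pr[boiler has more than one defective weld] = {p_reported:.4f}")
print(f"b) expected reports per 15 boilers = {expected_reports:.2f}")
print(f"c) Pr[reported for more than two of 15] = {p_more_than_two:.3f}")
```

Because the code carries full precision from the inner result into the outer calculation, the last digit can differ slightly from hand calculations that round to four figures at each step.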
(i) Extension: Multinomial Distribution
The multinomial distribution is similar to the binomial distribution except that there
are more than two possible results from each trial. The details of the multinomial
distribution are given in various references, including the book by Walpole and
Myers (see the List of Selected References in section 15.2). For example, mechanical
components coming off a production line might be classified on the basis of a
particular dimension as undersize, acceptable, or oversize (three possible outcomes).
If the outcome of any one trial is determined completely by chance, all trials are
independent and have the same set of probabilities for the various possible outcomes,
and the number of trials is fixed, the multinomial distribution would apply.
Notice that if we consider separately just one result and lump together all other
results from each trial, the multinomial distribution becomes a binomial distribution.
Thus, in the example of mechanical components just cited, if undersized and oversized
are lumped together as unacceptable, the distribution becomes binomial.
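The lumping argument can be illustrated numerically. The text does not state the multinomial formula, so the sketch below assumes its standard form, n!/(x₁! x₂! … x_k!) multiplied by p₁^x₁ p₂^x₂ … p_k^x_k. The class probabilities (0.05 undersize, 0.90 acceptable, 0.05 oversize) and the sample size of 10 are hypothetical values chosen only for illustration.

```python
from math import comb, factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of the given counts per class (standard multinomial form, assumed)."""
    n = sum(counts)
    coeff = factorial(n)
    for c in counts:
        coeff //= factorial(c)  # each step divides exactly, so integer division is safe
    return coeff * prod(p**c for p, c in zip(probs, counts))

def binomial_pmf(r, n, p):
    """Pr[R = r] from equation 5.9."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Hypothetical class probabilities: undersize, acceptable, oversize.
probs = (0.05, 0.90, 0.05)
n = 10

# Pr[exactly 8 acceptable], summing over every split of the 2 unacceptable items.
lumped = sum(multinomial_pmf((u, 8, 2 - u), probs) for u in range(3))

# The same probability from the binomial distribution with "acceptable" as success.
direct = binomial_pmf(8, n, 0.90)

print(f"multinomial, lumped: {lumped:.6f}")
print(f"binomial, direct:    {direct:.6f}")
```

The two results agree because summing the multinomial probabilities over all ways of splitting the unacceptable items reproduces the binomial term with q = 0.05 + 0.05.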
Problems
1. Under normal operating conditions 1.5% of the transistors produced in a factory
are defective. An inspector takes a random sample of forty transistors and finds
that two are defective.
a) What is the probability that exactly two transistors will be defective from a
random sample of forty under normal operating conditions?
b) What is the probability that more than two transistors will be defective from
a random sample of forty if conditions are normal?
2. A control system is set up so that when production conditions are normal, only
6% of items from the production line give readings beyond a particular limit. If
more than two of six successive items are beyond the limit, production is stopped
and all machine settings are examined. What is the probability that production
will be stopped in this way when production conditions are normal?
3. A company supplying transistors claims that they produce no more than 2%
defectives. A purchaser picks 50 at random from an order of 5000 and tests the
50. If he finds more than 1 defective, he rejects the order. If the supplier's claim
is true and 2% of the transistors are defective, what is the probability that the
order will be rejected?
4. An experiment was conducted wherein three balls were drawn at random from a
barrel containing two blue balls, three red balls, and five green balls. We want to
find the mean and variance of the probability distribution of the number of green
balls chosen. Explain why this problem involving three colours cannot be
handled using a binomial distribution. Suppose we consider both the blue balls
and the red balls together as not-green. Now find the required mean and variance.
5. A binomial distribution is known to have the following cumulative probability
distribution: Pr[X ≤ 0] = 1/729, Pr[X ≤ 1] = 13/729, Pr[X ≤ 2] = 73/729, Pr[X ≤ 3]
= 233/729, Pr[X ≤ 4] = 473/729, Pr[X ≤ 5] = 665/729, Pr[X ≤ 6] = 1.0000.
a) What is n, the number of trials?
b) Find p and q, the probabilities of success and failure.
c) Verify that with these values of n, p and q the cumulative probabilities are as
stated.
d) What is the probability that the number of successes, r, lies within one
standard deviation of the mean?
e) What is the coefficient of variation?
6. Ten judges are asked to pick the best tasting orange juice from two samples
labeled A and B. If, in fact, A and B are the same orange juice, what is the
probability that eight or more of the judges will declare the same sample to be
the best? Assume that no judge says that they are equal.
7. A sample of eleven electric bulbs is drawn every day from those manufactured at
a plant. The eleven bulbs are tested before shipment to the customer. An analysis
of the test data collected over a number of years reveals that the probability of
finding no defective bulb in a sample of eleven bulbs is 0.5688. Probabilities of
defective bulbs are random and independent of previous results.
a) What is the probability of finding exactly three defective bulbs in a sample?
b) What is the probability of finding three or more defective bulbs in a sample?
8. There are ten multiple choice questions on an examination. If there are five
choices per question, what is the probability that a student will answer at least
five questions correctly just by picking one answer at random from the possibilities
for each question? State any assumptions.
9. Among a group of five people selected at random from a particular population it
is known that the probability that no one will be 30 or over is 0.01024.
a) What is the probability that exactly one person in the group is under 30?
b) Calculate the mean and variance of the probability distribution of the number
of persons over 30 and compare to the formula values for this type of
distribution.
c) Given three such groups, what is the probability that two out of three groups
have no more than two persons 30 or over?
d) State any assumptions.
10. A fraction 0.014 of the output from a production line is defective. A sample of 95
items is taken. Assume defective items occur randomly and independently.
a) What is the standard deviation of the proportion defective in a sample of this
size?
b) What is the probability that the proportion of defective items in the sample
will be within two standard deviations of the fraction defective in the whole
population?
11. Surveys have indicated that in a given region 75% of car occupants use seat belts
regardless of where they sit in the car. Use of seat belts in the region is random
and shows no regular pattern. The surveys have shown also that in 40% of cars
the driver is the sole occupant, in 25% there are two occupants, in 20% three
occupants, in 10% four occupants, and in 5% five occupants.
a) What is the probability that a car picked at random will have exactly three
persons not using their seat belts? Remember to consider all possible
numbers of occupants.
b) What is the probability that of three cars chosen at random, exactly two have
all occupants wearing belts?
12. A small hotel has rooms on only four floors, with four smoke detectors on each
floor. Because of improper maintenance, the probability that any one detector is
functioning is only 0.55. The probabilities that smoke detectors are functioning
are randomly and independently distributed.
a) What is the probability that exactly one smoke detector is working on the top
floor?
b) What is the probability that there is exactly one detector working on each of
two floors and there are two detectors working on each of the other two
floors?
c) What is the probability that there will be no functioning smoke detectors on
one particular floor? What is the probability that there will be at least one
functioning smoke detector on that floor?
d) What is the probability that on at least one of the four floors there will be no
functioning smoke detectors?
e) What is the probability that there will be at least 15 functioning smoke
detectors in the hotel at any one time?
13. The FIXIT company is to bring in seven new products in a sales line for which
the probability that each new product will be successful is 0.15. Probabilities of
success for the various products are random and independent. The cost of bringing
in a new product is $75,000. If each product is successful, the expected
revenue from sales for it will be $800,000.
a) What is the expected net profit from the seven products?
b) What is the probability that the total net profit will be at least $1,000,000?
c) What is the probability that none of the products will be successful?
d) If the number of successful products is three or more, the sales engineer will
be promoted. What is the probability that this will happen?
14. The probability that a certain type of IC chip will fail after installation is 0.06. A
memory board for a computer contains twelve such chips. The operation will be
satisfactory if ten or more of the chips on the board do not fail.
a) What is the probability that a memory board operates satisfactorily?
b) If there are five such memory boards in a given computer, what is the probability
that at least four of them operate satisfactorily?
c) State any assumptions.
15. 5% of a large lot of electrical components are defective. Six batches of four
components each are drawn from this lot at random.
a) What is the probability that any one batch contains fewer than two
defectives?
b) What is the probability that at least five of the six batches contain fewer than
two defectives each?
c) State any assumptions.
16. 20% of a large lot of mechanical components are found to be faulty. Five batches
of five components each are drawn from this lot. What is the probability that at least
four of these batches contain fewer than two defectives? State any assumptions.
17. A consultant collected data on bolt failures in an anchor assembly used in tower
construction. A large number of anchor assemblies, each containing the same
number of bolts, were examined and each bolt was graded either a success or a
failure. The probability distribution of the number of satisfactory bolts in an
assembly had a mean value of 3.5 and a variance of 1.05. Satisfactory and
unsatisfactory bolts occur randomly and independently. Calculate the probabilities
associated with the possible numbers of satisfactory bolts in an assembly. If
an assembly is considered to be adequate if there are three or fewer bolt failures,
what is the probability that an assembly chosen at random will be inadequate?
18. Each automobile leaving a certain motor company's plant is equipped with five
tires of a particular brand. Tires are assigned to cars randomly and independently.
The tires on each of 100 such automobiles were examined for major defects with
the following results.

No. of Tires with Defects              0    1    2    3    4    5
No. of Automobiles (occurrences)      75   18    4    2    0    1

a) Estimate the probability that a randomly selected tire from this manufacturer
will contain a major defect.
b) Suppose you buy an automobile of this make. From the results of (a) calculate
the probability that it will have at least one tire with a major defect.
c) What is the probability that, in a fleet purchase of six of these cars, at least
half the cars have no defective tires?
d) What is the expected number of defective tires in the fleet purchase of six
cars?
e) If the replacement cost of a defective tire is $120, what is the total expected
replacement cost for this fleet purchase?
19. Thirteen electronic components from a manufacturing process are tested every
day. Components for testing are chosen randomly and independently. It was
found over a long period of time that 51.33% of such samples have no defectives.
a) What is the probability of a sample containing exactly two defective
components?
b) What is the probability of finding three or more defective components in a
sample?
c) The assembly line has a weekly bonus system as follows: Each man receives
a bonus of $500 if none of the five daily samples that week contained a
defective. The bonus is $250 if only one sample out of the five contained a
defective, and none of the others contained any. What is the expected bonus
per man per week?
20. Truck tires are tested over rough terrain. 25% of the trucks fail to complete the
test run without a blowout. Of the next fifteen trucks through the test, find the
probability that:
a) exactly three have one or more blowouts each;
b) fewer than four have blowouts;
c) more than two have blowouts.
d) What would be the expected number of trucks with blowouts of the next
fifteen tested?
e) What would be the standard deviation of the number of trucks with blowouts
of the next fifteen tested?
f) If fifteen trucks are tested on each of three days, what is the probability that
more than two trucks have blowouts on exactly two of the three days?
g) State any assumptions.
21. An elevator arrives empty at the main floor and picks up five passengers. It can
stop at any of seven floors on its way up. What is the probability that no two
passengers get off at the same floor? Assume that the passengers act independently
and that a passenger is equally likely to get off at any one of the floors.
22. In a particular computer chip 8 bits form a byte, and the chip contains 112 bytes.
The probability of a bad bit, one which contains a defect, is 1.2E-04.
a) What is the probability of a bad byte, i.e. a byte which contains a defect?
b) The chip is designed so that it will function satisfactorily if at least 108 of its
112 bytes are good. What is the probability that the chip will not function
satisfactorily?
23. In a particular computer chip 8 bits form a byte, and the chip contains 112 bytes.
The probability of a bad bit, one which contains a defect, is 2.7E-04.
a) What is the probability of a bad byte, i.e. a byte which contains a defect?
b) The chip is designed so that it will function satisfactorily if at least 108 of its
112 bytes are good. What is the probability that the chip will not function
satisfactorily?
Computer Problems
C24. Under normal operating conditions the probability that a mechanical component
will be defective when it comes off the production line is 0.035. A sample of 40
components is taken. In one case, four of the components are found to be defective.
If the operating conditions are still correct, what is the probability that that many or
more components will be defective in a sample of size 40?
C25. A computer chip is organized into bits, bytes, and cells. Each byte contains 8
bits, and each cell contains 112 bytes. The probability that any one bit will be bad (or
corrupted) is 1.E-11 (i.e. \(10^{-11}\)).
a) What is the probability that any one byte will contain a bad bit and so will be
bad and give an error in a calculation? Note that you can neglect the probability
that a byte will contain more than one bad bit.
b) What is the probability there will be no bad bytes in a cell?
c) What is the probability there will be exactly one bad byte in a cell?
d) What is the probability there will be exactly two bad bytes (and so also
exactly 110 good bytes) in a cell?
e) What is the probability there will be exactly three bad bytes (and so also
exactly 109 good bytes) in a cell?
f) What is the probability there will be two or more bad bytes in a cell? Calculate
this in three ways:
   i) Use the results of some of parts (a) (b) (c).
   ii) Use the results of parts (d) and (e).
   iii) Use a cumulative probability.
Do they give the same answer? If not, explain why not.
C26. In order to estimate the fraction defective among electrical components as they
are produced under normal conditions, a sample containing 1000 components is
taken and each component is classified as defective or non-defective. Nine components
are found to be defective in this sample.
a) What is the best estimate from this sample of the proportion defective in the
population?
b) Assuming that that estimate is exactly correct, what is the standard deviation
of the proportion defective? Then what are the limits of the interval from the
best estimate minus two standard deviations to the best estimate plus two
standard deviations? What is the probability of a result outside this interval?
c) Assuming the estimate in part (a) is exactly correct, what is the probability
that more than three defective components will be found in a sample of 100
components?
C27. A sample containing 400 items is taken from the output of a production line. A
fraction 0.016 of the items produced by the line are defective. Assume defective
items occur randomly and independently.
a) What is the probability that the proportion defective in the sample will be no
more than 0.0250?
b) What is the standard deviation of the proportion defective in a sample of this
size?
c) What sample proportion defective would be two standard deviations less than
the proportion defective in the whole population?
5.4 Poisson Distribution
This is a discrete distribution that is used in two situations. It is used, when certain
conditions are met, as a probability distribution in its own right, and it is also used
as a convenient approximation to the binomial distribution in some circumstances.
The distribution is named for S.D. Poisson, a French mathematician of the nineteenth
century.
The Poisson distribution applies in its own right where the possible number of
discrete occurrences is much larger than the average number of occurrences in a
given interval of time or space. The number of possible occurrences is often not
known exactly. The outcomes must occur randomly, that is, completely by chance,
and the probability of occurrence must not be affected by whether or not the
outcomes occurred previously, so the occurrences are independent. In many cases,
although we can count the occurrences, such as of a thunderstorm, we cannot count
the corresponding nonoccurrences. (We can't count non-storms!)
Examples of occurrences to which the Poisson distribution often applies include
counts from a Geiger counter, collisions of cars at a specific intersection under
specific conditions, flaws in a casting, and telephone calls to a particular telephone or
office under particular conditions. For the Poisson distribution to apply to these
outcomes, they must occur randomly.
(a) Calculation of Poisson Probabilities
The probability of exactly r occurrences in a fixed interval of time or space under
particular conditions is given by
$$\Pr[R = r] = \frac{(\lambda t)^r e^{-\lambda t}}{r!} \qquad (5.13)$$
where t (in units of time, length, area or volume) is an interval of time or space in
which the events occur, and λ is the mean rate of occurrence per unit time or space
(so that the product λt is dimensionless). As usual, e is the base of natural
logarithms, approximately 2.71828. Then the probability of no occurrences, r = 0, is
$e^{-\lambda t}$, the probability of exactly one occurrence, r = 1, is $\lambda t\,e^{-\lambda t}$, the
probability of exactly two occurrences, r = 2, is $\frac{(\lambda t)^2 e^{-\lambda t}}{2!}$, and so on. Once one
of these probabilities is calculated it is often more convenient to calculate other
members of the sequence from the following recurrence formula:
$$\Pr[R = r+1] = \left(\frac{\lambda t}{r+1}\right)\Pr[R = r] \qquad (5.14)$$
The basic relation for the Poisson distribution, equation 5.13, can be derived from
a differential equation or as a limiting expression from the binomial distribution.
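Equations 5.13 and 5.14 can be sketched in a few lines of Python. This is only an illustration: the function names `poisson_pmf` and `poisson_sequence` are assumptions introduced here, and the argument `mean` stands for the dimensionless product λt.

```python
import math

def poisson_pmf(r, mean):
    """Pr[R = r] from equation 5.13, where mean is the product lambda*t."""
    return mean ** r * math.exp(-mean) / math.factorial(r)

def poisson_sequence(mean, r_max):
    """Pr[R = 0], ..., Pr[R = r_max], built up with the recurrence 5.14."""
    probs = [math.exp(-mean)]                     # Pr[R = 0] = e^(-lambda*t)
    for r in range(r_max):
        probs.append(probs[-1] * mean / (r + 1))  # Pr[R = r+1] from Pr[R = r]
    return probs

# Both routes should agree for every r from 0 to 12 when lambda*t = 10.5.
direct = [poisson_pmf(r, 10.5) for r in range(13)]
stepped = poisson_sequence(10.5, 12)
print(round(direct[12], 4), round(stepped[12], 4))
```

The recurrence avoids recomputing powers and factorials, which is why it is convenient when a run of consecutive probabilities is needed.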
Cumulative Poisson probabilities can be found in many reference books. Once
again, Poisson probabilities for single events can be found by subtraction using
equation 5.2: the probability of $x_i$ is just the difference between the cumulative
probability that $X \le x_i$ and the cumulative probability that $X \le x_{i-1}$.
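Example 5.11
From tables for the cumulative Poisson distribution to three decimal points, for
λt = 10.5,
$$\Pr[X \le 12] = \sum_{k=0}^{12} \frac{e^{-\lambda t}(\lambda t)^k}{k!} \text{ is equal to } 0.742,$$
$$\Pr[X \le 11] = \sum_{k=0}^{11} \frac{e^{-\lambda t}(\lambda t)^k}{k!} \text{ is equal to } 0.639, \text{ and}$$
$$\Pr[X \le 10] = \sum_{k=0}^{10} \frac{e^{-\lambda t}(\lambda t)^k}{k!} \text{ is equal to } 0.521.$$
Then for λt = 10.5, we have Pr[R = 12] = 0.742 − 0.639 = 0.103, compared with
0.1032 from equation 5.13, and
$$\Pr[R = 11 \text{ or } 12] = \sum_{k=0}^{12} \frac{e^{-\lambda t}(\lambda t)^k}{k!} - \sum_{k=0}^{10} \frac{e^{-\lambda t}(\lambda t)^k}{k!} = 0.742 - 0.521 = 0.221,$$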
compared with
$$\Pr[R = 11] + \Pr[R = 12] = \frac{e^{-10.5}(10.5)^{11}}{11!} + \frac{e^{-10.5}(10.5)^{12}}{12!} = 0.1180 + 0.1032 = 0.2212.$$
These figures check (to three decimal points).
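The subtraction of cumulative probabilities in Example 5.11 can be checked numerically. The sketch below is an illustration only; the helper name `poisson_cdf` is an assumption, and it sums equation 5.13 directly instead of reading a table.

```python
import math

def poisson_cdf(r, mean):
    """Pr[R <= r]: equation 5.13 summed from k = 0 to r."""
    term = math.exp(-mean)      # Pr[R = 0]
    total = term
    for k in range(1, r + 1):
        term *= mean / k        # recurrence of equation 5.14
        total += term
    return total

mean = 10.5
p12 = poisson_cdf(12, mean) - poisson_cdf(11, mean)         # Pr[R = 12]
p11_or_12 = poisson_cdf(12, mean) - poisson_cdf(10, mean)   # Pr[R = 11 or 12]
print(round(poisson_cdf(12, mean), 3), round(p12, 3), round(p11_or_12, 3))
```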
The shape of the probability function for the Poisson distribution is usually
skewed, particularly for small values of (λt). Figure 5.11 shows the probability
function for λt = 0.5. Its mode is for zero occurrences, and probabilities decrease
very rapidly as the number of occurrences becomes larger. For comparison, Figure 5.12
shows the probability function for λt = 5.0. It is considerably more symmetrical.
[Figure 5.11: Probability Function for Poisson Distribution, λt = 0.5. Vertical axis: Probability, 0.000 to 0.700; horizontal axis: 0 to 6.]
[Figure 5.12: Poisson Probability Function for λt = 5.0. Vertical axis: Probability, 0 to 0.2; horizontal axis: 0 to 14.]
Example 5.12
The number of meteors found by a radar system in any 30-second interval under
specified conditions averages 1.81. Assume the meteors appear randomly and
independently.
a) What is the probability that no meteors are found in a one-minute interval?
b) What is the probability of observing at least five but not more than eight
meteors in two minutes of observation?
Answer: a) λ = (1.81)/(0.50 minute) = 3.62/minute.
For a one-minute interval, μ = λt = 3.62.
Pr[none in one minute] = $e^{-\lambda t} = e^{-3.62}$ = 0.0268.
b) For two minutes, μ = λt = (3.62)(2) = 7.24.
$$\Pr[R = r] = \frac{(\lambda t)^r e^{-\lambda t}}{r!}.$$
Then $\Pr[R = 5] = \frac{(7.24)^5 e^{-7.24}}{5!} = 0.1189$.
From equation 5.14, $\Pr[R = r+1] = \left(\frac{\lambda t}{r+1}\right)\Pr[R = r]$,
so $\Pr[R = 6] = \left(\frac{7.24}{6}\right)(0.1189) = 0.1435$,
$\Pr[R = 7] = \left(\frac{7.24}{7}\right)(0.1435) = 0.1484$,
and $\Pr[R = 8] = \left(\frac{7.24}{8}\right)(0.1484) = 0.1343$.
Then Pr[at least five but not more than eight meteors in two minutes]
= Pr[5 or 6 or 7 or 8 meteors in two minutes]
= 0.1189 + 0.1435 + 0.1484 + 0.1343
= 0.545
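As a check on Example 5.12, the same steps can be carried out in Python. This is a sketch under the assumptions of the example (random, independent meteors); the variable names are illustrative.

```python
import math

rate_per_minute = 1.81 / 0.50            # lambda = 3.62 per minute

# (a) no meteors in one minute: Pr[R = 0] = e^(-lambda*t)
p_none = math.exp(-rate_per_minute * 1.0)

# (b) five to eight meteors in two minutes, lambda*t = 7.24
mean = rate_per_minute * 2.0
p = mean ** 5 * math.exp(-mean) / math.factorial(5)   # Pr[R = 5], equation 5.13
p_5_to_8 = p
for r in range(5, 8):
    p *= mean / (r + 1)                  # equation 5.14 gives Pr[R = r+1]
    p_5_to_8 += p

print(round(p_none, 4), round(p_5_to_8, 3))
```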
Example 5.13
The average number of collisions occurring in a week during the summer months at a
particular intersection is 2.00. Assume that the requirements of the Poisson
distribution are satisfied.
a) What is the probability of no collisions in any particular week?
b) What is the probability that there will be exactly one collision in a week?
c) What is the probability of exactly two collisions in a week?
d) What is the probability of finding not more than two collisions in a week?
e) What is the probability of finding more than two collisions in a week?
f) What is the probability of exactly two collisions in a particular two-week
interval?
Answer: λ = 2.00/week, t = 1 week, so λt = 2.00.
a) Pr[R = 0] = $e^{-\lambda t} = e^{-2.00}$ = 0.135
b) Pr[exactly one collision in a week]
= Pr[R = 1] = $(\lambda t)e^{-\lambda t} = 2.00\,e^{-2.00}$
= 0.271
c) Pr[exactly two collisions in a week]
= Pr[R = 2] = $\frac{(\lambda t)^2 e^{-\lambda t}}{2!} = \frac{(2.00)^2 e^{-2.00}}{2!}$
= 0.271
d) Pr[not more than two collisions in a week]
= Pr[R ≤ 2]
= Pr[R = 0] + Pr[R = 1] + Pr[R = 2]
= 0.135 + 0.271 + 0.271
= 0.677
e) Pr[more than two collisions in a week]
= Pr[R > 2]
= 1 − Pr[R ≤ 2]
= 1 − 0.677
= 0.323
f) Now we still have λ = 2.00/week, but t = 2 weeks, so λt = 4.00.
Then Pr[exactly two collisions in a two-week interval]
= $\frac{(\lambda t)^2 e^{-\lambda t}}{2!} = \frac{(4.00)^2 e^{-4.00}}{2!}$
= 0.147
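The six parts of Example 5.13 differ only in the value of r and in the length of the interval, which a short Python sketch makes explicit. The helper `poisson_pmf` and the variable names are assumptions made for illustration.

```python
import math

def poisson_pmf(r, mean):
    """Pr[R = r] from equation 5.13 with mean = lambda*t."""
    return mean ** r * math.exp(-mean) / math.factorial(r)

rate_per_week = 2.00

one_week = rate_per_week * 1                      # lambda*t = 2.00
p0, p1, p2 = (poisson_pmf(r, one_week) for r in range(3))   # parts (a), (b), (c)
p_at_most_2 = p0 + p1 + p2                        # part (d)
p_more_than_2 = 1.0 - p_at_most_2                 # part (e)

two_weeks = rate_per_week * 2                     # lambda*t = 4.00
p2_two_weeks = poisson_pmf(2, two_weeks)          # part (f)

print(round(p0, 3), round(p1, 3), round(p2, 3))
print(round(p_at_most_2, 3), round(p_more_than_2, 3), round(p2_two_weeks, 3))
```

Note that only the product λt enters the formula, so doubling the interval in part (f) is handled by doubling `mean`.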
Example 5.14
The demand for a particular type of pump at an isolated mine is random and
independent of previous occurrences, but the average demand in a week (7 days) is for 2.8
pumps. Further supplies are ordered each Tuesday morning and arrive on the weekly
plane on Friday morning. Last Tuesday morning only one pump was in stock, so the
storesman ordered six more to come in Friday morning.
a) Find the probability that one pump will still be in stock on Friday morning
when new stock arrives.
b) Find the probability that stock will be exhausted and there will be unsatisfied
demand for at least one pump by Friday morning.
c) Find the probability that one pump will still be in stock this Friday morning
and at least five will be in stock next Tuesday morning.
Answer: First we have to recognize that the Poisson distribution will apply.
λ = 2.8/(7 days) = 0.4/day.
a) From Tuesday morning to Friday morning is three days.
Then λt = (0.4/day)(3 days) = 1.2.
Pr[no demand in three days] = $e^{-\lambda t} = e^{-1.2}$ = 0.3012.
Then Pr[one pump will still be in stock Friday morning when new stock arrives]
= 0.301.
b) Pr[demand for two or more pumps in three days]
= 1 − Pr[demand for zero or one pump in three days]
= 1 − Pr[demand for no pumps in three days] − Pr[demand for one pump in
three days]
= $1 - 0.3012 - \frac{(0.3012)(1.2)}{1}$ (using equation 5.14)
= 0.3374.
Then Pr[unsatisfied demand for at least one pump by Friday morning] = 0.337.
c) From part (a), Pr[one pump will still be in stock this Friday morning] =
0.3012.
From Friday morning to Tuesday morning is four days, so (λt) = (0.4/day)(4
days) = 1.6.
After the new stock arrives we will have 1 + 6 = 7 pumps in stock Friday morning.
If we have at least five in stock Tuesday morning, the demand in four days is ≤ 2
pumps.
Pr[demand for 0 pumps in 4 days] = $e^{-1.6}$ = 0.2019.
Pr[demand for 1 pump in 4 days] = $\frac{(e^{-1.6})(1.6)}{1}$ = 0.3230.
Pr[demand for 2 pumps in 4 days] = $\frac{(e^{-1.6})(1.6)^2}{2}$ = 0.2584.
Then Pr[demand for 2 or fewer pumps in 4 days] = 0.7834.
Then Pr[at least 5 will be in stock next Tuesday morning | one pump in stock
Friday morning] = 0.7834. Note that this is a conditional probability.
Then Pr[(one in stock Friday morning) ∩ (at least five in stock on Tuesday
morning)]
= Pr[one in stock Friday morning] × Pr[at least 5 in stock Tuesday
A.M. | one in stock Friday A.M.]
= (0.3012)(0.7834)
= 0.236.
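The arithmetic of Example 5.14 can be retraced in Python. The sketch below follows the same reasoning (three days from Tuesday to Friday, four days from Friday to Tuesday); the helper `poisson_pmf` and the variable names are assumptions made for illustration.

```python
import math

def poisson_pmf(r, mean):
    """Pr[R = r] from equation 5.13 with mean = lambda*t."""
    return mean ** r * math.exp(-mean) / math.factorial(r)

rate_per_day = 2.8 / 7                 # lambda = 0.4 per day
mean_3_days = rate_per_day * 3         # Tuesday to Friday, lambda*t = 1.2
mean_4_days = rate_per_day * 4         # Friday to Tuesday, lambda*t = 1.6

# (a) the single pump is still there Friday: no demand in three days
p_a = poisson_pmf(0, mean_3_days)

# (b) unsatisfied demand: two or more pumps requested in three days
p_b = 1.0 - poisson_pmf(0, mean_3_days) - poisson_pmf(1, mean_3_days)

# (c) seven in stock Friday, at least five left Tuesday: demand of 0, 1 or 2
p_conditional = sum(poisson_pmf(r, mean_4_days) for r in range(3))
p_c = p_a * p_conditional              # multiplication rule for the joint event

print(round(p_a, 3), round(p_b, 3), round(p_conditional, 4), round(p_c, 3))
```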
(b) Mean and Variance for the Poisson Distribution
Since the Poisson distribution is discrete, the mean and variance can be found from
the previous general relations. Equation 5.5 gives
$$\mu = E(R) = \sum_{\text{all } r} (r)(\Pr[R = r])$$
When the probability function of equation 5.13 is substituted in this expression and
the algebra is worked through, the result is that the mean or expectation of the
number of occurrences according to the Poisson Probability Distribution is
$$\mu = \lambda t \qquad (5.15)$$
Therefore an alternative form of the probability function for the Poisson distribution is
$$\Pr[R = r] = \frac{\mu^r e^{-\mu}}{r!} \qquad (5.16)$$
Similarly, from equation 5.6,
$$\sigma^2 = E(R^2) - \mu^2 = \sum_{\text{all } r} (r^2)(\Pr[R = r]) - \mu^2$$
Again, the probability function 5.13 can be substituted. The result of this derivation
for the Poisson Distribution is that
$$\sigma^2 = \lambda t \qquad (5.17)$$
Thus, the variance of the number of occurrences for the Poisson distribution is equal
to the mean number of occurrences, μ.
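Equations 5.15 and 5.17 can be checked numerically by evaluating the sums of equations 5.5 and 5.6 term by term. The sketch below is not a derivation; the function name and the truncation point `r_max` are assumptions, chosen so that the neglected tail of the sum is negligible for the means tried.

```python
import math

def poisson_mean_and_variance(mean, r_max=200):
    """Evaluate the sums of equations 5.5 and 5.6, truncated at r_max."""
    p = math.exp(-mean)               # Pr[R = 0]
    first_moment = 0.0                # running sum of r * Pr[R = r]
    second_moment = 0.0               # running sum of r^2 * Pr[R = r]
    for r in range(r_max + 1):
        first_moment += r * p
        second_moment += r * r * p
        p *= mean / (r + 1)           # recurrence 5.14 gives Pr[R = r+1]
    return first_moment, second_moment - first_moment ** 2

for lam_t in (0.5, 1.2, 5.0):
    mu, var = poisson_mean_and_variance(lam_t)
    print(lam_t, round(mu, 6), round(var, 6))
```

For each value of λt tried, both computed quantities should agree with λt to within rounding error.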
(c) Approximation to the Binomial Distribution
Let us compare the results from the binomial distribution for μ = 1.2, from various
combinations of values of n and p, with the results from the Poisson distribution for
μ = λt = 1.2. In each case let us calculate Pr[R = 0] and Pr[R = 1]. The results are
shown in Table 5.1.
Table 5.1: Comparison of Binomial and Poisson Distributions
For the Binomial Distribution:
n     p       μ     Pr[R = 0]                              Pr[R = 1]
4     0.3     1.2   $(1)(0.3)^0(0.7)^4 = 0.240$            $(4)(0.3)^1(0.7)^3 = 0.412$
8     0.15    1.2   $(1)(0.15)^0(0.85)^8 = 0.272$          $(8)(0.15)^1(0.85)^7 = 0.385$
20    0.06    1.2   $(1)(0.06)^0(0.94)^{20} = 0.290$       $(20)(0.06)^1(0.94)^{19} = 0.370$
100   0.012   1.2   $(1)(0.012)^0(0.988)^{100} = 0.299$    $(100)(0.012)^1(0.988)^{99} = 0.363$
200   0.006   1.2   $(1)(0.006)^0(0.994)^{200} = 0.300$    $(200)(0.006)^1(0.994)^{199} = 0.362$
For the Poisson Distribution:
μ     Pr[R = 0]                      Pr[R = 1]
1.2   $(1.2)^0(e^{-1.2}) = 0.301$    $(1.2)^1(e^{-1.2}) = 0.361$
In the part of Table 5.1 for the binomial distribution, n is gradually increased and
p is correspondingly decreased so that the product (np = μ) stays constant. The
results are compared to the corresponding probabilities according to the Poisson
distribution for this value of μ. At least in this instance we find that as n increases and
p decreases so that μ stays constant, the resulting probabilities for the binomial
distribution approach the probabilities for the Poisson distribution. In fact, this
relationship between the binomial and Poisson distributions is general. One way of
deriving the Poisson distribution is to take the limit of the binomial distribution as n
increases and p decreases such that the product np (equal to μ) remains constant.
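The entries of Table 5.1 can be regenerated with a short Python sketch, which also makes the trend toward the Poisson values easy to see. The helper names are assumptions made for illustration; `math.comb` supplies the number of combinations.

```python
import math

def binomial_pmf(r, n, p):
    """Pr[R = r] for the binomial distribution with n trials."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)

def poisson_pmf(r, mean):
    """Pr[R = r] from equation 5.16."""
    return mean ** r * math.exp(-mean) / math.factorial(r)

mean = 1.2
rows = []
for n in (4, 8, 20, 100, 200):          # the values of n used in Table 5.1
    p = mean / n                        # keep the product np fixed at 1.2
    rows.append((n, binomial_pmf(0, n, p), binomial_pmf(1, n, p)))
    print(n, round(p, 3), round(rows[-1][1], 3), round(rows[-1][2], 3))

pois0, pois1 = poisson_pmf(0, mean), poisson_pmf(1, mean)
print("Poisson", round(pois0, 3), round(pois1, 3))
```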
Thus the Poisson distribution is a good approximation to the binomial distribution
if n is sufficiently large and p is sufficiently small. The usual rule of thumb (that
is, a somewhat arbitrary rule) is that if n ≥ 20 and p ≤ 0.05, the approximation is
reasonably good. That rule should be used for problems in this book. The error at the
limit of the approximation according to this rule depends on the parameters, but
some indication can be seen if we look at the case where μ = 1.2, p = 0.05, and so
n = 1.2/0.05 = 24. At this point Pr[R = 0] by the Poisson distribution is 3.2% higher than
Pr[R = 0] by the binomial distribution, and Pr[R = 1] by the Poisson distribution is
2.0% lower than Pr[R = 1] by the binomial distribution.
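The percentage errors quoted for the limiting case μ = 1.2, p = 0.05, n = 24 can be reproduced as follows. This is a sketch for this one case only; the variable names are illustrative.

```python
import math

mean, p = 1.2, 0.05
n = round(mean / p)                          # 24 trials

binom0 = (1 - p) ** n                        # binomial Pr[R = 0]
binom1 = n * p * (1 - p) ** (n - 1)          # binomial Pr[R = 1]
pois0 = math.exp(-mean)                      # Poisson Pr[R = 0]
pois1 = mean * math.exp(-mean)               # Poisson Pr[R = 1]

# Percentage by which the Poisson value differs from the binomial value.
pct0 = 100 * (pois0 / binom0 - 1)            # positive: Poisson is higher
pct1 = 100 * (pois1 / binom1 - 1)            # negative: Poisson is lower
print(n, round(pct0, 1), round(pct1, 1))
```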
[Figure 5.13: Poisson Approximation to Binomial Distribution, p = 0.042, n = 120. Probability (0 to 0.25) plotted against number of defective items (0 to 12) for the binomial distribution and the Poisson approximation, with the mean marked.]
Figure 5.13 shows a comparison of the binomial distribution and the corresponding
Poisson distribution, both for the same value of μ = np. This might be for a case
of sampling items coming off a production line when the value of p, the probability
that any one item will be defective, is 0.042, and the value of n is the sample size,
120 items. As we can see, the agreement is good. This case meets the rule of thumb
quite easily, so we would expect good agreement.
The Poisson distribution has only one parameter, μ, whereas the binomial
distribution has two parameters, n and p. Probabilities according to the Poisson
distribution are easier to calculate with a pocket calculator than for the binomial
distribution, especially for very large values of n and very small values of p.
However, this advantage is less important now that computer spreadsheets are readily
available. We saw in section 5.3(f) of this chapter that the binomial distribution can
be calculated easily using MS Excel.
Example 5.15
5% of the tools produced by a certain process are defective. Find the probability that
in a sample of 40 tools chosen at random, exactly three will be defective. Calculate
a) using the binomial distribution, and b) using the Poisson distribution as an
approximation.
Answer: a) For the binomial distribution with n = 40, p = 0.05,
$$\Pr[R = 3] = {}_{40}C_3\,(0.05)^3(0.95)^{37} = \frac{(40)(39)(38)}{(3)(2)(1)}(0.05)^3(0.95)^{37} = 0.185$$
b) For the Poisson distribution, μ = (n)(p) = (40)(0.05) = 2.00.
$$\Pr[R = 3] = \frac{(2.00)^3 e^{-2.00}}{(3)(2)(1)} = 0.180$$
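Both parts of Example 5.15 can be checked with a few lines of Python. The sketch assumes nothing beyond the figures of the example; the variable names are illustrative.

```python
import math

n, p, r = 40, 0.05, 3

# (a) exact binomial probability of exactly three defective tools
binomial = math.comb(n, r) * p ** r * (1 - p) ** (n - r)

# (b) Poisson approximation with mean = np
mean = n * p
poisson = mean ** r * math.exp(-mean) / math.factorial(r)

print(round(binomial, 3), round(poisson, 3))
```

Here n = 40 and p = 0.05 sit right at the edge of the rule of thumb, and the two answers differ in the third decimal place.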
(d) Use of Computers
Values of Poisson probabilities can be found with the Excel function POISSON with
parameters r, μ or λt, and an indication of whether or not a cumulative value is
required. If the third parameter is TRUE, the function returns the cumulative
probability that the number of random events will be less than or equal to r when either
μ or its equivalent λt has the specified value. If the third parameter is FALSE, the
function returns the probability that the number of events will be exactly r when μ =
λt has the value stated in the second parameter. For example, the cumulative
probability of 12 or fewer random occurrences when μ = λt = 10.5 is given by
POISSON(12,10.5,TRUE) as 0.742 (to three decimal points); the probability of
exactly 12 random occurrences is given by POISSON(12,10.5,FALSE) as 0.103
(again to three decimal points). As for the binomial distribution, use of the computer
with Excel is especially labor-saving when cumulative probabilities are required.
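For readers working outside a spreadsheet, the behaviour described for POISSON can be imitated in Python. The function below is a hypothetical stand-in written for this illustration, not Excel's own code; it only mirrors the order of the three arguments described above.

```python
import math

def poisson(r, mean, cumulative):
    """Imitate the three arguments of the spreadsheet function POISSON.

    cumulative=True  returns Pr[R <= r];
    cumulative=False returns Pr[R = r].
    """
    term = math.exp(-mean)        # Pr[R = 0]
    total = term
    for k in range(1, r + 1):
        term *= mean / k          # recurrence 5.14
        total += term
    return total if cumulative else term

print(round(poisson(12, 10.5, True), 3))    # cumulative: 12 or fewer occurrences
print(round(poisson(12, 10.5, False), 3))   # exactly 12 occurrences
```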
Problems
1. The number of cars entering a small parking lot is a random variable having a
Poisson distribution with a mean of 1.5 per hour. The lot holds only 12 cars.
a) Find the probability that the lot fills up in the first hour (assuming that all
cars stay in the lot longer than one hour).
b) Find the probability that more than 3 cars arrive between 9 am and 11 am.
2. Customers arrive at a checkout counter at an average rate of 1.5 per minute. What
distribution will apply if reasonable assumptions are made? List those assumptions.
Find the probabilities that
a) exactly two will arrive in any given minute;
b) at least three will arrive during an interval of two minutes;
c) at most 13 will arrive during an interval of six minutes.
3. Cumulative probability tables for the Poisson Distribution indicate that for
μ = 2.5, Pr[R ≤ 6] = 0.986 and Pr[R ≤ 4] = 0.891. Use these figures to calculate
Pr[R = 5 or 6]. Check using basic relations.
4. Cumulative probability tables indicate that for a Poisson distribution with
μ = 5.5, Pr[R ≤ 6] = 0.686 and Pr[R ≤ 7] = 0.810. Use these figures to calculate
Pr[R = 7]. Check using a basic relation.
5. Records of an electrical distribution system in a particular area indicate that over
the past twenty years there have been just six years in which lightning has not hit
a transformer. Assume that the factors affecting lightning hits on transformers
have not changed over that time, and that hits occur at random and independently.
a) Then what would be the best estimate of the average number of hits on
transformers per year?
b) In how many of the next ten years would we expect to have more than two
hits on transformers in a year?
6. A library employee shelves a large number of books every day. The average
number of books misshelved per day is estimated over a long period to be 2.5.
a) Calculate the probability that exactly three books are misshelved in a
particular day.
b) Calculate the probability that fewer than two books on one day and more
than two books on the next day are misshelved.
c) What assumptions have been made in these calculations?
7. The numbers of lightning strikes on power poles in a particular district have been
recorded. Records show that in the past twenty-five years there have been seven
years in which no lightning strikes on poles have occurred. Assume that strikes
occur randomly and independently, and that the mean number of strikes per unit
time does not change.
a) What distribution applies?
b) What is the probability that more than one strike will occur next year?
c) What is the probability that exactly one strike will occur in the next two
years?
d) What is the best estimate of the standard deviation of number of strikes in
one year?
8. The mean number of letters received each year by the university requesting
information about the programs offered by a particular department is 98.8.
Assume that letters are received randomly throughout a year which consists of 52
weeks.
a) What is the probability of receiving no letters in a particular week?
b) What is the probability of receiving two or more letters in a particular week?
c) What is the probability of receiving no letters in any four-week period?
d) What is the probability of having two weeks in a specified four-week period
with no letters?
9. The number of grain elevator explosions due to spontaneous combustion has
been 10 in the past 25 years for Great West Grain, a company with over a thousand
grain elevators. Explosions occur randomly and independently.
a) From these data make an estimate of the mean rate of occurrence of
explosions in a year.
b) On the basis of this estimate, what is the probability that there will be no
explosions in the next five years?
c) If there is at least one explosion a year for three years in a row, the insurance
rates paid by the elevator company will double. What is the probability that
this will happen over the next three years? Use the estimate from part (a).
10. The average number of traffic accidents in a certain city in a seven-day period is
28. All traffic accidents are investigated on the day of their occurrence by a
police squad car. A maximum of three traffic accidents can be investigated by
one squad car in a day. Assume that accidents occur randomly and independently.
a) What is the probability that no accidents will have to be investigated on a
given day?
b) What is the probability that, on exactly two out of three successive days,
more than two squad cars will have to be assigned to investigate traffic
accidents?
11. Records for 13 summer weeks for each of the past 80 years in a particular district
show that 32 weeks in total were very wet. Assume that wet weeks occur at
random and independently and that the pattern does not change with time.
a) What is the probability that no very wet weeks will occur in the next two
years?
b) What is the probability that at least two very wet weeks will occur in the next
two years?
c) What is the probability that exactly two very wet weeks will occur in the
next two years?
12. In 104 days, 170 oil tankers arrive at a port for unloading. The tankers arrive
randomly and independently. Probabilities are the same for every day of the
week. A maximum of two oil tankers can be unloaded each day.
a) What is the probability that no oil tankers will arrive on Tuesday?
b) What is the probability that more than two will arrive on Friday? This will
mean that not all can be unloaded on Friday, even if no oil tankers were left
over from Thursday.
c) Assuming that no oil tankers are left over from Tuesday, what is the
probability that exactly one oil tanker will be left over from Wednesday and none
will be left over from Thursday?
d) What is the probability that more than three oil tankers will arrive in an
interval of two days?
13. The probability of no floods during a year along the South Saskatchewan River
has been estimated from considerable data to be 0.1353. Assume that floods
occur randomly and independently.
a) What is the expected number of floods during a year?
b) What is the probability of two or more floods during exactly two of the next
three years?
c) What are the mean and standard deviation of the number of floods expected
in a five-year period?
14. The number of new categories added each year to a major engineering handbook
has been found to be a random variable, unaffected by the size of the handbook
and its recent history. The probability that no new categories will be added in the
annual update is 0.1353. This year's edition of the handbook contains 97 categories.
a) How many categories is the next edition expected to contain?
b) What is the probability that the edition two years from now will contain
fewer than 100 categories?
15. In a plant manufacturing light bulbs, 1% of the production is known to be
defective under normal conditions. A sample of 30 bulbs is drawn at random.
Assume defective bulbs occur randomly and independently. What is the
probability that:
a) the sample contains no defective bulbs;
b) more than 3 defective bulbs are in the sample.
Do this problem both (1) using the binomial distribution, and (2) using the
Poisson distribution. Compare the conditions of this problem to the rule of
thumb stated in section 5.4(c).
16. Fifteen percent of piglets raised in total confinement under certain conditions
will live less than three weeks after birth. Assume that deaths occur randomly
and independently. Consider a group of eight newborn piglets.
a) What probability distribution applies without any approximation to the
number of piglets which will live less than three weeks?
b) What is the expected mean number of deaths?
c) What is the probability that exactly three piglets will die within three weeks
of birth? Use the binomial distribution.
d) Calculate the probability that exactly three piglets will die within three
weeks of birth, but now use the Poisson distribution.
e) Compare the conditions of this problem to the rule of thumb stated in section
5.4(c). Then would we expect the Poisson distribution to be a good
approximation in this case?
f) Use the binomial distribution to calculate the probability that fewer than
three piglets will die within three weeks of birth.
g) Use the Poisson distribution to calculate the probabilities that exactly 0, 1,
and 2 piglets will die within three weeks of birth, and then that fewer than 3
piglets will die within three weeks of birth.
17. Tests on the brakes and steering gear of 200 cars indicate that the probability of
defective brakes is 0.17 and the probability of defective steering is 0.14.
a) If defective brakes and defective steering are independent of one another,
what is the probability of finding both on the same car?
b) Consider probability distributions which might apply to the occurrence of
both defective brakes and defective steering among the 200 cars. Assume
occurrences of both are random and independent of other occurrences. What
probability distribution would be expected fundamentally if the probability
of success is constant from trial to trial? What probability distribution
would be applicable as a more convenient approximation, and why? Give
the parameters of both distributions.
c) Apply the approximate distribution to find the probability that at least eleven
cars of 200 would have both defective brakes and defective steering if they
are independent of one another.
d) If in fact 11 of the 200 cars have both defective brakes and defective steering,
is it reasonable to conclude that defective brakes and defective steering are
independent of one another?
Computer Problems
C18. The number of cars entering a parking lot is a random variable having a Poisson
Distribution with a mean of four per hour. The lot holds only 12 cars.
a) Find the probability that the lot fills up in the first hour (assuming that all
cars stay in the lot longer than one hour).
b) Find the probability that fewer than 12 cars arrive during an eight-hour day.
C19. Customers arrive at a checkout counter at an average rate of 1.5 per minute.
What distribution will apply if reasonable assumptions are made? List those
assumptions. Find the probability that at most 13 customers will arrive during an interval of
six minutes.
C20. A library employee shelves a large number of books every day. The average
number of books misshelved per day is estimated over a long period to be 2.5.
Calculate the probability that between five and fifteen books (including both limits)
are misshelved in a four-day period.
C21. The average number of vehicles arriving at an intersection under certain
conditions is constant, but vehicles arrive independently and the actual number arriving in
any interval of time is determined by chance. The average rate at which vehicles
arrive at the intersection is 360 vehicles per hour. Traffic lights at this intersection go
through a complete cycle in 40 seconds. During the green light only seven vehicles
can pass through the intersection.
a) What is the probability that exactly seven vehicles arrive during one cycle?
b) What is the probability that fewer than seven vehicles arrive during one
cycle?
c) What is the probability that exactly eight vehicles arrive during one cycle, so
that one vehicle is held for the next cycle (assuming there were no hold-overs
from the previous cycle)?
d) What is the probability that one vehicle is held over from cycle 1 as in part
(c) and all the vehicles pass through on the following cycle?
C22. Grain loading facilities at a port have capacity to load five ships per day. Past
experience of many years indicates that on the average 28 ships come in to pick up
grain in a seven-day period. Ships arrive randomly and independently.
a) What is the probability that on a given day the capacity of the dock will be
exceeded by at least one ship, given that no ship was waiting at the beginning
of the day?
b) What is the probability that exactly four ships will show up at the port in a
two-day period?
c) By how much should the capacity of the loading docks be expanded so that
the probability that a ship will not be able to dock on a given day will be less
than 1%?
C23. The ABC Auto Supply Depot orders stock at the middle of the month and
receives the goods at the first of the next month. The average number of requests for
fuel pump XY33 is four per month. If on April 15, two of these fuel pumps are in
stock and an additional five are ordered to be received by May 1, what is the
probability that the ABC Depot will not be able to supply all the requests for XY33 in the
month of May? Requests for pumps are random and independent of one another.
Requests are not carried over from one month to the next.
C24. A manufacturer offers to sell a device for counting lightning flashes during
thunderstorms. The device can record up to five distinct flashes per minute.
a) If the average flash intensity experienced during a thunderstorm at a
recording location is nine flashes in six minutes, what is the probability that at
least one flash will not be recorded in a one-minute period? What
assumptions are being made?
b) Given this intensity, what is the probability of experiencing six lightning
flashes in a two-minute period?
c) What is the highest average intensity in flashes per hour for which the
recorder can be used, if the probability of not recording all flashes in a
minute must be less than 10%?
C25. The probability of no floods during a year along the South Saskatchewan River
has been estimated from considerable data to be 0.1353. Assume that floods occur
randomly and independently. What is the probability of seven or fewer floods during
a five-year period?
C26. The cars passing a certain point as a function of time were counted during a
traffic study of a city road. It was found that there was 10% probability of observing
more than ten cars in an eight-minute interval.
a) Find the probability that exactly five cars will pass in a four-minute interval.
What assumptions are being made?
b) Find the probability that fewer than two cars will pass in each of three
consecutive intervals.
c) Find the probability that fewer than two cars will pass in exactly two of three
consecutive intervals.
d) How long an interval should be used so that the probability of observing
more than nine cars becomes 40%?
C27. Rainstorms around Saskatoon occur at the mean rate of six in four weeks
during the spring season. If one storm occurs in the week after spring snowmelt is
over, the probability of flooding is 0.30; if two storms occur that week, the
probability goes to 0.60. If more than two occur, the probability becomes 0.75. If no storms
occur, the probability is 0. Overall, if no flooding has occurred by the end of the first
week, the probability of flooding becomes 0.10 if one rainstorm occurs in the next
two weeks, and 0.15 if two or more rainstorms occur in the next two weeks. Assume
that rainstorms occur independently and randomly.
(a) What is the probability of at least four rainstorms in the first three weeks?
(b) What is the probability of flooding in those three weeks?
5.5 Extension: Other Discrete Distributions
Although the binomial distribution and the Poisson distribution are probably the
most common and useful discrete distributions, a number of others are found useful
in some engineering applications. Among them are the negative binomial distribution
and the geometric distribution. Both these distributions are for the same conditions as
for the binomial distribution except that trials are repeated until a fixed number of
successes have occurred. The negative binomial distribution gives the probability
that the kth success occurs on the nth trial, where both k and n are fixed quantities.
The geometric distribution is a special case of the negative binomial distribution; it
gives the probability that the first success occurs on the nth trial. We have already
mentioned the multinomial distribution in part (i) of section 5.3. As discussed there,
it can be considered a generalization of the binomial distribution when there are
more than two possible outcomes for each trial. The negative binomial distribution,
the geometric distribution, and the multinomial distribution are described more fully
in the book by Walpole and Myers (see the List of Selected References in section
15.2 of this book).
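As a brief illustration of the two distributions just described, the sketch below uses their standard textbook forms, which are not quoted from this chapter: the kth success falls on the nth trial when the first n − 1 trials contain exactly k − 1 successes and the nth trial is a success. The function names and the sample values of n, k and p are assumptions.

```python
import math

def negative_binomial_pmf(n, k, p):
    """Pr[the k-th success occurs on the n-th trial], success probability p."""
    if n < k:
        return 0.0                       # k successes need at least k trials
    # k-1 successes among the first n-1 trials, then a success on trial n
    return math.comb(n - 1, k - 1) * p ** k * (1 - p) ** (n - k)

def geometric_pmf(n, p):
    """Pr[the first success occurs on the n-th trial]: the case k = 1."""
    return p * (1 - p) ** (n - 1)

print(negative_binomial_pmf(5, 3, 0.5))   # third success on the fifth trial
print(geometric_pmf(3, 0.5), negative_binomial_pmf(3, 1, 0.5))
```

Setting k = 1 in the first function reproduces the second, which is the sense in which the geometric distribution is a special case.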
The Bernoulli distribution is a special case of the binomial distribution when the
number of trials is one. Thus, the only possible outcomes for the Bernoulli
distribution are zero and one. Pr[R = 0] = (1 − p), and Pr[R = 1] = p.
The hypergeometric probability distribution applies to a situation where there are
only two possible outcomes to each trial, but the probability of success varies from
one trial to another in accordance with sampling from a finite population without
replacement. The total number of trials and the size of the population are then both
parameters. This distribution is described in various references including the book by
Mendenhall, Wackerly and Scheaffer (again see section 15.2). The book by Barnes
(see that same section of this book) gives a guideline for approximating the
hypergeometric distribution by the binomial distribution: the sample size should be less
than one tenth of the size of the finite set of items being sampled.
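The guideline for replacing the hypergeometric distribution by the binomial can be explored numerically. The sketch below uses the standard counting form of the hypergeometric probability, which is not given in this chapter, and a hypothetical lot of 1000 items containing 50 defectives, sampled 20 at a time (well under one tenth of the lot). All names and numbers are assumptions chosen for illustration.

```python
import math

def hypergeometric_pmf(r, sample_size, successes_in_population, population_size):
    """Pr[r successes in a sample drawn without replacement]."""
    return (math.comb(successes_in_population, r)
            * math.comb(population_size - successes_in_population, sample_size - r)
            / math.comb(population_size, sample_size))

def binomial_pmf(r, n, p):
    """Pr[R = r] for the binomial distribution with n trials."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)

N, K, n = 1000, 50, 20        # hypothetical lot size, defectives in lot, sample size
hyper = [hypergeometric_pmf(r, n, K, N) for r in range(n + 1)]
binom = [binomial_pmf(r, n, K / N) for r in range(n + 1)]
largest_gap = max(abs(h - b) for h, b in zip(hyper, binom))

for r in range(4):
    print(r, round(hyper[r], 4), round(binom[r], 4))
print("largest difference:", round(largest_gap, 4))
```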
Use of Computers: When a person has become familiar with the fundamental
ideas of discrete random variables, it is often convenient to use a number of Excel's
statistical functions, including the following:
HYPGEOMDIST() returns probabilities according to the hypergeometric
distribution.
NEGBINOMDIST() returns probabilities according to the negative binomial
distribution.
CRITBINOM() returns the limiting value of a parameter of the binomial
distribution to meet a requirement. This is useful in quality assurance.
In most cases the most convenient way to use functions on Excel, including
selection of arguments for the parameters, is probably to paste the required function
into the appropriate cell on a worksheet. The detailed procedure varies from one
version of Excel to another. On Excel 2000, for example, we click the cell where we
want to enter the function, then from the Insert menu we choose the function
category (for example, Statistical), then click the function (for example,
HYPGEOMDIST). Further details are given in part (b) of Appendix B.
These functions should not be used until the reader is familiar with the main
ideas of this chapter.
5.6 Relation Between Probability Distributions and
Frequency Distributions
This chapter has been concerned with probability distributions for discrete random
variables. Chapter 3 included descriptions and examples of frequency distributions
for discrete random variables. Probability distributions and frequency distributions
are similar, but of course there are important differences between them. The
probability distributions we have been considering are theoretical and depend on
assumptions, whereas frequency distributions are usually empirical, the result of
experiments. Probability distributions show predictable variations with the values of
the variable. Frequency distributions show additional random variations, that is,
variations which depend on chance.
In this section we will first look at comparisons of some probability distributions
with simulated frequency distributions for the same parameters. Then we will discuss
fitting binomial distributions and Poisson distributions to experimental frequency
distributions.
Random numbers can be used to simulate frequency distributions corresponding
to various discrete random variables. That is, random numbers can be combined with
the parameters of a probability distribution to produce a simulated frequency
distribution. The simulated frequency distributions discussed in this section were prepared
using Excel, but the detailed procedures are not relevant to the present discussion.
(a) Comparison of a Probability Distribution with Corresponding Simulated
Frequency Distributions
[Figure 5.14: Probability Distribution: Binomial with n = 10 and p = 0.26. Probability, p(x), from 0.00 to 0.30, plotted against value, x.]
Figure 5.14 shows a probability distribution for a binomial distribution with n =
10 and p = 0.26. Corresponding to this is Figure 5.15, which is for the same values
of n and p but shows two simulated relative frequency distributions. These are for
samples of size eight, that is, samples containing eight items each. As we have seen
before, relative frequencies are often used as estimates of probabilities. However,
with this small sample size the relative frequencies do not agree at all well with the
corresponding probabilities, and they do not agree with one another.
[Figure 5.15: Simulated Frequency Distributions for Eight Repetitions. Two panels, each showing relative frequency plotted against values, r, from 0 to 7.]
If the sample size is increased, agreement becomes better. Figure 5.16 shows two
simulated relative frequency distributions for samples of size forty, still for a
binomial distribution with n = 10 and p = 0.26. The graphs of Figure 5.16 still differ from
one another because of random fluctuations, but they are much more similar to one
another in shape than the graphs of Figure 5.15. Comparison to Figure 5.14 shows
that the general shape of the probability distribution is beginning to come through.
[Figure 5.16: Simulated Relative Frequency Distributions for Forty Repetitions. Two panels, each showing relative frequency plotted against values, r, from 0 to 7.]
Thus, we can see that the relative frequency distributions are both more consistent with one another and more similar to the corresponding probability distributions when they represent forty repetitions rather than eight repetitions. This seems reasonable. Huff points out that inadequate sample size often leads to incorrect or misleading conclusions. He gives some dramatic examples of this in his book How to Lie with Statistics (see section 15.2 for reference).
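The effect of sample size described above can be reproduced with a short simulation. The following is a minimal sketch in Python rather than the Excel procedure used for the figures; the function names and the seed are illustrative assumptions, not taken from the text.

```python
import random
from math import comb


def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n trials (binomial distribution)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def simulate_relative_frequencies(n, p, repetitions, rng):
    """Simulate `repetitions` binomial values; return relative frequencies for r = 0..n."""
    counts = [0] * (n + 1)
    for _ in range(repetitions):
        # One binomial value: count successes among n independent trials.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        counts[successes] += 1
    return [c / repetitions for c in counts]


def max_abs_difference(rel_freq, n, p):
    """Largest gap between a relative frequency and the matching probability."""
    return max(abs(rel_freq[r] - binomial_pmf(r, n, p)) for r in range(n + 1))


if __name__ == "__main__":
    rng = random.Random(1)  # arbitrary seed, chosen only for repeatability
    for repetitions in (8, 40, 4000):
        rel = simulate_relative_frequencies(10, 0.26, repetitions, rng)
        print(repetitions, round(max_abs_difference(rel, 10, 0.26), 3))
```

Running it with different seeds should show the largest discrepancy shrinking, on average, as the number of repetitions grows.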
(b) Fitting a Binomial Distribution
We often want to compare a set of data from observations with a theoretical probability distribution. Can the data be represented satisfactorily by a theoretical distribution? If so, the data can be represented very succinctly by the parameters of the theoretical distribution. Specifically, let us consider whether a set of data can be represented by a binomial distribution.
The binomial distribution has two parameters, n and p. In any practical case we will already know n, the number of trials. How can we estimate p, the probability of success in a single trial? An intuitive answer is that we can estimate p by the fraction of all the trials which were successes, that is, the proportion or relative frequency of success. It is possible to show mathematically that this intuitive answer is correct: it is an unbiased estimate of the parameter p.
Example 5.16
In Example 3.2 we considered the number of defective items in groups of six items coming off a production line in a factory. We found there were 14 defectives in sixty groups giving a total sample of 360 items, so the proportion defective was 14/360 = 0.0389.
Let us try to fit the observed frequency distribution of Table 3.2 by a binomial distribution. We have n = 6 and p is estimated (probably not very accurately) to be 0.0389. Then the probability of exactly r defective items in a sample of six items according to the binomial distribution is given by equation 5.9 as
$$\Pr[R = r] = {}_{6}C_{r}\,(0.0389)^{r}\,(0.9611)^{(6-r)}$$
This prediction of probability by the binomial distribution should be compared with the observed relative frequencies for various numbers of defectives. These can be obtained simply by dividing the frequencies of Table 3.2 by the total frequency of 60. Since 60 groups is not a very large number we should not expect the agreement to be very close.
The results are shown in Table 5.2 and Figure 5.17: a theoretical binomial probability of 0.788 can be compared with an observed relative frequency of 48/60 = 0.800, and so on.
Table 5.2: Comparison of Binomial Probability with Observed Relative Frequency
Number of        Binomial Probability,   Observed        Observed Relative
Defectives, r    Pr[R = r]               Frequency, f    Frequency, f/Σf
0                0.788                   48              0.800
1                0.191                   10              0.167
2                0.019                   2               0.033
3                0.001                   0               0
4                3 × 10^-5               0               0
5                5 × 10^-7               0               0
6                3 × 10^-9               0               0
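The fitting procedure of Example 5.16 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the book's Excel method: the function name `fit_binomial` is hypothetical, and the observed frequencies are those listed in the table above.

```python
from math import comb

# Observed frequencies from Example 5.16: number of defectives -> number of groups.
observed = {0: 48, 1: 10, 2: 2, 3: 0, 4: 0, 5: 0, 6: 0}


def fit_binomial(observed, n):
    """Estimate p as the overall proportion defective, then tabulate the fit.

    Returns (p_hat, rows); each row is
    (r, binomial probability, observed frequency, observed relative frequency).
    """
    groups = sum(observed.values())
    defectives = sum(r * f for r, f in observed.items())
    p_hat = defectives / (n * groups)  # proportion of all items that were defective
    rows = []
    for r in range(n + 1):
        prob = comb(n, r) * p_hat**r * (1 - p_hat) ** (n - r)
        freq = observed.get(r, 0)
        rows.append((r, prob, freq, freq / groups))
    return p_hat, rows


if __name__ == "__main__":
    p_hat, rows = fit_binomial(observed, n=6)
    print(f"estimated p = {p_hat:.4f}")
    for r, prob, freq, rel in rows:
        print(f"{r}  {prob:.3g}  {freq}  {rel:.3f}")
```

The sketch uses the unrounded estimate 14/360 rather than 0.0389, so digits beyond the third decimal place may differ slightly from a hand calculation.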
[Figure 5.17: Comparison of Relative Frequencies with Binomial Probabilities. Panel (a) Observed Distribution: Relative Frequency against Number of defectives. Panel (b) Binomial Distribution: Probability against Number of defectives, r.]
We can see that the comparison is reasonably good. In section 13.3 we will see a more quantitative comparison.
(c) Fitting a Poisson Distribution
We may have a set of data which we suspect can be represented by a Poisson distribution. If it is, we can describe it very compactly by the parameters of that distribution. In addition, there may be some implication (for example, regarding randomness) if the data can be represented by a Poisson distribution. Thus, we need to know how to find a Poisson distribution that will fit a set of data.
The Poisson distribution has only one parameter, $\mu$ or $\lambda t$. As we have seen in Chapter 3, the sample mean, $\bar{x}$, is an unbiased estimate of the population mean, $\mu$. Therefore, the first step in fitting a Poisson distribution to a set of data is to calculate the mean of the data. Then the relation for the Poisson distribution is used to calculate the probabilities of various numbers of occurrences if that distribution holds. These probabilities can be compared to the relative frequencies found by dividing the actual frequencies by the total frequency.
Example 5.17
The number of cars crossing a local bridge was counted for forty successive 6-minute intervals from 1:00 to 5:00 A.M. The numbers can be summarized as follows:
x_i, number of cars in 6-minute interval    f_i, frequency
0                                           2
1                                           7
2                                           10
3                                           8
4                                           6
5                                           3
6                                           3
7                                           1
≥ 8                                         0
Fit a Poisson Distribution to these data.
Answer: First, let us calculate the sample mean as an estimate of the population mean, $\mu$.
x_i      f_i     x_i f_i
0        2       0
1        7       7
2        10      20
3        8       24
4        6       24
5        3       15
6        3       18
7        1       7
≥ 8      0       0
Total    40      115
Then $\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i} = \dfrac{115}{40} = 2.875$. Then take $\mu = \lambda t = 2.875$ in 6 minutes.
Then $\lambda = \dfrac{\lambda t}{t} = \dfrac{2.875}{6} = 0.479$ cars/minute.
According to the Poisson Distribution, then, $\Pr[R = r] = (2.875)^{r} e^{-2.875} / r!$. It was mentioned previously that once one of the Poisson probabilities is calculated, others can be calculated conveniently using the recurrence relation of equation 5.14,
$$\Pr[R = r+1] = \left(\frac{\lambda t}{r+1}\right)\Pr[R = r].$$
Calculation of Poisson probabilities and relative frequencies gives the following results:
r        f_i     Pr[R = r]    Relative Frequency
0        2       0.0564       0.0500
1        7       0.1622       0.1750
2        10      0.2332       0.2500
3        8       0.2234       0.2000
4        6       0.1606       0.1500
5        3       0.0923       0.0750
6        3       0.0442       0.0750
7        1       0.0182       0.0250
≥ 8      0       0.0095       0
Total    40
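The Poisson fit of Example 5.17, including the recurrence relation, can be sketched as follows. This is an illustrative Python version (the function name `fit_poisson` is an assumption); it also computes expected frequencies by multiplying each probability by the total frequency.

```python
from math import exp

# Observed data from Example 5.17: cars per 6-minute interval -> frequency.
counts = {0: 2, 1: 7, 2: 10, 3: 8, 4: 6, 5: 3, 6: 3, 7: 1}


def fit_poisson(counts):
    """Fit a Poisson distribution by setting its mean equal to the sample mean.

    Returns (mean, probs, tail), where probs[r] = Pr[R = r] for r up to the
    largest observed value and tail is the probability of any larger value.
    """
    total = sum(counts.values())
    mean = sum(x * f for x, f in counts.items()) / total
    probs = [exp(-mean)]  # Pr[R = 0]
    for r in range(max(counts)):
        # Recurrence: Pr[R = r + 1] = (mean / (r + 1)) * Pr[R = r]
        probs.append(probs[-1] * mean / (r + 1))
    tail = 1.0 - sum(probs)
    return mean, probs, tail


if __name__ == "__main__":
    mean, probs, tail = fit_poisson(counts)
    total = sum(counts.values())
    print(f"sample mean = {mean}")
    for r, prob in enumerate(probs):
        print(f"{r}  Pr = {prob:.4f}  expected f = {prob * total:.2f}  observed f = {counts[r]}")
    print(f">= {len(probs)}  Pr = {tail:.4f}")
```

Because the tail is computed from unrounded probabilities, it may differ in the last digit from a value obtained by subtracting rounded table entries from 1.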
The frequencies from the problem statement are compared with the calculated expected frequencies in Figure 5.18. It can be seen that the agreement between recorded and fitted frequencies appears to be very good, in fact better than we might expect.
[Figure 5.18: Comparison of Relative Frequencies with Probabilities for the Poisson Distribution. Horizontal axis: Number of Cars; vertical axis: Relative Frequency or Probability.]
In section 13.3 we will see how to make a quantitative evaluation of the goodness of fit of two distributions. This example will be continued at that point.
Examples 5.16 and 5.17 have compared probabilities to relative frequencies. An alternative procedure is to calculate expected frequencies by multiplying each probability by the total frequency. Then the expected frequencies are compared with the observed frequencies. That procedure is logically equivalent to the comparison we have made here.
Problems
1. A sampling scheme for mechanical components from a production line calls for random samples, each consisting of eight components. Each component is classified as either good or defective. The results of 50 such samples are summarized in the table below.
Number of Defectives    Observed Frequency
0                       30
1                       17
2                       3
>2                      0
From these data estimate the probability that a single component will be defective. Calculate the probabilities of various numbers of defectives in a sample of eight components, and prepare a table to compare predicted probabilities according to the binomial distribution with observed relative frequencies for various numbers of defectives in a sample.
2. Electrical components are produced on a production line, then inspected. Each component is classified as good or defective. 360 successive components were grouped into samples, each containing six components. The results are summarized in the table below.
Number of Defectives    Observed Frequency
0                       34
1                       24
2                       2
>2                      0
From these data estimate the probability that a single component will be defective. Calculate the probabilities of various numbers of defectives in a sample of six components, and prepare a table to compare predicted probabilities according to the binomial distribution with observed relative frequencies for various numbers of defectives in a sample.
3. A study of four blocks containing 52 one-hour parking spaces was carried out and the results are given in the following table.
Number of vacant one-hour parking spaces per observation period: 0 1 2 3 4 5 6
Observed frequency: 31 45 20 15 7 3
Assuming that the data follow a Poisson distribution, determine:
a) the mean number of vacant parking spaces,
b) the standard deviation both (i) from the given data and (ii) from the theoretical distribution, and
c) the probability of finding one or more vacant one-hour parking spaces, calculating from the theoretical distribution.
4. In analysis of the treated water from a sewage treatment process, liquid containing harmful cells was placed on a slide and examined systematically under a microscope. One hundred counts of the number of harmful cells in 1 mm by 1 mm squares were made, with the following frequencies being obtained.
Count        0   1   2   3    4    5    6    7    8   9   10   11   12
Frequency    1   3   8   14   17   19   14   12   6   2   2    2    0
Fit a Poisson distribution to these data. Calculate expected Poisson frequencies to compare with the observed frequencies. Is the fit reasonably good?
5. An air filter has been designed to remove particulate matter. A test calls for 40 specimens of air to be tested. Of 40 specimens, it was found that there were no particles in 15 specimens, one particle in 10 specimens, two particles in 8 specimens, three particles in 5 specimens, and four particles in 2 specimens.
a) What type of distribution should the data follow? What are the necessary assumptions?
b) Estimate the mean and standard deviation of the frequency distribution from the given data.
c) What is the theoretical standard deviation for the probability distribution?
d) Using probabilities calculated from the theoretical distribution, what is the probability that among ten specimens there would be eight or more with no particles?
6. A section of an oil field has been divided into 48 equal sub-areas. Counting the oil wells in the 48 sub-areas gives the following frequency distribution:
Number of oil wells    0   1    2    3    4   5   6   7
Number of sub-areas    5   10   11   10   6   4   0   2
Is there any evidence from these data that the oil wells are not distributed randomly throughout the section of the oil field?
CHAPTER 6
Probability Distributions of Continuous Variables
For this chapter the reader needs a good knowledge of integral calculus and the material in sections 2.1, 2.2, 5.1, and 5.2.
If a variable is continuous, between any two possible values of the variable are an infinite number of other possible values, even though we cannot distinguish some of them from one another in practice. It is therefore not possible to count the number of possible values of a continuous variable. In this situation calculus provides the logical means of finding probabilities.
6.1 Probability from the Probability Density Function
(a) Basic Relationships
The probability that a continuous random variable will be between limits a and b is given by an integral, or the area under a curve.
$$\Pr[a < X < b] = \int_{a}^{b} f(x)\,dx \tag{6.1}$$
The function f(x) in equation 6.1 is called a probability density function. The probability that the continuous random variable, X, is between a and b corresponds to the area under the curve representing the probability density function between the limits a and b. This is the cross-hatched area in Figure 6.1. Compare this relation with the relation for the probability that a discrete random variable is between limits a and b, which is the sum of the probability functions for all values of the variable X between a and b, $\sum_{a \le x_i \le b} p(x_i)$.
[Figure 6.1: Probability for a Continuous Random Variable. The area under the Probability Density Function, f(x), between x = a and x = b gives the probability.]
The cumulative distribution function for a continuous random variable is given by the integral of the probability density function between $x = -\infty$ and $x = x_1$, where $x_1$ is a limiting value. This corresponds to the area under the curve from $-\infty$ to $x_1$. The cumulative distribution function is often represented by $F(x_1)$ or $F(x)$.
$$\Pr[X \le x_1] = F(x_1) = \int_{-\infty}^{x_1} f(x)\,dx \tag{6.2}$$
This expression should be compared with the expression for the cumulative distribution function for a discrete random variable, which is given by equation 5.1 to be $\sum_{x_i \le x_1} p(x_i)$. Thus, a summation of individual probabilities (for a discrete case) corresponds to an integral of the probability density function with respect to the variable (for a continuous case),
$$\text{i.e.,}\quad \underbrace{\sum p(x_i)}_{\text{(Discrete)}} \;\longleftrightarrow\; \underbrace{\int f(x)\,dx}_{\text{(Continuous)}} \tag{6.3}$$
To include all conceivable values of the variable X, the limits in equation 6.2 become from $x = -\infty$ to $x = +\infty$. The probability of a value in that interval must be 1. Then we have
$$F(\infty) = \int_{-\infty}^{+\infty} f(x)\,dx = 1 \tag{6.4}$$
In many cases only values of the variable in a certain interval are possible. Then outside that interval, the probability density function is zero. Intervals in which the probability density function is identically zero can be omitted in the integration.
Since any probability must be between 0 and 1, as we have seen previously, the probability density function must always be positive or zero, but not negative.
$$f(x) \ge 0 \tag{6.5}$$
Example 6.1
A probability density function is given by:
f(x) = 0 for x < 0
f(x) = (3/8) x^2 for 0 < x < 2
f(x) = 0 for x > 2
A graph of this density function is shown in Figure 6.2.
[Figure 6.2: A Simple Probability Density Function. Graph of f(x) against x.]
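It is not hard to show that f(x) meets the requirements for a probability density function. First, since $x^2$ is never negative for any real value of x, f(x) is always greater than or equal to zero. Second, the integral of the probability density function from $-\infty$ to $+\infty$ is equal to 1, as we can show by integration:
$$F(\infty) = \int_{-\infty}^{+\infty} f(x)\,dx = \int_{-\infty}^{0} (0)\,dx + \int_{0}^{2} \frac{3}{8}x^{2}\,dx + \int_{2}^{\infty} (0)\,dx$$
$$= 0 + \left(\frac{3}{8}\right)\left[\frac{x^{3}}{3}\right]_{0}^{2} + 0$$
$$= \left(\frac{3}{8}\right)\left(\frac{1}{3}\right)\left(2^{3} - 0\right)$$
$$= 1$$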
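The two requirements checked above can also be verified numerically. The following Python sketch uses a simple midpoint rule, which is an assumption made here for illustration (the text itself integrates analytically); the helper names `pdf` and `integrate` are hypothetical.

```python
def pdf(x):
    """Density of Example 6.1: (3/8) x^2 on 0 < x < 2, zero elsewhere."""
    return 3.0 * x**2 / 8.0 if 0.0 < x < 2.0 else 0.0


def integrate(f, a, b, steps=100_000):
    """Approximate the integral of f from a to b with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    # Requirement 1: f(x) >= 0 at every point checked on a coarse grid.
    print(all(pdf(-1.0 + 0.01 * i) >= 0.0 for i in range(401)))
    # Requirement 2: total area should be very close to 1.
    print(round(integrate(pdf, -1.0, 3.0), 4))
```

Integrating over a range wider than 0 to 2 makes the point that intervals where the density is zero contribute nothing to the total.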
(b) A Simple Illustration: Waiting Time
A student arrives at a bus stop and waits for the bus. He knows that the bus comes every 15 minutes (which we will assume is exact), but he doesn't know when the next bus will come. Let's assume the bus is as likely to come in any one instant as in any other within the next 15 minutes. Let the time the student has to wait for the bus be x minutes. Let us first explore the probabilities intuitively, and then apply equations 6.1 and 6.2.
i) What is the probability that the waiting time will be less than or equal to 15 minutes?
Since we know that the bus comes every 15 minutes, this probability must be 1.
ii) What is the probability that the waiting time will be less than 5 minutes?
Since the bus is as likely to come in any one instant as in any other to a maximum of 15 minutes, the probability that the waiting time is less than 5 minutes must be 5/15 = 1/3.
Similarly, the probability that the waiting time is less than 10 minutes must be 10/15 = 2/3.
iii) Then we can generalize the expression for probability. The probability that the waiting time will be less than x minutes, where 0 ≤ x ≤ 15, must be x/15.
iv) What is the probability that the waiting time will be between 5 minutes and 10 minutes? This must be:
Pr[5 < X < 10] = Pr[X < 10] - Pr[X < 5] = 10/15 - 5/15 = 5/15 or 1/3.
Comparison to equation 6.1 with a = 5 and b = 10 indicates that:
$$\Pr[5 < X < 10] = \int_{5}^{10} f(x)\,dx = \frac{10}{15} - \frac{5}{15}$$
What simple expression for f(x) will integrate with respect to x to give x/15? It must be 1/15.
Then the probability density function must be given by:
f(x) = 0 for x < 0 (since waiting time can't be negative)
f(x) = 1/15 for 0 < x < 15
f(x) = 0 for x > 15 (since waiting time can't be more than 15 minutes)
Let's check the integral of f(x) for x between 0 and 15, the only interval for which f(x) is not equal to zero. We have $\int_{0}^{15} \frac{1}{15}\,dx = \frac{15 - 0}{15} = 1$ (as required), so the constant value, 1/15, is correct.
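v) By comparison to equation 6.2 the probability that the waiting time will be less than 5 minutes must be:
$$F(5) = \int_{-\infty}^{5} f(x)\,dx = \int_{-\infty}^{0} 0\,dx + \int_{0}^{5} \frac{1}{15}\,dx = 0 + \left(\frac{1}{15}\right)(5 - 0) = \frac{5}{15} \text{ or } \frac{1}{3}$$
This agrees with part ii.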
vi) Using the expressions for the probability density function from part iv, the general expression for the cumulative distribution function for this illustration must be:
$$F(x_1) = \int_{-\infty}^{x_1} 0\,dx = 0 \quad \text{for } x_1 < 0$$
$$F(x_1) = 0 + \int_{0}^{x_1} \frac{1}{15}\,dx = \frac{x_1}{15} \quad \text{for } 0 < x_1 < 15$$
$$F(x_1) = 0 + \int_{0}^{15} \frac{1}{15}\,dx + \int_{15}^{x_1} 0\,dx = 0 + \frac{15}{15} + 0 = 1 \quad \text{for } x_1 > 15$$
The probability density function and the cumulative distribution function are shown graphically in Figure 6.3.
[Figure 6.3(a): Probability Density Function for Waiting Time for a Bus. f(x) = 1/15 between x = 0 and x = 15 minutes.]
[Figure 6.3(b): Cumulative Distribution Function for Waiting Time for a Bus. Horizontal axis: x, minutes.]
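The piecewise cumulative distribution function for the waiting time translates directly into code. The sketch below is illustrative only; the function names and the `interval` parameter are assumptions introduced here.

```python
def waiting_cdf(x1, interval=15.0):
    """F(x1) for a waiting time uniformly distributed between 0 and `interval` minutes."""
    if x1 <= 0.0:
        return 0.0  # waiting time cannot be negative
    if x1 >= interval:
        return 1.0  # the bus is certain to have come by then
    return x1 / interval


def prob_between(a, b, interval=15.0):
    """Pr[a < X < b] as a difference of cumulative probabilities."""
    return waiting_cdf(b, interval) - waiting_cdf(a, interval)


if __name__ == "__main__":
    print(waiting_cdf(5.0))         # part ii: less than 5 minutes
    print(prob_between(5.0, 10.0))  # part iv: between 5 and 10 minutes
```

Expressing an interval probability as a difference of two cumulative values is the same step used in part iv.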
(c) Example 6.2
A probability density function is given by:
f(x) = 0 for x < 1
f(x) = b/x^2 for 1 < x < 5
f(x) = 0 for x > 5
a) What is the value of b?
b) From this obtain the probability that X is between 2 and 4.
c) What is the probability that X is exactly 2?
d) Find the cumulative distribution function of X.
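Answer:
a) To satisfy equation 6.4:
$$\int_{-\infty}^{1} 0\,dx + \int_{1}^{5} \frac{b}{x^{2}}\,dx + \int_{5}^{\infty} 0\,dx = 1$$
Therefore
$$b\int_{1}^{5} x^{-2}\,dx = 1$$
$$b\left[-x^{-1}\right]_{1}^{5} = 1$$
$$b\left(-\frac{1}{5} + 1\right) = 1$$
$$b\left(\frac{4}{5}\right) = 1$$
$$b = 1.25$$
(In Example 6.1 the constant 3/8 was obtained in the same way.)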
Then a graph of the density function for this example is shown below:
[Figure 6.4: Graph of Function for Example 6.2. f(x) against x.]
b) $$\Pr[2 < X < 4] = \int_{2}^{4} 1.25\,x^{-2}\,dx = 1.25\left[-x^{-1}\right]_{2}^{4} = (1.25)\left(-\frac{1}{4} + \frac{1}{2}\right) = 0.3125$$
c) $$\Pr[X = 2 \text{ exactly}] = \int_{2}^{2} 1.25\,x^{-2}\,dx = 1.25\left[-x^{-1}\right]_{2}^{2} = 0$$
Note: The result obtained here is important and applies to all continuous random variables. The probability that any continuous random variable is exactly equal to a single quantity is zero. We will see this again in Example 7.2.
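d) For $x_1 < 1$:
$$F(x_1) = \int_{-\infty}^{x_1} 0\,dx = 0$$
For $1 < x_1 < 5$:
$$F(x_1) = 0 + \int_{1}^{x_1} 1.25\,x^{-2}\,dx = (1.25)\left[-x^{-1}\right]_{1}^{x_1} = (1.25)\left(-\frac{1}{x_1} + 1\right) = 1.25\left(1 - \frac{1}{x_1}\right)$$
For $5 < x_1 < \infty$:
$$F(x_1) = \int_{-\infty}^{x_1} f(x)\,dx = \int_{-\infty}^{1} 0\,dx + \int_{1}^{5} 1.25\,x^{-2}\,dx + \int_{5}^{x_1} 0\,dx$$
$$= 0 + (1.25)\left[-x^{-1}\right]_{1}^{5} + 0 = (1.25)\left(-\frac{1}{5} + 1\right) = 1$$
Then to summarize, the cumulative distribution function of X is:
$F(x_1) = 0$ for $x_1 < 1$
$F(x_1) = 1.25\left(1 - \dfrac{1}{x_1}\right)$ for $1 < x_1 < 5$
and $F(x_1) = 1$ for $x_1 > 5$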
Problems
1. A probability density function for x in radians is given by:
f(x) = 0 for x < -π/2
f(x) = (1/2) cos x for -π/2 < x < π/2
f(x) = 0 for x > π/2
a) Find the probability that X is between 0 and π/4.
b) Find an expression for the corresponding cumulative distribution function, F(x), for -π/2 ≤ x ≤ π/2.
c) If x = π/2, what is the value of f(x)? Explain why this is or is not a reasonable result.
d) What is the probability that X is exactly π/4? Explain why this is or is not a reasonable result.
e) Repeat part (a) using F(x).
2. A probability density function is given by:
f(x) = 0 for x < -2
f(x) = 1/3 for -2 < x < 0
f(x) = (1/3)(1 - x/2) for 0 < x < 2
f(x) = 0 for x > 2
a) What is the probability that X is between 0 and +1?
b) Find the cumulative distribution function of X for each interval. Is the cumulative distribution function for x > 2 reasonable? Why?
c) Sketch the cumulative distribution function, showing scales.
d) Use the results of part b to find the probability that X is between 0 and 1.
f) Find the median of this probability distribution.
3. A radar telemetry tracking station requires a vast quantity of high-quality magnetic tape. It has been established that the distance X (in meters) between tape-surface flaws has the following probability density function:
f(x) = 0.005 e^(-0.005x) for x ≥ 0
f(x) = 0 otherwise
a) Plot a graph of f(x) versus x for 0 ≤ x ≤ 800.
b) Find the cumulative probability distribution function, $F(x_1) = \int_{-\infty}^{x_1} f(x)\,dx$ for $x_1 > 0$.
c) Suppose one flaw in the tape-surface has been identified. Calculate:
(i) the probability that an additional flaw will be found within the next 100 m of tape.
(ii) the probability that an additional flaw will not be found for at least 200 m.
(iii) the probability that an additional flaw will be found between 100 and 200 m from the flaw already identified.
4. A continuous random variable X has the following probability density function:
f(x) = k x^(1/3) for 0 < x < 1
f(x) = 0 for x < 0 and x > 1
a) Find k.
b) Find the cumulative distribution function.
c) Find the probability that 0.3 < X < 0.6.
6.2 Expected Value and Variance
We saw in Chapter 5 that the mathematical expectation or expected value of a discrete random variable is a mean result for an infinitely large number of trials, so it is a mean value that would be approximated by a large but finite number of trials. This holds also for a continuous random variable. For a discrete random variable the expected value is found by adding up the product of each possible outcome with its probability, giving
$$E(X) = \sum_{\text{all } x_i} (x_i)\,\Pr[x_i].$$
For a continuous random variable this becomes (using equation 6.3) the corresponding integral involving the probability density function:
$$E(X) = \int_{-\infty}^{+\infty} x\,f(x)\,dx \tag{6.6}$$
We saw in Chapter 5 also that the variance of a discrete random variable is the expectation of $(x_i - \mu)^2$. This carries over to a continuous random variable and becomes:
$$\sigma_x^{2} = E\left[(x - \mu_x)^{2}\right] = \int_{-\infty}^{+\infty} (x - \mu_x)^{2}\,f(x)\,dx \tag{6.7}$$
The alternative form given by equation 5.7
$$\sigma_x^{2} = E(X^{2}) - \mu_x^{2} \tag{6.8}$$
still holds and is generally faster for calculations. For continuous random variables
$$E(X^{2}) = \int_{-\infty}^{+\infty} x^{2}\,f(x)\,dx \tag{6.9}$$
Example 6.3
The random variable of Example 6.1 has the probability density function given by:
f(x) = 0 for x < 0
f(x) = (3/8) x^2 for 0 < x < 2
f(x) = 0 for x > 2
a) Find the probability that X is between 1 and 2.
b) Find the cumulative distribution function of X.
c) Find the expected value of X.
d) Find the variance and standard deviation of X.
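Answer:
a) $$\Pr[1 < X < 2] = \int_{1}^{2} f(x)\,dx = \int_{1}^{2} \frac{3}{8}x^{2}\,dx = \left(\frac{3}{8}\right)\left[\frac{x^{3}}{3}\right]_{1}^{2} = \left(\frac{1}{8}\right)\left(2^{3} - 1^{3}\right) = \frac{7}{8}$$
b) $$\Pr[X < x_1] = F(x_1) = \int_{-\infty}^{x_1} f(x)\,dx$$
If $x_1 < 0$:
$$F(x_1) = \int_{-\infty}^{x_1} (0)\,dx = 0$$
If $0 < x_1 < 2$:
$$F(x_1) = \int_{-\infty}^{0} (0)\,dx + \int_{0}^{x_1} \frac{3}{8}x^{2}\,dx = 0 + \left(\frac{3}{8}\right)\left[\frac{x^{3}}{3}\right]_{0}^{x_1} = \frac{x_1^{3}}{8}$$
If $x_1 > 2$:
$$F(x_1) = \int_{-\infty}^{0} (0)\,dx + \int_{0}^{2} \frac{3}{8}x^{2}\,dx + \int_{2}^{x_1} (0)\,dx = 0 + \left(\frac{3}{8}\right)\left[\frac{x^{3}}{3}\right]_{0}^{2} + 0 = 1$$
Then the cumulative distribution function is:
$F(x_1) = 0$ for $x_1 < 0$
$F(x_1) = \dfrac{x_1^{3}}{8}$ for $0 < x_1 < 2$
$F(x_1) = 1$ for $x_1 > 2$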
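c) $$E(X) = \mu_x = \int_{-\infty}^{+\infty} x\,f(x)\,dx = \int_{-\infty}^{0} (x)(0)\,dx + \int_{0}^{2} (x)\left(\frac{3x^{2}}{8}\right)dx + \int_{2}^{\infty} (x)(0)\,dx$$
$$= \int_{0}^{2} \frac{3}{8}x^{3}\,dx = \left(\frac{3}{8}\right)\left[\frac{x^{4}}{4}\right]_{0}^{2} = \left(\frac{3}{32}\right)(16 - 0) = 1.5$$
d) $$E(X^{2}) = \int_{-\infty}^{+\infty} x^{2}\,f(x)\,dx = \int_{-\infty}^{0} (x^{2})(0)\,dx + \int_{0}^{2} (x^{2})\left(\frac{3x^{2}}{8}\right)dx + \int_{2}^{\infty} (x^{2})(0)\,dx$$
$$= \int_{0}^{2} \frac{3}{8}x^{4}\,dx = \left(\frac{3}{8}\right)\left[\frac{x^{5}}{5}\right]_{0}^{2} = \left(\frac{3}{(8)(5)}\right)(32 - 0) = \frac{96}{40} = 2.4$$
Then $\sigma_x^{2} = E(X^{2}) - \mu_x^{2} = 2.4 - (1.5)^{2} = 0.150$
and $\sigma_x = \sqrt{0.150} = 0.387$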
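Equations 6.6, 6.8 and 6.9 can be applied numerically to the density of Example 6.3. The sketch below assumes a midpoint-rule integrator, introduced here only as a check on the analytical results; the function names are hypothetical.

```python
def pdf(x):
    """Density of Example 6.3: (3/8) x^2 on 0 < x < 2, zero elsewhere."""
    return 3.0 * x * x / 8.0 if 0.0 < x < 2.0 else 0.0


def integrate(f, a, b, steps=100_000):
    """Approximate the integral of f from a to b with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def moments(density, a, b):
    """Return (mean, variance, standard deviation) for a density that is zero outside [a, b]."""
    mean = integrate(lambda x: x * density(x), a, b)         # equation 6.6
    mean_sq = integrate(lambda x: x * x * density(x), a, b)  # equation 6.9
    variance = mean_sq - mean**2                             # equation 6.8
    return mean, variance, variance**0.5


if __name__ == "__main__":
    mean, variance, sd = moments(pdf, 0.0, 2.0)
    print(round(mean, 3), round(variance, 3), round(sd, 3))
```

The same `moments` helper could be reused with any other density that vanishes outside a finite interval.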
Example 6.4
In the illustration of section 6.1(b) the probability density function for the waiting time was given by
f(x) = 0 for x < 0
f(x) = 1/15 for 0 < x < 15
f(x) = 0 for x > 15
a) Find the expected value of the waiting time, X minutes.
b) Find the variance and standard deviation of the waiting time.
c) What is the probability that the waiting time is within two standard deviations of its expected (mean) value?
Answer:
a) $$E(X) = \int_{-\infty}^{+\infty} x\,f(x)\,dx = \int_{0}^{15} (x)\left(\frac{1}{15}\right)dx = \left(\frac{1}{15}\right)\left[\frac{x^{2}}{2}\right]_{0}^{15} = \frac{1}{(15)(2)}(225 - 0) = \frac{15}{2} = 7.5$$
Then the expected value of the waiting time, or the mean, $\mu_x$, of the probability distribution, is 7.5 minutes. This seems reasonable, as it is halfway between the minimum waiting time, 0 minutes, and the maximum waiting time, 15 minutes.
b) $$E(X^{2}) = \int_{-\infty}^{+\infty} x^{2}\,f(x)\,dx = \int_{0}^{15} (x^{2})\left(\frac{1}{15}\right)dx = \left(\frac{1}{15}\right)\left[\frac{x^{3}}{3}\right]_{0}^{15} = \frac{1}{(15)(3)}\left(15^{3} - 0\right) = 75$$
$$\sigma_x^{2} = E(X^{2}) - \mu_x^{2} = 75 - (7.5)^{2} = 18.75$$
Then the variance of the waiting time is 18.75 minute^2, and the standard deviation is $\sqrt{18.75} = 4.33$ minutes.
c) The interval which is within two standard deviations of the expected value is $(\mu_x - 2\sigma_x)$ to $(\mu_x + 2\sigma_x)$, or from 7.5 - (2)(4.33) = -1.16 to 7.5 + (2)(4.33) = 16.16 minutes.
Then we have:
$$\Pr\left[(\mu_x - 2\sigma_x) < X < (\mu_x + 2\sigma_x)\right] = \Pr[-1.16 < X < 16.16]$$
$$= \int_{-1.16}^{0} 0\,dx + \int_{0}^{15} \frac{1}{15}\,dx + \int_{15}^{16.16} 0\,dx = 0 + 1 + 0 = 1$$
The probability that the waiting time for this particular probability distribution is within two standard deviations of its expected (mean) value is 1 or 100%. We will find that other distributions often give different results. For example, a different result is obtained for the normal distribution, as we will see in the next chapter.
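The calculation of Example 6.4 generalizes to any density that is constant between two limits a and b. The following Python sketch relies on the standard closed forms for that case, mean (a + b)/2 and variance (b - a)^2/12, which agree with the values 7.5 and 18.75 found above; the function name and the parameter k are assumptions made for illustration.

```python
def uniform_summary(a, b, k):
    """Mean, standard deviation, and Pr[|X - mean| < k * sd] for a density constant on (a, b)."""
    mean = (a + b) / 2.0
    sd = ((b - a) ** 2 / 12.0) ** 0.5
    # Clip the k-sigma interval to (a, b): the density is zero outside it.
    lo = max(a, mean - k * sd)
    hi = min(b, mean + k * sd)
    return mean, sd, (hi - lo) / (b - a)


if __name__ == "__main__":
    mean, sd, prob = uniform_summary(0.0, 15.0, k=2)
    print(mean, round(sd, 2), prob)
    print(round(uniform_summary(0.0, 15.0, k=1)[2], 4))
```

With k = 1 the interval lies entirely inside (0, 15), so the probability falls below 1, which is the kind of comparison the later problems ask for.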
Problems
1. Given f(x) = b/x^2 for 1 < x < 3
f(x) = 0 for x < 1 and x > 3
a) Determine the value of b that will make f(x) a probability density function.
b) Find the cumulative probability distribution function and use it to determine the probability that X is greater than 2 but less than 3.
c) Find the probability that X is exactly equal to 2.
d) Find the mean of this probability distribution.
e) Find the standard deviation of this probability distribution.
2. An electrical voltage is determined by the probability density function
f(x) = 1/2 for 0 ≤ x ≤ 2
f(x) = 0 for all other values of x
(This is a uniform distribution.)
a) Find its cumulative distribution function for all values of x.
b) Find the mean of this probability distribution.
c) Find its standard deviation.
d) What is the probability that the voltage is within two standard deviations of its mean?
3. An electrical voltage is determined by the probability density function
f(x) = 1 for 1 ≤ x ≤ 2
f(x) = 0 for all other values of x
(This is a uniform distribution.)
a) Find its cumulative distribution function for all values of x.
b) Find the mean of this probability distribution.
c) Find its standard deviation.
d) What is the probability that the voltage is within one standard deviation of its mean?
4. The time between arrivals of trucks at a warehouse is a continuous random variable. The probability of time between arrivals is given by the probability density function for which
f(t) = 4 e^(-4t) for t ≥ 0
f(t) = 0 for t < 0
where t is time in hours. (This is an exponential distribution. See section 6.3.)
a) What is the probability that the time between arrivals of the first and second trucks is less than 5 minutes?
b) Find the mean time between arrivals of trucks, μ hours.
c) Find the standard deviation of time between arrivals of trucks, σ hours.
d) What is the probability that the waiting time between arrivals of trucks will be between (μ - σ) hours and (μ + σ) hours?
e) What is the probability that the time between arrivals of trucks at the warehouse will be between (μ - 2σ) hours and (μ + 2σ) hours?
5. The probability of failure of a mechanical device as a function of time is given by the following probability density function:
f(t) = 3 e^(-3t) for t ≥ 0
f(t) = 0 for t < 0
where t is time in months. (This is an exponential distribution. See section 6.3.)
a) Find the mean of the probability distribution. This is the mean lifetime of the device.
b) Find the standard deviation of the probability distribution.
c) What is the probability that the device will fail within one standard deviation of its mean lifetime?
d) What is the probability that the device will fail within two standard deviations of its mean lifetime?
6.3 Extension: Useful Continuous Distributions
The normal distribution is the continuous distribution which is by far the most used by engineers; it will be considered in Chapter 7. However, a number of others are also used very widely. Some are based on the normal distribution, and the corresponding tests assume that the underlying population is at least approximately normally distributed. We will encounter some of these continuous distributions in Chapters 9, 10 and 13 because they correspond to statistical tests used very frequently. These are the t-distribution, the F-distribution, and the chi-squared distribution.
The other continuous distributions which should be mentioned here are the uniform distribution, the exponential distribution, the Weibull distribution, the beta distribution, and the gamma distribution. Others are important in various specialized applications.
The uniform distribution is very simple. Its probability density function is a constant in a particular interval (say for a < X < b) and zero outside that interval. We have already seen an example of it in the waiting time for a bus, used as a simple illustration of a continuous distribution in section 6.1, and it has appeared in some of the problems. It is sometimes used to model errors in electrical communication with pulse code modulation. Electrical noise, on the other hand, is often modeled by a normal distribution.
The exponential distribution has the following probability density function:
$$f(x) = \lambda e^{-\lambda x} \ \text{ for } x \ge 0, \qquad f(x) = 0 \ \text{ for } x < 0 \tag{6.10}$$
where $\lambda$ is a constant closely related to the mean and standard deviation.
For x > 0 the cumulative distribution function for the exponential distribution is found easily by integration:
$$F(x_1) = \Pr[0 < X < x_1] = \int_{0}^{x_1} \lambda e^{-\lambda x}\,dx = 1 - e^{-\lambda x_1} \tag{6.11}$$
The exponential distribution is related to the Poisson distribution, although the exponential distribution is continuous whereas the Poisson distribution is discrete. The Poisson distribution gives the probabilities of various numbers of random events in a given interval of time or space when the possible number of discrete events is much larger than the average number of events in the given interval. If the variable is time, the exponential distribution gives the probability distribution of the time between successive random events for the same conditions as apply to the Poisson distribution.
The following expression can be found in tables of integrals:
$$\int_{0}^{\infty} x^{n} e^{-ax}\,dx = \frac{n!}{a^{(n+1)}} \tag{6.12}$$
Use of it greatly reduces the labor of finding expected values and variances for the exponential distribution.
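As an illustration of how equation 6.12 shortens the work, the following worked derivation applies it to the exponential density of equation 6.10 with $a = \lambda$, assuming $\lambda > 0$. It is a sketch added for clarity and uses only equations 6.6, 6.8, 6.9 and 6.12.

```latex
\begin{align*}
E(X)       &= \int_{0}^{\infty} x \,\lambda e^{-\lambda x}\,dx
            = \lambda \cdot \frac{1!}{\lambda^{2}} = \frac{1}{\lambda}
            && \text{(equation 6.12 with } n = 1\text{)} \\
E(X^{2})   &= \int_{0}^{\infty} x^{2}\,\lambda e^{-\lambda x}\,dx
            = \lambda \cdot \frac{2!}{\lambda^{3}} = \frac{2}{\lambda^{2}}
            && \text{(equation 6.12 with } n = 2\text{)} \\
\sigma_x^{2} &= E(X^{2}) - \mu_x^{2}
            = \frac{2}{\lambda^{2}} - \frac{1}{\lambda^{2}} = \frac{1}{\lambda^{2}},
            \qquad \sigma_x = \frac{1}{\lambda}
\end{align*}
```

So for the exponential distribution the mean and the standard deviation are both equal to $1/\lambda$, which is the sense in which $\lambda$ is closely related to both.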
The exponential distribution is used for studies of reliability, which will be discussed very briefly in section 6.4, and of queuing theory. Queuing theory gives probability as a function of waiting time in a queue for service. An example might be: what is the probability that the time between arrival of one customer and of the next at a service counter will be more than a stated time, such as three minutes?
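A question of this kind follows directly from equation 6.11, since the probability of a time longer than t is 1 - F(t). The Python sketch below is illustrative; the arrival rate of 0.5 customers per minute is a hypothetical value chosen for the example, not a figure from the text.

```python
from math import exp


def exponential_cdf(x1, lam):
    """F(x1) = 1 - exp(-lam * x1) for x1 > 0 (equation 6.11), and 0 otherwise."""
    return 1.0 - exp(-lam * x1) if x1 > 0.0 else 0.0


def prob_longer_than(t, lam):
    """Probability that the time between successive events exceeds t."""
    return 1.0 - exponential_cdf(t, lam)


if __name__ == "__main__":
    rate = 0.5  # hypothetical arrival rate, customers per minute
    print(round(prob_longer_than(3.0, rate), 4))
```

The rate and the time must be in consistent units, here customers per minute and minutes.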
The Weibull distribution, the beta distribution, and the gamma distribution are more complicated, mainly because each has two independent parameters. Both the Weibull distribution and the gamma distribution give the exponential distribution with particular choices of one of their two parameters. These distributions are discussed more fully in the books by Miller, Freund, and Johnson and by Ross (see List of Selected References, section 15.2), and all but the gamma distribution are discussed in the book by Vardeman.
6.4 Extension: Reliability
What is the probability that an engineering device will function as specified for a particular length of time under specified conditions? How will this probability be modified if we put further components in series or in parallel with one another? These are the sorts of questions which are addressed in the study of engineering reliability.
Reliability is applied in many areas of engineering, including design of mechanical devices, electronic equipment, and power transmission systems. Although failures of supply of electricity to factories, offices, and residences were once frequent, they have become much less frequent as engineers have devoted more attention to reliability. The concepts of reliability have been exceedingly important to manned flights in space.
The study of reliability makes use of the exponential distribution, the gamma distribution, and the Weibull distribution. Theory has been developed for many applications.
A general reference book on the use of reliability in engineering is by Billinton and Allan (see List of Selected References in section 15.2).
CHAPTER 7
The Normal Distribution
This chapter requires a good knowledge of the material covered in sections 2.1, 2.2, 3.1, 3.2, and 4.4. Chapter 6 is also helpful as background.
The normal distribution is the most important of all probability distributions. It is applied directly to many practical problems, and several very useful distributions are based on it. We will encounter these other distributions later in this book.
7.1 Characteristics
Many empirical frequency distributions have the following characteristics:
1. They are approximately symmetrical, and the mode is close to the centre of the distribution.
2. The mean, median, and mode are close together.
3. The shape of the distribution can be approximated by a bell: nearly flat on top, then decreasing more quickly, then decreasing more slowly toward the tails of the distribution. This implies that values close to the mean are relatively frequent, and values farther from the mean tend to occur less frequently. Remember that we are dealing with a random variable, so a frequency distribution will not fit this pattern exactly. There will be random variations from this general pattern. Remember also that many frequency distributions do not conform to this pattern.
We have already seen a variety of frequency distributions in Chapter 4, and many other types of distribution occur in practice.
Example 4.2 showed data on the thickness of a particular metal part of an optical instrument as items came off a production line. A histogram for 121 items is shown in Figure 4.4, reproduced here.
[Figure 4.4: Histogram of Thickness of Metal Part. Horizontal axis: Thickness, mm (3.220 to 3.570); left vertical axis: Class Frequency per Class Width of 0.05 mm; right vertical axis: Relative Class Frequency.]
We can see that the characteristics stated above are present, at least approximately, in Figure 4.4. Random variation (and the arbitrary division into classes for the histogram) could reasonably be responsible for deviation from a smooth bell shape.
A theoretical distribution that has the stated characteristics and can be used to approximate many empirical distributions was devised more than two hundred years ago. It is called the normal probability distribution, or the normal distribution. It is sometimes called the Gaussian distribution, but other mathematicians developed it earlier than Gauss did. It was soon found to approximate the distribution of many errors of measurement.
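7.2 Probability from the Probability Density Function
The probability density function for the normal distribution is given by:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}} \tag{7.1}$$
where $\mu$ is the mean of the theoretical distribution, $\sigma$ is the standard deviation, and $\pi$ = 3.14159... This density function extends from $-\infty$ to $+\infty$. Its shape is shown in Figure 7.1 below. The first scale on Figure 7.1 gives values of $(x-\mu)/\sigma$, and the scale below it gives corresponding values of x. Thus, $(x-\mu)/\sigma = 0$ corresponds to $x = \mu$, and $(x-\mu)/\sigma = 3$ corresponds to $x = \mu + 3\sigma$.
[Figure 7.1: Shape of the Normal Distribution. Upper horizontal scale: $(x-\mu)/\sigma$ from -4 to 4; lower horizontal scale: x from $\mu - 3\sigma$ to $\mu + 3\sigma$.]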
Because the normal probability density function is symmetrical, the mean, median and mode coincide at $x = \mu$. Thus, the value of $\mu$ determines the location of the center of the distribution, and the value of $\sigma$ determines its spread.
We have seen that probabilities for a continuous random variable are given by integration of the probability density function. Then normal probabilities are given by integration of the function shown in equation 7.1, or the areas under the corresponding curve.
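The probability that a variable, X, is between $x_1$ and $x_2$ according to the normal distribution is given by:
$$\Pr[x_1 < X < x_2] = \int_{x_1}^{x_2} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}\,dx \tag{7.2}$$
as shown in Figure 7.2.
[Figure 7.2: Probability of X Between $x_1$ and $x_2$. Area under the normal curve between $x_1$ and $x_2$.]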
A corresponding cumulative probability is given by:
$$\Pr[-\infty < X < x] = F(x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}\,dx \tag{7.3}$$
However, the integral of equations 7.2 and 7.3 cannot be evaluated analytically in closed form. It is evaluated to any required precision numerically and shown in tables or given by computer software. The constant, 1/(σ√(2π)), in equations 7.2 and 7.3 is determined by the requirement that F(∞) = 1 (see equation 6.4).
Equations 7.1, 7.2 and 7.3 represent an infinite number of normal distributions with various values of the parameters μ and σ. A simpler form in a single curve is obtained by a change of variable.
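To make "evaluated numerically" concrete, the sketch below integrates equation 7.2 by the composite Simpson's rule. Simpson's rule is only one possible numerical method and is chosen here for illustration; the function names and the limits used are assumptions of this sketch, not the method used to build the book's tables.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal probability density of equation 7.1."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def prob_between(x1, x2, mu, sigma, n=1000):
    """Approximate equation 7.2 by composite Simpson's rule (n must be even)."""
    h = (x2 - x1) / n
    total = normal_pdf(x1, mu, sigma) + normal_pdf(x2, mu, sigma)
    for i in range(1, n):
        weight = 4 if i % 2 else 2      # odd interior points get 4, even get 2
        total += weight * normal_pdf(x1 + i * h, mu, sigma)
    return total * h / 3.0

# Area within one standard deviation of the mean (mu = 0, sigma = 1).
p_one_sigma = prob_between(-1.0, 1.0, 0.0, 1.0)
# Area over a wide range, which should be very close to 1 because F(infinity) = 1.
total_area = prob_between(-8.0, 8.0, 0.0, 1.0, n=2000)
print(p_one_sigma, total_area)
```

The second result checks the role of the constant 1/(σ√(2π)): with it, the total area under the curve is 1.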
Let
$$z = \frac{x-\mu}{\sigma} \qquad (7.4)$$
Then z is a ratio between (x - μ) and σ. It represents the number of standard deviations between any point and the mean. Since x, μ, and σ all have the same units in any particular case, z is dimensionless.
Since μ and σ are constants for any particular distribution, differentiation of equation 7.4 gives:
$$dz = \frac{1}{\sigma}\, dx, \qquad dx = \sigma\, dz \qquad (7.5)$$
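Substitution of equations 7.4 and 7.5 in equation 7.2 gives:
$$\Pr[x_1 < X < x_2] = \int_{z_1}^{z_2} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}\, (\sigma\, dz) = \frac{1}{\sqrt{2\pi}} \int_{z_1}^{z_2} e^{-\frac{z^2}{2}}\, dz \qquad (7.6)$$
where, according to equation 7.4, z1 = (x1 - μ)/σ and z2 = (x2 - μ)/σ.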
Figure 7.3 shows the normal distribution in terms of z, the number of standard deviations from the mean. It can be seen that almost all the area under the curve is between z = -3 and z = +3. Therefore, the practical width of the normal distribution is about six standard deviations.
Figure 7.3: Normal Distribution as a Function of z (horizontal axis: z from -4 to 4)
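The standard normal cumulative distribution function, Φ(z), as a function of z, is defined as follows:
$$\Phi(z_1) = \Pr[-\infty < Z < z_1] = \Pr[Z < z_1] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z_1} e^{-\frac{z^2}{2}}\, dz \qquad (7.7)$$
It corresponds to the area under the curve in Figure 7.4.
Figure 7.4: Standard Cumulative Distribution Function for the Normal Probability Distribution (Φ(z) is the shaded area to the left of z; horizontal axis: z from -4 to 4)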
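Although equation 7.7 has no closed form, most programming languages supply the closely related error function, erf, and the standard identity Φ(z) = ½[1 + erf(z/√2)] then gives Φ(z) directly. The sketch below is illustrative; the function name `phi` is an assumption of this example.

```python
import math

def phi(z):
    """Standard normal cumulative probability, equation 7.7, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

half = phi(0.0)               # half of the area lies to the left of z = 0
within_three = phi(3.0) - phi(-3.0)   # area between z = -3 and z = +3
print(half, within_three)
```

The second value confirms that almost all of the area lies between z = -3 and z = +3.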
If the change of variable shown in equation 7.4 is applied and the curve shown in Figure 7.1 is integrated according to equation 7.3 to obtain a cumulative normal distribution, the result is an s-shaped curve, as shown in Figure 7.5.
Figure 7.5: Cumulative Normal Probability (vertical axis: cumulative probability, 0 to 1; horizontal axis: z, -4 to 4)
7.3 Using Tables for the Normal Distribution
Table A1 in Appendix A gives values of the cumulative normal probability as a function of z, the number of standard deviations from the mean. Part of Table A1 is shown below.
Part of Table A1
Cumulative Normal Probability
Φ(z) = Pr[Z < z]
(The sketch at the top of the table shows Φ(z) as the area under the curve to the left of z.)
  z0    Δz = -0.09   ...   -0.07    -0.06    -0.05   ...   -0.01    -0.00     z0
 -3.7       0.0001   ...   0.0001   0.0001   0.0001  ...   0.0001   0.0001   -3.7
  ...        ...     ...    ...      ...      ...    ...    ...      ...      ...
 -0.8       0.1867   ...   0.1922   0.1949   0.1977  ...   0.2090   0.2119   -0.8
 -0.7       0.2148   ...   0.2206   0.2236   0.2266  ...   0.2389   0.2420   -0.7
 -0.6       0.2451   ...   0.2514   0.2546   0.2578  ...   0.2709   0.2743   -0.6
  ...        ...     ...    ...      ...      ...    ...    ...      ...      ...
 -0.0       0.4641   ...   0.4721   0.4761   0.4801  ...   0.4960   0.5000   -0.0
Table A1 gives values of z0 (-3.7, -3.6, ..., -0.1, -0.0; 0.0, 0.1, ..., 3.7, 3.8) along the left hand side and right hand side of the table over two pages. The numbers along the top of the table give smaller increments, Δz = -0.09, -0.08, ..., -0.01, -0.00 on the first page, and on the second page 0.00, 0.01, ..., 0.08, 0.09. The value of z for a particular row and column is the sum of the value of z0 for that row (along the sides) plus the increment, Δz, for that column (along the top of the table).
$$z = z_0 + \Delta z \qquad (7.8)$$
To illustrate, see the part of Table A1 shown above. Say we want Φ(-0.76): we look for the row labeled z0 = -0.7 along the sides and the column labeled Δz = -0.06 along the top (since -0.76 = (-0.7) + (-0.06)) and read Φ(-0.76) = 0.2236.
The diagram at the top of the table towards the right indicates that Φ(z) corresponds to the area under the curve to the left of a particular value of z (here z = -0.76).
Suppose that instead we want Φ(+0.76). This is given on the second page of Table A1 in Appendix A. As before, we look for the applicable row, labeled z0 = 0.7 along the sides, and the column labeled Δz = 0.06 (since 0.76 = 0.7 + 0.06). For this value of z we read from the table that Φ(0.76) = 0.7764.
Because the distribution is symmetrical, there must be a simple relation between Φ(-0.76) and Φ(+0.76), or in general between Φ(-z) and Φ(+z). That relation is:
$$\Phi(-z_1) = 1 - \Phi(+z_1) \qquad (7.9)$$
or in this case Φ(-0.76) = 1 - Φ(+0.76) = 1 - 0.7764 = 0.2236. Of course that means that Φ(-0.00) = Φ(+0.00) = 0.5000, so half of the total area under the curve is to the left of z = 0, the mean and median and mode of the distribution. If you think about it, that makes sense.
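The table look-up described above can be imitated in Python, whose standard library (version 3.8 or later) provides `statistics.NormalDist`. The helper name `table_lookup` is an assumption of this sketch; it simply adds the row value z0 and the column increment Δz as in equation 7.8 and returns Φ(z) rounded to four decimal places like the table.

```python
from statistics import NormalDist

standard = NormalDist()   # mean 0, standard deviation 1

def table_lookup(z0, dz):
    """Imitate Table A1: z = z0 + dz (equation 7.8), result rounded to 4 places."""
    return round(standard.cdf(z0 + dz), 4)

phi_minus = table_lookup(-0.7, -0.06)   # corresponds to z = -0.76
phi_plus = table_lookup(0.7, 0.06)      # corresponds to z = +0.76
print(phi_minus, phi_plus, round(1 - phi_plus, 4))
```

The last printed value applies equation 7.9, so it should agree with the first.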
Example 7.1
a) What is the probability that Z for a normal probability distribution is between -0.76 and +0.76?
b) What is the probability that Z for a normal probability distribution is smaller than -0.76 or larger than +0.76?
Answer:
A sketch such as that shown in Figure 7.6 is very helpful in visualizing the required integral and finding appropriate values from the table.
Figure 7.6: Probabilities for Example 7.1 (middle area: part (a); the two outer areas: part (b))
a) Pr[-0.76 < Z < +0.76] corresponds to the middle area cross-hatched in Figure 7.6. The calculation of probabilities is as follows:
Pr[-0.76 < Z < +0.76] = Pr[Z < +0.76] - Pr[Z < -0.76]
   = Φ(0.76) - Φ(-0.76)
   = 0.7764 - 0.2236 (from before)
   = 0.5528
b) Pr[(Z < -0.76) ∪ (Z > +0.76)] corresponds to the outer areas in the sketch above.
Pr[(Z < -0.76) ∪ (Z > +0.76)] = [Φ(-0.76)] + [1 - Φ(+0.76)]
   = 0.2236 + [1 - 0.7764]
   = 0.4472
Check: Between them, parts (a) and (b) cover all possible results:
Then Pr{[-0.76 < Z < +0.76] + [(Z < -0.76) ∪ (Z > +0.76)]} = 0.5528 + 0.4472
   = 1.0000 (check)
Because the normal distribution is used so frequently, it is important to become familiar with Table A1.
The reader should note that other forms of tables for the normal distribution are also in common use. One form gives the probability of a result in one tail of the distribution, that is Pr[Z > z1] for z1 ≥ 0, or Pr[Z < z1] for z1 ≤ 0. A variation gives the probability corresponding to both tails together. Another type gives the probability of a result between the mean and z2 standard deviations from the mean, that is Pr[0 < Z < z2] for z2 ≥ 0, or Pr[z2 < Z < 0] for z2 ≤ 0. These different forms of tables must not be confused. Confusion is reduced because a small graph at the top of a table almost always indicates which area corresponds to the values given.
Study the following examples carefully.
Example 7.2
A city installs 2000 electric lamps for street lighting. These lamps have a mean burning life of 1000 hours with a standard deviation of 200 hours. The normal distribution is a close approximation to this case.
a) What is the probability that a lamp will fail in the first 700 burning hours?
$$z_1 = \frac{x_1-\mu}{\sigma} = \frac{700-1000}{200} = -1.50$$
From Table A1 for z1 = -1.50 = (-1.5) + (-0.00),
Pr[X < 700] = Pr[Z < -1.50]
   = Φ(-1.50)
   = 0.0668
Then Pr[burning life < 700 hours] = 0.0668 or 0.067.
Figure 7.7: Probabilities for Example 7.2(a) (required area: to the left of x = 700 hours, z = z1)
b) What is the probability that a lamp will fail between 900 and 1300 burning hours?
$$z_1 = \frac{x_1-\mu}{\sigma} = \frac{900-1000}{200} = -0.50 = (-0.5) + (-0.00)$$
$$z_2 = \frac{x_2-\mu}{\sigma} = \frac{1300-1000}{200} = +1.50 = (+1.5) + (0.00)$$
From Table A1, Φ(z1) = Φ(-0.50) = 0.3085
and Φ(z2) = Φ(1.50) = 0.9332
Then Pr[900 hours < burning life < 1300 hours]
   = Φ(z2) - Φ(z1)
   = 0.9332 - 0.3085
   = 0.6247 or 0.625.
Figure 7.8: Probabilities for Example 7.2(b) (required area: between x = 900 and x = 1300 hours, z1 to z2)
c) How many lamps are expected to fail between 900 and 1300 burning hours?
This is a continuation of part (b). The expected number of failures is given by the total number of lamps multiplied by the probability of failure in that interval. Then the expected number of failures = (2000)(0.6247) = 1249.4 or 1250 lamps. Because the burning life of each lamp is a random variable, the actual number of failures between 900 and 1300 burning hours would be only approximately 1250.
d) What is the probability that a lamp will burn for exactly 900 hours?
Since the burning life is a continuous random variable, the probability of a life of exactly 900 burning hours (not 900.1 hours or 900.01 hours or 900.001 hours, etc.) is zero. Another way of looking at it is that there are an infinite number of possible lifetimes between 899 and 901 hours, so the probability of any one of them is one divided by infinity, so zero. We saw this before in Example 6.2.
e) What is the probability that a lamp will burn between 899 hours and 901 hours before it fails?
Since this is an interval rather than a single exact value, the probability of failure in this interval is not infinitesimal (although in this instance the probability is small).
$$z_1 = \frac{x_1-\mu}{\sigma} = \frac{899-1000}{200} = -0.505$$
$$z_2 = \frac{x_2-\mu}{\sigma} = \frac{901-1000}{200} = -0.495$$
Figure 7.9: Probabilities for Example 7.2(e) (required area: between x = 899 and x = 901 hours, z1 to z2)
We could apply linear interpolation between the values given in Table A1. However, considering that in practice the parameters are not known exactly and the real distribution may not be exactly a normal distribution, the extra precision is not worthwhile.
Pr[899 hours < burning life < 901 hours]
   ≈ Φ(-0.49) - Φ(-0.50)
   = Φ(-0.4 - 0.09) - Φ(-0.5 - 0.00)
   = 0.3121 - 0.3085
   = 0.0036 or 0.4%
(0.3% would also be a reasonable approximation).
f) After how many burning hours would we expect 10% of the lamps to be left?
This corresponds to the time at which Pr[burning life > x1 hours] = 0.10,
so Pr[burning life < x1 hours] = 1 - 0.10 = 0.90.
Thus, Pr[Z < z1] = 0.90
or Φ(z1) = 0.90
Figure 7.10: Probabilities for Example 7.2(f) (10% of the area lies to the right of x1, z = z1)
From Table A1,
Φ(1.2 + 0.08) = 0.8997
and Φ(1.2 + 0.09) = 0.9015
Once again, we could apply linear interpolation but the accuracy of the calculation probably does not justify it.
Since (0.90 - 0.8997) << (0.9015 - 0.90), let us take z1 = 1.28. Then we have
$$z_1 = \frac{x_1-\mu}{\sigma} = 1.28$$
$$\frac{x_1-1000}{200} = 1.28$$
$$x_1 = (200)(1.28) + 1000 = 1256$$
Then after 1256 hours of burning, we would expect 10% of the lamps to be left. And again, because the burning time is a random variable, performing the experiment would give a result which would be close to 1256 hours but probably not exactly that, even if the normal distribution with the given values of the mean and standard deviation applied exactly.
g) After how many burning hours would we expect 90% of the lamps to be left?
We won't draw another diagram, but imagine looking at Figure 7.10 from the back.
Pr[Z < z2] = 0.10 or Φ(z2) = 0.10. From Table A1 we find
Φ(-1.2 - 0.08) = 0.1003
Φ(-1.2 - 0.09) = 0.0985
so z2 ≈ -1.28. (Do you see any resemblance to the answer to part (f)? Look again at equation 7.9.)
$$z_2 = \frac{x_2-\mu}{\sigma} = \frac{x_2-1000}{200} = -1.28$$
$$x_2 = 1000 - 256$$
$$x_2 = 744$$
After 744 hours we would expect 90% of the lamps to be left.
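As a cross-check on Example 7.2, the same quantities can be computed with Python's standard-library `statistics.NormalDist` instead of Table A1. This is a sketch rather than part of the book's solution; the variable names are assumptions, and because no rounding of z is involved the results differ slightly in the last digit from the table-based answers.

```python
from statistics import NormalDist

lamp_life = NormalDist(mu=1000, sigma=200)   # burning life in hours
n_lamps = 2000

p_a = lamp_life.cdf(700)                         # (a) fails in first 700 hours
p_b = lamp_life.cdf(1300) - lamp_life.cdf(900)   # (b) fails between 900 and 1300 hours
expected_c = n_lamps * p_b                       # (c) expected number of such failures
p_e = lamp_life.cdf(901) - lamp_life.cdf(899)    # (e) fails between 899 and 901 hours
x_f = lamp_life.inv_cdf(0.90)                    # (f) 10% of lamps left
x_g = lamp_life.inv_cdf(0.10)                    # (g) 90% of lamps left

print(round(p_a, 4), round(p_b, 4), round(expected_c), round(p_e, 4))
print(round(x_f), round(x_g))
```

Here `cdf` plays the role of reading Φ from the table after converting x to z, and `inv_cdf` plays the role of searching the body of the table for a given probability.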
Example 7.3
In another city 2500 electric lamps are installed for street lighting. The lamps come from a different manufacturer and have a mean burning life of 1050 hours. We know from past experience that the distribution of burning lives approximates a normal distribution. The 250th lamp fails after 819 hours. Approximately what is the standard deviation of burning lives for this set of lamps?
Answer:
$$\Phi(z_1) = \frac{250}{2500} = 0.100$$
From Table A1, Φ(-1.2 - 0.09) = 0.0985
and Φ(-1.2 - 0.08) = 0.1003
Then
$$z_1 = \frac{x_1-\mu}{\sigma} \approx -1.28$$
$$\frac{819-1050}{\sigma} = -1.28$$
$$\sigma = \frac{-231}{-1.28} \approx 180$$
Figure 7.11: Probabilities for Example 7.3 (10% of the area lies to the left of x = 819 hours, z = z1)
Then the standard deviation of burning hours is approximately 180 hours. (As well as random variation, the term "approximately" covers a correction for continuity which we will encounter a little later.)
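The calculation of Example 7.3 runs the table "backwards": a known probability gives z, and z then gives σ. A minimal Python sketch of that inverse step follows, assuming `statistics.NormalDist` from the standard library; the variable names are illustrative.

```python
from statistics import NormalDist

mean_life = 1050.0        # hours
x1 = 819.0                # burning life of the 250th lamp to fail, hours
fraction_failed = 250 / 2500

z1 = NormalDist().inv_cdf(fraction_failed)   # z for which Phi(z) = 0.100
sigma = (x1 - mean_life) / z1                # rearranged z1 = (x1 - mu) / sigma

print(round(z1, 2), round(sigma))
```

Both numerator and z1 are negative, so the standard deviation comes out positive, as it must.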
Example 7.4
The strengths of individual bars made by a certain manufacturing process are approximately normally distributed with mean 28.4 and standard deviation 2.95 (in appropriate units). To ensure safety, a customer requires at least 95% of the bars to be stronger than 24.0.
a) Do the bars meet the specification?
b) By improved manufacturing techniques the manufacturer can make the bars more uniform (that is, decrease the standard deviation). What value of the standard deviation will just meet the specification if the mean stays the same?
Answer:
a)
$$z_1 = \frac{x_1-\mu}{\sigma} = \frac{24.0-28.4}{2.95} = -1.49$$
Φ(-1.49) = Φ(-1.4 - 0.09) = 0.0681 (from Table A1)
Figure 7.12: Probabilities for Example 7.4(a) (required area: to the right of x = 24.0, z = z1)
The probability that the bars will be stronger than 24.0 is 1 - 0.0681 = 0.9319 or 93.2%. Since this is less than 95%, the bars do not meet the specification.
b) For this part, σ is the unknown.
From Table A1 we look for a value of z for which Φ(z2) = 0.05. We find Φ(-1.65) = 0.0495 and Φ(-1.64) = 0.0505. Then z2 must be between -1.65 and -1.64. Since in this case the desired value of Φ(z2) is halfway between Φ(-1.65) and Φ(-1.64), interpolation is very easy, giving z2 = -1.645.
Then
$$z_2 = \frac{x_2-\mu}{\sigma}$$
$$\frac{24.0-28.4}{\sigma} = -1.645$$
$$\sigma = \frac{-4.4}{-1.645} = 2.67$$
Figure 7.13: Probabilities for Example 7.4(b) (95% of the area lies to the right of x = 24.0; Φ(z2) is the area to the left)
If the standard deviation can be reduced to 2.67 while keeping the mean constant, the specification will just be met.
Example 7.5
An engineer decides to buy four new snow tires for his car. He finds that Retailer A is offering a special cash rebate, which depends on how much snow falls during the first winter. If this snowfall is less than 50% of the mean annual snowfall for his city, his rebate will be 50% of the list price. If the snowfall that winter is more than 50% but less than 75% of the mean annual snowfall, his rebate will be 25% of the list price. If the snowfall is more than 75% of the mean annual snowfall, he will receive no rebate. The engineer finds from a reference book that the annual snowfall for his city has a mean of 80 cm and standard deviation of 20 cm and approximates a normal distribution. The list price for the brand and size of tires he wants is $80.00 per tire. The engineer checks other retailers and finds that Retailer B sells the same brand and size of tires with the same warranty for the same list price but offers a discount of 5% of the list price regardless of snowfall that year.
a) Compare the expected costs of the two deals. Which expected cost is less?
b) How much is the difference for four new snow tires? Neglect the relative advantages of a cash rebate as compared to a discount.
Answer: a) For Retailer A: μ = 80 cm, σ = 20 cm.
50% of μ is 40 cm, and 75% of μ is 60 cm.
Figure 7.14: Probabilities for Example 7.5(a) (left panel: Φ(z1), the area to the left of 40 cm; right panel: Φ(z2), the area to the left of 60 cm)
$$z_1 = \frac{x_1-\mu}{\sigma} = \frac{40-80}{20} = -2.00$$
Pr[snowfall < 50% of μ] = Pr[Z < -2.00]
   = Φ(-2.00)
   = 0.0228 (from Table A1)
$$z_2 = \frac{x_2-\mu}{\sigma} = \frac{60-80}{20} = -1.00$$
Pr[snowfall < 75% of μ] = Pr[Z < -1.00]
   = Φ(-1.00)
   = 0.1587 (from Table A1)
Then Pr[50% of μ < snowfall < 75% of μ] = Φ(-1.00) - Φ(-2.00)
   = 0.1587 - 0.0228
   = 0.1359
Then the expected rebate from Retailer A is:
(50%)(Pr[snowfall < 50% of μ]) + (25%)(Pr[50% of μ < snowfall < 75% of μ])
   = (50%)(0.0228) + (25%)(0.1359)
   = (1.14 + 3.40)%
   = 4.54% of list price
The discount from Retailer B is 5% of list price, so the discount from Retailer B is larger than the expected rebate from Retailer A. Therefore, the expected cost of buying from Retailer B is a little less than the expected cost of buying from Retailer A.
b) Cost of four new snow tires is as follows.
List price: (4)($80.00) = $320.00
After rebate from Retailer A, expected cost = (1 - 0.0454)($320.00) = $305.48
After discount from Retailer B, cost = (1 - 0.05)($320.00) = $304.00
Then the difference in expected cost for four new snow tires is $1.48.
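The expected-value reasoning of Example 7.5 can be laid out as a short Python sketch. It uses `statistics.NormalDist` from the standard library; the variable names are assumptions made here, and the unrounded probabilities give answers that agree with the table-based ones to within a cent or two.

```python
from statistics import NormalDist

snowfall = NormalDist(mu=80, sigma=20)   # annual snowfall, cm
list_price = 4 * 80.00                   # four tires at $80.00 each

p_half_rebate = snowfall.cdf(40)                        # snowfall below 50% of mean
p_quarter_rebate = snowfall.cdf(60) - snowfall.cdf(40)  # between 50% and 75% of mean

# Expected rebate as a fraction of list price: each rebate weighted by its probability.
expected_rebate = 0.50 * p_half_rebate + 0.25 * p_quarter_rebate

expected_cost_a = (1 - expected_rebate) * list_price    # Retailer A, expected
cost_b = (1 - 0.05) * list_price                        # Retailer B, certain

print(round(100 * expected_rebate, 2), round(expected_cost_a, 2), round(cost_b, 2))
```

The comparison is between an expected cost (Retailer A) and a certain cost (Retailer B); the actual cost from Retailer A in any one winter would be $160, $240 or $320.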
Some Quantitative Relationships
We can also use Table A1 to make more quantitative comments concerning probabilities of results inside or outside chosen intervals on Figure 7.4.
Since Pr[-2 < Z < +2] = Φ(+2.0 + 0.00) - Φ(-2.0 - 0.00)
   = 0.9772 - 0.0228
   = 0.9544
[Check: Φ(-z1) = 1 - Φ(+z1) (from eq. 7.9)
   0.0228 = 1 - 0.9772]
Thus, 95.4% of all values are expected to be within two standard deviations from the mean of a normal distribution. By subtraction from 100%, 4.6% of all values are expected to be outside that interval.
Similarly, Pr[-3 < Z < +3] = Φ(+3.0 + 0.00) - Φ(-3.0 - 0.00)
   = 0.9987 - 0.0013
   = 0.9974
So 99.7% of all values are expected to be within three standard deviations from the mean. Only 0.3% of all values are expected to be farther from the mean than three standard deviations. Then, although the normal distribution extends in principle from -∞ to +∞, the practical width is about six standard deviations. If there is some practical limit on a variable (most commonly, that the variable never becomes negative), it will have little effect if the limiting value is at least three standard deviations from the mean.
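These coverage figures are easy to reproduce. The sketch below, which assumes Python's standard-library `statistics.NormalDist`, computes the probability of a result within k standard deviations of the mean for k = 1, 2 and 3; the dictionary name `coverage` is illustrative.

```python
from statistics import NormalDist

standard = NormalDist()

# Pr[-k < Z < +k] = Phi(+k) - Phi(-k) for k = 1, 2, 3 standard deviations.
coverage = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}

for k, prob in coverage.items():
    print(f"within {k} standard deviation(s): {100 * prob:.2f}%")
```

The small differences from 0.9544 and 0.9974 arise because the table values are rounded to four decimal places before subtraction.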
Problems
(The following problems can be solved either with a pocket calculator and tables, or using a computer, as will be discussed in section 7.4.)
1. Diameters of bolts produced by a particular machine are normally distributed with mean 0.760 cm and standard deviation 0.012 cm. Specifications call for diameters from 0.720 cm to 0.780 cm.
   a) What percentage of bolts will meet these specifications?
   b) What percentage of bolts will be smaller than 0.730 cm?
2. The annual snowfall in Saskatoon is a normally distributed variable with a mean of 80 cm and a standard deviation of 20 cm.
   a) What is the probability that the snowfall in any year will exceed 30 cm?
   b) What is the probability that the snowfall in any year will be between 55 and 90 cm?
3. The diameters of screws in a batch are normally distributed with mean equal to 2.10 cm and standard deviation equal to 0.15 cm.
   a) What proportion of screws are expected to have diameters greater than 2.50 cm?
   b) A specification calls for screw diameters between 1.75 cm and 2.50 cm. What proportion of screws will meet the specification?
4. Diameters of ball bearings produced by a company follow a normal distribution. If the mean diameter is 0.400 cm and the standard deviation is 0.001 cm, what percentage of the bearings can be used on a machine specifying a size of 0.399 ± 0.0015 cm? What is the upper bound of the size range that has a lower bound of 0.398 cm and includes 80% of the bearings?
5. An engineer working for a manufacturer of electronic components takes a large number of measurements of a particular dimension of components from the production line. She finds that the distribution of dimensions is normal, with a mean of 2.340 cm and a coefficient of variation of 2.4%.
   a) What percentage of measurements will be less than 2.45 cm?
   b) What percentage of dimensions will be between 2.25 cm and 2.45 cm?
   c) What value of the dimension will be exceeded by 98% of the components?
6. The probability that a river flow exceeds 2,000 cubic meters per second is 15%. The coefficient of variation of these flows is 20%. Assuming a normal distribution, calculate
   a) the mean of the flow.
   b) the standard deviation of the flow.
   c) the probability that the flow will be between 1300 and 1900 m³/s.
7. Bags of fertilizer are weighed as they come off a production line. The weights are normally distributed, and the coefficient of variation is 0.085%. It is found that 2% of the bags are under 50.00 kg.
   a) What is the mean weight of a bag of fertilizer?
   b) What percentage of the bags weigh more than 50.020 kg?
   c) What is the upper quartile of the weights?
8. The variation of copper content in a particular ore body follows a normal distribution. The coefficient of variation is 18%. The probability that the copper content exceeds 18.2 is 0.240.
   a) What is the mean copper content?
   b) What is the standard deviation of the copper content?
   c) What is the probability that the copper content will be less than 11.2?
9. 30% of the soil samples obtained from a proposed construction site gave test results for compressive strength of more than 3.5 tons per square foot. The coefficient of variation of the strengths is known to be 20%. Calculate:
   a) the mean soil strength,
   b) the standard deviation of soil strengths,
   c) the probability of soil strengths falling between 2.7 and 4.0 tons per square foot. State any assumptions made.
10. For a certain type of fluorescent light in a large building, the cost per bulb of replacing bulbs all at once is much less than if they are replaced individually as they burn out. It is known that the lifetime of these bulbs is normally distributed, and that 60% last longer than 2500 hours, while 30% last longer than 3000 hours.
   a) What are the approximate mean and standard deviation of the lifetimes of the bulbs?
   b) If the light bulbs are completely replaced when more than 20% have burned out, what is the time between complete replacements?
11. It is known that 10% of concrete samples have compressive strength less than 30.0 MN/m² and 20% have compressive strength greater than 36.0 MN/m². If the minimum acceptable strength is specified to be 28.0 MN/m², what is the probability that a sample will have a strength less than the specified minimum? What assumption is being made?
12. Of the Type A electrical resistors produced by a factory, 85.0% have resistance greater than 41 ohms, and 3.7% of them have resistance greater than 45 ohms. The resistances follow a normal distribution. What percentage of these resistors have resistance greater than 44 ohms?
13. A manufactured product has a length that is normally distributed with a mean of 12 cm. The product will be unusable if the length is 11 cm or less.
   a) If the probability of this has to be less than 0.01, what is the maximum allowable standard deviation?
   b) Assuming this standard deviation, what is the probability that the product's length will be between 11.75 and 12.35 cm?
14. The probability of a river flow exceeding 2,000 cubic meters per second is 15% and the coefficient of variation of these flows is 20%. Assuming a normal distribution calculate
   (a) the mean of the flow,
   (b) the standard deviation of the flow,
   (c) the probability that the flow will be between 1300 and 1900 meters³/sec.
15. A water quality parameter monitored in a lake is normally distributed with a mean of 24.3. It is also known that there is 70% probability that the parameter will exceed 17.6.
   a) Find the standard deviation of the parameter.
   b) If the parameter exceeds the 95th percentile, an investigation of a local industry begins. What is this critical value?
16. The time of snowpack formation is the time of the first snowfall which stays for the winter. In one Canadian city the mean time of snowpack formation is midnight of November 24, the 329th day of the year, and this time is approximately normally distributed. The standard deviation of the time of snowpack formation is 16.0 days. What is the probability that snowpack formation will occur before midnight October 20, the 294th day of the year, for two years in a row?
17. In a university scholarship program, anyone with a grade point average over 7.5 receives a $1,000 scholarship, anyone with an average between 7.0 and 7.5 receives $500, anyone with an average between 6.5 and 7.0 receives $100, and all others receive nothing. A particular class of 500 students has an overall average of 4.8 with a standard deviation of 1.2. Calculate the cost to the university of supplying scholarships for this class. State any assumption.
18. Steel used for water pipelines is sometimes coated on the inside with cement mortar to prevent corrosion. In a study of the mortar coatings of a pipeline used in a water transmission project, the mortar thicknesses were measured for a very large number of specimens. The mean and the standard deviation were found to be 0.62 inch and 0.13 inch, respectively, and the thickness was found to be normally distributed.
   a) In what percentage of the pipelines is the thickness of mortar less than 0.5 inch?
   b) If four pipes are selected at random, what is the probability that two or more have mortar thickness less than 0.5 inch?
   c) 100 pipes are taken and their mortar thicknesses are measured individually. If the mortar thickness of a pipe is found to be less than 0.5 inch, 10% less is paid to the manufacturer for that pipe. If the normal price of a pipe is $125.00, what is the expected cost of 100 pipes?
19. On a particular farm, profit depends on rainfall. The rainfall is normally distributed with a mean of 31 cm and a standard deviation of 9 cm. Farm profits are:
   a) $100,000 if rainfall is over 44 cm,
   b) $150,000 if rainfall is between 29 and 44 cm,
   c) $130,000 if rainfall is between 22 and 29 cm,
   d) $65,000 if rainfall is between 15 and 22 cm, and
   e) $80,000 if rainfall is less than 15 cm.
   Find the expected farm profit.
20. The time a student takes to arrive at a solution for a statistics problem depends upon whether he or she recognizes certain simplifying comments in the problem statement. The probability of this recognition is 0.7. If the student recognizes the comments, the solution time is normally distributed with a mean time of 20 minutes and standard deviation of 4.3 minutes. If the student does not recognize the simplifying comments, the solution time is normally distributed with a mean time of 43 minutes with a standard deviation of 10.2 minutes.
   a) What is the expected solution time in a large class of students?
   b) What is the probability that a student chosen at random will require more than 28.2 minutes?
   c) What is the probability that he or she will require more than 43 minutes?
21. An irrigation pump is located on a reservoir whose mean water level is 550 m with a standard deviation of 10 m. The water level affects the output of the pump. If the level is below 538 m, then the expected pump output is 250 L/min with a standard deviation of 45 L/min; if the level is between 538 and 555 m, then the expected pump output is 325 L/min with a standard deviation of 52 L/min; and if the level is greater than 555 m, then the expected pump output is 375 L/min with a standard deviation of 48 L/min. The variation in the output at any given water level is due to variations in the electrical power supply and wave action on the reservoir. All variables are normally distributed.
   a) What are the probabilities of the levels being
      i. less than 538 m?
      ii. between 538 m and 555 m?
      iii. greater than 555 m?
   b) What is the expected pumping rate?
   c) If the cost of pumping is $25/hr when the flow rate is less than 350 L/min, and $35/hr when the flow rate exceeds 350 L/min, calculate the average cost of pumping.
7.4 Using the Computer
Instead of using tables such as Table A1, cumulative normal probabilities can be obtained from computer software such as Excel. Standard cumulative normal probabilities, Φ(z), can be obtained by the Excel function =NORMSDIST(z), where z = (x - μ)/σ is the standard normal variable. The inverse function is also available on Excel. If we know a value of the cumulative normal probability, Φ(z), and want to find the value of z to which it applies, we can use the function =NORMSINV(cumulative probability). In both function names the letter "s" stands for the standard form, that is, a relation between Φ and z rather than between Φ and x. Both function names can be pasted into the required cell by choosing the statistical category and then the required function, as discussed in section 5.5. Alternatively, they can be typed.
These Excel functions can be used to solve Examples 7.1 to 7.5 and the Problems following section 7.3. To illustrate, here is an alternative solution of Example 7.4. Sketches of the probability relations shown in Figures 7.12 and 7.13 are still needed to check that the calculated probabilities are reasonable.
Example 7.4 (Solution Using Excel)
The strengths of individual bars made by a certain manufacturing process are approximately normally distributed with mean 28.4 and standard deviation 2.95 (in appropriate units). To ensure safety, a customer requires at least 95% of the bars to be stronger than 24.0.
a) Do the bars meet the specification?
b) By improved manufacturing techniques, the manufacturer can make the bars more uniform (i.e., decrease the standard deviation). What value of the standard deviation will just meet the specification if the mean stays the same?
Answer: a) z1 = (x1 - μ)/σ with μ = 28.4, σ = 2.95, and x1 = 24. Then the function =(24-28.4)/2.95 was entered in cell C2 with the label z1 in cell A2. Explanations are in column B. Since Φ(z1) is given by NORMSDIST(z1), the function =NORMSDIST(C2) was entered in cell C3, and the label Phi(z1) was entered in cell A3. The percentage probability that the bars will be stronger than 24.0 is given by the function =(1-C3)*100, which was entered in cell C4, and the corresponding label Pr%(stronger) was entered in cell A4. The result of the calculation was 93.2 (formatted to 1 decimal place using the Format menu). The answer to part (a) of the problem was placed in row 5.
(b) Now we require Φ(z2) = 1 - 0.95. Therefore the label Phi(z2) was entered in cell A7, and the function =1-0.95 was entered in cell C7. The label z2 was entered in cell A8, and the function, =NORMSINV(C7), was entered in cell C8. The result was -1.645. Since z2 = (x2 - μ)/σ, the function =(24.0-28.4)/C8 was entered in cell C9, and the label Reqd SD was entered in cell A9. The result was 2.675 (formatted to 3 decimal places using the Format menu). The answer to part (b) was placed in rows 10 and 11.
The Excel worksheet is shown below in Table 7.1. Answers to the specific questions are in rows 5, 10 and 11.
Table 7.1: Work Sheet for Example 7.4
      A               B                   C
 1    Ex 7.4 (a)
 2    z1              (24-28.4)/2.95=     -1.4915254
 3    Phi(z1)         NORMSDIST(C2)=      0.06791183
 4    Pr%(stronger)   (1-C3)*100=         93.2
 5    > Since 93.2% < 95%, the bars do not meet the specification.
 6    (b)
 7    Phi(z2)         1-0.95=             0.05
 8    z2              NORMSINV(C7)=       -1.644853
 9    Reqd SD         (24-28.4)/C8=       2.675
10    > If std dev can be reduced to 2.675 and the mean
11    stays the same, the specification will just be met.
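For readers working outside Excel, the same worksheet logic can be sketched in Python. This is an illustrative equivalent, not part of the book's solution: `NormalDist().cdf` from the standard library plays the role of NORMSDIST, `NormalDist().inv_cdf` plays the role of NORMSINV, and the variable names are assumptions chosen to echo the worksheet labels.

```python
from statistics import NormalDist

standard = NormalDist()            # standard normal: mean 0, standard deviation 1
mean, sd, limit = 28.4, 2.95, 24.0

# Part (a): probability that a bar is stronger than the limit.
z1 = (limit - mean) / sd                   # worksheet row 2
phi_z1 = standard.cdf(z1)                  # worksheet row 3, like NORMSDIST
pct_stronger = (1 - phi_z1) * 100          # worksheet row 4
meets_spec = pct_stronger >= 95.0

# Part (b): standard deviation that just meets the specification.
phi_z2 = 1 - 0.95                          # worksheet row 7
z2 = standard.inv_cdf(phi_z2)              # worksheet row 8, like NORMSINV
required_sd = (limit - mean) / z2          # worksheet row 9

print(round(pct_stronger, 1), meets_spec, round(required_sd, 3))
```

As with the worksheet, a sketch of the areas involved remains a useful check that the signs of z1 and z2 are sensible.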
7.5 Fitting the Normal Distribution to Frequency Data
We will find great advantages in fitting a normal distribution to a set of frequency data if the two distributions agree reasonably well. We can summarize the data very compactly in that case by giving the mean and standard deviation. Powerful statistical tests that assume that the underlying distribution is normal become available for our use.
In this section we will examine fitting a normal distribution to grouped frequency data and to discrete frequency data. This approach will be extended in section 7.6 to approximating another distribution (specifically a binomial distribution for certain circumstances) by a normal distribution. Then in section 7.7 we will look at fitting a normal distribution to cumulative frequency data.
Since a normal distribution is described completely by two parameters, its mean and standard deviation, usually the first step in fitting the normal distribution is to calculate the mean and standard deviation for the other distribution. Then we use these parameters to obtain a normal distribution comparable to the other distribution.
(a) Fitting to a Continuous Frequency Distribution
First, then, we need to estimate the parameters of the normal distribution that will fit the frequency distribution in which we are interested. We have seen in Chapter 3 how to estimate the mean and standard deviation of the population from which a sample came. Then we can compare the normal distribution having those parameters to the corresponding grouped frequency data.
Example 7.6
Example 4.2 gave measurements of the thickness of a particular metal part of an optical instrument on 121 successive items from a production line. Taking these data as a sample, calculations shown in Example 4.2 gave the estimate of the mean of the population to be x̄ = 3.369 mm, and the estimate of the standard deviation of the population to be s = 0.0629 mm.
We saw in section 7.1 that the shape of the histogram for these data seems to be at least approximately consistent with a normal distribution. Therefore we will compare the class frequencies found in Example 4.2 with the expected frequencies for a normal distribution with mean and standard deviation as stated above. The first step in this comparison is to calculate cumulative normal probabilities, Φ(z), at the class boundaries using Table A1 or the equivalent Excel function.
Class Boundary, x mm    z = (x - x̄)/s    Φ(z)
3.195                   -2.77             0.0028
3.245                   -1.97             0.0244
3.295                   -1.18             0.1190
3.345                   -0.38             0.3520
3.395                   +0.41             0.6591
3.445                   +1.21             0.8869
3.495                   +2.00             0.9772
3.545                   +2.80             0.9974
3.595                   +3.59             0.9998
According to the normal distribution:
Pr[X < 3.195]         =                   0.0028
Pr[3.195 < X < 3.245] = 0.0244 - 0.0028 = 0.0216
Pr[3.245 < X < 3.295] = 0.1190 - 0.0244 = 0.0946
Pr[3.295 < X < 3.345] = 0.3520 - 0.1190 = 0.2330
Pr[3.345 < X < 3.395] = 0.6591 - 0.3520 = 0.3071
Pr[3.395 < X < 3.445] = 0.8869 - 0.6591 = 0.2278
Pr[3.445 < X < 3.495] = 0.9772 - 0.8869 = 0.0903
Pr[3.495 < X < 3.545] = 0.9974 - 0.9772 = 0.0202
Pr[3.545 < X < 3.595] = 0.9998 - 0.9974 = 0.0024
Pr[X > 3.595]         = 1 - 0.9998      = 0.0002
The expected frequency for each interval is obtained by multiplying the corresponding probability by the total frequency, 121. The results are:
Lower Class    Upper Class    Probability    Expected     Observed
Boundary       Boundary                      Frequency    Frequency
   -           3.195          0.0028          0.3          0
3.195          3.245          0.0216          2.6          2
3.245          3.295          0.0946         11.4         14
3.295          3.345          0.2330         28.2         24
3.345          3.395          0.3071         37.2         46
3.395          3.445          0.2278         27.6         22
3.445          3.495          0.0903         10.9         10
3.495          3.545          0.0202          2.4          2
3.545          3.595          0.0024          0.3          1
3.595             -           0.0002          0.0          0
Expected and observed frequencies are compared in Figure 7.15.
Figure 7.15: Comparison of Observed Frequencies with Expected Frequencies according to Fitted Normal Distribution (bar chart of expected and observed frequency, 0 to 50, for each thickness class in mm from <3.195 to >3.595)
We can see in Figure 7.15 that actual frequencies are sometimes above and sometimes below the theoretical expected frequencies according to the normal distribution. The differences might well be explained by random variations, so we can conclude that the frequency distribution seems to be consistent with a normal distribution.
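The expected frequencies of Example 7.6 can also be generated by a short program. The following Python sketch assumes `statistics.NormalDist` from the standard library and uses the class boundaries and observed frequencies quoted in the example; the variable names are illustrative. Because z is not rounded to two decimal places here, the expected frequencies differ from the tabulated ones by a few tenths at most.

```python
from statistics import NormalDist

fitted = NormalDist(mu=3.369, sigma=0.0629)   # sample mean and standard deviation, mm
n_total = 121
boundaries = [3.195, 3.245, 3.295, 3.345, 3.395, 3.445, 3.495, 3.545, 3.595]
observed = [0, 2, 14, 24, 46, 22, 10, 2, 1, 0]   # includes both open-ended classes

# Cumulative probability at each boundary, with 0 and 1 for the open-ended tails.
cumulative = [0.0] + [fitted.cdf(b) for b in boundaries] + [1.0]

# Class probability is the difference of adjacent cumulative values;
# expected frequency is that probability times the total frequency.
expected = [n_total * (hi - lo) for lo, hi in zip(cumulative, cumulative[1:])]

for obs, exp in zip(observed, expected):
    print(f"observed {obs:3d}   expected {exp:5.1f}")
```

A formal test of whether the observed and expected frequencies agree is a separate matter and is not attempted in this sketch.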
(b) Fitting to a Discrete Frequency Distribution
If the distribution to which we compare a normal distribution is discrete, then because the normal distribution is continuous we need a correction for continuity. The correction for continuity will be examined in the next section, in which the discrete binomial distribution is approximated by a normal distribution.
7.6 Normal Approximation to a Binomial Distribution
It is often desirable to use the normal distribution in place of another probability distribution. In particular, it is convenient to replace the binomial distribution with the normal when certain conditions are met. Remember, though, that the binomial distribution is discrete, whereas the normal distribution is continuous.
The shape of the binomial distribution varies considerably according to its parameters, n and p. If the parameter p, the probability of "success" (or a defective item or a failure, etc.) in a single trial, is sufficiently small (or if q = 1 - p is sufficiently small), the distribution is usually unsymmetrical. If p or q is sufficiently small and if the number of trials, n, is large enough, a binomial distribution can be approximated by a Poisson distribution. This was discussed in section 5.4(c).
On the other hand, if p is sufficiently close to 0.5 and n is sufficiently large, the binomial distribution can be approximated by a normal distribution. Under these conditions the binomial distribution is approximately symmetrical and tends toward a bell shape. A larger value of n allows greater departure of p from 0.5; a binomial distribution with very small p (or p very close to 1) can be approximated by a normal distribution if n is very large. If n is large enough, sometimes both the Poisson approximation and the normal approximation are applicable. In that case, use of the normal approximation is usually preferable because the normal distribution allows easy calculation of cumulative probabilities using tables or computer software.
Figure 7.16: Comparison of a Binomial Distribution with a Normal Distribution Fitted to It (vertical axis: probability density or binomial probability, 0 to 0.2; horizontal axis: number of defectives, 0 to 20)
Figure 7.16 compares a binomial distribution with a normal distribution. The parameters of the binomial distribution are p = 0.4 and n = 20 (for instance, we might take samples of 20 items from a production line when the probability that any one item will require further processing is 0.4). To fit a normal distribution we need to know the mean and the standard deviation. Remember that the mean of a binomial distribution is μ = np, and that the standard deviation for that distribution is σ = √(np(1 - p)). To fit a normal distribution to this binomial distribution, we must have μ = np = (20)(0.4) = 8, and σ = √(np(1 - p)) = √((20)(0.4)(0.6)) = 2.191. In Figure 7.16 the continuous curve passing through small circles represents the density function for the fitted normal distribution, while the vertical lines topped by small crosses represent binomial probabilities. The agreement appears to be very good.
But we have a difficulty to deal with. That is, the normal distribution is continuous, whereas the binomial distribution is discrete. Probabilities according to the binomial distribution are different from zero only when the number of defectives is a whole number, not when the number is between the whole numbers. On the other hand, if we integrate the normal distribution only for limits infinitesimally apart around the whole numbers, the area under the curve will be infinitesimally small. Then the corresponding probability will be zero.
The common-sense solution is to integrate for wider steps, which together cover the whole range. We set limits for integration of the normal distribution halfway between possible values of the discrete variable. This modification is called the correction for continuity. In Figure 7.16 the limits for integration of the normal distribution would be from 5.5 to 6.5 to compare with a binomial probability at 6 defects. For comparison with the binomial value at 7, the limits would be from 6.5 to 7.5, and so on.
The numerical comparison of probabilities using the correction for continuity is shown in Example 7.7. Approximating binomial probabilities in this way is called the normal approximation to a binomial distribution.
Example7.7
CorrespondingtothecaseshowninFigure7.6,letscalculateprobabilitiesaccording
tothebinomialdistributionandforthenormaldistributionwhichfitsitapproxi-
mately.Inasampleof20itemswhentheprobabilitythatanyoneitemrequires
furtherprocessingis0.4,thebinomialdistributiongivesprobabilitiesthatvarious
numbersofitemswillrequiremoreprocessing.Thisisthenabinomialdistribution
withn=20andp=0.4.
Answer:Samplecalculationswillbeshownfortheprobabilityofsixitemsrequir-
ingfurtherprocessinginasampleof20,andthenalltheresultswillbecompared.
179
Chapter7
By the binomial distribution,

$$\Pr[R = 6] = {}_{20}C_{6}\,(0.4)^{6}(0.6)^{14} = \frac{(20)(19)(18)(17)(16)(15)}{(6)(5)(4)(3)(2)}\,(0.4)^{6}(0.6)^{14} = 0.124$$

By the normal approximation,

$$\Pr[R = 6] \approx \Pr[5.5 < X < 6.5] = \Phi\left(\frac{6.5 - 8}{2.191}\right) - \Phi\left(\frac{5.5 - 8}{2.191}\right) = \Phi(-0.68) - \Phi(-1.14) = 0.121$$

The values for the normal approximation shown above were read from tables with z evaluated to two decimal places. Evaluating z to three decimal places and using linear interpolation, or using computer software such as the function NORMSDIST from Excel, would give 0.2468 − 0.1269 = 0.120 for the probability of six defectives. In Table 7.2 the normal approximations have been calculated with z evaluated to three decimal places and with linear interpolation to give a more accurate error of approximation, but interpolation is not ordinarily required.
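
The sample calculation for six items can be checked with a few lines of code. The following is a minimal sketch in Python rather than Excel; the helper names `binomial_pmf` and `normal_approx` are illustrative assumptions, and `statistics.NormalDist` stands in for the NORMSDIST function mentioned above.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_pmf(r, n, p):
    """Exact binomial probability Pr[R = r]."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def normal_approx(r, n, p):
    """Normal approximation to Pr[R = r], using the correction for continuity."""
    fitted = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    # Integrate the fitted normal density from r - 0.5 to r + 0.5.
    return fitted.cdf(r + 0.5) - fitted.cdf(r - 0.5)

n, p = 20, 0.4
exact = binomial_pmf(6, n, p)
approx = normal_approx(6, n, p)
print(f"binomial: {exact:.3f}  normal approximation: {approx:.3f}")
```

Because the software evaluates the cumulative normal probability without rounding z, the result should agree with the interpolated value rather than with the two-decimal table lookup.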
Table 7.2: Comparison of Binomial Distribution and Normal Approximation

Number for            Binomial        Normal           Error of
Further Processing    Probability     Approximation    Approximation
 0                    0.00004         0.00026          −0.0002
 1                    0.0005          0.0012           −0.0007
 2                    0.0031          0.0045           −0.0014
 3                    0.012           0.014            −0.0016
 4                    0.035           0.035            −0.0001
 5                    0.075           0.072            +0.003
 6                    0.124           0.120            +0.005
 7                    0.166           0.163            +0.003
 8                    0.180           0.180            −0.001
 9                    0.160           0.163            −0.003
10                    0.117           0.120            −0.003
11                    0.071           0.072            −0.0009
12                    0.035           0.035            +0.0004
13                    0.015           0.014            +0.0006
14                    0.0049          0.0045           +0.0003
15                    0.0013          0.0012           +0.0001
16                    0.0003          0.0003           +0.0000
17                    0.00004         0.00005          −0.0000
18                    5 × 10^−6       0.00001          −0.0000
19                    3 × 10^−7       < 10^−6
20                    1 × 10^−8       < 10^−6

The largest error in Table 7.2 is 0.005, 0.124 vs. 0.120 for six defectives.
As a rough rule, the normal approximation to the binomial distribution is usually reasonably good if both np and n(1 − p) are greater than 5. In Example 7.7, np is equal to (20)(0.4) = 8 and n(1 − p) is equal to (20)(0.6) = 12, so the rough rule is satisfied with some to spare. The rough rule should be used in solving problems in this book.
The rule is only a rough guide because the two parameters, n and p, affect the agreement separately. For the same value of the product np, the normal approximation to the binomial distribution is better when p is closer to 0.5. We can illustrate that by comparing the binomial distribution with the corresponding normal approximation just at np = 5, the limit given by the rough rule, at three combinations of n and p. Figure 7.17 shows these comparisons.
Figure 7.17(a): Comparison at n = 10 and p = 0.5 (probability, binomial and normal approximation, versus number of occurrences, i)
Figure 7.17(b): Comparison at n = 25 and p = 0.2 (probability, binomial and normal approximation, versus number of occurrences, i)
Figure 7.17(c): Comparison at n = 250 and p = 0.02 (probability, binomial and normal approximation, versus number of occurrences, i)
We can see from Figure 7.17 that the discrepancies are smallest at n = 10 and p = 0.5, intermediate at n = 25 and p = 0.2, and largest at n = 250 and p = 0.02, even though all are at np = 5 and n(1 − p) ≥ 5. At n = 10 and p = 0.5 the largest absolute discrepancy is 0.002; at n = 25 and p = 0.2 the largest absolute discrepancy is 0.011; and at n = 250 and p = 0.02 the largest absolute discrepancy is 0.071.
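
The trend in these three cases can be explored numerically. The sketch below is an illustration only: it assumes that the discrepancy is the absolute difference between the exact binomial probability and the continuity-corrected normal probability at each whole number, which may not be exactly how the values quoted in the text were obtained, so the printed numbers need not match them. The function name `largest_discrepancy` is hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist

def largest_discrepancy(n, p):
    """Largest |binomial - normal approximation| over r = 0, 1, ..., n."""
    fitted = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    worst = 0.0
    for r in range(n + 1):
        exact = comb(n, r) * p**r * (1 - p)**(n - r)
        approx = fitted.cdf(r + 0.5) - fitted.cdf(r - 0.5)
        worst = max(worst, abs(exact - approx))
    return worst

# Three combinations with the same product np = 5.
for n, p in [(10, 0.5), (25, 0.2), (250, 0.02)]:
    print(f"n = {n:3d}, p = {p:4.2f}: largest discrepancy = {largest_discrepancy(n, p):.4f}")
```

Under this assumption the ordering of the three cases should agree with the text: the approximation deteriorates as p moves away from 0.5 at fixed np.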
Example 7.8
A coin is biased. We are told that the probability of heads on any one toss is 40% and the corresponding probability of tails is 60%. The coin is tossed 120 times, giving 56 heads and 64 tails. From what we were told about the bias, we expect (120)(0.40) = 48 heads. If the given information is correct, what is the probability of getting either 56 or more heads, or 40 or fewer heads (i.e., a result as far from the expected result as 56 heads or farther in either direction)? Is the result so unlikely that we should doubt that the probability of heads on a single toss is only 40%?
Answer: This problem could be solved using the binomial distribution directly: $\Pr[R = 56] = {}_{120}C_{56}\,(0.4)^{56}(0.6)^{64}$, and similarly for R = 57, 58, ..., 120 and R = 0, 1, 2, ..., 39, 40, then adding up probabilities. However, these calculations are very laborious. It would be less work to calculate the sum of Pr[R = 41], Pr[R = 42], ..., Pr[R = 54], Pr[R = 55] and subtract that sum from 1, but that would still be a lot of labor. It is much easier to apply the normal approximation, and results should be very little different. In this case np = (120)(0.4) = 48 and n(1 − p) = (120)(0.6) = 72, so the rough rule is very easily satisfied. For the normal approximation $\mu = np = (120)(0.4) = 48$ and $\sigma = \sqrt{np(1-p)} = \sqrt{(120)(0.4)(0.6)} = 5.367$.
Using the correction for continuity, Pr[R = 56] corresponds to the area under the normal probability curve between 55.5 and 56.5. So, Pr[R > 55] corresponds to the area under the curve beyond 55.5. Similarly, Pr[R < 41] corresponds to the area under the curve for X < 40.5.

If $x_1 = 55.5$, $z_1 = \dfrac{x_1 - \mu}{\sigma} = \dfrac{55.5 - 48}{5.367} = 1.397$.

Similarly, if $x_2 = 40.5$, $z_2 = \dfrac{40.5 - 48}{5.367} = -1.397$.

Then
$$\Pr[R > 55, \text{Binomial}] \approx \Pr[Z > 1.397] = 1 - \Phi(1.397) \approx 1 - \Phi(1.40) = 1 - 0.9192 = 0.081.$$

Figure 7.18: Probabilities for Example 7.8 (required areas in both tails of the normal curve, beyond x = 40.5 and x = 55.5 heads, with the mean at 48; these correspond to z2, 0, and z1)

Then Pr[more than 55 heads] ≈ 8.1%.
Similarly, Pr[fewer than 41 heads] ≈ 8.1%. The probability of a result as far from the mean as 56 heads or farther in either direction, given that p = 0.400, is (2)(8.1%) = 16.2%. This would happen by chance about one time in six, so it is not very unlikely. Then the result of tossing the coin gives us no evidence that p is not equal to 0.400.
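
With a computer the "very laborious" direct calculation is no longer a burden, so the two routes can be compared. The following is a minimal sketch in Python (the variable names are assumptions made for illustration); it evaluates the two-tailed probability both by the normal approximation with the correction for continuity and by summing exact binomial probabilities.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 120, 0.4
mu = n * p                      # 48
sigma = sqrt(n * p * (1 - p))   # about 5.367

# Normal approximation with the correction for continuity.
z1 = (55.5 - mu) / sigma
upper_tail = 1 - NormalDist().cdf(z1)               # Pr[R > 55]
lower_tail = NormalDist().cdf((40.5 - mu) / sigma)  # Pr[R < 41]
approx_two_tail = upper_tail + lower_tail

# Direct binomial sum over R >= 56 and R <= 40.
exact_two_tail = sum(
    comb(n, r) * p**r * (1 - p)**(n - r)
    for r in range(n + 1)
    if r >= 56 or r <= 40
)

print(f"z1 = {z1:.3f}")
print(f"normal approximation: {approx_two_tail:.3f}")
print(f"binomial sum:         {exact_two_tail:.3f}")
```

The two results should be close, which is consistent with the remark that the normal approximation gives results very little different from the direct calculation.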
Approximations such as the normal approximation to the binomial distribution are not as important as they used to be because nearly exact values can be obtained using computer software. As we saw in section 5.5(b), both single and cumulative values for the binomial distribution can be obtained from Microsoft Excel. However, even when these nearly exact values are available, it may be desirable to use a convenient approximation.
7.7 Fitting the Normal Distribution to Cumulative Frequency Data
(a) Cumulative Normal Probability and Normal Probability Paper
Instead of comparing a frequency distribution or probability distribution to a normal probability distribution using a histogram or the equivalent, often a better alternative is to compare graphically using cumulative probabilities. This has the advantage of giving an overall picture, showing the sum of deviations to any particular point. However, Figure 7.3 shows that the cumulative normal probability plotted against z gives an S-shaped curve. That would also be true plotted against x. It is not convenient to make graphical comparisons using an S-shaped curve.
However, the scale can be modified (or distorted) to give a more convenient comparison. The scale is modified in such a way that cumulative probability plotted against x or z will give a straight line for a normal distribution. A frequency distribution will still show random variations, but real departure from a normal distribution is much easier to spot. Thus, cumulative relative frequencies (on the modified scale) are plotted versus the variable, x, on a linear scale. If the data came from a normal distribution, this plot will give approximately a straight line. If the underlying distribution is appreciably different from a normal distribution, larger deviations and systematic variations will be present.
Graph paper using such a modified or distorted scale for cumulative relative frequency, and a uniform scale for the measured variable, is called normal probability paper. This special type of commercial graph paper, like the special types for logarithmic and log-log scales, is available from many suppliers. Commercial normal probability paper comes with a distorted scale for relative cumulative frequency along one axis and corresponding unequally spaced grid lines. The other scale (with corresponding grid lines) is uniform. Points are plotted by hand on this paper with co-ordinates corresponding to relative cumulative frequency (on the distorted scale) versus the value of the variable (on the linear scale). In most cases we will use data from a grouped frequency distribution. Since normal probability paper uses cumulative frequency or probability, data from a grouped frequency distribution should be plotted versus class boundaries, not class midpoints.
The points so plotted can be compared with the straight line representing a normal distribution fitted to the data and so having the same mean and standard deviation. Since the median of a normal distribution is equal to its mean, one point on this line should be at 50% relative cumulative frequency and $\bar{x}$, the estimated mean. Another point should be at 97.7% relative cumulative frequency and ($\bar{x} + 2s$); a third should be at 2.3% relative cumulative frequency and ($\bar{x} - 2s$).
Example 7.9
Compare the data of Example 4.2 and Table 4.6 with a normal distribution using normal probability paper. The data are for measurements of the thickness of a metal part of an optical instrument. The histogram shown in Figure 4.4 seems qualitatively consistent with a normal distribution.
Answer: Table 7.3 was obtained using the data of Table 4.6:

Table 7.3: Data for Plot on Normal Probability Paper

Thickness, mm       Cumulative     Relative Cumulative
(class boundary)    Frequency      Frequency, %
3.245                 2             1.7
3.295                16            13.2
3.345                40            33.1
3.395                86            71.1
3.445               108            89.3
3.495               118            97.5
3.545               120            99.2

From Table 7.3 thickness was plotted (on a linear scale) against relative cumulative frequency (on a distorted scale) on normal probability paper as shown in Figure 7.19. From Example 4.2 the estimate of the mean is $\bar{x}$ = 3.3685 mm, and the estimate of the standard deviation is s = 0.0629 mm. Then the straight line was drawn on Figure 7.19 to pass through the following points: 3.369 mm and 50.0% relative cumulative frequency; 3.3685 − (2)(0.0629) = 3.243 mm and 2.3% relative cumulative frequency; 3.3685 + (2)(0.0629) = 3.494 mm and 97.7% relative cumulative frequency.
The points seem to agree very well with the line, so it is reasonable to represent the data by a normal distribution. A more quantitative comparison will be given in Chapter 13, but the comparison using normal probability paper has the advantage of pointing out any part of the distribution where local departure from the line occurs.
(b) Computer Plot Equivalent to Normal Probability Paper
Instead of obtaining commercial probability paper and plotting points manually, it may be more convenient to make essentially the same visual comparison using a computer. However, it is not convenient to plot data directly to a nonuniform scale using a computer unless specialized software is available (but if the specialized software is available, it can certainly be used). The alternative is to plot $z_{\text{equivalent}}$ (or $-z_{\text{equivalent}}$) on a uniform scale against the experimental variable, also on a uniform scale. Remember that the relative cumulative frequency gives an approximation to cumulative normal probability if the data came from a population governed by the normal distribution. Remember also that a plot equivalent to use of normal probability paper would give approximately a straight line if the points follow a normal distribution; if that condition is met, z is approximately a linear function of the experimental variable, x. Then $z_{\text{equivalent}}$ is calculated from the inverse normal probability function of the relative cumulative frequency. For Excel, $z_{\text{equivalent}}$ is found from NORMSINV(relative cumulative frequency).
Figure 7.19: Normal Probability Paper (relative cumulative frequency, %, on the distorted scale from 0.1 to 99.9, versus thickness, mm, from 3.195 to 3.595; the points for $\bar{x} - 2s$, $\bar{x}$, and $\bar{x} + 2s$ are marked on the line)
Since $z = \dfrac{x - \bar{x}}{s}$, the straight line corresponding to the normal distribution is given by $x = \bar{x} + zs$, where x is the experimental variable and $\bar{x}$ and s are the sample mean and the estimate from the sample of the standard deviation. This straight line is plotted for comparison with the data. If they agree, then the data correspond approximately to a normal distribution.
Example 7.10
Data for measurements of the thickness of a metal part of an optical instrument from Example 4.2 have already been compared with a normal distribution in Example 7.6 (where observed frequencies were compared to expected frequencies) and Example 7.9 (where normal probability paper was used). Now we will calculate cumulative relative frequency and $z_{\text{equivalent}}$ for plotting against thickness at the corresponding upper class boundary. The calculations are shown in Table 7.4, and Figure 7.20 shows the resulting graph.
Table 7.4: Points for Computer Equivalent of Normal Probability Paper

Thickness at          Relative Cumulative    z_equivalent =
Class Boundary, mm    Frequency, %           NORMSINV(rcf)
3.245                  1.65                  −2.131
3.295                 13.22                  −1.116
3.345                 33.06                  −0.438
3.395                 71.07                   0.556
3.445                 89.26                   1.24
3.495                 97.52                   1.964
3.545                 99.17                   2.397

The straight line for the normal distribution can be located by plotting any two points on the line $z = \dfrac{x - \bar{x}}{s}$. Since $\bar{x}$ = 3.3685 and s = 0.0629, at x = 3.195 the line must pass through $z = \dfrac{3.195 - 3.3685}{0.0629} = -2.758$, and at x = 3.595 the line must pass through $z = \dfrac{3.595 - 3.3685}{0.0629} = 3.601$. This line is also shown on Figure 7.20.
An extra scale has been added to Figure 7.20 giving percentage relative cumulative frequencies corresponding to the tick marks on the uniform vertical scale. An alternative, which will be adopted in some later examples, is to move the x-scale to the right-hand side and to mark the percentage relative cumulative frequencies for the left-hand, uniformly spaced, tick marks. The relative cumulative frequencies at these tick marks are given by cumulative normal probabilities of the corresponding values of z.
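
The calculation of Table 7.4 can be sketched in Python as follows. This is an illustration under stated assumptions: `NormalDist().inv_cdf` plays the role of Excel's NORMSINV, the cumulative frequencies are those of Table 7.3, and the total of 121 readings is the value of n used for the same data in Table 7.5. The list names are hypothetical.

```python
from statistics import NormalDist

boundaries = [3.245, 3.295, 3.345, 3.395, 3.445, 3.495, 3.545]  # upper class boundaries, mm
cumulative = [2, 16, 40, 86, 108, 118, 120]                     # cumulative frequencies
n = 121                                                         # total number of readings
x_bar, s = 3.3685, 0.0629                                       # estimates from Example 4.2

std_normal = NormalDist()  # mean 0, standard deviation 1
z_equivalents = []
for x, cf in zip(boundaries, cumulative):
    rcf = cf / n                      # relative cumulative frequency
    z_eq = std_normal.inv_cdf(rcf)    # z_equivalent from the data
    z_line = (x - x_bar) / s          # z on the fitted straight line
    z_equivalents.append(z_eq)
    print(f"{x:.3f}  {100 * rcf:6.2f}%  z_equivalent = {z_eq:7.3f}  line: {z_line:7.3f}")
```

Printing the fitted-line value beside each z_equivalent gives a numerical counterpart to the visual comparison: where the two columns are close, the point lies near the straight line.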
Figure 7.20: Computer Equivalent to Normal Probability Paper of Figure 7.19 (−z_equivalent on a uniform scale from −3 to 3, with an added scale of relative cumulative frequency, %, from 0.13 to 99.87, versus thickness, mm, from 3.2 to 3.6)
(c) Plotting Individual Points Using a Computer
Rather than using the grouped frequency approach, we may want to plot all the individual points in a form suitable for visual comparison with a normal distribution. If the data set is small, we might do that by hand on normal probability paper, but most often we would use a computer. We saw in section 3.3 that each individual point can be considered a separate quantile. If the points are arranged in order of magnitude from the smallest to the largest, the ith item in order of magnitude among a total of n items represents a relative cumulative frequency of (i − 0.5)/n. If the data follow a normal distribution, $z_{\text{equivalent}}$ calculated from NORMSINV(relative cumulative frequency) will be approximately a straight-line function of the independent variable. The points can be compared to a straight line calculated from the sample mean and sample estimate of standard deviation according to the normal distribution.
If the data of Example 4.2 which were used in Examples 7.6, 7.9, and 7.10 are plotted in this way, the result is shown in Figure 7.21. Some of the calculations are shown in Table 7.5.
Table 7.5: Calculations for Comparison Using Individual Points

Thickness,               Order         Relative Cumulative    z_equivalent =
x mm        x^2          number, i     Frequency              NORMSINV
                                       = (i − 0.5)/n          (rel cum fr)
3.21        10.3041        1           0.0041                 −2.6411
3.24        10.4976        2           0.0124                 −2.2446
3.25        10.5625        3           0.0207                 −2.0403
3.26        10.6276        4           0.0289                 −1.8968
3.26        10.6276        5           0.0372                 −1.7843
3.26        10.6276        6           0.0455                 −1.6906
...         ...          ...           ...                    ...
3.49        12.1801      118           0.9711                  1.8968
3.51        12.3201      119           0.9793                  2.0403
3.51        12.3201      120           0.9876                  2.2446
3.57        12.7449      121           0.9959                  2.6411

$$\bar{x} = \frac{\sum x_i}{n} = \frac{407.59}{121} = 3.3685$$

$$s^2 = \frac{\sum x_i^2 - \left(\sum x_i\right)^2 / n}{n - 1} = \frac{1373.4471 - (407.59)^2 / 121}{120}$$

$$s = 0.0629$$

At x = 3.15, line passes through z = (3.15 − 3.3685) / 0.0629 = −3.47
At x = 3.6, line passes through z = (3.6 − 3.3685) / 0.0629 = 3.68
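
The plotting positions for individual points are easy to generate by computer. The sketch below shows one way to do it in Python; the function name `z_equivalents` and the eight sample values are hypothetical and are used only to illustrate the (i − 0.5)/n rule, with `NormalDist().inv_cdf` standing in for NORMSINV.

```python
from statistics import NormalDist, mean, stdev

def z_equivalents(values):
    """Pair each ordered value with z_equivalent at relative cumulative frequency (i - 0.5)/n."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    return [(x, std_normal.inv_cdf((i - 0.5) / n))
            for i, x in enumerate(ordered, start=1)]

# Hypothetical thickness readings in mm (not the data of Example 4.2).
sample = [3.37, 3.34, 3.38, 3.32, 3.33, 3.28, 3.41, 3.36]
x_bar, s = mean(sample), stdev(sample)

for x, z_eq in z_equivalents(sample):
    z_line = (x - x_bar) / s  # fitted straight line z = (x - x_bar)/s
    print(f"x = {x:.2f}  z_equivalent = {z_eq:6.3f}  line: {z_line:6.3f}")
```

With n = 121 the first and last plotting positions are 0.5/121 and 120.5/121, which is where the end values of z_equivalent in Table 7.5 come from.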
Figure 7.21: Computer Comparison Using Individual Points (z on a uniform scale from −3 to 3, with an added scale of relative cumulative frequency, %, from 0.13 to 99.87, versus thickness, mm, from 3.2 to 3.6)
The vertical groups in Figure 7.21 occur because of multiple measurements. For example, there are 11 points corresponding to a thickness of 3.35 mm (measured to two decimals).
(d) Extension: Probability Plotting in General
The discussion in this book of special plots for comparing probability distributions or frequency distributions is limited to comparisons for normal distributions, but there are other types for various situations. Other books give details of these methods.
A generalization of this method is called quantile-quantile plotting, or Q-Q plotting. It can be used to compare one relative frequency distribution with another, so giving an empirical comparison. It can also be used to compare a set of data with any of several theoretical probability distributions, including the exponential and Weibull distributions, which were discussed briefly in section 5.3.
A good discussion of probability plotting and its application in industry can be found in the book by Vardeman. See the List of Selected References in section 15.2.
7.8 Transformation of Variables to Give a Normal Distribution
Later in this book we will see statistical tests which assume that the underlying distribution is a normal distribution. Although there are other tests that do not require this or any similar assumption, in general these other tests are less sensitive than tests that do assume an underlying normal distribution. Furthermore, normal probabilities are convenient and familiar.
Therefore, if the original variable shows a distribution which is not a normal distribution, it is very useful to try to change the variable so that the new form will follow a normal distribution. This strategy is often successful if the original distribution showed a single mode somewhere between the smallest and largest values of the variable, but the original distribution was not symmetrical. If the original variable was x, forms of the new variable to try include log x, 1/x, $\sqrt{x}$, and $\sqrt[3]{x}$: whatever will do the job.
The most common transformation for this purpose is replacing x by ln x, $\log_{10} x$, or the logarithm of x to any other base. If one of these works, the others will, too. This change of variable arises naturally in some cases, such as changing hydrogen-ion concentration to pH, or changing noise intensity or power to decibels. It is found useful for data of hydrology, fatigue failures, and particle size distribution.
Example 7.11
The size distribution of particles from a grinder was measured using a scanning electron microscope. The size distribution, as cumulative percentage of number of particles as a function of particle size in millimeters, is shown below.

Particle Size    Relative Cumulative
x mm             Frequency by Number, %
  5.9             7.3
  9.6            29.8
 13.8            41.2
 18.3            58.2
 24.8            72.6
 30.1            86.8
 39.2            95.6
 62.7            96.8
 84.7            97.7
 97.3            98.3
127.2            99
170              99.7

These data are also shown graphically on the equivalent of normal probability paper in Figure 7.22.
Figure 7.22: Particle Size Data before Transformation (z on a uniform scale from −3 to 3, with relative cumulative frequency, %, from 0.13 to 99.87, versus particle size, x mm, from 0 to 180)
We can see that the pattern of the points shows a great deal of curvature, indicating that the distribution is far from a normal distribution. In fact, the distribution is not symmetrical, since the mean size is 19.9 and the median is appreciably different at approximately 16 (both in the units of the table above).
Figure 7.23 shows the transformed data. The linear particle size, x mm, has been replaced by y = ln x. Again the data are shown on the computer equivalent of normal probability paper. The straight line on Figure 7.23 has been fitted to the points. Thus, the transformed data can be approximated by a normal distribution, represented by the straight line.
Figure 7.23: Particle Size Data after Transformation (z on a uniform scale from −3 to 3, with relative cumulative frequency, %, from 0.13 to 99.87, versus log of particle size, ln(x mm), from 0 to 6)
Distributions of the random variable x for which log x is normally distributed occur often enough so that they are given a special name. They are called lognormal distributions.
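
The effect of the logarithmic transformation in Example 7.11 can also be checked numerically. The sketch below is only an illustration: it uses the correlation coefficient between the plotted variable and z_equivalent as a rough stand-in for how straight the plot looks, which is an assumption made here and not the graphical method of the text. The names `pearson_r`, `r_raw`, and `r_log` are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist, mean

# Data of Example 7.11: particle size and relative cumulative frequency, %.
sizes = [5.9, 9.6, 13.8, 18.3, 24.8, 30.1, 39.2, 62.7, 84.7, 97.3, 127.2, 170]
cum_pct = [7.3, 29.8, 41.2, 58.2, 72.6, 86.8, 95.6, 96.8, 97.7, 98.3, 99, 99.7]

std_normal = NormalDist()
z_equiv = [std_normal.inv_cdf(pct / 100) for pct in cum_pct]

def pearson_r(xs, ys):
    """Linear correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r_raw = pearson_r(sizes, z_equiv)                    # before transformation
r_log = pearson_r([log(x) for x in sizes], z_equiv)  # after y = ln x
print(f"r before transformation: {r_raw:.3f}")
print(f"r after  transformation: {r_log:.3f}")
```

A value of r closer to 1 after the transformation is consistent with the straighter pattern of points described for the transformed data, although a plot remains the better tool for spotting local departures.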
Problems
1. Four identical fair coins are tossed.
   a) Calculate and draw a graph of the probability distribution of the number of heads.
   b) What is the probability of obtaining three or more heads?
   c) Fit a normal distribution to the probability distribution of the number of heads. Sketch this distribution on top of the distribution drawn in part (a).
   d) What is the probability of obtaining three or more heads according to the normal approximation?
   e) (i) Would the normal approximation improve if coins which were not fair were used? Explain your answer.
      (ii) Would the normal approximation improve if a larger number of identical coins were used? Explain your answer.
2. A large shipment of books contains 2% which have imperfect bindings. Calculate the probability that out of 400 books,
   a) exactly 10 will have imperfect bindings (using two different approximations);
   b) more than 10 will have imperfect bindings (choosing one of the approximations for this calculation).
3. It is known that 3% of the plastic parts made by an injection molding machine are defective. If a sample of 30 parts is taken at random from this machine's production, calculate:
   a) the probability that exactly 3 parts will be defective.
   b) the probability that fewer than 4 parts will be defective.
   Do a) and b) using: (1) binomial distribution, (2) Poisson approximation, and (3) normal approximation.
   c) If the sample size is increased to 150 parts, use the normal and Poisson approximations to calculate the probability of:
      1) more than 5 defectives
      2) between 6 and 8 defectives, inclusive.
4. The managers of an electronics firm estimate that 70% of the new products they market will be successful.
   a) If the company markets 20 products in the next two years, calculate using the binomial formulae and using the normal approximation:
      (i) the probability that exactly four new products will not be successful;
      (ii) the probability that no more than four new products will not be successful.
   b) If the company markets 100 products over the next five years, what is the probability of:
      (i) more than 15 unsuccessful products?
      (ii) more than 70 but less than 85 successful new products?
5. Under certain conditions twenty percent of piglets raised in total confinement will die during the first three weeks after birth. Consider a group of 20 newborn piglets.
   a) Calculate the probability that exactly 10 piglets will live to three weeks of age. Do by:
      (i) Binomial distribution
      (ii) Poisson approximation
      (iii) Normal approximation.
   b) Calculate the probability that no more than 15 piglets will live to three weeks of age. Do by:
      (i) Binomial distribution
      (ii) Poisson approximation
      (iii) Normal approximation.
   c) For both parts (a) and (b), discuss the validity of the approximations to the binomial distribution.
6. The proportion of males in a particular area is 0.52. A sample of 50 people is taken at random.
   a) What probability distribution fits this case without any approximation? Why?
   b) Is a Poisson approximation suitable? Why?
   c) Is a normal approximation suitable? Why?
   d) Use whichever of (b) or (c) is more exact to find the probability that a sample of 50 people will contain at least 29 males but no more than 34 males.
7. A conservative candidate captured 48 percent of the popular vote in her riding in the last federal election. In a sample of 50 people from the candidate's riding, 35 claim to have voted for the conservative candidate. What is the probability in a sample of this size that 35 or more persons would have voted for this candidate? That 13 or fewer persons would have voted for this candidate? That either one or the other of these alternatives would have occurred? Then is there any reason to suspect that this sample may not be representative of the total population in the riding, or that some of the individuals in the sample are not being truthful about the way they voted?
8. Consider the following data on average daily yields of coke from coal in a coke oven plant:

   Class Boundaries    Frequency
   67.95–68.95          1
   68.95–69.95          8
   69.95–70.95         22
   70.95–71.95         22
   71.95–72.95          9
   72.95–73.95          8
   73.95–74.95          2

   The mean and standard deviation for this population are estimated from the data to be 71.25 and 1.2775, respectively.
   a) Draw the frequency histogram for these grouped frequency data, and sketch a normal distribution, fitted to the data, superimposed on the histogram.
   b) Plot the grouped frequency data on normal probability paper or its computer equivalent. Draw the appropriate straight line to represent the normal distribution. Comment on the apparent fit or lack of fit between the data and the fitted normal distribution.
   c) Estimate the probability of average daily coke yields less than 70.95 using:
      (i) the grouped frequency data,
      (ii) the normal probability paper or its computer equivalent,
      (iii) tabulated values for the normal distribution.
9. It is known that the negative of the logarithm of the soil permeability, y = −log(k), in a particular soil type, Type A, follows a normal distribution. It is known that Pr[y > 7.2] = 30%, and Pr[y < 5.6] = 5%.
   a) Find the mean and the standard deviation of y.
   b) If 40% of the total plot of interest has soil Type A and 60% has Type B, for which y has a mean of 7.5 and a standard deviation of 0.45, for what percentage of the plot is y greater than 7.35?
10. Data were collected on the insulation thickness (mm) in transformer windings, and the data were grouped as follows:

   Class Boundaries    Frequency
   17.5–22.5            2
   22.5–27.5            5
   27.5–32.5            8
   32.5–37.5           16
   37.5–42.5           17
   42.5–47.5            6
   47.5–52.5            2
   52.5–57.5            4

   The mean and standard deviation estimated from these data are 37.25 mm and 8.084 mm, respectively.
   a) Plot the grouped frequency data and the fitted normal data (at $\bar{x}$, $\bar{x} - 2s$, and $\bar{x} + 2s$) on normal probability paper. Comment on the goodness of fit.
   b) Estimate the probability of insulation thickness being greater than 27.5 mm using:
      (i) the frequency grouped data;
      (ii) normal probability paper or its computer equivalent;
      (iii) calculated values for the normal distribution.
11. Data on heights of 60 adult males can be grouped as shown below. Heights are in cm.

   Class Bounds        Class Frequency
   144.95–148.95        1
   148.95–152.95        1
   152.95–156.95        3
   156.95–160.95       13
   160.95–164.95       16
   164.95–168.95       15
   168.95–172.95       11

   The mean and standard deviation of the population as estimated from these data are 163.68 cm and 5.39 cm respectively.
   a) Plot the data on normal probability paper or its computer equivalent. Mark the points corresponding to $\bar{x}$, $\bar{x} - 2s$, and $\bar{x} + 2s$, with corresponding cumulative probabilities.
   b) Comment qualitatively on how well the normal distribution fits the data.
   c) Calculate the probability that an adult male from the population will be less than 160.95 cm high, (i) using the grouped frequency table; and (ii) using the fitted normal distribution.
12. Data on percentage relative humidity in a vegetable storage building were grouped as follows:

   Class Lower Bound    Class Upper Bound    Frequency
   59.5                 64.5                  4
   64.5                 69.5                  3
   69.5                 74.5                  8
   74.5                 79.5                 16
   79.5                 84.5                 17
   84.5                 89.5                  5
   89.5                 94.5                  3
   94.5                 99.5                  4
                        Total                60

   The mean and standard deviation of the population as estimated from the data are 79.2 and 8.56, respectively.
   a) Plot the data on normal probability paper and superimpose a line through points corresponding to $\bar{x}$, $\bar{x} - 2s$, and $\bar{x} + 2s$, with probabilities according to the normal distribution. An alternative is to plot cumulative relative frequency vs. cumulative normal probability as discussed in section 7.7.
   b) Estimate the probability of relative humidities being between 74.5 and 84.5 using
      i) tabulated data
      ii) tabulated values for the normal distribution
      iii) the straight line on normal probability paper or the alternative plot using a computer.
CHAPTER 8
Sampling and Combination of Variables
Some parts of this chapter require a good understanding of sections 3.1 and 3.2 and of Chapter 7.
Very frequently, engineers take samples from industrial systems. These samples are used to infer some of the characteristics of the populations from which they came. What factors must be kept in mind in taking the samples? How big does a sample need to be? These are some of the questions to be considered in this chapter. In answering some of them, we will need to consider the other major area of this chapter, the combination of variables.
We often need to combine two or more distributions, giving a new variable that may be a sum or difference or mean of the original variables. If we know the variance of the original distributions, can we calculate the variance of the new distribution? Can we predict the shape of the new distribution? Although in some cases the relationships are rather difficult to obtain, in some very important and useful cases simple relationships are available. We will be considering these simple relationships in this chapter.
8.1 Sampling
Remember that the terms "population" and "sample" were introduced in Chapter 1. A population might be thought of as the entire group of objects or possible measurements in which we are interested. A sample is a group of objects or readings taken from a population for counting or measurement. From the observations of the sample, we infer properties of the population. For example, we have already seen that the sample mean, $\bar{x}$, is an unbiased estimate of the population mean, $\mu$, and that the sample variance, $s^2$, is an unbiased estimate of the corresponding population variance, $\sigma^2$. Further discussion of statistical inference will be found later in this chapter and later chapters.
However, if these inferences are to be useful, the sample must truly represent the population from which it came. The sample must be random, meaning that all possible samples of a particular size must have equal probabilities of being chosen from the population. This will prevent bias in the sampling process. If the effects of one or more factors are being investigated but other factors (not of direct interest) may interfere, sampling must be done carefully to avoid bias from the interfering factors. The effects of the interfering factors can be minimized (and usually made negligible) by randomizing with great care both choice of the parts of the sample that receive different treatments and the order with which items are taken for sampling and analysis.
Illustration: Suppose there is an interfering factor that, unknown to the experimenters, tends to produce a percentage of rejects that increases as time goes on. Say the experimenters apply the previous method to the first thirty items and a modified method to the next thirty items. Then, clearly, comparing the number of rejects in the first group to the number of rejected items in the second group would not be fair to the modified method. However, if there is a random choice of either the standard method or the modified method for each item as it is produced, effects of the interfering factor will be greatly reduced and will probably be negligible. The same would be true if the interfering factor tended to produce a different pattern of rejected items.
The most common methods of randomization are some form of drawing lots, use of tables of random numbers, and use of random numbers produced by a computer. Discussion of these methods will be left to Chapter 11, Introduction to the Design of Experiments.
8.2 Linear Combination of Independent Variables
Say we have two independent variables, X and Y. Then a linear combination consists of the sum of a constant multiplied by one variable and another constant multiplied by the other variable. Algebraically, this becomes W = aX + bY, where W is the combined variable and a and b are constants.
The mean of a linear combination is exactly what we would expect: $\mu_W = a\mu_X + b\mu_Y$. Nothing further needs to be said.
If we multiply a variable by a constant, the variance increases by a factor of the constant squared: variance(aX) = a² variance(X). This is consistent with the fact that variance has units of the square of the variable. Variances must increase when two variables are combined: there can be no cancellation because variabilities accumulate. Variance is always a positive quantity, so variance multiplied by the square of a constant would be positive. Thus, the following relation for combination of two independent variables is reasonable:

$$\sigma_W^2 = a^2 \sigma_X^2 + b^2 \sigma_Y^2 \qquad (8.1)$$

More than two independent variables can be combined in the same way.
If the independent variables X and Y are simply added together, the constants a and b are both equal to one, so the individual variances are added:

$$\sigma_{(X+Y)}^2 = \sigma_X^2 + \sigma_Y^2 \qquad (8.2)$$

Thus, since the circumference of a board with rectangular cross-section is twice the sum of the width and thickness of the board, the variance of the sum of width and thickness is the sum of the variances of the width and thickness, and the variance of the circumference is $2^2\left(\sigma_{\text{width}}^2 + \sigma_{\text{thickness}}^2\right)$ if the width and thickness are independent of one another. Equation 8.2 can be extended easily to more than two independent variables.
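
Equation 8.1 and its extension to several variables can be written as a one-line function. The sketch below is illustrative only: the function name `linear_combination_variance` and the standard deviations assumed for the board (0.5 mm for width, 0.2 mm for thickness) are hypothetical values chosen for the example, and the variables are assumed to be independent.

```python
def linear_combination_variance(coefficients, variances):
    """Variance of a*X + b*Y + ... for independent variables (equation 8.1 extended)."""
    return sum(a * a * var for a, var in zip(coefficients, variances))

# Hypothetical board: standard deviations of 0.5 mm (width) and 0.2 mm (thickness).
var_width, var_thickness = 0.5**2, 0.2**2

# Circumference = 2*width + 2*thickness, so both coefficients are 2.
var_circumference = linear_combination_variance([2, 2], [var_width, var_thickness])

# A difference uses coefficients 1 and -1; the variances still add.
var_difference = linear_combination_variance([1, -1], [var_width, var_thickness])

print(f"variance of circumference: {var_circumference:.2f} mm^2")
print(f"variance of width - thickness: {var_difference:.2f} mm^2")
```

Squaring each coefficient is what makes the sign of the constant irrelevant, so a sum and a difference of the same two independent variables have the same variance.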
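If the variable W is the sum of n independent variables X, each of which has the same probability distribution and so the same variance $\sigma_X^2$, then

$$\sigma_W^2 = \left(\sigma_X^2\right)_1 + \left(\sigma_X^2\right)_2 + \cdots + \left(\sigma_X^2\right)_n = n\sigma_X^2 \qquad (8.3)$$
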
Example 8.1
Cans of beef stew have a mean content of 300 g each with a standard deviation of 6 g. There are 24 cans in a case. The mean content of a case is therefore (24)(300) = 7200 g. What is the standard deviation of the contents of a case?
Answer: Variances are additive, standard deviations are not. The variance of the content of a can is (6 g)² = 36 g². Then the variance of the contents of a case is (24)(36 g²) = 864 g². The standard deviation of the contents of a case is $\sqrt{864}$ g = 29.4 g.
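
The result of Example 8.1 can be checked by a small simulation. The sketch below is an illustration only: it assumes that can contents are independent and normally distributed, which the example does not state (equation 8.3 requires only independence and a common variance), and the number of simulated cases and the random seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import stdev

cans_per_case = 24
mean_can, sd_can = 300.0, 6.0  # grams

# Equation 8.3: the variance of the case is n times the variance of one can.
sd_case = sqrt(cans_per_case * sd_can**2)

# Simulation under the assumption of independent, normally distributed cans.
random.seed(1)
cases = [
    sum(random.gauss(mean_can, sd_can) for _ in range(cans_per_case))
    for _ in range(5000)
]
sd_simulated = stdev(cases)

print(f"standard deviation from equation 8.3: {sd_case:.1f} g")
print(f"standard deviation from simulation:   {sd_simulated:.1f} g")
```

Simply multiplying the standard deviation of one can by 24 would give 144 g, far larger than the simulated value, which illustrates why variances rather than standard deviations are added.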
If the variable Y is subtracted from the variable X, where the two variables are independent of one another, their variances are still added:

$$\sigma_{(X-Y)}^2 = \sigma_X^2 + \sigma_Y^2 \qquad (8.4)$$

This is consistent with equation 8.1 with a = 1 and b = −1. An example using this relationship will be seen later in this chapter.
If the variables being combined are not independent of one another, a correction term to account for the correlation between them must be included in the expression for their combined variance. This correction term involves the covariance between the variables, a quantity which we do not consider in this book. See the book by Walpole and Myers with reference given in section 15.2.
8.3 Variance of Sample Means
We have already seen the usefulness of the variance of a population. Now we need to investigate the variance of a sample mean. It is an indication of the reliability of the sample mean as an estimate of the population mean.
Say the sample consists of n independent observations. If each observation is multiplied by $\frac{1}{n}$, the sum of the products is the mean of the observations, $\bar{X}$. That is,
$\bar{X} = \frac{1}{n}X_1 + \frac{1}{n}X_2 + \cdots + \frac{1}{n}X_n = \left(\frac{1}{n}\right)(X_1 + X_2 + \cdots + X_n)$
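Now consider the variances. Let the first observation come from a population of variance $\sigma_1^2$, the second from a population of variance $\sigma_2^2$, and so on to the nth observation. Then from equation 8.1 the variance of $\bar{X}$ is
$\sigma_{\bar{X}}^2 = \left(\frac{1}{n}\right)^2 \sigma_1^2 + \left(\frac{1}{n}\right)^2 \sigma_2^2 + \cdots + \left(\frac{1}{n}\right)^2 \sigma_n^2$
But the variables all came from the same distribution with variance $\sigma^2$, so
$\sigma_{\bar{X}}^2 = \left(\frac{1}{n}\right)^2 \sigma^2 + \left(\frac{1}{n}\right)^2 \sigma^2 + \cdots + \left(\frac{1}{n}\right)^2 \sigma^2 = \left(\frac{1}{n}\right)^2 (n\sigma^2)$
or
$\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$    (8.5)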
That is, the variance of the mean of n independent variables, taken from a probability distribution with variance $\sigma^2$, is $\frac{\sigma^2}{n}$. The quantity n is the number of items in the sample, or the sample size. The square root of the quantity $\frac{\sigma^2}{n}$, that is $\frac{\sigma}{\sqrt{n}}$, is called the standard error of the mean for this case. Notice that as the sample size increases, the standard error of the mean decreases. Then the sample mean, $\bar{X}$, has a smaller standard deviation and so becomes more reliable as the sample size increases. That seems reasonable.
But equation 8.5 applies only if the items are chosen independently as well as randomly. The items in the sample are statistically independent only if sampling is done with replacement, meaning that each item is returned to the system and the system is well mixed before the next item is chosen. If sampling occurs without replacement, we have seen that probabilities for the items chosen later depend on the identities of the items chosen earlier. Therefore, the relation for variance given by equation 8.5 applies directly only for sampling with replacement.
In practical cases, however, sampling with replacement is often not feasible. If an item is known to be unsatisfactory, surely we should remove it from the system, not stir it back in. Some methods of testing destroy the specimen so that it can't be returned to the system.
If sampling occurs without replacement a correction factor can be derived for equation 8.5. The result is that the standard error of the mean for random sampling without replacement, still with all samples equally likely to be chosen, is given by
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$    (8.6)
where N is the size of the population, the number of items in it, and n is the sample size.
If the population size is large in comparison to the sample size, equation 8.6 reduces approximately to equation 8.5. Often engineering measurements can be repeated as many times as desired, so the effective population size is infinite. In that case, equation 8.6 can be replaced by equation 8.5.
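As a rough illustration of how equations 8.5 and 8.6 relate, the following Python sketch computes the standard error of the mean with and without the finite-population correction. The function name and its `population_size` parameter are assumptions made for this example, and the numbers are illustrative.

```python
import math

def standard_error(sigma, n, population_size=None):
    """Standard error of the mean.

    With population_size=None the population is treated as effectively
    infinite (or sampling is with replacement), so equation 8.5 applies.
    Otherwise the correction factor of equation 8.6 is included.
    """
    se = sigma / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se

se_infinite = standard_error(0.35, 5)
se_small_pop = standard_error(0.35, 5, population_size=20)
se_large_pop = standard_error(0.35, 5, population_size=1_000_000)

print(se_infinite, se_small_pop, se_large_pop)
```

For the small population the correction lowers the standard error noticeably; for the very large population the two equations give nearly the same value.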
Example 8.2
A population consists of one 2, one 5, and one 9. Samples of size 2 are chosen randomly from this population with replacement. Verify equation 8.5 for this case.
Answer: The original population has a mean of 5.3333 and a variance of $(2^2 + 5^2 + 9^2 - 16^2/3)/3 = 8.2222$, so a standard deviation of 2.8674. Its probability distribution is shown below in Figure 8.1.
[Figure 8.1: Probability Distribution of Population. Axes: probability versus value.]
Samples of size 2 with replacement can consist of two 2's, a 2 and a 5, a 2 and a 9, two 5's, a 5 and a 2, a 5 and a 9, two 9's, a 9 and a 2, and a 9 and a 5, a total of $3^2 = 9$ different results. These are all the possibilities, and they are all equally likely. Their respective sample means are 2, 3.5, 5.5, 5, 3.5, 7, 9, 5.5, and 7. The probability of each is 1/9 = 0.1111. Since sample means of 3.5, 5.5, and 7 occur twice, the sampling distribution looks like the following:
[Figure 8.2: Probability Distribution of Samples of Size 2. Axes: probability versus value.]
The expected sample mean is $\mu_{\bar{x}}$ = 0.1111(2 + 5 + 9) + 0.2222(3.5 + 5.5 + 7) = 5.3333, which agrees with the mean of the original distribution. The variance of the sample means is $[(2^2 + 5^2 + 9^2) + 2(3.5^2 + 5.5^2 + 7^2) - 48^2/9]/9 = 4.1111$, and the expected standard error of the mean is $\sqrt{4.1111}$ = 2.0276. From equation 8.5 the predicted variance is $\frac{8.2222}{2} = 4.1111$. Thus, equation 8.5 is satisfied in this case.
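The enumeration in Example 8.2 can be reproduced with a few lines of Python. This is only a sketch of the check; it uses the standard-library `itertools.product` to list every ordered sample with replacement, and the variable names are illustrative.

```python
from itertools import product
from statistics import mean, pvariance

population = [2.0, 5.0, 9.0]
n = 2

# All equally likely ordered samples of size n drawn with replacement.
sample_means = [mean(s) for s in product(population, repeat=n)]

pop_var = pvariance(population)          # population variance
var_of_means = pvariance(sample_means)   # variance of the sample means

print(len(sample_means), mean(sample_means))
print(pop_var, var_of_means, pop_var / n)
```

The variance of the nine sample means agrees with the population variance divided by n, as equation 8.5 predicts.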
The relationships for standard error of the mean are often used to determine how large the sample must be to make the result sufficiently reliable. The sample size is the number of times the complete process under study is repeated. For example, say the effect of an additive on the strength of concrete is being investigated. We decide using the relation for the standard error of the mean that a sample of size 8 is required. Then the whole process of preparing specimens with and without the additive (with other factors unchanged or changed in a chosen pattern) and measuring the strength of specimens must be repeated 8 times. If the specimens were prepared only once but analysis was repeated 8 times, the analysis would be sampled 8 times, but the effect of the additive would be examined only once.
Example 8.3
A population of size 20 is sampled without replacement. The standard deviation of the population is 0.35. We require the standard error of the mean to be no more than 0.15. What is the minimum sample size?
Answer: Equation 8.6 gives the relationship, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$.
In this case $\sigma$ is 0.35 and N is 20. What value of n is required if $\sigma_{\bar{x}}$ is at the limiting value of 0.15?
Substituting, $0.15 = \frac{0.35}{\sqrt{n}} \sqrt{\frac{20-n}{20-1}}$.
$\sqrt{\frac{20-n}{n}} = \frac{0.15\sqrt{19}}{0.35} = 1.868$
$20 - n = 3.490\,n$
Then $n = \frac{20}{4.490} = 4.45$
But the sample size, the number of observations in the sample, must be an integer. It must be at least 4.45, so the minimum sample size is 5. A sample size of 4 would not satisfy the requirement.
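Instead of solving equation 8.6 algebraically, the minimum sample size of Example 8.3 can also be found by trying successive integer values of n. The sketch below does this; the function names are hypothetical and chosen only for this example.

```python
import math

def standard_error_without_replacement(sigma, n, N):
    """Equation 8.6: standard error of the mean for a sample of size n
    drawn without replacement from a population of size N."""
    return sigma / math.sqrt(n) * math.sqrt((N - n) / (N - 1))

def minimum_sample_size(sigma, N, max_se):
    """Smallest integer n whose standard error does not exceed max_se."""
    for n in range(1, N + 1):
        if standard_error_without_replacement(sigma, n, N) <= max_se:
            return n
    return None  # not reached for max_se >= 0: n = N gives zero standard error

n_min = minimum_sample_size(sigma=0.35, N=20, max_se=0.15)
print(n_min)
```

The search confirms that n = 4 leaves the standard error above 0.15, while n = 5 satisfies the requirement.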
Example 8.4
The standard deviation of measurements of a linear dimension of a mechanical part is 0.14 mm. What sample size is required if the standard error of the mean must be no more than (a) 0.04 mm, (b) 0.02 mm?
Answer: Since the dimension can be measured as many times as desired, the population size is effectively infinite. Then $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.
(a) For $\sigma_{\bar{x}}$ = 0.04 mm and $\sigma$ = 0.14 mm,
$\sqrt{n} = \frac{0.14}{0.04} = 3.50$
n = 12.25
Then for $\sigma_{\bar{x}} \le$ 0.04 mm, the minimum sample size is 13.
(b) For $\sigma_{\bar{x}}$ = 0.02 mm and $\sigma$ = 0.14 mm,
$\sqrt{n} = \frac{0.14}{0.02} = 7.00$
n = 49
Then for $\sigma_{\bar{x}} \le$ 0.02 mm, the minimum sample size is 49.
Because of the inverse square relationship between sample size and the standard error of the mean, the required sample size often increases rapidly as the required standard error of the mean becomes smaller. At some point, further decreasing of the standard error of the mean by this method becomes uneconomic.
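The calculation of Example 8.4 amounts to rounding $(\sigma/\sigma_{\bar{x}})^2$ up to the next integer. A minimal Python sketch follows; the function name is an assumption, and the rounding step is a precaution against floating-point noise rather than part of the statistical method.

```python
import math

def minimum_sample_size(sigma, max_se):
    """Smallest n with sigma / sqrt(n) <= max_se (equation 8.5)."""
    exact = (sigma / max_se) ** 2
    # Round before taking the ceiling so that floating-point noise
    # (for example 49.00000000000001) does not add a spurious unit.
    return math.ceil(round(exact, 9))

n_a = minimum_sample_size(0.14, 0.04)   # part (a)
n_b = minimum_sample_size(0.14, 0.02)   # part (b)
print(n_a, n_b)
```

Halving the required standard error roughly quadruples the sample size, which is the rapid increase noted above.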
Example 8.5
A plant manufactures electric light bulbs with a burning life that is approximately normally distributed with a mean of 1200 hours and a standard deviation of 36 hours. Find the probability that a random sample of 16 bulbs will have a sample mean less than 1180 burning hours.
Answer: If the bulb lives are normally distributed, the means of samples of size 16 will also be normally distributed. The sampling distribution will have mean $\mu_{\bar{x}}$ = 1200 hours and standard deviation $\sigma_{\bar{x}} = \frac{36}{\sqrt{16}} = 9$ hours.
[Figure 8.3: Distribution of Burning Lives. Horizontal axes: $\bar{x}$, hours (1180 and 1200 marked) and z ($z_1$ and 0 marked); area $\Phi(z_1)$ indicated.]
At 1180 hours we have $z_1 = \frac{1180 - 1200}{9} = -2.222$, and the cumulative normal probability, $\Phi(z_1)$ = 0.0132 (from Table A1 with $z_1$ taken to two decimals) or 0.0131 (from the Excel function NORMSDIST). Then the probability that a random sample of 16 bulbs will have a sample mean less than 1180 hours is 0.013 or 1.3%.
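For readers working outside Excel, the same probability can be obtained from Python's standard-library `statistics.NormalDist`, which plays the role of NORMSDIST here. This is a sketch under the example's assumption of normally distributed bulb lives.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 1200.0, 36.0, 16
se = sigma / sqrt(n)          # standard error of the mean, hours

# Distribution of the sample mean for samples of size n.
sampling_dist = NormalDist(mu=mu, sigma=se)
p_below_1180 = sampling_dist.cdf(1180.0)

print(f"standard error = {se} hours, probability = {p_below_1180:.4f}")
```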
A final example uses the difference of two normal distributions.
Example 8.6
An assembly plant has a bin full of steel rods, for which the diameters follow a normal distribution with a mean of 7.00 mm and a variance of 0.100 mm$^2$, and a bin full of sleeve bearings, for which the diameters follow a normal distribution with a mean of 7.50 mm and a variance of 0.100 mm$^2$. What percentage of randomly selected rods and bearings will not fit together?
Answer:
Figure 8.4 shows the overlap between the diameters of steel rods and sleeve bearings. However, it is not clear from this graph how to calculate an answer to the question.
[Figure 8.4: Distribution of Diameters of Rods and Bearings. Horizontal axis: Diameter, mm; curves labeled Rods and Bearings.]
If for any selection of one rod and one bearing, the difference between the bearing diameter and the rod diameter is positive, the pair will fit. If not, they won't. (That may be a little oversimplified, but let us neglect consideration of clearance.)
Let d be the difference between the bearing diameter and the rod diameter. Because both the diameters of bearings and the diameters of rods follow normal distributions, the difference will also follow a normal distribution. The mean difference will be 7.50 mm - 7.00 mm = 0.50 mm. The variance of the differences will be the sum of the variances of bearings and rods: $\sigma_d^2$ = 0.100 + 0.100 = 0.200 mm$^2$.
Then $\sigma_d = \sqrt{0.200}$ = 0.447 mm. See Figure 8.5.
[Figure 8.5: Distribution of Differences. Horizontal axes: d, difference, mm (0 and 0.5 marked) and z ($z_1$ and 0 marked); area $\Phi(z_1)$ indicated.]
$z_1 = \frac{0 - 0.5}{0.447} = -1.118$
$\Phi(z_1) = \Phi(-1.118)$ = 0.1314 (from normal distribution table with z taken to two decimals) or 0.1318 (from the Excel function NORMSDIST).
Therefore, 13.1% (from the table) or 13.2% (from Excel) of randomly selected sleeves and rods will not fit together.
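The difference of two independent normal variables in Example 8.6 can be sketched in Python with `statistics.NormalDist`, which supports subtraction of two independent normal distributions directly. The variable names are illustrative, and clearance is neglected as in the example.

```python
from math import sqrt
from statistics import NormalDist

rod = NormalDist(mu=7.00, sigma=sqrt(0.100))       # rod diameters, mm
bearing = NormalDist(mu=7.50, sigma=sqrt(0.100))   # bearing diameters, mm

# Subtracting independent normal variables: the means subtract
# and the variances add (equation 8.4).
difference = bearing - rod
p_no_fit = difference.cdf(0.0)   # probability that the difference is negative

print(difference.mean, difference.stdev, p_no_fit)
```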
8.4 Shape of Distribution of Sample Means: Central Limit Theorem
Let us look again at the distributions of Example 8.2. We started with a distribution consisting of three unsymmetrical spikes, shown in Figure 8.1. The probability distribution of sample means of size 2 in Figure 8.2 shows that values closer to the mean have become more likely.
Now let us look at a sampling distribution (probability distribution of sample means) for a sample of size 5 from the same original distribution. The number of equally likely samples of size 5 from a population of 3 items is $3^5 = 243$, so the complete set of results is much larger. Therefore, only some of the possible samples will be shown here.
Among the 243 equally likely resulting samples are the following:
Sample                     Sample Mean
2    2    2    2    2      2
5    2    2    2    2      2.6
2    5    2    2    2      2.6
2    2    5    2    2      2.6
2    2    2    5    2      2.6
2    2    2    2    5      2.6
5    5    2    2    2      3.2
5    2    5    2    2      3.2
...  ...  ...  ...  ...    ...
5    5    5    2    2      3.8
...  ...  ...  ...  ...    ...
9    9    9    9    9      9
Because there are so many different sample means with varying frequencies, the sampling distribution is best shown as a histogram or a cumulative distribution. Figure 8.6 is the histogram for samples of size 5.
[Figure 8.6: Sampling Distribution for Samples of Size 5. Axes: class probability versus $\bar{x}$.]
We can see that this histogram shows the largest class frequencies are near the mean, and they become generally smaller to the left and to the right. In fact, the distribution seems to be approximately a normal distribution. This is confirmed by the plot of normal cumulative probability against cumulative probability for the samples, which is shown in Figure 8.7. The distribution is not quite normal, but it is fairly close. It would come closer to normal if the sample size were increased.
[Figure 8.7: Comparison with Normal Distribution on Equivalent of Normal Probability Paper. Vertical axes: z and cumulative probability, %; horizontal axis: value.]
In fact, this behavior as sample size increases is general. This is the Central Limit Theorem: if random and independent samples are taken from any practical population of mean $\mu$ and variance $\sigma^2$, as the sample size n increases the distribution of sample means approaches a normal distribution. As we have seen, the sampling distribution will have mean $\mu$ and variance $\frac{\sigma^2}{n}$. How large does the sample size have to be before the distribution of sample means becomes approximately the normal distribution? That depends on the shape of the original distribution. If the original population was normally distributed, means of samples of any size at all will be normally distributed (and sums and differences of normally distributed variables will also be normally distributed). If the original distribution was not normal, the means of samples of size two or larger will come closer to a normal distribution. Sample means of samples taken from almost all distributions encountered in practice will be normally distributed with negligible error if the sample size is at least 30. Almost the only exceptions will be samples taken from populations containing distant outliers.
The Central Limit Theorem is very important. It greatly increases the usefulness of the normal distribution. Many of the sets of data encountered by engineers are means, so the normal distribution applies to them if the sample size is large enough.
The Central Limit Theorem also gives us some indication of which sets of measurements are likely to be closely approximated by normal distributions. If variation is caused by several small, independent, random sources of variation of similar size, the measurements are likely close to a normal distribution. Of course, if one variable affects the probability distribution of the result in a form of conditional probability, so that the probability distribution changes as the variable changes, we cannot expect the result to be distributed normally. If the most important single factor is far from being normally distributed, the resulting distribution may not be close to normal. If there are only a few sources of variation, the resulting measurements are not likely to follow a distribution close to normal.
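The sampling distribution for samples of size 5 from the population of Example 8.2 can be generated in full by enumeration. The Python sketch below lists all 243 equally likely samples and checks the mean and variance of the sample means; the helper name `sampling_distribution` is an assumption made for this illustration.

```python
from itertools import product
from statistics import mean, pvariance

population = [2.0, 5.0, 9.0]

def sampling_distribution(n):
    """Means of all equally likely samples of size n, with replacement."""
    return [mean(s) for s in product(population, repeat=n)]

means_5 = sampling_distribution(5)
var_5 = pvariance(means_5)

print(len(means_5), mean(means_5), var_5)
```

Calling `sampling_distribution` with larger n and plotting a histogram of the result is one way to watch the shape approach the normal distribution, although the number of samples grows as $3^n$.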
Problems
1. The mean content of a box of cat food is 2.50 kg, and the standard deviation of the content of a box is 0.030 kg. There are 24 boxes in a case, and there are 400 cases in a carload as it leaves the factory. What is the standard deviation of the amount of cat food contained in (i) a case, and (ii) a carload?
2. The design load on a hoist is 50 tonnes. The hoist is used to lift packages each having a mean weight of 1.2 tonnes. The weights of individual packages are known to be normally distributed with a standard deviation of 0.3 tonnes. If 40 packages are lifted at one time, what is the probability that the design load on the hoist will be exceeded?
3. Bags of sugar from a production line have a mean weight of 5.020 kg with a standard deviation of 0.078 kg. The bags of sugar are packed in cartons of 20 bags each, and the cartons are piled in lots of 12 onto pallets for shipping.
a) What percentage of cartons would be expected to contain less than 100 kg of sugar?
b) Find the upper quartile of sugar content of a carton.
c) What mean weight of an individual bag of sugar will result in 95% of the pallets weighing more than 1200 kg?
4. Fertilizer is sold in bags. The standard deviation of the content of a bag is 0.43 kg. Weights of fertilizer in bags are normally distributed. 40 bags are piled on a pallet and weighed.
a) If the net weight of fertilizer in the 40 bags is 826 kg, (i) what percentage of the bags are expected to each contain less than 20.00 kg? (ii) Find the 10th percentile (or smallest decile) of weight of fertilizer in a bag.
b) How many pallets, carrying 40 bags each, will have to be weighed so that there is at least 96% probability that the mean weight of fertilizer in a bag is known within 0.05 kg?
5. A trucking company delivering bags of cement to suppliers has a fleet of trucks whose mean unloaded weight is 6700 kg with a standard deviation of 100 kg. They are each loaded with 800 bags of cement which have a mean weight of 44 kg and a standard deviation of 3 kg. The trucks travel in a convoy of four and pass over a weigh scale en route.
a) The government limit on loaded truck weight is 42,000 kg. Exceeding this limit by (i) less than 125 kg results in a fine of $200, (ii) between 125 and 200 kg results in a fine of $400 and (iii) over 200 kg yields a fine of $600. What is the expected fine per truck?
b) In addition to the above, the government charges a special road tax if the mean loaded weight of the trucks in a convoy of four trucks is greater than 42,000 kg. What is the probability that any particular convoy will be charged this tax?
6. A population consists of one each of the four values 1, 3, 4, 6.
a) Calculate the standard deviation of this population.
b) A sample of size 2 is taken from this population without replacement. What is the standard error of the mean?
c) A sample of size 2 is taken from this population with replacement. What is the standard error of the mean?
d) If a population now consists of 1000 each of the same four values, what now will be the standard error of the mean (i) without replacement and (ii) with replacement?
7. A population consists of one each of the four numbers 2, 3, 4, 7.
a) Calculate the mean and standard deviation of this population.
b) (i) List all the possible and equally likely samples of size 2 drawn from this population without replacement. Calculate the sample means. (ii) Find the mean and standard deviation of the sample means from (i). (iii) Use mean and standard deviation for the original population to calculate the mean of the sample means and the standard error of the mean. Compare with the results of (ii).
c) Repeat part (b) (i) to (iii), but now for samples drawn with replacement.
8. a) A random sample of size 2 is drawn without replacement from the population that consists of one each of the four numbers 5, 6, 7, and 8. (i) Calculate the mean and standard deviation for the population. (ii) List all the possible and equally likely random samples of size 2 and calculate their means. (iii) Calculate the mean and standard deviation of these samples and compare to the expected values.
b) Consider now the same population but sample with replacement. (i) List all the possible random samples of size 2 and calculate their means. (ii) Calculate the mean and standard deviation of these sample means and compare to the results obtained by use of the theoretical equations.
9. The resistances of four electrical specimens were found to be 12, 15, 17 and 20 ohms. List all possible samples of size two drawn from this population (a) with replacement and (b) without replacement. In each case find: (i) the population mean, (ii) the population standard deviation, (iii) the mean of the sample means (i.e., of the sampling distribution of the mean), (iv) the standard deviation of the sample means (i.e., the standard error of the mean). Show how to obtain (iii) and (iv) from (i) and (ii) using the appropriate relationships.
10. A population consists of one of each of the four numbers 3, 7, 11, 15.
a) Find the mean and standard deviation of the population.
b) Consider all possible samples of size 2 which can be drawn without replacement from this population. Find the mean of each possible sample. Find the mean and the standard deviation of the sampling distribution of means. Compare the standard error of the means with an appropriate equation from this book.
c) Repeat (b), except that the samples are chosen with replacement.
11. Steel plates are to rest in corresponding grooves. The mean thickness of the plates is 2.100 mm, and the mean width of the grooves is 2.200 mm. The standard deviation of plate thicknesses is 0.024 mm, and the standard deviation of groove widths is 0.028 mm. We find that unless the clearance (difference between groove width and plate thickness) for a particular pair is at least 0.040 mm, there is risk of binding. Assume that both plate thicknesses and groove widths are normally distributed. If plates and grooves are matched randomly, what percentage of pairs will have clearances less than 0.04 mm?
12. Rods are taken from a bin in which the mean diameter is 8.30 mm and the standard deviation is 0.40 mm. Bearings are taken from another bin in which the mean diameter is 9.70 mm and the standard deviation is 0.35 mm. A rod and a bearing are both chosen at random. If diameters of both are normally distributed, what is the probability that the rod will fit inside the bearing with at least 0.10 mm clearance?
13. A coffee dispensing machine is supposed to dispense a mean of 7.00 fluid ounces of coffee per cup with standard deviation 0.25 fluid ounces. The distribution approximates a normal distribution. What is the probability that, when 12 cups are dispensed, their mean volume is more than 7.15 fluid ounces? Explain why the normal distribution can be assumed in this calculation.
14. According to a manufacturer, a five-liter can of his paint will cover 60 square meters on average (when his instructions are followed), and the standard deviation for coverage by one can will be 3.10 m$^2$. A painting contractor buys 40 cans and finds that the average coverage for these 40 cans is only 58.8 m$^2$.
a) No information is available on the distribution of coverage by a can of paint. What rule or theorem indicates that the normal distribution can be used for the mean coverage by 40 cans? What numerical criterion is satisfied?
b) What is the probability that the sample mean will be this small or smaller when the true population mean is 60.0 m$^2$, that is, if the manufacturer's claim is true?
15. The amount of copper in an ore is estimated by analyzing a sample made up of n specimens. Previous experience indicates that this gives no systematic error and the standard deviation of analysis on individual specimens is 2.00 grams. How many measurements must be made to reduce the standard error of the sample mean to no more than 0.6 grams?
16. The resistance of a group of specimens was found to be 12, 15, 17 and 20 ohms. Consider all possible samples of size two drawn from this group:
a) with replacement
b) without replacement.
In each case find:
(i) the population mean
(ii) the population standard deviation
(iii) the mean of the sample means (i.e., of the sampling distribution of the mean)
(iv) the standard deviation of the sample means (i.e., the standard error of the mean)
Show how to obtain (iii) and (iv) from (i) and (ii) using the appropriate formulas.
CHAPTER 9
Statistical Inferences for the Mean
This chapter requires a good knowledge of Chapter 7.
Some parts require a knowledge of sections 8.3 and 8.4.
We have already seen that samples can be used to infer some information about the population from which the sample came, specifically the mean and variance of the population. Now we intend to infer further information, which should be quantitative. This chapter will be concerned with statistical inferences for the mean, and later chapters will be concerned with statistical inferences for other statistical quantities. The ideas, approaches and nomenclature concerning statistical inference developed in this chapter will mostly be applicable to the other quantities.
There are two main questions for which statistical inference may provide answers in this and later chapters. Suppose we have collected a representative sample that gives some information concerning a mean or other statistical quantity. One question on the basis of this information would be: are the sample quantity and a corresponding population quantity close enough together so that it is reasonable to say that the sample might have come from the population? Or are they far enough apart so that they likely represent different populations? We should make the answer to those questions as quantitative as we can. Another question might be: for what interval of values can we have a specific level of confidence that the interval contains the true value of the parameter of interest? We will find that the answers to these two questions are related to one another.
Furthermore, we can divide statistical inferences for the mean into two main categories. In one category, we already know the variance or standard deviation of the population, usually from previous measurements, and the normal distribution can be used for calculations. In the other category, we find an estimate of the variance or standard deviation of the population from the sample itself. The normal distribution is assumed to apply to the underlying population, but another distribution related to it will usually be required for calculations. We will start with the first category because it is simpler and allows us to develop the main ideas needed. The second category is found more often in practice.
9.1 Inferences for the Mean when Variance Is Known
We may have some previous data giving the variance or standard deviation of the population, and it may be reasonable to assume that the previous value of the variance still applies. In that case, how can we make quantitative inferences for the mean?
9.1.1 Test of Hypothesis
Here we are testing the hypothesis that a sample is similar enough to a particular population so that it might have come from that population. In that case, if the hypothesis is true, all disagreement between sample and population is due to random variation, and we say that the sample is consistent with the population. But is that hypothesis reasonable or plausible? Specifically, we make the null hypothesis that the sample came from a population having the stated value of the population characteristic, which is the mean in this case. Then we do calculations to see how reasonable such a hypothesis is. We have to keep in mind the alternative if the null hypothesis is not true, as the alternative will affect the calculations.
Illustration: Say the percentage metal in the tailings stream from a flotation mill in the metallurgical industry has been found to follow a normal distribution. When the mill is operating normally, the mean percentage metal in the stream is 0.370 and the standard deviation is 0.015. These are assumed to be population values, $\mu$ and $\sigma$. Now a plant operator takes a single specimen as a sample and finds a percentage metal of 0.410. Does this indicate that something in the process has changed, or is it still reasonable to say that the mill is operating normally? To put that question a little differently, is it plausible to say that this sample value or a more extreme one might occur by chance while the population mean percentage metal is still 0.370?
Our null hypothesis is that nothing has changed, so the population mean is still 0.370. What is the alternative hypothesis? Are we concerned with possible changes in both directions, positive and negative, or are changes in only one direction important? In the situation of measurements of percentage metal in a waste stream, we would likely be concerned with deviations in both directions. Then the null hypothesis would be questioned in relation to a value as far away from 0.370 as 0.410 or farther in either direction. This is often called a two-sided or two-tailed test. In that case the specific null hypothesis is that $\mu$ = 0.370, and the specific alternative hypothesis is that $\mu \neq$ 0.370. In other cases we may be interested in changes in only one direction, so a different alternative hypothesis would apply. We use the symbols $H_0$ for the null hypothesis and $H_a$ for the alternative hypothesis. Notice that the null hypothesis and alternative hypothesis must always be stated in terms of population values, such as the population mean, $\mu$.
[Figure 9.1: Test of Hypothesis. Horizontal axes: x, percentage metal (0.370 and 0.410 marked) and z ($-z_1$, 0, $z_1$ marked); tail areas $\Phi(-z_1)$ and $1 - \Phi(z_1)$ indicated.]
Now we calculate the probability of getting a sample value this far from the population mean or farther using the normal distribution and assuming that the null hypothesis is true. For a two-sided test, deviations in both directions have to be considered. The situation now is shown in Figure 9.1. (It is always a good idea to sketch a simple diagram like Figure 9.1 in this type of problem. We should mark on it as much information as we have available at this point in the solution.) Because the test is two-sided, we need to calculate the probability corresponding to both tails.
The test statistic is $z = \frac{x - 0.370}{0.015}$, the test distribution is normal, and large values of |z| will give evidence against the null hypothesis. $z_1$ will be the value of z corresponding to the sample observation, x = 0.410.
Now we are ready for calculations using the sample observation.
We have $z_1 = \frac{0.410 - 0.370}{0.015} = 2.67$.
From Table A1 or the function NORMSDIST on Excel,
$\Phi(2.67) = 0.9962$, so
$1 - \Phi(2.67) = 1 - 0.9962 = 0.0038$,
and $\Phi(-2.67) = 0.0038$.
Then assuming that the null hypothesis is true, the probability of a sample value this far away from the population mean or farther by chance in either direction is 0.0038 + 0.0038 = 0.0076 or 0.8%.
Is it reasonable to think that this is the one time in about 130 that a result this far away from the mean or farther would occur by chance? It might be, but more likely the population mean has changed, contrary to the null hypothesis. Then the conclusion is to reject the null hypothesis.
The observed level of significance or p-value is the probability of obtaining a result as far away from the expected value as the observation is, or farther, purely by chance, when the null hypothesis is true. That would be 0.8% in the numerical illustration. Notice that a smaller observed level of significance indicates that the null hypothesis is less likely. If this observed level of significance is small enough, we conclude that the null hypothesis is not plausible.
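The observed level of significance in the flotation-mill illustration can be computed in Python, with `statistics.NormalDist` standing in for Table A1 or NORMSDIST. This is a sketch only; because it does not round $z_1$ to two decimals, the result differs slightly in the fourth decimal place from the value obtained with the table.

```python
from statistics import NormalDist

mu_0 = 0.370     # population mean under the null hypothesis
sigma = 0.015    # known population standard deviation
x = 0.410        # single observed value

z1 = (x - mu_0) / sigma
# Two-sided test: add the probabilities in both tails.
p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z1)))

print(f"z1 = {z1:.3f}, observed level of significance = {p_value:.4f}")
```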
The procedure for tests of significance can be summarized as follows:
1. State the null hypothesis in terms of a population parameter, such as $\mu$.
2. State the alternative hypothesis in terms of the same population parameter.
3. State the test statistic, substituting quantities given by the null hypothesis but not the observed values. What values of the test statistic will indicate that the difference may be significant? State what statistical distribution is being used.
4. Show calculations assuming that the null hypothesis is true.
5. Report the observed level of significance, or else compare the value of the test statistic with a critical value as discussed below.
6. State a conclusion. That might be either to accept the null hypothesis, or else to reject the null hypothesis in favor of the alternative hypothesis. If the evidence is not strong enough to reject the null hypothesis, it is tentatively accepted, but that might be changed by further evidence. By statistical analysis we cannot prove that the null hypothesis is correct. Instead of saying that the null hypothesis is accepted, it is often better to say just that the null hypothesis is not rejected.
In many instances we choose a critical level of significance before observations are made. The most common choices for the critical level of significance are 10%, 5%, and 1%. If the observed level of significance is smaller than a particular critical level of significance, we say that the result is statistically significant at that level of significance. If the observed level of significance is not smaller than the critical level of significance, we say that the result is not statistically significant at that level of significance.
Example 9.1
It is very important that a certain solution in a chemical process have a pH of 8.30. The method used gives measurements which are approximately normally distributed about the actual pH of the solution with a known standard deviation of 0.020. We decide to use 5% as the critical level of significance.
a) Suppose a single determination shows pH of 8.32. The null hypothesis is that the true pH is 8.30 ($H_0$: pH = 8.30). The alternative hypothesis is $H_a$: pH $\neq$ 8.30 (this is a two-sided test because there is no indication that changes in only one direction are important).
The test statistic is $z = \frac{\text{pH} - 8.30}{0.020}$, and large values of |z| will make the null hypothesis implausible.
The normal distribution applies.
See Figure 9.2.
[Figure 9.2: Test of Hypothesis. Horizontal axes: pH (8.30 and 8.32 marked) and z ($-z_1$, 0, $z_1$ marked); tail areas $\Phi(-z_1)$ and $1 - \Phi(z_1)$ indicated.]
$z_1 = \frac{8.32 - 8.30}{0.020} = 1.00$
$\Phi(z_1)$ = 0.8413 (from Table A1 or the function NORMSDIST on Excel), so $1 - \Phi(z_1)$ = 0.1587.
$\Phi(-z_1)$ = 0.1587 (from same sources).
Then the observed level of significance is (2)(0.1587) = 0.317 or 31.7%.
Since this is larger than 5%, we do not reject the null hypothesis. We do not have enough evidence from this calculation to say that the pH is not equal to 8.30. We could say that the difference from a pH of 8.30 is not statistically significant at the 5% level of significance.
b) Suppose that now our sample consists of 4 determinations giving values of 8.31, 8.34, 8.32, 8.31. The sample mean is $\bar{x}_1$ = 8.32.
The null hypothesis is $H_0$: $\mu$ = 8.30.
The alternative hypothesis is $H_a$: $\mu \neq$ 8.30 (still a two-sided test).
The test statistic is $z = \frac{\bar{x} - 8.30}{\left(0.020/\sqrt{4}\right)}$.
The normal distribution applies.
The diagram is the same as before (Figure 9.2) except that pH is replaced by $\bar{x}$.
Now $z_1 = \frac{8.32 - 8.30}{0.010} = 2.00$.
$\Phi(z_1)$ = 0.9772 (from Table A1 or the function NORMSDIST on Excel),
$1 - \Phi(z_1)$ = 0.0228. $\Phi(-z_1)$ = 0.0228 (from the same sources).
Then the observed level of significance is (2)(0.0228) = 0.046 or 4.6%.
Since this is (just) less than 5%, we reject the null hypothesis and accept the alternative hypothesis that $\mu \neq$ 8.30. At the 5% level of significance we conclude that the true mean pH is no longer 8.30.
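Parts (a) and (b) of Example 9.1 follow the same steps, so they can be wrapped in one small function. The Python sketch below is an illustration only; the function name `two_sided_z_test` is hypothetical, and `statistics.NormalDist` replaces the table lookup.

```python
from math import sqrt
from statistics import NormalDist, mean

def two_sided_z_test(sample, mu_0, sigma):
    """Return (z, p-value) for H0: mu = mu_0 against Ha: mu != mu_0,
    with known population standard deviation sigma."""
    n = len(sample)
    z = (mean(sample) - mu_0) / (sigma / sqrt(n))
    p_value = 2.0 * NormalDist().cdf(-abs(z))   # both tails
    return z, p_value

z_a, p_a = two_sided_z_test([8.32], mu_0=8.30, sigma=0.020)
z_b, p_b = two_sided_z_test([8.31, 8.34, 8.32, 8.31], mu_0=8.30, sigma=0.020)

print(f"(a) z = {z_a:.2f}, p = {p_a:.3f}")
print(f"(b) z = {z_b:.2f}, p = {p_b:.3f}")
```

The same sample mean of 8.32 gives a much smaller observed level of significance when it is based on four determinations instead of one.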
We might want to examine the rejection region for this problem. See Figure 9.3.
For samples of size 4 the rejection region at 5% critical level of significance is the union of
$\bar{x} > 8.30 + z_2\left(\frac{0.020}{\sqrt{4}}\right)$ and $\bar{x} < 8.30 - z_2\left(\frac{0.020}{\sqrt{4}}\right)$
[Figure 9.3: Rejection Region. A rejection region lies in each tail.]
where $1 - \Phi(z_2)$ = 0.025, so $\Phi(z_2)$ = 0.975, and $\Phi(-z_2)$ = 0.025. The value of $z_2$ can be found from Table A1 or from the Excel function NORMSINV to be 1.96. Since $z_1$ = 2.00, just larger than 1.96, again the conclusion would be to reject the null hypothesis.
A result this close to the boundary between the rejection region and the acceptance region would likely result in further sampling in practice.
A comparison of parts (a) and (b) of Example 9.1 shows the effect of sample size.
Example 9.2
The strength of steel wire made by an existing process is normally distributed with a mean of 1250 and a standard deviation of 150. A batch of wire is made by a new process, and a random sample consisting of 25 measurements gives an average strength of 1312. Assume that the standard deviation does not change. Is there evidence at the 1% level of significance that the new process gives a larger mean strength than the old?
Answer:
$H_0$: $\mu$ = 1250
$H_a$: $\mu$ > 1250 (a one-tailed test because the question asks about a larger mean strength)
The test statistic is $z = \frac{\bar{x} - 1250}{\left(150/\sqrt{25}\right)}$, and large values of z will indicate that $H_0$ is not plausible. The critical value of z for a one-tailed test for 1% level of significance corresponds to $1 - \Phi(z_1)$ = 0.01, so $\Phi(z_1)$ = 1 - 0.01 = 0.99. From Table A1 or NORMSINV this corresponds to $z_1$ = 2.33.
Then the critical value of $\bar{x}$ is $\bar{x}_1 = 1250 + (2.33)\left(\frac{150}{\sqrt{25}}\right) = 1320$.
The sample mean is 1312. These quantities are shown in Figure 9.4.
[Figure 9.4: One-tailed Test. Horizontal axes: $\bar{x}$ (1250, 1312 and $\bar{x}_1$ marked) and z (0 and $z_1$ marked); upper tail area of 1% indicated.]
Since 1312 < 1320, the result is not in the rejection region; we have insufficient evidence to reject the null hypothesis. There is not enough evidence at the 1% level of significance to say that the new process gives a larger mean strength than the old. We may want to obtain more evidence by a larger sample. But for now, we can say just that the increase in mean strength is not statistically significant at the 1% level of significance.
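The critical-value approach of Example 9.2 can be sketched in Python as follows. `NormalDist().inv_cdf` is used in place of NORMSINV; because it does not round the critical z to 2.33, the critical sample mean comes out very slightly below 1320. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma, n = 1250.0, 150.0, 25
x_bar = 1312.0
alpha = 0.01                                  # critical level of significance

se = sigma / sqrt(n)                          # standard error of the mean
z_crit = NormalDist().inv_cdf(1.0 - alpha)    # one-tailed critical value of z
x_crit = mu_0 + z_crit * se                   # boundary of the rejection region
reject_h0 = x_bar > x_crit

# Observed level of significance for the one-tailed test.
p_value = 1.0 - NormalDist().cdf((x_bar - mu_0) / se)

print(f"critical mean = {x_crit:.1f}, reject H0: {reject_h0}, p = {p_value:.4f}")
```

The observed level of significance is larger than 1% but smaller than 5%, so the conclusion would change if 5% had been chosen as the critical level.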
Wemaydecidetotakesomeactiononthebasisofthetestofsignificance,such
asadjustingtheprocessifaresultisstatisticallysignificant.Butwecanneverbe
completelycertainwearetakingtherightaction.Therearetwotypesofpossible
errorwhichwemustconsider.
ATypeIerroristorejectthenullhypothesiswhenitistrue.Inthecaseofa
mean,thisoccurswhenthenullhypothesisiscorrect,butanobservationorsample
meanissofarfromtheexpectedmeanbychancethatthenullhypothesisisrejected.
TheprobabilityofaTypeIerrorisequaltothelevelofsignificance.
A Type II error is to accept the null hypothesis when it is false. If we are applying a test of significance to a mean, the null hypothesis would usually be that the population mean has not changed. If in fact the population mean has changed, the null hypothesis is false. But the sample mean might still by chance come close enough to the original population mean so that we would accept the null hypothesis, giving a Type II error. This is more likely to occur if the population mean has changed only a little. Thus, the probability of a Type II error depends on how much the population mean has changed in comparison to the standard error of the mean. How much change do we want to be fairly certain of detecting? We should take this into account when we choose the critical level of significance.
If the critical level of significance is made smaller, the probability of a Type I error becomes smaller, but the probability of a Type II error becomes larger. To make the probability of a Type II error smaller, we should choose a larger value for the critical level of significance. Rational choice of a critical level of significance then depends on balancing the two types of error. Notice that we may be able to reduce both errors either by decreasing the variance of the underlying system (i.e., making the measurements more reproducible) or by increasing the sample size. Further discussion of choosing a critical level of significance can be found in various reference books, such as the book by Vardeman or the one by Walpole and Myers. See section 15.2 for references.
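The trade-off between the two types of error can be made concrete with a small Python sketch. It reuses the numbers of the one-tailed strength example above (null mean 1250, standard deviation 150, sample size 25); the true mean of 1330 is a hypothetical value chosen purely for illustration, and the function name `type_ii_probability` is likewise an assumption.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_probability(mu0, mu_true, sigma, n, alpha):
    """Probability of accepting H0: mu = mu0 when the true mean is mu_true.

    Applies to an upper one-tailed z-test with known sigma.
    """
    std_error = sigma / sqrt(n)
    # Critical sample mean: H0 is rejected only if x_bar exceeds this value.
    x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * std_error
    # Sample means are distributed about the true mean with the same standard error.
    return NormalDist(mu_true, std_error).cdf(x_crit)


# Hypothetical true mean of 1330 (not from the text).
beta_1pct = type_ii_probability(1250, 1330, 150, 25, alpha=0.01)
beta_5pct = type_ii_probability(1250, 1330, 150, 25, alpha=0.05)
beta_n100 = type_ii_probability(1250, 1330, 150, 100, alpha=0.01)

print(f"alpha = 0.01, n = 25:  Type II probability = {beta_1pct:.3f}")
print(f"alpha = 0.05, n = 25:  Type II probability = {beta_5pct:.3f}")
print(f"alpha = 0.01, n = 100: Type II probability = {beta_n100:.4f}")
```

Under these assumptions, raising the level of significance from 1% to 5% lowers the Type II probability, and so does increasing the sample size at a fixed level of significance.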
We must distinguish clearly between statistical significance, as shown by a test of hypothesis, and practical significance, which is determined by an economic analysis. An alternative may give a result which is significantly better than the previous choice statistically, but the difference may be too small to be worthwhile economically. For example, say a mechanical device gives a small improvement in an automobile's gasoline mileage. That improvement may be statistically significant, but it may not be enough to justify its cost economically.
Problems
1. When a manufacturing process is operating properly, the mean length of a certain part is known to be 6.175 inches, and lengths are normally distributed. The standard deviation of this length is 0.0080 inches. If a sample consisting of 6 items taken from current production has a mean length of 6.168 inches, is there evidence at the 5% level of significance that some adjustment of the process is required?
2. A taxi company has been using Brand A tires, and the distribution of kilometers to wear-out has been found to be approximately normal with $\mu = 114{,}000$ and $\sigma = 11{,}600$. Now it tries 12 tires of Brand B and finds a sample mean of $\bar{x} = 117{,}200$. Test at the 5% level of significance to see whether there is a significant difference (positive or negative) in kilometers to wear-out between Brand A and Brand B. Assume the standard deviation is unchanged. Show all steps of the procedure described before Example 9.1.
3. The average daily amount of scrap from a particular manufacturing process is 25.5 kg with a standard deviation of 1.6 kg. A modification of the process is tried in an attempt to reduce this amount. During a 10-day trial period, the kilograms of scrap produced each day were: 25.0, 21.9, 23.5, 25.2, 22.0, 23.0, 24.5, 25.0, 26.1, 22.8. From the nature of the modification, no change in day-to-day variability of the amount of scrap will result. The normal distribution will apply. A first glance at the figures suggests that the modification is effective in reducing the scrap level. Does a significance test confirm this at the 1% level?
4. The standard deviation of a particular dimension on a machine part is known to be 0.0053 inches. Four parts coming off the production line are measured, giving readings of 2.747, 2.740, 2.750 and 2.749 inches. The population mean is supposed to be 2.740 inches. The normal distribution applies.
   a) Is the sample mean significantly larger than 2.740 inches at the 1% level of significance?
   b) What is the probability of a Type II error (i.e., of accepting the null hypothesis of part (a) when in fact the true mean is 2.752 inches)? Assume the standard deviation remains unchanged.
5. The outlet stream of a continuous chemical reactor is sampled every thirty minutes and titrated. Extensive records of normal operation show the concentration of component A in this stream is approximately normally distributed with mean 41.2 g/L and standard deviation 0.90 g/L.
   a) What is the probability that the concentration of component A in this stream will be more than 42.3 g/L?
   b) Five determinations of concentration of component A are made. If the mean of these five concentrations is more than 42.3 g/L, action is taken. What is the level of significance associated with this test?
   c) State the null hypothesis and the alternative hypothesis that fit the test described in part (b).
   d) The test in part (b) is applied. Now suppose the true mean has changed to 43.5 g/L with no change in standard deviation. What is the probability of a Type II error?
6. A manufacturer produces a special alloy steel with an average tensile strength of 25,800 psi. The standard deviation of the tensile strength is 300 psi. Strengths are approximately normally distributed. A change in the composition of the alloy is tried in an attempt to increase its strength. A sample consisting of eight specimens of the new composition is tested. Unless an increase in the strength is significant at the 1% level, the manufacturer will return to the old composition. Standard deviation is not affected.
   a) If the mean strength of the sample of eight items is 26,100 psi, should the manufacturer continue with the new composition?
   b) What is the minimum mean strength that will justify continuing with the new composition?
   c) How large would the true mean strength of the new composition (i.e., a new population mean) have to be to make the odds 9 to 1 in favor of obtaining a sample mean at least as big as the one specified in part (b)?
7. Noise levels in the cabs of a large number of new farm tractors were measured ten years ago and were found to vary about a mean value of 76.5 decibels (db) with a variance of 72.43 db². A researcher conducted a survey of this year's new tractors to determine whether or not tractor cab manufacturers have been successful in developing quieter cabs. In her final report, the researcher stated that the mean noise level in the cabs she studied was 74.5 db, and she concluded that there was only 12% probability of getting results at least this far different if there was no real reduction in noise level. Calculate the number of cabs that the researcher must have surveyed in order to have drawn this conclusion.
8. Jack Spratt is in charge of quality control of the concrete poured during the construction of a certain building. He has specimens of concrete tested to determine whether the concrete strength is within the specifications; these call for a mean concrete strength of no less than 30 MPa. It is known that the strength of such specimens of concrete will have a standard deviation of 3.8 MPa and that the normal distribution will apply. Mr. Spratt is authorized to order the removal of concrete which does not meet specifications. Since the general contractor is a burly sort, Mr. Spratt would like to avoid removing the concrete when the action is not justified. Therefore, the probability of rejecting the concrete when it actually meets the specification should be no more than 1%. What size sample should Mr. Spratt use if a sample mean 10% less than the specified mean strength will cause rejection of the concrete pour? State the null hypothesis and alternative hypothesis.
9. A scale for weighing bags of product either weighs correctly or slips out of adjustment so that it reads high by a constant 5 kg. The scale is used to weigh samples of 20 bags of product. The bags are intended to have a mean weight of 35 kg each, and the population standard deviation remains constant at 6 kg. The bagging machine is checked when the scale indicates that the mean weight of the bags is significantly higher than expected at the 5% level of significance.
   a) What is the maximum sample mean that will not trigger a bagging machine check?
   b) Using the value from part (a), what is the probability that the bagging machine will not be checked when it has slipped out of adjustment?
   c) At what cutoff value and level of significance will the probability of an undetected slippage equal the probability of an unnecessary machine check?
10. A manufacturer of fluorescent lamps claims (1) that his lamps have an average luminous flux of 3,600 lm at rated voltage and frequency and (2) that 90% of all lamps produced by an automatic process have a luminous flux higher than 3,300 lm. The luminous flux of the lamps follows a normal distribution. What standard deviation is implied by the manufacturer's claim? Assume that this standard deviation does not change. A random sample of 10 lamps is tested and gives a sample mean of 3,470 lm. At the 5% level of significance can we conclude that the mean luminous flux is significantly less than what the manufacturer claims? State your null hypothesis and alternative hypothesis.
9.1.2 Confidence Interval
We saw in the previous section that when the normal distribution applies, the rejection region for a sample mean at the 5% level of significance in two tails is the union of $z < -1.96$ and $z > +1.96$. In the rejection region sample means are far enough away from the assumed population mean that only 5% of sample means would fall there by chance.
We can look at those same numbers from another point of view. If the population mean is $\mu$ and the normal distribution applies, the probability that a random sample mean will fall by chance in the region between $-z_1 = -1.96$ and $z_1 = +1.96$ is 100% − 5% = 95%. Therefore, we can have 95% confidence that a random sample mean will fall in that interval. That is shown in Figure 9.5.
[Figure 9.5: Confidence Interval]
This is called the 95% confidence interval. The level of confidence for the interval is 95%. Similarly, we might find a 98% confidence interval or some other interval for a stated level of confidence.
Example 9.3
Data taken over a long period of time have established that the standard deviation of percentage iron in an iron analysis is 0.12, and that is not expected to change. A representative, well-mixed ore sample is analyzed six times, so the sample size is 6. If the true iron content is 32.60 percent iron, if there is no systematic error, and if the normal distribution applies, what is the 95% confidence interval for sample means?
Answer: For the 95% confidence level, $z_1 = 1.96$.
Then the confidence interval for sample means is from
$$32.60 - 1.96\left(\frac{0.12}{\sqrt{6}}\right) \quad\text{to}\quad 32.60 + 1.96\left(\frac{0.12}{\sqrt{6}}\right),$$
or 32.50 to 32.70 percent iron.
But the problem which we face in practice is usually not to find a confidence interval for sample means. Much more frequently we need to find a confidence interval for the population mean. The sample mean is known from measurements, and the population mean is the uncertain value for which we need an estimate. We already know that the sample mean gives a point estimate for the population mean, but now we need an interval estimate. That interval should correspond to a stated level of confidence that the interval contains the true population mean if the assumptions are satisfied. The assumptions are that the normal distribution applies, there is no systematic error, the sample is random and can therefore be considered representative, and the standard deviation of the population is known. Then the known sample mean is at the center of the confidence interval for the population mean. The sketch shown in Figure 9.5 still applies.
Example 9.4
As in Example 9.3, the standard deviation of percentage iron in analyses is 0.12, the sample size is 6, and the normal distribution applies with no systematic error. The sample mean is determined from measurements to be 32.56 percent iron. Find the 95% confidence interval for the true mean iron content of the population from which the sample came.
Answer: For 95% confidence level the interval is still from $z = -1.96$ to $z = +1.96$. $\bar{x} = 32.56$. Then the interval estimate for the population mean with 95% confidence is from
$$32.56 - 1.96\left(\frac{0.12}{\sqrt{6}}\right) \quad\text{to}\quad 32.56 + 1.96\left(\frac{0.12}{\sqrt{6}}\right),$$
or from 32.46 to 32.66 percent iron. See Figure 9.6. This sort of result is often shown as 32.56 ± 0.10, or within 0.10 of the sample mean.
[Figure 9.6: 95% Confidence Interval, extending from 32.46 to 32.66 and centered on 32.56]
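The intervals in Examples 9.3 and 9.4 use the same arithmetic and differ only in the value placed at the center. A minimal Python sketch, assuming a known population standard deviation, is given here as a check; the helper name `z_interval` is hypothetical, and `statistics.NormalDist().inv_cdf` plays the role of Table A1 or NORMSINV.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(center, sigma, n, confidence):
    """Two-sided interval center +/- z1 * sigma / sqrt(n) for known sigma."""
    # For 95% confidence the upper limit sits at cumulative probability 0.975.
    z1 = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z1 * sigma / sqrt(n)
    return center - half_width, center + half_width


# Example 9.3: interval for sample means about a true mean of 32.60.
low_93, high_93 = z_interval(32.60, 0.12, 6, 0.95)
# Example 9.4: interval for the population mean about a sample mean of 32.56.
low_94, high_94 = z_interval(32.56, 0.12, 6, 0.95)

print(f"Example 9.3: {low_93:.2f} to {high_93:.2f} percent iron")
print(f"Example 9.4: {low_94:.2f} to {high_94:.2f} percent iron")
```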
Example 9.5
A large population is normally distributed with a standard deviation of 0.12. A random sample will be taken from this population and the sample mean will be calculated. We require at least 98% confidence that the true population mean will be within 0.05 of the sample mean, assuming there is no systematic error. What sample size is required?
Answer:
For 98% confidence $1 - \Phi(z_1) = 0.01$ and $\Phi(-z_1) = 0.01$. From Table A1 or from Excel function NORMSINV( ) we find that $z_1 = 2.33$. We have also that $z_1 = \dfrac{\bar{x}_1 - \mu}{\sigma_{\bar{x}}}$ and $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$, so $z_1 = \dfrac{\bar{x}_1 - \mu}{\sigma/\sqrt{n}}$.
[Figure 9.7: Confidence Interval, a 98% confidence interval with 1% in each tail, from $\bar{x}_2$ at $-z_1$ to $\bar{x}_1$ at $+z_1$]
Substituting for $\bar{x}_1 - \mu = 0.05$ (which will also give $\bar{x}_2 - \mu = -0.05$ with $-z_1 = \dfrac{\bar{x}_2 - \mu}{\sigma_{\bar{x}}}$), and substituting $\sigma = 0.12$ and $z_1 = 2.33$, we obtain
$$0.05 = 2.33\left(\frac{0.12}{\sqrt{n}}\right)$$
$$\frac{0.12}{\sqrt{n}} = \frac{0.05}{2.33}$$
$$\sqrt{n} = \frac{(0.12)(2.33)}{0.05} = 5.59$$
and $n = (5.59)^2 = 31.3$.
The sample size must be at least this large, and it must be an integer. The minimum sample size to give at least 98% confidence that the population mean will be within 0.05 of the sample mean therefore is 32. The 98% confidence interval will then be $\bar{x} \pm 0.05$.
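The sample-size calculation of Example 9.5 amounts to solving $z_1 \sigma / \sqrt{n} \le E$ for $n$ and rounding up to the next integer, where $E$ is the required half-width (a symbol introduced here for convenience). The Python sketch below is one way to check the result; the function name `required_sample_size` is an assumption.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence):
    """Smallest n for which z1 * sigma / sqrt(n) does not exceed the margin.

    Assumes a normal distribution and a known population standard deviation.
    """
    z1 = NormalDist().inv_cdf(0.5 + confidence / 2)   # two-sided critical z
    return ceil((z1 * sigma / margin) ** 2)           # round up to an integer


# Example 9.5: sigma = 0.12, required half-width 0.05, 98% confidence.
n_required = required_sample_size(0.12, 0.05, 0.98)
print(n_required)
```

Rounding up rather than to the nearest integer is what guarantees at least the stated level of confidence.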
Remember that calculations for both test of hypothesis and confidence limits by the methods which have been discussed so far have three requirements:
1. The sample must be random and representative.
2. The distribution of the variable must be a normal distribution, at least to a good approximation. However, the Central Limit Theorem is helpful here. If the sample contains enough observations, the sample mean will be normally distributed (to whatever approximation is required) even though the original observations were not.
3. The standard deviation of the observations must be known reliably, probably from previous information.
These requirements are often not completely met. For example, probability distributions in the tails, far from the population mean, are often not exactly according to the normal distribution. Therefore, inferences may not be quite as reliable as they seem: a 98% confidence interval may actually be a 96% confidence interval, and so on.
The next section will consider a calculation where the standard deviation of the observations is not known reliably before the experiment, so it must be estimated from a sample of moderate size.
Problems
1. A cocoa packaging machine fills bags so that the bag contents have a standard deviation of 3.5 g. Weights of contents of bags are normally distributed.
   a) If a random sample of 20 bags gives a mean of 102.0 g, what are the 99% confidence limits for the mean weight of the population (i.e., all bags)?
   b) What sample size (how many bags) would have to be taken so that a person would be 95% confident that the population mean was not smaller than the sample mean minus 1 g?
2. The diameters of shafts made by a particular process are approximately normally distributed with a standard deviation of 0.0120 cm. When all settings are correct, the mean diameter is 3.200 cm.
   a) If the settings are correct and random samples contain six specimens each, what proportion of the sample means will be smaller than 3.190 cm?
   b) How large should the sample be to give 98% confidence that the sample mean is within 0.0080 cm of the true mean of all diameters of shafts produced under current conditions?
3. Carbon composition resistors with mean resistance 560 Ω and coefficient of variation 10% are produced by a factory. They are sampled each hour in the quality control lab. What sample size would be required so that there is 95% probability that the mean resistance of the sample lies within 10 Ω of 560 Ω if the population mean has not changed?
4. We want to estimate the mean distance traveled to work by employees of a large manufacturing firm. Past studies indicate that the standard deviation of these distances is 2.0 km and that the distances follow a normal distribution. How many employees should be chosen at random and polled if the estimated mean distance is to be within 0.1 km of the true mean with a confidence level of 95%?
5. A batch processor dispenses a mean volume of approximately 0.80 m³ of grain with a standard deviation of 0.05 m³. The volumes of batches are normally distributed. A test engineer wishes to check the calibration of the processor.
   a) How many batches would have to be measured for the engineer to be 90% confident that the mean volume from the sample is between 0.99 and 1.01 times the true population mean?
   b) If a sample of 50 batches of grain is measured, with what confidence can the claim be made that the sample mean volume is within 1% of the true mean volume of the population, if the true mean volume is approximately 0.80 m³?
   c) If a sample of 200 batches of grain is measured, the engineer can be 90% confident that the sample mean is within what percentage of the true population mean? Again assume that the population mean is approximately 0.80 m³.
6. Company A produces tires. The mean distance to wearout of these tires is 108,000 km, and the standard deviation of the wearout distances is 15,000 km.
   a) A distributor who is about to buy those tires wishes to test a random sample of them. What number of tires would have to be tested so there is 98% probability that the mean wearout distance for the sample is within 5% of the population mean?
   b) For sample sizes of 4, 8, 16, 32, 64 and 128, calculate the probability with which the distributor can claim that the mean wearout distance for the sample is within 5% of the population mean.
   c) If the manufacturer's claim is correct, how many tires from a shipment of 100 tires are expected to wear out in less than 120,000 km?
7. Carbon resistors of mean resistance approximately 560 ohms are produced in a certain factory. The standard deviation is 28 ohms, and resistances are normally distributed.
   a) What confidence level is associated with a single resistor falling within 10 ohms of the population mean?
   b) How large a random sample is required to give 95% level of confidence that the sample mean is within 10 ohms of the population mean?
8. a) The diameter of a certain shaft is normally distributed with a mean expected to be 2.79 cm and a standard deviation of 0.01 cm. The specification limits are 2.77 ± 0.03. If 1000 shafts are produced, how many can we expect will be unacceptable? If the sample mean for 1000 shafts is 2.786 cm, what are the 99% confidence limits for the true mean diameter?
   b) What is the probability that a single diameter measurement will deviate from the true mean by at least 0.02 cm?
   c) Estimate the required size of the random sample in future measurements so that the 95% confidence interval for the population mean will not be wider than from the sample mean less 0.01 cm, to the sample mean plus 0.01 cm.
9. A small plant bags a blend of three types of fertilizer to meet the needs of a particular group of farmers. The different types of fertilizer are fed through three different machines. Each bag is supposed to contain 18.00 kg from machine 1, 7.00 kg from machine 2, and 5.00 kg from machine 3. It is found that the actual amounts are normally distributed about these means with the following standard deviations:
   Machine    Standard Deviation
   1          0.19 kg
   2          0.07 kg
   3          0.04 kg
   a) What are the variance and coefficient of variation of the amount of fertilizer in a bag?
   b) What percentage of the bags contain less than 29.50 kg?
   c) How many bags must be sampled to establish with 99% confidence that the true mean is within 0.5% of the sample mean, regardless of what the population mean is?
10. Portland cement is packed in bags of nominal weight 80 pounds. The actual mean weight of a bag is found to be 80.2 pounds with a coefficient of variation of 1.2%. A normal distribution applies. Railway flatcars are loaded with enough bags to make up a nominal load of 60 tons on each car. A train is made up of fifty flatcars.
   a) What is the mean weight of a carload?
   b) What is the standard deviation of the weight of a carload?
   c) What are the 95% confidence limits for the weight of a trainload of Portland cement?
11. The Soapy Suds Corporation owns a machine which fills boxes of laundry soap. The mean weight is 51 ounces per box, with a standard deviation of 1.10 ounces.
   a) Why might we expect the distribution of weights of boxes of soap to be approximately normal?
   b) Assuming the distribution is normal, what fraction of the boxes differ from the mean weight by more than 1.50 ounces?
   c) What sample size (number of boxes) must a quality control officer test so that he is 90% confident that the mean of the sample is within 0.500 ounces of the true population mean?
12. Fertilizer is sold in bags. The standard deviation of the content of a bag is 0.43 kg. Weights of fertilizer in bags are normally distributed. 40 bags are piled on a pallet and weighed.
   a) If the net weight of fertilizer in the 40 bags is 826 kg, (i) what percentage of the bags are expected to contain less than 20.00 kg each? (ii) find the 10th percentile (or smallest decile) of weight of fertilizer in a bag.
   b) How many pallets, carrying 40 bags each, will have to be weighed so that there is at least 96% probability that the true mean weight of fertilizer in a bag is known within 0.05 kg?
13. Insulators produced by a factory have a breakdown voltage distribution that can be approximated by a normal distribution. The coefficient of variation is 5%.
   a) What is the smallest sample size that will ensure probability of 90% that the sample mean measured is between 0.98 times the population mean and 1.02 times the population mean?
   b) If the sample size is 40, what is now the confidence level associated with the sample mean lying between 0.98 times the population mean and 1.02 times the population mean?
14. Two products A and B are added together to form a mixture. A carton of the mixed product has a mean weight of 100.0 kg and a standard deviation of 1.2 kg. The mean weight of Product A in each carton is 14.0 kg and the standard deviation is 0.6 kg. Weights of both Product A and Product B follow normal distributions.
   a) What are the mean weight and standard deviation of Product B in each carton?
   b) What is the probability that the weight of Product B in a carton chosen at random will be at least 5% lower than its specified mean weight?
   c) We take a random sample from a consignment of 1,000 cartons for a construction project. How big should the sample be to ensure with 95% confidence that the population mean carton weight is within 1% of the mean weight of a sample?
15. A coffee machine is adjusted to provide a population mean of 110 ml of coffee per cup and a standard deviation of 5 ml. The volume of coffee per cup is assumed to have a normal distribution. The machine is checked periodically by sampling 12 cups of coffee. If the mean volume, $\bar{x}$, of those 12 cups in ml falls in the interval $(110 - 2\sigma_{\bar{x}}) \le \bar{x} \le (110 + 2\sigma_{\bar{x}})$, no adjustment is made. Otherwise, the machine is adjusted.
   a) If a 12-cup test gives a mean volume of 107.0 ml, what should be done?
   b) What fraction of the total number of 12-cup tests would lead to an adjustment being made, even if the machine had not changed from its original correct setting?
   c) How many cups should be sampled randomly so there is 99% confidence that the mean volume of the sample will lie within 2 ml of 110 ml when the machine is correctly adjusted?
16. Capacitors are manufactured on a production line. It is known that their capacitances have a coefficient of variation of 2.3%.
   a) What is the probability that the capacitance of a capacitor will be between $0.990\mu$ and $1.010\mu$ if $\mu$ is the mean capacitance of the population? State any assumption.
   b) We want to make the 99% confidence interval for the sample mean of the capacitances to be no larger than from $0.990\mu$ to $1.010\mu$, where $\mu$ is the population mean. What is the minimum sample size?
17. A company received 200 electrical components that were claimed to have a mean life of 500 hours. Assume the distribution of component lives was normal. A sample of 25 components was selected randomly without replacement. It was decided to give that sample a special test that would allow the component life to be estimated accurately but nondestructively.
   a) What is the maximum value the standard deviation of the population could have if the sample mean was to be within 10% of the population mean with a 95% confidence level?
   b) If the coefficient of variation of the population was 2% and the sample mean was found to be 487.0 hours, what conclusions can be made about the claims of the manufacturer? Use the 5% level of significance.
18. Insulators produced by a factory have a breakdown voltage distribution that can be approximated by a normal distribution. The coefficient of variation is 5%.
   a) What is the smallest sample size that will ensure a probability of 90% that the sample mean measured is within 2% of the population mean? The samples are taken randomly.
   b) If economic reasons dictate that the sample size should be 40, what is now the confidence level associated with the sample mean lying within 2% of the population mean?
   c) What is the probability that an insulator will have a breakdown voltage 10% higher than the mean breakdown voltage?
9.2 Inferences for the Mean when Variance Is Estimated from a Sample
In most cases the variance or standard deviation must be estimated from a sample. Even if we have a reliable figure for variance or standard deviation from previous observations, it is often hard to be certain that the variance hasn't changed. What is the variance now? We can estimate it from the same sample as we use to estimate the mean of the population.
But if variance is estimated from a sample of moderate size, that estimate is also subject to random error related to the size of the sample. The larger the sample, the more reliable the estimate of variance becomes. The quantitative relation is expressed in terms of the degrees of freedom of the sample. The degrees of freedom refer to the number of pieces of independent information used to estimate the variance. The sample mean, $\bar{x}$, was calculated from $n$ independent quantities, $x_i$. But the deviations from the mean, $x_i - \bar{x}$, are not all independent because, as we saw in Chapter 3, $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$. The number of independent deviations from the mean is not $n$ but $(n - 1)$. Then the number of independent pieces of information on which the variance or standard deviation is based is $(n - 1)$. We can check this by considering the case of a sample consisting of only one item, $x_1$, so $n = 1$. This gives a rough indication of the mean ($\bar{x} = x_1$), but no information at all about the variance of the population. For this case $s^2 = \dfrac{(x_1 - \bar{x})^2}{(n - 1)} = \dfrac{0}{0}$, which is mathematically indeterminate. That agrees with the statement that the estimate of the variance for a sample of $n$ items has $(n - 1)$ degrees of freedom because in this case $n = 1$, so $(n - 1)$ is equal to zero.
A sample of size $n$ gives an estimate of variance, $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{(n - 1)}$. In words this estimate is the sum of squares of the deviations from the sample mean, divided by the number of degrees of freedom. In abbreviated form the relation is $s^2 = SS/df$, where SS is the sum of squares of deviations from the sample mean (or the equivalent), and df is the number of degrees of freedom (often represented by the Greek letter nu, $\nu$). We will see in Chapter 14 that variance from a regression line is given by a similar relation, although both the sum of squares of deviations and the number of degrees of freedom are evaluated differently.
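The relation $s^2 = SS/df$ can be illustrated with a few lines of Python. This is a sketch for illustration only: the four readings are borrowed from one of the problems earlier in this chapter simply to have numbers to work with, and the variable names are assumptions. The standard-library function `statistics.variance` also divides by $(n - 1)$, so it serves as a cross-check.

```python
import statistics

# Four readings (inches) borrowed from an earlier problem, used only as sample data.
readings = [2.747, 2.740, 2.750, 2.749]

n = len(readings)
x_bar = sum(readings) / n
deviations = [x - x_bar for x in readings]

# The deviations sum to zero (within rounding), so only n - 1 are independent.
print(f"sum of deviations = {sum(deviations):.2e}")

ss = sum(d * d for d in deviations)   # sum of squares of deviations, SS
df = n - 1                            # degrees of freedom
s2 = ss / df                          # estimate of variance, s^2 = SS/df
print(f"s^2 = {s2:.3e}, s = {s2 ** 0.5:.5f}")

# Cross-check against the library function, which uses the same divisor.
print(f"statistics.variance = {statistics.variance(readings):.3e}")

# With a single observation there are zero degrees of freedom, and the
# library refuses to estimate a variance at all.
try:
    statistics.variance([2.747])
    single_item_ok = True
except statistics.StatisticsError:
    single_item_ok = False
print("variance from one item available:", single_item_ok)
```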
If we have only a limited number of degrees of freedom to use, the resulting estimate of variance, $s^2$, is less reliable than if an infinite number of degrees of freedom were available. This must be taken into account when we make statistical inferences about the mean.
This puzzle was considered in the early years of the twentieth century by W. S. Gosset, a chemist working for a brewery in Dublin. He had a practical problem: how to make valid inferences about the contents of beer on the basis of rather small samples. He worked out the mathematical solution to this problem. That gave a new probability distribution, a distribution related to the normal distribution but taking into account the number of degrees of freedom. He called this new distribution the t-distribution. He realized that other people would find the t-distribution useful, so he wanted to publish it in a scientific journal. But here was a difficulty of a different kind: the company for which he worked did not allow employees to publish in the open literature. He was sure that publication would not harm the company. He decided to publish using a pen-name, "Student." Thus the t-distribution is often called Student's t-distribution, and a test of hypothesis based on it is often called Student's t-test.
The independent variable of the normal distribution applied to sample means is $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$. If we don't know $\sigma$, we estimate it from a sample by the estimated standard deviation, $s$. Then instead of $z$ we have the variable $t$, which for this case is equal to $\dfrac{\bar{x} - \mu}{s/\sqrt{n}}$. Probability according to the t-distribution is then a function of two independent variables, $t$ and the number of degrees of freedom, which in this case is $(n - 1)$.
Figure 9.8 shows the probability density functions of t-distributions as functions of t for various numbers of degrees of freedom. The general shape is similar to the shape of the normal distribution: symmetrical and roughly bell-shaped. However, smaller numbers of degrees of freedom give lower and wider distributions (the total area under each curve must correspond to a probability of one, so if a distribution is lower at the center it is also wider in the tails). The highest curve is for infinite degrees of freedom and is identical to the normal curve as a function of z. The lowest (and hence widest) curve of Figure 9.8 is for one degree of freedom. That makes sense, because only one degree of freedom would correspond to little reliability.
Tables of the t-distribution often give one-tail probabilities, that is $\Pr[t > t_1]$, where $t_1$ is a critical or limiting value. The corresponding areas are shown in Figure 9.8. For any particular number of degrees of freedom, the one-tail probability is 1 minus the cumulative probability, which is $\Pr[t < t_1]$. For infinite degrees of freedom the t-distribution becomes the normal distribution, so the one-tail probability becomes $1 - \Phi(z_1) = \Phi(-z_1)$.
Figure 9.9 shows the ratio of t (for the t-distribution) to z (for the normal distribution) corresponding to various values of the one-tail probability. This ratio shows the effect of limited numbers of degrees of freedom, as compared to infinite degrees of freedom for the normal distribution. Notice that the scales are logarithmic. For one degree of freedom the ratio of t to z is 2.4 for a one-tail probability of 0.10. Still for one degree of freedom (d.f.), that ratio rises to 103 for a one-tail probability of 0.001. Thus, we can see that at small numbers of degrees of freedom, the effect on calculations of estimating the variance from a sample of limited size can be large. On the other hand, the line for 30 degrees of freedom is not far from a ratio of 1. For 30 d.f. the ratio varies from 1.023 for a one-tail probability of 0.10, to 1.095 for a one-tail probability of 0.001. This indicates that for a sample size of 30 or more, the normal distribution is usually (but not always) a reasonable approximation to the t-distribution.
Table A2 in Appendix A gives values of t according to the t-distribution as functions of the one-tail probability (across the top in bold letters from 0.1 to 0.001) and the number of degrees of freedom, d.f. (along the left-hand and right-hand sides in bold letters from 1 to ∞). For example, for a one-tail probability of 0.025 and three degrees of freedom, the value of t is 3.182.
[Figure 9.8: t-distributions and one-tail probabilities. Curves of f(t) against t for 1, 2, 3 and 5 d.f. and for infinite d.f., with the one-tail probabilities beyond $t_1$ marked.]
Alternatively, if a person has access to a computer with Excel (or an alternative) or a pocket calculator, values for the t-distribution can be obtained using Excel functions or their equivalents. The Excel function TINV gives the value of t for the desired two-tail probability and number of degrees of freedom. Since the t-distribution is symmetrical, the two-tail probability is just twice the one-tail probability. Thus, a one-tail probability of 0.025 corresponds to a two-tail probability of 0.05, and combining this with three degrees of freedom on Excel by entering =TINV(0.05,3) gives a value for t of 3.18244929 (to quote all the figures given). The function TDIST gives a one-tail or two-tail probability for a stated value of t and a stated number of degrees of freedom. The number of tails is specified by a third parameter, which can be 1 or 2. Thus, entering =TDIST(3.182,3,1) gives 0.02500857, and entering =TDIST(3.182,3,2) gives 0.05001714. Excel has some other related functions, but they are not needed during the learning process.
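For readers without Excel, the same two quantities can be approximated with the Python standard library alone. The sketch below is not how Excel computes them: it integrates the t-distribution density numerically by Simpson's rule and inverts the result by bisection. The function names `t_tail` and `t_inv` are hypothetical stand-ins for TDIST and TINV, and the search limit of 100 on t is an arbitrary assumption that is too small for extreme cases such as one degree of freedom with very small probabilities.

```python
import math


def t_pdf(t, df):
    """Probability density of the t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def _simpson(f, a, b, steps=2000):
    """Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def t_tail(t, df, tails):
    """One-tail (tails=1) or two-tail (tails=2) probability beyond t >= 0."""
    if t < 0 or tails not in (1, 2):
        raise ValueError("t must be non-negative and tails must be 1 or 2")
    # Half the area lies above zero; subtract the area between 0 and t.
    upper = 0.5 - _simpson(lambda u: t_pdf(u, df), 0.0, t)
    return tails * upper


def t_inv(two_tail_probability, df, t_max=100.0):
    """Value of t whose two-tail probability equals the given value."""
    low, high = 0.0, t_max
    for _ in range(60):                       # bisection on a decreasing function
        mid = (low + high) / 2
        if t_tail(mid, df, 2) > two_tail_probability:
            low = mid
        else:
            high = mid
    return (low + high) / 2


print(f"t for two-tail 0.05, 3 d.f.: {t_inv(0.05, 3):.4f}")
print(f"one-tail probability at t = 3.182, 3 d.f.: {t_tail(3.182, 3, 1):.6f}")
print(f"two-tail probability at t = 3.182, 3 d.f.: {t_tail(3.182, 3, 2):.6f}")
```

The printed values can be compared with the Excel results quoted above and with Table A2.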
Various statistical inferences can be made using the t-distribution. They are not usually sensitive to small deviations of the underlying distribution from the normal distribution because of the Central Limit Theorem and because the larger value of t for small numbers of degrees of freedom reduces the effects of small deviations. In general, if the variance is estimated from a sample, statistical inferences should be made using the t-distribution rather than the normal distribution. If the number of degrees of freedom is large enough, the normal distribution can be used as an approximation to the t-distribution.
Let us now consider the inferences involving the t-distribution.
[Figure 9.9: Ratio of t to z as function of one-tail probability. Curves of the ratio t/z against one-tail probability (0.001 to 0.1) for 1, 2, 5, 10 and 30 d.f., on logarithmic scales.]
9.2.1 Confidence Interval Using the t-distribution
Say we have a random sample of size $n$, from the measurements of which we calculate the sample mean, $\bar{x}$, and the estimate of variance, $s^2$. Then the estimated standard deviation is $s$, based on $(n - 1)$ degrees of freedom. Because the variance or standard deviation is estimated from a sample, we must generally use the t-distribution rather than the normal distribution for calculations. From the number of degrees of freedom and the estimate of the standard deviation, we can calculate t, then use tables or computer functions to find a confidence interval for the population mean, $\mu$, at a stated level of confidence. Once we find a value of t, the calculations are the same as using z with the normal distribution.
Example 9.6
A certain dimension is measured on four successive items coming off a production line. This sample gives $\bar{x} = 2.384$ and $s = 0.048$.
(a) On the basis of this sample, what is the 95% confidence interval for the population mean?
(b) If instead of estimating the standard deviation from a sample, we knew the true standard deviation was 0.048, what then would be the 95% confidence interval for the population mean?
Answer:
(a) The 95% confidence interval, two-sided as assumed unless otherwise stated, corresponds to a one-tail probability of (100 − 95)%/2 = 2.5%. This is shown in Figure 9.10. The number of degrees of freedom is 4 − 1 = 3. From Table A2 or the function TINV on Excel, the limiting value of t is $t_1 = 3.182$. Then, the 95% confidence interval for $\mu$ is
$$\bar{x} \pm t_1\left(\frac{s}{\sqrt{n}}\right) = 2.384 \pm (3.182)\left(\frac{0.048}{\sqrt{4}}\right) = 2.31 \text{ to } 2.46.$$
[Figure 9.10: 95% Confidence Interval, from $-t_1$ to $+t_1$ with 2.5% in each tail]
(b) If the standard deviation of the population were known reliably, we would find confidence intervals using the normal distribution. The 95% confidence interval extends from cumulative normal probability of 0.025 (at $z = -z_1$) to cumulative normal probability of 0.975 (at $z = +z_1$). From Table A1 or Excel function NORMSINV we find $z_1 = 1.96$. Then, if the standard deviation were known reliably to be 0.048, the 95% confidence interval for $\mu$ would be
$$\bar{x} \pm z_1\left(\frac{\sigma}{\sqrt{n}}\right) = 2.384 \pm (1.96)\left(\frac{0.048}{\sqrt{4}}\right) = 2.34 \text{ to } 2.43.$$
Then the confidence interval would be appreciably narrower than in part (a).
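The two intervals of Example 9.6 can be compared side by side in a short Python sketch. It is illustrative only: the limiting value $t_1 = 3.182$ is simply copied from Table A2 as quoted in the example (the standard library has no t-distribution function), while $z_1$ comes from `statistics.NormalDist`. The helper name `interval` is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def interval(center, multiplier, std_dev, n):
    """Two-sided interval center +/- multiplier * std_dev / sqrt(n)."""
    half_width = multiplier * std_dev / sqrt(n)
    return center - half_width, center + half_width


x_bar, s, n = 2.384, 0.048, 4

t1 = 3.182                          # Table A2: one-tail 0.025, 3 d.f. (copied value)
z1 = NormalDist().inv_cdf(0.975)    # about 1.96

t_low, t_high = interval(x_bar, t1, s, n)   # part (a): s estimated from the sample
z_low, z_high = interval(x_bar, z1, s, n)   # part (b): standard deviation known

print(f"(a) t-based: {t_low:.2f} to {t_high:.2f}")
print(f"(b) z-based: {z_low:.2f} to {z_high:.2f}")
print(f"ratio of widths t/z = {(t_high - t_low) / (z_high - z_low):.2f}")
```

The ratio of the two widths is just $t_1/z_1$, the quantity plotted in Figure 9.9.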
9.2.2 Test of Significance: Comparing a Sample Mean to a Population Mean
For this case also, the calculation is very similar to the corresponding case using the normal distribution. The quantity t is calculated in nearly the same way as z, and then a probability is found from tables or the appropriate computer function taking the number of degrees of freedom into account. The null hypothesis, alternative hypothesis, and test statistic should be stated explicitly. A test of significance using the t-distribution is often called a t-test.
Example 9.7
The electrical resistances of components are measured as they are produced. A sample of six items gives a sample mean of 2.62 ohms and a sample standard deviation of 0.121 ohms. At what observed level of significance is this sample mean significantly different from a population mean of 2.80 ohms? Is there less than 2% probability of getting a sample mean this far away from 2.80 ohms or farther purely by chance when the population mean is 2.80 ohms?
Answer: $H_0$: $\mu$ = 2.80
$H_a$: $\mu \neq$ 2.80 (two-sided test)
The test statistic is

\[ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{\bar{x} - 2.80}{0.121 / \sqrt{6}}. \]

Large values of |t| indicate that $H_0$ is unlikely to be correct.
With $\bar{x}$ = 2.62,

\[ t_{\text{observed}} = \frac{2.62 - 2.80}{0.121 / \sqrt{6}} = \frac{-0.18}{0.0494} = -3.64 \]

See Figure 9.11.
From Table A2 with 6 - 1 = 5 degrees of freedom, for a one-tail probability of 0.01, $t_1$ = 3.365, and for a one-tail probability of 0.005, $t_1$ = 4.032. The observed value of |t| is between 3.365 and 4.032 (remember that the distribution is symmetrical). Then the one-tail probability is between 0.005 and 0.01, and the sample mean is significantly different from a population mean of 2.80 ohms at a two-sided observed level of significance between 0.01 and 0.02.

[Figure 9.11: Level of Significance (tails of the t-distribution beyond -3.64 and +3.64)]

If a computer with Excel (or some alternatives) is available, the observed level of significance can be found more exactly. The two-tail probability is given by entering =TDIST(3.64,5,2), giving 0.0149. Then the sample mean is significantly different from a population mean of 2.80 ohms at a two-sided observed level of significance of 0.0149 or 0.015.
There is less than a 2% probability of getting a sample mean this far from 2.80 ohms or farther purely by chance when the population mean is 2.80 ohms.
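Where Excel is not at hand, the same two-tail probability can be approximated with the Python standard library alone. The sketch below is only an illustration under stated assumptions: it integrates the density of the t-distribution numerically by Simpson's rule rather than calling a statistical library, and the helper names `t_pdf` and `two_tail_probability` are invented here.

```python
import math

def t_pdf(t, df):
    """Probability density of the t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tail_probability(t, df, steps=2000):
    """Two-tail probability P(|T| > |t|); steps must be even for Simpson's rule."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3   # area between -|t| and +|t| (symmetry)
    return 1 - central_area

# Data of Example 9.7
n, sample_mean, s, mu_0 = 6, 2.62, 0.121, 2.80
t_observed = (sample_mean - mu_0) / (s / math.sqrt(n))
p_value = two_tail_probability(t_observed, n - 1)

print(f"t = {t_observed:.2f}, two-tail probability = {p_value:.4f}")
```

The result should agree closely with the value from TDIST, with a small difference possible because the sketch uses the unrounded value of t.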
9.2.3 Comparison of Sample Means Using Unpaired Samples
In this case we have samples for each of two conditions. The question becomes: are the two sample means significantly different from one another, or could both plausibly come from the same population? We will have an estimate from each sample of the variance or standard deviation of the population, so these two estimates will have to be combined in a logical way. The two random samples will have been chosen separately and independently of one another, so the two sample means will be independent estimates. This test of significance is often called an unpaired t-test.
The two estimates of variance must be compatible with one another. This should be checked by the variance ratio test to be introduced in the next chapter. According to Walpole and Myers (see section 15.2 for reference) larger departures from equality of the variances can be tolerated if the two samples are of equal size ($n_1 = n_2$). The samples should be of equal size if that is feasible. In the examples and problems of the rest of this chapter, we assume that the estimates of variance are compatible with one another.
The estimates of variance, $s_1^2$ and $s_2^2$, are combined or pooled to give a combined estimate of variance, $s_c^2$. Say the estimate $s_1^2$ is based on ($n_1$ - 1) degrees of freedom, and the estimate $s_2^2$ is based on ($n_2$ - 1) degrees of freedom. Remember that the greater the number of degrees of freedom, the more reliable we expect the estimate to be. It can be shown theoretically that the separate estimates of variance should be weighted by their numbers of degrees of freedom before they are averaged. Then

\[ s_c^2 = \frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{(n_1 - 1) + (n_2 - 1)} \tag{9.1} \]

This is called the combined or pooled estimate of variance.
Since the product of the sample estimate of variance and number of degrees of freedom is the sum of squares of deviations from the sample mean, equation 9.1 can also be shown as the sum of squares of deviations for the first sample, plus the sum of squares of deviations for the second sample, then divided by the sum of the degrees of freedom.
The combined estimate of variance is based on more information than either of the two individual estimates, so it is reasonable that it has more degrees of freedom, ($n_1$ - 1) + ($n_2$ - 1). Notice that this is the denominator of equation 9.1.
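The equivalence between equation 9.1 and the "sum of squares divided by total degrees of freedom" form can be illustrated with a short Python sketch. The two small data sets are hypothetical, invented only for this illustration, and the function names are assumptions rather than anything defined in the text.

```python
import statistics

def pooled_variance(var_1, n_1, var_2, n_2):
    """Combine two variance estimates, each weighted by its degrees of freedom (equation 9.1)."""
    df_1, df_2 = n_1 - 1, n_2 - 1
    return (df_1 * var_1 + df_2 * var_2) / (df_1 + df_2)

def sum_sq_dev(values):
    """Sum of squares of deviations from the sample mean."""
    m = statistics.mean(values)
    return sum((v - m) ** 2 for v in values)

# Hypothetical measurements, invented for illustration only
sample_1 = [10.2, 9.8, 10.5, 10.1]
sample_2 = [9.6, 9.9, 9.4, 10.0, 9.7]

from_variances = pooled_variance(statistics.variance(sample_1), len(sample_1),
                                 statistics.variance(sample_2), len(sample_2))
from_sums_of_squares = (sum_sq_dev(sample_1) + sum_sq_dev(sample_2)) / (
    (len(sample_1) - 1) + (len(sample_2) - 1))

print(from_variances, from_sums_of_squares)   # the two routes should agree
```

Note that `statistics.variance` divides by n - 1, so it is the sample estimate of variance used throughout this chapter.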
Using this combined estimate of variance, the estimated variance of the first sample mean is $s_c^2 / n_1$, and the estimated variance of the second sample mean is $s_c^2 / n_2$. As we saw in section 8.1, the variance of the difference between two independent quantities is the sum of the variances of the separate quantities. Then

\[ s^2_{(\bar{x}_1 - \bar{x}_2)} = s_c^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right) \tag{9.2} \]

Another notation is often used, letting $\bar{y} = \bar{x}_1 - \bar{x}_2$, and then

\[ s_{\bar{y}}^2 = s_c^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right) \tag{9.2a} \]
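The null hypothesis is that both samples could have come from the same population, so $\mu_1 = \mu_2$, or $\mu_1 - \mu_2 = 0$, or $\mu_{\bar{y}} = 0$. The alternative hypothesis may be either that $\mu_1 \neq \mu_2$ (a two-tailed test), or that $\mu_1 > \mu_2$ or $\mu_1 < \mu_2$ (a one-tailed test). In the notation for $\bar{y} = \bar{x}_1 - \bar{x}_2$, the alternative hypothesis would be either $\mu_{\bar{y}} \neq 0$ (a two-tailed test), or else $\mu_{\bar{y}} > 0$ or $\mu_{\bar{y}} < 0$ (a one-tailed test).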
Example 9.8
Two methods of determining the nickel content of steel are compared using four determinations by each method. The results are:
For method 1: $\bar{x}_1$ = 3.2850, $s_1$ = 0.00774 (from 3 degrees of freedom)
For method 2: $\bar{x}_2$ = 3.2580, $s_2$ = 0.00960 (from 3 degrees of freedom)
Assuming that the two estimates of variance are compatible, is the difference in means statistically significant at the 5% level of significance?
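Answer:
$H_0$: Both samples could have come from the same population, so $\mu_1 - \mu_2 = 0$ (or in the other notation, $\mu_{\bar{y}} = 0$).
$H_a$: $\mu_1 - \mu_2 \neq 0$ (or else $\mu_{\bar{y}} \neq 0$) (a two-tailed test)
The test statistic is

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_{(\bar{x}_1 - \bar{x}_2)}} \quad \left( \text{or else } t = \frac{\bar{y}}{s_{\bar{y}}} \right). \]

Large values of |t| tend to make $H_0$ unlikely.

\[ s_c^2 = \frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{(n_1 - 1) + (n_2 - 1)} = \frac{(0.00774)^2 (3) + (0.00960)^2 (3)}{3 + 3} = 76.03 \times 10^{-6} \]

(This value would correspond to $s_c$ = 0.00872, but it is not necessary to make that calculation.)

\[ s^2_{(\bar{x}_1 - \bar{x}_2)} = s_c^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right) = \left( 76.03 \times 10^{-6} \right) \left( \frac{1}{4} + \frac{1}{4} \right) = 38.02 \times 10^{-6} \]

Then $s_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{38.02 \times 10^{-6}} = 6.17 \times 10^{-3}$, based on 3 + 3 = 6 d.f.

\[ t = \frac{3.285 - 3.258}{6.17 \times 10^{-3}} = 4.38 \]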
From Table A2 or TINV from Excel, for 5% level of significance in two tails and 6 degrees of freedom, $t_{\text{critical}}$ = 2.447. Alternatively, the observed level of significance or p-value is given from Excel by TDIST(4.38,6,2) = 0.00467. Since t > $t_{\text{critical}}$ (or since 0.00467 < 5% or even < 0.5%), the difference is statistically significant at the 5% level of significance.
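The arithmetic of Example 9.8 can be retraced in a few lines of Python. This is a sketch under the same assumption as the example (compatible variances, so a pooled estimate is appropriate); the function name `unpaired_t` is illustrative, and the critical value is simply copied from Table A2 as quoted in the text.

```python
import math

def unpaired_t(mean_1, s_1, n_1, mean_2, s_2, n_2):
    """Return (t, degrees of freedom) for a pooled-variance comparison of two sample means."""
    df = (n_1 - 1) + (n_2 - 1)
    pooled_var = ((n_1 - 1) * s_1 ** 2 + (n_2 - 1) * s_2 ** 2) / df   # equation 9.1
    var_of_difference = pooled_var * (1 / n_1 + 1 / n_2)              # equation 9.2
    return (mean_1 - mean_2) / math.sqrt(var_of_difference), df

t_value, df = unpaired_t(3.2850, 0.00774, 4, 3.2580, 0.00960, 4)
t_critical = 2.447   # Table A2: 6 d.f., 5% level of significance in two tails
significant = abs(t_value) > t_critical

print(f"t = {t_value:.2f} with {df} d.f.; significant at 5%: {significant}")
```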
[Figure 9.12: Level of Significance]

This two-sample t-test or unpaired t-test is used very frequently in a planned experiment to see whether a change in the experimental conditions has any statistically significant effect on the product or result of a process. Comparing the two sample means gives a direct comparison between the two sets of conditions. However, we have to be as sure as we can that other significant factors are not affecting the result because they are changing at the same time. We can minimize the effect of interfering factors (which are sometimes called lurking variables) by randomizing the choice of samples for different treatments and the order in which samples are taken and/or analyzed. Randomizing has been discussed briefly in Chapter 8 and will be considered more fully in Chapter 11.
If an interfering factor is known or suspected to be present and has an appreciable effect, the unpaired or two-sample t-test may not be the best plan. An alternative experiment may be a better choice.
Illustration: Suppose we want to compare rates of evaporation from a standard evaporation pan and from a pan using an experimental design. Both types are used to measure rates of evaporation at a weather station. The question we want to answer is, does the new type give significantly higher results than the standard type? We know that evaporation will be different on successive days because of changing weather conditions, but different weather conditions should not have an appreciable effect on any difference in rates of evaporation between the two types of pan. We want to focus on a comparison of evaporation rates between the two types, so for present purposes the variation from day to day is not of prime interest. If we use the comparison of sample means by the t-test with unpaired samples, variation of evaporation from day to day will be an interfering factor. If we neglect randomizing, the effect of daily variation may be confused with the effect of type of evaporator; then the results of the tests may be quite misleading. If we make random choices of which type of evaporator should be used on a particular day, say by drawing lots, the effect of varying weather conditions will be minimized. But still the variation in evaporation due to varying conditions from day to day will give larger variance within populations for both types of pan. Then the estimates of variance will very likely be inflated. This will give smaller values of t, so it will be difficult to show that one type is significantly better than another. The reader should compare the next two examples, Examples 9.9 and 9.10.
Example 9.9
Daily evaporation rates were measured on 20 successive days. Which of two types of evaporation pan would be used on a particular day was decided by tossing a coin. The mean daily evaporation for the 10 days on which Pan A was used was 19.10 mm, and the mean evaporation on the 10 days on which Pan B was used was 17.24 mm. The variance estimated from the sample from Pan A is 7.72 mm$^2$, and the variance estimated from the sample from Pan B is 5.36 mm$^2$. Assuming that these two estimates of variance are compatible, does the experimental evaporation pan, Pan A, give significantly higher evaporation rates than the standard pan, Pan B, at the 1% level of significance?
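Answer: $H_0$: $\mu_A - \mu_B = 0$
$H_a$: $\mu_A - \mu_B > 0$ (one-tailed test, because the question to be answered is whether the experimental design gives higher evaporation rates than the standard pan)
Test statistic:

\[ t = \frac{\bar{x}_A - \bar{x}_B}{s_{\text{diff}}} \]

Large values of t make $H_0$ less likely.
$\bar{x}_A$ = 19.10, $\bar{x}_B$ = 17.24
$s_A^2$ = 7.72, $s_B^2$ = 5.36
$n_A$ = 10, $n_B$ = 10
$df_A$ = 10 - 1 = 9, $df_B$ = 10 - 1 = 9

\[ s_c^2 = \frac{(10 - 1)(7.72) + (10 - 1)(5.36)}{(10 - 1) + (10 - 1)} = 6.54 \]

\[ s_{\text{diff}}^2 = s_c^2 \left( \frac{1}{n_A} + \frac{1}{n_B} \right) = (6.54) \left( \frac{1}{10} + \frac{1}{10} \right) = 1.308 \]

\[ s_{\text{diff}} = \sqrt{1.308} = 1.144 \]

\[ t_{\text{calculated}} = \frac{\bar{x}_A - \bar{x}_B}{s_{\text{diff}}} = \frac{19.10 - 17.24}{1.144} = \frac{1.86}{1.144} = 1.626 \]

[Figure 9.13: t-distribution for unpaired test (1% upper tail; calculated t = 1.63)]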
This value of t must be compared with a limiting or critical value of t for 9 + 9 = 18 degrees of freedom at the 1% level of significance for one tail. See Figure 9.13. According to Table A2 or the Excel function TINV, $t_{\text{critical}}$ = 2.552. Since $t_{\text{calculated}} < t_{\text{critical}}$, the difference in rates of evaporation by these data is not significant at this level of significance. Alternatively, the observed level of significance is given from Excel or alternatives by TDIST(1.626,18,1) = 0.0607. Since 0.0607 > 1%, again the difference in evaporation rates is not significant at the 1% level.
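For a one-tailed test such as this one, the observed level of significance is the area in a single tail. A minimal Python sketch follows, assuming the same pooled-variance calculation as in the example and approximating the tail area by Simpson's rule; the helper names are invented for this illustration and do not come from the text or from Excel.

```python
import math

def t_pdf(t, df):
    """Probability density of the t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail_probability(t, df, steps=2000):
    """One-tail probability P(T > t) for t >= 0; steps must be even for Simpson's rule."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3   # half the distribution lies above zero

# Summary data of Example 9.9 (variances, not standard deviations)
mean_a, var_a, n_a = 19.10, 7.72, 10
mean_b, var_b, n_b = 17.24, 5.36, 10

df = (n_a - 1) + (n_b - 1)
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
s_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_calculated = (mean_a - mean_b) / s_diff
p_one_tail = upper_tail_probability(t_calculated, df)

print(f"t = {t_calculated:.3f}, one-tail probability = {p_one_tail:.4f}")
print("significant at 1%:", p_one_tail < 0.01)
```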
However, since the effect of varying atmospheric conditions on the evaporation rates was appreciable, a different experiment involving paired samples might well show a significant difference. That type of experiment will be discussed next.
9.2.4 Comparison of Paired Samples
Although randomizing can effectively minimize incorrect definite conclusions due to interfering factors in the comparison of two sample means using unpaired samples, the interfering factors will still inflate the pooled estimate of variance of test results. This larger estimate of variance will make calculated values of t smaller, so it will be difficult to show that one treatment is significantly better than another. Thus, real effects may be missed. A more sensitive test is desirable.
In some cases it is possible to pair the measurements. One member of each matched pair comes from one value or characteristic of a variable or design, and the other member of each pair comes from the other characteristic, but everything else is nearly the same (as closely as possible) for the two members of the pair. For example, we might have one member of each pair from an experimental type of equipment and the other member from a standard type. Aside from the variable used to form the pairs, factors which might have appreciable effects must be kept as constant as possible. We try to match the two items forming a pair. Randomization should still be used to minimize interference from other factors. Then the difference between the members of a pair becomes the important variable, which will be examined by a test of significance using the t-distribution. Is the mean difference significantly different from zero? This technique blocks out the effect of interfering variables. It is called a paired t-test or a t-test using a matched pair.
Example 9.10
We decide to run a test using an experimental evaporation pan and a standard evaporation pan over ten successive days. The two types are set up side-by-side so that atmospheric conditions should be the same. A coin is tossed to decide which evaporation pan is on the left-hand side and which on the right-hand side on any particular day. The measured daily evaporations are as follows:

Pair or Day No.       1     2     3     4     5     6     7     8     9    10
Evaporation, mm:
Pan A               9.1   4.6  14.0  16.9  11.4  10.7  27.4  22.8  42.8  29.4
Pan B               6.7   3.1  13.8  16.6  12.3   6.5  24.2  20.1  41.9  27.7
d = A - B           2.4   1.5   0.2   0.3  -0.9   4.2   3.2   2.7   0.9   1.7

Does the experimental Pan A give significantly higher evaporation than the standard Pan B at the 1% level of significance?
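Answer: $H_0$: no real difference between the two methods, so $\mu_d$ = 0.
$H_a$: $\mu_d$ > 0 (one-tailed test)
The test statistic is

\[ t = \frac{\bar{d} - 0}{s_{\bar{d}}}. \]

Large enough values of t would show that $H_0$ is unlikely to be correct.

\[ n = 10, \quad \bar{d} = \frac{\sum d}{n} = \frac{16.2}{10} = 1.62 \]

\[ s_d^2 = \frac{\sum d_i^2 - n (\bar{d})^2}{n - 1} \]

\[ s_d = \sqrt{\frac{(2.4)^2 + (1.5)^2 + (0.2)^2 + \cdots + (1.7)^2 - (10)(1.62)^2}{10 - 1}} = 1.548 \text{ (using calculator)} \]

Then

\[ s_{\bar{d}} = \frac{1.548}{\sqrt{10}} = 0.4896 \quad \text{and} \quad t_{\text{calculated}} = \frac{\bar{d} - 0}{s_{\bar{d}}} = \frac{1.62}{0.4896} = 3.31 \]

[Figure 9.14: t-distribution for Paired t-test (1% upper tail; calculated t = 3.31)]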
For 9 d.f., 1% single tail area, Table A2 or the Excel function TINV gives $t_{\text{critical}}$ = 2.821. See Figure 9.14. Since $t_{\text{calculated}} > t_{\text{critical}}$, there is evidence at the 1% level of significance that the experimental treatment does give significantly higher evaporation. The alternative approach is to find the observed level of significance or p-value from Excel or alternatives. TDIST(3.31,9,1) = 0.0046. Since this is less than 1%, again we find the difference is significant at the 1% level of significance.
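Because the raw data are given, the paired calculation of Example 9.10 can be reproduced directly from the daily readings. The following Python sketch assumes only the standard library; the critical value is the one quoted from Table A2, and the variable names are illustrative.

```python
import math
import statistics

pan_a = [9.1, 4.6, 14.0, 16.9, 11.4, 10.7, 27.4, 22.8, 42.8, 29.4]
pan_b = [6.7, 3.1, 13.8, 16.6, 12.3, 6.5, 24.2, 20.1, 41.9, 27.7]

# The paired test works on the within-pair differences only
differences = [a - b for a, b in zip(pan_a, pan_b)]
n = len(differences)

d_bar = statistics.mean(differences)
s_d = statistics.stdev(differences)     # sample standard deviation, n - 1 d.f.
s_d_bar = s_d / math.sqrt(n)            # standard error of the mean difference
t_calculated = (d_bar - 0) / s_d_bar    # compared with a population mean difference of zero

t_critical = 2.821   # Table A2: 9 d.f., 1% in one tail
print(f"mean d = {d_bar:.2f}, s_d = {s_d:.3f}, t = {t_calculated:.2f}")
print("significant at 1%:", t_calculated > t_critical)
```

Day 5 is the only day on which Pan B recorded more evaporation than Pan A, so its difference enters the calculation as a negative value.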
Notice that the paired t-test compares the mean difference of pairs to an assumed population mean difference of zero. From that point on, the calculation becomes the same as comparing a sample mean to a population mean as in section 9.2.2 above.
Notice also that the number of degrees of freedom is only half as great for the paired t-test as for the corresponding unpaired t-test, so if the variable (or variables) kept constant in forming the pairs has little effect, the unpaired t-test may actually be more sensitive.
A variation of the paired t-test asks whether the difference within pairs is more than a stated non-zero quantity.
However, a note of caution must be sounded at this point. The paired t-test assumes that if the interfering factor is kept constant within each pair, the difference in response will not be affected by the value of the interfering factor. This means that the effect on the response of the interfering factor and of the factor of interest must be purely additive. If the variable of interest and the interfering variable can interact to complicate the effect on the response, the paired t-test will not be as sensitive as we may think. For example, suppose we are studying the strengths of metal rods made of chromium-steel alloys of varying composition and want to see whether one heat treatment gives greater strength than another heat treatment. But for some compositions the strength is more sensitive to heat treatment than for some other compositions. (See the discussion of interaction in section 11.3.) In that case, even if we use a properly randomized, paired t-test, the interaction between heat treatment and composition will tend to inflate the estimate of random error and so make the test less sensitive than it should be. If that is the situation, rather than unpaired t-test or paired t-test, we should use a factorial design, which will be discussed in section 11.3.
Problems
1. Benzene in the air workers breathe can cause cancer. It is very important for the benzene content of air in a particular plant to be not more than 1.00 ppm. Samples are taken to check the benzene content of the air. 25 specimens of air from one location in the plant gave a mean content of 0.760 ppm, and the standard deviation of benzene content was estimated on the basis of the sample to be 0.45 ppm. Benzene contents in this case are found to be normally distributed.
a) Is there evidence at the 1% level of significance that the true mean benzene content is less than or equal to 1.00 ppm?
b) Find the 95% confidence interval for the true mean benzene content.
2. High sulfur content in steel is very undesirable, giving corrosion problems among other disadvantages. If the sulfur content becomes too high, steps have to be taken. Five successive independent specimens in a steel-making process give values of percentage sulfur of 0.0307, 0.0324, 0.0314, 0.0311 and 0.0307. Do these data give evidence at the 5% level of significance that the true mean percentage sulfur is above 0.0300? What is the 90% two-sided confidence interval for the mean percentage sulfur in the steel?
3. The diameter of a mechanical component is normally distributed with a mean of approximately 28 cm. A standard deviation is found from the samples to be 0.25 cm. If we require a sample big enough so that there is at least 95% probability that the sample mean diameter ($\bar{x}$) is within 0.08 cm of the true mean diameter ($\mu$), what is the minimum sample size?
4. a) 25 standard reinforcing bars were tested in tension and found to have a mean yield strength of 31,500 psi with a sample variance of $25 \times 10^4$ psi$^2$. Another sample of 15 bars composed of a new alloy gave a mean and coefficient of variation of 32,000 psi and 2.0% respectively. Yield strengths follow a normal distribution. At the 1% level of significance, does the new alloy give an increased yield strength?
b) If the industry-wide norm (so a population value) for yield strength of reinforcing bars is 31,600 psi, does the new alloy result in a significantly higher mean yield strength than the industry standard? Use the 5% level of significance.
c) Find the 90% confidence interval for the true mean strength of the new alloy.
5. The mean height of 61 males from the same state was 68.2 inches with an estimated standard deviation of 2.5 inches, while 61 males from another state had a mean height of 67.5 inches with an estimated standard deviation of 2.8 inches. The heights are normally distributed. Test the hypothesis that males from the first state are taller than males from the second state.
a) Use a level of significance of 5%.
b) Use a level of significance of 10%.
6. On last year's final examination in statistics, the marks of two different sections had the numbers, means and standard deviations shown in the table below:

Number (n)    Mean    Standard deviation
    41        64.3          15.6
    51        59.5          17.2

The marks were normally distributed.
a) Were the means in the two sections significantly different? Use the 5% level of significance.
b) The overall average of all the students in the last 10 years of statistics final examinations was 61.7 with a standard deviation of 16.8. Was the section average for the 41 students shown above significantly higher than the overall average? Use the 5% level of significance.
7. Two chemical processes for manufacturing the same product are being compared under the same conditions. Yield from Process A gives an average value of 96.2 from six runs, and the estimated standard deviation of yield is 2.75. Yield from Process B gives an average value of 93.3 from seven runs, and the estimated standard deviation is 3.35. Yields follow a normal distribution. Is the difference between the mean yields statistically significant? Use the 5% level of significance, and show rejection regions for the difference of mean yields on a sketch.
8. Two companies produce resistors with a nominal resistance of 4000 ohms. Resistors from company A give a sample of size 9 with sample mean 4025 ohms and estimated standard deviation 42.6 ohms. A shipment from company B gives a sample of size 13 with sample mean 3980 ohms and estimated standard deviation 30.6 ohms. Resistances are approximately normally distributed.
a) At 5% level of significance, is there a difference in the mean values of the resistors produced by the two companies?
b) Is either shipment significantly different from the nominal resistance of 4000 ohms? Use .05 level of significance.
9. Two different types of evaporation pans are used for measuring evaporation at a weather station. The evaporation for each pan for 6 different days is as follows:

          Evaporation (mm)
Day No.    Pan A    Pan B
   1          9       11
   2         42       41
   3         28       29
   4         16       16
   5         11       13
   6         11       12

At the 5% level of significance, is there a significant difference in the evaporation recorded by the two pans? Interaction between type of pan and weather variation from day to day can be neglected.
10. A new composition for car tires has been developed and is being compared with an older composition. Ten tires are manufactured from the new composition, and ten are manufactured from the old composition. One tire of the new composition and one of the old composition are placed on the front wheels of each of ten cars. Which composition goes on the left-hand or right-hand wheel is determined randomly. The wheels are properly aligned. Each car is driven 60,000 km under a variety of driving conditions. Then the wear on each tire is measured. The results are:

Car No.                      1    2    3    4    5    6    7    8    9   10
Wear of New Composition    2.4  1.3  4.2  3.8  2.8  4.7  3.2  4.8  3.8  2.9
Wear of Old Composition    2.7  1.9  4.3  4.2  3.0  4.8  3.8  5.3  3.7  3.1

Do the results show at the 1% level of significance that the new composition gives significantly less wear than the old composition? Interaction between the tire composition and the car can be neglected.
11. Nine specimens of unalloyed steel were taken and each was halved, one half being sent for analysis to a laboratory at the University of Antarctica and the other half to a laboratory at the University of Arctica. The determinations of percentage carbon content were as follows:

Specimen No.                  1     2     3     4     5     6     7     8     9
University of Antarctica    0.22  0.11  0.46  0.32  0.27  0.19  0.08  0.12  0.18
University of Arctica       0.20  0.10  0.39  0.34  0.23  0.14  0.13  0.08  0.16

Test for a difference in determinations between the two laboratories at the 0.05 level of significance. Neglect any possibility of interaction.
12. Two flowmeters, A and B, are used to measure the flow rate of brine in a potash processing plant. The two meters are identical in design and calibration and are mounted on two adjacent pipes, A on pipe 1 and B on pipe 2. On a certain day, the following flow rates (in m$^3$/sec) were observed at 10-minute intervals from 1:00 p.m. to 2:00 p.m.

Time         Meter A    Meter B
1:00 p.m.      1.7        2.0
1:10 p.m.      1.6        1.8
1:20 p.m.      1.5        1.6
1:30 p.m.      1.4        1.3
1:40 p.m.      1.5        1.6
1:50 p.m.      1.6        1.7
2:00 p.m.      1.7        1.9

Is the flow in pipe 2 significantly different from the flow in pipe 1 at the 5% level of significance?
13. The visibility of two traffic paints, A and B, was tested, each at 8 different locations. The measures of visibility were taken after exposure to weather and traffic during the period January 1 to July 1. The results were as follows:

       Paint A                  Paint B
Location   Visibility    Location   Visibility
   1A          7            1B          8
   2A          7            2B         10
   3A          8            3B          8
   4A          5            4B          5
   5A          5            5B          3
   6A          6            6B          9
   7A          6            7B          8
   8A          4            8B          5

Test the hypothesis that the mean visibility of paint A is less than that of paint B at the 5% level of significance under the following conditions:
a) if both paints were tested simultaneously under identical conditions, the signs in paint A and paint B being erected adjacent to one another. Neglect any possibility of interaction.
b) if the A locations are on the west side and the B locations are on the east side of the city.
14. A water quality lab tests for the bacterial count in drinking water in a certain northern city.
a) A test is made of a claim in the literature that the time to equilibrium in bacterial growth is greater in northerly climates, the standard deviation remaining unaffected. The mean time in southerly cities has been found, from many measurements, to equal 24.1 hours with a standard deviation of 2.3 hours. The northern lab tests 21 water specimens and finds the mean time to equilibrium bacterial growth is 25.4 hours, with an estimated standard deviation of 2.2 hours, which is not significantly different from the standard deviation of 2.3 hours quoted above. Does this data bear out the claim in the literature about the increase in mean time to equilibrium, at the 5% level of significance?
b) Two salesmen turn up at the laboratory one week, each claiming that the additive he is selling will decrease the time to equilibrium bacterial growth, compared to the other salesman's product. The laboratory decides to check out the claims and tests 6 specimens of water, half of each treated with each of the two products. You should neglect any possibility of interaction. What does the following data indicate as to the salesmen's claims (at the 5% level of significance)?

                    Time to Equilibrium, hours
Water Sample No.    Additive 1    Additive 2
       1               23.8          24.5
       2               34.1          34.4
       3               22.1          23.2
       4               15.3          16.7
       5               31.8          31.8
       6               22.5          22.9
15. a) 41 cars equipped with standard carburetors were tested for gas usage and yielded an average of 8.1 km/litre with a standard deviation of 1.2 km/l. 21 of these cars were then chosen randomly, fitted with special carburetors and tested, yielding an average of 8.8 km/l with a standard deviation of 0.9 km/l. At the 5 percent level of significance, does the new carburetor decrease gas usage?
b) Does the following group of data bear out the same result? Neglect any possibility of interaction between the type of carburetor and other characteristics of the cars.

Car No.    Standard Carburetor    New Carburetor
   1              7.6                  8.2
   2              7.9                  7.8
   3              6.5                  8.1
   4              5.6                  8.6
   5              7.3                  9.5
Supplementary Problems
Students may need practice in deciding whether a particular problem can be done using the normal distribution or requires the t-distribution. The following problem set contains both types.
1. The lives of Glowbrite light bulbs made by Glownuff Inc. have a mean of 1000 hours and standard deviation 160 hours.
a) Assuming a normal distribution for the sample means, find the probability that 25 bulbs will have a mean life of less than 920 hours.
b) The Consumers Association demands that the mean life of samples of 25 bulbs be not below 920 hours with 99.9% confidence. What is the maximum permissible standard deviation (for $\mu$ = 1000 hours)?
c) The manufacturer has instituted a sampling program to maintain quality control. He intends that there be no more than 5% probability that the true mean bulb life is more than 20 hours different from the sample mean. What sample size should he use, assuming the standard deviation is still 160 hours?
2. Electrical resistors made by a particular factory have a coefficient of variation of 0.28% with a normal distribution of resistances.
a) Find the 99% confidence interval for the mean of samples of size five if the population mean is 10.00 ohms.
b) How many observations must a sample contain to give at least 99.5% probability that the sample mean is within 0.30% of the population mean?
3. Slaked lime is added to the furnace of an electric power station to reduce the production of SO$_2$ (a major cause of acid rain). Extensive previous data showed that a standard method of adding slaked lime reduced SO$_2$ emission by an average percentage of 31.0 with a standard deviation of 4.70. A test on a new method gives mean percentage removed of 33.5 based on a sample of size 15 with no change in the standard deviation. Is there evidence at the 1% level of significance that the new method gives higher removal of SO$_2$ than the standard method? A normal distribution is followed.
4. a) The manufacturer of the Energy-saver furnace claims a mean energy efficiency of at least 0.83. A sample of 21 Energy-saver furnaces gives a sample mean of 0.81 and sample standard deviation of 0.060. Data show approximately a normal distribution. Test whether the manufacturer's claim can be rejected at the 5% level of significance.
b) It is known that the industry-standard furnace has a mean energy efficiency of 0.78 and a standard deviation of 0.055. Use the sample mean for Energy-saver furnaces to test whether these furnaces have a significantly higher efficiency than the industry standard at the 5% level of significance.
5. The mean yield stress of a certain plastic is specified to be 30.0 psi. The standard deviation is known to be 1.20 psi. A normal distribution is followed.
a) If the population mean is 30.0 psi, what is the 95% confidence interval for the mean yield stress of 9 specimens?
b) A sample of 9 specimens shows a mean of 27.4 psi. Is this sample mean significantly different from the specified mean value? Use the 5% level of significance.
c) Is the sample mean from part (b) significantly larger than 26.3 psi at 1% level of significance?
6. The standard deviation of a particular dimension on a machine part is known to be 0.0053 inches. A normal distribution is followed. Four parts coming off the production line are measured, giving readings of 2.747 in, 2.739 in, 2.750 in, and 2.749 in. Is the sample mean significantly larger than 2.740 inches at the 1% level of significance? What is the probability of accepting the null hypothesis if the true mean is 2.752 in. and the standard deviation remains unchanged? (Notice that this would be a Type II error.)
7. Specimens of soil were obtained from a site both before and after compaction. Tests on 10 pre-compaction specimens gave a mean porosity of 0.413 and a standard deviation of 0.0324. Tests on 20 post-compaction specimens gave a mean porosity of 0.340 and a standard deviation of 0.0469. These standard deviations are not significantly different. Porosity follows a normal distribution.
a) At the 5% level of significance, did the compaction correspond to a significant reduction in mean porosity?
b) At the 5% level of significance, is the reduction in mean porosity significantly less than the desired reduction of 0.1?
8. Three machines are used to pack different colored crystals in a bath salt mixture. The machines are set for machines 1 and 2 to each add 500 grams of salts and machine 3 to add 750 grams. It has been found that the variation around the set point is normally distributed in each case with the following dispersions:

Machine    Standard Deviation
   1           20 grams
   2           10 grams
   3           25 grams

a) What is the mean weight of a package of bath salts?
b) If packages of bath salts with weight less than 1.65 kg have to be repacked, what percentage of the day's output would fall into this category?
c) It is decided to sample the final output to estimate the mean weight of the packages. How big a sample must be taken to estimate with 99% confidence that the true mean lies between 99% and 101% of the sample mean?
9. Two different kinds of cereal designated A and B are combined to form a new product called Brand X. The cereal types are weighed independently and mixed automatically before being packed in a plastic bag which weighs 10 grams. The weighing machines are set so that $\mu_A$ = 1000 grams and $\mu_B$ = 500 grams. The weights are normally distributed, and the coefficient of variation in each case is 10%.
a) What is the mean total weight of a bag of Brand X?
b) What is the probability that a bag of Brand X will contain less than 950 grams of Cereal A and more than 450 grams of Cereal B?
c) What is the probability that a bag of Brand X will contain exactly 1400 grams?
d) What is the probability that a bag of Brand X will contain less than 1400 grams?
e) How many bags must be weighed to ensure with 95% confidence that the true mean weight of a bag lies within 30 grams of the sample mean?
CHAPTER 10
Statistical Inferences for Variance and Proportion
For this chapter the reader needs a good knowledge of Chapter 9. For section 10.1 a solid understanding of sections 3.1 and 3.2 is needed, while section 10.2 requires a good knowledge of section 5.3.
The general approach developed in Chapter 9 for tests of hypothesis and confidence intervals for means carries over to similar inferences for variances and proportions. The concepts of null hypothesis, alternative hypothesis, level of significance, confidence levels and confidence intervals can be applied directly.
10.1 Inferences for Variance
Is a sample variance significantly larger than a population variance? Or is one sample variance significantly larger than another, indicating that one population is more variable than another? Those are the sorts of question we are trying to answer when we compare two variances. To obtain answers we will introduce two more probability distributions, the chi-squared distribution and the F-distribution. Mathematically, the F-distribution is related to the ratio of two chi-squared distributions. We will use the chi-squared distribution in section 10.1.1 to compare a sample variance with a population variance, and we will use the F-distribution in section 10.1.2 to compare two sample variances. We will see in part (d) of section 10.1.2 that the F-distribution can be used also to compare a sample variance with a population variance. Therefore, at this time the reader can omit section 10.1.1, and so the chi-squared probability distribution, if that seems desirable. We will need the chi-squared distribution later when we come to Chapter 13, where we will encounter the chi-squared test for frequency distributions.
10.1.1 Comparing a Sample Variance with a Population Variance
Say we are trying to make the production from a particular process less variable, so more uniform. To assess whether we have been successful we might take a sample from current production and compare its sample variance with the population variance established under previous conditions. Is the new estimate of variance significantly smaller than the previous variance? If it is, we have an indication that the production has become less variable, so there is some evidence of success.
We would test the trial assumption that the new sample variance and the previous population variance differ only because of chance. Specifically, the null hypothesis is that the new population variance is equal to the previous population variance. The alternative hypothesis would be that the new population variance is smaller than the previous one, so we have a one-sided test. Is the new sample variance so much smaller than the previous population variance that the null hypothesis is very unlikely? The size of the sample would, of course, affect the answer.
(a) Chi-squared Probability Distribution
If the sample is from a normal distribution, the probability distribution which applies to the variances in this situation is the chi-squared distribution. The chi-squared distribution and the normal distribution are related mathematically. Chi is a Greek letter, $\chi$, which is pronounced "kigh", like "high". A relationship can be derived among $\chi^2$, $\sigma^2$, $s^2$, and the number of degrees of freedom on which $s^2$ is based, (n - 1). This relationship is

\[ \chi^2 = \frac{(n - 1) s^2}{\sigma^2} \tag{10.1} \]

The density function of the $\chi^2$ distribution is unsymmetrical, and its shape depends on the number of degrees of freedom. Probability density functions for three different numbers of degrees of freedom are shown in Figure 10.1. As the number of degrees of freedom increases, the density function becomes more symmetrical as a function of $\chi^2$. For any particular number of degrees of freedom, the mean of the distribution is equal to the number of degrees of freedom.
[Figure 10.1: Shapes of Probability Density Functions for Some Chi-squared Distributions (pdf plotted against chi-squared for 1, 4, and 8 degrees of freedom)]

Table A3 in Appendix A gives values of $\chi^2$ corresponding to some values of the upper-tail probability.
If a computer with Excel or some alternatives is available, values can be found from the computer instead of from tables. Probabilities corresponding to values of $\chi^2$ can be found from the Excel function CHIDIST. The arguments to be used with this function are the value of $\chi^2$ and the number of degrees of freedom. The function then returns the upper-tail probability. For example, for $\chi^2$ = 18.49 at 30 degrees of freedom, we type in a cell for a worksheet the formula CHIDIST(18.49,30), or else we can paste in the function, CHIDIST( , ), then type in the arguments and choose the OK button. The result is 0.95005, the probability of obtaining a value of $\chi^2$ greater than 18.49 completely by chance.
If we have a value of the upper-tail probability and the number of degrees of freedom, we use the Excel function CHIINV to find the value of $\chi^2$. Again, the function can be chosen using the Formula menu or it can be typed into a cell. For an upper-tail probability of 0.95 and 30 degrees of freedom, CHIINV(0.95,30) gives 18.4927.
Wewillusethe
2
distributioninthischaptertocompareasamplevariancewith
apopulationvariance.InChapter13wewillusethissamedistributionforanentirely
differentpurpose,tocomparetwoormorefrequencydistributions.
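For readers without Excel, the two chi-squared look-ups quoted above can be approximated with a short script. The following is only a sketch under a stated restriction: it uses the closed-form upper-tail sum that holds when the number of degrees of freedom is even, so it does not replace CHIDIST or CHIINV in general. The function names `chi2_upper_tail` and `chi2_from_upper_tail` are illustrative and do not come from the text.

```python
import math


def chi2_upper_tail(x, df):
    """Pr[chi-squared > x], valid only for an even, positive number of df.

    For df = 2k the upper tail equals exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers even, positive df")
    half = x / 2.0
    term = 1.0   # the i = 0 term, (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i   # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def chi2_from_upper_tail(prob, df, lo=0.0, hi=1000.0):
    """Invert chi2_upper_tail by bisection; the tail shrinks as x grows."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_upper_tail(mid, df) > prob:
            lo = mid   # tail still too large, so move right
        else:
            hi = mid
    return (lo + hi) / 2.0


p = chi2_upper_tail(18.49, 30)        # counterpart of CHIDIST(18.49,30)
x = chi2_from_upper_tail(0.95, 30)    # counterpart of CHIINV(0.95,30)
print(f"upper-tail probability = {p:.5f}, chi-squared value = {x:.4f}")
```

For odd numbers of degrees of freedom a different formula (or a statistics library) would be needed, which is why the function raises an error in that case.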
(b) Test of Significance for Variances
Let us look at an example.
Example 10.1
The population standard deviation of strengths of steel bars produced by a large manufacturer is 2.95. In order to meet tighter specifications, engineers are trying to reduce the variability of the process. A sample of 28 bars gives a sample standard deviation of 2.65. Assume that the strengths of steel bars are normally distributed. Is there evidence at the 5% level of significance that the standard deviation has decreased?
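Answer: $H_0$: $\sigma^2 = (2.95)^2 = 8.70$
$H_a$: $\sigma^2 < 8.70$ (one-tailed test)
The test statistic will be $\chi^2 = \frac{(n-1)\,s^2}{\sigma^2}$. If $\chi^2_{\text{calculated}}$ is sufficiently small, then $H_0$ is not likely to be true.

$$\chi^2_{\text{calculated}} = \frac{(n-1)\,s^2}{\sigma^2} = \frac{(28-1)(2.65)^2}{(2.95)^2} = 21.79$$

[Figure 10.2: Chi-squared Distribution, showing the 5% lower-tail probability below the critical value 16.1, with the calculated value lying to the right of it.]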
From Table A3, for 5% probability in the lower tail (the left-hand tail) and therefore 95% probability in the upper tail, the one to the right, and with 28 - 1 = 27 degrees of freedom, we find $\chi^2_{\text{critical}} = 16.15$. Then $\chi^2_{\text{calculated}} > \chi^2_{\text{critical}}$, so the calculated value does not fall in the cross-hatched tail for 5% probability. The population variance is not significantly less than 8.70, so the population standard deviation is not significantly less than 2.95. We do not have evidence at the 5% level of significance that the standard deviation of strengths of the steel bars has decreased.
An alternative method for solving this sort of problem using the F-distribution will be given in section 10.1.2(d).
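The arithmetic of Example 10.1 can be checked with a few lines of code. This is a minimal sketch: the critical value 16.15 is simply copied from Table A3 as quoted in the answer rather than computed, and the variable names are illustrative.

```python
# Data from Example 10.1
n = 28           # sample size
s = 2.65         # sample standard deviation
sigma0 = 2.95    # population standard deviation under the null hypothesis

# Critical value for 27 df with 5% in the lower tail, copied from Table A3
chi2_critical = 16.15

# Test statistic from equation 10.1: (n - 1) s^2 / sigma^2
chi2_calculated = (n - 1) * s**2 / sigma0**2

# One-sided (lower-tail) test: H0 is rejected only if the statistic is
# smaller than the critical value.
evidence_of_decrease = chi2_calculated < chi2_critical

print(f"chi-squared calculated = {chi2_calculated:.2f}")
print("evidence of a decrease at the 5% level:", evidence_of_decrease)
```

Because the calculated statistic is well above the critical value, the script reaches the same conclusion as the worked answer: no evidence of a decrease at the 5% level.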
(c) Confidence Intervals for Population Variance or Standard Deviation
If we have an estimate of the variance or standard deviation from a sample, we can determine a corresponding confidence interval for the variance or standard deviation for the population. Again, let's examine an example.
Example 10.2
A sample of 15 concrete cylinders was taken randomly from the production of a plant. The strength of each specimen was determined, giving a sample standard deviation of 215 kN/m$^2$. Find the 95% confidence interval (with equal probabilities in the two tails) for standard deviation of the strengths. Assume the strengths follow a normal distribution.
Answer: $s^2 = (215)^2 = 46{,}225$ based on 15 - 1 = 14 degrees of freedom.
The relevant statistic to be found from tables or Excel is $\chi^2 = \frac{(n-1)\,s^2}{\sigma^2}$.
Then the confidence limits will be found from $\sigma^2 = \frac{(n-1)\,s^2}{\chi^2}$ using values of $\chi^2$ at cumulative probabilities of 0.025 and 0.975 for 14 d.f.
The limiting values of $\chi^2$ can be found from either Table A3 or Excel. From Table A3 for 14 d.f. the limiting values of $\chi^2$ are 5.63 at a cumulative probability of 0.025 (so upper-tail area of 0.975) and 26.12 at a cumulative probability of 0.975 (so upper-tail area of 0.025). The same numbers (expressed in more figures) are found from CHIINV(0.975,14) and CHIINV(0.025,14), respectively. Limiting values are shown on Figure 10.3.

[Figure 10.3: Confidence Limits for Chi-squared Distribution, with probability 0.025 in each tail, below $\chi^2$ = 5.63 and above $\chi^2$ = 26.12.]

The corresponding limits on $\sigma^2$ are $\frac{(14)(46225)}{5.63} = 115{,}000$ and $\frac{(14)(46225)}{26.12} = 24{,}800$.
The limits on $\sigma$ are the square roots of these numbers, 339 and 157. Then, the 95% confidence interval for standard deviation is from 157 to 339 kN/m$^2$.
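The confidence limits of Example 10.2 follow from rearranging equation 10.1, and the calculation is easy to script. The sketch below takes the two limiting chi-squared values (5.63 and 26.12) directly from the answer above instead of computing them; the variable names are illustrative.

```python
import math

# Data from Example 10.2
n = 15        # number of cylinders
s = 215.0     # sample standard deviation, kN/m^2
df = n - 1    # degrees of freedom

# Limiting chi-squared values for 14 df, copied from Table A3
chi2_low = 5.63     # cumulative probability 0.025
chi2_high = 26.12   # cumulative probability 0.975

# sigma^2 = (n - 1) s^2 / chi^2: the SMALLER chi-squared value gives
# the UPPER limit on the variance, and vice versa.
var_upper = df * s**2 / chi2_low
var_lower = df * s**2 / chi2_high

sd_lower = math.sqrt(var_lower)
sd_upper = math.sqrt(var_upper)

print(f"variance limits: {var_lower:.0f} to {var_upper:.0f}")
print(f"95% confidence interval for sigma: {sd_lower:.0f} to {sd_upper:.0f} kN/m^2")
```

Note the inversion: dividing by the lower chi-squared limit produces the upper limit on the variance, which is a common place to slip when doing this by hand.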
10.1.2 Comparing Two Sample Variances
Say we have two sample variances. Is one sample variance significantly different from (or else larger than) the other? Or, on the other hand, is it reasonable to say that both sample variances might have come from the same population? The appropriate test of hypothesis is the F-test or Variance-ratio test. We calculate the ratio of the two sample variances:

$$F = \frac{s_1^2}{s_2^2} \qquad (10.2)$$

where $s_1^2$ is the estimate of population variance on the basis of sample 1, and $s_2^2$ is the estimate of population variance on the basis of sample 2. In this book we will put the larger estimate of variance in the numerator and call it $s_1^2$ so that the quantity F is larger than 1.
(a) Probability Distribution for Variance Ratio
A critical or limiting value of F is obtained from tables or Excel. These theoretical values must be related to the ratio of one $\chi^2$ function to another. In fact, the theoretical statistic F is defined as the ratio of two independent chi-squared random variables, each divided by its number of degrees of freedom, but we don't need to go into the details here. Remember that we assumed that the sample came from a normal distribution in order to make the chi-squared distribution applicable to the variances, and the same assumption is required to make the F-distribution applicable here.
The shape of the F-distribution is always unsymmetrical, skewed to the right. The shape depends on the numbers of degrees of freedom in the sample variances in both the numerator and the denominator.
Figure 10.4 shows the shapes of two F-distributions.

[Figure 10.4: Shapes of Two F-distributions with Various Degrees of Freedom in Numerator and Denominator (6, 24 df and 4, 10 df); horizontal axis: f, vertical axis: pdf.]

The probability that $F > f_1$ depends on the number of degrees of freedom in the numerator and the number of degrees of freedom in the denominator, as well as the value of $f_1$. To show all the combinations of parameters that might be needed in practical calculations would require a very extensive table. The usual practice is to show in a table only a limited selection of values. Table A4 in Appendix A is in two parts. For various combinations of degrees of freedom for variance in the numerator, $df_1$, and degrees of freedom for variance in the denominator, $df_2$, values of F which will give an upper-tail probability of 0.05 are shown on the first page of Table A4. For various combinations of $df_1$ and $df_2$, values of F which will give an upper-tail probability of 0.01 are shown on the second page of Table A4. If combinations of $df_1$ and $df_2$ that are not shown on Table A4 are needed, interpolation is required.
If a computer is available with Excel or some alternative, it can be used to find probabilities corresponding to any applicable value of F, or else values of F corresponding to any applicable probability. These would both be for the required combination of degrees of freedom in the numerator and degrees of freedom in the denominator. The Excel function FDIST gives the probability distribution for F. The arguments to be used with this function are the value of F, the number of degrees of freedom for variance in the numerator, and the number of degrees of freedom for variance in the denominator. Then Excel will give the corresponding upper-tail probability, that is, Pr[$F > f_1$]. Similarly, the Excel function FINV gives the value of F for stated upper-tail probability. If we enter FINV(upper-tail probability, degrees of freedom for variance in the numerator, degrees of freedom for variance in the denominator), Excel will give the corresponding value of F.
(b) Test of Significance: the F-test or Variance-ratio Test
Now we compare a calculated value of F to a chosen or critical value of F. Is the calculated value so large that it is very unlikely that it could have occurred by chance? The samples must have been chosen randomly and independently.
We make the null hypothesis that the difference between the two estimates of variance is entirely due to chance, so $\sigma_1^2 = \sigma_2^2$. The alternative hypothesis is either that $\sigma_1^2 \neq \sigma_2^2$ for a two-sided test, or else that $\sigma_1^2 > \sigma_2^2$ for a one-sided test. Because we put the larger estimate of variance in the numerator, $s_1^2 > s_2^2$, we have no reason to consider the possibility that $\sigma_1^2 < \sigma_2^2$.

[Figure 10.5: Level of Significance for a one-sided F-test; the level of significance is the upper-tail area to the right of $f_{\text{critical}}$.]

If the variance ratio, F, is too large, then there is little probability that the null hypothesis is true. Specifically, the probability of obtaining this large a value of F or larger purely by chance, when the null hypothesis is true, is equal to the observed level of significance. Such a probability must also depend on the numbers of degrees of freedom on which each estimate of variance is based. These are $df_1$ degrees of freedom for the larger estimate of variance and so for the numerator, and $df_2$ degrees of freedom for the smaller estimate of variance and so for the denominator.
For the 5% level of significance, the limiting value of F for a one-sided test must be such that Pr[$F > f_{\text{critical}}$] = 0.05, and similarly for other levels of significance.
For a two-sided F-test, but set up so that $f_{\text{calculated}} > 1$, the same values of $f_{\text{critical}}$ apply for levels of significance twice as great to allow for both tails. For example, $f_{\text{critical}}$ for a two-sided test at 2% level of significance is the same as $f_{\text{critical}}$ for a one-sided test at 1% level of significance.
Example 10.3
Two additives to Portland cement are being tested for their effect on the strength of concrete. 21 batches were made with Additive A, and their strengths showed standard deviation $s_A$ = 41.3. 16 batches were made with the same percentage of Additive B, and their strengths showed standard deviation $s_B$ = 26.2. Assume that the strengths of concrete follow a normal distribution. Is there evidence at the 1% level of significance that the concrete made with Additive A is more variable than concrete made with Additive B?
Answer: $H_0$: $\sigma_A^2 = \sigma_B^2$
$H_a$: $\sigma_A^2 > \sigma_B^2$ (one-tailed test)
The test statistic will be $F = \frac{s_A^2}{s_B^2}$. Large values of $f_{\text{calculated}}$ will indicate that the null hypothesis is not likely to be true.

$$f_{\text{calculated}} = \frac{s_A^2}{s_B^2} = \frac{41.3^2}{26.2^2} = 2.485$$

based on 20 degrees of freedom for the numerator and 15 degrees of freedom for the denominator.
From the second part of Table A4, for 1% level of significance with $df_1$ = 20 and $df_2$ = 15, $f_{\text{critical}}$ = 3.37. Alternatively, from the function FINV in Excel, FINV(0.01,20,15) gives $f_{\text{critical}}$ = 3.37189476.
Since $f_{\text{calculated}} < f_{\text{critical}}$, the difference is not significant at the 1% level of significance. Then at this level of significance there is not sufficient evidence to say that the strength of concrete made with Additive A is more variable than the strength of concrete made with Additive B.
Example 10.4
Using the same figures as in Example 10.3, is there evidence at the 10% level of significance that concrete made with Additive A and concrete made with Additive B have different variabilities? Again, assume that the strengths of concrete follow a normal distribution.
Answer: $H_0$: $\sigma_A^2 = \sigma_B^2$
$H_a$: $\sigma_A^2 \neq \sigma_B^2$ (two-tailed test)
The test statistic will be $F = \frac{s_A^2}{s_B^2}$. Large values of $f_{\text{calculated}}$ will indicate that $H_0$ is unlikely to be true. As before,

$$f_{\text{calculated}} = \frac{s_A^2}{s_B^2} = \frac{41.3^2}{26.2^2} = 2.485$$

based on 20 degrees of freedom for the numerator and 15 degrees of freedom for the denominator.
From the first part of Table A4, for 5% upper-tail area, corresponding to 10% level of significance for a two-tailed test, with $df_1$ = 20 and $df_2$ = 15, $f_{\text{limit}}$ = 2.33. Alternatively, from the function FINV in Excel, FINV(0.05,20,15) gives 2.32753194.
Since $f_{\text{calculated}} > f_{\text{limit}}$, there is evidence at the 10% level of significance that concrete made with Additive A and concrete made with Additive B have different variabilities.
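Examples 10.3 and 10.4 share one calculation and differ only in which tabulated limit is used. The sketch below reproduces both decisions; the helper `variance_ratio` is an illustrative name, and the two limiting values (3.37 and 2.33) are copied from Table A4 as quoted in the answers rather than computed.

```python
def variance_ratio(s1, n1, s2, n2):
    """Return (F, df_numerator, df_denominator).

    The larger sample variance is placed in the numerator, following the
    convention of equation 10.2, so that F is at least 1.
    """
    v1, v2 = s1**2, s2**2
    if v1 >= v2:
        return v1 / v2, n1 - 1, n2 - 1
    return v2 / v1, n2 - 1, n1 - 1


# Additive A: 21 batches, s = 41.3; Additive B: 16 batches, s = 26.2
f_calculated, df_num, df_den = variance_ratio(41.3, 21, 26.2, 16)

# Limiting values for df1 = 20, df2 = 15, copied from Table A4
F_UPPER_1_PERCENT = 3.37   # one-tailed test at the 1% level (Example 10.3)
F_UPPER_5_PERCENT = 2.33   # two-tailed test at the 10% level (Example 10.4)

more_variable_at_1_percent = f_calculated > F_UPPER_1_PERCENT
different_at_10_percent = f_calculated > F_UPPER_5_PERCENT

print(f"F = {f_calculated:.3f} with {df_num} and {df_den} degrees of freedom")
print("Example 10.3, A more variable at 1%:", more_variable_at_1_percent)
print("Example 10.4, variabilities differ at 10%:", different_at_10_percent)
```

The same statistic is not significant in the one-tailed test at 1% but is significant in the two-tailed test at 10%, which matches the two worked answers.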
Besides comparisons in which the major objective is to see whether one set of data is significantly more variable or has different variability than another set, the F-test is used for two main purposes:
1. To see whether two estimates of variance can be combined or pooled to compare means by an unpaired t-test. In this case the F-test would be two-tailed. Usually, if the variances are not significantly different at (let us say) the 10% level of significance, they can be combined to give a better estimate of variance to use in the t-test.
2. To compare two estimates of variance from different types of data as part of the analysis of variance, which will be considered more fully in Chapter 12. In some cases the total variation of data from an experiment can be broken down into two estimated variances, say the variance within groups and the variance between groups. The variance within groups comes from repeated measurements at the same condition and so gives an estimate of the variance due to experimental error. The variance between groups arises from different treatments or different conditions as well as from experimental error. The question to be answered is, is the variance between groups significantly larger than the variance within groups? If so, that is an indication that the variation of treatments or conditions has an effect on the results. A critical level of significance must be stated. This is a one-tailed F-test.

[Figure 10.6: Test of Significance; 1% upper-tail probability to the right of $f_{\text{limit}}$ = 4.60, with the calculated value f = 6.53.]
Example 10.5
In the results from an experiment the estimated variance within groups (WG), based on 27 degrees of freedom, is 233, while the estimated variance between the groups (BG), based on 3 degrees of freedom, is 1521. Is there evidence at the 1% level of significance that the difference in conditions between the groups has an effect on the results? The data have been plotted on normal probability paper, showing reasonable agreement with normal distributions.
Answer: $H_0$: $\sigma_{WG}^2 = \sigma_{BG}^2$
$H_a$: $\sigma_{BG}^2 > \sigma_{WG}^2$ (one-tailed test)
The test statistic will be $F = \frac{s_{BG}^2}{s_{WG}^2}$, in that order because $s_{BG}^2 > s_{WG}^2$. If $f_{\text{calculated}}$ is sufficiently large, $H_0$ will not be plausible.

$$f_{\text{calculated}} = \frac{s_{BG}^2}{s_{WG}^2} = \frac{1521}{233} = 6.53$$

Another name for $f_{\text{critical}}$ is $f_{\text{limit}}$. For 1% level of significance in a one-tailed test, with 3 degrees of freedom in the numerator and 27 degrees of freedom in the denominator, the second part of Table A4 gives $f_{\text{limit}}$ = 4.60. Alternatively, from the function FINV in Excel, FINV(0.01,3,27) gives 4.60090632.
Since $f_{\text{calculated}} > f_{\text{limit}}$, there is evidence at the 1% level of significance that the difference in conditions between the groups has an effect on the results.
(c) Confidence Interval for Ratio of Sample Variances
A point estimate of the ratio of two population variances is given by the corresponding ratio of two sample variances, $s_1^2 / s_2^2$. It is quite feasible to derive a confidence interval for the ratio of the population variances, $\sigma_1^2 / \sigma_2^2$, as long as the samples were taken randomly from normal distributions. However, practical applications of this technique by engineers are hard to find, so these confidence intervals will not be discussed further here. If they should be needed, the reader is referred to books by Walpole and Myers and by Vardeman (references in section 15.2). On the other hand, confidence intervals for population variances (rather than their ratios) are very useful and have been discussed in section 10.1.1(c).
(d) Using the Variance Ratio to Compare a Sample Variance with a Population Variance
In section 10.1.1 we saw that the chi-squared distribution can be used to compare a sample variance with a population variance. An alternative method of making this comparison uses the F-distribution. If one of the variances is a population variance, its number of degrees of freedom will be infinite. In this section Example 10.1 will be solved by this alternative method.
Example 10.1 (Alternative solution)
The population standard deviation of strengths of steel bars produced by a large manufacturer is 2.95. In order to meet tighter specifications, engineers are trying to reduce the variability of the process. A sample of 28 bars gives a sample standard deviation of 2.65. Assume that the strengths of steel bars are normally distributed. Is there evidence at the 5% level of significance that the standard deviation has decreased?
Answer: $H_0$: $\sigma^2 = (2.95)^2 = 8.70$
$H_a$: $\sigma^2 < 8.70$ (one-sided test)
The test statistic will be $F = \frac{s_1^2}{s_2^2} = \frac{\sigma^2}{s^2}$. If $F_{\text{calculated}}$ is sufficiently large, then $H_0$ is not likely to be true.

$$F_{\text{calculated}} = \frac{\sigma^2}{s^2} = \frac{(2.95)^2}{(2.65)^2} = 1.24$$

From Table A4 for 5% upper-tail probability, infinite degrees of freedom in the numerator ($df_1$) and 28 - 1 = 27 degrees of freedom in the denominator ($df_2$), $F_{\text{limit}}$ = 1.67. Then $F_{\text{calculated}} < F_{\text{limit}}$, so the calculated value is not significant at the 5% level of significance. Therefore, we do not have evidence at the 5% level of significance that the standard deviation has decreased.
Problems
1. A testing laboratory is trying to make its results more consistent by standardizing certain procedures. From a sample of size 28 the sample standard deviation by the revised procedure is found to be 1.74 units. Plotting concentrations on normal probability paper did not show any marked departure from a normal distribution. Is there evidence at the 5% level of significance that the sample standard deviation is significantly less than the former population standard deviation of 2.92 units?
2. It is known from long experience that, for a particular chemical compound, determinations made with a mass spectrometer have a variance of 0.24. An analyst who is new to the job makes a series of 28 determinations with the spectrometer and they give an unbiased estimate of variance of 0.32. Plotting the results on normal probability paper indicates that the data do not vary significantly from a normal distribution. Is the sample estimate of variance significantly larger than the variance based on long experience? Use a 5% level of significance.
3. Yield stresses for shear were measured in a random sample consisting of 28 soil specimens. Plotting the data on normal probability paper showed no apparent departure from normal distribution. The sample standard deviation was found to be 285 kN/m$^2$. Find the two-sided confidence limits (with equal probabilities in the two tails) for the standard deviation of the yield stress.
4. A sample consists of 21 specimens, each taken by a standard procedure from a different filter cake on an industrial filter. Moisture contents of the specimens were measured. Plotting the data on normal probability paper indicated negligible departure from normal distribution. The sample standard deviation of percentage moisture contents was found to be 3.21. Find the two-sided 90% confidence limits (with equal probabilities in the two tails) for the standard deviation of percentage moisture contents.
5. The coefficients of thermal expansion of two alloys, A and B, are compared. Six random measurements are made for each alloy. For alloy A, the coefficients ($\times 10^{-6}$) are 12.95, 14.05, 12.75, 12.10, 13.50 and 13.00. Coefficients ($\times 10^{-6}$) for alloy B are 14.05, 15.35, 14.35, 15.15, 13.85 and 14.25. Assume the values for each alloy are normally distributed. Is the variance of coefficients for alloy A significantly different from the variance of coefficients for alloy B? Use the 10% level of significance.
6. The carbon dioxide concentration in the air within an energy-efficient house was measured once each month over an entire year. The measurements (in ppm) for January to December, respectively, were 650, 625, 480, 400, 325, 305, 310, 305, 490, 540, 695, and 600. Assume that these measurements follow a normal distribution. The concentration of carbon dioxide in an older house also was measured each month in the same year, but on a different day of the month than for the energy-efficient house. The data for this house for January to December, respectively, were 505, 530, 430, 400, 300, 300, 305, 310, 320, 410, 520, and 540. At the 10% level of significance, is there a difference in the variability of carbon dioxide concentration between the two houses?
7. The standard way of measuring water suction in soil is by a tensiometer. A new instrument for measuring this parameter is an electrical resistivity probe. A purchaser is interested in the variability of the readings given by the new instrument. The purchaser put both instruments into a large tank of soil at ten different locations, both instruments side by side at each location, and obtained the following results.

Suction (in cm) Measured by
Tensiometer    Electrical Resistivity Probe
355            365
305            300
360            375
330            360
345            340
315            320
375            385
350            380
330            330
350            390

a) Choose an appropriate level of significance and test for a significant difference in the variance of the two instruments.
b) It is known from extensive measurements that the variance of the tensiometer readings in a tank of soil like this should be 350 cm$^2$. Choose an appropriate level of significance and test whether the electrical resistivity probe gives a higher variability than expected.
8. A general contractor is considering purchasing lumber from one of two different suppliers. A sample of 12 boards is obtained from each supplier and the length of each board is measured. The estimated standard deviations from the samples are $s_1$ = 0.13 inch and $s_2$ = 0.17 inch, respectively. Assume the lengths follow a normal distribution. Does this data indicate the lengths of one supplier's boards are subject to less variability than those from the other supplier? Test using a level of significance equal to 0.02.
9. Wire of a certain type is supplied to an electrical retailer by each of two manufacturers, A and B. Users of the wire suggest that there is more variability (from specimen to specimen) in the resistance of the wire supplied by Company A than in that supplied by Company B. Random samples of wire from spools of the wire supplied by the two companies were taken. The resistances were measured with the following results:

Company                          A         B
Number of Samples                13        21
Sum of Resistances               96.8      201.4
Sum of Squares of Resistances    732.30    1936.90

Assume the resistances were normally distributed. Use the results of these samples to determine at the 5% level of significance whether or not there is evidence to support the suggestion of the users.
10. A study of wave action downstream of a dam spillway was carried out before and after a modification was made to the structure. The modification was intended to reduce wave action, which is indicated by variability in the depth of water. Depths of water were measured in meters. Before modification 41 measurements gave a sample standard deviation of 2.80. After modification 51 measurements gave a sample standard deviation of 1.49.
a) Choosing an appropriate level of significance, determine if there is a significant reduction in variability in the water depth, i.e., a significant reduction in wave action.
b) Is the pre-modification wave action at this site any different from that at another site where 51 measurements gave a sample variance of the depth of 2.65 m$^2$? Choose an appropriate level of significance.
11. In a random survey of gasoline stations in Saskatchewan and Alberta, the average prices per liter of unleaded regular gasoline and the corresponding standard deviations were as follows:

Province        Sample Size    Mean (Cents/liter)    Standard deviation (Cents/liter)
Alberta         14             68.8                  1.1
Saskatchewan    9              70.7                  0.8
a) Using the 10% level of significance, test the claim that the price per liter of gasoline is equally variable in the two provinces.
b) At what level of significance can you conclude that the average gasoline price in Alberta is less than in Saskatchewan?
12. It was claimed by a sand filter salesman that the mean concentrations of solids after filtering are normally distributed and have an average value of 0.025 percent solids, and that 95% of recorded concentrations will not exceed 0.030 percent solids. In order to check the validity of this claim a sample of 21 measurements of solids concentration after filtering was taken. A mean value of 0.0265 percent solids and a sample standard deviation of 0.0042 percent solids were found.
a) Is there reason at the 5% level of significance to suspect that the output is more variable than the salesman claims?
b) Assuming the answer to part a) is no, is there reason to suspect that the filter is less efficient than the salesman claims at the 5% level of significance?
13. Six random determinations of sulfur content in steel at a particular point in a process gave the values 3.07, 3.11, 3.14, 3.24, 3.16, and 3.08. Assume the values are normally distributed. A previous study based on a sample of 21 random observations gave an estimate of variance of $1.51 \times 10^{-3}$. Is the variance significantly higher now? Use the 5% level of significance.
14. The following are the values, in millimeters, obtained by two engineers in ten successive measurements of the same dimension.

Engineer A    10.06   10.00    9.94   10.10    9.90   10.04    9.98   10.02    9.96   10.00
Engineer B    10.04    9.94    9.84    9.96    9.92    9.98    9.90    9.94    9.92    9.96

a) At 10% level of significance, is one engineer more consistent in his measuring than the other?
b) At 5% level of significance, is there a difference in the mean values obtained by the two engineers?
15. From a set of experimental results the sample estimate of the variance within groups, based on 40 degrees of freedom, is 312, and the sample estimate of the variance between groups, based on 5 degrees of freedom, is 987. At the 5% level of significance, can we say that the difference in conditions between groups has a significant effect? The data have been plotted on normal probability paper, showing reasonable agreement with a normal distribution.
16. Analysis of a set of experiments gives an estimated variance within groups, based on 20 degrees of freedom, of 4.55, and an estimated variance between groups, based on 4 degrees of freedom, of 21.3. Is there evidence to say at the 5% level of significance that the difference between groups is significant? When data are plotted on normal probability paper they show reasonable agreement with a normal distribution.
10.2 Inferences for Proportion
Let us consider a typical engineering problem involving inference for proportion, most often a problem from the area of quality control or quality assurance. Engineers in industry often need to find the proportion of rejected items among the units produced by a production line. We would attack such a problem by taking a random sample. We would examine a certain number of units, say n units, as they are produced. We would determine for that sample the number of rejected units, say x of them. Then the ratio of x to n gives an indication of the proportion of rejects in all the items produced under those conditions. In fact, this turns out to be an unbiased estimate of the proportion of rejects in that population, although it may be a very preliminary estimate. We will still need some indication of how precise the estimate is, and by taking a large enough sample we can make the estimate as precise as desired. Then we might find confidence limits for the proportion of rejects in the population.
If we later take a sample of a suitable size and find that the proportion of rejects in the sample is so large that the difference from the previous result is significant at a particular level of significance, that would be an indication that the proportion of rejects in the population has changed. As another possibility, we may make some modification of operating conditions and take a sample of suitable size. Analysis would indicate whether there is statistically significant evidence that the modification has reduced the proportion of rejects in the population.
The methods we used in Chapter 9 to find answers to similar questions for the mean (and in section 10.1 for the variance) can be applied to questions involving proportion without much modification, but now the binomial distribution will be appropriate instead of the normal distribution or t-distribution or F-distribution.
10.2.1 Proportion and the Binomial Distribution
We have seen in section 5.3(g) that if certain reasonable assumptions are satisfied, the proportion of rejects in a sample is governed by a form of the binomial distribution. If a random sample of size n is found to contain x rejects, then on the basis of that sample we would estimate the proportion of rejects in the relevant population to be $\hat{p} = \frac{x}{n}$. According to equation 5.13 the mathematical expectation of the sample proportion rejected is $\mu_{\hat{p}} = p$, where p is the true proportion of rejects in the population. According to equation 5.14, the variance of the proportion rejected in a random sample of that size is $\frac{p(1-p)}{n}$.
10.2.2 Test of Hypothesis for Proportion
If the number of defective items in a sample is too large, we have an indication that the proportion of defective items in the population has become unacceptable.
(a) Direct Calculation from the Binomial Distribution
If the sample size and number of defective items in the sample are fairly small, we can calculate using the binomial distribution directly.
Example 10.7
Mechanical components are produced continuously in large numbers on a production line. When the machines are correctly adjusted, extensive data show that the proportion of defective components is 0.027. If the proportion of defectives in a sample of size 50 is so large that the result is significant at the 5% level, the production line will be stopped for adjustment.
a) What probability distribution applies?
b) What is the smallest proportion of defective items in a sample of 50 that will stop the production line?
Answer: (a) The binomial distribution applies because there are only two possible results, the probability of defective items is assumed constant, each result is independent of every other result, and the number of trials is fixed.
(b) The production line will be stopped if the proportion of rejected items in a sample of 50 is so large that the observed level of significance is 5% or less.
Null hypothesis, $H_0$: p = 0.027
Alternative hypothesis, $H_a$: p > 0.027 (one-tailed test)
The binomial distribution applies with n = 50 and p = 0.027, so

$$\Pr[X = x] = {}_{50}C_x\,(0.027)^x\,(0.973)^{(50-x)}$$

Let the limiting proportion of defective items to stop the production line be $\frac{x_{\lim}}{n} = \frac{x_{\lim}}{50}$. Then the probability of a proportion defective equal to or greater than $\frac{x_{\lim}}{50}$ must be no more than 5% when the true proportion defective is 0.027. That is, we choose the smallest value of $\hat{p}$ which will satisfy the requirement that $\Pr\left[\hat{p} \ge \frac{x_{\lim}}{50}\right] \le 0.05$ on condition that the true proportion defective is p = 0.027.
The probability that the sample will contain no rejects is
$\Pr[\hat{p} = 0] = \Pr[X = 0] = (0.973)^{50} = 0.254$
Similarly,
$\Pr[\hat{p} = 0.02] = \Pr[X = 1] = (50)(0.027)^1(0.973)^{49} = 0.353$
$\Pr[\hat{p} = 0.04] = \Pr[X = 2] = \frac{(50)(49)}{2}(0.027)^2(0.973)^{48} = 0.240$
$\Pr[\hat{p} = 0.06] = \Pr[X = 3] = \frac{(50)(49)(48)}{(3)(2)}(0.027)^3(0.973)^{47} = 0.107$
$\Pr[\hat{p} = 0.08] = \Pr[X = 4] = \frac{(50)(49)(48)(47)}{(4)(3)(2)}(0.027)^4(0.973)^{46} = 0.035$
$\Pr[\hat{p} = 0.10] = \Pr[X = 5] = \frac{(50)(49)(48)(47)(46)}{(5)(4)(3)(2)}(0.027)^5(0.973)^{45} = 0.009$
Probabilities are decreasing rapidly, and the total probability to this point is 0.998 (to three figures), so the critical number of rejected items at the 5% level of significance has been reached or exceeded. To see just where the boundary for that level of significance is located, we calculate successive cumulative probabilities:
$\Pr[\hat{p} \ge 0.02] = 1 - \Pr[\hat{p} = 0] = 1 - 0.254 = 0.746$
$\Pr[\hat{p} \ge 0.04] = 1 - \Pr[\hat{p} = 0] - \Pr[\hat{p} = 0.02] = 1 - 0.254 - 0.353 = 0.392$
$\Pr[\hat{p} \ge 0.06] = 1 - \Pr[\hat{p} = 0] - \Pr[\hat{p} = 0.02] - \Pr[\hat{p} = 0.04] = 1 - 0.254 - 0.353 - 0.240 = 0.152$
$\Pr[\hat{p} \ge 0.08] = 1 - \Pr[\hat{p} = 0] - \Pr[\hat{p} = 0.02] - \Pr[\hat{p} = 0.04] - \Pr[\hat{p} = 0.06] = 1 - 0.254 - 0.353 - 0.240 - 0.107 = 0.046$
Since this last result is less than 0.05, and $\Pr[\hat{p} \ge 0.08]$ corresponds to $\Pr[X \ge 4]$, 4 or more defective items in a sample of 50 will be significant at the 5% level of significance. Then the smallest proportion of defective items in a sample of 50 items which will stop the production line will be 0.08.
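The search for the critical count in Example 10.7 is mechanical enough to automate. The sketch below uses only the Python standard library; the function names are illustrative and not taken from the text. It evaluates the binomial probabilities directly and reports the smallest count whose upper-tail probability does not exceed the chosen level of significance.

```python
from math import comb


def binom_pmf(x, n, p):
    """Pr[X = x] for a binomial distribution with n trials and probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)


def upper_tail(x, n, p):
    """Pr[X >= x], computed as 1 minus the sum of the terms below x."""
    return 1.0 - sum(binom_pmf(k, n, p) for k in range(x))


def smallest_significant_count(n, p0, alpha):
    """Smallest x with Pr[X >= x | p = p0] <= alpha, or None if there is none."""
    for x in range(n + 1):
        if upper_tail(x, n, p0) <= alpha:
            return x
    return None


n, p0, alpha = 50, 0.027, 0.05
x_lim = smallest_significant_count(n, p0, alpha)

for x in range(1, 5):
    print(f"Pr[X >= {x}] = {upper_tail(x, n, p0):.3f}")
print("critical count:", x_lim, " critical proportion:", x_lim / n)
```

Working with unrounded terms, the tail probabilities agree with the hand calculation to three figures, and the critical count of 4 corresponds to the proportion 0.08 found above.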
Example 10.8
This is a continuation of Example 10.7. Now the true probability that any one component is defective has increased to 0.045. What is the probability of a Type II error?
Answer: Remember that a Type II error is accepting a null hypothesis when in fact the null hypothesis is incorrect.
Then Pr[Type II error] = Pr[observed level of significance > 5% | $H_0$ is not true]
In this specific case Pr[Type II error | p = 0.045] = Pr[fewer than 4 defective items in a sample of 50 | p = 0.045]
The binomial distribution still applies, but now

$$\Pr[X = x] = {}_{50}C_x\,(0.045)^x\,(0.955)^{(50-x)}$$

Then
$\Pr[X = 0] = (0.955)^{50} = 0.100$
$\Pr[X = 1] = (50)(0.045)^1(0.955)^{49} = 0.236$
$\Pr[X = 2] = \frac{(50)(49)}{2}(0.045)^2(0.955)^{48} = 0.272$
$\Pr[X = 3] = \frac{(50)(49)(48)}{(3)(2)}(0.045)^3(0.955)^{47} = 0.205$
Then Pr[Type II error | p = 0.045] = $\Pr[X \le 3 \mid p = 0.045]$ = 0.100 + 0.236 + 0.272 + 0.205 = 0.813
Thus, if the probability of a defective item has increased to 0.045, the probability that the production line will not be stopped for adjustment is 0.813, so the fair odds are more than 4 to 1 that the increased likelihood of defectives will not be detected by any one sample. In almost all practical cases we would require a larger probability of detecting such a large increase in the likelihood of a defective item, so we would probably need to increase the sample size.
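The Type II error probability in Example 10.8 is a cumulative binomial probability, so it can be checked in a few lines. This is a minimal sketch using the standard library; `binom_cdf` is an illustrative helper name, and the critical count of 4 is taken from Example 10.7.

```python
from math import comb


def binom_cdf(x, n, p):
    """Pr[X <= x] for a binomial distribution with n trials and probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))


n = 50
critical_count = 4    # line is stopped at 4 or more defectives (Example 10.7)
p_true = 0.045        # true probability of a defective item

# Type II error: fewer than 4 defectives are seen, so the line is not stopped
beta = binom_cdf(critical_count - 1, n, p_true)
prob_detected = 1.0 - beta

print(f"Pr[Type II error] = {beta:.3f}")
print(f"Pr[increase is detected by one sample] = {prob_detected:.3f}")
```

Rerunning the script with a larger n (and the corresponding critical count recomputed for that n) shows how quickly the probability of detection improves, which is the point made at the end of the example.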
As the sample size increases, calculations using the binomial distribution directly become time-consuming, so an alternative method of calculation becomes very desirable. The normal approximation to the binomial distribution can be used if the probability of a defective item or an item of another specific class is close enough to 0.5 and the sample size is large enough. (See the discussion of the rough rule in section 7.6.) Remember that if p, the probability that any single item comes within a particular class, is close to 0 or 1, a larger value of np or n(1 - p) will be required. Engineers often need confidence intervals for quality control problems in which p, the probability of a defective item, is small in relation to 1. In that case very large samples are required before the normal approximation provides satisfactory results. See Example 7.8.
(b) Calculation Using Excel
If a computer with Excel or alternative software is available, another possibility is to use computer calculations. The use of the function BINOMDIST has been discussed in section 5.3(f). It can be used to calculate the individual terms or the cumulative distribution function of the binomial distribution. It requires four parameters: the number of successes in a fixed number of trials, the number of trials, the probability of success on each trial, and either TRUE to direct the program to calculate cumulative probabilities or FALSE to direct the program to calculate individual probabilities according to the binomial distribution.
Example 10.9
Electrical components are manufactured continuously on a production line. Extensive data show that when all machines are correctly adjusted, a fraction 0.026 of the components are defective. However, some settings tend to vary as production continues, so the fraction of defective components may increase. A sample of 420 components is taken at regular intervals, and the number of defective components in the sample is counted. If there are more than 16 defective components in the sample of 420, the production line will be stopped and adjustments will be made.
(a) State the null hypothesis and alternative hypothesis in terms of p.
(b) What is the observed level of significance if the number of defective components is just large enough to stop the production line?
(c) Suppose the probability that a component will be defective has increased to 0.040. Then what is the probability of a Type II error?
Answer:a) H
0
:p=0.026
H
a
:p>0.026(one-tailedtest)
Thebinomialdistributionapplieswithn=420andp=0.026.
b) The production line will be stopped if a sample of 420 components contains more than 16 defective items. Then the observed level of significance will be the probability of finding more than 16 defective items in a sample of size 420. MS Excel can be used to find the observed level of significance. It will be 1 minus the cumulative probability of finding 16 or fewer defective components in a sample of size 420 if the null hypothesis is correct. That will be given by Excel if we enter the expression =1-BINOMDIST(16,420,0.026,TRUE), where BINOMDIST is an Excel function giving probabilities for the binomial distribution, 16 is the number of defective items, 420 is the sample size, 0.026 is the probability that any one component will be defective, and TRUE indicates that we want a cumulative probability. That gives an observed level of significance of 0.0507 or 0.051 or 5.1%.
This is a more accurate result than an answer obtained using the normal approximation to the binomial distribution.
c) Now we want to find the probability of a Type II error when the probability of a defective component on any one trial has increased to 0.040. If we obtain 16 or fewer defective components in a sample consisting of 420 components, we will have no reason to stop the production line.
Pr[16 or fewer defective components in a sample of size 420 | p = 0.040] will be given by entering the expression =BINOMDIST(16,420,0.040,TRUE) in Excel. We find that the probability of a Type II error is 0.486 or 48.6%. The probability of detecting an increase in the proportion defective from 0.026 to 0.040 by this scheme of sampling is not much more than 50%. That situation is almost certainly unacceptable. We can reduce the probability of a Type II error by making the sample larger.
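The two BINOMDIST calls in Example 10.9 can also be checked without Excel. The following is a minimal Python sketch, not the method used in the text: the helper name binom_cdf is hypothetical, and it simply sums the individual binomial terms, which is what the cumulative (TRUE) form of BINOMDIST is described as returning.

```python
from math import comb


def binom_cdf(k, n, p):
    """Pr[R <= k] for a binomial distribution with n trials and success probability p."""
    return sum(comb(n, r) * p**r * (1.0 - p)**(n - r) for r in range(k + 1))


n, limit = 420, 16

# Part (b): observed level of significance, Pr[R > 16 | p = 0.026]
significance = 1.0 - binom_cdf(limit, n, 0.026)

# Part (c): probability of a Type II error, Pr[R <= 16 | p = 0.040]
type_ii = binom_cdf(limit, n, 0.040)

print(f"observed level of significance = {significance:.4f}")
print(f"probability of a Type II error = {type_ii:.3f}")
```

The printed values should agree, to rounding, with the 0.0507 and 0.486 quoted in the answer. Rerunning the sketch with a larger n and a correspondingly adjusted limit is one way to explore how a larger sample reduces the Type II error.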
10.2.3 Confidence Interval for Proportion
Unless the sample size is very small, it is not practical to find confidence intervals for proportion by calculations of individual probabilities directly from the binomial distribution. We need to use either a normal approximation or a computer solution.
A computer solution with Excel (except for rather small sample sizes) involves using the function BINOMDIST to obtain cumulative probabilities. Then the goal-seeking algorithm can be used to find the upper limit or the lower limit of the appropriate confidence interval for the proportion p, say the probability that any one item will be defective.
Example 10.10
Mechanical components are being produced continuously. A quality control program for the mechanical components requires a close estimate of the proportion defective in production when all settings are correct. 1020 components are examined under these conditions, and 27 of the 1020 items are found to be defective.
(a) Find a point estimate of the proportion defective.
(b) Find a 95% two-sided confidence interval.
(c) Find an upper limit giving 95% level of confidence that the true proportion defective is less than this limiting value.
Use Excel in parts (b) and (c).
Answer: a) The point estimate of the proportion defective is just 27/1020 = 0.0265.
b) If the probability distribution is not symmetrical, various two-sided confidence intervals can be defined. We will use the confidence interval with equal tails, that is, one in which the probability of a value above the upper limit is equal to the probability of a value below the lower limit. For this problem that would mean 2.5% probability that the proportion defective is above the upper boundary of the confidence interval and 2.5% probability that it is below the lower limit.
These limits can be found using the goal-seeking method on the Formula menu or Tools menu of Excel. At the upper limit we seek a proportion p_upper (or p_u) such that the probability of finding 27 or fewer defective items in a sample of size 1020 is 2.5%. In the worksheet shown in Table 10.1 the function =BINOMDIST(27,1020,p_u,true) was entered in cell $B$10. The cell $B$9 was selected and named p_u using Define Name on the Formula menu. Then cell $B$10 was selected, and from the Formula or Tools menu Goal Seek was chosen. In the Set Cell box, the reference $B$10 appeared. In the To Value box the quantity 0.025 was entered. In the By Changing Cell box the name p_u was entered, referring to cell $B$9. Then the OK button was chosen. Then Excel began a numerical algorithm to change the value of p_u in such a way that the goal, 0.025, was approached by the content of the cell $B$10. The goal cannot
be attained exactly: the process is terminated by the algorithm when the content of that cell comes within a preset difference from the goal. In the present example the final content of cell $B$10 was 0.0244 when the value of p_u was 0.0383. The accuracy of the upper confidence limit was checked by entering values close to the given quantity in cells A22:A25. The array function =BINOMDIST(27,1020,A22:A25,true) was entered in cells B22:B25. In this case the value 0.0383 was found to be correct to four decimal places, or three significant figures. The binomial distribution for this situation is shown in Figure 10.7.
[Figure 10.7: Binomial Distribution at Upper Limit of 95% Confidence Interval, p_upper = 0.0383. Horizontal axis: Number Rejected, x. Vertical axis: Probability [x rejected]. The cumulative probability limit of 0.025 is marked.]
Similarly, at the lower confidence limit we seek a proportion p_l such that the probability of finding 27 or more defective items in a sample of size 1020 is 2.5%. But the available function finds a cumulative probability that the number of defective items will be less than, or equal to, a limiting number. That limiting number must now be 26 rather than 27 because the binomial distribution is discrete; Pr[R ≥ 27] = 1 - Pr[R ≤ 26]. The binomial distribution for this relationship is shown in Figure 10.8.
[Figure 10.8: Binomial Distribution at Lower Limit of 95% Confidence Interval, p_lower = 0.0175. Horizontal axis: Number Rejected, x. Vertical axis: Probability [x rejected]. The cumulative probability limit of 0.025 is marked.]
The function =BINOMDIST(26,1020,p_l,true) was entered in cell $B$15. The cell $B$14 was defined as p_l. Then Goal Seek was chosen. The reference $B$15 was placed in the Set Cell box, and the quantity 0.975 was entered in the To Value box. The name p_l, which refers then to cell $B$14, was entered in the By Changing Cell box. The OK button was chosen to start the algorithm of changing the content of cell $B$14 so that the content of cell $B$15 approached the goal of 0.975. The final content of cell $B$15 was 0.9749 when the content of cell $B$14 was 0.0175. Checking indicated that this gave a correct answer to four decimal places.
Then the 95% two-sided confidence interval is from 0.0175 to 0.0383.
The worksheet is shown in Table 10.1.
Table 10.1: Worksheet for Example 10.10
Row  Column A                                                Column B     Remarks
     Confidence Interval for Proportion                                   Formula Menu: Goal Seek
     Sample Size = n                                         1020
     Number rejected = x                                     27
     Point Estimate, p_hat = x/n                             0.02647059
     1 - p_hat =                                             0.97352941
8    Pr[R<=27 | p=p_u] -> 0.025                                           Set cell B10 to value 0.025 by changing B9
9    Upper boundary of interval, p_u =                       0.0383459
10   Pr[R<=27 | p=p_u]                                       0.02440541
11   [Binomdist(27,1020,p_u,true) =]
13   Pr[R>=27 | p=p_l] -> 0.025 or Pr[R<=26 | p=p_l] -> 0.975             Set cell B15 by changing B14
14   Lower boundary of interval, p_l =                       0.01752218
15   Pr[R<=26 | p=p_l]                                       0.97489228
16   [Binomdist(26,1020,p_l,true) =]
     Then 95% confidence interval seems to be
     from 0.0175 to 0.0383
21   Check Upper Confidence Limit:                           CumProb
22   0.038                                                   0.02771796   Binomdist(27,1020,A22:A25,true)
23   0.039                                                   0.01908482
24   0.0382                                                  0.02575748
25   0.0383                                                  0.02482385
26   Check Lower Confidence Limit:                           CumProb
27   0.017                                                   0.98196003   Binomdist(26,1020,A27:A30,true)
28   0.018                                                   0.96669979
29   0.0176                                                  0.97367698
30   0.0175                                                  0.97523058
Then the true limits are from 0.0175 to 0.0383.
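c) A one-sided confidence interval corresponding to Pr[0 < P ≤ p_upper,2] can be found in the same way as the upper limit for part (b) was found, with the goal for the cumulative probability set to 0.05 rather than 0.025. That gives a 95% one-sided confidence interval of 0 to 0.0363.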
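The goal-seeking steps of Example 10.10 can be imitated with a simple root search. The sketch below is only an illustration under stated assumptions: it substitutes bisection for Excel's Goal Seek algorithm (whose internals are not described in the text), and the names binom_cdf and solve_p are hypothetical helpers. The target of 0.05 for the one-sided limit is our reading of part (c).

```python
from math import comb


def binom_cdf(k, n, p):
    """Pr[R <= k] for a binomial distribution (the cumulative form of BINOMDIST)."""
    return sum(comb(n, r) * p**r * (1.0 - p)**(n - r) for r in range(k + 1))


def solve_p(k, n, target, lo=1e-9, hi=1.0 - 1e-9, tol=1e-10):
    """Bisection for the p at which Pr[R <= k | p] equals target.

    For fixed k < n the cumulative probability falls as p rises,
    so the root stays bracketed between lo and hi.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binom_cdf(k, n, mid) > target:
            lo = mid   # probability still too large, so p must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)


n, x = 1020, 27
p_hat = x / n
p_u = solve_p(x, n, 0.025)       # Pr[R <= 27 | p_u] = 0.025
p_l = solve_p(x - 1, n, 0.975)   # Pr[R <= 26 | p_l] = 0.975
p_one = solve_p(x, n, 0.05)      # assumed goal for the one-sided 95% upper limit

print(f"point estimate = {p_hat:.4f}")
print(f"two-sided 95% limits: {p_l:.4f} to {p_u:.4f}")
print(f"one-sided 95% upper limit = {p_one:.4f}")
```

Because bisection runs to a much tighter tolerance than the worksheet's Goal Seek did, the limits it prints may differ from the worksheet cells in the later decimal places while agreeing with the checked values in rows 22 to 30.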
10.2.4 Extension
(a) Comparison of Two Sample Proportions
In discussing hypothesis testing for proportion in section 10.2.2 we have assumed that we know without appreciable error the proportion of the defective components when all machines are correctly adjusted. This would require a very large sample, which is often not available. In many cases we must take into account both the variance when all adjustments are correct and the variance in the case being tested. The variance of the sample mean proportion at correct adjustment must be added to the variance of the sample mean proportion being tested, giving the variance of the difference. The only simple calculation available in such a case involves a normal approximation to the binomial distribution.
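As a rough illustration of the comparison described above, the following Python sketch adds the two sample variances and applies the normal approximation. The counts are hypothetical, chosen only for illustration, and the function name two_proportion_z is our own. Note that some texts instead pool the two samples to estimate a common variance under the null hypothesis; the sketch follows the unpooled sum described in this section.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for p2 - p1, using the sum of the two sample variances."""
    p1, p2 = x1 / n1, x2 / n2
    var_diff = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    return (p2 - p1) / sqrt(var_diff)


# Hypothetical counts: a reference sample taken at correct adjustment,
# and a later sample being tested for an increase in proportion defective.
z = two_proportion_z(x1=27, n1=1020, x2=24, n2=500)
p_value = 1.0 - NormalDist().cdf(z)   # one-tailed test

print(f"z = {z:.2f}, one-tailed observed level of significance = {p_value:.4f}")
```

The approximation is reasonable only when both samples are large enough for the normal approximation to the binomial distribution to apply.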
(b) Sample Size for Required Level of Confidence
Similar to the way sample sizes to reduce standard errors of the mean to required values were found in Examples 7.3 and 7.4, we can find at least approximately the sample size needed to give a required level of confidence that a proportion is within stated limits. This can be found either by goal-seeking with Excel or by using a normal approximation to the binomial distribution. For these we need an assumed value of p, the probability of success, so satisfactory results require a close estimate of p. That estimate is often obtained from a preliminary sample of the population.
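With the normal approximation, a two-sided interval of half-width E at a given level of confidence requires approximately n = z^2 p(1 - p)/E^2, where z is the corresponding standard normal value. A minimal Python sketch follows; the assumed p, the half-width, and the function name are all hypothetical and serve only as an illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(p_assumed, half_width, confidence=0.95):
    """Approximate n so that the two-sided interval is p +/- half_width."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return ceil(z**2 * p_assumed * (1.0 - p_assumed) / half_width**2)


# Hypothetical requirement: estimate p near 0.026 to within +/- 0.005
# at the 95% level of confidence.
n_required = sample_size_for_proportion(0.026, 0.005)
print(n_required)
```

Since the result depends directly on the assumed p, it is worth repeating the calculation over the plausible range of p suggested by the preliminary sample.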
Closing Comment
To make confidence intervals for proportion reasonably small often requires large sample sizes, particularly for small proportions such as proportion defective. If the property that makes the items defective can be measured fairly precisely, it will usually be more satisfactory to base quality control on that measurement rather than on the proportion defective.
On the other hand, proportion defective is often quoted as some indication of quality. If that is done, there should be some indication of the confidence limits for proportion to see how reliable this indication is.
Problems
1. A production line is producing electrical components. Under normal conditions 2.4% of the components are defective. To monitor production, a sample of 18 components is taken each hour. If the number of defectives becomes too high, the production line is stopped and adjustments are made. What distribution applies to the number of defectives in a sample? Write down specifically the null hypothesis and the alternative hypothesis. For 1% level of significance, what is the smallest number of defectives in the sample which should shut down the production line?
2. In problem 1 the probability that any one component will be defective has increased to 6.3%. Now what is the probability of a Type II error?
3. When a production line is properly adjusted, it is found that 4% of the mechanical components produced are defective. Occasionally settings go out of adjustment, and more defectives are produced. A sample of 12 components is examined and the number of defectives is counted. What distribution applies to the number of defectives? What are the null hypothesis and the alternative hypothesis? If the level of significance is set at 1%, how many defectives can be allowed in the sample before any action is taken?
4. In problem 3 adjustments have gone badly wrong so that 7.5% of the components are defective. Now what is the probability of a Type II error?
5. A continuous production line is producing electrical components. When all adjustments are correct, 3.2% of the components from the line are defective. A sample of 480 components is taken every few hours, and the number of defectives is counted. If there are more than 21 defectives in the sample, extensive adjustments will be made. Use the normal approximation to the binomial distribution, remembering the correction for continuity.
a) State the null hypothesis and alternative hypothesis.
b) What is the observed level of significance if there are just more than 21 defectives in the sample?
c) If the probability that a component will be defective has increased to 6.0%, what is the probability of a Type II error?
6. Mechanical components are being produced continuously. When all adjustments are correct, 3.0% of the components from the production line are defective. A sample of 500 components is taken at regular intervals, and the number of defectives is counted. If the number of defectives is large enough to be significant at the 5% level of significance, the production line will be shut down for adjustment. Use the normal approximation to the binomial distribution.
a) State the null hypothesis and alternative hypothesis.
b) What is the minimum number of defectives in a sample which will result in a shut-down?
c) If the probability that a component will be defective has increased to 0.060, what is the probability of a Type II error?
Computer Problems
C7. A continuous production line is producing electrical components. When all adjustments are correct, 3.2% of the components from the line are defective. A sample of 480 components is taken every four hours, and the number of defectives is counted. If there are more than 21 defectives in the sample, extensive adjustments will be made.
Use Excel. This is the same problem as number 5, except that that problem was done using a normal approximation.
a) State the null hypothesis and alternative hypothesis.
b) What is the observed level of significance if there are 22 defectives in the sample?
c) If the probability that a component will be defective has increased to 6.0%, what is the probability of a Type II error?
C8. Mechanical components are being produced continuously. When all adjustments are correct, 3.0% of the components from the production line are defective. A sample of 500 components is taken at regular intervals, and the number of defectives is counted. If the number of defectives is large enough to be significant at the 5% level of significance, the production line will be shut down for adjustment.
Use Excel. This is the same problem as number 6, except that that problem was done using the normal approximation to the binomial distribution.
a) State the null hypothesis and alternative hypothesis.
b) What is the minimum number of defectives in a sample which will result in a shut-down?
c) If the probability that a component will be defective has increased to 0.060, what is the probability of a Type II error?
C9. Mechanical components are being produced continuously. When all settings are correct and checked frequently, a sample of 1800 components contains 44 items which are rejected.
a) Find a point estimate of the proportion of the components which are rejected.
b) Find the two-sided 90% confidence interval with equal probability in the two tails.
CHAPTER 11
Introduction to Design of Experiments
This chapter is largely independent of previous chapters, although some previous vocabulary is used here.
Professional engineers in industry or in research positions are very frequently responsible for devising experiments to answer practical problems. There are many pitfalls in the design of experiments, and on the other hand there are well-tried methods which can be used to plan experiments that will give the engineer the maximum information and often more reliable information for a particular amount of effort. Thus, we need to consider some of the more important factors involved in the design of experiments. Complete discussion of design of experiments will be beyond the scope of this book, so the contents of this chapter will be introductory in nature.
We have seen in section 9.2.4 that more information can be gained in some cases by designing experiments to use the paired t-test rather than the unpaired t-test. In many other cases there is a similar advantage in designing experiments carefully.
There are complications in many experiments in industry (and also in many research programs) that are not found in most undergraduate engineering laboratories. First, several different factors may be present and may affect the results of the experiments but are not readily controlled. It may be that some factors affect the results but are not of prime interest: they are interfering factors, or lurking factors. Often these interfering factors cannot be controlled at all, or perhaps they can be controlled only at considerable expense. Very frequently, not all the factors act independently of one another. That is, some of the factors interact in the sense that a higher value of one factor makes the results either more or less sensitive to another factor. We have to consider these complicating factors in planning the set of experiments.
There are several expressions that are key to understanding the design of experiments. Among the most important are:
Factorial Design
Interaction
Replication
Randomization
Blocking
We will see the meaning of these key words and begin to see how to use them in later sections.
11.1 Experimentation vs. Use of Routine Operating Data
Rather than design a special experiment to answer questions concerning the effects of varying operating conditions, some engineers choose to change operating conditions entirely on the basis of routine data recorded during normal operations. Routine production data often provide useful clues to desirable changes in operating conditions, but those clues are usually ambiguous. That is because in normal operation often more than one governing factor is changing, and not in any planned pattern. Often some changes in operating conditions are needed to adjust for changes in inputs or conditions beyond the operator's control. Some factors may change uncontrollably. The operator may or may not know how they are changing. If he or she knows what factors have changed, it may be extremely desirable to make compensating changes in other variables. For example, the composition of material fed to a unit may change because of modifications in operations in upstream units or because of changes in the feed to the entire plant. The composition of crude oil to a refinery often changes, for instance, with increase or decrease of rates of flow from the individual fields or wells, which give petroleum of different compositions. In some cases considerable time is required before steady, reliable data are obtained after any change in operating conditions, so another change may be made, consciously or unconsciously, before the full effects of the first change are felt.
If more than one factor changes during routine operation, it becomes very difficult to say whether the changes in results are due to one factor or to another, or perhaps to some combination of the two. Two or more factors may change in such a way as to reinforce one another or cancel one another out. The results become ambiguous. In general, it is much better to use planned experiments in which changes are chosen carefully.
An exception to this statement is when all the factors affecting a result and the mathematical form of the function are well known without question, the mathematical forms of the different factors are different from one another, but the values of the coefficients of the relations are not known. In that case, data from routine operations can give satisfactory results.
11.2 Scale of Experimentation
Experiments should be done on as small a scale as will give the desired information. Managers in charge of full-scale industrial production units are frequently reluctant to allow any experimentation with the operating conditions in their units. This is because experiments might result in production of off-specification products, or the rate of production might be reduced. Either of these might result in very appreciable financial penalties. Production managers will probably be more willing to perform experiments if conditions are changed only moderately, especially if experiments can be done when the plant is not operating at full capacity. A technique of making a series of small planned changes in operating conditions is known as evolutionary operation, or EVOP. The changes at each step can be made small enough so that serious consequences are very unlikely. After each step, results to that point are evaluated in order to decide the most logical next step. For further information see the book by Box, Hunter, and Hunter, shown in the List of Selected References in section 15.2.
Sometimes experiments to give the desired information can be done on a laboratory scale at very moderate cost. In other cases the information can be gained from a pilot plant which is much smaller than full industrial scale, but with characteristics very similar to full-scale operation. In still other cases, there is no substitute for experiments at full scale, and the costs are justified by the improved technique of operation.
11.3 One-factor-at-a-time vs. Factorial Design
What sort of experimental design should be adopted? One approach is to set up standard operating conditions for all factors and then to vary conditions from the standard set, one factor at a time. An optimum value of one factor might be found by trying the effects of several values of that factor. Then attention would shift to a different factor. This is a plan that has been used frequently, but in general it is not a good choice at all.
It would be a reasonable plan if all the factors operated independently of one another, although even then it would not be an efficient method for obtaining information. If the factors operated independently, the results of changing two factors together would be just the sum of the effects of changing each factor separately. Figure 11.1 shows profiles for such a situation. Each profile represents the variation of the response, z, as a function of one factor, x, at a constant value of the other factor, y. If the factors are completely independent and so have no interaction, the profiles of the response variable all have the same shape. The profiles for different values of y differ from one another only by a constant quantity, as can be seen in Figure 11.1. In that case it would be reasonable to perform measurements of the response at various values of x with a constant value of y, and then at various values of y with a constant value of x. If we knew that x and y affected the response independently of one another, that set of measurements would give a complete description of the response over the ranges of x and y used. But that is a very uncommon situation in practice.
[Figure 11.1: Profiles without Interaction. Response z plotted against x for y = -5, y = 0, and y = +10.]
Very frequently some of the factors interact. That is, changing factor A makes the process more or less sensitive to change in factor B. This is shown in Figure 11.2, where an interaction term is added to the variables shown in Figure 11.1. Now the profiles do not have the same shape, so measurements are required for various combinations of the variables.
[Figure 11.2: Profiles with Interaction. Response z plotted against x for y = -5, y = 0, and y = +10.]
For example, the effect of increasing temperature may be greater at higher pressure than at lower pressure. If there were no interaction the simplest mathematical model of the relation would be
$$R_i = K_0 + K_1 P + K_2 T + \varepsilon_i \qquad (11.1)$$
where $R_i$ is the response (the dependent variable) for test i,
$K_0$ is a constant,
$P$ is pressure,
$K_1$ is the constant coefficient of pressure,
$T$ is temperature,
$K_2$ is the constant coefficient of temperature,
and $\varepsilon_i$ is the error for test i.
This is similar to the profiles of Figure 11.1, except that Figure 11.1 does not include errors of measurement.
When interaction is present the simplest corresponding mathematical model would be
$$R_i = K_0 + K_1 P + K_2 T + K_3 P T + \varepsilon_i \qquad (11.2)$$
where $K_3$ is the constant coefficient of the product of temperature and pressure. Then the term $K_3 P T$ represents the interaction. In this case the interaction is second-order because it involves two independent variables, temperature and pressure. If it involved three independent variables it would be a third-order interaction, and so on. Second-order interactions are very common, third-order interactions are less common, and fourth order (and higher order) interactions are much less common.
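The difference between equations 11.1 and 11.2 can be seen by computing the effect of a pressure change at two temperatures. The Python sketch below is only an illustration: the coefficient values are hypothetical, the error term is omitted, and the function name response is our own.

```python
def response(P, T, K0, K1, K2, K3=0.0):
    """Equation 11.2 without the error term; K3 = 0 reduces it to equation 11.1."""
    return K0 + K1 * P + K2 * T + K3 * P * T


# Hypothetical coefficients, for illustration only.
K0, K1, K2, K3 = 10.0, 5.0, 2.0, 0.5

for label, k3 in (("no interaction", 0.0), ("with interaction", K3)):
    # Effect of raising pressure from 1 to 2 units at each temperature.
    effect_low_T = response(2, 20, K0, K1, K2, k3) - response(1, 20, K0, K1, K2, k3)
    effect_high_T = response(2, 30, K0, K1, K2, k3) - response(1, 30, K0, K1, K2, k3)
    print(f"{label}: pressure effect at T=20 is {effect_low_T}, at T=30 is {effect_high_T}")
```

Without the product term the pressure effect is the same at both temperatures; with it, the effect equals K1 + K3*T and so grows with temperature, which is the behavior the interaction term is meant to capture.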
Consider an example from fluid mechanics. The drag force on a solid cylinder moving through a fluid such as air or water varies with the factors in a complex way. Under certain conditions the drag force is found to be proportional to the product of the density of the fluid and the square of the relative velocity between the cylinder and the fluid far from the cylinder, say $F_d = K \rho u^2$, omitting the effect of errors of measurement. This does not correspond to equation 11.1. If a density increase of 1 kg m⁻³ at a relative velocity of 0.1 m s⁻¹ increased the drag force by 1 N, the same density increase at a relative velocity of 0.2 m s⁻¹ would increase the drag force by 4 N. Then in such a case density and relative velocity interact. In this case the interacting relationship could be changed to a non-interacting relationship by a change of variables, taking logarithms of the measurements, but there are other relationships involving interaction which cannot be simplified by any change of variable.
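The arithmetic of the drag example, and the effect of taking logarithms, can be checked with a few lines of Python. This is a sketch under stated assumptions: the constant K = 100 is simply the value implied by the numbers in the paragraph above (1 N per 1 kg m⁻³ at 0.1 m s⁻¹), and the starting density is hypothetical.

```python
from math import isclose, log

K = 100.0  # implied by the text: K * (1 kg/m^3) * (0.1 m/s)**2 = 1 N


def drag(rho, u):
    """F_d = K * rho * u**2, ignoring errors of measurement."""
    return K * rho * u**2


rho = 1000.0  # hypothetical starting density, kg/m^3
increase_slow = drag(rho + 1.0, 0.1) - drag(rho, 0.1)
increase_fast = drag(rho + 1.0, 0.2) - drag(rho, 0.2)
print(increase_slow, increase_fast)  # close to 1 N and 4 N

# Taking logarithms gives ln F_d = ln K + ln rho + 2 ln u,
# a sum of separate terms with no product of the two factors.
lhs = log(drag(rho, 0.2))
rhs = log(K) + log(rho) + 2.0 * log(0.2)
print(isclose(lhs, rhs))
```

In the logarithmic variables the effect of a change in ln rho no longer depends on u, which is why the transformed relationship corresponds to the form of equation 11.1.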
Interaction is found very frequently, and its possibility must always be considered. However, the one-factor-at-a-time design would not give us any precise information about the interaction, and results from that plan of experimentation might be extremely misleading. In order to determine the effects of interaction, we must compare the effects of increasing one variable at different values of a second variable.
What is an alternative to changing one factor at a time? Often the best alternative is to conduct tests at all possible combinations of the operating factors. Let's say we decide to do tests at three different values (levels) of the first factor and two different levels of the second factor. Then measurements at each of the three levels of the first factor would be done at each level of the second factor, so at (3)(2) = 6 different combinations of levels of the two factors. This is called a factorial design. Then suitable algebraic manipulation of the data can be used to separate the results of changes in the various factors from one another. The techniques of analysis of variance (which will be introduced in Chapter 12) and multilinear regression (which will be introduced in Chapter 14) are often used to analyze the data.
Now let us look at an example of factorial design.
Example 11.1
Figure 11.3 shows a case where three factors are important: temperature, pressure, and flow rate. We choose to operate each one at two different levels, a low level and a high level. That will require 2^3 = 8 different experiments for a complete factorial design. If the number of factors increases, the required number of runs goes up exponentially.
[Figure 11.3: Factorial Design. Axes: Temperature (20°C, 30°C), Pressure (1 atm, 2 atm), and Flow (0.05 and 0.1 cu. m/s); an asterisk marks each of the eight combinations.]
For the three factors, each at two levels, measurements would be taken at the following conditions:
     Pressure   Temperature   Flow Rate
1.   1 atm      20°C          0.05 m³ s⁻¹
2.   1 atm      20°C          0.1 m³ s⁻¹
3.   1 atm      30°C          0.05 m³ s⁻¹
4.   1 atm      30°C          0.1 m³ s⁻¹
5.   2 atm      20°C          0.05 m³ s⁻¹
6.   2 atm      20°C          0.1 m³ s⁻¹
7.   2 atm      30°C          0.05 m³ s⁻¹
8.   2 atm      30°C          0.1 m³ s⁻¹
Each of these conditions is marked by an asterisk in Figure 11.3.
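The list of conditions for a complete factorial design can be generated rather than written out by hand. The following Python sketch uses itertools.product for the three factors of Example 11.1; the dictionary keys are hypothetical labels of our own choosing.

```python
from itertools import product

levels = {
    "pressure_atm": (1, 2),
    "temperature_C": (20, 30),
    "flow_m3_per_s": (0.05, 0.1),
}

# product() varies the last factor fastest, which gives the same order
# as the numbered list of conditions in Example 11.1.
runs = [dict(zip(levels, combo)) for combo in product(*levels.values())]

for number, run in enumerate(runs, start=1):
    print(number, run)
```

Adding a fourth two-level factor to the dictionary would double the list to 16 runs, which illustrates how quickly the number of runs grows with the number of factors.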
A possible set of results (for one flow rate) is illustrated in Figure 11.4. The interaction between temperature and pressure is shown by the result that increasing pressure increases the response considerably more at the higher temperature than at the lower temperature.
[Figure 11.4: An Illustration of Interaction. Response plotted against Temperature (20°C, 30°C) and Pressure (1 atm, 2 atm) at Flow Rate = 0.05 cu. m/s.]
In the early stages of industrial experimentation it is usually best to choose only two levels for each factor varied in the factorial design. On the basis of results from the first set of experiments, further experiments can be designed logically and may well involve more than two levels for some factors. If a complete factorial design is used, experiments would be done at every possible combination of the chosen levels for each factor.
In general we should not try to lay out the whole program of experiments at the very beginning. The knowledge gained in early trials should be used in designing later trials. At the beginning we may not know the ranges of variables that will be of chief interest. Furthermore, before we can decide logically how many repetitions or replications of a measurement are needed, we require some information about the variance corresponding to errors of measurement, and that will often not be available until data are obtained from preliminary experiments. Early objectives of the study may be modified in the light of later results. In summary, the experimental design should usually be sequential or evolutionary in nature.
In some cases the number of experiments required for a complete factorial design may not be practical or desirable. Then some other design, such as a fractional factorial design (to be discussed briefly in section 11.6), may be a good choice. These alternative designs do not give as much information as the corresponding full factorial design, so care is required in considering the relative advantages and disadvantages. For example, the results of a particular alternative design may indicate either that certain factors of the experiment have important effects on the results, or else that certain interactions among the factors are of major importance. Which explanation applies may not be at all clear. In some instances one of the possible explanations is very unlikely, so the other explanation is the only reasonable one. Then the alternative design would be a logical choice. But assumptions always need to be recognized and analyzed. Never adopt an alternative experimental design without examining the assumptions on which it is based.
Everything we know about a process or the theory behind it should be used, both in planning the experiment and in evaluating the results. The results of previous experiments, whether at bench scale, pilot scale, or industrial scale, should be carefully considered. Theoretical knowledge and previous experience must be taken into account. At the same time, the possibility of effects that have not been encountered or considered before must not be neglected.
These are some of the basic considerations, but several other factors must be kept in mind, particularly replication of trials and strategies to prevent bias due to interfering factors.
11.4 Replication
Replication means doing each trial more than once at the same set of conditions. In some preliminary exploratory experiments each experiment is done just twice (two replications). This gives only a very rough indication of how reproducible the results are for each set of conditions, but it allows a large number of factors to be investigated fairly quickly and economically. We will see later that in some cases no replication is used, particularly in some types of preliminary experiments.
Usually some (perhaps many) of the factors studied in preliminary experiments will have negligible effect and so can be eliminated from further tests. As we zoom in on the factors of greatest importance, larger numbers of replications are often used. Besides giving better estimates of reproducibility, further replication reduces the standard error of the mean and tends to make the distribution of means closer to the normal distribution. We have already seen in section 8.3 that the mean of repeated results for the same condition has a standard deviation that becomes smaller as the sample size (or number of replications) increases. Thus, larger numbers of replications give more reliable results. Furthermore, we have seen in section 8.4 that as the sample size increases, the distribution of sample means comes closer to the normal distribution. This stems from the Central Limit Theorem, and it justifies use of such tests as the t-test and the F-test.
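The gain from further replication can be made concrete with the usual relation that the standard error of the mean of n independent results is sigma divided by the square root of n. The Python sketch below assumes a hypothetical standard deviation for a single measurement; the function name is our own.

```python
from math import sqrt


def standard_error(sigma, n):
    """Standard deviation of the mean of n independent replicate results."""
    return sigma / sqrt(n)


sigma = 4.0  # hypothetical standard deviation of a single measurement
for n in (1, 2, 4, 8, 16):
    print(f"replications = {n:2d}, standard error of the mean = {standard_error(sigma, n):.2f}")
```

Halving the standard error requires four times as many replications, so the benefit of each additional replication diminishes as n grows.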
11.5 Bias Due to Interfering Factors
Very frequently in industrial experiments an interfering factor is present that will bias the results, giving systematic error unless we take suitable precautions. Such interfering factors are sometimes called "lurking variables" because they can suddenly assault the conclusions of the unwary experimenter. We are often unaware of interfering factors, and they are present more often than we may suspect.
(a) Some Examples of Interfering Factors
We will consider several examples. First, the temperature of the surrounding air may affect the temperature of a measurement, and so the results of that measurement. This is particularly so if measurements are performed outdoors. Variations of air temperature between summer and winter are so great that they are unlikely to be neglected, but temperature variations from day to day or from hour to hour may be overlooked. Atmospheric temperature has some tendency to persist: if the outside air temperature is above average today, the air temperature tomorrow is also likely to be above average. But at some point the weather pattern changes, so air temperature may be higher than average today and tomorrow, but below average a week from today. Thus, taking some measurements today and others of the same type a week later could bias the results. If we do experiments with one set of conditions today and similar experiments with a different set of conditions a week later, the differences in results may in fact be due to the change in air temperature rather than to the intended difference in conditions. Shorter-term variations in air temperature could also cause bias, since the temperature of outside air varies during the day. There may be a systematic difference between results taken at 9 A.M. and results taken at 1:30 P.M. We have used temperature variation as an example of an interfering factor, but of course this factor can be taken into account by suitable temperature measurements. Other interfering factors are not so easily measured or controlled.
An unknown trace contaminant may affect the results of an experiment. If the feed to the experimental equipment comes from a large surge tank with continual flow in and out and good mixing, higher than average concentration of a contaminant is likely to persist over an interval of time. This might mean that high results today are likely to be associated with high results tomorrow, and low results today are likely to be associated with low results tomorrow, but the situation might be quite different a week later. Thus, tests today may show a systematic difference, or bias, from tests a week from today.
Another instance involves tests on a machine that is subject to wear. Wear on the machine occurs slowly and gradually, so the effect of wear may be much the same today as tomorrow, but it may be quite different in a month's time (or a week's time, depending on the rate of wear). Thus, wear might be an uncontrolled variable that introduces bias.
(b) Preventing Bias by Randomization
One remedy for systematic error in measurements is to make random choices of the assignment of material to different experiments and of the order in which experiments are done. This ensures that the interfering factors affecting the results are, to a good approximation, independent of the intended changes in experimental conditions. We may say that the interfering factors are averaged out. Then, the biases are minimized and usually become negligible. The random choices might be made by flipping a coin, but more often they are made using tables of random numbers or using random numbers generated by computer software. Random numbers can be obtained on Excel from the function RAND, and that procedure will be discussed briefly in section 11.5(c).
Very often we don't know enough about the factors affecting a measurement to be sure that there is no correlation of results with time. Therefore, if we don't take precautions, some of the intentional changes in operating factors may by chance coincide with (and become confused with) some accidental changes in other operating factors over which we may have no control. Only by randomizing can we ensure that the factors affecting the results are statistically independent of one another. Randomization should always be used if interfering factors may be present.
However, in some cases randomization may not be practical because of difficulties in adjusting conditions over a wide range in a reasonable time. Then some alternative scheme may be required; at the very least the possibility of bias should be recognized clearly and some scheme should be developed to minimize the effects of interfering factors. Wherever possible, randomization must be used to deal with possible interfering factors.
Example 11.1 (continued)
Now let's add randomization to the experimental design begun in Example 11.1. Let each test be done twice in random order. The order of performing the experiments has been randomized using random numbers from computer software with the following results:
Order   Pressure   Temperature   Flow Rate
1       1 atm      30°C          0.05 m³ s⁻¹
2       2 atm      20°C          0.05 m³ s⁻¹
3       1 atm      20°C          0.05 m³ s⁻¹
4       1 atm      20°C          0.1 m³ s⁻¹
5       2 atm      30°C          0.05 m³ s⁻¹
6       1 atm      30°C          0.1 m³ s⁻¹
7       2 atm      20°C          0.1 m³ s⁻¹
8       2 atm      30°C          0.1 m³ s⁻¹
9       1 atm      20°C          0.05 m³ s⁻¹
10      2 atm      20°C          0.05 m³ s⁻¹
11      1 atm      30°C          0.05 m³ s⁻¹
12      1 atm      20°C          0.1 m³ s⁻¹
13      2 atm      30°C          0.05 m³ s⁻¹
14      1 atm      30°C          0.1 m³ s⁻¹
15      2 atm      20°C          0.1 m³ s⁻¹
16      2 atm      30°C          0.1 m³ s⁻¹
Example 11.2
A stirred liquid-phase reactor produces a polymer used (in small concentrations) to
increase rates of filtration. A pilot plant has been built to investigate this process. The
factors being studied are temperature, concentration of reactant A, concentration of
reactant B, and stirring rate. Each factor will be studied at two levels in a factorial
design, and each combination of conditions will be repeated to give two replications.
a) How many tests will be required?
b) List all tests.
c) What order of tests should be used?
Answer:
a) Number of tests = $(2)(2^4) = 32$.
b) The tests are shown in Table 11.1 below:
Table 11.1: List of Tests for Example 11.2
Temperature   Concentration of A   Concentration of B   Stirring Rate   Number of Tests
Low           Low                  Low                  Low             2
High          Low                  Low                  Low             2
Low           High                 Low                  Low             2
Low           Low                  High                 Low             2
Low           Low                  Low                  High            2
High          High                 Low                  Low             2
High          Low                  High                 Low             2
High          Low                  Low                  High            2
Low           High                 High                 Low             2
Low           High                 Low                  High            2
Low           Low                  High                 High            2
High          High                 High                 Low             2
Low           High                 High                 High            2
High          Low                  High                 High            2
High          High                 Low                  High            2
High          High                 High                 High            2
                                                        Total:          32
c) The order in which tests are performed should be determined using random
numbers from a table or computer software.
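The listing and randomizing steps of Example 11.2 can also be sketched in a general-purpose language. The Python fragment below is only an illustration, not part of the original example: the factor names, the seed value, and the use of Python's standard random module in place of Excel or a random number table are all assumptions.

```python
import itertools
import random

# Factor names are illustrative labels for Example 11.2; each factor has two levels.
factors = {
    "temperature": ["Low", "High"],
    "conc_A": ["Low", "High"],
    "conc_B": ["Low", "High"],
    "stirring_rate": ["Low", "High"],
}
replications = 2

# Every combination of levels (2^4 = 16), each repeated `replications` times.
combinations = list(itertools.product(*factors.values()))
tests = [combo for combo in combinations for _ in range(replications)]

# Randomize the run order. The seed is an arbitrary choice made only so
# that the same order can be reproduced later.
rng = random.Random(11)
run_order = tests[:]
rng.shuffle(run_order)

for run_number, combo in enumerate(run_order, start=1):
    print(run_number, dict(zip(factors, combo)))
```

Shuffling the complete list of 32 tests randomizes the replicates along with the conditions, which is what part (c) of the answer asks for.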
Example 11.3
A mechanical engineer has decided to test a novel heat exchanger in an oil refinery.
The major result will be the amount of heating produced in a petroleum stream which
varies in composition. Tests will be done at two compositions and three flow rates.
To get sufficient precision each combination of composition and flow rate will be
tested five times. A factorial design will be used.
a) How many tests will be required?
b) List all tests.
c) How will the order of testing be determined?
Answer:
a) (2)(3)(5) = 30 tests will be required.
b) The tests will be:
Composition   Flow Rate   Number of Tests
Low           Low         5
Low           Middle      5
Low           High        5
High          Low         5
High          Middle      5
High          High        5
              Total       30
c) Order of testing will be determined by random numbers from a table or from
computer software.
Example 11.4
Previous studies in a pilot plant have been used to set the operating conditions
(temperature and pressure) in an industrial reactor. However, some of the conditions
in the full-scale industrial equipment are not quite the same, so the plant engineer has
decided to perform tests in the industrial plant. The plant manager is afraid that
changes in operating conditions may produce off-specification product, so only small
changes in conditions will be allowed at each stage of experimentation. If results
from the first stage appear encouraging, further stages of experimentation can be
done. (This is a form of evolutionary operation, or EVOP.) A simple factorial pattern
will be used: temperature settings will be increased and decreased from normal by
2 °C, and pressure will be increased and decreased from normal by 0.05 atm. The
plant engineer has calculated that to get sufficient precision with these small changes,
eight tests will be required at each set of conditions. The normal temperature is
125 °C, and normal pressure is 1.80 atm.
a) How many tests will be required in the first stage of experimentation?
b) List the tests.
c) How will the order of testing be determined?
Answer:
a) (8)(2)(2) = 32 tests will be required.
b) The tests will be as follows:
Temperature   Pressure   Number of Tests
123 °C        1.75 atm   8
123 °C        1.85 atm   8
127 °C        1.75 atm   8
127 °C        1.85 atm   8
              Total      32
c) Order of testing will be determined by random numbers from a table or from
computer software.
(c) Obtaining Random Numbers Using Excel
Excel can be used to generate random numbers to randomize experimental designs.
The function =RAND() will return a random number greater than or equal to zero
and less than one. We obtain a new number every time the function is entered or the
worksheet is recalculated. If we want a random number between 0 and 10, we
multiply RAND() by 10. But that will often give a number with a fraction.
If we want an integer, we can apply the function =INT(number), which rounds
the number down, not up, to the nearest integer; e.g., INT(7.8) equals 7. We can nest
the functions inside one another, so INT(RAND()*10) will give a random integer
in the inclusive interval from 0 to 9. If we want a random integer in the inclusive
interval from 1 to 10, we can use INT(RAND()*10)+1. Similarly, if we want a
random integer in the inclusive interval from 1 to 8, that will be given by the function
INT(RAND()*8)+1, and so on for other choices. If we want a whole sequence of
random numbers we can use an array function.
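For readers working outside Excel, the same construction can be sketched in Python. This is an illustration under stated assumptions rather than part of the Excel procedure: random.random() plays the role of RAND(), int() plays the role of INT for non-negative numbers, and the helper name random_integer and the seed are hypothetical choices.

```python
import random


def random_integer(low, high, rng=random):
    """Mimic INT(RAND()*n)+low for the inclusive interval from low to high."""
    n = high - low + 1       # number of equally likely integers
    u = rng.random()         # like RAND(): 0 <= u < 1
    return int(u * n) + low  # int() truncates, which rounds down since u*n >= 0


rng = random.Random(2003)    # arbitrary seed so the sketch is repeatable
sample = [random_integer(1, 6, rng) for _ in range(30)]
print(sample)
```

Because u is always less than one, int(u * n) can never reach n, so the upper limit of the interval is never exceeded.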
Example 11.5
To obtain a sequence of thirty random integers in the inclusive interval from 1 to 6,
cells A1 to A30 were selected, and the formula =INT(RAND()*6)+1 was entered
as an array formula. The results were as follows in row form:
2 1 5 6 2 1 2 3 3 2 2 5 6 6 3
1 2 5 1 1 2 2 3 5 5 2 1 2 2 1
(Notice that this could be considered a numerical simulation of a discrete random
variable in which the integers from 1 to 6 inclusive are equally likely.)
Now suppose we assign the numbers 1 to 6 to six different engineering measurements.
We want a random order of these six measurements. The result of Example
11.5 would give us that random order if we use each digit the first time it appears and
discard all repetitions. If by chance the thirty random digits do not contain at least
five of the six digits from 1 to 6, we can repeat the whole operation. But the
probability of that is small.
A complication is that changing other parts of the worksheet causes the random
number generator to re-calculate, giving a new set of random integers. To avoid that,
we can convert the contents of cells A1 to A30 to constant values. (See references or
the Help function on Excel to see how to do that.)
Example 11.5 (continued)
Every time a cell in column A contained a repetition of one of the integers from 1 to
6, an x was placed in the corresponding row of column B until all the integers in the
required interval had appeared or the list of integers was exhausted. Then the
unrepeated integers (in order) were entered into column C. The results (as a row
instead of a column) were as follows:
2 1 5 6 3 [4]
Notice that, as it happened, 4 did not appear among the thirty integers. However,
since 4 is the only missing integer, it must be the last of the six.
This, then, would give a random order of performing the six engineering
measurements.
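The bookkeeping of columns B and C can be checked with a few lines of Python. The sketch below uses the thirty integers reported in Example 11.5; the variable names are illustrative, and the rule of repeating the whole operation when two or more integers are missing is represented simply by setting the result to None.

```python
# The thirty integers reported in Example 11.5.
draws = [2, 1, 5, 6, 2, 1, 2, 3, 3, 2, 2, 5, 6, 6, 3,
         1, 2, 5, 1, 1, 2, 2, 3, 5, 5, 2, 1, 2, 2, 1]
labels = range(1, 7)

# Keep each integer the first time it appears and discard repetitions.
order = []
for value in draws:
    if value not in order:
        order.append(value)

# If exactly one label never appeared, it must come last.
missing = [label for label in labels if label not in order]
if len(missing) <= 1:
    order.extend(missing)
else:
    order = None  # too many labels missing: draw a new set of random integers

print(order)  # [2, 1, 5, 6, 3, 4]
```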
(d) Preventing Bias by Blocking
Blocking means dividing the complete experiment into groups or blocks in which the
various interfering factors (especially uncontrolled factors) can be assumed to be
more homogeneous. Comparisons are made using the various factors involved in
each block. Blocking is used to increase the precision of an experiment by reducing
the effect of interfering factors. Results from block experiments are applicable to a
wider range of conditions than if experiments were limited to a single uniform set of
conditions. For example, technicians may perform tests in somewhat different ways,
so we might want to remove the differences due to different technicians in exploring
the effect of using raw material from different sources.
A paired t-test is an example of an experimental design using blocking. In section
9.2.4 we examined the comparison of samples using paired data. Two different
treatments (evaporator pans) were investigated over several days. Then the day was
the blocking factor. The randomization was of the relative positions of the pans. The
measured evaporation was the response. The difference between daily amounts of
evaporation from the paired pans was taken as the variate in order to eliminate the
effects of day-to-day variation in atmospheric conditions. Example 9.10 illustrated
this procedure.
The paired t-test involved two factors or treatments, but blocking can be extended
to include more than two treatments. Randomization is used to protect us from
unknown sources of bias by performing treatments in random order within each
block. (Notice that randomization is still required within the block.)
A block design does not give as much information as a complete factorial design,
but it generally requires fewer tests, and the extra information from the factorial
design may not be desired. Fewer tests are required because we do not usually repeat
tests within blocks for the same experimental conditions. Error is estimated from
variations which are left after variance between blocks and variance between
experimental treatments have been removed. This assumes that the average effect of
different treatments is the same in all the blocks. In other words, we assume there is
no interaction between the effects of treatments and blocks. Caution: if there is any
reason to think that treatments and blocks may not be independent and so may
interact, we should include adequate replication so that that interaction can be
checked. If interaction between treatment and block is present but not determined,
the randomized blocking design will result in an inflation of the error estimate, so the
test for significant effects becomes less sensitive.
Blocking should be used when we wish to eliminate the distortion caused by an
interfering variable but are not very interested in determining the effects of that
variable. If two factors are of comparable interest, a design blocking out one of the
two factors should not be used. In that case, we should go to a complete factorial
design.
Why is blocking required if randomization is used? If some factor is having an
effect even though we may not know it, randomizing will tend to prevent us from
coming to incorrect conclusions. However, the interfering effect will still increase the
standard deviation or variance due to error. That is, the variance due to experimental
error will be inflated by the variance due to the interfering factor; in other words, the
randomized interference will add to the random noise of the measurements. If the
error variance is larger, we are less likely to conclude that the effect of an experimental
variable is statistically significant. In that case, we are less likely to be able to
come to a definite conclusion. (This is essentially the same as the effect of a larger
standard deviation in the t-test: if the standard deviation is larger, t will be smaller, so
the difference is not so likely to be significant. See the comparison of Examples 9.9
and 9.10 in sections 9.2.3 and 9.2.4.) If we use blocking, we can both avoid incorrect
conclusions and increase the probability of coming to results that are statistically
significant. Therefore, the priority is to block all the interfering factors we can (so
long as interaction is not appreciable), then to randomize in order to minimize the
effects of factors we can't block.
Example 11.6
Example 9.10 was for a paired t-test. The evaporation from two types of evaporation
pans placed side-by-side was compared over ten days. Any difference due to relative
position was averaged out by randomizing the placement of the pans.
Now we wish to compare three evaporator pans: A, B, and C. The pans are placed
side-by-side again, and their relative positions are decided randomly using random
numbers. We know that evaporation from the pans will vary from one day to another
with changing weather conditions, but that variation is not of prime interest in the
current test. The day becomes the blocking variable. The resulting order is shown in
terms of A, B, and C for the relative positions of the evaporator pans, and 1, 2, 3, 4,
5, 6, 7, 8, 9, 10 for the day. Then the order in which tests are done might be 1: BCA,
2: BAC, 3: CBA, 4: ACB, 5: CAB, 6: ABC, 7: CBA, 8: ACB, 9: ABC, 10: CAB.
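Randomizing within each block, as in Example 11.6, amounts to drawing a fresh random arrangement of the treatments for every block. The Python sketch below illustrates the idea; the seed and variable names are arbitrary assumptions, and the arrangement it prints will generally differ from the one quoted in the example, since any such arrangement is equally valid.

```python
import random

pans = ["A", "B", "C"]      # treatments
days = range(1, 11)         # blocks: one per day
rng = random.Random(116)    # arbitrary seed so the layout can be reproduced

layout = {}
for day in days:
    positions = pans[:]     # every block contains every treatment once
    rng.shuffle(positions)  # independent random order within the block
    layout[day] = "".join(positions)

print(", ".join(f"{day}: {order}" for day, order in layout.items()))
```

The shuffle is repeated inside the loop so that each day receives its own independent arrangement; shuffling once and reusing the result would tie relative position to pan type on every day.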
Example 11.7
Let us modify Example 11.1 by adding blocking for effects associated with time. The
principal factors are still temperature, pressure, and flow, but we suspect that some
interfering factors may vary from one day to another but are very unlikely to interact
appreciably with the principal factors. After considering the possible interfering
factors and their time scales, we decide that variations within an eight-hour shift are
likely to be negligible, but variations between Tuesday and Friday may well be
appreciable. We can do eight trials on day shift each day. Then the eight trials on
Tuesday will be one block, and the eight trials on Friday will be another block. The
order of performing tests on each day will be randomized, again using random
numbers from computer software.
Answer:
The orders of trials for Tuesday and for Friday are shown below.
Order for Tuesday   Conditions
1                   1 atm   30 °C   0.1 m³ s⁻¹
2                   2 atm   30 °C   0.1 m³ s⁻¹
3                   1 atm   30 °C   0.05 m³ s⁻¹
4                   1 atm   20 °C   0.05 m³ s⁻¹
5                   2 atm   20 °C   0.05 m³ s⁻¹
6                   2 atm   30 °C   0.05 m³ s⁻¹
7                   2 atm   20 °C   0.1 m³ s⁻¹
8                   1 atm   20 °C   0.1 m³ s⁻¹
Order for Friday    Conditions
1                   1 atm   30 °C   0.1 m³ s⁻¹
2                   1 atm   30 °C   0.05 m³ s⁻¹
3                   1 atm   20 °C   0.1 m³ s⁻¹
4                   1 atm   20 °C   0.05 m³ s⁻¹
5                   2 atm   30 °C   0.1 m³ s⁻¹
6                   2 atm   20 °C   0.05 m³ s⁻¹
7                   2 atm   30 °C   0.05 m³ s⁻¹
8                   2 atm   20 °C   0.1 m³ s⁻¹
We should note two points. First, there is no replication, so error is estimated
from residuals left after the effects of temperature, pressure, and flow rate (and their
interactions), and differences between blocks, have been accounted for. (If there is
any interaction between the main effects (temperature, pressure, flow) and the
blocking variable (time of run), that will inflate the error estimate.) Second, if a
complete four-factor experimental design had been used with two replications, the
required number of tests would have been $(2)(2^4) = 32$, whereas the block design
requires 16 tests.
Variations of blocking are used for specific situations. If the number of factors
being examined is too large, not all of them can be included in each block. Then a
plan called a balanced incomplete block design would be used. If more than one
interfering factor needs to be blocked, then plans called Latin squares and
Graeco-Latin squares would be considered.
The design of experiments that include blocking is discussed in more detail in
books by Box, Hunter, and Hunter and by Montgomery (for references see section
15.2). There are considerations and pitfalls not discussed here.
Example 11.8
A civil engineer is planning an experiment to compare the levels of biological
oxygen demand (B.O.D.) at three different points in a river. These are just upstream
of a sewage plant, five kilometers downstream, and ten kilometers downstream.
Assume that in each case samples will be taken in the middle of the stream (in
practice samples would likely be taken at several positions across the stream and
averaged). One set of samples will be taken at 6 a.m., another will be taken at 2 p.m.,
and a third will be taken at 10 p.m. The design will block the effect of time of day, as
some interfering factors may be different at different times. However, the interaction
of these interfering factors with location is expected to be negligible.
a) If there is no replication aside from blocking, how many tests are required?
b) List all tests.
c) Specify which set of tests constitutes each block.
d) How should the order of tests be determined?
Answer:
a) Number of tests = (3)(3) = 9.
b) The tests are:
6 a.m.: just upstream, 5 kilometers downstream, and 10 kilometers downstream.
2 p.m.: just upstream, 5 kilometers downstream, and 10 kilometers downstream.
10 p.m.: just upstream, 5 kilometers downstream, and 10 kilometers downstream.
c) The 6 a.m. tests make up one block, the 2 p.m. tests make up another block, and
the 10 p.m. tests make up a third block.
d) The order in which tests are performed should be determined using random
numbers from a table or computer software.
11.6 Fractional Factorial Designs
As the number of different factors increases, the number of experiments required for
a full factorial design increases exponentially. Even if we test each factor at only two
levels, with only one measurement for each combination of conditions, a complete
factorial design for n factors requires $2^n$ separate measurements. If there are five
factors, that comes to thirty-two separate measurements. If there are six factors,
sixty-four separate measurements are required.
In many cases nearly as much useful information can be obtained by doing only
half or perhaps a quarter of the full factorial design. Certainly, somewhat less
information is obtained, but by careful design of the experiment the omitted information
is not likely to be important. This is referred to as a fractional factorial design or a
fractional replication.
Fractional design will be illustrated for the simple case of three factors, each at
only two levels, with no replication of measurements. In Example 11.1 we saw the
complete factorial design for this set of conditions. In Figure 11.3 asterisks (*)
marked each of the $2^3 = 8$ combinations of conditions for the full factorial design. In
Figure 11.5, on the other hand, only half of the $2^3$ combinations of conditions are
marked by asterisks, and only these measurements would be made for the fractional
factorial design.
Figure 11.5: Fractional Factorial Design
[Figure: axes are Temperature (20 °C, 30 °C), Pressure (1 atm, 2 atm), and Flow
(0.05 cu. m/s, 0.1 cu. m/s); asterisks mark the four combinations of conditions listed below.]
Thus, measurements would be made at the following conditions:
1. 1 atm   20 °C   0.1 m³ s⁻¹
2. 2 atm   20 °C   0.05 m³ s⁻¹
3. 1 atm   30 °C   0.05 m³ s⁻¹
4. 2 atm   30 °C   0.1 m³ s⁻¹
Notice that in the first three sets of conditions two factors are at the lower value
and one is at the higher value, but in the last set all three factors are at the higher
value. Then half of the sets show each factor at its lower value, and half show each
factor at its higher value. The order of performing the experiments would be randomized.
This half-fraction, three-factor design involving $\frac{1}{2}(2^3) = 4$ combinations of
conditions is not really practical because it does not allow us to separate a
main-factor effect from the second-order interaction of the other two effects. For example,
the effect of pressure is confused (or confounded, as the statisticians say) with the
interaction between temperature and flow rate, and these quantities cannot be
separated. Since second-order interactions are found frequently, this is not a satisfactory
situation.
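The confounding can be seen directly by coding each factor as -1 at its lower value and +1 at its higher value. The following Python sketch is an illustration only: the coding convention and variable names are assumptions, not notation used in this chapter. It selects the four runs listed above and compares the pressure column with the product of the temperature and flow columns.

```python
import itertools

# Coded levels: -1 = lower value, +1 = higher value.
# Each run is (pressure, temperature, flow).
full_factorial = list(itertools.product([-1, 1], repeat=3))

# The four runs listed above are exactly those whose product of coded levels is +1.
half_fraction = [run for run in full_factorial if run[0] * run[1] * run[2] == 1]

for pressure, temperature, flow in half_fraction:
    print(pressure, temperature, flow, "temperature*flow =", temperature * flow)

# Within the half fraction the pressure column is identical to the
# temperature-by-flow interaction column, so the two cannot be separated.
pressure_column = [p for p, t, f in half_fraction]
interaction_column = [t * f for p, t, f in half_fraction]
print(pressure_column == interaction_column)  # True
```

The same comparison holds for temperature against pressure-by-flow and for flow against pressure-by-temperature, which is why every main effect in this small design is tied to a two-factor interaction.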
The half-fraction design is more useful when the number of factors is larger.
Consider the case where five factors are being investigated, so the full factorial
design at two levels with no replication would require $2^5 = 32$ runs. A half-fraction
factorial design would require $\frac{1}{2}(2^5) = 16$ runs. It will give essentially the same
information as the full factorial design if either of two conditions is met: either (1) at
least one factor has negligible effect on any of the results, so that its main effect and
all its interactions with other variables are negligible; or else (2) all the three-factor
and four-factor interactions are negligible, so that only the main effects and
second-order interactions are appreciable. In exploratory studies we wish to see which factors
are important, so we will often find that one or more factors have no appreciable
effect. In that situation the first condition will be satisfied. Furthermore, the effects of
three-factor and higher order interactions are usually negligible, so that the second
condition would frequently be satisfied. There are some cases in which third-order
interactions produce appreciable effects, but these cases are not common. Then if
analysis of the half-fraction factorial design indicates that we cannot neglect any of the
factors, further investigation should be done.
Remember that the half-fraction design requires measurements at half of the
combinations of conditions needed for a full factorial design. Then if the results of
the half-fraction experimental designs are ambiguous, often the other half of a
complete factorial design can be run later. The two halves are together equivalent to a
full factorial, run in two blocks at different times. Then analyzing the two blocks
together gives nearly as much information as a complete factorial design, provided
that interfering factors do not change too much in the intervening time.
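A short Python sketch can illustrate how two half fractions fit together for five factors. The way of splitting the runs used here, by the sign of the product of all five coded levels, is an assumption made for illustration (it is a common textbook choice, but the text above does not specify how the half is chosen).

```python
import itertools
from math import prod

# All 2^5 = 32 runs, with each factor coded -1 (low) or +1 (high).
full = set(itertools.product([-1, 1], repeat=5))

# Assumed split: the sign of the five-factor product decides the half.
first_half = {run for run in full if prod(run) == 1}
second_half = {run for run in full if prod(run) == -1}

print(len(first_half), len(second_half))   # 16 16
print(first_half | second_half == full)    # True: the halves make the full factorial
print(first_half & second_half == set())   # True: no run appears in both halves
```

Under this assumed split the two sets of 16 runs are the two blocks described above, so running the second set later completes the full 32-run design.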
Problems
1. A mechanical engineer has designed a new electronic fuel injector. He is
developing a plan for testing it. He will use a factorial design to investigate the effects
of high, medium and low fuel flow, and high, medium and low fuel temperature.
Four tests will be done at each combination of conditions.
a) How many tests will be required?
b) List them.
2. A civil engineer is performing tests on a screening device to remove coarser
solids from storm overflow of untreated sewage. A stream is directed at a rotating
collar screen, 7.5 feet in diameter, made of stainless steel mesh. The engineer
intends to try three mesh sizes (150 mesh, 200 mesh and 230 mesh), two
rotational speeds (30 and 60 R.P.M.), three flow rates (550 gpm, 900 gpm and 1450
gpm), and three time intervals between backwashes (20, 40 and 60 seconds).
a) How many tests will be required for a complete factorial design without
replication?
b) List them.
c) How will the order of tests be determined?
3. A pilot plant investigation is concerned with three variables. These are
temperature (160 °C and 170 °C), concentration of reactant (1.0 mol/L and 1.5 mol/L),
and catalyst (Catalyst A and Catalyst B). The response variable is the percentage
yield of the desired product. A factorial design will be used.
a) If each combination of variables is tested twice (two replications), how many
tests will be required?
b) List them.
c) How will the order of tests be determined?
4. A metal alloy was modified by adding small amounts of nickel and/or manganese.
The breaking strength of each resulting alloy was measured. Tests were
performed in the following order:
 1. 1.5% Ni, 0% Mn
 2. 3% Ni, 2% Mn
 3. 1.5% Ni, 1% Mn
 4. 1.5% Ni, 0% Mn
 5. 0% Ni, 1% Mn
 6. 3% Ni, 1% Mn
 7. 0% Ni, 2% Mn
 8. 1.5% Ni, 1% Mn
 9. 0% Ni, 0% Mn
10. 3% Ni, 0% Mn
11. 0% Ni, 1% Mn
12. 1.5% Ni, 2% Mn
13. 3% Ni, 1% Mn
14. 1.5% Ni, 2% Mn
15. 0% Ni, 0% Mn
16. 3% Ni, 2% Mn
17. 3% Ni, 0% Mn
18. 0% Ni, 2% Mn
a) How many factors are there? How many levels have been used for each
factor? How many replications have been used (remember that this can be a
fraction)?
b) Then summarize the experimental design: factorial design or alternatives,
characteristics.
c) Verify that these characteristics would result in the number of test runs
shown.
5. Four different methods of determining the concentration of a pollutant in parts
per million are being compared. We suspect that two technicians obtain somewhat
different results, so a randomized block design will be used. Each
technician will run all four methods on different samples. Unknown to the
technicians, all samples will be taken from the same well stirred container. All
determinations will be run in the same morning.
a) How many determinations are required?
b) List them.
c) How will the order of determinations be decided?
6. Tests are carried out to determine the effects of various factors on the percentage
of a particular reactant which is reacted in a pilot-scale chemical reactor. The
effects of feed rate, agitation rate, temperature, and concentrations of two
reactants are determined. Test runs were performed in the following order:
      Feed Rate   Agitation Rate   Temperature   Conc. of A   Conc. of B
      L/m         RPM              °C            mol/L        mol/L
 1.   15          120              150           0.5          1.0
 2.   10          100              150           1.0          0.5
 3.   15          100              150           1.0          1.0
 4.   10          120              150           0.5          0.5
 5.   15          120              160           0.5          0.5
 6.   10          120              150           1.0          1.0
 7.   15          100              160           1.0          0.5
 8.   15          120              160           1.0          1.0
 9.   10          100              150           0.5          1.0
10.   15          100              160           0.5          1.0
11.   15          100              150           0.5          0.5
12.   10          100              160           0.5          0.5
13.   10          100              160           1.0          1.0
14.   15          120              150           1.0          0.5
15.   10          120              160           0.5          1.0
16.   10          120              160           1.0          0.5
a) How many factors are there? How many levels have been used for each
factor? How many replications have been used (remember that this can be a
fraction)?
b) Then summarize the experimental design: factorial design or alternatives,
characteristics.
c) Verify that these characteristics would result in the number of test runs
shown.
Computer Problems
C7. A program of testing the effects of temperature and pressure on a piece of
equipment involves a total of eight runs, two at each of four combinations of
temperature and pressure. Let us call these four combinations of conditions numbers 1,
2, 3, and 4. Use random numbers from Excel to find two random orders of four tests
each in which the tests of conditions 1, 2, 3, and 4 might be conducted.
C8. An engineer is planning tests on a heat exchanger. Six different combinations of
flow rates and fluid compositions will be used, and the engineer labels them as 1, 2,
3, 4, 5, 6. She will test each combination of conditions twice. Use random numbers
from Excel to find a random order of performing the twelve tests.
C9. Simulate two samples of size ten from a binomial distribution with n = 5 and
p = 0.12. Use the Analysis Tools command on Excel. Produce an output table with
ten columns and two rows. Use the Frequency function to prepare a frequency table,
which must be labeled clearly.
C10. Use the Analysis Tools command on Excel to simulate the results of a sampling
scheme. The probability of a defect on any one item is 0.07, and each sample
contains 12 items. Simulate the results of three samples. Use the Frequency function to
prepare a frequency table, and label it clearly.
CHAPTER 12
Introduction to Analysis of Variance
This chapter requires an understanding of the material in
sections 3.1, 3.2, 3.4, and 10.1.2.
In Chapter 11 we have looked briefly at some of the principal ideas and techniques of
designing experiments to solve industrial problems. Once the data have been
obtained, how can we analyze them?
The analysis of data from designed experiments is based on the methods
developed previously in this book. The data are summarized as means and variances, and
graphical presentations are used, especially to check the assumptions of the methods.
Confidence intervals and tests of hypothesis are used to infer results. But some
techniques beyond those described previously are usually needed to complete the
analysis.
The two main techniques used in analysis of data from factorial experiments,
with and without blocking, are the analysis of variance and multiple linear regression.
Analysis of variance will be introduced here. Multiple linear regression will be
introduced in section 14.6.
Analysis of variance, or ANOVA, is used with both quantitative data and
qualitative data, such as data categorizing products as good or defective, light or heavy, and
so on. With both quantitative and qualitative data, the function of analysis of variance
is to find whether each input has a significant effect on the system's response. Thus,
analysis of variance is often used at an early stage in the analysis of quantitative data.
Multiple linear regression is often used to obtain a quantitative relation between the
inputs and the responses. But analysis of variance often has another function, which
is to test the results of multiple linear regression for significance.
We saw in Chapter 8 that one of the most desirable properties of variance is that
independent estimates of variance can be added together. This idea can be extended
to separating the quantities leading to variance into various logical components. One
component can be ascribed to differences resulting from various main effects such as
varying pressure or temperature. Another component may be due to interactions
between main effects. A third component may come from blocking, which has been
discussed in section 11.5(d). A final component may correspond to experimental
error.
We have seen in section 9.2 that an estimate of variance is found by dividing the
sum of squares of deviations from a mean by the number of degrees of freedom. We
can partition both the total sum of squares and the total number of degrees of
freedom into components corresponding to main effects, interactions, perhaps blocking,
and experimental error. Then, for each of these components the sum of squares is
divided by the corresponding number of degrees of freedom to give an estimate of
the variance. Estimates of variance are often called mean squares.
Then the F-test, which was discussed in section 10.1.2, is used to examine the
various ratios of variances to see which ratios are statistically significant. Is a ratio of
variances consistent with the hypothesis that the two population variances are equal,
so that differences between them are due only to random chance variations? More
specifically, the null hypothesis to be tested is usually that various factors make no
difference to population means from different treatments or different levels of
treatment. The F-test is used to see whether the null hypothesis can be accepted at a stated
level of significance.
We will find that calculations for the analysis of variance using a calculator involve
considerable labor, especially if the number of components investigated is fairly large.
Almost always in practice, therefore, computer software is used to do the calculations
more easily. The problems in this chapter can be solved using a calculator. If the reader
chooses, he or she can use a computer spreadsheet with formulas involving basic
operations. In some problems that will save considerable time. However, more
complex, pre-programmed computer packages such as SAS or SPSS should not be used
until the reader has the basic ideas firmly in mind. This is because these use
black-box functions which require little thinking from someone who is learning.
12.1 One-way Analysis of Variance
Let us consider the simplest case, analyzing a randomized experiment in which only
one factor is being investigated. Two or more replicates are used for each separate
treatment or level of treatment, and there will be three or more treatments or levels.
The null hypothesis will be that all treatments produce equal results, so that all
population means for the various treatments are equal. The alternative hypothesis will
be that at least two of the treatment means are not equal.
(a) Basic Relations
Say there are m different treatments or levels of treatment, and for each of these there
are r different observations, so r replicates of each treatment. Let $y_{ik}$ be the kth
observation from the ith treatment. Let the mean observation for treatment i be $\bar{y}_i$,
and let the mean of all N observations be $\bar{y}$, where N = (m)(r). Then
$$\bar{y}_i = \frac{\sum_{k=1}^{r} y_{ik}}{r} \tag{12.1}$$
and
$$\bar{y} = \frac{\sum_{i=1}^{m} \bar{y}_i}{m} = \frac{\sum_{i=1}^{m} \sum_{k=1}^{r} y_{ik}}{N} \tag{12.2}$$
m N
Thetotalsumofsquaresofthedeviationsfromthemeanofalltheobservations,
abbreviatedasSST,is
m r
2
SST

(
y
ik
y
)
(12.3)
i1 k1
Thetreatmentsumofsquaresofthedeviationsofthetreatmentmeansfromthe
meanofalltheobservations,abbreviatedasSSA,is
m
2
SSA r

(
y y
)
i (12.4)
i1
Thewithin-treatmentorresidualsumofsquaresofthedeviationsfromthemeans
withintreatmentsis
m r
2
SSR

(
y
ik
y
i
)
(12.5)
i1 k1
Thisresidualsumofsquarescangiveanestimateoftheerror.
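It can be shown algebraically that
$$\sum_{i=1}^{m} \sum_{k=1}^{r} \left( y_{ik} - \bar{y} \right)^2 = r \sum_{i=1}^{m} \left( \bar{y}_i - \bar{y} \right)^2 + \sum_{i=1}^{m} \sum_{k=1}^{r} \left( y_{ik} - \bar{y}_i \right)^2 \tag{12.6}$$
or
$$SST = SSA + SSR \tag{12.7}$$
Thus, the total sum of squares is partitioned into two parts.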
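The algebra behind equation 12.6 is short, and a sketch of it may help. Nothing beyond equations 12.1 to 12.5 is assumed. Each deviation from the overall mean is split into a between-treatment part and a within-treatment part, and the cross term vanishes because deviations from a treatment mean sum to zero within that treatment.

```latex
% Split each deviation into a between-treatment and a within-treatment part:
y_{ik} - \bar{y} = (\bar{y}_i - \bar{y}) + (y_{ik} - \bar{y}_i)

% Square and sum over k = 1..r and i = 1..m:
\sum_{i=1}^{m}\sum_{k=1}^{r} (y_{ik} - \bar{y})^2
  = r \sum_{i=1}^{m} (\bar{y}_i - \bar{y})^2
  + \sum_{i=1}^{m}\sum_{k=1}^{r} (y_{ik} - \bar{y}_i)^2
  + 2 \sum_{i=1}^{m} (\bar{y}_i - \bar{y}) \sum_{k=1}^{r} (y_{ik} - \bar{y}_i)

% By equation 12.1 the inner sum of the cross term is zero for every i:
\sum_{k=1}^{r} (y_{ik} - \bar{y}_i) = r\,\bar{y}_i - r\,\bar{y}_i = 0
```

With the cross term removed, what remains is SST = SSA + SSR.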
The degrees of freedom are partitioned similarly. The total number of degrees of
freedom is $(N - 1)$. The number of degrees of freedom between treatment means is
the number of treatments minus one, or $(m - 1)$. By subtraction, the residual number
of degrees of freedom is $(N - 1) - (m - 1) = (N - m)$. This must be the number of
degrees of freedom within treatments.
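The partition of the sum of squares and of the degrees of freedom can be checked numerically. The Python sketch below uses made-up observations (three treatments with four replicates each); the data and variable names are hypothetical and serve only to show that the quantities defined in equations 12.3 to 12.5 add up as equation 12.7 states.

```python
# Hypothetical data: m = 3 treatments, r = 4 replicates each (not from the text).
data = {
    "treatment 1": [20.1, 19.8, 20.5, 20.0],
    "treatment 2": [21.2, 21.0, 20.7, 21.5],
    "treatment 3": [19.5, 19.9, 19.4, 19.6],
}
m = len(data)
r = len(next(iter(data.values())))
N = m * r

# Treatment means (equation 12.1) and overall mean (equation 12.2).
treatment_means = {name: sum(obs) / r for name, obs in data.items()}
grand_mean = sum(sum(obs) for obs in data.values()) / N

# Sums of squares (equations 12.3, 12.4, 12.5).
sst = sum((y - grand_mean) ** 2 for obs in data.values() for y in obs)
ssa = r * sum((mean - grand_mean) ** 2 for mean in treatment_means.values())
ssr = sum((y - treatment_means[name]) ** 2
          for name, obs in data.items() for y in obs)

# Degrees of freedom: total, between treatments, within treatments.
df_total, df_treatments, df_residual = N - 1, m - 1, N - m

print(f"SST = {sst:.4f}, SSA + SSR = {ssa + ssr:.4f}")
print(df_total, df_treatments + df_residual)
```

The two printed sums of squares agree to within rounding error, and the degrees of freedom add in the same way.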
Then the estimate of the variance within treatments is
$$s_R^2 = \frac{SSR}{N - m} \tag{12.8}$$
This is often called the within treatments mean square. It is an estimate of error,
giving an indication of the precision of the measurements.
The estimate of the variance obtained from differences of the treatment means (so
between treatments) is
$$s_A^2 = \frac{SSA}{m - 1} \tag{12.9}$$
This is often called the between treatments mean square.
Now the question becomes: are these two estimates of variance (or mean square
deviations) compatible with one another? The specific null hypothesis is that the
population means for different treatments or levels are equal. If that null hypothesis
is true, the variability of the sample means will reflect the intrinsic variability of the
individual measurements. (Of course the variance of the sample means is different
from the variance of individual measurements, as we have seen in connection with
the standard error of the mean, but that has already been taken into account.) If the
population means are not equal, the true population variance between treatments will
be larger than the true population variance within treatments. Is $s_A^2$, the estimate of
the variance between treatments, significantly larger than $s_R^2$, the estimate of the
variance within treatments? (Note that this is a one-tailed test.) But before the question can be
addressed properly, we need to check that the necessary assumptions have been met.
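The comparison of the two mean squares is made through their ratio, which is then judged against the F-distribution with (m - 1) and (N - m) degrees of freedom. The Python sketch below computes that ratio for a small made-up data set; the function name one_way_f_ratio and the numbers are hypothetical, and looking up the critical value of F is left to tables or software.

```python
def one_way_f_ratio(groups):
    """Return (s_A^2, s_R^2, F) for treatment groups of equal size r."""
    m = len(groups)
    r = len(groups[0])
    if any(len(group) != r for group in groups):
        raise ValueError("this sketch assumes the same number of replicates per treatment")
    N = m * r

    means = [sum(group) / r for group in groups]
    grand_mean = sum(means) / m

    ssa = r * sum((mean - grand_mean) ** 2 for mean in means)
    ssr = sum((y - mean) ** 2 for group, mean in zip(groups, means) for y in group)

    s2_a = ssa / (m - 1)   # between treatments mean square, equation 12.9
    s2_r = ssr / (N - m)   # within treatments mean square, equation 12.8
    return s2_a, s2_r, s2_a / s2_r


# Hypothetical observations: three treatments, two replicates each.
print(one_way_f_ratio([[10, 12], [14, 16], [18, 20]]))  # (32.0, 2.0, 16.0)
```

For these numbers the treatment means are 11, 15 and 19, so SSA = 2(16 + 0 + 16) = 64 with 2 degrees of freedom and SSR = 6 with 3 degrees of freedom, giving a ratio of 32/2 = 16.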
(b) Assumptions
The first assumption is the mathematical model of the relationship we are
investigating. Usually for a start the analysis of variance is based on the simplest mathematical
model for each situation. In this section we are considering a single factor at various
levels and with some replication. The simplest model for this case is
$$y_{ik} = \mu + \alpha_i + \epsilon_{ik} \tag{12.10}$$
where $y_{ik}$ is the kth observation from the ith treatment (as before),
$\mu$ is the true overall mean (for the numbers of treatments and replicates used in
this experiment),
$\alpha_i$ is the incremental effect of treatment i, such that $\alpha_i = \mu_i - \mu$,
$\mu_i$ is the true population mean for treatment i, and
$\epsilon_{ik}$ is the error for the kth observation from the ith treatment.
This mathematical model is the simplest for this situation, but if we find that it is not
consistent with the data, we will have to modify it. For example, if we turn up
evidence of some interfering factor or lurking variable, a more elaborate model will
be required; or, if the data do not fit the linear relation of equation 12.10, changes
may be required to get a better fit.
Other important assumptions are that the observations for each treatment are at
least approximately normally distributed, and that observations for all the treatments
have the same population variance, $\sigma^2$, but the treatment means do not have to be the
same. More specifically, the errors $\epsilon_{ik}$ must (to a reasonable approximation) be
independently and identically distributed according to a normal distribution with
mean zero and unknown but fixed variance $\sigma^2$. However, according to Box et al. (see
section 15.2 for references) the analysis of variance as discussed in this chapter is not
sensitive to moderate departures from a normal distribution or from equal population
variances. In this sense the ANOVA method is said, like the t-test, to be robust.
If there are any biasing interfering factors, randomizing the order of taking and
testing the sample will usually make the normal distribution approximately
appropriate. However, there will still be an inflation of the error variance if biasing factors are
present. Notice that if randomization has not been done properly in the situation
where biases are present, the assumption of a normal distribution will not be
appropriate. Any outliers, points with very large errors, may cause serious problems.
(c) Diagnostic Plots
The assumptions should be checked by various diagnostic plots of the residuals,
which are the differences between the observations, $y_{ik}$, and $\hat{y}_{ik}$, the best estimates of
the true values according to the mathematical model. Thus, the residuals are $(y_{ik} - \hat{y}_{ik})$.
In the case of one-way analysis of variance, where only one factor is an input, the
best estimates would be $\bar{y}_i$. The plots are meant to diagnose any major discrepancies
between the assumptions and reality in the situation being studied. If there are any
unexplained systematic variations of the residuals, the assumptions must be
questioned skeptically.
The following plots should be examined carefully:
(1) a stem-and-leaf display (or equivalent, such as a dot diagram or normal probability plot) of all the residuals. Is this consistent with a normal distribution of mean zero and constant variance? Are there any outliers? If we have sufficient data, a similar plot should be shown for each treatment or level.
(2) a plot of residuals against estimated values ($\hat{y}_{ik}$, which is equal to $\bar{y}_i$ in this case). Is there any indication that variance becomes larger or smaller as $\hat{y}_{ik}$ increases?
(3) a plot of residuals against time sequence of measurement (and also time sequence of sampling if that is different). Is there any indication that errors are changing with time?
(4) a plot of residuals against any variable, such as laboratory temperature, which might conceivably affect the results (if such a plot seems useful). Are there any trends?
These plots are similar to those recommended in the books by Box, Hunter, and Hunter and by Montgomery (see references in section 15.2).
Each plot should be considered carefully. If plot (1) is not reasonably symmetrical and consistent with a normal distribution, some change of variable should be considered. If plot (1) shows one or more outliers, the corresponding numbers should be checked to see if some obvious mistake (such as an error of recording an observation) is present. However, in the absence of any obvious error the outlier should not be discarded, although the assumption of an underlying normal distribution should be questioned. Careful examination of remaining outliers will often give useful information, clues to desirable changes to the assumed relationship.
If plot (2) indicates that variance is not constant with varying magnitude of estimated values, then the assumption of constant variance is apparently not satisfied. Then the mathematical model needs to be adjusted. For example, if the residuals tend to increase as $\hat{y}_{ik}$ increases, the percentage error may be approximately constant. This would imply that the mathematical model might be improved by replacing $y_{ik}$ by $\log(y_{ik})$ in equation 12.10.
If plot (3) shows a systematic trend, there is some interfering factor which is a function of time. It might be a temperature variation, or possibly improvement in experimental technique as the experimenter learns to make measurements more exactly. If the order of testing has been properly randomized, the assumption of a normal distribution of errors will be approximately satisfied, but the estimated error will be inflated by any systematic interfering factor.
Any trends in plot (4) will require modification of the whole analysis.
Let us begin an example by calculating means for the various treatments and examining the diagnostic plots.
Example 12.1
Four specimens of soil were taken from each of three different locations in the same locality, and their shear strengths were measured. Data are shown below. Does the location affect the shear strength significantly? Use the 5% level of significance.
Sequence     Location   Shear strength,
of testing   number     N/m^2
1            2          2970
2            2          2320
3            2          2910
4            3          3650
5            1          4010
6            1          3550
7            1          4350
8            3          3470
9            2          3560
10           3          3650
11           1          4090
12           3          3160
Answer:
The simplest mathematical model for this case is given by equation 12.10:
$$y_{ik} = \mu + \alpha_i + \varepsilon_{ik}$$
The best estimates of the shear strength, $\hat{y}_{ik}$, are given by the means for the various locations.
First, we need to arrange the data by location (which is the treatment in this case) and calculate the treatment means and the overall mean. Then the residuals are calculated in lines 15 to 28 of the spreadsheet of Table 12.1 with results in lines 31 to 43. Calculations of sums of squares are shown in lines 44 to 50, and estimates of variances or mean squares are shown in lines 51 and 52. Now the degrees of freedom should be partitioned. The total number of degrees of freedom is $(N - 1) = (3)(4) - 1 = 11$. Between treatment means we have $(m - 1) = 3 - 1 = 2$ degrees of freedom. The number of degrees of freedom within treatments by difference is then $(N - m) = 12 - 3 = 9$. An observed variance ratio is shown in line 53. Calculations can be done using either a pocket calculator or a spreadsheet: Table 12.1 shows a spreadsheet.
Table 12.1: Spreadsheet for Example 12.1, One-way Analysis of Variance
A B C D E F
15 SortedData: yik
16 Location, i Shear Strength Sequence Observ. no., k Location Means, yi(bar)
17 1 4010 5 1
18 1 3550 6 2
19 1 4350 7 3 SUM(B17:B20)/4=
20 1 4090 11 4 4000
21 2 2970 1 1
22 2 2320 2 2
23 2 2910 3 3 SUM(B21:B24)/4=
24 2 3560 9 4 2940
25 3 3650 4 1
26 3 3470 8 2
27 3 3650 10 3 SUM(B25:B28)/4=
28 3 3160 12 4 3482.5
29 Overall Mean, y(barbar) = (E20+E24+E28)/3 = 3474.16667
30 Residuals: yik-yi(bar)
31 Location,i Residual
32 1 B17:B20-E20 10 (Arrayformulas:
33 -450 seeAppendixB)
34 350
35 90
36 2 B21:B24-E24 30
37 -620
38 -30
39 620
40 3 B25:B28-E28 167.5
41 -12.5
42 167.5
43 -322.5
44SSA,eqn12.4:
45 4*SUM(yi(bar)-y(barbar))^2=
46 =4*((E20-E29)^2+(E24-E29)^2+(E28-E29)^2= 2247616.67 SSA
47 SSR,eqn12.5 :
48 SUMi(SUMk(yik-yi(bar))^2=
49 =(10^2+450^2+350^2+90^2)+(30^2+620^2+30^2+620^2)+
50 +(167.5^2+12.5^2+167.5^2+322.5^2)= 1264075 SSR
51 (sA)^2= SSA/(m-1)= E46/(3-1)= 1123808.33
52 (sR)^2= SSR/(N-m)= E50/(12-3) 140452.778
53 fobs=(sA)^2/(sR)^2= D51/D52= 8.00132508
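The spreadsheet calculation of Table 12.1 can also be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original spreadsheet: the function name `one_way_anova` and the dictionary layout are assumptions made here, and the code assumes equal numbers of observations for every treatment, as in this example.

```python
# One-way ANOVA sums of squares for the soil data of Example 12.1.
# Keys are location numbers; values are shear strengths in N/m^2 (Table 12.1).
data = {
    1: [4010, 3550, 4350, 4090],
    2: [2970, 2320, 2910, 3560],
    3: [3650, 3470, 3650, 3160],
}

def one_way_anova(groups):
    """Return (SSA, SSR, df_A, df_R, f_observed) for equal-sized groups.

    Hypothetical helper written for this illustration.
    """
    m = len(groups)                               # number of treatments
    n_per = len(next(iter(groups.values())))      # observations per treatment
    N = m * n_per
    means = {i: sum(ys) / len(ys) for i, ys in groups.items()}
    grand = sum(means.values()) / m               # valid because group sizes are equal
    ssa = n_per * sum((mean - grand) ** 2 for mean in means.values())
    ssr = sum((y - means[i]) ** 2 for i, ys in groups.items() for y in ys)
    df_a, df_r = m - 1, N - m
    f_obs = (ssa / df_a) / (ssr / df_r)
    return ssa, ssr, df_a, df_r, f_obs

ssa, ssr, df_a, df_r, f_obs = one_way_anova(data)
print(f"SSA = {ssa:.2f}, SSR = {ssr:.2f}, f_observed = {f_obs:.3f}")
```

The printed values should agree with lines 46, 50, and 53 of the spreadsheet.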
Now we check the residuals. A stem-and-leaf display of all the residuals is shown in Table 12.2. The stem is the digit corresponding to hundreds, from -6 to +6.
Table 12.2: Stem-and-leaf display of residuals
Stem   Leaf   Frequency
-6     2      1
-5            0
-4     5      1
-3     2      1
-2            0
-1            0
-0     31     2
+0     139    3
+1     66     2
+2            0
+3     5      1
+4            0
+5            0
+6     2      1
Considering the small number of data, Table 12.2 is consistent with a normal distribution of mean zero and constant variance. There is no indication of any outliers.
Plots of residuals against treatment means, $\hat{y}_{ik}$, and against time sequence of measurement are shown in Figure 12.1.
[Figure 12.1: Plots of Residuals. (a) Residual vs. Treatment Mean; (b) Residual vs. Time Sequence]
Neither plot of Figure 12.1 shows any significant pattern, so the assumptions appear to be satisfied. If calculations were being done with a calculator, the residuals would be checked before proceeding with calculations of sums of squares.
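As a rough sketch of how the points for the two residual plots might be prepared outside a spreadsheet, the Python fragment below lists each residual with its fitted value (the location mean) in the order of testing. The variable names are assumptions made for this illustration; the data are those of Example 12.1.

```python
# Residuals for Example 12.1, listed in the order the specimens were tested.
# Each tuple is (sequence of testing, location, shear strength in N/m^2).
observations = [
    (1, 2, 2970), (2, 2, 2320), (3, 2, 2910), (4, 3, 3650),
    (5, 1, 4010), (6, 1, 3550), (7, 1, 4350), (8, 3, 3470),
    (9, 2, 3560), (10, 3, 3650), (11, 1, 4090), (12, 3, 3160),
]

# Treatment (location) means are the best estimates in the one-way model.
by_location = {}
for _, loc, y in observations:
    by_location.setdefault(loc, []).append(y)
means = {loc: sum(ys) / len(ys) for loc, ys in by_location.items()}

# Residual = observation minus its location mean, kept in testing order.
residuals = [(seq, loc, y - means[loc]) for seq, loc, y in sorted(observations)]

for seq, loc, res in residuals:
    print(f"{seq:>2}  location {loc}  fitted {means[loc]:7.1f}  residual {res:7.1f}")
```

Plotting the last column against the fitted value gives plot (2), and plotting it against the sequence number gives plot (3).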
(d) Table for Analysis of Variance
Now we are ready to proceed to the analysis of variance, which we will discuss in general first, then apply it to Example 12.1. A table should be constructed like the one shown in general in Table 12.3 below.
Table 12.3: Table of One-way Analysis of Variance
Sources of Variation     Sums of Squares   Degrees of Freedom   Mean Squares   Variance Ratios
Between treatments       SSA               $(m - 1)$            $s_A^2$        $f_{observed} = s_A^2 / s_R^2$
Within treatments        SSR               $(N - m)$            $s_R^2$
Total (about the
grand mean, $\bar{\bar{y}}$)   SST         $(N - 1)$
The null hypothesis and the alternative hypothesis must be stated:
$H_0$: $\alpha_i = 0$ for all values of $i$ (or $\mu_1 = \mu_2 = \mu_3 = \ldots$).
$H_a$: $\alpha_i > 0$ for at least one treatment.
If the null hypothesis is true, then $\sigma_A^2 = \sigma_R^2$, so the departure of $f_{observed}$ from 1 is due only to random fluctuations. If the null hypothesis is not true, the variance between treatments, $\sigma_A^2$, will be larger than the variance within treatments, $\sigma_R^2$, because at least one of the true treatment means will be different from the others.
The calculated variance ratio, $f_{observed} = s_A^2 / s_R^2$, should be compared with critical values of the F-distribution for $(m - 1)$ and $(N - m)$ degrees of freedom. This is the same comparison as was done in section 10.1.2. If $f_{observed} > f_{critical}$ at a particular level of significance, the test results are significant at that level.
Notice that this analysis of variance tests the null hypothesis that the means of all the treatment populations are equal. If we have only two treatments, this is equivalent to the t-test with the null hypothesis that the two population means are equal. This has been described in section 9.2.3. If we have more than two treatments, by the analysis of variance we examine all the treatment means together and so avoid problems of perhaps selecting subjectively the pairs of treatment means which are most favorable to a particular conclusion.
The table of analysis of variance for Example 12.1 is shown below in Table 12.4. This table summarizes the results of lines 44 to 53 in Table 12.1.
Example 12.1 (continued)
Table 12.4: Table of One-way Analysis of Variance for Soil Strengths
Sources of Variation   Sums of Squares   Degrees of Freedom   Mean Squares   Variance Ratios
Between treatments     2,247,616.7       2                    1,123,808.3    $f_{observed}$ = 1,123,808.3 / 140,452.78 = 8.001
Within treatments      1,264,075         9                    140,452.78
Total                  3,511,691.7       11
The null hypothesis and alternative hypothesis are as follows:
$H_0$: $\mu_{Location\,1} = \mu_{Location\,2} = \mu_{Location\,3}$
$H_a$: At least one of the population means for a location is not equal to the others.
The observed value of f is compared to the limiting value of f for the corresponding degrees of freedom at the 5% level of significance from tables or software. We find $f_{limit}$ or $f_{critical}$ = 4.26, whereas $f_{observed}$ from Table 12.4 is 8.001. Since $f_{observed} > f_{limit}$, the variance ratio is significant at the 5% level of significance.
Therefore we reject the null hypothesis and so accept the alternative hypothesis. Thus, at the 5% level of significance the location does affect the shear strength.
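The comparison with the tabulated value can be cross-checked numerically. The sketch below is not taken from the text: it relies on the fact that, when the numerator has exactly 2 degrees of freedom, the upper-tail probability of the F-distribution has the closed form $(1 + 2f/\nu)^{-\nu/2}$, where $\nu$ is the number of degrees of freedom in the denominator. The function name is hypothetical, and the formula does not apply to other numerator degrees of freedom.

```python
def f_upper_tail_2df(f, df_denominator):
    """P(F > f) for an F-distribution with exactly 2 numerator degrees of freedom.

    Hypothetical helper; the closed form holds only for 2 numerator d.f.
    """
    return (1.0 + 2.0 * f / df_denominator) ** (-df_denominator / 2.0)

f_observed = 8.001   # from Table 12.4
f_critical = 4.26    # tabulated 5% value for 2 and 9 degrees of freedom

p_at_critical = f_upper_tail_2df(f_critical, 9)
p_observed = f_upper_tail_2df(f_observed, 9)

print(f"upper-tail probability at f_critical: {p_at_critical:.4f}")
print(f"upper-tail probability at f_observed: {p_observed:.4f}")
print("significant at the 5% level:", f_observed > f_critical)
```

The tail probability at the tabulated critical value comes out very close to 0.05, and the tail probability at the observed ratio is about 0.01, which is consistent with rejecting the null hypothesis at the 5% level.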
12.2 Two-way Analysis of Variance
Now the effects of more than one factor are considered in a factorial design. The possibility of interaction must be taken into account. This corresponds to the experimental designs used in Examples 11.2, 11.3, and 11.4, and, with modification, to the fractional factorial designs discussed briefly in section 11.6.
Let's consider the relations for a factorial design involving two separate factors. Later we can extend the analysis for larger numbers of factors.
Say there are $m$ different treatments or levels for factor A, and $n$ different treatments or levels for factor B. All possible combinations of the treatments of factor A and factor B will be investigated. Let us perform $r$ replications of each combination of treatments (the number of replications could vary from one factor to another, but we will simplify a little here).
For this case the observations $y$ must have three subscripts instead of two: $i$, $j$, and $k$ for the ith treatment of factor A, the jth treatment of factor B, and the kth replication of that combination. Then an individual observation is represented by $y_{ijk}$. $\bar{y}_{ij}$ will be the mean observation for the (ij)th combination or cell, the mean of all the observations for the ith treatment of factor A and the jth treatment for factor B. $\bar{\bar{y}}_i$ will be the mean observation for the ith treatment of factor A (at all levels of factor B). $\bar{\bar{y}}_j$ will be the mean observation for the jth treatment of factor B (at all levels of factor A). Then we have
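$$\bar{y}_{ij} = \frac{\sum_{k=1}^{r} y_{ijk}}{r} \qquad (12.11)$$
Averaging all the observations for the ith treatment of factor A gives
$$\bar{\bar{y}}_i = \frac{\sum_{j=1}^{n} \bar{y}_{ij}}{n} = \frac{\sum_{j=1}^{n} \sum_{k=1}^{r} y_{ijk}}{(n)(r)} \qquad (12.12)$$
Similarly, averaging all the observations for the jth treatment of factor B gives
$$\bar{\bar{y}}_j = \frac{\sum_{i=1}^{m} \bar{y}_{ij}}{m} = \frac{\sum_{i=1}^{m} \sum_{k=1}^{r} y_{ijk}}{(m)(r)} \qquad (12.13)$$
The grand average, $\bar{\bar{\bar{y}}}$, is given by
$$\bar{\bar{\bar{y}}} = \frac{\sum_{i=1}^{m} \bar{\bar{y}}_i}{m} = \frac{\sum_{j=1}^{n} \bar{\bar{y}}_j}{n} = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} \sum_{k=1}^{r} y_{ijk}}{mnr} \qquad (12.14)$$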
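The total sum of squares of the deviations of individual observations from the mean of all the observations is
$$SST = \sum_{i=1}^{m} \sum_{j=1}^{n} \sum_{k=1}^{r} \left( y_{ijk} - \bar{\bar{\bar{y}}} \right)^2 \qquad (12.15)$$
For factor A the treatment sum of squares of the deviations of the treatment means from the mean of all the observations is
$$SSA = nr \sum_{i=1}^{m} \left( \bar{\bar{y}}_i - \bar{\bar{\bar{y}}} \right)^2 \qquad (12.16)$$
Similarly for factor B the treatment sum of squares is
$$SSB = mr \sum_{j=1}^{n} \left( \bar{\bar{y}}_j - \bar{\bar{\bar{y}}} \right)^2 \qquad (12.17)$$
But we have also the interaction between factors A and B. The interaction sum of squares for these two factors is
$$SS(AB) = r \sum_{i=1}^{m} \sum_{j=1}^{n} \left( \bar{y}_{ij} - \bar{\bar{y}}_i - \bar{\bar{y}}_j + \bar{\bar{\bar{y}}} \right)^2 \qquad (12.18)$$
The residual sum of squares, from which an estimate of error can be calculated, is
$$SSR = \sum_{i=1}^{m} \sum_{j=1}^{n} \sum_{k=1}^{r} \left( y_{ijk} - \bar{y}_{ij} \right)^2 \qquad (12.19)$$
It can be shown algebraically that
SST = SSA + SSB + SS(AB) + SSR (12.20)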
The total number of degrees of freedom, $N - 1 = (m)(n)(r) - 1$, is partitioned into the degrees of freedom for factor A, $(m - 1)$; the degrees of freedom for factor B, $(n - 1)$; the degrees of freedom for interaction, $(m - 1)(n - 1)$. The degrees of freedom within cells available for estimating error is the remaining number, $(m)(n)(r - 1)$.
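The partition of the degrees of freedom can be written as a small function, which also confirms that the components add up to the total. This is a minimal sketch; the function name and the dictionary keys are assumptions made for this illustration.

```python
def two_way_degrees_of_freedom(m, n, r):
    """Partition the total degrees of freedom for an m x n factorial design
    with r replications of each combination (hypothetical helper)."""
    df = {
        "A": m - 1,
        "B": n - 1,
        "AB": (m - 1) * (n - 1),
        "error": m * n * (r - 1),
        "total": m * n * r - 1,
    }
    # The four components must add up to the total.
    assert df["A"] + df["B"] + df["AB"] + df["error"] == df["total"]
    return df

print(two_way_degrees_of_freedom(2, 2, 2))   # two levels of each factor, two replicates
print(two_way_degrees_of_freedom(3, 3, 2))   # three levels of each factor, two replicates
```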
Then the estimate of the variance obtained from the variability within cells is
$$s_R^2 = \frac{SSR}{(m)(n)(r - 1)} \qquad (12.21)$$
This is often called the residual mean square or sometimes the error mean square.
The estimate of the variance obtained from the variability of the treatment means for factor A is called the mean square for Main Effect A and is given by
$$s_A^2 = \frac{SSA}{m - 1} \qquad (12.22)$$
The estimate of the variance obtained from the variability of means for factor B is called the mean square for Main Effect B and is given by
$$s_B^2 = \frac{SSB}{n - 1} \qquad (12.23)$$
The estimate of the variance obtained from the interaction between effects A and B is
$$s_{(AB)}^2 = \frac{SS(AB)}{(m - 1)(n - 1)} \qquad (12.24)$$
This is called the mean square for interaction between A and B.
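Once again, various assumptions must be examined before we can proceed to the variance-ratio test. The first assumption is the mathematical model. The simplest mathematical model for the case of two experimental factors with replication and interaction is
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk} \qquad (12.25)$$
where $y_{ijk}$ is the kth observation of the ith level of factor A and the jth level of factor B,
$\mu$ is the true overall mean,
$\alpha_i$ is the incremental effect of treatment i, such that $\alpha_i = \mu_i - \mu$,
$\beta_j$ is the incremental effect of treatment j, such that $\beta_j = \mu_j - \mu$,
$\mu_i$ is the true population mean for the ith level of factor A,
$\mu_j$ is the true population mean for the jth level of factor B,
$(\alpha\beta)_{ij}$ is the interaction effect for the ith level of factor A and the jth level of factor B,
and $\varepsilon_{ijk}$ is the error for the kth observation of the ith level of factor A and the jth level of factor B.
This is the simplest mathematical model, but again it may not be the most appropriate. If equation 12.25 applies, the best estimate of the true values would be
$$\hat{y}_{ijk} = \bar{\bar{\bar{y}}} + \left( \bar{\bar{y}}_i - \bar{\bar{\bar{y}}} \right) + \left( \bar{\bar{y}}_j - \bar{\bar{\bar{y}}} \right) + \left( \bar{y}_{ij} - \bar{\bar{y}}_i - \bar{\bar{y}}_j + \bar{\bar{\bar{y}}} \right) = \bar{y}_{ij} \qquad (12.26)$$
Then residuals are given by $(y_{ijk} - \hat{y}_{ijk})$.
Again, we are assuming that the errors $\varepsilon_{ijk}$ are (to a good approximation) independently and identically distributed according to a normal distribution with mean zero and fixed but unknown variance $\sigma^2$.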
These assumptions are checked by the same plots as were used in section 12.1 for the one-way analysis of variance. These were plots (1) to (4) of part (c) of that section.
If plot (2) indicates that the variance is not constant with varying estimates of the measured output, $\hat{y}_{ijk}$, some modification of the mathematical model is required. The book by Box, Hunter, and Hunter gives a full discussion and an example in their section 7.8.
If the plots give no significant indication that any of the assumptions are incorrect, we can go on to the analysis of variance for this case. The null hypotheses and alternative hypotheses are as follows:
$H_0$: $\alpha_i = 0$ for all values of $i$ (or $\mu_1 = \mu_2 = \mu_3 = \ldots = \mu_m$);
$H_a$: $\alpha_i > 0$ for at least one treatment.
$H_0$: $\beta_j = 0$ for all values of $j$ (or equal values of $\mu_j$);
$H_a$: $\beta_j > 0$ for at least one treatment.
$H_0$: $(\alpha\beta)_{ij} = 0$ for all values of $i$ and $j$;
$H_a$: $(\alpha\beta)_{ij} > 0$ for at least one cell.
As we discussed in section 12.1, if some of the alternative hypotheses are true, some of the corresponding true population variances will be larger than the true variance for error. Then the F-test is used to see whether the other estimates of variance are significantly larger than the estimate of variance corresponding to error (note again that this is a one-sided test).
A table should be constructed as in Table 12.5 below.
Table 12.5: Table of Analysis of Variance for a Factorial Design
Sources of Variation   Sums of Squares   Degrees of Freedom   Mean Squares    Variance Ratios
Main effect A          SSA               $(m - 1)$            $s_A^2$         $f_{observed1} = s_A^2 / s_R^2$
Main effect B          SSB               $(n - 1)$            $s_B^2$         $f_{observed2} = s_B^2 / s_R^2$
Interaction AB         SS(AB)            $(m - 1)(n - 1)$     $s_{(AB)}^2$    $f_{observed3} = s_{(AB)}^2 / s_R^2$
Error                  SSR               $(m)(n)(r - 1)$      $s_R^2$
Total                  SST               $(N - 1)$
Again, the observed ratios of variance can be compared to the tabulated values of F for the appropriate numbers of degrees of freedom and for various levels of significance according to the one-sided F-test.
If there are more than two factors in the factorial design, there will be further main effects in Table 12.5 and further interactions. Thus, if there are three factors, the main effects might be A, B, and C, and the corresponding interactions would be AB, AC, BC, and ABC.
If some main effects or interactions show no sign of being statistically significant, their sums of squares and degrees of freedom are sometimes combined with the error sum of squares and error degrees of freedom, respectively, to give improved estimates of the error mean squares.
Sometimes it is assumed that fourth-order (or perhaps third-order) and higher-order interactions are negligible. Then replications are sometimes omitted, so that the error mean squares are estimated entirely from the higher-order interactions.
An extension of this is used in analyzing fractional factorial designs. If the number of factors is large, some main effects and their interactions with other main effects are likely to have negligible significance. Then these main effects and interactions are used to estimate the error of measurements. This is discussed in Chapters 12 and 13 of the book by Box, Hunter, and Hunter. Since fractional factorial designs are used for exploratory investigations, often graphical analysis of the results is sufficient to show which variables require more detailed examination. This, also, is discussed in the book by Box, Hunter, and Hunter.
Example 12.2
A chemical process is being investigated in a pilot plant. The factors under study are, first, the catalyst, Liquid Catalyst 1 or Liquid Catalyst 2, and then the concentration of each (1 gram/liter or 2 grams/liter). Two replicate runs are done for each combination of factors. The response, or dependent variable, is the percentage yield of the desired product. Results are shown in Table 12.6 below.
Table 12.6: Percentage Yields for Catalyst Study
                 Yield
Concentration    Catalyst 1   Catalyst 2
1 g/L            49.3         47.4
                 53.4         50.1
2 g/L            63.6         49.7
                 59.2         49.9
Is the yield significantly different for a different catalyst or concentration or combination of the two? Use the 5% level of significance.
Answer: The plots of the residuals were examined in the same way as for the previous example and showed no significant patterns. They will be omitted here for the sake of brevity.
Table 12.7: Spreadsheet for Example 12.2, Two-way Analysis of Variance
A B C D E F
3 Catalyst,i -> 1 2
4 Concentration Yield ,yijk
5 1g/L,j=1 49.3 47.4
6 53.4 50.1
7 2g/L,j=2 63.6 49.7
8 59.2 49.9
9 yij(bar): cat1(i=1) cat2(i=2) cat1(i=1) cat2(i=2)
10 1g/L,j=1 (B5+B6)/2= (C5+C6)/2= 1g/L, j=1 51.35 48.75
11 2g/L, j=2 (B7+B8)/2= (C7+C8)/2= 2g/L, j=2 61.4 49.8
12 yi(barbar): cat1(i=1) cat2(i=2) cat1(i=1) cat2(i=2)
13 (E10+E11)/2=(F10+F11)/2= 56.375 49.275
14 yj(barbar):
15 1g/L, j=1 (E10+F10)/2= 1g/L,j=1 50.05
16 2g/L, j=2 (E11+F11)/2= 2g/L,j=2 55.6
17 y (barbarbar): (E13+F13)/2 52.825
18 (check:) (E15+E16)/2= 52.825
19 Residuals yijk-yij(bar) i=1 i=2
20 j=1 -2.05 -1.35 B5:C5-E10:F10
21 2.05 1.35 B6:C6-E10:F10
22 j=2 2.2 -0.1 B7:C7-E11:F11
23 -2.2 0.1 B8:C8-E11:F11
24Forcatalyst,SSA=2*2*SUM((yi(barbar)-y(barbarbar))^2)=
25 4*((E13-D17)^2+(F13-D17)^2)= 100.82 SSA
26Forconcentration,SSB=2*2*SUM((yj(barbar)-y(barbarbar))^2=
27 4*((E15-D17)^2+(E16-D17)^2)= 61.605 SSB
28Forinteraction,SS(AB)=2*SUMiSUMj((yij(bar)-yi(barbar)-yj(barbar)+y(barbarbar))^2
29 =2*((E10-E13-E15+D17)^2+(E11-E13-E16+D17)^2+
30 +(F10-F13-E15+D17)^2+(F11-F13-E16+D17)^2)= 40.5 SS(AB)
31 For residual, SSR=SUMiSUMjSUMk(yijk-yij(bar))^2=
32 C20^2+D20^2+C21^2+D21^2+C22^2+D22^2+C23^2+D23^2=
33 21.75 SSR
34dftotal=(2)*(2)*(2)-1= 7 df:
35 Catalyst, df(A)=2-1= 1 A
36 Conc., df(B) =2-1= 1 B
37 Interaction, df df(AB)=(2-1) *(2-1)= 1 AB
38 df(error)= (2)*(2)*(2-1)= 4 error
39 Check: D34-D35-D36-D37= 4 (check)
40 s(A)^2: SSA/df(A)= E25/D35= 100.82
41 s(B)^2: SSB/df(B)= E27/D36= 61.605
42 s(AB)^2: SS(AB)/df(AB)= E30/D37= 40.5
43 s(R)^2: SSR/df(error)= E33/D38= 5.4375
44 f(obs,A)= s(A)^2/s(R)^2= D40/D43= 18.5416092
45 f(obs,B)= s(B)^2/s(R)^2= D41/D43= 11.3296552
46 f(obs,AB)= s(AB)^2/s(R)^2= D42/D43= 7.44827586
Calculations are shown in the spreadsheet of Table 12.7. The mean yields for the cells were found by averaging the yields found for each set of conditions, as shown in lines 10 and 11. The mean yields for the catalysts (at all levels of concentration) are shown in line 13. The mean yields for concentrations of 1 g/L and 2 g/L (for both catalysts) are shown in lines 15 and 16. The overall mean (or grand average) is calculated in line 17. Residuals are calculated in lines 19 to 23 using array formulas.
The treatment sum of squares for catalyst, SSA, is calculated in lines 24 and 25. The treatment sum of squares for concentration, SSB, is calculated in lines 26 and 27. The interaction sum of squares between catalyst and concentration, SS(AB), is calculated in lines 28 to 30. The residual sum of squares, SSR, is calculated in lines 31 to 33.
The total number of degrees of freedom is calculated in line 34 and then partitioned into degrees of freedom for catalyst, concentration, interaction, and the residual used for estimating error. These are shown in lines 35 to 39. Mean squares, estimates of variances, are calculated in lines 40 to 43. They are each found by dividing the corresponding sum of squares by the degrees of freedom. Observed variance ratios are calculated in lines 44 to 46.
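The same calculation can be sketched in Python as a cross-check on the spreadsheet. This is an illustrative sketch only: the function `two_way_anova` and the nested-list layout are assumptions made here, and the code assumes the same number of replications in every cell, as in equations 12.11 to 12.24.

```python
def two_way_anova(cells):
    """cells[i][j] is the list of r replicate observations for level i of
    factor A and level j of factor B (hypothetical helper).
    Returns sums of squares, degrees of freedom, mean squares, variance ratios."""
    m, n, r = len(cells), len(cells[0]), len(cells[0][0])
    cell_mean = [[sum(c) / r for c in row] for row in cells]               # eq. 12.11
    a_mean = [sum(cell_mean[i]) / n for i in range(m)]                     # eq. 12.12
    b_mean = [sum(cell_mean[i][j] for i in range(m)) / m for j in range(n)]  # eq. 12.13
    grand = sum(a_mean) / m                                                # eq. 12.14

    ss = {
        "A": n * r * sum((a - grand) ** 2 for a in a_mean),
        "B": m * r * sum((b - grand) ** 2 for b in b_mean),
        "AB": r * sum((cell_mean[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
                      for i in range(m) for j in range(n)),
        "R": sum((y - cell_mean[i][j]) ** 2
                 for i in range(m) for j in range(n) for y in cells[i][j]),
    }
    df = {"A": m - 1, "B": n - 1, "AB": (m - 1) * (n - 1), "R": m * n * (r - 1)}
    ms = {key: ss[key] / df[key] for key in ss}
    f_obs = {key: ms[key] / ms["R"] for key in ("A", "B", "AB")}
    return ss, df, ms, f_obs

# Table 12.6: factor A = catalyst (1, 2), factor B = concentration (1 g/L, 2 g/L).
yields = [
    [[49.3, 53.4], [63.6, 59.2]],   # catalyst 1
    [[47.4, 50.1], [49.7, 49.9]],   # catalyst 2
]
ss, df, ms, f_obs = two_way_anova(yields)
for key in ("A", "B", "AB"):
    print(f"{key}: SS = {ss[key]:.3f}, df = {df[key]}, f_observed = {f_obs[key]:.2f}")
print(f"error: SS = {ss['R']:.3f}, df = {df['R']}, mean square = {ms['R']:.4f}")
```

With factor A taken as the catalyst and factor B as the concentration, the results should match lines 25 to 46 of Table 12.7.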
The table of analysis of variance for this case is shown in Table 12.8.
Table 12.8: Table of Analysis of Variance for Study of Catalysts, Two-way Analysis of Variance
Sources of Variation          Sums of Squares   Degrees of Freedom   Mean Squares   Variance Ratios
Main effect, catalyst         100.82            1                    100.82         $f_{observed1}$ = 100.82 / 5.4375 = 18.54
Main effect, concentration    61.605            1                    61.605         $f_{observed2}$ = 61.605 / 5.4375 = 11.33
Interaction between
catalyst and concentration    40.5              1                    40.5           $f_{observed3}$ = 40.5 / 5.4375 = 7.45
Error                         21.75             4                    5.4375
Total                         224.675           7
The observed variance ratios in Table 12.8 were found by dividing the mean squares for catalyst, concentration, and interaction, respectively, by the mean square for error.
On the basis of the simplest mathematical model for this case,
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},$$
the null hypotheses and alternative hypotheses are as follows:
$H_{0,1}$: The true effect of the catalyst is zero, as opposed to
$H_{a,1}$: the true effect of the catalyst is not zero.
$H_{0,2}$: The true effect of concentration is zero, versus
$H_{a,2}$: the true effect of concentration is not zero.
$H_{0,3}$: The true effect of interaction between catalyst and concentration is zero, vs.
$H_{a,3}$: the true effect of interaction between catalyst and concentration is not zero.
If the alternative hypotheses are correct, the corresponding true variance ratios for the populations will be greater than 1.
Now we can apply the F-test. $f_{observed1}$ should be compared with $f_{critical}$ for a one-sided test with one degree of freedom and four degrees of freedom at the 5% level of significance; this is 7.71. $f_{observed2}$ should also be compared with 7.71, and so should $f_{observed3}$. Then we can reject the null hypotheses that the true population means for both catalysts are equal and the true population means for both concentrations are equal, both at the 5% level of significance, and accept the alternative hypotheses that the catalyst and the concentration make a difference. However, we do not have enough evidence at the 5% level of significance to reject the hypothesis that the mean result of the interaction between catalyst and concentration is zero. Because the value of $f_{observed}$ for interaction between catalyst and concentration is only a little smaller than the corresponding value of $f_{critical}$, we may well decide to collect some more data on this point.
Thus, we conclude (at the 5% level of significance) that the yield is affected by both the catalyst and the concentration, but we do not have enough evidence to conclude that the yield is affected by the interaction between catalyst and concentration.
Notice that the analysis discussed in this chapter allows us to conclude that certain factors have an effect, but it does not allow us to say quantitatively how the yield is changed by any particular level of a factor. In other words, we have not determined the functional relationship between the variables. For that, we would have to use a regression analysis, which will be discussed in Chapter 14.
Example 12.3
Concrete specimens are made using three different experimental additives. The purpose of the additives is to try to accelerate the gain of strength as the concrete sets. All specimens have the same mass ratio of additive to Portland cement, and the same mass ratio of aggregate to cement, but three different mass ratios of water to cement. Two replicate specimens are made for each of nine combinations of factors. All specimens are kept under standard conditions. After twenty-eight days the compression strengths of the specimens are measured. The results (in MPa) are shown in Table 12.9.
Table 12.9: Strengths of Concrete Specimens
                 Compressive Strengths
Ratio, water     Additives
to cement        #1      #2      #3
0.45             40.7    42.5    40.4
                 39.9    41.4    41.7
0.55             36      35.6    26.6
                 26.3    30.7    28.2
0.65             24.7    30.6    21.9
                 23.9    23.9    27.6
Do these data provide evidence that the additives or the water:cement ratios or interactions of the two affect the yield strength? Use the 5% level of significance.
Answer: Again the plots of the residuals were examined in the same way as for Example 12.1 but showed no significant patterns that would indicate that some of the assumptions were not valid. Again they are omitted for the sake of brevity. Calculations are shown in the spreadsheet, Table 12.10.
Table 12.10: Spreadsheet for Study of Additives to Concrete, Two-Way Analysis of Variance with Interaction
A B C D E F
1 Additives, i
2 j, Ratio,w/c i=1 i=2 i=3 Rowsums Totals,ratioj
3 yijk
4 j=1, 0.45 40.7 42.5 40.4 123.6
5 39.9 41.4 41.7 123 246.6
6 j=2, 0.55 36 35.6 26.6 98.2
7 26.3 30.7 28.2 85.2 183.4
8 j=3, 0.65 24.7 30.6 21.9 77.2
9 23.9 23.9 27.6 75.4 152.6
10 Totals,addtvi 191.5 204.7 186.4 582.6
11 Cell means, (B4+B5)/2,andcopy: Grandtotal,y
12 yij(bar): i=1 i=2 i=3
13 j=1 40.3 41.95 41.05
14 j=2 31.15 33.15 27.4
15 j=3 24.3 27.25 24.75
16 Residuals,yijk-yij(bar):
17 i=1 i=2 i=3 Array formulas:
18 j=1 0.4 0.55 -0.65 B4:B5-B13,andcopy
19 -0.4 -0.55 0.65
20 j=2 4.85 2.45 -0.8 B6:B7-B14,andcopy
21 -4.85 -2.45 0.8
22 j=3 0.4 3.35 -2.85 B8:B9-B15,andcopy
23 -0.4 -3.35 2.85
24 Means,addtviB10/6,andcopy
25 i=1 i=2 i=3
26 yi(barbar): 31.9166667 34.1166667 31.0666667
27 Means,ratioj: F5/6, etc.:
28 j=1 41.1
29 j=2 30.5666667
30 j=3 25.4333333
31 Overall mean: F10/18= 32.3666667 <y(barbarbar)
32 For additive, SSA=3*2*SUM((y i(barbar)-y(barbarbar))^2)
33 6*((B26-C31)^2+(C26-C31)^2+(D26-C31)^2) 29.73 SSA
34 Forw/cratio, SSB=3*2*SUM((y j(barbar)-y(barbarbar))^2
35 6*((B28-C31)^2+(B29-C31)^2+(B30-C31)^2)= 765.493333 SSB
36 Interaction: SS(AB)=2*SUMiSUMj((yij(bar)-yi(barbar)-yj(barbar)+y(barbarbar))^2
37 (B13-B26-B28+$C$31)^2,andcopy:
38 i=1 i=2 i=3
39 j=1 0.1225 0.81 1.5625
40 j=2 1.06777778 0.69444444 3.48444444
41 j=3 0.46694444 0.00444444 0.38027778 SUMj 2*SUMj
42 SUMi 1.65722222 1.50888889 5.42722222 8.59333333 17.1866667
43 Forresiduals: SSR=SUMiSUMjSUMk(yijk-yij(bar))^2 SS(AB)^
44 (B18:D23)^2: 0.16 0.3025 0.4225 <Arrayformula
45 0.16 0.3025 0.4225
46 23.5225 6.0025 0.64
47 23.5225 6.0025 0.64
48 0.16 11.2225 8.1225
49 0.16 11.2225 8.1225 SUMj
50 SUMi 47.685 35.055 18.37 101.11 SSR
51 Degreesoffreedom
52 dftotal= 3*3*2-1= 17 df:
53 Addtves,df(A): 3-1= 2 A
54 w:c ratio,df(B): 3-1= 2 B
55 Interaction,df(AB)= (3-1)*(3-1)= 4 AB
56 df(error)= (3)*(3)*(2-1)= 9 error
57 Check: D52-D53-D54-D55= 9 (check)
58 MeanSquares:
59 s(A)^2: SSA/df(A)= E33/D53= 14.865
60 s(B)^2: SSB/df(B)= E35/D54= 382.746667
61 s(AB)^2: SS(AB)/df(AB)= F42/D55= 4.29666667
62 s(R)^2: SSR/df(error)= E50/D56= 11.2344444
63 f(obsrvd,A)= s(A)^2/s(R)^2= D59/D62= 1.323163 f(A)
64 f(obsrvd,B)= s(B)^2/s(R)^2= D60/D62= 34.06903 f(B)
65 f(obsrvd,AB)=s(AB)^2/s(R)^2= D61/D62= 0.382455 f(AB)
The given data are shown in lines 1 to 9, totals for additive i are shown in line 10, and totals for water/cement ratio j are shown in column F. Cell means are calculated in cells B13:D15, as shown in cell B11. Then residuals are calculated in cells B18:D23. Line 26 shows means $\bar{\bar{y}}_i$ for additive i for all values of the w/c ratio j according to equation 12.12, and similarly cells B28:B30 show means $\bar{\bar{y}}_j$ for w/c ratio j for all values for additive i according to equation 12.13. The overall mean, $\bar{\bar{\bar{y}}}$, is calculated in cell C31.
The treatment sum of squares for factor A (additives), SSA, is calculated in lines 32 and 33. The treatment sum of squares for factor B (w/c ratio), SSB, is calculated in lines 34 and 35. The treatment sum of squares for interaction, SS(AB), is calculated in lines 36 to 42 with the result in cell F42. The residual sum of squares, SSR, is calculated in lines 43 to 50 with the result in cell E50.
The degrees of freedom are calculated in lines 51 to 57. In line 57 we check that the number of degrees of freedom available for estimating error is the difference between the total degrees of freedom and the degrees of freedom allocated to A, B, and interaction AB.
Finally, mean squares for estimating variances for A, B, AB, and error are calculated in lines 58 to 62, and the observed variance ratios are calculated in lines 63 to 65.
The analysis of variance for this case is summarized in Table 12.11. Once again, the mean squares, or estimates of variance, are found by dividing the corresponding sums of squares by the degrees of freedom.
Table 12.11: Analysis of Variance for Strength of Concrete
Sources of Variation       Sums of Squares   Degrees of Freedom   Mean Squares   Variance Ratios
Additives                  29.730            2                    14.865         $f_{observed1}$ = 14.865 / 11.234 = 1.32
Water-cement ratio         765.493           2                    382.747        $f_{observed2}$ = 382.747 / 11.234 = 34.07
Interaction between
additives and water-
cement ratio               17.187            4                    4.297          $f_{observed3}$ = 4.297 / 11.234 = 0.38
Error                      101.110           9                    11.234
Total                      913.520           17
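As a check on the partition SST = SSA + SSB + SS(AB) + SSR of equation 12.20, the sketch below recomputes each sum of squares for the concrete data of Table 12.9 and also computes SST directly from equation 12.15. The variable names and the nested-list layout are assumptions made for this illustration.

```python
# Compressive strengths from Table 12.9, indexed as strength[j][i][k]:
# j = water:cement ratio (0.45, 0.55, 0.65), i = additive (1, 2, 3), k = replicate.
strength = [
    [[40.7, 39.9], [42.5, 41.4], [40.4, 41.7]],
    [[36.0, 26.3], [35.6, 30.7], [26.6, 28.2]],
    [[24.7, 23.9], [30.6, 23.9], [21.9, 27.6]],
]
n, m, r = 3, 3, 2   # levels of w/c ratio, levels of additive, replicates per cell

cell = [[sum(strength[j][i]) / r for i in range(m)] for j in range(n)]
add_mean = [sum(cell[j][i] for j in range(n)) / n for i in range(m)]
ratio_mean = [sum(cell[j]) / m for j in range(n)]
grand = sum(ratio_mean) / n

ssa = n * r * sum((a - grand) ** 2 for a in add_mean)              # additives
ssb = m * r * sum((b - grand) ** 2 for b in ratio_mean)            # w/c ratio
ssab = r * sum((cell[j][i] - add_mean[i] - ratio_mean[j] + grand) ** 2
               for j in range(n) for i in range(m))                # interaction
ssr = sum((y - cell[j][i]) ** 2
          for j in range(n) for i in range(m) for y in strength[j][i])
sst = sum((y - grand) ** 2
          for j in range(n) for i in range(m) for y in strength[j][i])

print(f"SSA = {ssa:.3f}  SSB = {ssb:.3f}  SS(AB) = {ssab:.3f}  SSR = {ssr:.3f}")
print(f"sum of components = {ssa + ssb + ssab + ssr:.3f},  SST = {sst:.3f}")
```

The two numbers on the last printed line should agree with each other and with the total in Table 12.11.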
Again the simplest mathematical model for this case is
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
The corresponding null hypotheses and alternative hypotheses are as follows:
$H_{0,1}$: The true effect of the additives is zero, as opposed to
$H_{a,1}$: the true effect of the additives is not zero.
$H_{0,2}$: The true effect of the water:cement ratio is zero, versus
$H_{a,2}$: the true effect of the water:cement ratio is not zero.
$H_{0,3}$: The true effect of interaction between additive and water:cement ratio is zero, vs.
$H_{a,3}$: the true effect of interaction between additive and water:cement ratio is not zero.
The observed variance ratios in Table 12.11 are compared with critical values of the F-distribution for corresponding numbers of degrees of freedom at the 5% level of significance. For two degrees of freedom in the numerator and nine degrees of freedom in the denominator, tables indicate that the critical or limiting variance ratio is 4.26. Thus $f_{observed,1}$ is not significantly different from 1, but $f_{observed,2}$ clearly is. Since the interaction mean square is smaller than the error mean square, there is no indication at all that the interaction has a significant effect.
We can conclude, then, that the data provide evidence at the 5% level of significance that the water:cement ratio affects the yield strength, but not that the additives or the interaction between additives and water:cement ratio affect the yield strength.
12.3 Analysis of Randomized Block Design
As we discussed in section 11.5(d), blocking is used to eliminate the distortion caused by an interfering variable that is not of primary interest. In randomized block designs there is no replication within a block, and interactions between treatments and blocks are assumed to be negligible (subject to checking). In this section we will discuss a simple case in which there is only one treatment in two or more levels.
The nomenclature is a little simpler than in the previous section. $y_{ij}$ is an observation of the ith level of factor A and the jth block. There are $a$ different levels for factor A and $b$ different blocks. $\mu$ is the true overall mean. $\alpha_i$ is the incremental effect of treatment i, such that $\alpha_i = \mu_i - \mu$, where $\mu_i$ is the true population mean for the ith level of factor A. $\beta_j$ is the incremental effect of block j, such that $\beta_j = \mu_j - \mu$, where $\mu_j$ is the true population mean for block j. The error $\varepsilon_{ij}$ is the difference between $y_{ij}$ and the corresponding true value.
The simplest mathematical model is
$$y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij} \qquad (12.27)$$
The quantities $\mu$, $\alpha_i$, and $\beta_j$ are estimated from the data. The best estimate of $\mu$ is the grand mean, the mean of all observations:
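$$\bar{\bar{y}} = \frac{\sum_{i=1}^{a} \sum_{j=1}^{b} y_{ij}}{(a)(b)}$$
The best estimate of $\mu_i$, the population mean for the ith level of factor A, is the treatment mean,
$$\bar{y}_i = \frac{\sum_{j=1}^{b} y_{ij}}{b},$$
so the best estimate of $\alpha_i$ is $(\bar{y}_i - \bar{\bar{y}})$. Similarly, the best estimate of $\mu_j$, the population mean for block j, is the block mean,
$$\bar{y}_j = \frac{\sum_{i=1}^{a} y_{ij}}{a},$$
so the best estimate of $\beta_j$ is $(\bar{y}_j - \bar{\bar{y}})$. Then the best estimate of $(\mu + \alpha_i + \beta_j)$ is $[\bar{\bar{y}} + (\bar{y}_i - \bar{\bar{y}}) + (\bar{y}_j - \bar{\bar{y}})]$. Since for a block design there is no replication, the error $\varepsilon_{ij}$ is estimated by the residual, $y_{ij} - [\bar{\bar{y}} + (\bar{y}_i - \bar{\bar{y}}) + (\bar{y}_j - \bar{\bar{y}})] = y_{ij} - \bar{y}_i - \bar{y}_j + \bar{\bar{y}}$. Remember that for a block design, interaction with the blocking variable is assumed to be absent.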
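The total sum of squares of the deviations of individual observations from the grand average is
$$SST = \sum_{i=1}^{a} \sum_{j=1}^{b} \left( y_{ij} - \bar{\bar{y}} \right)^2.$$
The treatment sum of squares of the deviations of the treatment means from the mean of all the observations is
$$SSA = b \sum_{i=1}^{a} \left( \bar{y}_i - \bar{\bar{y}} \right)^2.$$
The block sum of squares of the deviations of the block means from the mean of all the observations is
$$SSB = a \sum_{j=1}^{b} \left( \bar{y}_j - \bar{\bar{y}} \right)^2.$$
The residual sum of squares is
$$SSR = \sum_{i=1}^{a} \sum_{j=1}^{b} \left( y_{ij} - \bar{y}_i - \bar{y}_j + \bar{\bar{y}} \right)^2.$$
It can be shown algebraically that
SST = SSA + SSB + SSR (12.28)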
The total number of degrees of freedom, $N - 1 = (a)(b) - 1$, is partitioned similarly into the component degrees of freedom. The number of degrees of freedom between treatment means is the number of treatments minus one, or $(a - 1)$. The number of degrees of freedom between block means is the number of blocks minus one, or $(b - 1)$. The residual number of degrees of freedom is $(ab - 1) - (a - 1) - (b - 1) = ab - a - b + 1 = (a - 1)(b - 1)$.
Once again, the assumptions should be checked by the same plots of residuals as were used in section 12.1. If, contrary to assumption, there is an interaction between treatments and the blocks, a plot of residuals versus expected values may show a curvilinear shape, that is, a systematic pattern that is not linear. If that occurs, a transformation of variable should be attempted (see the book by Montgomery). A full factorial design may be required.
If these plots give no indication of serious error, a table of analysis of variance should be prepared as in Table 12.12 below.
Table 12.12: Table of Analysis of Variance for a Randomized Block Design
Sources of Variation     Sums of Squares   Degrees of Freedom   Mean Squares   Variance Ratio
Between treatments       SSA               $(a - 1)$            $s_A^2$        $f_{observed,1} = s_A^2 / s_R^2$
Between blocks           SSB               $(b - 1)$            $s_B^2$
Residuals                SSR               $(a - 1)(b - 1)$     $s_R^2$
Total (about the
grand mean, $\bar{\bar{y}}$)   SST         $(N - 1)$
The null hypothesis and alternative hypothesis are similar to the ones we have seen before, and the observed variance ratios are compared as before to the tabulated values for the F-test. Then appropriate conclusions are drawn.
Once again, more than one factor may be present, and the table of analysis of variance can be modified accordingly.
Example 12.4
Three similar methods of determining the biological oxygen demand of a waste stream are compared. Two technicians who are experienced in this type of work are available, but there is some indication that they obtain different results. A randomized block design is used, in which the blocking factor is the technician. Preliminary examination of residuals shows no systematic trends or other indication of difficulty. Results in parts per million are shown in Table 12.13.
Table 12.13: Results of B.O.D. Study in parts per million
               Method 1   Method 2   Method 3
Technician 1   827        819        847
Technician 2   835        845        867
Is there evidence at the 5% level of significance that one or two methods of determination give higher results than the others?
Answer:
Table 12.14: Spreadsheet for Example 12.4, Randomized Block Design
A B C D E F
1 Methods,i
2 i=1 i=2 i=3
3 j, technician yij Totals,ratioj
4 j=1 827 819 847 2493
5 j=2 835 845 867 2547
6 Totals,methodi 1662 1664 1714 5040 Grand Total
7 B6/2 C6/2 D6/2
8 Means,methodi 831 832 857 OverallMean:
9 Means,technj 831 E4/3, j=1 y(bar,bar)= 840
10 849 E5/3, j=2 (E6/6)
11 Residuals,yij-yi(bar)-yj(bar)+y(bar,bar):
12 5 -4 -1 B4:D4-B$8:D$8-B9+F$9
13 -5 4 1 (arrayformula),andcopy
14 SSA=2*SUM(y i(bar)-y(bar,bar))^2 =2*((B8-F9)^2+(C8-F9)^2+(D8-F9)^2)=
15 SSA= 868
16 SSB=3*SUM(y j(bar)-y(bar,bar)^2) =3*((B9-F9)^2+(B10-F9)^2)=
17 SSB= 486
18 SSR=SUM(residual^2) =B12^2+B13^2+C12^2+C13^2+D12^2+D13^2=
19 SSR= 84
20 df(A)=a-1= 3-1= 2
21 df(B)=b-1= 2-1= 1
22 df(resid)=(a-1)*(b-1)= D20*D21= 2
23 MeanSquare,A=B15/D20= 434
24 MeanSquare,B=B17/D21= 486
25 Mean Square, residual= B19/D22= 42
26 fobs,A= D23/D25= 10.3333333
The spreadsheet is shown in Table 12.14. The mean result for each technician (for all
methods) was calculated in column E. The mean result for each method (for both
technicians) was calculated in line 8. The overall mean (or grand average) was found
in column F. Residuals were calculated in rows 12 and 13. The treatment (method)
sum of squares, SSA, was calculated in rows 14 and 15. The block (technician) sum
of squares, SSB, was calculated in lines 16 and 17. The residual sum of squares,
SSR, was calculated in rows 18 and 19. Degrees of freedom were calculated in rows
20:22. Mean squares were calculated in rows 23:25. Observed variance ratio was
calculated in row 26.
On the basis of the simplest mathematical model for this case,
$$y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}$$
the null hypothesis and alternative hypothesis are as follows:
$H_{0,1}$: The true effect of the method is zero, as opposed to
$H_{a,1}$: the true effect of the method is not zero.
If the alternative hypothesis is true, the corresponding population variance ratio
will be greater than 1.
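We are now in a position to apply the F-test. Mean squares and observed f-ratios
are shown in this case as part of the spreadsheet, rather than as a separate table.
$f_{observed,A}$ = 10.33 should be compared to $f_{limit}$ for two degrees of freedom in the
numerator and two degrees of freedom in the denominator at the 5% level of
significance, which is 19.00. Therefore we do not have enough evidence to reject the
null hypothesis that the true effect of the method is zero. However, the result for
Method 3 is appreciably above the results for Methods 1 and 2. Thus, we may want to
collect more evidence.
The same comparison can be made for the blocking factor. From the mean squares in
rows 24 and 25 of the spreadsheet, the observed variance ratio for technicians is
486 / 42 = 11.57, which is less than the limiting value of 18.51 for one degree of
freedom in the numerator and two in the denominator at the 5% level of significance.
Unless this was only a preliminary experiment, we should probably use larger
sample sizes from the beginning. Larger numbers of degrees of freedom would
provide tests more sensitive to departures from the null hypotheses, as we have seen
before. In this example the sample sizes have been kept small to make the
calculations as simple as possible.
From these data there is no evidence at the 5% level of significance that one or
two methods give higher results than the others or that the technicians really do affect
the results.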
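The spreadsheet calculation of Example 12.4 can also be sketched outside Excel. The short Python program below is only an illustration, not part of the original spreadsheet: the function name `randomized_block_anova` and the dictionary keys are hypothetical, and the data are those of Table 12.13. It computes the sums of squares defined above, so the partition of equation 12.28 can be checked numerically.

```python
# Data of Table 12.13: rows are technicians (blocks, j), columns are methods (treatments, i).
y = [
    [827, 819, 847],  # Technician 1
    [835, 845, 867],  # Technician 2
]


def randomized_block_anova(y):
    """Sums of squares, degrees of freedom and variance ratios for a
    randomized block design with one observation per cell."""
    b = len(y)       # number of blocks
    a = len(y[0])    # number of treatments
    grand_mean = sum(sum(row) for row in y) / (a * b)
    treatment_means = [sum(y[j][i] for j in range(b)) / b for i in range(a)]
    block_means = [sum(row) / a for row in y]

    ssa = b * sum((m - grand_mean) ** 2 for m in treatment_means)
    ssb = a * sum((m - grand_mean) ** 2 for m in block_means)
    ssr = sum(
        (y[j][i] - treatment_means[i] - block_means[j] + grand_mean) ** 2
        for j in range(b)
        for i in range(a)
    )
    sst = sum((y[j][i] - grand_mean) ** 2 for j in range(b) for i in range(a))

    df_a, df_b = a - 1, b - 1
    df_r = df_a * df_b
    ms_r = ssr / df_r  # residual mean square
    return {
        "SSA": ssa, "SSB": ssb, "SSR": ssr, "SST": sst,
        "df_A": df_a, "df_B": df_b, "df_R": df_r,
        "f_A": (ssa / df_a) / ms_r,  # variance ratio for methods
        "f_B": (ssb / df_b) / ms_r,  # variance ratio for technicians
    }


result = randomized_block_anova(y)
for name, value in result.items():
    print(name, round(value, 4))
```

The observed ratios would then be compared with limiting values of F from tables, as in the text; the program itself does not look them up.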
12.4 Concluding Remarks
We have seen in Chapter 11 some of the chief strategies and considerations involved
in designing industrial experiments. Chapter 12 has introduced the analysis of
variance, one of the main methods of analyzing data from factorial designs. Both
these chapters are introductory. Further information on both can be obtained from the
books by Box, Hunter, and Hunter and by Montgomery (see the List of Selected
References in section 15.2). For instance, both of these books have much more
information on fractional factorial designs, including worked examples. Some
persons find the book by Box, Hunter, and Hunter easier to follow, but the book by
Montgomery is more up-to-date.
The worked examples on the analysis of variance that we've seen in this chapter
were simple cases involving small amounts of data. They were chosen to make the
calculations as easily understandable as possible. As the amount of data increases,
calculation using a calculator becomes more laborious and tedious, and the probability
of mechanical error increases. There can be no doubt that the computer calculation is
much quicker and more convenient, and the probability of error in calculation is much
smaller. The advantage of the computer increases greatly as the size of the data set
increases, and a typical set of data from industrial experimentation is much larger than
the set of data used in Example 12.3. Thus, the great majority of practical analyses of
variance are done nowadays using various types of software on digital computers.
There are two main approaches to computer calculations of ANOVA. One is the
fundamental approach, in which the basic formulas of Excel or another spreadsheet
are used to perform the calculations outlined in this chapter. That can be used from
the beginning. The other is the use of more complicated and specialized software
such as SPSS and SAS. Such packages are very useful once the reader has a
good grasp of the basic relations and their usefulness, but they should not be used in
the learning phase.
We have noted also that the analysis of variance as introduced in this chapter does
not give us all the information we want in many practical cases. If the independent
variables are numerical quantities, rather than categories, we usually want to obtain a
functional relationship between or among the variables. If a particular independent
variable increases by ten percent, by how much does the dependent variable increase?
The analysis of variance in the form discussed in this chapter may tell us that a
certain independent variable has a significant effect on the dependent variable, but it
cannot give a quantitative functional relationship between the variables. To obtain a
functional relationship we must use a different mathematical model and a different
analysis. That is the analysis called regression, which will be discussed in Chapter 14.
Problems
1. Three testing machines are used to determine the breaking load in tension of wire
which is believed to be uniform. Nine pieces of wire are cut off, one after
another. They are numbered consecutively, and three are assigned to each machine
using random numbers. Random numbers are used also to determine the order in
which specimens are tested on each machine. The breaking loads (in newtons)
found on each machine are shown in the table below.
Testing Machine
#1 #2 #3
1570 1890 1640
1750 1860 1760
1680 2390 2020
The diagnostic plots 1, 2, and 3 recommended in part (c) of section 12.1
were carried out and showed no significant discrepancies.
a) What further diagnostic plot should be made in this case? Why? If this plot is
not satisfactory, how will that affect the subsequent analysis?
b) Assuming this further plot is also satisfactory, do the data indicate (at the 5%
level of significance) that one or two of the machines give higher readings
than others?
2. Two determinations were made of the viscosities of each of three polymer
solutions. Viscosities were measured at the same flow rate in the same
instrument. The order of the tests was determined using random numbers. The results
were as follows.
Solution
#1 #2 #3
177 184 206
183 187 202
176 175 200
The diagnostic plots recommended in part (c) of section 12.1 were carried
out and showed no significant discrepancies. Can we conclude (at the 5% level of
significance) that the solutions have different viscosities?
3. A chemical engineer is studying the effects of temperature and catalyst on the
percentage of undesired byproduct in the output of a chemical reactor. Orders of
testing were determined using random numbers. Percentage of the byproduct is
shown in the table below.
            Temperature, °C
Catalyst    140    150
#1          2.3    1.6
            1.3    2.8
#2          3.6    3.0
            3.4    3.8
The diagnostic plots recommended in part (c) of section 12.1 were carried
out and showed no significant discrepancies. Can we conclude (at the 5% level of
significance) that catalyst or temperature or their interaction affects the
percentage of byproduct?
4. A storage battery is being designed for use at low temperatures. Two materials
have been tested at two temperatures. The orders of testing were determined
using random numbers. The life of each battery in hours is shown in the
following table.
            Temperature
Material    20°C    35°C
#1           90      92
            119      86
#2          128      85
            150     103
The diagnostic plots recommended in part (c) of section 12.1 were carried
out and showed no significant discrepancies. Can we conclude (at the 5% level of
significance) that material or temperature or their interaction affects the life of
the battery?
5. The copper sulfide solids from a unit of a metallurgical plant were sampled on
March 3, March 10, and March 17. Half of each sample was dried and analyzed.
The other half of each sample was washed with an experimental solvent and
filtered, then dried and analyzed. The order of testing was determined using
random numbers. The date of sampling was taken as a blocking factor. The
percentage copper was reported as shown in the table below.
            March 3   March 10   March 17
Unwashed    64.48     68.67      68.34
Washed      68.22     72.74      74.54
The diagnostic plots showed no significant discrepancies. Is there evidence at
the 5% level of significance that washing affects the percentage copper?
6. A test section in a fertilizer plant is used to test modifications in the process. A
processing unit feeds continuously to three filters in parallel. A change is made
in the processing unit. A sample is taken from the filter cake of each filter, both
before and after the change. Percentage moisture is determined for each sample
in an order determined by random numbers, and results are shown in the table
below.
                Filter #1   Filter #2   Filter #3
Before change   2.14        2.31        2.32
After change    1.51        1.83        1.8
The diagnostic plots showed no significant discrepancies. Taking the filter
number as a blocking factor, do these data give evidence at the 5% level of
significance that the processing change affects the percentage moisture?
CHAPTER 13
Chi-squared Test for Frequency Distributions
For this chapter the reader should have a good understanding of
statistical inference from Chapter 9, and of sections 2.2, 4.4, 5.3, and 5.4.
This is another case in which we set up a null hypothesis and then test the statistical
significance of disagreement with it. But now we are concerned with frequency
distributions. We compare observed frequencies with corresponding expected
frequencies calculated on the basis of a null hypothesis with stated trial assumptions.
Then we calculate a quantity which summarizes the disagreement between observed
and expected frequencies, and we test whether it is so large that it would not likely
occur by chance.
13.1 Calculation of the Chi-squared Function
Let the observed frequency for class i be $o_i$, and let the expected frequency for
that same class be $e_i$. We must have $\sum o_i = \sum e_i$ = total frequency. Then we define
$$\chi^2_{calculated} = \sum_{\text{all classes}} \frac{(o_i - e_i)^2}{e_i} \qquad (13.1)$$
This is a value of a random variable having approximately a chi-squared distribution,
and the approximation generally gets better as the data set becomes larger. The theory
behind that involves both the normal approximation to the binomial distribution and
the mathematical relationship between the normal distribution and the chi-squared
distribution. Notice that the summation in equation 13.1 must extend over all
possible classes of a particular set rather than any selection of them.
To prevent the error of approximation in using the $\chi^2$ distribution from becoming
appreciable, each expected value of frequency should be at least 5. This is similar to
the rough rule for the normal approximation to the binomial distribution, which
requires that both np and (n)(1 - p) be greater than 5. Under some conditions a small
proportion of the expected frequencies for the chi-squared test can be less than 5
without producing serious error (see the book by Barnes listed in section 15.2).
However, a minimum expected frequency of 5 should be applied in solving problems
in this book. If one or more expected frequencies are less than 5, it may be
reasonable to combine adjacent cells or classes to get a combined expected frequency of at
least 5. We will see that this is done frequently.
However, like other tests of significance, the chi-squared test for frequency
distributions becomes more sensitive as the number of degrees of freedom increases, and that
increases as the number of classes increases. Thus, we should make the number of
classes as large as we can, keeping the requirements of the last paragraph in mind.
The value of $\chi^2_{calculated}$ can be compared with theoretical values of $\chi^2$ for the
appropriate number of degrees of freedom and level of significance. The $\chi^2$ distribution
to be used here is the same as the $\chi^2$ distribution introduced in section 10.1 for comparing
a sample variance with a population variance. Remember that $\chi$ is pronounced
"kigh", like "high". The shape of the $\chi^2$ distribution is always skewed; shapes of
distributions for three different numbers of degrees of freedom are shown in Figure
10.1. Some tabulated values of $\chi^2$ can be found in Table A3 in Appendix A.
If a computer with Excel or some alternative is available, it can be used instead of
a table. Probabilities for particular values of $\chi^2$ can be found from the Excel function
CHIDIST. The arguments to be used with this function are the value of $\chi^2$ and the
number of degrees of freedom. The function then returns the upper-tail probability.
For example, for $\chi^2$ = 11.07 at 5 degrees of freedom, we type in a cell of a work
sheet the formula =CHIDIST(11.07,5), or else from the Formula menu, we choose
Paste Function, Statistical Functions, CHIDIST( , ), then type in the arguments and
choose the OK button. The result is 0.05000962, the probability of obtaining a value
of $\chi^2$ greater than 11.07 completely by chance. If we have a value of the upper-tail
probability and the number of degrees of freedom, we use the Excel function
CHIINV to find the value of $\chi^2$. Again, the function can be chosen using the Formula
menu or it can be typed into a cell. For an upper-tail probability of 0.05 and 5
degrees of freedom, CHIINV(0.05,5) gives 11.0704826.
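For readers without Excel, the same two look-ups can be sketched in Python using only the standard library. This is an illustrative sketch, not the method Excel uses: the function names `chi2_upper_tail` and `chi2_critical` are hypothetical, the upper-tail probability uses the closed-form finite sums that hold when the number of degrees of freedom is a whole number, and the inverse is found by simple bisection.

```python
import math


def chi2_upper_tail(x, dof):
    """Upper-tail probability Pr[chi-squared > x] for a whole number of degrees of freedom."""
    if x <= 0:
        return 1.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum over r = 0 .. dof/2 - 1 of (x/2)^r / r!
        term = total = 1.0
        for r in range(1, dof // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd dof: two-sided normal tail plus a finite sum of odd powers of sqrt(x)
    root = math.sqrt(x)
    term = root
    total = 0.0
    for r in range(1, (dof - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(root / math.sqrt(2)) + 2 * density * total


def chi2_critical(upper_tail_probability, dof):
    """Value of chi-squared with the given upper-tail probability, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_upper_tail(hi, dof) > upper_tail_probability:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_upper_tail(mid, dof) > upper_tail_probability:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


p = chi2_upper_tail(11.07, 5)   # plays the role of CHIDIST(11.07, 5)
limit = chi2_critical(0.05, 5)  # plays the role of CHIINV(0.05, 5)
print(p, limit)
```

The results should agree with the tabulated values to the precision of Table A3; the last digits may differ slightly from those quoted from Excel, since the two calculations use different numerical methods.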
If the calculated value of $\chi^2$ is greater than the corresponding tabulated
or computer value of $\chi^2$, the null hypothesis must be rejected at the level of
significance equal to the stated upper-tail probability. The chi-squared test for
frequency distributions is always a one-tailed or one-sided test.
Figure 13.1: Upper-tail probability for Chi-squared Distribution (upper-tail
probability = 0.05 beyond $\chi^2$ = 11.07)
In general, the number of degrees of
freedom for any statistical test is equal to the number of independent pieces of
information in the data. For the chi-squared test for frequency distributions, the
number of degrees of freedom is the number of classes or cells used in the
comparison, less the number of linearly independent restrictions placed on those data. For
example, if we make 100 tosses of a coin, we have two classes or cells, the number
of heads and the number of tails, and one restriction, that the number of heads and
the number of tails must add up to 100. Then the number of degrees of freedom in
this case is 2 classes - 1 restriction = 1 degree of freedom. We always have at least
one restriction, given by the total frequency for all classes or cells.
In some cases, which we will encounter in section 13.3, there are further
restrictions. This is because one or more statistical parameters such as a mean or a standard
deviation are estimated from the data. Each calculation of an estimated parameter
from the data represents another independent restriction that reduces the number of
degrees of freedom.
Another way of finding the number of degrees of freedom is to count the number
of classes or cells to which frequencies could be assigned arbitrarily without
changing total frequencies of any kind, and subtract the number of parameters (if any)
which have been determined from the data. This is often the most practical approach.
If the number of degrees of freedom is 1, we should apply a correction for
continuity (called the Yates correction). This correction for continuity is similar to the
one used for a normal approximation to a discrete distribution. However, that will be
omitted from this book. It is discussed in the book by Walpole and Myers (see
reference in section 15.2) and other references.
The chi-squared test for frequency distributions appears in various forms
depending on just what trial assumptions are used to give null hypotheses. In each case the
expected frequency for any class or cell is the product of two quantities: the total
frequency for all classes and the probability that a randomly chosen item will fall in
that particular class.
13.2 Case of Equal Probabilities
If it is reasonable to make the trial assumption that all the classes or cells are equally
probable, we can easily calculate the expected frequencies for the corresponding null
hypothesis.
Example 13.1
A die was tossed 120 times with the observed frequencies shown below. Test whether
the die shows evidence of bias at the 5% level of significance.
Result                1    2    3    4    5    6
Observed frequency   12   25   28   14   15   26
Answer:
If there is no bias, all the results are equally likely.
$H_0$: Pr[1] = Pr[2] = Pr[3] = Pr[4] = Pr[5] = Pr[6]
$H_a$: Not all the results are equally likely.
On the basis of the null hypothesis, the probability of each of the six possible
results is 1/6, so the expected frequency of each result is 120/6 = 20. Then we have
$$\chi^2_{calculated} = \frac{(12-20)^2 + (25-20)^2 + (28-20)^2 + (14-20)^2 + (15-20)^2 + (26-20)^2}{20} = \frac{250}{20} = 12.50$$
We have 6 classes or cells, and we have 1 restriction, that the sum of the
frequencies must be equal to the total number of tosses. Then the number of degrees of
freedom is given by
no. of classes or cells - no. of restrictions = 6 - 1 = 5.
From Table A3 for 5 d.f. and 0.05 upper-tail probability, $\chi^2_{limit}$ = 11.07, or from
Excel CHIINV(0.05,5) = 11.0704826 (quoting all the digits from Excel).
The calculated and limiting values of $\chi^2$ are compared in Figure 13.2.
Figure 13.2: Comparison of Calculated and Limiting Values of $\chi^2$ (upper-tail
probability = 0.05 beyond $\chi^2$ = 11.07; calculated value 12.50)
Since $\chi^2_{calculated} > \chi^2_{limit}$, we reject the null hypothesis.
Then there is evidence at the 5% level of significance that the die is biased.
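The arithmetic of Example 13.1 is short enough to check in a few lines. The following Python sketch is an illustration only; the variable names are hypothetical, and the limiting value 11.07 is simply copied from Table A3 rather than computed.

```python
# Observed frequencies for faces 1 to 6 in 120 tosses (Example 13.1).
observed = [12, 25, 28, 14, 15, 26]

total = sum(observed)
# Null hypothesis: every face is equally likely, so each expected frequency is total / 6.
expected = [total / len(observed)] * len(observed)

# Equation 13.1, summed over all classes.
chi2_calculated = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# One restriction: the frequencies must add up to the total number of tosses.
degrees_of_freedom = len(observed) - 1

chi2_limit = 11.07  # Table A3, 5 d.f., upper-tail probability 0.05
reject_null = chi2_calculated > chi2_limit

print(chi2_calculated, degrees_of_freedom, reject_null)
```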
13.3 Goodness of Fit
We can use the chi-squared test for frequency distributions to compare experimental
frequencies with the frequencies that would be expected if an assumed probability
distribution applies. Are the differences between observed and expected frequencies
small enough so that we can say that they could reasonably be due only to chance, or
are they too large for that interpretation? We calculate the expected class frequencies
on the basis of the assumed probability distribution, then use the chi-squared test to
judge the significance of the differences. That is essentially what we did for a very
simple probability distribution in section 13.2, but now we will use that approach for
other, more complex distributions, such as the binomial, Poisson, and normal
distributions, or generally for any case where probabilities for various categories are
known. If the assumed probability distribution involves parameters that are estimated
from the data, each estimated parameter will correspond to a further restriction, and
that will have to be taken into account in determining the number of degrees of
freedom.
We should note that other tests are also used frequently for tests of goodness of
fit and may have advantages in some cases. In particular, the Kolmogorov-Smirnov
and Anderson-Darling tests are said to be better for small samples. See the book by
Johnson (reference given in section 15.2).
Example 13.2
In Example 4.2 and Table 4.5 we had data on the thicknesses of 121 metal parts of an
optical instrument. The histogram for these data was shown in Figure 4.4, and we
saw later that its shape was similar to the shape we might expect for a normal
frequency distribution. In Example 7.9 we plotted the data on normal probability paper
and found good agreement. Now we will test the data for goodness of fit to a normal
distribution at 5% level of significance.
Answer: The mean and standard deviation were estimated from the data of Example
4.2 to be $\bar{x}$ = 3.369 mm and s = 0.0629 mm. These were used in Example 7.6 to
calculate expected frequencies for the various class intervals according to the normal
distribution, and these expected frequencies were compared to the observed
frequencies. The comparison is shown in the table below:
Table 13.1: Expected and Observed Class Frequencies
Lower Class     Upper Class     Expected Class   Observed Class
Boundary, mm    Boundary, mm    Frequency        Frequency
                3.195            0.3              0
3.195           3.245            2.6              2
3.245           3.295           11.4             14
3.295           3.345           28.2             24
3.345           3.395           37.2             46
3.395           3.445           27.6             22
3.445           3.495           10.9             10
3.495           3.545            2.4              2
3.545           3.595            0.3              1
3.595                            0.0              0
Before we can apply the chi-squared test for frequency distributions to these data,
some adjacent classes have to be combined so that the expected frequency for each
revised cell is at least 5. Thus, the first three classes are combined to give a cell with
expected cell frequency 14.3 and observed cell frequency 16, and the last four classes
are combined to give a cell with expected cell frequency 13.6 and observed cell
frequency 13. That leaves us with 10 - 2 - 3 = 5 cells or classes.
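Now we can calculate
$$\chi^2_{calculated} = \frac{(16-14.3)^2}{14.3} + \frac{(24-28.2)^2}{28.2} + \frac{(46-37.2)^2}{37.2} + \frac{(22-27.6)^2}{27.6} + \frac{(13-13.6)^2}{13.6} = 4.07$$
We have $H_0$: probabilities for the various cells are given by the
normal distribution
and $H_a$: other factors affect probabilities.
The number of cells is 5, the total expected frequency has been made equal to the
total observed frequency, and we have two statistical parameters, $\mu$ and $\sigma$, which
have been determined from the data. Then the number of degrees of freedom is 5 - 1
- 2 = 2. For 2 degrees of freedom and 0.05 level of significance, Table A3 or the
Excel function CHIINV gives $\chi^2_{critical}$ or $\chi^2_{limit}$ = 5.99. Since $\chi^2_{calculated} < \chi^2_{limit}$, we have
no reason to reject the null hypothesis. The observed frequency distribution seems to
be consistent with a normal distribution.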
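The combining of adjacent classes can be mechanized. The Python sketch below is one possible rule, not a procedure stated in the text: it works from the first class upward, closing a cell as soon as its accumulated expected frequency reaches 5, and adds any remainder at the end to the last completed cell. The function name `pool_classes` is hypothetical. For the data of Table 13.1 this rule happens to give the same five cells as those chosen above.

```python
def pool_classes(expected, observed, minimum=5.0):
    """Combine adjacent classes so that each pooled expected frequency is at least `minimum`."""
    pooled_e, pooled_o = [], []
    acc_e = acc_o = 0.0
    for e, o in zip(expected, observed):
        acc_e += e
        acc_o += o
        if acc_e >= minimum:          # cell is large enough: close it
            pooled_e.append(acc_e)
            pooled_o.append(acc_o)
            acc_e = acc_o = 0.0
    if acc_e > 0 or acc_o > 0:        # leftover classes at the upper end
        if pooled_e:
            pooled_e[-1] += acc_e
            pooled_o[-1] += acc_o
        else:
            pooled_e.append(acc_e)
            pooled_o.append(acc_o)
    return pooled_e, pooled_o


# Expected and observed class frequencies from Table 13.1.
expected = [0.3, 2.6, 11.4, 28.2, 37.2, 27.6, 10.9, 2.4, 0.3, 0.0]
observed = [0, 2, 14, 24, 46, 22, 10, 2, 1, 0]

cells_e, cells_o = pool_classes(expected, observed)
chi2_calculated = sum((o - e) ** 2 / e for o, e in zip(cells_o, cells_e))

# One restriction for the total frequency and two for the estimated mean and standard deviation.
degrees_of_freedom = len(cells_e) - 1 - 2

print([round(e, 1) for e in cells_e], cells_o, round(chi2_calculated, 2), degrees_of_freedom)
```

Other pooling rules are possible and may give a different set of cells for other data, so the grouping should always be inspected rather than accepted blindly.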
Example 13.3
In Example 5.15, a Poisson distribution was fitted to data for the numbers of cars
crossing a bridge in forty successive 6-minute intervals of time. The sample mean
was calculated from the data to be $\bar{x}$ = 2.875, and this value was used as an estimate
of the population mean, $\mu$, for calculation of Poisson probabilities. The comparison
of frequencies for various values of the numbers counted, x, is as follows:
x    Observed Frequency    Expected Frequency
0     2                    2.26
1     7                    6.49
2    10                    9.33
3     8                    8.94
4     6                    6.42
5     3                    3.69
6     3                    1.77
7     1                    0.73
8     0                    0.38
Is the goodness of fit satisfactory at the 5% level of significance?
Answer: To make the minimum expected frequency in each cell at least 5, the first
two cells should be combined, and also the last four. For counts of 0 or 1, the
observed frequency becomes 9, and the expected frequency becomes 8.75. For counts
of 5 or more, the observed frequency becomes 7, and the expected frequency
becomes 6.57. After this modification, the new number of cells is 9 - 1 - 3 = 5.
Now we are ready to apply the chi-squared test.
$H_0$: The observed frequency distribution is consistent with a Poisson distribution.
$H_a$: The frequency distribution is not adequately fitted by a Poisson distribution.
$$\chi^2_{calculated} = \frac{(9-8.75)^2}{8.75} + \frac{(10-9.33)^2}{9.33} + \frac{(8-8.94)^2}{8.94} + \frac{(6-6.42)^2}{6.42} + \frac{(7-6.57)^2}{6.57}$$
$$= 0.007 + 0.048 + 0.099 + 0.027 + 0.028 = 0.21$$
We have 5 cells, and there are two restrictions, for the total frequency and
estimation of $\mu$ from the data. Then there are 5 - 2 = 3 degrees of freedom. From Table A3
or the Excel function CHIINV we find for 0.05 upper tail probability and 3 degrees
of freedom, $\chi^2_{critical}$ = 7.81. Since $\chi^2_{calculated} \ll \chi^2_{critical}$ there is no indication at all that the
fit is not good enough.
In fact the fit is too good. You may remember from section 10.1.1 that for
any number of degrees of freedom, the mean of the chi-squared distribution is
equal to the number of degrees of freedom. In this case $\chi^2_{calculated}$ is smaller
than the number of degrees of freedom and so smaller than the mean of the
distribution. For 3 degrees of freedom and 0.95 upper tail probability, so at the
other end of the distribution, Table A3 gives $\chi^2_{critical}$ = 0.35. Then the value of
$\chi^2_{calculated}$ is even less than $\chi^2_{critical}$ for an
upper-tail probability of 0.95. See Figure 13.3.
Figure 13.3: Comparison of Calculated and Limiting Values of $\chi^2$ (calculated value
0.21; $\chi^2$ = 0.35 for upper-tail probability 0.95; $\chi^2$ = 7.81 for upper-tail
probability 0.05)
There is less than 5% probability of getting by chance a value of $\chi^2_{calculated}$ smaller
than the reported value. This indicates that the reported data are too good to be true
and may suggest that they were concocted rather than honestly observed.
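The size of that lower-tail probability can be estimated directly. For 3 degrees of freedom the upper-tail probability of the chi-squared distribution has a closed form involving the complementary error function, which the following Python sketch uses. The function name `chi2_lower_tail_3dof` is hypothetical, and the formula applies only to 3 degrees of freedom.

```python
import math


def chi2_lower_tail_3dof(x):
    """Pr[chi-squared < x] for exactly 3 degrees of freedom."""
    upper = math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    return 1.0 - upper


# Probability of obtaining by chance a value smaller than the calculated 0.21.
p_smaller = chi2_lower_tail_3dof(0.21)
print(p_smaller, p_smaller < 0.05)
```

The probability found this way should be somewhat below 0.05, in line with the comparison against the tabulated value 0.35.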
13.4 Contingency Tables
A contingency table involves two different factors in more than one row and more
than one column, giving a two-dimensional array. Both factors are usually
qualitative. We use the chi-squared distribution to test these two factors for independence:
does one of the factors affect the other, or are they operating independently? If the
factors are independent, then the simple form of the multiplication rule applies
according to equation 2.2: the probability of a particular level of factor A and a
particular level of factor B is simply the product of the probability of that level of
factor A and the probability of that level of factor B. The best estimate we can make
of the probability of a particular level of either factor is the total number of outcomes
which occur at that level, divided by the total frequency for this set of data. On that
basis the expected frequency for level i of factor A and level j of factor B is given by
$$\Pr[\text{level } i \text{ of factor A} \cap \text{level } j \text{ of factor B}] \times (\text{total frequency})$$
$$= \Pr[\text{level } i \text{ of factor A}] \times \Pr[\text{level } j \text{ of factor B}] \times (\text{total frequency})$$
$$= \frac{(\text{total number at level } i \text{ of factor A})(\text{total number at level } j \text{ of factor B})}{\text{total frequency}}$$
The total numbers at particular levels are usually spoken of as column totals and
row totals, and the total frequency for all conditions is called the grand total. Then
the expected frequency for level i of factor A and level j of factor B is given by
$$\frac{(\text{row total})(\text{column total})}{\text{grand total}}.$$
This relationship for the expected frequency applies both for the case where all
the total numbers at particular levels are random variables and for the case where
some total numbers at particular levels (either for columns or for rows, not both) are
fixed at chosen values. Thus, in Example 13.4 below, the total frequency for each
shift is fixed at 300.
Example 13.4
The observed numbers of days on which accidents occurred in a factory on three
successive shifts over a total of 300 days are as shown below. The numbers of days
without accidents for each shift were obtained by subtraction.
Shift    Days With Accidents    Days Without Accidents    Total
A         1                     299                       300
B         7                     293                       300
C         7                     293                       300
Total    15                     885                       900
Totals for all rows and all columns have been calculated. Is the difference in number
of days with accidents between different shifts statistically significant? That is, is
there evidence that the probability of accidents depends on the shift? Use the 5%
level of significance.
Answer:
$H_0$: The numbers of days with accidents are independent of the shift.
$H_a$: Some shifts have greater probability of accidents than others.
The analysis will use the chi-squared test for frequency distribution with
$$\chi^2 = \sum_{\text{all classes}} \frac{(o_i - e_i)^2}{e_i}.$$
The expected frequencies are found using the null hypothesis and the column and
row totals. Overall, the best estimate of the probability that there will be at least one
accident on a randomly chosen shift and day is 15/900, and the best estimate of the
probability of no accident on a randomly chosen shift and day is 885/900. (With these
figures the probability of more than one accident on any particular shift and day is
small enough to be neglected.) Similarly, the probability that any randomly chosen
shift is A shift (or B shift, or C shift) is 300/900. On the basis of the null hypothesis the
expected number of days with accidents on A shift or B shift or C shift is then
(15/900)(300/900)(900) = 5, or
$$\frac{(\text{row total})(\text{column total})}{\text{grand total}} = \frac{(15)(300)}{900} = 5.$$
Similarly, the expected number of days without accidents on A shift or B shift or C shift is
$$\frac{(\text{row total})(\text{column total})}{\text{grand total}} = \frac{(885)(300)}{900} = 295.$$
We use these expected frequencies and the corresponding observed frequencies to find
$\chi^2_{calculated}$:
$$\chi^2_{calculated} = \frac{(1-5)^2}{5} + \frac{(7-5)^2}{5} + \frac{(7-5)^2}{5} + \frac{(299-295)^2}{295} + \frac{(293-295)^2}{295} + \frac{(293-295)^2}{295} = 4.88$$
We have 6 classes or cells. The restrictions are the totals for each shift, the total
number of accidents, and the total number of days without accidents, but these are
not all linearly independent. The number of degrees of freedom for a contingency
table is best found as the number of class frequencies which could be varied
arbitrarily without changing any of the row or column totals. In this problem that number
of degrees of freedom is 2, since 2 cell frequencies could be varied arbitrarily without
changing the totals. We can see that by removing the individual class frequencies
from the contingency table, then marking x's in some cells until no more could be
varied without affecting some of the totals:
Shift    Days With Accidents    Days Without Accidents    Total
A        $x_1$                                            300
B        $x_2$                                            300
C                                                         300
Total    15                     885                       900
For instance, if the numbers of accidents for shifts A and B are fixed, then the values
in all the other cells are determined by the totals. Therefore the number of degrees of
freedom in this problem must be 2.
From Table A3 or the Excel function CHIINV, for upper-tail probability 0.05 and
2 degrees of freedom, $\chi^2_{critical}$ = 5.991. This is shown on Figure 13.4.
Figure 13.4: Calculated and Critical Values of Chi-squared (calculated value 4.88;
upper-tail probability = 0.05 beyond $\chi^2$ = 5.991)
Since 4.88 < 5.991, the calculated value of $\chi^2$ is not significant at the 5%
level of significance. Therefore we have insufficient evidence to reject the
null hypothesis. We could gather more information. If further data continue to show
more accidents on B and C shifts than on A shift, a later analysis might well show a
significant value of $\chi^2$.
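The calculation for a contingency table follows the same pattern whatever its size, so it is convenient to write it once. The Python sketch below is an illustration under stated assumptions: the function name `contingency_chi2` is hypothetical, and the number of degrees of freedom is obtained from the standard rule (rows - 1)(columns - 1), which is the formula found in reference books rather than the counting argument used in this section.

```python
def contingency_chi2(table):
    """Expected frequencies, chi-squared and degrees of freedom for a two-way table
    given as a list of rows of observed frequencies."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)

    # Expected frequency = (row total)(column total) / grand total
    expected = [[r * c / grand_total for c in column_totals] for r in row_totals]

    chi2 = sum(
        (o - e) ** 2 / e
        for observed_row, expected_row in zip(table, expected)
        for o, e in zip(observed_row, expected_row)
    )
    # Standard rule, equivalent to counting the cells that can be varied freely.
    degrees_of_freedom = (len(row_totals) - 1) * (len(column_totals) - 1)
    return expected, chi2, degrees_of_freedom


# Example 13.4: rows are shifts A, B, C; columns are days with and without accidents.
observed = [
    [1, 299],
    [7, 293],
    [7, 293],
]

expected, chi2_calculated, degrees_of_freedom = contingency_chi2(observed)
print(expected, round(chi2_calculated, 2), degrees_of_freedom)
```

For Example 13.4 the rule gives (3 - 1)(2 - 1) = 2 degrees of freedom, the same number obtained above by marking the cells that can be varied.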
Example 13.5
Results of a study of the repair records of three models of cars over the first three
years of the cars' lives on the basis of a sample are shown below.
                                Percentages Requiring
Car Model   Number Surveyed   Major Repairs   Minor Repairs   No Repairs
A           60                20              50              30
B           30                40              40              20
C           40                30              60              10
Test the hypothesis that all models perform equally well, so probabilities are
independent of the model. The level of significance to be used will be 5%.
Answer: Before we can apply the chi-squared test we have to convert percentages to
observed frequencies and find column and row totals.
Observed Frequencies
Car Model   Major Repairs   Minor Repairs   No Repairs   Total
A           12              30              18           60
B           12              12               6           30
C           12              24               4           40
Total       36              66              28           130
The corresponding expected frequencies are calculated using the null hypothesis.
$H_0$: Probabilities for repair are independent of the model.
$H_a$: At least one model has different probabilities of repair.
Expected frequencies are then calculated using totals:
$$\left( \frac{\text{row total}}{\text{grand total}} \right) (\text{column total}).$$
For example, expected frequency of major repairs for model A is
(60/130)(36) = 16.6.
Expected frequencies are shown in the table below.
Expected Frequencies
Car Model   Major Repairs   Minor Repairs   No Repairs   Total
A           16.6            30.5            12.9         60
B            8.3            15.2             6.5         30
C           11.1            20.3             8.6         40
Total       36              66              28           130
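Then
$$\chi^2_{calculated} = \frac{(12-16.6)^2}{16.6} + \frac{(30-30.5)^2}{30.5} + \frac{(18-12.9)^2}{12.9}$$
$$+ \frac{(12-8.3)^2}{8.3} + \frac{(12-15.2)^2}{15.2} + \frac{(6-6.5)^2}{6.5}$$
$$+ \frac{(12-11.1)^2}{11.1} + \frac{(24-20.3)^2}{20.3} + \frac{(4-8.6)^2}{8.6}$$
$$= 8.87$$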
The number of degrees of freedom is the number of class frequencies which
could be changed arbitrarily without changing any of the totals for rows and
columns. If the frequencies of, say, major repairs for Models A and B and minor repairs
for Models A and B are chosen arbitrarily, all other frequencies are fixed if the totals
are to stay the same. Then the number of degrees of freedom is 4.
(This last calculation can be reduced to a simple formula, but the reader will
obtain better understanding during the learning process by reasoning from the
underlying ideas, as we have done here. The formula for number of degrees of
freedom for contingency tables can be found in a number of reference books,
including the book by Walpole and Myers for which a citation is given in section 15.2.)
From Table A3 or the Excel function CHIINV at the 0.05 level of significance,
$\chi^2_{critical}$ = 9.488. Since $\chi^2_{calculated} < \chi^2_{critical}$, we cannot reject the null hypothesis. We do
not have sufficient evidence to say that probabilities of the various categories of
repairs depend on the model.
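The conversion from percentages to frequencies, and the effect of rounding the expected frequencies, can be seen in a short Python sketch. It is an illustration only, and the variable names are hypothetical. Expected frequencies are rounded to one decimal place to mimic the hand calculation above; the unrounded calculation is shown alongside for comparison.

```python
# Example 13.5: number surveyed and percentages requiring major, minor and no repairs.
surveyed = {"A": 60, "B": 30, "C": 40}
percentages = {"A": [20, 50, 30], "B": [40, 40, 20], "C": [30, 60, 10]}

# Convert percentages to observed frequencies.
observed = {
    model: [round(surveyed[model] * p / 100) for p in percentages[model]]
    for model in surveyed
}

row_totals = {model: sum(freqs) for model, freqs in observed.items()}
column_totals = [sum(column) for column in zip(*observed.values())]
grand_total = sum(row_totals.values())


def chi_squared(decimals=None):
    """Chi-squared with expected = (row total / grand total)(column total),
    optionally rounding each expected frequency first."""
    total = 0.0
    for model, freqs in observed.items():
        for o, c in zip(freqs, column_totals):
            e = row_totals[model] / grand_total * c
            if decimals is not None:
                e = round(e, decimals)
            total += (o - e) ** 2 / e
    return total


chi2_rounded = chi_squared(decimals=1)  # expected frequencies as tabulated above
chi2_exact = chi_squared()              # no intermediate rounding

print(observed, round(chi2_rounded, 2), round(chi2_exact, 2))
```

The two values differ only in the second decimal place, so the conclusion of the test does not depend on the rounding.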
Problems
1. Numbers of people entering a commercial building by each of four entrances are
observed. The resulting sample is as follows:
Entrance        1    2    3    4
No. of People   49   36   24   41
a) Test the hypothesis that all four entrances are used equally. Use the 0.05 level
of significance.
b) Entrances 1 and 2 are on a subway level while 3 and 4 are on ground level.
Test the hypothesis that subway and ground-level entrances are used equally
often. Use again the 0.05 level of significance.
2. Two dice are rolled 100 times and the results are tabulated below according to the
specified categories:
Value of roll   2 to 4   5 or 6   7    8 or 9   10 to 12
No. of rolls    21       21       18   28       12
At the 5% level of significance, can we say that the dice are unbiased?
3. A robot-operated assembly line is developed to produce a range of new products,
which are color-coded black, white, red and green. The assembly line is
programmed to produce 11.76% black, 29.41% white, 7.06% red and 51.76% green
items. A sample of 180 items was taken and the following distribution was
observed:
Color       Black   White   Red   Green
Frequency   26      43      15    96
a) Can you conclude at the 5% level of significance that the assembly line
needs adjustment?
b) What is the lowest level of significance at which you could conclude that the
system needs adjustment?
4. When four pennies were tossed 160 times, the frequencies of occurrence of 0, 1,
2, 3 and 4 heads were 9, 48, 53, 44 and 6, respectively. Is there evidence at the
5% level of significance that the coins are not fair?
5. Consider the average daily yields of coke from coal in a coke oven plant
summarized by the grouped frequency distribution shown below.
Lower Bound   Upper Bound   Class Midpoint   Frequency
67.95         68.95         68.45             1
68.95         69.95         69.45             8
69.95         70.95         70.45            22
70.95         71.95         71.45            22
71.95         72.95         72.45             9
72.95         73.95         73.45             8
73.95         74.95         74.45             2
The estimated mean and standard deviation from the data are 71.25 and 1.2775,
respectively. Is the frequency distribution given above significantly different from
a normal distribution at the 5% level of significance?
6. Consider the hourly labor costs (in dollars) for a random sample of small
construction projects summarized in the frequency table below.
Lower Bound   Upper Bound   Class Midpoint   Frequency
18.505        19.505        19.005            6
19.505        20.505        20.005           24
20.505        21.505        21.005           17
21.505        22.505        22.005           16
22.505        23.505        23.005            7
23.505        24.505        24.005            3
24.505        25.505        25.005            2
The mean and standard deviation estimated from these data are $21.15 and
$1.42, respectively. Are the above data significantly different from a normal
distribution? Use the 0.05 level of significance.
7. Scores made in the final exam by an elementary statistics section can be
summarized in the following grouped frequency distribution:
Class No.   Class Midpoint   Frequency
1 14.5 3
2 24.5 2
3 34.5 3
4 44.5 4
5 54.5 5
6 64.5 11
7 74.5 14
8 84.5 14
9 94.5 4
The mean and standard deviation calculated from these data are 65.48 and
20.957, respectively. At a 5% level of significance do the above data differ from a
Normal Distribution?
8. A company has set up a production line for cans of carrots. The numbers of
breakdowns on the production line over 49 shifts are summarized as follows:
No. of breakdowns in one shift   No. of shifts
0                                18
1                                12
2                                 8
3                                 6
4                                 3
5                                 2
>5                                0
Is this distribution significantly different from a Poisson distribution? Use the 5%
level of significance.
9. A section of an oil field has been divided into 48 equal sub-areas. Counting the
oil wells in the 48 sub-areas gives the following frequency distribution:
Number of oil wells   0   1    2    3    4   5   6   7
Frequency             5   10   11   10   6   4   0   2
Fitting the data to a Poisson Distribution gives the following estimated frequencies:
3.94 9.85 12.31 10.25 6.41 3.21 1.34 0.47
Test at the 5% level of significance the null hypothesis that the data fit a Poisson
distribution.
10. A study of four block faces containing 52 one-hour parking spaces was carried
out. Frequencies of vacant spaces were as follows:
No. of vacant parking spaces   0    1    2    3    4   5   >6
Observed frequency             30   45   20   15   7   3   0
From these data the mean number of vacant spaces was calculated to be 1.442. At the
5% level of significance, can you conclude that the distribution of vacant one-hour
parking spaces follows a Poisson distribution?
337
Chapter13
11. Thenumberofweedsineach10m
2
squareoflawnwasrecordedbyateamof
second-yearstudentsforarandomsampleof220lawns.
Numberofweedsper10m
2
Frequency
0 19
1 44
2 68
3 48
4 18
5 7
6 6
>6 10
a) At the 5% level of significance, is this distribution significantly different from a Poisson distribution?
b) Is there any reason to suggest that the data may not have been reported honestly?
12. A factory buys raw material from three suppliers. All raw materials are made into products by the same workers using the same machines. An engineer thinks there is a difference in the likelihood of defects in products made from raw materials from different suppliers and collects the following information.
                               Source of Raw Materials
                               Smith Co.   Jones Co.   Roberts Co.
No. of defective products         11           5            4
No. of satisfactory products      54          71           62
Is there evidence at the 5% level of significance that the discrepancies are not due to chance?
13. A particular type of small farm machinery is produced by four different companies. The proportions of machines requiring repairs in the first year after sale to the farmers are as follows:
Company   Total Number of Machines   Proportion Requiring Repairs
A 145 0.1034
B 140 0.0429
C 120 0.0333
D 105 0.1143
Is the distribution regarding requiring and not requiring repairs independent of the company? Use the 1% level of significance.
14. An industrial engineer collected data on the frequency and severity of accidents in the mining industry and summarized her findings as follows:
                                  Days of Week
Severity of Accident   Monday & Friday   Tuesday & Thursday   Wednesday   Total
Severe 22 9 4 35
Minor 283 254 128 665
Total 305 263 132 700
a) Can you conclude at the 5% level of significance that the severity of accidents is independent of the day of the week?
b) What is the lowest level of significance at which you could conclude that the frequency of severe accidents depends upon the day of the week?
15. In testing the null hypothesis that the level of heavy equipment usage and the owner's maintenance policy are independent variables, a mechanical engineer received replies to her questionnaire from a random sample of users. The following summary applies:
                                  Maintenance Policy
Equipment Usage   By Calendar   By Hours of Operation   As Required   Total
Light 12 8 13 33
Moderate 7 15 22 44
Heavy 3 22 15 40
Total 22 45 50 117
At the 1% level of significance, should the engineer reject the null hypothesis?
16. The following data have been obtained by an automotive engineer interested in estimating owner preferences. From a sample of 163 automobiles the following data on engine size and transmission type were obtained.
                    Engine Size
Transmission   small   medium   large
4-speed 34 19 12
5-speed 24 28 5
Automatic 7 12 22
a) He wishes to test the null hypothesis that transmission type and engine size chosen by the car-owning population are independent. Using a 5% level of significance, do the above data support this hypothesis?
b) In all of Canada, statistics for cars equipped with automatic transmissions show that 21% have small engines, 23% have medium-size engines and the remainder have large engines. Are the data in the above table consistent with the Canadian statistics?
17. The tread life of a particular brand of tire was evaluated by recording kilometers traveled before wear-out for a random sample of 500 cars. The cars were classified as subcompacts, compacts, intermediates, and full-size cars. The grouped frequency distribution is shown in the following table.
Tread wear, km                           Class of Car
Lower Bound   Upper Bound   Subcompact   Compact   Intermediate   Full Size
0 30,000 26 55 46 23
30,001 60,000 95 171 99 55
60,001 90,000 120 205 115 60
At the 1% level of significance, can you conclude that tread wear and class of car are independent?
18. Four alternative methods of loading a machine are tried to see whether the loading method has any effect on the likelihood that cycles will end in stoppages. The results are as follows:
Method of loading                                 A    B    C    D
Observed frequency of cycles with stoppages       8    4    9    3
Observed frequency of cycles without stoppages   10   16   12   18
Use the chi-squared test for frequencies to see whether these data show a significant effect of the method of loading on the probability of a stoppage. Use the 5% level of significance.
CHAPTER 14
Regression and Correlation
For this chapter the reader should have a good understanding of the material in sections 3.1 and 3.2 and in Chapter 9.
In previous chapters we have investigated frequency distributions, probability distributions, and central values such as means, all at fixed values of the independent variables. Now we want to see how the distributions and means change as one or more independent variables change. We will look at samples of data taken over a range of an independent variable or variables and use those data to obtain information regarding the relation between the dependent and independent variables.
In a simple case we have only one independent variable, x, and one dependent variable, y. Regression analysis assumes that there is no error in the independent variable, but there is random error in the dependent variable. Thus, all the errors due to measurement and to approximations in the modeling equations appear in the dependent variable, y. If other independent variables have an effect but are kept only approximately constant, effects of their variation may inflate the errors in the dependent variable. In some cases other independent variables may be varying appreciably and may affect the dependent variable, but the effect of a chosen independent variable may be examined by itself, as though it were the only independent variable, to obtain a preliminary indication of its effect. In any example of regression, the expectation or expected value of y varies as a function of x, and errors cause measured values of y to deviate from the expected value of y at any particular value of x. If there are several measured values of y at one value of x, the mean of the measured values of y will give an approximation of the expected value of y at that value of x.
Engineers often encounter situations where an independent variable affects the value of a dependent variable, and errors of measurement produce random fluctuations about the expected values. Thus, change in stress produces change in strain plus variation in measured strain due to error. The output of a stirred chemical reactor changes as the temperature within the reactor varies with time, and the measured concentration of any component in the output shows an additional variation caused by error. The power produced by an electric motor changes with variation of the input voltage, and measurements of output include measuring errors.
Correlation involves a different approach and a different set of assumptions but some of the same quantities. Those will be discussed in section 14.6.
The methods of regression are used to summarize sets of data in a useful form. The values of x and y and any other quantities are already known from measurements and are therefore fixed, so it is not quite right to speak of them in this development as variables. The true variables will be the coefficients that are adjusted to give the best fit. Therefore, in sections 14.1 to 14.5 we will refer to x and the other independent pieces of data as inputs or regressors. A quantity such as y, which is a function of the inputs, will be called a response.
14.1 Simple Linear Regression
The simplest situation is a linear or straight-line relation between a single input and the response. Say the input and response are x and y, respectively. For this simple situation the mean of the probability distribution is
$$E(Y) = \alpha + \beta x \tag{14.1}$$
where $\alpha$ and $\beta$ are constant parameters that we want to estimate. They are often called regression coefficients. From a sample consisting of n pairs of data $(x_i, y_i)$, we calculate estimates, a for $\alpha$ and b for $\beta$. If at $x = x_i$, $\hat{y}_i$ is the estimated value of E(Y), we have the fitted regression line
$$\hat{y}_i = a + b x_i \tag{14.2}$$
where the hat on $\hat{y}$ indicates that this is an estimated value.
(a) Method of Least Squares
The problem now is to determine a and b to give the best fit with the sample data. If the points given by $(x_i, y_i)$ are close to a perfect straight line, it might be satisfactory to plot the points and draw the line by eye. However, for the present analysis we need a systematic recipe or algorithm. The reader may remember from section 3.2(g) that the sum of squares of deviations from the mean of a sample is less than the sum of squares of deviations from any other constant value. We can adapt that requirement to the present case as follows. Let $e_i = y_i - \hat{y}_i$ be the deviation in the y-direction of any data point from the fitted regression line. Then the estimates a and b are chosen so that the sum of the squares of deviations of all the points, $\sum_{\text{all } i} e_i^2$, is smaller than for any other choice of a and b. Thus, a and b are chosen so that $\sum_{\text{all } i} e_i^2 = \sum_{\text{all } i} (y_i - \hat{y}_i)^2$ has a minimum value. This is called the method of least squares and the resulting equation is called the regression line of y on x, where y is the response and x is the input.
Say the points are as shown in Figure 14.1. This is called a scatter plot for the data. We can see that the points seem to roughly follow a straight line, but there are appreciable deviations from any straight line that might be drawn through the points.
Figure 14.1: Points for Regression (scatter plot of y against x)
Now let us consider the method of least squares in more detail. If the points or pairs of values are $(x_i, y_i)$ and the estimated equation of the line is taken to be $\hat{y} = a + bx$, then the errors or deviations from the line in the y-direction are $e_i = y_i - (a + b x_i)$. These deviations are often called residuals, the variations in y that are not explained by regression. The squares of the deviations are $e_i^2 = [y_i - (a + b x_i)]^2$, and the sum of the squares of the deviations for all n points is $\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} [y_i - (a + b x_i)]^2$. This sum of the squares of the deviations or errors or residuals for all n points is abbreviated as SSE.
The quantity we want to minimize in this case is
$$\text{SSE} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} [y_i - (a + b x_i)]^2 .$$
Remember that the n values of x and the n values of y come from observations and so are now all fixed and not subject to variation. We will minimize SSE by varying a and b, so a and b become the independent variables at this point in the analysis. You should remember from calculus that to minimize a quantity we take the derivative with respect to the independent variable and set it equal to zero. In this case there are two independent variables, a and b, so we take partial derivatives with respect to each of them and set the derivatives equal to zero. Omitting some of the algebra we have
$$\frac{\partial(\text{SSE})}{\partial a} = \frac{\partial}{\partial a} \sum_{i=1}^{n} [y_i - (a + b x_i)]^2 = -2\left(\sum_{i=1}^{n} y_i - n a - b \sum_{i=1}^{n} x_i\right) = 0$$
and
$$\frac{\partial(\text{SSE})}{\partial b} = \frac{\partial}{\partial b} \sum_{i=1}^{n} [y_i - (a + b x_i)]^2 = -2\left(\sum_{i=1}^{n} x_i y_i - a \sum_{i=1}^{n} x_i - b \sum_{i=1}^{n} x_i^2\right) = 0.$$
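These are called the least squares equations (or normal equations) for estimating the coefficients, a and b. The right-hand equalities of these two equations give equations that are linear in the coefficients a and b, so they can be solved simultaneously. The results are
$$b = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2} \tag{14.3}$$
$$b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{14.3a}$$
and
$$a = \frac{\sum_{i=1}^{n} y_i - b \sum_{i=1}^{n} x_i}{n} = \bar{y} - b\bar{x} \tag{14.4}$$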
The two forms of equation 14.3 for b are equivalent, as can be shown easily. The first form is usually used for calculations. The second form, equation 14.3a, is preferred when rounding errors in calculations may become appreciable. The second form indicates that the numerator is the sum of certain products and the denominator is the sum of similar squares.
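The equivalence of the two forms can also be checked numerically. The following Python fragment is only a minimal sketch and is not part of the original text; the function name `slope_two_ways` and the five data pairs are hypothetical, chosen purely for illustration.

```python
def slope_two_ways(xs, ys):
    """Return (b from eq. 14.3, b from eq. 14.3a, a from eq. 14.4)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    # First form (equation 14.3): raw sums of products and squares.
    b_sums = (sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n) / (
        sum(x * x for x in xs) - sum_x ** 2 / n
    )
    # Second form (equation 14.3a): deviations from the means.
    x_bar = sum_x / n
    y_bar = sum_y / n
    b_centered = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    # Intercept from equation 14.4.
    a = y_bar - b_centered * x_bar
    return b_sums, b_centered, a


# Hypothetical data, not taken from the text.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b1, b2, a = slope_two_ways(xs, ys)
print(b1, b2, a)
```

Both forms should agree to within floating-point rounding; for much larger or badly scaled data the centered form is the safer choice, for the reason given above.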
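These sums of products and squares are used repeatedly and so should be defined at this point. The quantity $\sum_{i=1}^{n} (x_i - \bar{x})^2$ is sometimes called the sum of squares for x and abbreviated $S_{xx}$. Similarly, the quantity $\sum_{i=1}^{n} (y_i - \bar{y})^2$ is sometimes called the sum of squares for y and abbreviated $S_{yy}$, and the quantity $\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$ is sometimes called the sum of products for x and y and abbreviated $S_{xy}$. Then we have
$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2 \tag{14.5}$$
$$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2 \tag{14.6}$$
$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right) \tag{14.7}$$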
Equation 14.3 can be written compactly as
$$b = \frac{S_{xy}}{S_{xx}} \tag{14.8}$$
These abbreviations will be used also in later equations.
From equation 14.4 we have
$$a = \bar{y} - b\bar{x} \tag{14.4a}$$
Substituting for a in equation 14.2 with a little rearrangement gives
$$(\hat{y}_i - \bar{y}) = b\,(x_i - \bar{x}) \tag{14.9}$$
This indicates that the best-fit line passes through the point $(\bar{x}, \bar{y})$, which is called the centroidal point and is the center of mass of the data points. After the slope, b, is found from equation 14.8, the intercept, a, is usually calculated from equation 14.4a.
Equations 14.3 and 14.4 are called regression equations. The name "regression" arose because an early example of its use was in a study of heredity, which showed that under certain conditions some physical characteristics of offspring tended to revert or regress from the characteristics of the parents toward average values. The name "regression" has become well established for all uses for such equations and for the process of finding best-fit equations by the method of least squares.
Illustration
Now let's apply these equations to the points that were plotted in Figure 14.1. The data are given in Table 14.1.
Table 14.1: Data for Simple Linear Regression
x   0     1     2     3     4     5     6     7      8      9      10     11     12
y   3.85  0.03  3.50  6.13  4.07  7.07  8.66  11.65  15.23  12.29  14.74  16.02  16.86
We have 13 points, so n = 13. The data can be summarized by the following sums:
$$\sum_{i=1}^{13} x_i = 78, \quad \sum_{i=1}^{13} y_i = 120.10, \quad \sum_{i=1}^{13} x_i^2 = 650, \quad \sum_{i=1}^{13} y_i^2 = 1483.0828, \quad \sum_{i=1}^{13} x_i y_i = 968.95$$
The centroidal point is given by $\bar{x} = \frac{78}{13} = 6$, $\bar{y} = \frac{120.10}{13} = 9.23846$.
The sums of squares and the sum of products are
$$S_{xx} = 650 - \frac{1}{13}(78)^2 = 650 - 468 = 182$$
$$S_{yy} = 1483.0828 - \frac{1}{13}(120.10)^2 = 1483.0828 - 1109.5392 = 373.5436$$
$$S_{xy} = 968.95 - \frac{1}{13}(78)(120.10) = 968.95 - 720.60 = 248.35$$
Then $b = \frac{S_{xy}}{S_{xx}} = \frac{248.35}{182} = 1.36456$,
and using the values of $\bar{x}$ and $\bar{y}$ in equation 14.4a we find
a = 9.23846 − (1.36456)(6.000) = 9.23846 − 8.18736 = 1.0511
The best-fit regression equation of y as a function of x (often called the regression equation of y on x) by the method of least squares is
$\hat{y}$ = 1.0511 + 1.36456x
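The same arithmetic can be reproduced in a few lines of code. The sketch below uses Python rather than the spreadsheet the text has in mind; the variable names are illustrative assumptions, while the data are those of Table 14.1 and the formulas are equations 14.5, 14.6, 14.7, 14.8 and 14.4a.

```python
# Data of Table 14.1.
x = list(range(13))
y = [3.85, 0.03, 3.50, 6.13, 4.07, 7.07, 8.66,
     11.65, 15.23, 12.29, 14.74, 16.02, 16.86]
n = len(x)

# Sums of squares and products (equations 14.5 to 14.7).
s_xx = sum(v * v for v in x) - sum(x) ** 2 / n
s_yy = sum(v * v for v in y) - sum(y) ** 2 / n
s_xy = sum(u * v for u, v in zip(x, y)) - sum(x) * sum(y) / n

# Slope (equation 14.8) and intercept (equation 14.4a).
b = s_xy / s_xx
a = sum(y) / n - b * sum(x) / n

print(f"S_xx = {s_xx:.4f}, S_yy = {s_yy:.4f}, S_xy = {s_xy:.4f}")
print(f"b = {b:.5f}, a = {a:.4f}")
```

No intermediate value is rounded here, which is in line with the advice to leave rounding to the end of the calculation.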
Notice that this calculation involves taking differences between numbers that are often of similar magnitude, so rounding the numbers too early could greatly reduce the accuracy of the results. As usual, rounding should be left to the end of the calculation.
The calculations for regression, especially for large sets of data, can be done much more quickly using a spreadsheet rather than a pocket calculator. Excel is suitable for such calculations.
The resulting regression equation of y on x is compared with the original points in Figure 14.2. The centroidal point is also shown. To emphasize that deviations in the y-direction are minimized, lines have been drawn in that direction between the points and the line.
Figure 14.2: Comparison of Points and Regression Line (y plotted against x, with the centroidal point marked)
(b) Comparison of Regressions for Different Assumptions of Error
The derivation for the regression of y on x assumed that the values of x were known without error and that only values of y contained error. Is the result different if this assumption is not correct? The opposite assumption would be that values of y are known without error and only values of x contain error. In that case deviations from the line would be taken in the x-direction at constant y. The roles of y and x would be reversed. The equation of the new regression line would be $x = a' + b'y$, so $y = \frac{x - a'}{b'}$. Derivation for minimum sum of squares of deviations in this case would give
$$b' = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{\sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{S_{xy}}{S_{yy}} \tag{14.10}$$
and
$$a' = \frac{\sum_{i=1}^{n} x_i - b' \sum_{i=1}^{n} y_i}{n} = \bar{x} - b'\bar{y} \tag{14.11}$$
Again the regression line would pass through the centroidal point. If the equation of the new line is solved for y, it becomes
$$y = -\frac{a'}{b'} + \frac{1}{b'}\,x \tag{14.12}$$
Thus its slope is $S_{yy}/S_{xy}$, instead of $S_{xy}/S_{xx}$ for the slope of the regression of y on x.
The new regression line is called the regression of x on y. Then the assumption concerning which variable contains the error does make a difference. The only case in which the lines for the regression of y on x and the regression of x on y would coincide is when the points form a perfect straight line. The more the data points depart from a straight line, the more the two regression lines will differ. Figure 14.3 shows the regression line of y on x and the regression line of x on y for the illustration of Figures 14.1 and 14.2.
Figure 14.3: Comparison of Regression Lines (y plotted against x, showing the line of y on x, the line of x on y, and the centroidal point)
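To see how far apart the two lines are for the data of Table 14.1, both slopes can be computed directly. This is a rough Python sketch, not the author's calculation; the variable names are assumptions, and the x-on-y slope is expressed as dy/dx so that it can be compared with the y-on-x slope on the same axes.

```python
# Data of Table 14.1.
x = list(range(13))
y = [3.85, 0.03, 3.50, 6.13, 4.07, 7.07, 8.66,
     11.65, 15.23, 12.29, 14.74, 16.02, 16.86]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

s_xx = sum((u - x_bar) ** 2 for u in x)
s_yy = sum((v - y_bar) ** 2 for v in y)
s_xy = sum((u - x_bar) * (v - y_bar) for u, v in zip(x, y))

# Regression of y on x: slope S_xy / S_xx (equation 14.8).
slope_y_on_x = s_xy / s_xx
# Regression of x on y, solved for y: slope S_yy / S_xy (equation 14.12).
slope_x_on_y = s_yy / s_xy

print(f"y on x: dy/dx = {slope_y_on_x:.4f}")
print(f"x on y: dy/dx = {slope_x_on_y:.4f}")
```

Both lines pass through the centroidal point, so the difference in slope is the whole difference between them. The ratio of the two slopes, S_xy²/(S_xx·S_yy), equals 1 only when the points lie exactly on a straight line.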
(c) Variance of Experimental Points Around the Line
Now we need to estimate the variance of points from the least-squares regression line for y on x. This must be found from the residuals, deviations of points from the least-squares line in the y-direction. As we discussed in part (b) of this section, these residuals can be calculated as $e_i = y_i - \hat{y}_i = y_i - (a + b x_i) = y_i - a - b x_i$. The error sum of squares, abbreviated as SSE, is given by
$$\text{SSE} = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$
or, since $\bar{y} = a + b\bar{x}$ and thus $a = \bar{y} - b\bar{x}$,
$$\text{SSE} = \sum_{i=1}^{n} \left[(y_i - \bar{y}) - b\,(x_i - \bar{x})\right]^2$$
$$= \sum_{i=1}^{n} (y_i - \bar{y})^2 - 2b \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) + b^2 \sum_{i=1}^{n} (x_i - \bar{x})^2$$
$$= S_{yy} - 2b\,S_{xy} + b^2 S_{xx}$$
But from equation 14.8, $b = \frac{S_{xy}}{S_{xx}}$, so $b^2 S_{xx} = \left(\frac{S_{xy}}{S_{xx}}\right)^2 S_{xx} = \left(\frac{S_{xy}}{S_{xx}}\right)(S_{xy}) = b\,S_{xy}$, and $-2b\,S_{xy} + b^2 S_{xx} = -b\,S_{xy}$. Then we have
$$\text{SSE} = S_{yy} - b\,S_{xy} \tag{14.14}$$
This estimate of the error sum of squares, SSE, must be divided by the number of degrees of freedom. The number of degrees of freedom available to estimate the variance $\sigma_{y|x}^2$ is the number of points or pairs of values for x and y, less one degree of freedom for each of the independent coefficients estimated from the data. In this case we have n points and we have estimated from the data two independent coefficients, b and $\bar{y}$, or a and b. The available number of degrees of freedom is (n − 2). The estimate of the variance of the points about the line is
$$s_{y|x}^2 = \frac{\text{SSE}}{n - 2} = \frac{S_{yy} - b\,S_{xy}}{n - 2} \tag{14.15}$$
This quantity is a measure of the scatter of experimental points around the line. The square root of this quantity is, of course, the estimated standard deviation or standard error of the points from the line. The subscript, y|x, is meant to emphasize that the estimated variance around the line is found from deviations in the y-direction and at fixed values of x. The subscript is omitted in some books. The quantity $s_{y|x}^2$ can be used also to obtain estimates of the standard errors of other parameters, such as the slope of the best-fit line. These will be discussed in section 14.3 for statistical inferences.
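As a check on equation 14.14, SSE can be computed both directly from the residuals and from the short-cut formula. The Python sketch below does this for the data of Table 14.1; the variable names are illustrative assumptions and the code is not taken from the original text.

```python
# Data of Table 14.1.
x = list(range(13))
y = [3.85, 0.03, 3.50, 6.13, 4.07, 7.07, 8.66,
     11.65, 15.23, 12.29, 14.74, 16.02, 16.86]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

s_xx = sum((u - x_bar) ** 2 for u in x)
s_yy = sum((v - y_bar) ** 2 for v in y)
s_xy = sum((u - x_bar) * (v - y_bar) for u, v in zip(x, y))
b = s_xy / s_xx
a = y_bar - b * x_bar

# SSE directly from the residuals e_i = y_i - (a + b x_i).
sse_direct = sum((v - (a + b * u)) ** 2 for u, v in zip(x, y))
# SSE from the short-cut formula, equation 14.14.
sse_short = s_yy - b * s_xy

# Estimated variance about the line, equation 14.15, with n - 2 d.f.
s2_yx = sse_short / (n - 2)
print(f"SSE = {sse_direct:.4f} (direct), {sse_short:.4f} (eq. 14.14)")
print(f"s^2 y|x = {s2_yx:.5f}, s y|x = {s2_yx ** 0.5:.4f}")
```

The variance obtained this way, about 3.1505, is the value used for this data set in the later parts of the illustration.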
14.2 Assumptions and Graphical Checks
Let us look first at the assumptions required for finding the best-fit lines. After that, we'll look at additional assumptions required for statistical inferences such as confidence limits and statistical significance. Then let's see how we can examine a plot of the data to see whether some assumptions are reasonable for any particular case.
For simple linear regression of y on x in the simplest case we assume the data points are related by an equation of the form
$$y_i = a + b x_i + e_i \tag{14.16}$$
where the $e_i$ represent errors or deviations or residuals. This involves certain assumptions. The first is that a linear relation between y and x represents the data adequately, so that the model represented by equation 14.16 is satisfactory. The second assumption is that the errors $e_i$ are entirely in the y-direction and so independent of x; thus, there are assumed to be no errors in the values of x. Regression calculations also assume that the individual residuals, $e_i$, are essentially independent of one another, so that equation 14.16 is the only relation affecting y over the region in which measurements have been taken. Similar assumptions apply for the regression of x on y.
In order to apply statistical tests of significance and confidence limits we must also assume that the variance is constant and not varying as a function of the variables, and that the statistical distribution of errors or residuals is at least approximately normal. In particular, positive and negative deviations from the line should be equally likely at all values of x within the range of experimental data. Any outliers, points for which residuals are much larger than the others in absolute value, may make statistical inferences useless.
The reasonableness of the assumption that the values of x are known without error and all the error is in y must be tested by knowledge of the quantities and how they are measured. Because the line for regression of y on x and the line for x on y come closer together as data approach a perfect straight line, the effects of this assumption become less significant as the data come closer to a perfect correlation.
Now consider the assumption of a simple linear relation between y and x. Is a linear relation between y and x a satisfactory representation of the data? Or is there reason to think that some other form of relation would represent the data better? In many cases the underlying relation may be more complex, but a straight-line relation between y and x may represent the data satisfactorily for a particular range of values.
To examine these and other questions, we need to calculate the residuals from the best-fit straight line (or some more complex alternative), plot them against x or y, and examine the results. If we find some systematic relation between the residuals and either x or y, then apparently the straight-line relation of equation 14.13 is not adequate and we should try a different form for the relation between y and x.
We can obtain an indication of whether the variance is constant from the plot of residuals against x or y. If the residuals show markedly more or markedly less scatter (or first one, then the other) as x or y increases, so that the scatter shows a systematic pattern, then the variance is probably not constant. Of course, residuals vary randomly besides any systematic variation, so we have to be careful not to jump to conclusions. It is often desirable to obtain more data to confirm a tentative conclusion that the variance is not constant.
We should consider the residuals to see whether the assumption of a normal distribution is reasonable. Are there about as many positive residuals as negatives? Are small deviations considerably more frequent than larger ones? Are there any outstanding outliers? We can answer these questions by examining the plot of residuals against x or y (usually x). These tests are adequate for relatively small sets of data. There are other, more sophisticated tests for a normal distribution that are useful for larger data sets, but they are beyond the scope of this book. For them the reader is referred to the book by Ryan (see reference in section 15.2).
Some examples of graphical checks
Figure 14.4: Residuals plotted against x
Figure 14.4 shows a plot of residuals versus x for the same data as in Figures 14.1, 14.2, and 14.3. There does not seem to be any strong pattern among the points in this plot, so we have no reason to discard the straight-line relation. Furthermore, there doesn't seem to be any systematic change beyond random variation in the scatter as x increases, so we have no reason to believe that the variance is anything other than constant.
The data points in Figure 14.4 show about the same number of positive and negative residuals and no pronounced outliers. Residuals of small absolute value are considerably more frequent than larger deviations. Therefore, the assumption that deviations from the line are normally distributed seems to be satisfactory.
For comparison, Figure 14.5 shows a residual plot for a case in which residuals plotted against x show systematic deviations. They are systematically less than or equal to zero in the middle of the diagram, and greater than or equal to zero on either side.
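The residuals that such plots display are easy to tabulate before plotting. The Python sketch below, which is an illustration rather than part of the original text, computes the residuals for the data of Table 14.1 and prints them with their signs; the variable names are assumptions. A plotting library or a spreadsheet chart could then be used to draw the equivalent of Figure 14.4.

```python
# Data of Table 14.1.
x = list(range(13))
y = [3.85, 0.03, 3.50, 6.13, 4.07, 7.07, 8.66,
     11.65, 15.23, 12.29, 14.74, 16.02, 16.86]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept (equations 14.3a and 14.4a).
b = sum((u - x_bar) * (v - y_bar) for u, v in zip(x, y)) / sum(
    (u - x_bar) ** 2 for u in x
)
a = y_bar - b * x_bar

# Residuals in the y-direction, e_i = y_i - (a + b x_i).
residuals = [v - (a + b * u) for u, v in zip(x, y)]

n_pos = sum(1 for e in residuals if e > 0)
n_neg = sum(1 for e in residuals if e < 0)
for u, e in zip(x, residuals):
    print(f"x = {u:2d}   residual = {e:+.3f}")
print(f"positive: {n_pos}, negative: {n_neg}")
```

For any least-squares line with an intercept, the residuals sum to zero and are uncorrelated with x, so those two properties cannot be used as evidence for the model. It is the pattern and spread of the individual residuals that carries the information.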
Figure 14.5: Another set of residuals plotted against x
This indicates that a linear relation between y and x does not represent the data adequately. In this case a quadratic relation between y and x, such as $y = a + bx^2$, might be more appropriate.
Figure 14.6 shows another plot of residuals against the input quantity. In this case we can see that the scatter of the residuals gradually increases as the independent variable increases. This indicates that the variance is not constant but appears to be increasing as x increases. Before confidence limits or tests of significance can be applied to these data, we must do something to make the variance about the line at least approximately constant. One possibility is to find a suitable transformation to give a quantity which will have a more constant variance about the line. That will be explored in section 14.4. Draper and Smith (see reference in section 15.2) suggest also a method of weighting the residuals in such a way that the variance becomes constant.
Figure 14.6: Residuals plotted against input or regressor
Draper and Smith suggest other graphical checks of the data. In particular, they suggest a plot of residuals against the order in which measurements were made. This is done to check for any relation between errors and the time sequence, such as drift in calibration of instruments.
Lack of independence among the y-values in regression is often due to a correlation between these y-values and some other independent variable that is changing appreciably. If the various runs and measurements have not been properly randomized, as described in Chapter 11, bias can enter so that results are misleading.
14.3 Statistical Inferences
If the graphical checks are satisfactory, we can look at confidence intervals and tests of significance. Equation 14.15 has given us an expression for the estimated variance of points about the regression line, $s_{y|x}^2$.
(a) Inferences for Slope
Let us first look at the variance of the estimated slope, b. From equation 14.8,
$$b = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.$$
Remembering that the x's are assumed to be known without any error, applying the rules for combining variables, which were discussed briefly in section 8.2, gives the following:
$$s_b^2 = \frac{s_{y|x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{s_{y|x}^2}{S_{xx}} \tag{14.17}$$
The estimated standard error (or standard deviation) of the slope is the square root of this. The standard error of the slope is multiplied by the value of t for the appropriate probability and the appropriate number of degrees of freedom to find confidence intervals. Hypothesis tests are done in the same way as in section 9.2.
Notice from equation 14.17 that the estimated variance of the slope becomes smaller as $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$ increases if the variance $s_{y|x}^2$ remains constant. Then if the x-values are more widely separated from the mean, we will have a better estimate of the slope of the line because the standard error of the slope will be smaller. That seems reasonable: if we base an estimate of the slope on more widely separated points while $s_{y|x}^2$, the variance of points about the line, remains constant, the estimate of the slope should be more reliable.
(b) Variance of the Mean Response
Next, we look at the variance of $\hat{y}$, which represents the least-squares estimate of y as a function of x. This is sometimes called the variance of the mean response. We will use as its symbol $s_{\hat{y}|x}^2$. (Remember that the circumflex or hat in $\hat{y}$ indicates an estimate from data.) $s_{\hat{y}|x}^2$ represents the estimated random variation of the best-fit line if the experiment is repeated with the same number of points at the same values of x. The position of the line is determined by three independent quantities: the position of the centroidal point $(\bar{x}, \bar{y})$, the slope of the line, b, and the difference between the x-coordinate of a particular point and the x-coordinate of the centroidal point. From this we get
$$s_{\hat{y}|x}^2 = \frac{s_{y|x}^2}{n} + \frac{s_{y|x}^2\,(x - \bar{x})^2}{S_{xx}} = s_{y|x}^2 \left[\frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}\right] \tag{14.18}$$
Thus, we see that the variance of the mean response is smallest at the centroidal point and increases to the left and to the right by an amount proportional to the estimated variance of the points around the line and to the square of the x-distance from the centroidal point. Equation 14.18 can be applied at any value of x, including at x = 0, where it would give the variance of the intercept, a. Remember, however, that it is dangerous to extrapolate outside the region in which measurements have been taken.
(c) Variance of a Single New Observation
If we want the variance of a single new observation rather than the mean response, the variance shown in equation 14.18 will be larger by the variance around the line, $s_{y|x}^2$. Then the
$$\text{variance of a single new observation} = s_{y|x}^2 \left[1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}\right] \tag{14.19}$$
Corresponding standard errors are obtained from the variances of equations 14.18 and 14.19 by taking square roots. These standard errors are then multiplied by appropriate values of t to find confidence intervals (in the case of the mean response) and prediction intervals (in the case of single new observations). Tests of significance are performed similarly.
Illustration (continued)
Continuing with the previous numerical illustration, which was used in Figures 14.1 to 14.4 and for which data were shown in Table 14.1, we have:
$$s_b^2 = \frac{s_{y|x}^2}{S_{xx}} = \frac{3.15046}{182} = 0.01731, \quad s_b = 0.1316$$
From equation 14.18, $s_{\hat{y}|x}^2 = s_{y|x}^2 \left[\frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}\right]$, and at x = 11 we have
$$s_{\hat{y}|x}^2 = (3.15046)\left[\frac{1}{13} + \frac{(11 - 6)^2}{182}\right] = 0.67510 \quad \text{and} \quad s_{\hat{y}|x} = 0.8216.$$
From equation 14.19 the estimated variance of a single new observation is $s_{y|x}^2 \left[1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}\right]$; at x = 11 this becomes $(3.15046)\left[1 + \frac{1}{13} + \frac{(11 - 6)^2}{182}\right] = 3.8256$, and the corresponding standard error is 1.956.
The 95% confidence limits for the mean response and 95% prediction intervals for a new individual observation are shown in Figure 14.7 for the same relationship as in Figures 14.1 to 14.4.
The arrows in Figure 14.7 emphasize that errors are assumed to be only in the y direction.
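Equations 14.18 and 14.19 differ only by the extra term 1 inside the brackets, which a short function makes explicit. The following Python sketch is an illustration under stated assumptions: the function name `response_variances` and its argument names are hypothetical, and the numerical inputs are the summary values quoted in this illustration.

```python
import math


def response_variances(x, s2_yx, n, x_bar, s_xx):
    """Return (variance of mean response, variance of a single new observation).

    Implements equations 14.18 and 14.19 for a given input value x.
    """
    leverage = 1.0 / n + (x - x_bar) ** 2 / s_xx
    var_mean = s2_yx * leverage          # equation 14.18
    var_new = s2_yx * (1.0 + leverage)   # equation 14.19
    return var_mean, var_new


# Summary values for the data of Table 14.1, as quoted in the illustration.
var_mean, var_new = response_variances(x=11, s2_yx=3.15046, n=13,
                                       x_bar=6.0, s_xx=182.0)
print(f"mean response:   variance {var_mean:.5f}, std. error {math.sqrt(var_mean):.4f}")
print(f"new observation: variance {var_new:.4f}, std. error {math.sqrt(var_new):.3f}")
```

Evaluating the function over a range of x values and multiplying each standard error by the appropriate value of t would give the curved limits of the kind plotted in Figure 14.7.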
Figure 14.7: 95% Confidence intervals and prediction intervals around the regression line (y plotted against x; legend: regression line, upper and lower limits for mean response, upper and lower limits for individual response)
Example 14.1
The shear resistance of soil, y kN/m², is determined by measurements as a function of the normal stress, x kN/m². The data are as shown below:
x_i   10     11     12     13     14     15     16     17     18     19     20     21
y_i   14.08  15.57  16.94  17.68  18.49  19.55  20.68  21.72  22.80  23.84  24.79  25.67
Find the regression line of y on x. Plot the data, the regression line, and the centroidal point.
Answer: Calculations to find the regression line of y on x are shown below. Intermediate results are not rounded until final results are obtained to minimize rounding errors, but numbers are reported here to only three decimals.
x y x
2
y
2
xy
10 14.08 100 198.161 140.770
11 15.57 121 242.506 171.299
12 16.94 144 286.844 203.238
13 17.68 169 312.646 229.863
14 18.49 196 341.710 258.795
15 19.55 225 382.179 293.241
16 20.68 256 427.479 330.809
17 21.72 289 471.698 369.216
18 22.80 324 519.880 410.416
19 23.84 361 568.520 453.029
20 24.79 400 614.473 495.771
21 25.67 441 659.118 539.139
186 241.80 3026 5025.214 3895.587
186 241.80
x 15.5 y 20.151
12 12
,
x
i
\
(
2
n
j
2
n
2
S
( )

( i1 ,
3026
(
186
)
x
i xx
i1
n 12
=30262883=143
355
Chapter14
,
y
i
\
(
2
n
j
2
S
n
yy

y
2

( i1 ,
5,025.214
(
241.80
)
i
i1
n 12
=5,025.2144,872.399= 152.815
n n
j \j
,
y
i (
\
n
,( i1
S
xy

x y
( i1
x
i (,
,
3895.587
(
186
)(
241.80
)
i i
i1
n 12
=3895.5873,747.950= 147.637
S
xy
147.637
b x 20.151
(
1.032
)(
15.5
)
4.148. Thenb =1.032 and a y
S 143
xx
BecausethecalculationsofS
yy
,S ,andaallinvolvedifferencesofnumbersofsimilar
xy
magnitudes,itisparticularlyimportantnottoroundthenumberstoosoon.The
regressionlineofyonxisy=4.148+1.032x.
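The effect of early rounding can be seen by repeating the fit from the y values as printed, which are rounded to two decimals. The Python sketch below is only an illustration with assumed variable names. Because the worked answer was evidently computed from unrounded measurements, the results here should be expected to agree with it only to about two or three significant decimals.

```python
# Data of Example 14.1, with y as printed (rounded to two decimals).
x = list(range(10, 22))
y = [14.08, 15.57, 16.94, 17.68, 18.49, 19.55,
     20.68, 21.72, 22.80, 23.84, 24.79, 25.67]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

s_xx = sum(u * u for u in x) - sum(x) ** 2 / n
s_xy = sum(u * v for u, v in zip(x, y)) - sum(x) * sum(y) / n

b = s_xy / s_xx          # equation 14.8
a = y_bar - b * x_bar    # equation 14.4a

print(f"S_xx = {s_xx:.1f}, S_xy = {s_xy:.3f}")
print(f"b = {b:.4f}, a = {a:.3f}")
```

The slope is insensitive to the rounding, but the intercept shifts in the third significant figure because it is a small difference between two larger numbers.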
The data, centroidal point and regression line of y on x are shown in Figure 14.8.
Figure 14.8: Shear resistance of soil as a function of normal stress (shear resistance, y kN/m², plotted against normal stress, x kN/m², with the centroidal point marked)
Example 14.2
For the data of Example 14.1 calculate the standard deviation of points about the regression line, then plot residuals against the input, and comment on the results.
Answer: Calculations for these plots are shown in Table 14.1.
Table 14.1: Calculation of residuals
x     y       ŷ       Residuals
10    14.08   14.47   −0.39
11    15.57   15.50   +0.07
12    16.94   16.54   +0.40
13    17.68   17.57   +0.11
14    18.49   18.60   −0.12
15    19.55   19.63   −0.08
16    20.68   20.67   +0.01
17    21.72   21.70   +0.02
18    22.80   22.73   +0.07
19    23.84   23.77   +0.08
20    24.79   24.80   −0.01
21    25.67   25.83   −0.16
Figure 14.9: Residuals plotted against the regressor
The residuals are just the differences between the measured values of y and the corresponding values on the regression line, $\hat{y}$. They are plotted against the input or regressor in Figure 14.9.
From equation 14.14 we have
SSE = $S_{yy} - b\,S_{xy}$
= 152.815 − (1.032)(147.637)
= 152.815 − 152.425
= 0.3896
From equation 14.15, $s_{y|x}$ = SQRT(SSE/(n − 2))
= SQRT(0.3896/10)
= 0.1974
Figure 14.9 shows no systematic effect of the input on the residuals, either in average or in variability. Thus there is no reason to think that the shear resistance of the soil is not well represented for this range of values by a linear function of the normal stress. Furthermore, there is no reason to think that the variance is a function of x. The distribution of the residuals is consistent with a normal distribution. Thus, we can legitimately use the calculated data to find confidence intervals and prediction intervals, and apply tests of significance.
Example 14.3
For the data given in Examples 14.1 and 14.2:
a) Find the 90% confidence interval for the slope of the regression line of shear resistance on normal stress.
b) Is the slope significantly larger than 1.000 at the 5% level of significance?
c) Find the 95% confidence interval for the mean response of shear resistance at a normal stress of 12 kN/m².
d) Is the mean response for the shear resistance at a normal stress of 12 kN/m² significantly more than 16.5 kN/m² at the 1% level of significance?
e) Find the 95% prediction interval for a single new observation at a normal stress of 20 kN/m².
Answer: a) By the method of least squares we found in Example 14.1 that the best estimate of the slope or regression coefficient is b = 1.0324. In Example 14.2 we calculated the estimated standard error of the points around the best-fit line to be $s_{y|x} = 0.1974$, and $s_{y|x}^2 = 0.03896$. From equation 14.17 the estimated variance of the slope is
$$s_b^2 = \frac{s_{y|x}^2}{S_{xx}},$$
and from Example 14.1 $S_{xx} = 143$. Then the estimated standard error of the slope is
$$s_b = \sqrt{\frac{0.03896}{143}} = 0.01651.$$
For the 90% confidence interval we need the value of t corresponding to probability 0.05 in each tail with 12 − 2 = 10 degrees of freedom. This is shown in Figure 14.10. From Table A2 we find for a one-tail probability of 0.05 and for 10 degrees of freedom, $t_1 = 1.812$. Then the 90% confidence interval for the slope is from 1.0324 − (1.812)(0.01651) = 1.0025 to 1.0324 + (1.812)(0.01651) = 1.0623.
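The arithmetic of part (a) can be sketched in Python as follows. This is an illustrative check only, not the book's method of calculation; the function name is hypothetical, and the t value of 1.812 is simply the figure read from Table A2 above rather than something the code computes.

```python
import math

def slope_confidence_interval(b, s_y_given_x, s_xx, t_value):
    """Return (lower, upper, s_b) for the slope, using equation 14.17."""
    s_b = s_y_given_x / math.sqrt(s_xx)   # standard error of the slope
    half_width = t_value * s_b
    return b - half_width, b + half_width, s_b

# Values from Examples 14.1 and 14.2; t for 10 d.f., one-tail probability 0.05.
lower, upper, s_b = slope_confidence_interval(1.0324, 0.1974, 143, 1.812)
print(f"s_b = {s_b:.5f}; 90% interval: {lower:.4f} to {upper:.4f}")
```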
Figure 14.10: t-distribution for 90% confidence interval (5% probability in each tail, beyond −t₁ and +t₁)
b) $H_0$: $\beta = 1.000$
$H_a$: $\beta > 1.000$ (one-tailed test)
Test statistic: $t = \dfrac{b - 1.000}{s_b}$. Large values of |t| will indicate that the null hypothesis is unlikely to be correct.
b = 1.0324
$s_b$ = 0.01651 (from part a)
$$t_{calculated} = \frac{1.0324 - 1.000}{0.01651} = 1.962$$
For 10 d.f., one-tail probability 0.05, $t_{critical} = 1.812$ as in Figure 14.10 and part (a).
Since $|t_{calculated}| > t_{critical}$, the slope is significantly larger than 1.000 at the 5% level of significance.
However, for 10 d.f. and one-tail probability of 0.025, Table A2 shows that $t_{critical} = 2.228$. Then at the 2.5% level of significance, $|t_{calculated}| < t_{critical}$, so the slope is not significantly larger than 1.000. But we have to answer the problem as it was stated.
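The comparison in part (b) can be written as a small helper. This is a sketch under the assumption that the critical values are looked up by hand in Table A2 (1.812 and 2.228 for 10 d.f.); the function name is illustrative and not from the text.

```python
def one_tailed_slope_test(b, beta_0, s_b, t_critical):
    """Upper-tail t-test of H0: beta = beta_0 against Ha: beta > beta_0."""
    t_calculated = (b - beta_0) / s_b
    return t_calculated, t_calculated > t_critical

# 5% level: critical t for 10 d.f. and one-tail probability 0.05.
t_calc, reject_at_5 = one_tailed_slope_test(1.0324, 1.000, 0.01651, 1.812)
# 2.5% level: critical t for 10 d.f. and one-tail probability 0.025.
_, reject_at_2_5 = one_tailed_slope_test(1.0324, 1.000, 0.01651, 2.228)

print(f"t = {t_calc:.3f}; reject at 5%: {reject_at_5}; reject at 2.5%: {reject_at_2_5}")
```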
c) The variance of the mean response at x = 12 is given by equation 14.18 as
$$s_{\hat{y}|x}^2 = s_{y|x}^2\left[\frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}\right] = 0.03896\left[\frac{1}{12} + \frac{(12 - 15.5)^2}{143}\right]$$
= 0.03896 [0.0833 + 0.0857]
= 0.00658
Then the standard error of the mean response is 0.0811.
For the 95% confidence interval we require the value of t for 10 d.f. and a one-tail probability of 0.025. This is shown in Figure 14.11. From Table A2 we find $t_1 = 2.228$.
Figure 14.11: 95% confidence interval (2.5% probability in each tail, beyond −t₁ and +t₁)
At a normal stress of 12 kN/m² the prediction from the regression line is $\hat{y}$ = 4.148 + 1.032x = 4.148 + (1.032)(12) = 16.532. Then the 95% confidence interval for the mean response at 12 kN/m² is 16.532 ± (2.228)(0.0811) = 16.351 to 16.713.
Then we can have 95% confidence that the mean response for the shear resistance at a normal stress of 12 kN/m² is between 16.35 and 16.71 kN/m².
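Part (c) can be reproduced with the sketch below, which applies equation 14.18. The function and argument names are assumptions made for illustration; the numerical inputs (a = 4.148, b = 1.032, s = 0.1974, n = 12, mean x = 15.5, Sxx = 143, t = 2.228) are the ones quoted in the example.

```python
import math

def mean_response_interval(x0, a, b, s_y_given_x, n, x_bar, s_xx, t_value):
    """Confidence interval for the mean response at x0 (equation 14.18)."""
    y_hat = a + b * x0
    std_err = s_y_given_x * math.sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx)
    return y_hat - t_value * std_err, y_hat + t_value * std_err, std_err

low, high, std_err = mean_response_interval(
    x0=12, a=4.148, b=1.032, s_y_given_x=0.1974,
    n=12, x_bar=15.5, s_xx=143, t_value=2.228)
print(f"standard error = {std_err:.4f}; 95% interval: {low:.3f} to {high:.3f}")
```

Note that the interval widens as x0 moves away from the mean of x, because of the $(x_0 - \bar{x})^2$ term.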
d) $H_0$: $\mu_{y|x=12} = 16.5$
$H_a$: $\mu_{y|x=12} > 16.5$ (one-tailed test)
Test statistic: $t = \dfrac{\hat{y}_{x=12} - 16.5}{s_{\hat{y}|x=12}}$. Large values of $|t_{calculated}|$ will make $H_0$ unlikely to be correct.
Figure 14.12: t-distribution with 1% probability in upper tail (beyond t₁)
From Table A2 we find for an upper-tail probability of 0.01, 10 d.f., $t_{critical} = 2.764$. We found in part (c) that the standard error of the mean response is 0.0811, and that at a normal stress of 12 kN/m² the mean response is 16.532 kN/m². Then we have
$$t_{calculated} = \frac{16.532 - 16.5}{0.0811} = 0.395$$
Since $t_{calculated} < t_{critical}$, the mean response for the shear resistance at a normal stress of 12 kN/m² is not significantly more than 16.5 kN/m² at the 1% level of significance.
e) At a normal stress of 20 kN/m² the predicted shear resistance of the soil is 4.148 + (1.032)(20) = 24.788. From part (c) and Figure 14.11, for the 95% prediction interval we have $t_1 = 2.228$. The standard error for a single new observation is (from equation 14.19)
$$s_{y|x}\sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}} = 0.1974\sqrt{1 + \frac{1}{12} + \frac{(20 - 15.5)^2}{143}} = 0.2185.$$
The 95% prediction interval for a single new observation is from 24.788 − (2.228)(0.2185) to 24.788 + (2.228)(0.2185), or from 24.30 to 25.27 kN/m².
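For comparison with the confidence interval of part (c), the prediction interval of part (e) differs only by the extra "1 +" under the square root. A minimal Python sketch, with an illustrative function name and the numbers quoted in the example, is:

```python
import math

def prediction_interval(x0, a, b, s_y_given_x, n, x_bar, s_xx, t_value):
    """Prediction interval for one new observation at x0 (equation 14.19)."""
    y_hat = a + b * x0
    std_err = s_y_given_x * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)
    return y_hat - t_value * std_err, y_hat + t_value * std_err, std_err

low, high, std_err = prediction_interval(
    x0=20, a=4.148, b=1.032, s_y_given_x=0.1974,
    n=12, x_bar=15.5, s_xx=143, t_value=2.228)
print(f"standard error = {std_err:.4f}; 95% prediction interval: {low:.2f} to {high:.2f}")
```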
14.4 Other Forms with Single Input or Regressor
(a) Other Forms Linear in the Coefficients
With an extra step of calculation an important group of equations can be fitted to data by the method of least squares. For instance, equations of the form log y = a + bx, where a and b are coefficients to be determined by least squares, can be handled easily. Remember that x and y are known quantities, numbers. Then we can calculate without difficulty the value of log y for each data point. Then log y can be used in place of y, and so the regression coefficients can be calculated as before.
Example 14.4
We want to fit the following set of data to an equation of the form ln y = a + bx by the method of least squares.

x   0      1      2      3      4      5      6      7      8      9      10     11
y   1.178  1.142  1.273  1.354  1.478  1.737  1.842  1.778  2.160  2.418  2.339  2.931

Answer: The first step is to calculate ln y for each value of y. Then x², (ln y)², and the product (x)(ln y) are calculated.
Table 14.2: Calculations for regression using ln y

          x     ln y     x²    (ln y)²   (x)(ln y)
          0    0.164      0     0.027      0.000
          1    0.133      1     0.018      0.133
          2    0.241      4     0.058      0.482
          3    0.303      9     0.092      0.910
          4    0.391     16     0.153      1.562
          5    0.552     25     0.305      2.760
          6    0.611     36     0.373      3.666
          7    0.575     49     0.331      4.027
          8    0.770     64     0.593      6.161
          9    0.883     81     0.780      7.946
         10    0.850    100     0.722      8.496
         11    1.075    121     1.157     11.830
Totals   66    6.548    506     4.607     47.973
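Then $\bar{x} = \dfrac{66}{12} = 5.5$ and $\overline{\ln y} = \dfrac{6.548}{12} = 0.5457$.
$$S_{xx} = \sum_{i=1}^{12} x_i^2 - \frac{\left(\sum_{i=1}^{12} x_i\right)^2}{12} = 506 - \frac{(66)^2}{12} = 506 - 363 = 143$$
$$S_{\ln y,\,\ln y} = \sum_{i=1}^{12} (\ln y_i)^2 - \frac{\left(\sum_{i=1}^{12} \ln y_i\right)^2}{12} = 4.607 - \frac{(6.548)^2}{12} = 4.607 - 3.573 = 1.034$$
$$S_{x,\,\ln y} = \sum_{i=1}^{12} (x_i)(\ln y_i) - \frac{\left(\sum_{i=1}^{12} x_i\right)\left(\sum_{i=1}^{12} \ln y_i\right)}{12} = 47.973 - \frac{(66)(6.548)}{12} = 47.973 - 36.014 = 11.958$$
Then $b = \dfrac{S_{x,\,\ln y}}{S_{xx}} = \dfrac{11.958}{143} = 0.08362$
and $a = \overline{\ln y} - (b)(\bar{x}) = 0.5457 - (0.08362)(5.5) = 0.5457 - 0.4599 = 0.0858$.
Then the fitted equation is
ln y = 0.0858 + 0.08362 x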
The residuals, fitted values of ln y less observed values of ln y, are plotted versus x in Figure 14.13. In fact these data correspond to the data plotted in Figure 14.6 as residuals of y vs. x. If these two plots are compared, it will be seen that the logarithmic transformation has made the variance much more constant.
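The steps of Example 14.4 can be sketched in Python as shown below. This is an illustrative check rather than the book's procedure: it uses unrounded values of ln y instead of the three-decimal values of Table 14.2, so the coefficients may differ from the text in the last digit. Variable names are assumptions.

```python
import math

x = list(range(12))
y = [1.178, 1.142, 1.273, 1.354, 1.478, 1.737,
     1.842, 1.778, 2.160, 2.418, 2.339, 2.931]

# Transform the response, then proceed exactly as for simple linear regression.
ln_y = [math.log(v) for v in y]
n = len(x)

s_xx = sum(xi * xi for xi in x) - sum(x) ** 2 / n
s_x_lny = sum(xi * li for xi, li in zip(x, ln_y)) - sum(x) * sum(ln_y) / n

b = s_x_lny / s_xx
a = sum(ln_y) / n - b * sum(x) / n

print(f"ln y = {a:.4f} + {b:.5f} x")
```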
Figure 14.13: Residuals of ln y vs. x
This modified method works for a considerable number of cases. The requirement is that the equation to which we fit data must be of the form $f_1(y) = a + b\,f_2(x)$, where x is the only input quantity. The two functions, $f_1(y)$ and $f_2(x)$, can be of any form and do not have to be linear, but both a and b must be coefficients to be determined by the method of least squares. Thus the fitting equation must be linear in the coefficients so that it is easy to solve for a and b. The modified method is sometimes still considered to be simple linear regression, but then the word "simple" means that there is only one input, and the word "linear" means that the equation is linear in the coefficients. Fitting equations amenable to the modified method include the following types:
y=a+bx
3
y=a+b x
2
$y = a + b\,e^{x}$
$y = a + \dfrac{b}{x}$
y=a+bx
3
$\dfrac{1}{y} = a + \dfrac{b}{x}$
$\dfrac{1}{y} = a + b \ln x$
$e^{y} = a + bx$
$\log y = a + b \log x$
and many others. Sometimes the nonlinear form for y or x is suggested by theory or previous experience, and sometimes it is suggested by consideration of the pattern of the residuals and by trial-and-error.
The graphical checks for constant variance, for fit to the chosen equation, and for normality can be done as before. If the checks are satisfactory, statistical inferences regarding confidence limits and tests of hypothesis can be applied. However, remember that if y is normally distributed, ln y, $y^{-1}$, $e^{y}$ and so on are unlikely to be satisfactorily normal. The assumption that the input, x, is known without error still applies.
(b) Other forms transformable to give equations linear in the coefficients
Various common forms of equations involving one input can be transformed easily to give forms of equations which are linear in the coefficients.
(1) The exponential function, $y = ab^{x}$, can be modified suitably by taking logarithms of both sides. This gives log y = log a + x log b. Notice that this is the form that gives straight lines on semi-log graph paper.
(2) The power function, $y = ax^{b}$, can also be treated by taking logarithms of both sides. The result is log y = log a + b log x. Notice that this form would give straight lines on log-log graph paper.
(3) The function $y = \dfrac{x}{a + bx}$ can be inverted to give $\dfrac{1}{y} = \dfrac{a}{x} + b$. An alternative is to multiply the inverted form by x to give $\dfrac{x}{y} = a + bx$.
It is important to note that the squares of the deviations are minimized in the transformed response variable (log y or 1/y or x/y in the cases above) rather than y, and the graphical tests need to be applied to the transformed response variable. It is possible in some cases to apply a simple weighting function to make the variance approximately constant (see the book by Draper and Smith, reference given in section 15.2).
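As an illustration of case (2), the sketch below fits a power function by regressing log y on log x. It is a minimal example in Python, not taken from the text: the data are synthetic values generated exactly from y = 3x^0.5 so that the recovered coefficients can be checked, and the function names are hypothetical.

```python
import math

def fit_line(u, v):
    """Least-squares intercept and slope for v = intercept + slope * u."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    slope = s_uv / s_uu
    return v_bar - slope * u_bar, slope

def fit_power_function(x, y):
    """Fit y = a * x**b by least squares on log y = log a + b log x."""
    log_x = [math.log10(xi) for xi in x]
    log_y = [math.log10(yi) for yi in y]
    intercept, slope = fit_line(log_x, log_y)
    return 10 ** intercept, slope   # a is recovered from log a

# Synthetic, error-free data (assumption for illustration only).
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [3.0 * xi ** 0.5 for xi in x]
a, b = fit_power_function(x, y)
print(f"y = {a:.3f} x^{b:.3f}")
```

With real data the deviations minimized are those of log y, not of y, so the residual plots should be drawn for log y.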
(c) Extension: Nonlinear Forms
Equations that cannot be transformed into forms linear in the coefficients can still be treated by least squares. However, now instead of applying the relations discussed to this point, iterative numerical methods must be used to minimize the sum of squares of the deviations from the fitted line. The Excel feature called Solver can be used for that calculation.
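To give a feel for what such an iterative calculation involves, here is a small Python sketch. It is not how Solver works internally and is not from the text. It assumes a hypothetical model y = a + e^(bx), which cannot be made linear in both coefficients by a simple transformation, and it uses a golden-section search (a technique chosen here for simplicity) over b, with a found in closed form for each trial b. The data are synthetic and error-free.

```python
import math

def fit_offset_exponential(xs, ys, b_low=0.0, b_high=1.0, tol=1e-8):
    """Fit y = a + exp(b*x) by minimizing the sum of squared deviations."""

    def best_a_and_sse(b):
        # For a fixed b, the least-squares a is the mean of (y - exp(b*x)).
        a = sum(y - math.exp(b * x) for x, y in zip(xs, ys)) / len(xs)
        sse = sum((y - a - math.exp(b * x)) ** 2 for x, y in zip(xs, ys))
        return a, sse

    ratio = (math.sqrt(5) - 1) / 2          # golden-section ratio, about 0.618
    low, high = b_low, b_high
    while high - low > tol:
        left = high - ratio * (high - low)
        right = low + ratio * (high - low)
        if best_a_and_sse(left)[1] < best_a_and_sse(right)[1]:
            high = right                    # minimum lies in [low, right]
        else:
            low = left                      # minimum lies in [left, high]
    b = (low + high) / 2
    a, sse = best_a_and_sse(b)
    return a, b, sse

# Synthetic data generated from a = 2, b = 0.3 (assumption for illustration).
xs = [0, 1, 2, 3, 4, 5]
ys = [2.0 + math.exp(0.3 * x) for x in xs]
a, b, sse = fit_offset_exponential(xs, ys)
print(f"a = {a:.4f}, b = {b:.4f}, SSE = {sse:.2e}")
```

The search interval for b must bracket the minimum; with real data a plot of the sum of squares against b is a sensible first step.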
14.5 Correlation
Correlation is a measure of the association between two random variables, say X and Y. We do not have for this calculation the assumption that one of these variables is known without error: both variables are assumed to be varying randomly. We do assume for this analysis that X and Y are related linearly, so the usual correlation coefficient gives a measure of the linear association between X and Y. Although the underlying correlation is defined in terms of variances and covariance, in practice we work with the sample correlation coefficient. This is calculated as
$$r_{xy} = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} \qquad (14.20)$$
where $S_{xx}$, $S_{yy}$, and $S_{xy}$ are defined in equations 14.5 to 14.7. This correlation coefficient is often denoted simply by r.
If the points $(x_i, y_i)$ are in a perfect straight line and the slope of that line is positive, $r_{xy} = 1$. If the points are in a perfect straight line and the slope is negative, $r_{xy} = -1$. If there is no systematic relation between X and Y at all, $r_{xy} \approx 0$, and $r_{xy}$ differs from zero only because of random variation in the sample points. If X and Y follow a linear relation affected by random errors, $r_{xy}$ will be close to +1 or −1. These cases are illustrated in Figure 14.14. In all cases, because of the definitions, $-1 \le r_{xy} \le +1$.
Figure 14.14: Illustrations of various correlation coefficients (four panels of y plotted against x): (a) $r_{xy} = +1$, (b) $r_{xy} = -1$, (c) $r_{xy} \approx 0$, (d) $r_{xy}$ close to 1 in magnitude
Examination of equations 14.8, 14.7 and 14.20 will indicate that $r_{xy}$ has the same sign as the slope or regression coefficient, b. Furthermore, $r_{xy}^2 = b\,b'$, where $b'$ is the slope of the regression line of x on y. Then as $r_{xy}$ becomes closer to −1 or +1, the equations of the linear regression line of y on x and the linear regression line of x on y come closer to coinciding with one another.
An approximate expression for the standard error of the sample correlation coefficient has been derived and is available. It is therefore possible to test the hypothesis that the underlying correlation is zero and $r_{xy}$ differs from it only because of random fluctuations. However, this gives the same information as testing the hypothesis that the slope or regression coefficient b differs from zero only because of random fluctuations. We have already seen (Example 14.3) how to test a hypothesis concerning the underlying value of the linear slope, b. Therefore this book will not be concerned with a test of hypothesis for $r_{xy}$.
A common misunderstanding is to assume that a strong correlation between two variables is evidence that one causes the other. This may be correct, but often there is another explanation. Huff (How to Lie with Statistics, see section 15.2 for reference) cites a number of examples in which some other explanation is more likely. One example he quotes is that for certain years there was a close relationship between the salaries of Presbyterian ministers in Massachusetts and the price of rum in Havana, Cuba. Closer examination indicates that both were likely due to a common factor, a marked and widespread inflation of salaries and prices over those years.
If the data come from a well planned, carefully randomized experiment, rather than from accumulated routine data, there is much less probability that other factors are responsible. In that case, it is considerably more likely that a correlation indicates some sort of causal relation between the variables.
The square of the correlation coefficient, $r_{xy}^2$, is called the coefficient of determination. The sum of squares of deviations from $\bar{y}$ in the y-direction is $\sum_{i=1}^{n} (y_i - \bar{y})^2$. The coefficient of determination is the fraction of this sum of squares which is explained by the linear relation between y and x given by regression of y on x. Then the coefficient is given by the ratio of $\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ to $\sum_{i=1}^{n} (y_i - \bar{y})^2$. A closely related quantity is the coefficient of multiple determination, which is useful in multiple linear regression.
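The equivalence between the squared correlation coefficient and the ratio of explained to total sum of squares can be checked numerically. The sketch below is an illustration in Python, not part of the original text; it borrows the temperature and density values from Problem 5 of this chapter purely as sample data, and the variable names are assumptions.

```python
# Sample data: temperature x and density y from Problem 5 of this chapter.
x = [250, 270, 290, 310, 330, 350]
y = [1.955, 1.935, 1.890, 1.920, 1.895, 1.865]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)      # total sum of squares about y_bar
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = s_xy / s_xx
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]

# Sum of squares explained by the regression of y on x.
explained = sum((yh - y_bar) ** 2 for yh in y_hat)

ratio = explained / s_yy
r_squared = s_xy ** 2 / (s_xx * s_yy)          # square of equation 14.20

print(f"explained / total = {ratio:.6f}, r^2 = {r_squared:.6f}")
```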
If the correlation coefficient or the coefficient of determination becomes larger for the same algebraic forms, that indicates that the relationship between the variables has become stronger. However, if an algebraic form changes, say from x to ln x, comparing the values of the coefficients is not useful. Instead, in that case we should compare the variances of the points around the regression lines.
Example 14.5
a) Calculate a correlation coefficient for the data of Example 14.1.
b) What fraction of the sum of squares of deviations in the y-direction from $\bar{y}$ is explained by the linear relation between y and x given by regression?
Answer: a) $r_{xy} = \dfrac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \dfrac{147.637}{\sqrt{(143)(152.815)}} = 0.9987$
b) The fraction of the sum of squares of deviations in the y-direction which is explained by the linear regression of y on x is equal to the coefficient of determination, $r^2 = (0.9987)^2 = 0.997$. Thus, in this case only 0.3% of the sum of squares of deviations in the y-direction from $\bar{y}$ is not explained by the regression.
For comparison, the data shown in Table 14.1 and plotted with the regression line of y on x in Figure 14.2 give a correlation coefficient of 0.952, and the regression explains 90.7% of the sum of squares of deviations in the y-direction from $\bar{y}$.
14.6 Extension: Introduction to Multiple Linear Regression
If there is more than one input or regressor, the basic ideas of linear regression still apply, but the algebra becomes considerably more complicated.
Let us look briefly at three simple cases in which there are only two or three inputs. If the two independent inputs are x and z, which enter only as first powers, the relation for point i with residual $e_i$ becomes
$$y_i = a + b x_i + c z_i + e_i \qquad (14.21)$$
If the two inputs are x and x², the relation still comes under the heading of multiple linear regression:
$$y_i = a + b x_i + c x_i^2 + e_i \qquad (14.22)$$
This could be extended to a longer power series. If both x and z affect y, but now second-order terms are included, the relation for point i becomes
$$y_i = a + b x_i + c x_i^2 + d z_i + e z_i^2 + f x_i z_i + e_i \qquad (14.23)$$
Notice that the term involving $x_i z_i$ represents an interaction of the sort discussed in section 11.3.
Multiple linear regression can include terms of the type discussed in part (a) of section 14.4. These are forms nonlinear in x or y or both, but linear in the coefficients.
Thus, the term "linear" in multiple linear regression refers to fitting equations which are linear in the coefficients. For example, data for vapor pressure of pure components are sometimes related to temperature by expressions of the following form:
$$\ln y = a + \frac{b}{x} + c \ln x + d\,x^{6}$$
This equation can be fitted to data by multiple linear regression.
As the number of terms increases, the complexity of the algebra increases. For each additional term there is another coefficient to be determined by the method of least squares. The algebra of the theoretical development becomes simpler if we use matrix notation, but the resulting expressions still have to be expanded into scalar algebra for calculations.
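As a rough illustration of what "expanding into scalar algebra" means in practice, the sketch below fits equation 14.22 by forming and solving the normal equations. It is written in plain Python with no external libraries; it is an assumption-laden teaching sketch, not the method used by packages such as SAS or SPSS. The function names and the synthetic, error-free data are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * unknowns = rhs by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution

def fit_linear_in_coefficients(rows, ys):
    """Least squares for y = sum_k coef[k] * rows[i][k] via normal equations."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Synthetic data from y = 1 + 2x + 0.5x^2 (assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0 + 2.0 * x + 0.5 * x * x for x in xs]
rows = [[1.0, x, x * x] for x in xs]     # columns: constant, x, x squared
a, b, c = fit_linear_in_coefficients(rows, ys)
print(f"y = {a:.3f} + {b:.3f} x + {c:.3f} x^2")
```

Adding another term, such as z or xz, only adds a column to each row; the same two functions apply unchanged, which is why the model is called linear in the coefficients.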
Furthermore, the analysis of multiple linear regression frequently involves recalculation with more or fewer terms. We add terms to try to describe the relationship more fully, or we remove terms that do not contribute significantly to a useful description.
Thus, present-day calculations of multiple linear regression are almost always done on a digital computer using specialized software. Various computer packages, such as SAS and SPSS, are extremely useful once the reader has a good grasp of the fundamentals.
If the data for multilinear regression come from routine operating data rather than from a designed experiment, we have to worry about possible correlation among the inputs. That is eliminated if data are from a designed experiment with appropriate randomization.
Problems
1. Scraps of iron were selected on the basis of their densities, $x_i$, and their iron contents, $y_i$, were measured. The results were as follows:
   x_i   2.8  2.9  3.0  3.1  3.2  3.2  3.2  3.3  3.4
   y_i   27   23   30   28   30   32   34   33   30
   Find the regression equation of y on x by the method of least squares.
2. For the data given in problem 1 above, use a graph to check whether the form of the equation represents the data adequately, whether the variance appears to be independent of x, and whether the residuals appear to be normally distributed.
3. For the data given in problem 1 above, assume that graphical checks for fit, constancy of variance, and normality are satisfied. Find:
   a) the standard error of the slope of the regression equation, and
   b) the 95% confidence limits of the slope.
4. For the data given in problem 1 above,
   a) find the sample correlation coefficient
   b) what percentage of the variation of $y_i$ about $\bar{y}$ is explained by regression?
5. The density of molten salt mixtures, y g/cm³, was measured at various temperatures x °C. The results were:
   x_i   250    270    290    310    330    350
   y_i   1.955  1.935  1.890  1.920  1.895  1.865
   a) Plot a graph of y vs. x showing these points (the graph is called a scatter diagram).
   b) Calculate Σx, Σy, Σx², Σy², Σxy, $\bar{x}$, $\bar{y}$.
   c) Calculate the regression equation of y on x in the form y = a + bx. Plot this line and the centroidal point on the graph.
   d) Calculate the regression of x on y in the form x = a′ + b′y. Plot on the same graph as for parts (a) and (c).
6. For the data given in problem 5 above, use a graph to check whether the form of the equation in part (c) represents the data adequately, whether the variance appears to be independent of x, and whether the residuals appear to be normally distributed.
7. Assume that the graphical checks of problem 6 above are satisfactory. Then for the data given in problem 5 above:
   a) Calculate the estimated variance around the regression line, $s_{y|x}^2$.
   b) Is the estimated regression coefficient, b, significantly different from zero at the 1% level of significance? Can we conclude that temperature has a significant effect on density in this case?
   c) What is the 95% confidence interval for β, the true slope or regression coefficient?
   d) Calculate the 95% confidence interval for the mean value of y at each of x = 250, 300, 350.
   e) Suppose we repeat the experiment. What is the 95% prediction interval for individual values of y at each of x = 250, 300, 350?
8. For the data given in problem 5 above,
   a) Calculate the correlation coefficient for this data set.
   b) Calculate the coefficient of determination.
9. A physical measurement, such as intensity of light of a particular wavelength transmitted through a solution, can often be calibrated to give the concentration of a particular substance in the solution. 9 pairs of values of intensity (x) and concentration (y) were obtained and can be summarized as follows:
   Σx = 30.3, Σy = 91.1, Σxy = 345.09, Σx² = 115.11, Σy² = 1036.65.
   a) Find the regression equation for y on x.
   b) Find the correlation coefficient between x and y.
   c) Assuming that graphical checks are satisfactory, test the null hypothesis that the slope of the regression line of y on x is not significantly different from zero, using the 1% level of significance, and
   d) Find the 99% confidence limits of the slope of the straight line relation giving y as a function of x.
10. An engineering student has a summer job with the forestry service. He measured the tree trunk diameters (x) and related them to the age of the tree (y). The following information was obtained:
    n = 6, Σx = 21, Σx² = 91, Σy = 264, Σy² = 142.52, Σxy = 113.8
    a) Find:
       i) the regression equation of y on x
       ii) the correlation coefficient between x and y.
    b) Assuming that graphical checks are satisfactory, test whether the regression coefficient is significantly different from 0 at the 1% level of significance.
11. Gasoline consumption of a test automobile was recorded at speeds (x) ranging from 56 to 112 km/hr. The observed gasoline consumptions were converted to distance traveled per liter of gasoline (y). The following information was compiled:
    Σx = 984, Σx² = 84416, Σxy = 13418.4, Σy = 165.3, Σy² = 2282.45, n = 12.
    a) Find the regression equation for y on x.
    b) Find the correlation coefficient.
12. Assuming that the graphical checks are satisfactory, for the data given in problem 11 above:
    a) Test the hypothesis that the slope (regression coefficient) is equal to −0.045, using the 5% level of significance.
    b) Find the 90% confidence limits for the mean distance traveled per liter of gasoline at 90 km/hr speed.
13. It is much easier to measure diameters of spot welds than to measure their shear strengths, and under some conditions they are related. Corresponding values were obtained for 10 welds, with shear strengths expressed by y p.s.i. and weld diameters, x, expressed as thousandths of an inch. For these 10 pairs of data,
    Σy = 22860, Σx = 2325, Σy² = 67,719,400, Σx² = 697,425, Σxy = 6,872,250.
    a) Find the regression equation for y on x.
    b) Find the correlation coefficient.
    c) Assuming that graphical checks are satisfactory:
       i) Is the slope of the regression equation significantly different from 0 at the 5% level of significance?
       ii) Test the hypothesis that the slope (regression coefficient) is equal to 10, using the 1% level of significance.
       iii) Find the 95% confidence limits for the true shear strength of a weld 0.210 inches in diameter.
14. Ms. Patsy Knowlet, a water quality engineer, noted that there seemed to be a close connection between an important stream flow water quality parameter, Y, and the flow, X m³/s. She found that 9 pairs of observations yielded the following data: Σx = 15.2; Σx² = 57.6; Σy = 45.6; Σy² = 518.3; Σxy = 172.6. She would like to develop an equation that would allow her to predict Y knowing X.
    a) Find the best estimate of the linear regression line of y on x.
    b) Find the correlation coefficient between x and y.
15. Assume that the appropriate graphical check is satisfactory. Then for the data given in problem 14 above:
    a) Find the 95% confidence limits of the slope or regression coefficient, β.
    b) What are the 95% confidence limits for the predicted water quality parameter, Y, at a flow of 4.6 m³/s?
16. Shear stress (y) and rate of shear (x) can be measured for a liquid in a viscometer. For 12 pairs of values the data can be summarized as follows: Σx = 132.0, Σy = 151.7, Σx² = 1944.0, Σy² = 2570.48, Σxy = 2233.2
    a) Find the linear regression of y on x.
    b) What is the correlation coefficient?
    c) What fraction of the variance of y is explained by the correlation?
17. Assume that the appropriate graphical check is satisfactory. Then for the data given in problem 16 above:
    a) Find the standard error of the slope.
    b) Find the 95% confidence interval of the slope.
    c) Is the slope significantly smaller than a slope of 1.210 at the 5% level of significance?
18. The number of errors per hour of radio telegraphists (y) as a function of the temperature (x) is determined. The relevant data about the relationship are as follows:
    Σx = 118      Σx² = 3510
    Σy = 56.1     Σy² = 809.63
    Σxy = 1679.2  n = 4
    a) Find the regression line of errors/hour on temperature.
    b) Find the correlation coefficient between temperatures and errors/hour.
    c) Assuming that graphical checks are satisfactory:
       i) Is the slope of the regression line significantly different from zero? What does this imply about the relation between y and x?
       ii) Find the 99% confidence limits of the slope of the regression line.
19. The relationship between the temperature of a rocket engine (t) and the thrust force (f) was investigated in a series of tests. Pairs of data for t and f (in suitable units) were collected and can be summarized as follows: n = 15; Σt = 540; Σf = 33.00; Σt² = 21426; Σf² = 77.08; Σtf = 1267.10
    a) Find the regression equation of f on t.
    b) Assuming that graphical checks are satisfactory:
       i) Find the 95% confidence limits of the intercept and slope.
       ii) According to Snooker's theory, the slope of the straight line relating f and t in this range of values should be 0.0500. Do the data disagree significantly with Snooker at the 5% level of significance?
20. The speed (rpm) of a Danor ventilation fan was varied and the air flow capacity (cubic meters per second) of the fan was measured. The data pairs of capacity (y) versus speed (x) were collected. The following information was obtained:
    n = 12, Σx = 118.98, Σy = 30.09, Σx² = 1251.06, Σy² = 80.57, Σxy = 317.34.
    a) Find the regression equation of Y on X, and
    b) Find the correlation coefficient between X and Y.
    c) Assuming that graphical checks are satisfactory, test whether the slope of the regression line is significantly different from zero at the 1% level of significance.
21. The value of Y, the percentage decrease of volume of leather from the value at one atmosphere pressure, was measured for certain fixed values of high pressure, x atmospheres. The relevant data about the relationship are as follows:
    Σx = 28,000, Σy = 19.0, Σxy = 148,400, Σx² = 216,000,000, Σy² = 102.2, n = 4.
    a) Find the regression line of y on x.
    b) Find the correlation coefficient between decrease of volume and pressure.
    c) Assuming that graphical checks are satisfactory:
       i) Test whether the slope of the regression line is significantly different from zero at the 1% level of significance.
       ii) Find the 99% confidence limits of the slope of the regression line.
CHAPTER 15
Sources of Further Information
15.1 Useful Reference Books
Many reference books are available in the area of probability and statistics for engineers. Some that I have found useful will be mentioned here. Detailed references are given in section 15.2.
Some readers may want books that are a little more theoretical and advanced than this one. The book by Walpole and Myers is such a book. It is clearly written and contains such topics as Bayesian and maximum likelihood approaches to estimation. It also contains a chapter on nonparametric statistics and one on statistical quality control.
The book by Burr focuses on statistics while providing a good background in probability. That author uses to good effect his experience as a statistical consultant.
The book by Vardeman concentrates on statistics and is very good in that general area, including considerable discussion on design of experiments and analysis of experimental results. It contains a large number of reports on study projects done by undergraduate students.
On the other hand, the book by Ziemer concentrates on probability and its applications in electrical engineering, rather than on statistics. Some readers of the present book will want now or later a more mathematically rigorous development of probability; they should consider the book by Ziemer.
The book by Ross also takes a more rigorous approach to probability and statistics.
The book by Barnes is notable because it takes an approach strongly based on using a computer. It uses diskettes of specially formulated software for calculations involving probability and statistics.
The book by Kennedy and Neville has been popular with engineering students because it includes many problems from practical engineering situations, both solved problems and problems for the student to solve. The present book tries to follow the example of Kennedy and Neville in that regard.
Finally, there are four areas for which specialized books should be recommended. One is design and analysis of experiments, for which mention has already been made in Chapters 11 and 12 of books by Box, Hunter, and Hunter and by Montgomery. Another area is regression, both simple and multiple, for which I have found the book by Draper and Smith very useful. A more up-to-date reference, and an excellent source of information, is the book by Ryan. A third area is the application of probability to the theory of electrical communication systems, for which the book by Haykin is suitable. The fourth is engineering reliability, for which the book by Billinton and Allan is suggested.
15.2 List of Selected References
Barnes, J. Wesley. Statistical Analysis for Engineers and Scientists, A Computer-based Approach. New York: McGraw-Hill, 1994.
Billinton, Roy, and Ronald N. Allan. Reliability Evaluation of Engineering Systems: Concepts and Techniques, Second Edition. New York: Plenum Press, 1996.
Box, G.E.P., W.G. Hunter, and J.S. Hunter. Statistics for Experimenters. New York: Wiley, 1978.
Burr, Irving W. Applied Statistical Methods. New York: Academic Press, 1974.
Haykin, Simon. Communications Systems, Fourth Edition. New York: Wiley, 2000.
Huff, Darrell. How to Lie with Statistics. New York: Norton, 1954, 1993 (reissue).
Johnson, Richard A. Miller & Freund's Probability and Statistics for Engineers, Sixth Edition. New Jersey: Prentice Hall, 2000.
Kennedy, John B., and Adam N. Neville. Basic Statistical Methods for Engineers and Scientists, Third Edition. New York: Harper & Row, 1986.
Mendenhall, William, Dennis D. Wackerly and Richard L. Scheaffer. Mathematical Statistics with Applications, Fourth Edition. Boston: PWS-Kent, 1995.
Montgomery, Douglas C. Design and Analysis of Experiments, 5th Edition. New York: Wiley, 2000.
Ross, Sheldon M. Introduction to Probability and Statistics for Engineers and Scientists, Second Edition. Academic Press, 2000.
Ryan, Thomas P. Modern Regression Methods. New York: Wiley, 1997.
Vardeman, Stephen B. Statistics for Engineering Problem Solving. Boston: PWS Publishing, 1994.
Walpole, Ronald E., and Raymond H. Myers. Probability and Statistics for Engineers and Scientists, 7th Edition. New Jersey: Prentice Hall, 2002.
Ziemer, Rodger E. Elements of Engineering Probability & Statistics. New Jersey: Prentice Hall, 1997.
Appendices
Appendix A contains probability tables for use in statistical calculations. These are for the normal distribution, the t-distribution, the chi-squared distribution, and the F-distribution. The numbers in the tables were calculated using MS Excel.
Appendix B describes some of the abilities and properties of MS Excel which are useful in statistical calculations. This appendix gives brief instruction in using Excel for such a purpose, but the reader is assumed to have a basic knowledge of Excel already.
Appendix C describes some functions of Excel not recommended for use while the reader is learning the fundamentals of probability and statistics. These can save time in calculations once the reader has fully understood the fundamentals.
Appendix D contains answers to some of the problem sets.
Appendix A: Tables
Table A1: Cumulative Normal Probability, Φ(z) = Pr[Z < z]
Table A2: t-distribution (one-tail probability)
Table A3: Chi-squared Distribution (upper-tail probability)
Table A4: F-Distribution. Values of F with df1 degrees of freedom in the numerator and df2 degrees of freedom in the denominator to give upper-tail probabilities of 0.05 and 0.01
Appendix B: Some Properties of Excel Useful During the Learning Process
(a) Formulas
A formula combines values with operators such as a plus sign. We will be concerned at present with only the arithmetic operators, which are +, -, /, *, %, and ^. Values may be expressed as number constants, such as 34.7, or as references to the content of a cell, such as F28 (meaning the cell that is both in column F and in row 28). Instead of references we may use names, such as Cost, if those names have been defined. An Excel formula always begins with an equal sign, =. That sign indicates to the computer that the content of a cell is a formula that needs to be evaluated. In many cases the most convenient method of inserting an Excel function is to paste it into the appropriate cell. This is discussed briefly at the end of section 5.5.
As always in engineering calculations, we must make clear how we are performing a calculation. When the formula in a cell has been entered correctly, the corresponding cell on the computer screen will show the arithmetic result for that formula. For example, entering the formula =20+34 will give the result, 54, and that will show in the space for the cell on the screen. If we select that cell, the Formula Bar will show the formula. But if we print the worksheet, we will see in that cell only the result, 54, and the formula will not appear. Then to make the printed worksheet more understandable, a neighboring cell (usually to the left or right of the cell in question or above it) should give a clear statement of the formula being used. That statement must not begin with an equal sign, or Excel will interpret it as the formula itself. Instead, it should (for purposes of this book) end with an equal sign, e.g., 20 + 34 =. We see instances of this in the body of the text, such as Examples 3.4, 4.4, and 4.5.
(b) Array Formulas
An ordinary formula, as in part (a) above, produces a result in just one cell. Often we want to produce results simultaneously in two or more cells. For that we use an array formula. For example, we may want to calculate the deviation of each measurement from the mean of those measurements. Say the measurements are in rows 18 to 88 of column B, which we show as B18:B88, and the mean of the measurements is in cell B90. We could calculate each deviation separately, as =B18-B90 in cell C18, =B19-B90 in cell C19, and so on. A faster alternative is to calculate them all together by the array formula =B18:B88-B90 in cells C18:C88. It is clear that the array formula can be applied much more quickly than the 71 individual formulas.
To apply the array formula we first select the cells in which we want the answer to appear, cells C18:C88 in this case. Then we type in the formula, which is =B18:B88-B90 for this example. Then to indicate to the computer that we are applying an array formula, we press not just Enter but simultaneously CONTROL+SHIFT+ENTER (note: three keys) in Excel for Windows, or COMMAND+ENTER in Excel for Macintosh. The array formula is shown in the Formula Bar inside braces, { }, but do not type the braces yourself.
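For readers who also work outside Excel, the same "whole column at once" idea can be sketched in Python. This is only an analogy offered for illustration; the measurement values below are made up, and the names are not from the text.

```python
# Hypothetical measurements standing in for the contents of B18:B88.
measurements = [4.2, 3.9, 4.5, 4.1, 4.3]

# Counterpart of the mean held in cell B90.
mean = sum(measurements) / len(measurements)

# Counterpart of the array formula =B18:B88-B90 entered in C18:C88:
# one expression produces every deviation at once.
deviations = [value - mean for value in measurements]

print(deviations)
```

As with the array formula, the deviations from the mean always sum to zero apart from rounding, which is a quick check that the calculation was set up correctly.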
(c) Sorting
The Sort command can save a good deal of effort in developing a frequency distribution and in finding quantiles from the distribution. The Sort command is on the Data menu of Excel. For example, say the data we want to sort are in columns A and B, and we want to sort according to the numbers or letters in column B, which is headed Thickness in row 1. We select columns A and B, then from the Data menu we choose Sort. The Sort dialog appears, and we click the button indicating that the list has a header row. We select sorting by the heading Thickness in ascending order, increasing in magnitude from the smallest to the largest, then click the OK button. (Note that if the first results are not in the form desired, we can immediately afterward choose Undo Sort from the Edit menu, then try again.) After the data have been sorted from the smallest to the largest we can number them in order, say in column C, by entering 1 in the first row and 2 in the second row, then completing the series by selecting these first two cells in the column, then dragging the fill handle down and releasing the mouse button when all the data have been numbered.
(d) Summing
Of course we can add up a column of figures and put the result in cell B6 by selecting that cell and typing (say) =B1+B2+B3+B4+B5, then pressing enter or return. It is usually faster and more convenient to sum a column or row of data by selecting the cell at the end of that column or row of data and clicking the AutoSum tool on the standard toolbar. The AutoSum tool is marked with the Greek letter sigma, Σ. After it is clicked, the cells which will be added are surrounded by dotted lines, and the formula bar shows =SUM(B1:B5) if cells B1 to B5 are the ones in question. If there are possible cells to the left of the selected cell as well as above it, either set of cells might have been chosen. Then we need to make sure that we have the right ones. If we select the cells we want just before we select the cell for the sum, it seems to come out right, but that may not always be so.
(e) Functions
An Excel function is a special prewritten formula that uses a value or values as input,
performs an operation, and returns a value or values as a result. Excel functions vary
greatly in complexity, from simple functions that add up input quantities to complex
functions that perform a multitude of tasks in a particular sequence. Excel functions
can be inserted by either of two methods. One is to use the Insert menu and choose
Function. Then the Paste Function dialog appears, and we choose a category such as
Math & Trig or Statistical, then the required function such as Sum or Frequency. A
dialog box appears to prompt us to choose values of the argument of the function.
The alternative method is to type the equals sign (since the function is a type of
formula), the name of the function, and then the arguments within parentheses and
separated by commas [for example, =SUM(A1,A2)].
Many of the Excel functions are not recommended for use while a person is
learning the fundamentals of probability and statistics. That is because they act as
black boxes that perform calculations without requiring any thinking on the part of
the person using them; the person has only to supply the input values and the computer
supplies the logic. Thus, these functions do not help the process of learning the
fundamentals. Some of these functions which are useful at a later stage, when a
person has already gained a good fundamental knowledge of probability and statistics,
will be listed in Appendix C.
However, a few functions can be recommended for use even when the fundamentals
are being learned. They are as follows.
(i) Sum Function
The SUM function simply adds up the arguments or, if the arguments are references,
the contents of the cells. The Sum function in Excel is found in the
Mathematics and Trigonometry category. The arguments for this function may be
arrays, such as B1:B5; references to individual cells, such as A26; or numbers,
such as 5. Thus, =SUM(16,13) gives 29. If A1 contains 11 and B1 contains 7,
=SUM(A1,B1) gives 18. If A2 contains 14, B2 contains 9, and C2 contains 6,
=SUM(A2:C2) gives 29. In fact, the AutoSum tool which we saw above uses the
SUM function.
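The three SUM examples just given can be checked with a short Python sketch. The dictionary of cell contents is only a stand-in for the worksheet and is not part of Excel.

```python
# =SUM(16,13): literal numbers as arguments.
total_numbers = sum([16, 13])

# Stand-in for the worksheet cells named in the text.
cells = {"A1": 11, "B1": 7, "A2": 14, "B2": 9, "C2": 6}

# =SUM(A1,B1): references to individual cells.
total_cells = cells["A1"] + cells["B1"]

# =SUM(A2:C2): a range, here the three cells A2, B2 and C2.
total_range = sum(cells[name] for name in ("A2", "B2", "C2"))
```

The three totals come to 29, 18 and 29, in agreement with the values quoted above.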
(ii) Frequency Function
The FREQUENCY function counts the numbers of values within given class
boundaries and returns a frequency array. It is found in the category of Statistical
functions. In most cases the arguments of the FREQUENCY function are, first,
the reference to the array of cells containing values of the data which are to be
counted; then, second, the reference to the array of cells giving the upper class
boundaries in ascending order (that is, beginning with the smallest and working
up). As usual, these arguments are separated by commas. The number of values
less than the lowest upper class boundary appears in the first cell, and the numbers
of values more than previous upper class boundaries but less than successive
upper class boundaries appear in subsequent cells, ending with the number of
values larger than the largest class boundary. Thus, the number of cells for class
frequencies is one more than the number of upper class boundaries.
The procedure is to select the vertical or horizontal array of cells where we want
the class frequencies to appear, then to enter =FREQUENCY(input reference, class
boundaries reference). An illustration of the use of the frequency function is given in
Example 4.4, where the array formula =FREQUENCY(B2:B122,B135:B143) was
entered in cells D135:D144. Then the values of the data were taken from cells
B2:B122, the upper class boundaries were in cells B135:B143, and the resulting
frequencies were placed in cells D135:D144.
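The counting rule described for FREQUENCY can be sketched in Python as follows. The function name frequency and the sample data are hypothetical. The sketch assumes that a value exactly equal to a boundary is counted in the class below it, which is how Excel's FREQUENCY is documented to behave; when class boundaries are chosen so that no data value can fall on one, the question does not arise.

```python
from bisect import bisect_left

def frequency(data, upper_boundaries):
    """Count values per class; upper_boundaries must be in ascending order."""
    # One extra class holds values above the largest boundary.
    counts = [0] * (len(upper_boundaries) + 1)
    for value in data:
        # Index of the first boundary that is >= value.
        counts[bisect_left(upper_boundaries, value)] += 1
    return counts

# Hypothetical data and upper class boundaries.
data = [3.2, 4.8, 5.1, 6.7, 7.3, 9.9, 12.4]
boundaries = [4.95, 6.95, 8.95]
class_counts = frequency(data, boundaries)
```

With three upper class boundaries the result has four entries, one more than the number of boundaries, just as the frequencies in Example 4.4 occupy one more cell than the boundaries do.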
(f) Making Histograms or Other Charts or Graphs
As we see in Chapter 4, histograms are used frequently to show graphically the class
frequencies for various classes or intervals of the variate. The information for a
histogram is contained in a grouped frequency table. The Chart Wizard provides a
convenient way to produce a histogram or other chart or graph.
A chart can be produced from a table of data (for a histogram, that would be a
grouped frequency table) by selecting Chart from the Insert menu. Modifications,
major or minor, of the chart are produced using the Chart menu. The procedure for
histograms is discussed in more detail in section 4.5, particularly in Example 4.4.
Appendix C: Functions Useful Once the
Fundamentals Are Understood
There are a number of Excel functions which should not be used during the learning
process but can be very useful later on. The following statistical functions fall in this
category:
AVEDEV() calculates the mean of the absolute deviations from the mean (see
section 3.3.4).
AVERAGE() returns the arithmetic mean of the arguments.
COUNT() counts the numbers in the list of arguments.
COUNTA() counts the number of nonblank values.
DEVSQ() calculates the sum of squares of deviations of data points from their
sample mean, e.g. $\sum (x_i - \bar{x})^2$.
GEOMEAN() returns the geometric mean of the arguments.
HARMEAN() gives the harmonic mean of the arguments.
LARGE(array,k) returns the kth largest value in the array.
MAX() gives the maximum value in a list of arguments.
MEDIAN() returns the median of the stated numbers.
MIN() gives the minimum value in a list of arguments.
MODE() returns the mode of the data set.
PERCENTILE(array,k) returns the kth percentile of numbers in the array.
PERCENTRANK(array,x,) returns the percentage rank of x among the values in the
array.
QUARTILE(array,) returns the minimum, maximum, median, lower quartile, or upper
quartile from the array.
RANK() gives the rank (order in a sorted list) of a number.
STDEV() gives the sample standard deviation, $s$, of a set of numbers.
STDEVP() calculates the standard deviation, $\sigma$, of a set of numbers taken as a
complete population.
TRIMMEAN(array,) calculates the mean after a certain percentage of values are
removed at the top and the bottom of the set of numbers.
VAR() returns the sample variance, $s^2$, of a set of numbers.
VARP() finds the variance, $\sigma^2$, of a set of numbers taken as a complete population.
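To make the distinction between the sample and population versions concrete, several of these quantities can be computed with Python's standard statistics module. This is a sketch with invented data. The dictionary keys are only labels naming the Excel function each line corresponds to, and the Python functions are stand-ins for the Excel ones, not the same code.

```python
import statistics as st

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data set
mean = st.mean(data)               # corresponds to AVERAGE()

results = {
    "AVEDEV": sum(abs(x - mean) for x in data) / len(data),
    "DEVSQ": sum((x - mean) ** 2 for x in data),
    "MEDIAN": st.median(data),
    "STDEV": st.stdev(data),       # sample: divisor n - 1
    "STDEVP": st.pstdev(data),     # population: divisor n
    "VAR": st.variance(data),      # sample variance
    "VARP": st.pvariance(data),    # population variance
}
```

For these eight values the sum of squared deviations is 32, so the population variance is 32/8 = 4 while the sample variance is 32/7, which shows how the two versions differ only in the divisor.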
Appendix D: Answers to Some of the Problems
The following answers are believed to be correct, but if you find different answers
which seem right, you should check with your instructor.
In chapter 2, section 2.1, problem set beginning on page 10:
1 (a) 3/14, (b) 9/14, (c) 11/14
2 (i) 0.644, (ii) 0.689, (iii) 0.089, (iv) 0.267
4 (a) 0.0909, (b) 0.143
6 (a) 64, (b) 84, (c) 52, (d) 0.619
8 (a) (i) 1/6, (ii) 5 to 1, (iii) 1 to 5
  (b) (i) 1/26, (ii) 25 to 1, (iii) 1 to 25
In chapter 2, section 2.2, problem set beginning on page 25:
1 (a) 0.045, (b) 0.955, (c) 21 to 1
3 (i) 80, (ii) 0.750, (iii) 0.340
5 (a) 26, (b) 0.308
8 6
9 3
11 (a) 0.216, (b) 0.324, (c) 0.216
13 For C-F-C 0.512. For F-C-F 0.384. Then choose C-F-C.
16 (a) 0.904, (b) 0.0475, (c) 0.0250
19 (a) 0.192, (b) 0.344, (c) 0.757
21 (a) (i) 0.526, (ii) 0.0526, (b) 0.0093, (c) $1.53 \times 10^{-9}$
In chapter 2, section 2.3, problem set beginning on page 32:
1 5040
3 (i) 36, (ii) 15, (iii) 26
7 10 combinations
9 (a)6.110
4
,(b)4.9510
4
,(c)1.5410
6
11 56
15 (a) 0.067, (b) 0.333
In chapter 2, section 2.4, problem set beginning on page 38:
1 (a) 0.261, (b) 0.652
3 (a) 0.907, (b) 0.118, (c) 0.282
7 (a) 0.28, (b) 0.755, (c) 0.371
9 (a) 1.82%, (b) 29.6%, (c) 26.7%
In chapter 3, sections 3.1 to 3.4, problem set beginning on page 60:
1 21.575 mm, 21.57 mm
2 (a) 0.0746 mm², 0.273 mm, 1.27%
  (b) 0.0895 mm², 0.299 mm, 1.39%
In chapter 4, sections 4.1 to 4.5, problem set beginning on page 80:
3 (e) 79, 75, 84, (f) 79.4, (i) 80%
In chapter 5, sections 5.1 and 5.2, problem set beginning on page 91:
1 (b) 1.23
3 (a) 1.50, 0.583, (b) 0.0917
5 (b) 2.333, (c) 0.556, 0.745
9 (a) 0.162, (b) 57%
In chapter 5, section 5.2, problem set beginning on page 97:
1 $1425
7 (a) 9.875, 10.12, (b) 0.830, (c) 0.059
9 (a) 0.717, (b) $350, (c) 8.47
11 (a) $2.25 million, (b) $0.30 million
In chapter 5, section 5.3, problem set beginning on page 111:
3 0.264
5 (b) 2/3, 1/3, (d) 0.812, (e) 28.9%
7 (a) 0.0137, (b) 0.0152
In chapter 5, section 5.4, problem set beginning on page 126:
5 (a) 1.20, (b) 1.22
7 (b) 0.36, (c) 0.20, (d) 1.13
9 (a) 0.717, (b) 0.14, (c) 0.036
11 (a) 0.45, (b) 0.19, (c) 0.14, (d) 0.05
In chapter 5, section 5.6, problem set beginning on page 138:
1 0.0575, 0.6227 vs 0.600 etc.
5 (b) 1.225, 1.2297, (c) 1.107, (d) 0.0014
In chapter 6, section 6.1, problem set beginning on page 147:
3 (c) (i) 0.393, (ii) 0.368, (iii) 0.238
In chapter 6, section 6.2, problem set beginning on page 153:
1 (a) 1.5, (b) 0.25, (c) 0, (d) 1.65, (e) 0.533
3 (b) 1.5, (c) 0.2887, (d) 0.5774
5 (a) 1/3 month, (b) 1/3 month, (c) 0.865, (d) 0.950
In chapter 7, sections 7.1 to 7.4, problem set beginning on page 170:
1 (a) 95.2%, (b) 0.5%
3 (a) 0.4%, (b) 98.6%
7 (a) 50.89 kg, (b) 39.8%, (c) 51.2 kg
13 (a) 0.215 cm, (b) 0.826
17 $13,660
21 (a) (i) 0.115, (ii) 0.576, (iii) 0.309
   (b) 332 L/min, (c) $29.0/hr
In chapter 7, sections 7.5 to 7.7, problem set beginning on page 193:
1 (b) 0.3125, (d) 0.308
5 (a) (i) 0.002, (ii) 0.005, (iii) 0.01
  (b) (i) 0.370, (ii) 0.371, (iii) 0.390
7 0.0015, 0.0015, 0.003
In chapter 8, sections 8.1 to 8.4, problem set beginning on page 208:
1 (i) 0.147 kg, (ii) 2.94 kg
3 (a) 12.6%, (b) 100.63 kg, (c) 5.008 kg
5 (a) $55.46, (b) 0.064
13 0.019
In chapter 9, section 9.1.1, problem set beginning on page 218:
1 z = 2.14 > 1.96. Adjustment required.
3 Observed level of significance is < 0.1%. Significant at 1% level.
5 (a) 11.1%, (b) 0.3%
9 (a) 37.21 kg, (b) 0.019, (c) 37.5 kg, 0.031
In chapter 9, section 9.1.2, problem set beginning on page 223:
5 (a) 106, (b) 74%, (c) 0.725%
7 (a) 0.28, (b) 31
11 (b) 17.4%, (c) 14
13 (a) 17, (b) 98.9%
15 (a) z = 2.08, adjust, (b) 0.046, (c) 42
In chapter 9, section 9.2, problem set beginning on page 240:
1 (a) t = -2.67, yes, (b) 0.57 to 0.95 ppm
7 $t_{\text{calculated}} = 1.686$, $t_{\text{critical}} = 2.201$, not significant
9 $t_{\text{calculated}} = 1.745$, $t_{\text{critical}} = 2.571$, no significant difference
13 (a) $t_{\text{calculated}} = 1.673$, $t_{\text{critical}} = 2.365$, difference not significant
   (b) $t_{\text{calculated}} = 1.038$, $t_{\text{critical}} = 1.761$, difference not significant
In chapter 9, sections 9.1 and 9.2, problem set beginning on page 245:
3 2.06 < 2.33 so no
7 (a) 4.41 > 1.701 so yes
  (b) 1.63 < 1.701 so no
In chapter 10, section 10.1, problem set beginning on page 257:
7 (a) 0.10 level of signif. gives limit 3.18. 2.04 < 3.18, so not significant
  (b) 0.05 level of signif. gives limit 1.88, 2.66 > 1.88, so significantly more
13 2.59 < 2.71, so not significantly higher
In chapter 10, section 10.2, problem set beginning on page 276:
1 (c) 3
5 (b) 5.6%, (c) 8.1%
In chapter 14, sections 14.1 to 14.5, problem set beginning on page 368:
5 (c) $y = 2.141 - 7.71 \times 10^{-4} x$
  (d) $x = 2210 - 1000 y$
7 (a) $3.09 \times 10^{-4}$
  (b) 3.67 < 4.60 so not significant. No
  (c) $-1.88 \times 10^{-4}$ to $-13.54 \times 10^{-4}$
  (d) At x = 250, 1.913 to 1.984. At x = 300, 1.890 to 1.930
  (e) At x = 300, 1.857 to 1.963. At x = 350, 1.811 to 1.932
8 (a) 0.878, (b) 0.771
9 (a) y = 0.257 + 2.930x
  (b) 0.991
  (c) 19.686 > 3.499 so significant
  (d) 2.41 to 3.45
Engineering Problem-Solver Index
This handy index shows all of the solved example problems arranged by engineering
application.
A
Analysis of data using ANOVA (Analysis of Variance)
  Example 12.1, p. 299
  Example 12.2, p. 308
  Example 12.3, p. 312
  Example 12.4, p. 318
Analysis of data using chi-squared test
  Example 13.2, p. 328
  Example 13.3, p. 329
  Example 13.4, p. 331
  Example 13.5, p. 333
C
Chemical process control
  Example 9.1, p. 215
Choosing a distribution type for a particular application
  Section 6.3, p. 155
Correlation
  Example 14.5, p. 367
E
Estimating demand using Poisson distribution
  Example 5.14, p. 121
  Example 5.17, p. 136
Experiment design, testing effectiveness
  Example 9.9, p. 237
  Example 9.10, p. 238
  Example 10.3, p. 254
  Example 10.4, p. 254
  Example 11.1, p. 276, 281
  Example 11.2, p. 281
  Example 11.3, p. 282
  Example 11.4, p. 283
  Example 11.5, p. 284
  Example 11.6, p. 286
  Example 11.7, p. 287
  Example 11.8, p. 288
M
Metal analysis
  Example 9.8, p. 235
O
Ore sample analysis
  Example 9.3, p. 221
  Example 9.4, p. 222
P
Particle size distribution
  Example 7.11, p. 191
Plotting and analyzing data sets
  Example 4.1, p. 63
  Example 4.2, p. 68
  Example 4.3, p. 72
  Example 4.4, p. 75
Process control
  Example 10.1, p. 250, 256
Production line quality
  Example 3.2, p. 51
  Example 3.5, p. 58
R
Random sampling
  Example 8.2, p. 201
  Example 8.4, p. 203
  Example 9.2, p. 216
  Example 10.7, p. 262
  Example 10.8, p. 263
  Example 10.9, p. 264
Regression analysis of data set
  Example 14.1, p. 355
  Example 14.2, p. 357
  Example 14.3, p. 358
  Example 14.4, p. 361
Reliability, time to failure
  Example 7.2, p. 156, 163
  Example 7.3, p. 166
  Example 8.5, p. 204
S
Sampling components on production line
  Example 2.4, p. 12
  Example 2.14, p. 30
  Example 2.16, p. 34
  Example 5.8, p. 106
  Example 5.10, p. 110
  Example 5.15, p. 125
  Example 5.16, p. 135
  Example 8.6, p. 204
  Example 9.7, p. 233
  Example 10.2, p. 251
T
t-distribution
  Example 10.10, p. 266
  when to use over normal distribution, p. 228-229
Testing for defective components
  Example 2.16, p. 34
  Example 5.5, p. 103
  Example 5.9, p. 108
Index
A
addition rule, 11
alternative hypothesis, 213
analysis of variance, 255, 294-321
  one-way, 295-304
  two-way, 304-316
ANOVA, 294
applications, 4
arithmetic mean, 41-42
axioms of probability, 9
B
Bayes' Rule, 34-38
Bernoulli distribution, 132
beta distribution, 156
bias, 285
binomial distribution, 101-111
  nested, 110
blocking, 285
block, randomized analysis, 316
box plots, 65
C
Central Limit Theorem, 205-208
central location, 41
chance, 2
chi-squared distribution, 249
chi-squared function, 324-340
circular permutations, 31
class boundaries, 67
coefficient of determination, 366
coefficient of variation, 50
combinations, 29-32
computer, 3, 55, 249, 325
  binomial distribution, 264
  equivalent to Normal Probability paper, 185
  F-distribution, 253
  normal distribution, 173
  plotting individual points to compare with normal distribution, 188
  random numbers, 284
conditional probability, 17
confidence interval, 221, 251, 256, 266
  for variance, 251
confidence limits
  proportion, 266
contingency tables, 331
continuous random variable, 141
correction for continuity, 179
correlation, 364-367
correlation coefficient, 365
cumulative distribution function, 86, 142
cumulative frequency, 67
cumulative frequency diagram, 72
cumulative probabilities, 184
D
deciles, 51
degrees of freedom, 228, 325
descriptive statistics, 41
design
  sequential or evolutionary, 278
design of experiments, 272-290
deterministic, 2
diagnostic plots, 298
discrete random variable, 84
E
empirical approach to probability, 7
error sum of squares, 348
estimate, 221
  interval, 221
estimate of variance
  combined or pooled, 234
event, 9
evolutionary operation, 274
Excel, 4, 55, 75-80
expectation, 88
expected mean, 105
expected value, 149
experimentation, 273-290
  factorial design, 274-276
  randomization in, 280
exponential distribution, 155
extensions, 4
F
F-distribution, 252
F-test, 252, 253
factorial design, 274, 288-290
fair odds, 9
fitting
  normal distribution to frequency data, 175
fitting binomial, 135, 136
fractional factorial design, 288-290
frequency distribution, 133
  characteristics, 157
frequency graphs, 66
G
gamma distribution, 156
geometric distribution, 132
geometric mean, 43
goodness of fit, 327
graphical checks, 349
grouped frequency, 66
H
harmonic mean, 43
histogram, 70
hypergeometric probability distribution, 132
hypothesis testing, 213
I
inference
  mean
    known variance, 213
    with estimated variance, 228
inference, for variance, 248
inferences for the mean, 228
inference, statistical, 212
inputs, 342
interaction, 274, 275
interfering factors, 236, 272
interquartile range, 45
L
least squares, 342
level of confidence, 221
level of significance
  critical, 215
  observed, 214
linear combination of independent variables, 198
linear regression, 342
list
  references, 374
logarithmic mean, 43
lognormal distributions, 192
loss of significance, 49, 69
lurking factors, 272
M
mean
  arithmetic, 41
  geometric, 43
mean deviation from the mean, 45
median, 43-44
mode, 44
multinomial distribution, 111
multiple linear regression, 367
multiplication rule, 16
mutually exclusive, 12
N
negative binomial distribution, 131
normal approximation to a binomial distribution, 178
normal distribution, 155, 157-192
  approximation to binomial distribution, 178-183
  fitting to frequency data, 175
  tables, 161
normal probability paper, 184
null hypothesis, 213-215
O
one-tailed test, 217
operating data, routine, 273
P
p-value, 214
percentiles, 51
permutations, 29-32
permutations into classes, 30
planned experiments, 273
Poisson approximation to binomial distribution, 124
Poisson distribution, 117-125
population, 2, 197
probabilistic, 2
probability, 6
  classical or a priori approach, 7
  distributions, 84-140
  empirical or frequency approach, 7
  subjective estimate, 7
probability density function, 141, 158
probability distributions, 84-140
probability functions, 85
probability plotting, 190
proportion, 261
  binomial distribution, 108
Q
quantile, 53
quantile-quantile plotting, 190
quartiles, 51
R
random numbers, 133, 284
random sample, 197
random variable, 84
randomization, 280
randomizing, 236
range
  interquartile, 45
reference books, 373
regression, 341-368
  evidence of cause, 366
  multiple linear, 367
  non-linear, 364
  simple linear, 342
  transformable forms, 364
  x on y, 347
regression coefficients, 342, 361
regression equations, 345
regression line
  y on x, 342
relative cumulative frequency, 68
relative frequency, 67
reliability, 156
replication, 279
residuals, 348
response, 342
Rough Rule, 181
rounding, 10
rules of probability
  addition, 11
  multiplication, 16
S
sample, 1
  random, 2
sample correlation coefficient, 365
sample range, 45
sample size, 202
  proportion, 269
sample standard deviation, 47, 105-106
sample variance, 47
sampling, 197-211
sampling with replacement, 200
sampling without replacement, 201
scale of experimentation, 273
scatter plot, 342
significance test
  paired measurements, 238
  sample mean vs population mean, 233
  unpaired sample means, 234
simple linear regression, 342
spread, data, 44-51
statistical inferences, 212
slope, 352
standard deviation, 46, 105-106
  estimation from a sample, 46
standard error of the mean, 200
statistical inference, 1
  proportion, 261
  two sample proportions, 269
statistical significance, 215
  sample variance vs population variance, 250, 256
statistics, 1
stem-and-leaf displays, 63-64
stochastic relations, 2
Student's t-test, 229
Sturges' Rule, 67
sum of products, 344
T
t-distribution, 229
t-test, 233
  paired, 238
  unpaired, 234
test of hypothesis, 213
test statistic, 214
transformation of variables, 190
tree diagram, 8, 19
two-tailed test, 213
Type I Error, 217
Type II Error, 217
U
uniform distribution, 155
unpaired t-test, 234
V
variability, 44-51
variance, 45
  discrete random variable, 89
  estimation from a sample, 46
  points about line, 348
  of a difference, 199
  of a single new observation, 353
  of a sum, 199
  of sample means, 199
variance of the mean response, 352
variance ratio, 295
variance-ratio test, 252
Venn Diagram, 12
W
Weibull distribution, 156
Y
Yates correction, 326
LIMITED WARRANTY AND DISCLAIMER OF LIABILITY
[[NEWNES.]] AND ANYONE ELSE WHO HAS BEEN INVOLVED IN THE CREATION OR
PRODUCTION OF THE ACCOMPANYING CODE ("THE PRODUCT") CANNOT AND DO
NOT WARRANT THE PERFORMANCE OR RESULTS THAT MAY BE OBTAINED BY
USING THE PRODUCT. THE PRODUCT IS SOLD "AS IS" WITHOUT WARRANTY OF
ANY KIND (EXCEPT AS HEREAFTER DESCRIBED), EITHER EXPRESSED OR IMPLIED,
INCLUDING, BUT NOT LIMITED TO, ANY WARRANTY OF PERFORMANCE OR ANY
IMPLIED WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR
PURPOSE. [[NEWNES.]] WARRANTS ONLY THAT THE MAGNETIC CD-ROM(S) ON
WHICH THE CODE IS RECORDED IS FREE FROM DEFECTS IN MATERIAL AND FAULTY
WORKMANSHIP UNDER THE NORMAL USE AND SERVICE FOR A PERIOD OF NINETY
(90) DAYS FROM THE DATE THE PRODUCT IS DELIVERED. THE PURCHASER'S SOLE
AND EXCLUSIVE REMEDY IN THE EVENT OF A DEFECT IS EXPRESSLY LIMITED TO
EITHER REPLACEMENT OF THE CD-ROM(S) OR REFUND OF THE PURCHASE PRICE,
AT [[NEWNES.]]'S SOLE DISCRETION.
IN NO EVENT, WHETHER AS A RESULT OF BREACH OF CONTRACT, WARRANTY OR
TORT (INCLUDING NEGLIGENCE), WILL [[NEWNES.]] OR ANYONE WHO HAS BEEN
INVOLVED IN THE CREATION OR PRODUCTION OF THE PRODUCT BE LIABLE TO
PURCHASER FOR ANY DAMAGES, INCLUDING ANY LOST PROFITS, LOST SAVINGS
OR OTHER INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE
OR INABILITY TO USE THE PRODUCT OR ANY MODIFICATIONS THEREOF, OR DUE TO
THE CONTENTS OF THE CODE, EVEN IF [[NEWNES.]] HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES, OR FOR ANY CLAIM BY ANY OTHER PARTY.
ANY REQUEST FOR REPLACEMENT OF A DEFECTIVE CD-ROM MUST BE POSTAGE
PREPAID AND MUST BE ACCOMPANIED BY THE ORIGINAL DEFECTIVE CD-ROM,
YOUR MAILING ADDRESS AND TELEPHONE NUMBER, AND PROOF OF DATE OF
PURCHASE AND PURCHASE PRICE. SEND SUCH REQUESTS, STATING THE NATURE
OF THE PROBLEM, TO ELSEVIER SCIENCE CUSTOMER SERVICE, 6277 SEA HARBOR
DRIVE, ORLANDO, FL 32887, 1-800-321-5068. [[NEWNES.]] SHALL HAVE NO
OBLIGATION TO REFUND THE PURCHASE PRICE OR TO REPLACE A CD-ROM BASED
ON CLAIMS OF DEFECTS IN THE NATURE OR OPERATION OF THE PRODUCT.
SOME STATES DO NOT ALLOW LIMITATION ON HOW LONG AN IMPLIED WARRANTY
LASTS, NOR EXCLUSIONS OR LIMITATIONS OF INCIDENTAL OR
CONSEQUENTIAL DAMAGE, SO THE ABOVE LIMITATIONS AND EXCLUSIONS MAY
NOT [[NEWNES.]] APPLY TO YOU. THIS WARRANTY GIVES YOU SPECIFIC LEGAL
RIGHTS, AND YOU MAY ALSO HAVE OTHER RIGHTS THAT VARY FROM JURISDICTION
TO JURISDICTION.
THE RE-EXPORT OF UNITED STATES ORIGIN SOFTWARE IS SUBJECT TO THE UNITED
STATES LAWS UNDER THE EXPORT ADMINISTRATION ACT OF 1969 AS AMENDED.
ANY FURTHER SALE OF THE PRODUCT SHALL BE IN COMPLIANCE WITH THE
UNITED STATES DEPARTMENT OF COMMERCE ADMINISTRATION REGULATIONS.
COMPLIANCE WITH SUCH REGULATIONS IS YOUR RESPONSIBILITY AND NOT THE
RESPONSIBILITY OF [[NEWNES.]].