
Statistics 250

Interactive
Lecture Notes
Solutions

Fall 2015 / Winter 2016

Dr. Brenda Gunderson
Department of Statistics
University of Michigan

Table of Contents

Topic

Intro
1: Summarizing Data
2: Sampling, Surveys, and Gathering Data
3: Probability
4: Random Variables
5: Learning about a Population Proportion
   Part 1: Distribution for a Sample Proportion
   Part 2: Estimating Proportions with Confidence
   Part 3: Testing about a Population Proportion
6: Learning about the Difference in Population Proportions
   Part 1: Distribution for a Difference in Sample Proportions
   Part 2: Confidence Interval for a Difference in Population Proportions
   Part 3: Testing about a Difference in Population Proportions
7: Learning about a Population Mean
   Part 1: Distribution for a Sample Mean
   Part 2: Confidence Interval for a Population Mean
   Part 3: Testing about a Population Mean
8: Learning about a Population Mean Difference
   Part 1: Distribution for a Sample Mean Difference
   Part 2: Confidence Interval for a Population Mean Difference
   Part 3: Testing about a Population Mean Difference
9: Learning about the Difference in Population Means
   Part 1: Distribution for a Difference in Sample Means
   Part 2: Confidence Interval for a Difference in Population Means
   Part 3: Testing about a Difference in Population Means
10: ANOVA: Analysis of Variance
11: Relationships between Quantitative Variables: Regression
12: Relationships between Categorical Variables: Chi-Square

Stat 250 Gunderson Lecture Notes
Introduction

"Statistics ... the most important science in the whole world: for upon it depends the
practical application of every other science and of every art: the one science essential to
all political and social administration, all education, all organization based on experience,
for it only gives results of our experience." Florence Nightingale, Statistician

Definitions:

Statistics are numbers measured for some purpose.

Statistics is a collection of procedures and principles for gathering data and
analyzing information in order to help people make decisions when faced with
uncertainty.

Course Goal: Learn various tools for using data to gain understanding and make sound decisions
about the world around us.

1: Summarizing Data

"You must never tell a thing. You must illustrate it. We learn through
the eye and not the noggin."

Will Rogers (1879-1935)

Simple summaries of data can tell an interesting story and are easier to digest than long lists.
So we will begin by looking at some data.

Raw Data

Raw data correspond to numbers and category labels that have been collected or measured but
have not yet been processed in any way. On the next page is a set of RAW DATA: information
about a group of items (in this case, individuals). The data set title is DEPRIVED and has information
about a sample of n = 86 college students. For each student we are provided with their
answer to the question "Do you feel that you are sleep deprived?" (yes or no), and their
self-reported typical amount of sleep per night (in hours). The information we have is organized into
variables. In this case these 86 college students are a subset from a larger population of all
college students, so we have sample data.

Definition:
A variable is a characteristic that differs from one individual to the next.

Sample data are collected from a subset of a larger population.

Population data are collected when all individuals in a population are measured.

A statistic is a summary measure of sample data.

A parameter is a summary measure of population data.

Types of Variables

We have 2 variables in our data set. Next we want to distinguish between the different types of
variables; different types of variables provide different kinds of information, and the type will
guide what kinds of summaries (graphs/numerical) are appropriate.

Think about it:
Could you compute the AVERAGE AMOUNT OF SLEEP for these 86 students? YES
Could you compute the AVERAGE SLEEP DEPRIVED STATUS for these 86 students? NO
(could code, but would be arbitrary: 0 and 1, or could use any two values like 1 and 203)

SLEEP DEPRIVED STATUS is said to be a CATEGORICAL variable,

AMOUNT OF SLEEP is a QUANTITATIVE variable.

Definitions:
A categorical variable places an individual or item into one of several groups or
categories. When the categories have an ordering or ranking, it is called an ordinal
variable.

A quantitative variable takes numerical values for which arithmetic operations such as
adding and averaging make sense. Other names for quantitative variable are:
measurement variable and numerical variable.
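
In R, the distinction between the two types shows up in how a variable is stored. The following is a minimal sketch using a few made-up records; the values and the names `deprived`, `hours`, and `size` are illustrative assumptions, not taken from the DEPRIVED data file.

```r
# Hypothetical mini data set: four students (illustrative values only)
deprived <- factor(c("No", "Yes", "Yes", "No"))  # categorical: stored as a factor
hours    <- c(9, 7, 6, 8)                        # quantitative: stored as numbers

mean(hours)      # averaging makes sense for a quantitative variable
table(deprived)  # counting per category is the natural summary for a categorical one

# Soft drink size has an ordering, so it can be stored as an ordered factor
size <- factor(c("small", "large", "medium"),
               levels = c("small", "medium", "large", "supersized"),
               ordered = TRUE)
size[1] < size[2]  # comparisons are meaningful for an ordinal variable
```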

Try It!
For each variable listed below, give its type as categorical or quantitative.

Age (years): QUANTITATIVE

Typical Classroom Seat Location (Front, Middle, Back): CATEGORICAL

Number of songs on an iPod: QUANTITATIVE

Time spent studying material for this class in the last 24-hour period (in hours): QUANTITATIVE

Soft Drink Size (small, medium, large, super-sized): CATEGORICAL (ordinal)

The "And then..." count recorded in a psychology study on children (details will be provided),
that is, the number of times a child says the phrase "and then" when asked to recall a story they
just heard: QUANTITATIVE

Looking ahead: Later, when we talk about random variables, we will discuss whether
a variable is modeled discretely (because its values are countable) or whether it would
be modeled continuously (because it can take any value in an interval or collection of
intervals). Go back through the list above and think about it: is it discrete or continuous?

DATA SET = DEPRIVED
From Utts, Jessica M. and Robert F. Heckard. Mind on Statistics, Fourth Edition. 2012. Used with permission.

Feel Sleep   Amount Sleep          Feel Sleep   Amount Sleep
Deprived?    per Night (hours)     Deprived?    per Night (hours)
No           9                     No           8
No           7                     No           7
No           8                     No           9
Yes          7                     Yes          7
Yes          7                     Yes          7
Yes          8                     Yes          7
Yes          7                     Yes          7
Yes          8                     Yes          6
No           10                    No           8
No           8                     Yes          6
No           9                     No           9
No           8                     No           8
Yes          8                     Yes          7
Yes          4                     Yes          8
Yes          6                     No           8
No           8                     No           8
No           10                    Yes          7
No           4                     Yes          7
Yes          7                     Yes          7
Yes          8                     No           7
No           9                     Yes          7
No           9                     Yes          8
No           7                     No           7
Yes          8                     Yes          7
No           9                     Yes          7
No           9                     Yes          7
No           8                     Yes          8
No           6                     Yes          6
No           9                     Yes          6
Yes          7                     Yes          8
Yes          11                    No           9
Yes          7                     No           7
No           9                     Yes          8
Yes          7                     Yes          6
No           8                     Yes          7
Yes          7                     Yes          8
Yes          7                     Yes          5
Yes          9                     Yes          6
Yes          1                     No           7
Yes          7                     No           8
Yes          6                     Yes          8
No           8                     Yes          7
Yes          6                     Yes          6

Our data set is somewhat large, containing a lot of measurements in a long list. Presented as a
table listing, we can view the record of a particular college student, but it is just a listing, and it is not
easy to find the largest value for the amount of sleep or the number of students who felt they
are sleep deprived. We would like to learn appropriate ways to summarize the data.

Summarizing Categorical Variables

Numerical Summaries
How would you go about summarizing the SLEEP DEPRIVED STATUS data? The first step is to
simply count how many individuals/items fall into each category. Since percents are generally
more meaningful than counts, the second step is to calculate the percent (or proportion) of
individuals/items that fall into each category.

Sleep Deprived?    Count    Percent
Yes                51       (51/86)*100 = 59.3%
No                 35       (35/86)*100 = 40.7%
Total              86       100%

Notes: Count = Frequency. Percent or Proportion is ok!

The table above provides both the frequency distribution and the relative frequency
distribution for the variable SLEEP DEPRIVED STATUS.
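
As a rough sketch of how these two steps could be carried out in base R, the variable can be rebuilt from the counts in the table (51 "Yes", 35 "No"). The object names `deprived`, `counts`, and `percents` are our own choices here, not names from the course data file.

```r
# Rebuild the categorical variable from the counts in the table above
deprived <- rep(c("Yes", "No"), times = c(51, 35))

counts   <- table(deprived)                     # frequency distribution
percents <- round(100 * prop.table(counts), 1)  # relative frequencies, in percent

counts    # categories are listed alphabetically: No, then Yes
percents
```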

Visual Summaries
There are two simple visual summaries for categorical data: a bar graph or a pie chart. Here is
the table summary and bar graph made with R.
If you were making one: Don't forget to label each axis and show some values on each axis!

counts:
Deprived
 No Yes
 35  51
percentages:
Deprived
  No  Yes
40.7 59.3

Aside: Does it matter whether the No or Yes bar is given first? No, the variable is not ordinal here,
so we should not comment on shape (i.e. do not use words like "skewed" or "increasing
pattern" here).

Pie Chart: Another graph for categorical data which helps us see what part of the whole each
group forms.

Pie charts are not as easy to draw by hand.
It is not as easy to compare sizes of pie pieces versus comparing heights of bars.

Thus we will prefer to use a bar graph for categorical data.

Recap: We have discussed that some variables are categorical and others are
quantitative. We have seen that bar graphs and pie charts can be used to display data
for categorical variables. We turn next to displaying the data for quantitative
variables.

Exploring Features of Quantitative Data with Pictures

Recall our Sleep Deprived Data for n = 86 college students. We have data on two variables: sleep
deprived status and hours of sleep per night. How would you go about summarizing the sleep
hours data? These measurements do vary. How do they vary? What is the range of values? What
is the pattern of variation?

Find the smallest value = ____1______ and largest value = _____11_______

Take this overall range and break it up into intervals (of equal width).
What might be reasonable here?
Perhaps by 2s; but we need to watch the endpoints.

Summary Table:

Class       Frequency     Relative Frequency     Percent
            (or count)    (or proportion)
[0, 2]      1             1/86 = 0.012           1.2%
(2, 4]      2             2/86 = 0.023           2.3%
(4, 6]      12            12/86 = 0.140          14.0%
(6, 8]      56            56/86 = 0.651          65.1%
(8, 10]     14            14/86 = 0.163          16.3%
(10, 12]    1             1/86 = 0.012           1.2%

Watch the endpoints: different software will handle the endpoints differently.
With the table we can readily draw a histogram.
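
The table can be reproduced in base R with `cut()`, which makes the endpoint convention explicit. This is a sketch under one assumption worth re-checking: the value/count pairs below were tallied by hand from the DEPRIVED listing rather than read from the data file, and the names `hours` and `classes` are ours.

```r
# Hours of sleep for the 86 students, entered as value/count pairs
# (hand tally of the DEPRIVED listing; an assumption to verify against the file)
hours <- rep(c(1, 4, 5, 6, 7, 8, 9, 10, 11),
             times = c(1, 2, 1, 11, 31, 25, 12, 2, 1))

# Classes of width 2. right = TRUE gives intervals of the form (a, b];
# include.lowest = TRUE closes the first class on the left: [0, 2].
classes <- cut(hours, breaks = seq(0, 12, by = 2),
               right = TRUE, include.lowest = TRUE)

freq     <- table(classes)
rel_freq <- round(prop.table(freq), 3)

data.frame(Class     = names(freq),
           Frequency = as.vector(freq),
           RelFreq   = as.vector(rel_freq),
           Percent   = 100 * as.vector(rel_freq))
```

Setting `right = FALSE` instead would use classes of the form [a, b), which moves every student who reports exactly 2, 4, 6, 8, or 10 hours into the next class up. This is why the note says to watch the endpoints.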


Graph for quantitative data = Histogram:

Note: if we divide each count by 86, we would have proportions, but the picture would
look the same.

Note: each bar represents a class, and the base of the bar covers the class.

The above table and histogram show the distribution of this quantitative variable SLEEP HOURS,
that is, the overall pattern of how often the possible values occur.
Remember to label axes and add some values!

R Histograms (default on the left and customized on the right):

How to interpret?
Look for Overall Pattern
Three summary characteristics of the overall distribution of the data:

Shape (approximately symmetric, skewed, bell-shaped, uniform)

Location (center, average)
Approximately the middle value or where it would balance

Spread (variability)
Range (overall and then where most of the observations are)

Look for deviations from Overall Pattern
Outlier = a data point that is not consistent with the bulk of the data.
Outliers should not be discarded without justification.

Describe the distribution for SLEEP HOURS:
Approximately bell-shaped, symmetric distribution, unimodal,
centered around 7 hours, with most values between 4 and 10 hours.
No apparent outliers.

What if you had some data and you made a histogram of it and it looked like this?

[Histogram with two peaks; vertical axis: Count; horizontal axis: Response]

What would it tell you?

We would call this a bimodal distribution. There appear to be two subgroups of observations. It
would be best to investigate why (e.g. maybe M/F or OLD/YOUNG). It may lead to analyzing data
separately for each group.


Other comments
NO SPACE BETWEEN BARS! Unless there are no observations in that interval.
How Many Classes? Use your judgment: generally somewhere between 6 and 15
intervals.
Better to use relative frequencies on the y-axis when comparing two or more sets of
observations.
Software has defaults and many options to modify settings.

One More Example:

A study was conducted in Detroit, Michigan to find out the number of hours children aged 8 to 12
years spent watching television on a typical day.
A listing of all households in a certain housing area having children aged 8 to 12 years was first
constructed. Out of the 100 households in this listing, a random sample of 20 households was
selected and all children aged 8 to 12 years in the selected households were interviewed.

The following histogram was obtained for all the children aged 8 to 12 years interviewed.

[Histogram of number of hours spent watching TV. Annotation: How many are in the first class? 3]

a. Complete the sentence: Based on this histogram, the distribution of number of hours
   spent watching TV is unimodal, with a slight skewness to the ___left____.

b. Assuming that all children interviewed are represented in the histogram, what is the total
   number of children interviewed?

   3 + 6 + 9 + 10 + 4 = 32

c. What proportion of children spent less than 2 hours watching television?

   (3 + 6)/32 = 0.281 or about 28%

d. Can you determine the maximum number of hours spent watching television by one of the
   interviewed children? If so, report it. If not, explain why not.

   No, it is somewhere between 4 and 5,
   but the exact value is not known for sure.
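
The arithmetic in parts (b) and (c) can be checked with a few lines of R. This sketch takes the bar heights 3, 6, 9, 10, 4 from the solution to part (b) and assumes, as the solution to part (c) does, that the first two classes together cover "less than 2 hours"; the name `counts` is ours.

```r
# Bar heights read from the histogram, in class order
counts <- c(3, 6, 9, 10, 4)

total        <- sum(counts)                # part (b): number of children interviewed
prop_under_2 <- sum(counts[1:2]) / total   # part (c): first two classes only

total
round(prop_under_2, 3)
```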


Numerical Summaries of Quantitative Variables

We have discussed some interesting features of a quantitative data set and learned how to look
for them in pictures (graphs). Section 2.5 focuses on numerical summaries of the center and the
spread of the distribution (appropriate for quantitative data only).

Notation for a generic raw set of data:
$x_1, x_2, x_3, \ldots, x_n$ where n = number of items in the data set, or sample size

Describing the Location or Center of a Data Set
Two basic measures of location or center:

Mean: the numerical average value
We represent the mean of a sample (called a statistic) by

$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$

Median: the middle value when the data are arranged from smallest to largest.

n odd: M = middle observation; n even: M = average of the two middle observations

Try It! French Fries
Weight measurements for 16 small orders of French fries (in grams).
78  72  69  81  63  67  65  75
79  74  71  83  71  79  80  69

What should we do with data first? Graph it!

Based on our histogram, the distribution of weight is unimodal and approximately symmetric,
so computing numerical summaries is reasonable. The weights (in grams) range from the 60s to
the lower 80s, centered around the lower 70s.


1. Compute the mean weight.

   $\bar{x} = \frac{78 + 72 + 69 + \cdots + 80 + 69}{16} = \frac{1176}{16} = 73.5$ grams

   Does 73.5 make sense? (yes, look at the histogram) Would 83? (no)

2. Compute the median weight.
   Ordered: 63, 65, 67, 69, 69, 71, 71, 72, 74, 75, 78, 79, 79, 80, 81, 83

   (n+1)/2 = (16+1)/2 = 8.5, so average the 8th and 9th observations => (72+74)/2 = 73 grams
   Note: half of the observations are above 73 and half are below it.

3. What if the smallest weight was incorrectly entered as 3 grams instead of 63 grams?
   Median would stay the same. Mean would decrease.

Note: The mean is ____sensitive____ to extreme observations.

The median is ____resistant____ to extreme observations.

Most graphical displays would have detected such an outlying value.

So the median is better if there are outliers or the distribution is strongly skewed.
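
The three parts above can be verified in R. This is a small sketch using the 16 weights from the Try It; `fries` and `typo` are names chosen here for illustration.

```r
fries <- c(78, 72, 69, 81, 63, 67, 65, 75,
           79, 74, 71, 83, 71, 79, 80, 69)

mean(fries)     # 73.5
median(fries)   # 73

# Part 3: the smallest weight, 63, mistakenly entered as 3
typo <- replace(fries, which.min(fries), 3)
mean(typo)      # 69.75, pulled down by the one bad value
median(typo)    # 73, unchanged
```

A single mistyped value moves the mean by almost 4 grams while the median does not move at all, which is what "sensitive" and "resistant" mean here.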

Some Pictures: Mean versus Median


Describing Spread: Range and Interquartile Range

Midterms are returned and the average was reported as 76 out of 100.
You received a score of 88.
How should you feel? Happy to just be above average?
Often what is missing when the average of something is reported is a corresponding measure
of spread or variability. Here we discuss various measures of variation, each useful in some
situations, each with some limitations.

Range: Measures the spread over 100% of the data.
Range = High value - Low value = Maximum - Minimum

Percentiles: The pth percentile is the value such that p% of the observations fall at or below
that value.

Some common percentiles:
Median:          50th percentile, Q2 or .50
First quartile:  25th percentile, Q1 or .25 (median of values below median)
Third quartile:  75th percentile, Q3 or .75 (median of values above median)

Five Number Summary:

Variable Name and Units
(n = number of observations)

Median       M
Quartiles    Q1     Q3     <= IQR =>
Extremes     Min    Max    <= range =>

Provides a quick overview of the data values and information about the center and spread.
Divides the data set into approximate quarters.

Interquartile Range: Measures the spread over the middle 50% of the data.

Try it! French Fries Data
Ordered: 63, 65, 67, 69, 69, 71, 71, 72, 74, 75, 78, 79, 79, 80, 81, 83

Find the five number summary: Provides measures of location and spread

Weight of Fries (in grams)
(n = 16 orders)

Median       73
Quartiles    69    79
Extremes     63    83

Range: 83 - 63 = 20 grams

IQR: 79 - 69 = 10 grams


IQR = Q3 - Q1


And confirming these values using R we have:

> numSummary(FrenchFries[,"Weight"], statistics=c("mean", "sd", "IQR",
+   "quantiles"), quantiles=c(0,.25,.5,.75,1))
 mean     sd IQR 0% 25% 50% 75% 100%  n
 73.5 6.0663  10 63  69  73  79   83 16
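
The quartile definition used in these notes (median of the values below, and above, the median) can also be written out directly. The helper `quartiles_by_halves` below is a hypothetical function written for illustration, not a built-in; it assumes at least two observations. R's own `quantile()` interpolates between observations by default, so the two approaches can give slightly different quartiles on other data sets even though they agree here.

```r
fries <- c(78, 72, 69, 81, 63, 67, 65, 75,
           79, 74, 71, 83, 71, 79, 80, 69)

# Q1 = median of the lower half, Q3 = median of the upper half.
# For odd n the middle observation is left out of both halves.
quartiles_by_halves <- function(x) {
  x <- sort(x)
  n <- length(x)
  half <- floor(n / 2)
  lower <- x[seq_len(half)]
  upper <- x[(n - half + 1):n]
  c(Q1 = median(lower), Q3 = median(upper))
}

q <- quartiles_by_halves(fries)
q                            # Q1 = 69, Q3 = 79
unname(q["Q3"] - q["Q1"])    # IQR = 10
diff(range(fries))           # range = 20

# Built-in versions, for comparison
quantile(fries, c(0.25, 0.75))
IQR(fries)
```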

Example: Test Scores
The five-number summary for the distribution of test scores for a very large math class is
provided below:

Test Score (points)
(n = 1200 students)

Median       58
Quartiles    46    78
Extremes     34    95

1. What is the test score interval containing the lowest 1/4 of the students?

   34 to 46 points

2. Suppose you scored a 46 on the test. What can you say about the percentage of students who
   scored higher than you?

   About 75%

3. Suppose you scored a 50 on the test. What can you say about the percentage of students who
   scored higher than you?

   Between 50% and 75%

4. If the top 25% of the students received an A on the test, what was the minimum score needed
   to get an A on the test?

   Need a score of 78 or higher

Boxplots
A boxplot is a graphical representation of the five-number summary.
Steps:
Label an axis with values to cover the minimum and maximum of the data.
Make a box with ends at the quartiles Q1 and Q3.
Draw a line in the box at the median M.
Check for possible outliers using the 1.5*IQR rule and, if any, plot them individually.
Extend lines from the ends of the box to the smallest and largest
observations that are not possible outliers.

Note: Possible outliers are observations that are more than 1.5*IQR outside the quartiles,
that is, observations that are below Q1 - 1.5*IQR or observations that are above Q3 + 1.5*IQR.


Try it! French Fries Data
Ordered: 63, 65, 67, 69, 69, 71, 71, 72, 74, 75, 78, 79, 79, 80, 81, 83

The five number summary:

Weight of Fries (in grams)
(n = 16 orders)

Median       73
Quartiles    69    79
Extremes     63    83

From the boxplot shown, we see there are no points plotted separately, so there are no outliers
by the 1.5(IQR) rule.

Verify there are no outliers using this rule.

IQR = 79 - 69 = 10 grams

1.5*IQR = 1.5(10) = 15 grams

Lower boundary (fence) = Q1 - 1.5*IQR = 69 - 15 = 54

Are there any observations that fall below this lower boundary?
No, so no low outliers.

Upper boundary (fence) = Q3 + 1.5*IQR = 79 + 15 = 94

Are there any observations that fall above this upper boundary?
No, so no high outliers.
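
The same check can be sketched in R. The quartiles 69 and 79 are taken from the five number summary above; the variable names are our own.

```r
fries <- c(78, 72, 69, 81, 63, 67, 65, 75,
           79, 74, 71, 83, 71, 79, 80, 69)

q1 <- 69; q3 <- 79               # quartiles from the five number summary
iqr <- q3 - q1                   # 10
lower_fence <- q1 - 1.5 * iqr    # 54
upper_fence <- q3 + 1.5 * iqr    # 94

# Observations outside the fences are flagged as possible outliers
possible_outliers <- fries[fries < lower_fence | fries > upper_fence]
possible_outliers                # empty: no possible outliers
```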


What if the largest weight of 83 grams was actually 95 grams?

Ordered: 63, 65, 67, 69, 69, 71, 71, 72, 74, 75, 78, 79, 79, 80, 81, 95

Then the five number summary would be:

Weight of Fries (in grams)
(n = 16 orders)

Median       73
Quartiles    69    79
Extremes     63    95

The IQR and 1.5*IQR would be the same, so the boundaries for checking for possible
outliers are again 54 and 94.

Now we would have one potential high outlier, the maximum value of 95.

The modified boxplot when we have this one outlier is shown.

Why is the line extending out on the top side now drawn out to just 81?

81 is the largest value in the data set that is not an outlier.
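
R's `boxplot.stats()` reports the numbers a boxplot is drawn from, so it can be used to check where the upper line ends when the maximum is 95. A sketch follows; `fries95` is a name chosen here. Note that `boxplot.stats()` uses Tukey's hinges for the box ends, which coincide with the quartiles in these notes for this data set but need not in general.

```r
# Fries weights with the largest value changed from 83 to 95
fries95 <- c(78, 72, 69, 81, 63, 67, 65, 75,
             79, 74, 71, 95, 71, 79, 80, 69)

b <- boxplot.stats(fries95)   # coef = 1.5 by default, i.e. the 1.5*IQR rule
b$stats   # lower line end, Q1, median, Q3, upper line end: 63 69 73 79 81
b$out     # points plotted individually: 95
```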

Notes on Boxplots:
Side-by-side boxplots are good for ... comparing 2 or more sets of observations.

Watch out: points plotted individually are ... still part of the data set; don't ignore them!

Can't confirm ... shape from a boxplot only (a histogram is better for showing shape).

When reading values from a graph, show what you are doing
(so appropriate credit can be given on an exam/quiz).


Try It: Side-by-side boxplots
A random sample of 100 parents of grade school children were recently interviewed regarding
the breakfast habits in their family. One question asked was if their children take the time to eat
a breakfast (recorded as breakfast status: Yes or No). The grades of the children in some core
classes (e.g. reading, writing, math) were also recorded and a standardized grade score (on a
10-point scale) was computed for each child. Side-by-side boxplots of the children's standardized
grade scores are provided.

[Side-by-side boxplots. Vertical axis: Grades. Horizontal axis: Do you have breakfast? (No, Yes)]

a. What is (approx) the lowest grade scored by a child who does have breakfast?

   _____4.5______ points

b. Complete the following sentence:
   Among the children who did not eat breakfast,
   25% had a grade score of at least _____7.5______ points.

c. Consider the following statement: "The symmetry in the boxplot for the children not eating
   breakfast implies that the distribution for the grade scores of such students is bell-shaped."

   True or False? False

Features of Bell-Shaped Distributions

We have already described the distribution of our french fries weight data as being unimodal and
somewhat symmetric. If we were to draw a curve to smooth out the tops of the bars of the
histogram, it would resemble the shape of a bell, and thus could be called bell-shaped.

One fairly common distribution of measurements with this shape has a special name, called a
normal distribution or normal curve. We will see normal curves in more detail when we study
random variables. When a distribution is somewhat bell-shaped (unimodal, indicating a fairly
homogeneous set of measurements), a useful measure of spread is called the standard
deviation. In fact, the mean and the standard deviation are two summary measures that
completely specify a normal curve.


Describing Spread with Standard Deviation
When the mean is used to measure center, the most common measure of spread is the standard
deviation. The standard deviation is a measure of the spread of the observations from the mean.
We will refer to it as a kind of average distance of the observations from the mean. But it
actually is the square root of the average of the squared deviations of the observations from the
mean. Since that is a bit cumbersome, we like to think of the standard deviation as "roughly,
the average distance the observations fall from the mean." Here is a quick look at the formula
for the standard deviation when the data are a sample from a larger population:

s = sample standard deviation:

$s = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$

Note: The squared standard deviation, denoted by $s^2$, is called the variance. We emphasize the
standard deviation since it is in the original units.

Example: Consider this sample of n = 5 scores: 94, 97, 99, 103, 107.
The sample mean is 100 points. Let's measure their spread by considering how much each score
deviates from the mean. Then consider the average of these deviations.

[Number line from 90 to 110 with the five scores marked. Annotation: Deviation from mean = 107 - 100 = 7]

How the calculations are done:

x       x - 100    (x - 100)^2
94      -6         36
97      -3         9
99      -1         1
103     3          9
107     7          49
500     0          104           Sums (or totals) of the columns

Calculations:
1. $\bar{x}$ = 500/5 = 100 (used in columns 2 & 3)
2. Variance: $s^2$ = 104/(5 - 1) = 26
3. Standard deviation: $s = \sqrt{26} \approx 5.1$

Note the number of decimal places used for s.
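
The columns of the table map directly onto a few lines of R. This sketch recomputes each step and then compares the result with R's built-in `sd()`; the intermediate names (`dev`, `ss`, `s2`) are ours.

```r
scores <- c(94, 97, 99, 103, 107)

xbar <- mean(scores)                  # 100
dev  <- scores - xbar                 # -6 -3 -1  3  7 (deviations always sum to 0)
ss   <- sum(dev^2)                    # 104, the total of the third column
s2   <- ss / (length(scores) - 1)     # variance = 104/4 = 26
s    <- sqrt(s2)                      # about 5.1

c(variance = s2, sd = round(s, 1))
all.equal(s, sd(scores))              # TRUE: agrees with sd()
```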


Try it! French Fries Data
Weight measurements for 16 small orders of French fries (in grams).
78  72  69  81  63  67  65  75
79  74  71  83  71  79  80  69
The mean was computed earlier to be 73.5. Find the standard deviation for this data.

$s = \sqrt{\frac{(78 - 73.5)^2 + (72 - 73.5)^2 + \cdots + (69 - 73.5)^2}{16 - 1}} = \sqrt{36.8} \approx 6.1$ grams

Not much fun to do it by hand, but not too bad for a small number of observations. In general,
we will have a calculator or computer do it for us.

Interpretation:

The weights of small orders of french fries are roughly
___6.1 grams____ away from their mean weight of 73.5 grams, on average.

OR

On average, the weights of small orders of french fries vary by about 6.1 grams
from their mean weight of 73.5 grams.

Notes about the standard deviation:

s = 0 means ... no spread (all observations are the same); else s > 0.

Like the mean, s is ... sensitive to extreme observations.

So use the mean and standard deviation for ____reasonably symmetric, bell-shaped distributions_____.

The five number summary is better for skewed distributions or if there are outliers.


Technical Note about the difference between population and sample:
Data sets are commonly treated as if they represent a sample from a larger population. A
numerical summary based on a sample is called a statistic. The sample mean and sample
standard deviation are two such statistics. However, if you have all measurements for an
entire population, then a numerical summary would be referred to as a parameter.

The symbols for the mean and standard deviation for a population are different, and the
formula for the standard deviation is also slightly different. A population mean is
represented by the Greek letter $\mu$ (mu), and a population standard deviation is
represented by the Greek letter $\sigma$ (sigma). The formula for the population standard
deviation is below.

You will see more about the distinction between statistics and parameters in the next
chapter and beyond.

Population standard deviation:

$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$

where N is the size of the population.
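
To see the effect of dividing by N rather than n - 1, here is a sketch that treats the five scores 94, 97, 99, 103, 107 from the earlier example as if they were an entire population. That is an assumption made purely for illustration. R's `sd()` always uses the sample formula, so the population version is computed by hand.

```r
scores <- c(94, 97, 99, 103, 107)

N     <- length(scores)
mu    <- mean(scores)
sigma <- sqrt(sum((scores - mu)^2) / N)   # divide by N: population formula

round(sigma, 2)        # 4.56
round(sd(scores), 2)   # 5.1, since sd() divides by n - 1
```

The two values get closer as the number of observations grows, because N and N - 1 become nearly the same divisor.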

Empirical Rule
For bell-shaped curves, approximately

68% of the values fall within 1 standard deviation of the mean in either direction.

95% of the values fall within 2 standard deviations of the mean in either direction.

99.7% of the values fall within 3 standard deviations of the mean in either direction.


Try It! Amount of Sleep

The typical amount of sleep per night for college students has a bell-shaped distribution with a
mean of 7 hours and a standard deviation of 1.7 hours.

About 68% of college students typically sleep between ___5.3___ and __8.7__ hours per night.
Work: 7 - 1.7 = 5.3 and 7 + 1.7 = 8.7

Verify the values below that complete the sentences.
About 95% of college students typically sleep between 3.6 and 10.4 hours per night.
About 99.7% of college students typically sleep between 1.9 and 12.1 hours per night.

Draw a picture of the distribution showing the mean and intervals based on the empirical rule.
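
The three intervals can be generated at once in R. The second part of this sketch goes slightly beyond the notes so far: it uses `pnorm()`, the normal-curve area function, to show where the 68%, 95%, and 99.7% figures come from. The names `m`, `s`, and `k` are ours.

```r
m <- 7; s <- 1.7   # mean and standard deviation of sleep hours

k <- 1:3           # number of standard deviations from the mean
intervals <- data.frame(within_sd  = k,
                        approx_pct = c(68, 95, 99.7),
                        lower      = m - k * s,
                        upper      = m + k * s)
intervals

# Areas under a normal curve within 1, 2, and 3 standard deviations
round(100 * (pnorm(k) - pnorm(-k)), 1)   # 68.3 95.4 99.7
```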

Suppose last night you slept 11 hours.
How many standard deviations from the mean are you? (11 - 7)/1.7 = 2.35

Suppose last night you slept only 5 hours.
How many standard deviations from the mean are you? (5 - 7)/1.7 = -1.18

The standard deviation is a useful yardstick for measuring how far an individual value falls from
the mean. The standardized score or z-score is the distance between the observed value and
the mean, measured in terms of number of standard deviations. Values that are above the mean
have positive z-scores, and values that are below the mean have negative z-scores.

Standardized score or z-score:

$z = \frac{\text{observed value} - \text{mean}}{\text{standard deviation}}$
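
As a quick check of the two sleep examples, the z-score formula can be wrapped in a one-line R function. `z_score` is a hypothetical helper name, not a built-in.

```r
# Standardized score: how many standard deviations an observation is from the mean
z_score <- function(observed, m, s) (observed - m) / s

round(z_score(11, m = 7, s = 1.7), 2)   #  2.35 (above the mean)
round(z_score(5,  m = 7, s = 1.7), 2)   # -1.18 (below the mean)
```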


Try It! Scores on a Final Exam
Scores on the final in a course have approximately a bell-shaped distribution.
The mean score was 70 points and the standard deviation was 10 points.

Suppose Rob, one of the students, had a score that was 2 standard deviations above the mean.
What was Rob's score? 70 + 2(10) = 90 points

What can you say about the proportion of students who scored higher than Rob?
About 2.5% of the students scored higher than Rob's score of 90 points (based on the model).

Sketching a picture may help.
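
The 2.5% comes from splitting the roughly 5% that falls outside 2 standard deviations evenly between the two tails. The sketch below shows that reasoning and, for comparison only, the area under an exact normal curve from `pnorm()`, which is a little smaller because the empirical rule rounds 95.4% to 95%.

```r
m <- 70; s <- 10
rob <- m + 2 * s     # 90 points

# Empirical rule: about 95% within 2 sd, so about 5% outside, half in each tail
(1 - 0.95) / 2       # 0.025

# Area above 90 under a normal curve with mean 70 and sd 10
round(1 - pnorm(rob, mean = m, sd = s), 4)   # 0.0228
```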

Summary Tools


Additional Notes
A place to jot down questions you may have and ask during office hours, take a few extra notes, write
out an extra problem or summary completed in lecture, or create your own summary about these concepts.
