Statistics Toolbox: User's Guide

Computation
Visualization

Programming
For Use with MATLAB
User's Guide
Version 3
Statistics
Toolbox
How to Contact The MathWorks:
www.mathworks.com Web
comp.soft-sys.matlab Newsgroup
support@mathworks.com Technical support
suggest@mathworks.com Product enhancement suggestions
bugs@mathworks.com Bug reports
doc@mathworks.com Documentation error reports
service@mathworks.com Order status, license renewals, passcodes
info@mathworks.com Sales, pricing, and general information
508-647-7000 Phone
508-647-7001 Fax
The MathWorks, Inc. Mail
3 Apple Hill Drive
Natick, MA 01760-2098
For contact information about worldwide offices, see the MathWorks Web site.
Statistics Toolbox User's Guide
COPYRIGHT 1993 - 2001 by The MathWorks, Inc.
The software described in this document is furnished under a license agreement. The software may be used
or copied only under the terms of the license agreement. No part of this manual may be photocopied or
reproduced in any form without prior written consent from The MathWorks, Inc.
FEDERAL ACQUISITION: This provision applies to all acquisitions of the Program and Documentation by
or for the federal government of the United States. By accepting delivery of the Program, the government
hereby agrees that this software qualifies as "commercial" computer software within the meaning of FAR
Part 12.212, DFARS Part 227.7202-1, DFARS Part 227.7202-3, DFARS Part 252.227-7013, and DFARS Part
252.227-7014. The terms and conditions of The MathWorks, Inc. Software License Agreement shall pertain
to the government's use and disclosure of the Program and Documentation, and shall supersede any
conflicting contractual terms or conditions. If this license fails to meet the government's minimum needs or
is inconsistent in any respect with federal procurement law, the government agrees to return the Program
and Documentation, unused, to MathWorks.
MATLAB, Simulink, Stateflow, Handle Graphics, and Real-Time Workshop are registered trademarks, and
Target Language Compiler is a trademark of The MathWorks, Inc.
Other product or brand names are trademarks or registered trademarks of their respective holders.
Printing History: September 1993 First printing Version 1
March 1996 Second printing Version 2
January 1997 Third printing For MATLAB 5
May 1997 Revised for MATLAB 5.1 (online version)
January 1998 Revised for MATLAB 5.2 (online version)
January 1999 Revised for Version 2.1.2 (Release 11) (online only)
November 2000 Fourth printing Revised for Version 3 (Release 12)
May 2001 Fifth printing minor revision
Contents
Preface
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
What Is the Statistics Toolbox? . . . . . . . . . . . . . . . . . . . . . . . . . xiii
How to Use This Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
Related Products List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Mathematical Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
Typographical Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . xviii
1
Tutorial
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2
Primary Topic Areas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2
Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-5
Overview of the Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-6
Overview of the Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . 1-12
Descriptive Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-43
Measures of Central Tendency (Location) . . . . . . . . . . . . . . . . 1-43
Measures of Dispersion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-45
Functions for Data with Missing Values (NaNs) . . . . . . . . . . . 1-46
Function for Grouped Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-47
Percentiles and Graphical Descriptions . . . . . . . . . . . . . . . . . . 1-49
The Bootstrap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-50
Cluster Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-53
Terminology and Basic Procedure . . . . . . . . . . . . . . . . . . . . . . . 1-53
Finding the Similarities Between Objects . . . . . . . . . . . . . . . . 1-54
Defining the Links Between Objects . . . . . . . . . . . . . . . . . . . . . 1-56
Evaluating Cluster Formation . . . . . . . . . . . . . . . . . . . . . . . . . 1-59
Creating Clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-64
Linear Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-68
One-Way Analysis of Variance (ANOVA) . . . . . . . . . . . . . . . . . 1-69
Two-Way Analysis of Variance (ANOVA) . . . . . . . . . . . . . . . . . 1-73
N-Way Analysis of Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-76
Multiple Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-82
Quadratic Response Surface Models . . . . . . . . . . . . . . . . . . . . . 1-86
Stepwise Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-88
Generalized Linear Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-91
Robust and Nonparametric Methods . . . . . . . . . . . . . . . . . . . . 1-95
Nonlinear Regression Models . . . . . . . . . . . . . . . . . . . . . . . . 1-100
Example: Nonlinear Modeling . . . . . . . . . . . . . . . . . . . . . . . . . 1-100
Hypothesis Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-105
Hypothesis Test Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 1-105
Hypothesis Test Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . 1-106
Example: Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . . . 1-107
Available Hypothesis Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-111
Multivariate Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-112
Principal Components Analysis . . . . . . . . . . . . . . . . . . . . . . . 1-112
Multivariate Analysis of Variance (MANOVA) . . . . . . . . . . . 1-122
Statistical Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-128
Box Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-128
Distribution Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-129
Scatter Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-135
Statistical Process Control (SPC) . . . . . . . . . . . . . . . . . . . . . 1-138
Control Charts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-138
Capability Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-141
Design of Experiments (DOE) . . . . . . . . . . . . . . . . . . . . . . . . 1-143
Full Factorial Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-144
Fractional Factorial Designs . . . . . . . . . . . . . . . . . . . . . . . . . . 1-145
D-Optimal Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-147
Demos . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-153
The disttool Demo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-154
The polytool Demo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-156
The aoctool Demo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-161
The randtool Demo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-169
The rsmdemo Demo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-170
The glmdemo Demo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-172
The robustdemo Demo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-172
Selected Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-175
2
Reference
Function Category List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3
anova1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-17
anova2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-23
anovan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-27
aoctool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-33
barttest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-36
betacdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-37
betafit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-38
betainv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-40
betalike . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-41
betapdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-42
betarnd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-43
betastat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-44
binocdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-45
binofit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-46
binoinv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-47
binopdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-48
binornd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-49
binostat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-50
bootstrp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-51
boxplot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-54
capable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-56
capaplot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-58
caseread . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-60
casewrite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-61
cdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-62
cdfplot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-63
chi2cdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-65
chi2inv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-66
chi2pdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-67
chi2rnd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-68
chi2stat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-69
classify . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-70
cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-71
clusterdata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-73
combnk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-75
cophenet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-76
cordexch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-78
corrcoef . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-79
cov . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-80
crosstab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-81
daugment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-83
dcovary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-84
dendrogram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-85
disttool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-87
dummyvar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-88
errorbar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-89
ewmaplot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-90
expcdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-92
expfit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-93
expinv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-94
exppdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-95
exprnd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-96
expstat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-97
fcdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-98
ff2n . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-99
finv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-100
fpdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-101
fracfact . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-102
friedman . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-106
frnd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-110
fstat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-111
fsurfht . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-112
fullfact . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-114
gamcdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-115
gamfit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-116
gaminv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-117
gamlike . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-118
gampdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-119
gamrnd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-120
gamstat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-121
geocdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-122
geoinv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-123
geomean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-124
geopdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-125
geornd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-126
geostat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-127
gline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-128
glmdemo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-129
glmfit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-130
glmval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-135
gname . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-137
gplotmatrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-139
grpstats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-142
gscatter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-143
harmmean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-145
hist . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-146
histfit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-147
hougen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-148
hygecdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-149
hygeinv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-150
hygepdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-151
hygernd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-152
hygestat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-153
icdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-154
inconsistent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-155
iqr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-157
jbtest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-158
kruskalwallis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-160
kstest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-164
kstest2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-169
kurtosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-172
leverage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-174
lillietest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-175
linkage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-178
logncdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-181
logninv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-182
lognpdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-184
lognrnd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-185
lognstat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-186
lsline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-187
mad . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-188
mahal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-189
manova1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-190
manovacluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-194
mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-196
median . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-197
mle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-198
moment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-199
multcompare . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-200
mvnrnd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-207
mvtrnd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-208
nanmax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-209
nanmean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-210
nanmedian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-211
nanmin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-212
nanstd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-213
nansum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-214
nbincdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-215
nbininv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-216
nbinpdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-217
nbinrnd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-218
nbinstat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-219
ncfcdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-220
ncfinv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-222
ncfpdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-223
ncfrnd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-224
ncfstat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-225
nctcdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-226
nctinv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-227
nctpdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-228
nctrnd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-229
nctstat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-230
ncx2cdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-231
ncx2inv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-233
ncx2pdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-234
ncx2rnd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-235
ncx2stat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-236
nlinfit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-237
nlintool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-238
nlparci . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-239
nlpredci . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-240
normcdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-242
normfit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-243
norminv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-244
normpdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-245
normplot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-246
normrnd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-248
normspec . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-249
normstat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-250
pareto . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-251
pcacov . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-252
pcares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-253
pdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-254
pdist . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-255
perms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-258
poisscdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-259
poissfit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-261
poissinv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-262
poisspdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-263
poissrnd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-264
poisstat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-265
polyconf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-266
polyfit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-267
polytool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-268
polyval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-269
prctile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-270
princomp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-271
qqplot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-272
random . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-274
randtool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-275
range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-276
ranksum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-277
raylcdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-278
raylinv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-279
raylpdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-280
raylrnd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-281
raylstat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-282
rcoplot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-283
refcurve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-284
refline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-285
regress . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-286
regstats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-288
ridge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-290
robustdemo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-292
robustfit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-293
rowexch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-297
rsmdemo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-298
rstool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-299
schart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-300
signrank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-302
signtest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-304
skewness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-306
squareform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-308
std . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-309
stepwise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-310
surfht . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-311
tabulate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-312
tblread . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-313
tblwrite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-315
tcdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-316
tdfread . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-317
tinv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-319
tpdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-320
trimmean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-321
trnd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-322
tstat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-323
ttest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-324
ttest2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-326
unidcdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-328
unidinv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-329
unidpdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-330
unidrnd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-331
unidstat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-332
unifcdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-333
unifinv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-334
unifit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-335
unifpdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-336
unifrnd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-337
unifstat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-338
var . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-339
weibcdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-341
weibfit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-342
weibinv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-343
weiblike . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-344
weibpdf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-345
weibplot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-346
weibrnd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-347
weibstat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-348
x2fx . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-349
xbarplot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-350
zscore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-353
ztest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-354
x Contents

Preface
Overview
What Is the Statistics Toolbox?
How to Use This Guide
Related Products List
Mathematical Notation
Typographical Conventions

Overview
This chapter introduces the Statistics Toolbox and explains how to use the documentation. It contains the following sections:
• What Is the Statistics Toolbox?
• How to Use This Guide
• Related Products List
• Mathematical Notation
• Typographical Conventions
What Is the Statistics Toolbox?
The Statistics Toolbox is a collection of tools built on the MATLAB numeric computing environment. The toolbox supports a wide range of common statistical tasks, from random number generation, to curve fitting, to design of experiments and statistical process control. The toolbox provides two categories of tools:
• Building-block probability and statistics functions
• Graphical, interactive tools
The first category of tools is made up of functions that you can call from the command line or from your own applications. Many of these functions are MATLAB M-files, series of MATLAB statements that implement specialized statistics algorithms. You can view the MATLAB code for these functions using the statement
type function_name
You can change the way any toolbox function works by copying and renaming the M-file, then modifying your copy. You can also extend the toolbox by adding your own M-files.
Second, the toolbox provides a number of interactive tools that let you access many of the functions through a graphical user interface (GUI). Together, the GUI-based tools provide an environment for polynomial fitting and prediction, as well as probability function exploration.
How to Use This Guide
If you are a new user, begin with Chapter 1, "Tutorial." This chapter introduces the MATLAB statistics environment through the toolbox functions. It describes the functions with regard to particular areas of interest, such as probability distributions, linear and nonlinear models, principal components analysis, design of experiments, statistical process control, and descriptive statistics.
All toolbox users should use Chapter 2, "Reference," for information about specific tools. For functions, reference descriptions include a synopsis of the function's syntax, as well as a complete explanation of options and operation. Many reference descriptions also include examples, a description of the function's algorithm, and references to additional reading material.
Use this guide in conjunction with the software to learn about the powerful features that MATLAB provides. Each chapter provides numerous examples that apply the toolbox to representative statistical tasks.
The random number generation functions for various probability distributions are all based on the primitive functions randn and rand. Many examples start by generating data using random numbers. To duplicate the results in these examples, first execute the commands below.
seed = 931316785;
rand('seed',seed);
randn('seed',seed);
You might want to save these commands in an M-file script called init.m. Then, instead of three separate commands, you need only type init.
Related Products List
The MathWorks provides several products that are especially relevant to the kinds of tasks you can perform with the Statistics Toolbox.
For more information about any of these products, see either:
• The online documentation for that product if it is installed or if you are reading the documentation from the CD
• The MathWorks Web site, at http://www.mathworks.com; see the "products" section
Note: The toolboxes listed below all include functions that extend MATLAB's capabilities. The blocksets all include blocks that extend Simulink's capabilities.
Product                          Description
Data Acquisition Toolbox         MATLAB functions for direct access to live, measured data from MATLAB
Database Toolbox                 Tool for connecting to, and interacting with, most ODBC/JDBC databases from within MATLAB
Financial Time Series Toolbox    Tool for analyzing time series data in the financial markets
Financial Toolbox                MATLAB functions for quantitative financial modeling and analytic prototyping
GARCH Toolbox                    MATLAB functions for univariate Generalized Autoregressive Conditional Heteroskedasticity (GARCH) volatility modeling
Image Processing Toolbox         Complete suite of digital image processing and analysis tools for MATLAB
Mapping Toolbox                  Tool for analyzing and displaying geographically based information from within MATLAB
Neural Network Toolbox           Comprehensive environment for neural network research, design, and simulation within MATLAB
Optimization Toolbox             Tool for general and large-scale optimization of nonlinear problems, as well as for linear programming, quadratic programming, nonlinear least squares, and solving nonlinear equations
Signal Processing Toolbox        Tool for algorithm development, signal and linear system analysis, and time-series data modeling
System Identification Toolbox    Tool for building accurate, simplified models of complex systems from noisy time-series data
Mathematical Notation
This manual and the Statistics Toolbox functions use the following mathematical notation conventions.
β                        Parameters in a linear model.
E(x)                     Expected value of x: $E(x) = \int t f(t)\,dt$
f(x|a,b)                 Probability density function. x is the independent variable; a and b are fixed parameters.
F(x|a,b)                 Cumulative distribution function.
I([a, b]) or I_[a, b]    Indicator function. In this example the function takes the value 1 on the closed interval from a to b and is 0 elsewhere.
p and q                  p is the probability of some event. q is the probability of ~p, so q = 1 − p.
Typographical Conventions
This manual uses some or all of these conventions.
Item                         Convention Used                  Example
Example code                 Monospace font                   To assign the value 5 to A, enter
                                                                  A = 5
Function names/syntax        Monospace font                   The cos function finds the cosine of
                                                              each array element.
                                                              Syntax line example is
                                                                  MLGetVar ML_var_name
Keys                         Boldface with an initial         Press the Return key.
                             capital letter
Literal strings (in syntax   Monospace bold for literals      f = freqspace(n,'whole')
descriptions in reference
chapters)
Mathematical expressions     Italics for variables;           This vector represents the polynomial
                             standard text font for               p = x^2 + 2x + 3
                             functions, operators, and
                             constants
MATLAB output                Monospace font                   MATLAB responds with
                                                                  A =
                                                                       5
Menu titles, menu items,     Boldface with an initial         Choose the File menu.
dialog boxes, and controls   capital letter
New terms                    Italics                          An array is an ordered collection of
                                                              information.
Omitted input arguments      (...) ellipsis denotes all of    [c,ia,ib] = union(...)
                             the input/output arguments
                             from preceding syntaxes.
String variables (from a     Monospace italics                sysc = d2c(sysd,'method')
finite list)

1
Tutorial
Introduction
Probability Distributions
Descriptive Statistics
Cluster Analysis
Linear Models
Nonlinear Regression Models
Hypothesis Tests
Multivariate Statistics
Statistical Plots
Statistical Process Control (SPC)
Design of Experiments (DOE)
Demos
Selected Bibliography

Introduction
The Statistics Toolbox, for use with MATLAB, supplies basic statistics capability on the level of a first course in engineering or scientific statistics. The statistics functions it provides are building blocks suitable for use inside other analytical tools.
Primary Topic Areas
The Statistics Toolbox has more than 200 M-files, supporting work in the topical areas below:
• Probability distributions
• Descriptive statistics
• Cluster analysis
• Linear models
• Nonlinear models
• Hypothesis tests
• Multivariate statistics
• Statistical plots
• Statistical process control
• Design of experiments
Probability Distributions
The Statistics Toolbox supports 20 probability distributions. For each distribution there are five associated functions. They are:
• Probability density function (pdf)
• Cumulative distribution function (cdf)
• Inverse of the cumulative distribution function
• Random number generator
• Mean and variance as a function of the parameters
For data-driven distributions (beta, binomial, exponential, gamma, normal, Poisson, uniform, and Weibull), the Statistics Toolbox has functions for computing parameter estimates and confidence intervals.
Descriptive Statistics
The Statistics Toolbox provides functions for describing the features of a data sample. These descriptive statistics include measures of location and spread, percentile estimates, and functions for dealing with data having missing values.
Cluster Analysis
The Statistics Toolbox provides functions that allow you to divide a set of objects into subgroups, each having members that are as much alike as possible. This process is called cluster analysis.
Linear Models
In the area of linear models, the Statistics Toolbox supports one-way, two-way, and higher-way analysis of variance (ANOVA), analysis of covariance (ANOCOVA), multiple linear regression, stepwise regression, response surface prediction, ridge regression, and one-way multivariate analysis of variance (MANOVA). It supports nonparametric versions of one- and two-way ANOVA. It also supports multiple comparisons of the estimates produced by ANOVA and ANOCOVA functions.
Nonlinear Models
For nonlinear models, the Statistics Toolbox provides functions for parameter estimation, interactive prediction and visualization of multidimensional nonlinear fits, and confidence intervals for parameters and predicted values.
Hypothesis Tests
The Statistics Toolbox also provides functions that do the most common tests of hypothesis: t-tests, Z-tests, nonparametric tests, and distribution tests.
Multivariate Statistics
The Statistics Toolbox supports methods in multivariate statistics, including principal components analysis, linear discriminant analysis, and one-way multivariate analysis of variance.
Statistical Plots
The Statistics Toolbox adds box plots, normal probability plots, Weibull probability plots, control charts, and quantile-quantile plots to the arsenal of graphs in MATLAB. There is also extended support for polynomial curve fitting and prediction. There are functions to create scatter plots or matrices of scatter plots for grouped data, and to identify points interactively on such plots. There is a function to interactively explore a fitted regression model.
Statistical Process Control (SPC)
For SPC, the Statistics Toolbox provides functions for plotting common control charts and performing process capability studies.
Design of Experiments (DOE)
The Statistics Toolbox supports full and fractional factorial designs and D-optimal designs. There are functions for generating designs, augmenting designs, and optimally assigning units with fixed covariates.
Probability Distributions
Probability distributions arise from experiments where the outcome is subject to chance. The nature of the experiment dictates which probability distributions may be appropriate for modeling the resulting random outcomes. There are two types of probability distributions: continuous and discrete.
Suppose you are studying a machine that produces videotape. One measure of the quality of the tape is the number of visual defects per hundred feet of tape. The result of this experiment is an integer, since you cannot observe 1.5 defects. To model this experiment you should use a discrete probability distribution.
A measure affecting the cost and quality of videotape is its thickness. Thick tape is more expensive to produce, while variation in the thickness of the tape on the reel increases the likelihood of breakage. Suppose you measure the thickness of the tape every 1000 feet. The resulting numbers can take a continuum of possible values, which suggests using a continuous probability distribution to model the results.
Using a probability model does not allow you to predict the result of any individual experiment, but you can determine the probability that a given outcome will fall inside a specific range of values.
Continuous (data)    Continuous (statistics)    Discrete
Beta                 Chi-square                 Binomial
Exponential          Noncentral Chi-square      Discrete Uniform
Gamma                F                          Geometric
Lognormal            Noncentral F               Hypergeometric
Normal               t                          Negative Binomial
Rayleigh             Noncentral t               Poisson
Uniform
Weibull
The following two sections provide more information about the available distributions:
• Overview of the Functions
• Overview of the Distributions
Overview of the Functions
MATLAB provides five functions for each distribution, which are discussed in the following sections:
• Probability Density Function (pdf)
• Cumulative Distribution Function (cdf)
• Inverse Cumulative Distribution Function
• Random Number Generator
• Mean and Variance as a Function of Parameters
Probability Density Function (pdf)
The probability density function (pdf) has a different meaning depending on whether the distribution is discrete or continuous.
For discrete distributions, the pdf is the probability of observing a particular outcome. In our videotape example, the probability that there is exactly one defect in a given hundred feet of tape is the value of the pdf at 1.
Unlike discrete distributions, the pdf of a continuous distribution at a value is not the probability of observing that value. For continuous distributions the probability of observing any particular value is zero. To get probabilities you must integrate the pdf over an interval of interest. For example, the probability of the thickness of a videotape being between one and two millimeters is the integral of the appropriate pdf from one to two.
A pdf has two theoretical properties:
• The pdf is zero or positive for every possible outcome.
• The integral of a pdf over its entire range of values is one.
A pdf is not a single function. Rather, a pdf is a family of functions characterized by one or more parameters. Once you choose (or estimate) the parameters of a pdf, you have uniquely specified the function.
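As a rough numerical check of the two pdf properties and of the interval rule, the following sketch uses the standard normal pdf. The grid limits and spacing are illustrative choices made here, and trapz only approximates the integrals.

```matlab
% Evaluate the standard normal pdf on a wide, fine grid (illustrative limits).
x = -8:0.01:8;
f = normpdf(x,0,1);

% Property 1: the pdf is never negative.
isNonnegative = all(f >= 0)

% Property 2: the pdf integrates to (approximately) one.
area = trapz(x,f)

% Probability of an outcome between 1 and 2: integrate the pdf over [1,2].
t = 1:0.001:2;
p12 = trapz(t,normpdf(t,0,1))
```

The value p12 should agree closely with normcdf(2,0,1) - normcdf(1,0,1).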
The pdf function call has the same general format for every distribution in the Statistics Toolbox. The following commands illustrate how to call the pdf for the normal distribution.
x = [-3:0.1:3];
f = normpdf(x,0,1);
The variable f contains the density of the normal pdf with parameters µ = 0 and σ = 1 at the values in x. The first input argument of every pdf is the set of values for which you want to evaluate the density. Other arguments contain as many parameters as are necessary to define the distribution uniquely. The normal distribution requires two parameters: a location parameter (the mean, µ) and a scale parameter (the standard deviation, σ).
Cumulative Distribution Function (cdf)
If f is a probability density function for random variable X, the associated cumulative distribution function (cdf) F is
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$$
The cdf of a value x, F(x), is the probability of observing any outcome less than or equal to x.
A cdf has two theoretical properties:
• The cdf ranges from 0 to 1.
• If y > x, then the cdf of y is greater than or equal to the cdf of x.
The cdf function call has the same general format for every distribution in the Statistics Toolbox. The following commands illustrate how to call the cdf for the normal distribution.
x = [-3:0.1:3];
p = normcdf(x,0,1);
The variable p contains the probabilities associated with the normal cdf with parameters µ = 0 and σ = 1 at the values in x. The first input argument of every cdf is the set of values for which you want to evaluate the probability. Other arguments contain as many parameters as are necessary to define the distribution uniquely.
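The relationship between the pdf and the cdf can be illustrated numerically. The sketch below accumulates the standard normal pdf with cumtrapz and compares the result with normcdf; the lower limit of -8 stands in for minus infinity and, like the grid spacing, is an assumption made for this example.

```matlab
% Approximate F(x) by accumulating the area under the pdf from far left.
x = -8:0.001:3;
Fnumeric = cumtrapz(x,normpdf(x,0,1));

% Compare with the toolbox cdf.
Fexact = normcdf(x,0,1);
maxDiff = max(abs(Fnumeric - Fexact))

% The cdf never decreases and stays within [0,1].
isMonotone = all(diff(Fexact) >= 0)
inRange = all(Fexact >= 0 & Fexact <= 1)
```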
Inverse Cumulative Distribution Function
The inverse cumulative distribution function returns critical values for hypothesis testing given significance probabilities. To understand the relationship between a continuous cdf and its inverse function, try the following:
x = [-3:0.1:3];
xnew = norminv(normcdf(x,0,1),0,1);
How does xnew compare with x? Conversely, try this:
p = [0.1:0.1:0.9];
pnew = normcdf(norminv(p,0,1),0,1);
How does pnew compare with p?
Calculating the cdf of values in the domain of a continuous distribution returns probabilities between zero and one. Applying the inverse cdf to these probabilities yields the original values.
For discrete distributions, the relationship between a cdf and its inverse function is more complicated. It is likely that there is no x value such that the cdf of x yields p. In these cases the inverse function returns the first value x such that the cdf of x equals or exceeds p. Try this:
x = [0:10];
y = binoinv(binocdf(x,10,0.5),10,0.5);
How does x compare with y?
The commands below illustrate the problem with reconstructing the probability p from the value x for discrete distributions.
p = [0.1:0.2:0.9];
pnew = binocdf(binoinv(p,10,0.5),10,0.5)
pnew =
0.1719 0.3770 0.6230 0.8281 0.9453
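The comparisons suggested in this section can be summarized in a few lines. This is only a sketch: the round trip for the continuous case is expected to agree to within floating-point round-off, while for the discrete case the reconstructed probabilities can only be guaranteed to be at least as large as the ones you started with.

```matlab
% Continuous case: inverse cdf undoes the cdf (up to round-off).
x = -3:0.1:3;
xnew = norminv(normcdf(x,0,1),0,1);
maxErrX = max(abs(xnew - x))

% Discrete case, counts first: expected to reproduce the counts,
% barring round-off in the intermediate probabilities.
k = 0:10;
knew = binoinv(binocdf(k,10,0.5),10,0.5)

% Discrete case, probabilities first: pnew is the cdf at the first
% count whose cdf equals or exceeds p, so pnew >= p elementwise.
p = 0.1:0.2:0.9;
pnew = binocdf(binoinv(p,10,0.5),10,0.5);
neverSmaller = all(pnew >= p)
```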
The inverse function is useful in hypothesis testing and production of confidence intervals. Here is the way to get a 99% confidence interval for a normally distributed sample.
p = [0.005 0.995];
x = norminv(p,0,1)
x =
-2.5758 2.5758
The variable x contains the values associated with the normal inverse function with parameters µ = 0 and σ = 1 at the probabilities in p. The difference p(2)-p(1) is 0.99. Thus, the values in x define an interval that contains 99% of the standard normal probability.
The inverse function call has the same general format for every distribution in the Statistics Toolbox. The first input argument of every inverse function is the set of probabilities for which you want to evaluate the critical values. Other arguments contain as many parameters as are necessary to define the distribution uniquely.
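The same call works for parameters other than the standard ones. As a minimal sketch, assuming an illustrative mean of 10 and standard deviation of 2 (values chosen here, not taken from this guide), the central 95% interval can be obtained directly or by rescaling the standard normal critical values:

```matlab
mu = 10;              % assumed mean (illustrative)
sigma = 2;            % assumed standard deviation (illustrative)
p = [0.025 0.975];    % tail probabilities for a central 95% interval

% Direct call with the distribution parameters.
x = norminv(p,mu,sigma)

% Equivalent: shift and scale the standard normal critical values.
xcheck = mu + sigma*norminv(p,0,1)
```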
Random Number Generator
The methods for generating random numbers from any distribution all start with uniform random numbers. Once you have a uniform random number generator, you can produce random numbers from other distributions either directly or by using inversion or rejection methods, described below. See "Syntax for Random Number Functions" later in this section for details on using generator functions.
Direct. Direct methods flow from the definition of the distribution.
As an example, consider generating binomial random numbers. You can think of binomial random numbers as the number of heads in n tosses of a coin with probability p of a heads on any toss. If you generate n uniform random numbers and count the number that are less than p, the result is binomial with parameters n and p.
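A minimal sketch of the direct method for the binomial case follows. Each uniform random number falls below p with probability p, so the count of such numbers among n draws plays the role of the number of heads. The values of n, p, and the number of samples are illustrative assumptions.

```matlab
n = 10;             % tosses per binomial random number (illustrative)
p = 0.3;            % probability of heads (illustrative)
nsamples = 10000;   % how many binomial numbers to generate

% One column of n uniform numbers per binomial random number.
u = rand(n,nsamples);

% Count the "heads" in each column.
r = sum(u < p,1);

% The sample mean should be near n*p.
sampleMean = mean(r)
```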
Inversion. The inversion method works due to a fundamental theorem that relates the uniform distribution to other continuous distributions.
If F is a continuous distribution with inverse $F^{-1}$, and U is a uniform random number, then $F^{-1}(U)$ has distribution F.
So, you can generate a random number from a distribution by applying the inverse function for that distribution to a uniform random number. Unfortunately, this approach is usually not the most efficient.
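The inversion method can be sketched in two ways: with a closed-form inverse written out by hand, and with a toolbox inverse function. The exponential example below assumes the cdf F(x) = 1 - exp(-x/mu) with an illustrative mean mu = 2; it is a demonstration of the idea, not necessarily how the toolbox generators are implemented.

```matlab
u = rand(10000,1);      % uniform random numbers on (0,1)

% Exponential with mean mu: solve u = 1 - exp(-x/mu) for x.
mu = 2;                 % assumed mean (illustrative)
rexp = -mu*log(1 - u);

% Standard normal: apply the toolbox inverse cdf to the same uniforms.
rnorm = norminv(u,0,1);

% Sample means should be near mu and 0 respectively.
meanExp = mean(rexp)
meanNorm = mean(rnorm)
```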
Rejection. The functional form of some distributions makes it difficult or time consuming to generate random numbers using direct or inversion methods. Rejection methods can sometimes provide an elegant solution in these cases.
Suppose you want to generate random numbers from a distribution with pdf f. To use rejection methods you must first find another density, g, and a constant, c, so that the inequality below holds.
$$f(x) \le c\,g(x) \quad \forall x$$
You then generate the random numbers you want using the following steps:
1 Generate a random number x from distribution G with density g.
2 Form the ratio $r = \dfrac{c\,g(x)}{f(x)}$.
3 Generate a uniform random number u.
4 If the product of u and r is less than one, return x.
5 Otherwise, repeat from step 1.
For efficiency you need a cheap method for generating random numbers from G, and the scalar c should be small. The expected number of iterations is c.
Syntax for Random Number Functions. You can generate random numbers from each distribution. Each generator function provides a single random number or a matrix of random numbers, depending on the arguments you specify in the function call.
For example, here is the way to generate random numbers from the beta distribution. Four statements obtain random numbers: the first returns a single number, the second returns a 2-by-2 matrix of random numbers, and the third and fourth return 2-by-3 matrices of random numbers.
a = 1;
b = 2;
c = [.1 .5; 1 2];
d = [.25 .75; 5 10];
m = [2 3];
nrow = 2;
ncol = 3;
r1 = betarnd(a,b)
r1 =
0.4469
r2 = betarnd(c,d)
r2 =
0.8931 0.4832
0.1316 0.2403
r3 = betarnd(a,b,m)
r3 =
0.4196 0.6078 0.1392
0.0410 0.0723 0.0782
r4 = betarnd(a,b,nrow,ncol)
r4 =
0.0520 0.3975 0.1284
0.3891 0.1848 0.5186
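Returning to the rejection method described earlier in this section, its steps can be sketched for one simple case. The choices below are assumptions made for illustration: the target is the beta pdf with a = b = 2, which is f(x) = 6x(1-x) on (0,1), the proposal density g is the standard uniform, and c = 1.5 is the maximum of f, so that f(x) <= c*g(x) everywhere.

```matlab
c = 1.5;            % bound: max of 6x(1-x) is 1.5, attained at x = 0.5
nwanted = 5000;     % number of random numbers to produce (illustrative)
r = zeros(nwanted,1);
ntries = 0;

for k = 1:nwanted
    accepted = false;
    while ~accepted
        x = rand;                        % step 1: draw x from g (uniform)
        ratio = c*1/(6*x*(1 - x));       % step 2: c*g(x)/f(x), with g(x) = 1
        u = rand;                        % step 3: uniform random number
        accepted = (u*ratio < 1);        % step 4: accept x if u*ratio < 1
        ntries = ntries + 1;             % step 5: otherwise loop again
    end
    r(k) = x;
end

sampleMean = mean(r)            % near 0.5, the mean of this beta distribution
avgTries = ntries/nwanted       % near c, the expected number of iterations
```

In practice betarnd(2,2,nwanted,1) does the same job; the loop is only meant to make the accept/reject logic visible.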
Mean and Variance as a Function of Parameters
The mean and variance of a probability distribution are generally simple functions of the parameters of the distribution. The Statistics Toolbox functions ending in "stat" all produce the mean and variance of the desired distribution for the given parameters.
The example below shows a contour plot of the mean of the Weibull distribution as a function of the parameters.
x = (0.5:0.1:5);
y = (1:0.04:2);
[X,Y] = meshgrid(x,y);
Z = weibstat(X,Y);
[c,h] = contour(x,y,Z,[0.4 0.6 1.0 1.8]);
clabel(c);
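For a distribution with simple closed-form moments, the output of a "stat" function can be compared with the formulas directly. The sketch below uses the binomial distribution, whose mean is n*p and whose variance is n*p*(1-p); the parameter values are illustrative.

```matlab
n = 10;      % number of trials (illustrative)
p = 0.5;     % probability of success (illustrative)

% Mean and variance from the toolbox.
[m,v] = binostat(n,p)

% The same quantities from the closed-form expressions.
mFormula = n*p
vFormula = n*p*(1 - p)
```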
Overview of the Distributions
The following sections describe the available probability distributions:
• Beta Distribution
• Binomial Distribution
• Chi-Square Distribution
• Noncentral Chi-Square Distribution
• Discrete Uniform Distribution
• Exponential Distribution
• F Distribution
• Noncentral F Distribution
• Gamma Distribution
• Geometric Distribution
• Hypergeometric Distribution
• Lognormal Distribution
• Negative Binomial Distribution
• Normal Distribution
• Poisson Distribution
• Rayleigh Distribution
• Student's t Distribution
• Noncentral t Distribution
• Uniform (Continuous) Distribution
• Weibull Distribution
Beta Distribution
The following sections provide an overview of the beta distribution.
Background on the Beta Distribution. The beta distribution describes a family of curves that are unique in that they are nonzero only on the interval (0, 1). A more general version of the function assigns parameters to the end-points of the interval.
The beta cdf is the same as the incomplete beta function.
The beta distribution has a functional relationship with the t distribution. If Y is an observation from Student's t distribution with ν degrees of freedom, then the following transformation generates X, which is beta distributed.
$$X = \frac{1}{2} + \frac{1}{2}\,\frac{Y}{\sqrt{\nu + Y^2}}$$
$$\text{if } Y \sim t(\nu) \text{ then } X \sim \beta\!\left(\frac{\nu}{2}, \frac{\nu}{2}\right)$$
The Statistics Toolbox uses this relationship to compute values of the t cdf and inverse function as well as generating t distributed random numbers.
Definition of the Beta Distribution. The beta pdf is
$$y = f(x|a,b) = \frac{1}{B(a,b)}\, x^{a-1} (1-x)^{b-1}\, I_{(0,1)}(x)$$
where B( · ) is the Beta function. The indicator function $I_{(0,1)}(x)$ ensures that only values of x in the range (0, 1) have nonzero probability.
Parameter Estimation for the Beta Distribution. Suppose you are collecting data that has hard lower and upper bounds of zero and one respectively. Parameter estimation is the process of determining the parameters of the beta distribution that fit this data best in some sense.
One popular criterion of goodness is to maximize the likelihood function. The likelihood has the same form as the beta pdf. But for the pdf, the parameters are known constants and the variable is x. The likelihood function reverses the roles of the variables. Here, the sample values (the x's) are already observed. So they are the fixed constants. The variables are the unknown parameters.
Maximum likelihood estimation (MLE) involves calculating the values of the parameters that give the highest likelihood given the particular set of data.
The function betafit returns the MLEs and confidence intervals for the parameters of the beta distribution. Here is an example using random numbers from the beta distribution with a = 5 and b = 0.2.
r = betarnd(5,0.2,100,1);
[phat, pci] = betafit(r)
phat =
4.5330 0.2301
pci =
2.8051 0.1771
6.2610 0.2832
The MLE for parameter a is 4.5330, compared to the true value of 5. The 95% confidence interval for a goes from 2.8051 to 6.2610, which includes the true value.
Similarly the MLE for parameter b is 0.2301, compared to the true value of 0.2. The 95% confidence interval for b goes from 0.1771 to 0.2832, which also includes the true value. Of course, in this made-up example we know the true value. In experimentation we do not.
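The idea that the MLE gives the highest likelihood can be checked with betalike, which returns the negative of the beta log-likelihood for given parameters and data. In the sketch below the negative log-likelihood at the fitted parameters should be no larger than at the parameters used to generate the data. The parameter values a = 2 and b = 3 and the sample size are assumptions chosen for this illustration, not the values from the example above.

```matlab
atrue = 2;      % assumed true parameter a (illustrative)
btrue = 3;      % assumed true parameter b (illustrative)
r = betarnd(atrue,btrue,200,1);

% Maximum likelihood estimates of [a b].
phat = betafit(r);

% Negative log-likelihood at the estimates and at the true parameters.
nllAtMle = betalike(phat,r)
nllAtTruth = betalike([atrue btrue],r)
```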
Example and Plot of the Beta Distribution. The shape of the beta distribution is quite variable depending on the values of the parameters, as illustrated by the plot below.
[Figure: beta pdfs on the interval (0, 1) for a = b = 0.75, a = b = 1, and a = b = 4]
The constant pdf (the flat line) shows that the standard uniform distribution is a special case of the beta distribution.
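Several statements in this section about the beta distribution can be spot-checked numerically: the pdf definition in terms of the Beta function, the uniform special case a = b = 1, and the transformation that links Student's t distribution to the beta distribution. The sketch below assumes illustrative parameter values and evaluation points.

```matlab
% 1. pdf definition: x^(a-1) (1-x)^(b-1) / B(a,b) on (0,1).
a = 2; b = 3;                       % illustrative parameters
x = 0.05:0.05:0.95;
direct = x.^(a-1).*(1-x).^(b-1)/beta(a,b);
maxDiffPdf = max(abs(direct - betapdf(x,a,b)))

% 2. a = b = 1 gives the standard uniform density (constant 1).
flat = betapdf(x,1,1);
maxDiffFlat = max(abs(flat - 1))

% 3. t-to-beta transformation: if Y ~ t(nu), then
%    X = 1/2 + (1/2) Y/sqrt(nu + Y^2) ~ beta(nu/2,nu/2),
%    so the two cdfs agree at matching points.
nu = 5;                             % illustrative degrees of freedom
y = -4:0.5:4;
xt = 1/2 + (1/2)*y./sqrt(nu + y.^2);
maxDiffT = max(abs(tcdf(y,nu) - betacdf(xt,nu/2,nu/2)))
```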
Binomial Distribution
The following sections provide an overview of the binomial distribution.
Background of the Binomial Distribution. The binomial distribution models the total number of successes in repeated trials from an infinite population under the following conditions:
• Only two outcomes are possible on each of n trials.
• The probability of success for each trial is constant.
• All trials are independent of each other.
James Bernoulli derived the binomial distribution in 1713 (Ars Conjectandi). Earlier, Blaise Pascal had considered the special case where p = 1/2.
Definition of the Binomial Distribution. The binomial pdf is
$$y = f(x|n,p) = \binom{n}{x} p^{x} q^{(n-x)}\, I_{(0,1,\ldots,n)}(x)$$
where $\binom{n}{x} = \dfrac{n!}{x!\,(n-x)!}$ and $q = 1 - p$.
The binomial distribution is discrete. For zero and for positive integers less than or equal to n, the pdf is nonzero.
Parameter Estimation for the Binomial Distribution. Suppose you are collecting data from a widget manufacturing process, and you record the number of widgets within specification in each batch of 100. You might be interested in the probability that an individual widget is within specification. Parameter estimation is the process of determining the parameter, p, of the binomial distribution that fits this data best in some sense.
One popular criterion of goodness is to maximize the likelihood function. The likelihood has the same form as the binomial pdf above. But for the pdf, the parameters (n and p) are known constants and the variable is x. The likelihood function reverses the roles of the variables. Here, the sample values (the x's) are already observed. So they are the fixed constants. The variables are the unknown parameters. MLE involves calculating the value of p that gives the highest likelihood given the particular set of data.
The function binofit returns the MLEs and confidence intervals for the parameters of the binomial distribution. Here is an example using random numbers from the binomial distribution with n = 100 and p = 0.9.
r = binornd(100,0.9)
r =
88
[phat, pci] = binofit(r,100)
phat =
0.8800
pci =
0.7998
0.9364
The MLE for parameter p is 0.8800, compared to the true value of 0.9. The 95% confidence interval for p goes from 0.7998 to 0.9364, which includes the true value. Of course, in this made-up example we know the true value of p. In experimentation we do not.
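Two points from this section can be illustrated in a short sketch: the binomial pdf written out from its definition (binomial coefficient times p^x times q^(n-x)), and the fact that, for 88 successes in 100 trials, the likelihood as a function of p peaks at 88/100. The grid of candidate p values is an assumption made for this example; binofit does not necessarily work this way.

```matlab
% Binomial pdf from its definition, compared with binopdf.
n = 10; p = 0.5; q = 1 - p;          % illustrative parameters
x = 0:n;
direct = zeros(size(x));
for k = 1:length(x)
    direct(k) = nchoosek(n,x(k))*p^x(k)*q^(n - x(k));
end
maxDiff = max(abs(direct - binopdf(x,n,p)))
total = sum(direct)                  % probabilities sum to one

% Likelihood of p given 88 successes in 100 trials, on a grid of p values.
pgrid = 0.01:0.01:0.99;
lik = binopdf(88,100,pgrid);
[maxLik,idx] = max(lik);
pbest = pgrid(idx)                   % grid point with the highest likelihood
```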
Example and Plot of the Binomial Distribution. The following commands generate a plot of the binomial pdf for n = 10 and p = 1/2.
x = 0:10;
y = binopdf(x,10,0.5);
plot(x,y,'+')
1-17
Chi-Square Distribution
The following sections provide an overview of the $\chi^2$ distribution.
Background of the Chi-Square Distribution. The $\chi^2$ distribution is a special case of the
gamma distribution where b = 2 (and a = $\nu/2$, with $\nu$ the degrees of freedom) in
the equation for the gamma distribution below.
$$y = f(x \mid a,b) = \frac{1}{b^a \Gamma(a)} x^{a-1} e^{-x/b}$$
The $\chi^2$ distribution gets special attention because of its importance in normal
sampling theory. If a set of n observations is normally distributed with
variance $\sigma^2$, and $s^2$ is the sample variance, then
$$\frac{(n-1)s^2}{\sigma^2} \sim \chi^2(n-1)$$
The Statistics Toolbox uses the above relationship to calculate confidence
intervals for the estimate of the normal parameter $\sigma^2$ in the function normfit.
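As a sketch of how this relationship yields a confidence interval for $\sigma^2$: the
interval below is the standard textbook construction, and whether normfit follows
exactly these steps internally is an assumption. The sample size and sample
variance are made-up values.

```matlab
n = 30; s2 = 3.2;                          % hypothetical sample size and sample variance
alpha = 0.05;                              % for a 95% confidence interval
lo = (n-1)*s2/chi2inv(1-alpha/2,n-1);      % lower limit for sigma^2
hi = (n-1)*s2/chi2inv(alpha/2,n-1);        % upper limit for sigma^2
ci = [lo hi]
```

Taking square roots of both limits gives the corresponding interval for $\sigma$.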
Definition of the Chi-Square Distribution. The $\chi^2$ pdf is
$$y = f(x \mid \nu) = \frac{x^{(\nu-2)/2} e^{-x/2}}{2^{\nu/2}\,\Gamma(\nu/2)}$$
where $\Gamma(\cdot)$ is the Gamma function, and $\nu$ is the degrees of freedom.
Example and Plot of the Chi-Square Distribution. The $\chi^2$ distribution is skewed to the
right, especially for few degrees of freedom ($\nu$). The plot shows the $\chi^2$
distribution with four degrees of freedom.
x = 0:0.2:15;
y = chi2pdf(x,4);
plot(x,y)
Noncentral Chi-Square Distribution
The following sections provide an overview of the noncentral $\chi^2$ distribution.
Background of the Noncentral Chi-Square Distribution. The $\chi^2$ distribution is actually
a simple special case of the noncentral chi-square distribution. One way to
generate random numbers with a $\chi^2$ distribution (with $\nu$ degrees of freedom) is
to sum the squares of $\nu$ standard normal random numbers (mean equal to zero).
What if we allow the normally distributed quantities to have a mean other than
zero? The sum of squares of these numbers yields the noncentral chi-square
distribution. The noncentral chi-square distribution requires two parameters:
the degrees of freedom and the noncentrality parameter. The noncentrality
parameter is the sum of the squared means of the normally distributed
quantities.
The noncentral chi-square has scientific application in thermodynamics and
signal processing. The literature in these areas may refer to it as the Ricean or
generalized Rayleigh distribution.
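The construction described above can be simulated directly. This is a minimal
sketch; the means in mu are made-up values, chosen so that the noncentrality
parameter equals 2. The theoretical mean of a noncentral chi-square variate is the
degrees of freedom plus the noncentrality parameter, so the sample mean should
land near 6.

```matlab
v = 4;                                     % degrees of freedom
mu = [1 1 0 0];                            % hypothetical means, one per normal variate
delta = sum(mu.^2);                        % noncentrality parameter, here 2
z = randn(100000,v) + repmat(mu,100000,1); % normal variates with nonzero means
y = sum(z.^2,2);                           % simulated noncentral chi-square values
[mean(y) v+delta]                          % sample mean next to the theoretical mean
```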
Definition of the Noncentral Chi-Square Distribution. There are many equivalent
formulas for the noncentral chi-square distribution function. One formulation
uses a modified Bessel function of the first kind. Another uses the generalized
Laguerre polynomials. The Statistics Toolbox computes the cumulative
distribution function values using a weighted sum of $\chi^2$ probabilities with the
weights equal to the probabilities of a Poisson distribution. The Poisson
parameter is one-half of the noncentrality parameter of the noncentral
chi-square.
$$F(x \mid \nu,\delta) = \sum_{j=0}^{\infty} \left( \frac{(\delta/2)^j}{j!} e^{-\delta/2} \right) \Pr\left[ \chi^2_{\nu+2j} \le x \right]$$
where $\delta$ is the noncentrality parameter.
Example of the Noncentral Chi-Square Distribution. The following commands generate
a plot of the noncentral chi-square pdf (solid line) together with the central
$\chi^2$ pdf with the same degrees of freedom (dashed line).
x = (0:0.1:10)';
p1 = ncx2pdf(x,4,2);
p = chi2pdf(x,4);
plot(x,p,'--',x,p1,'-')
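The weighted-sum definition of the noncentral chi-square cdf can be checked
numerically. The sketch below truncates the infinite sum at j = 50 (an arbitrary
cutoff that is ample for a small noncentrality parameter) and compares the result
with ncx2cdf. The values of x, the degrees of freedom, and the noncentrality
parameter are illustrative.

```matlab
x = 3; v = 4; delta = 2;                 % illustrative values
j = 0:50;                                % truncation of the infinite sum
w = poisspdf(j,delta/2);                 % Poisson weights with parameter delta/2
F = sum(w .* chi2cdf(x,v+2*j));          % weighted sum of central chi-square cdfs
Fref = ncx2cdf(x,v,delta);               % toolbox value for comparison
[F Fref]
```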
Discrete Uniform Distribution
The following sections provide an overview of the discrete uniform distribution.
Background of the Discrete Uniform Distribution. The discrete uniform distribution is
a simple distribution that puts equal weight on the integers from one to N.
Definition of the Discrete Uniform Distribution. The discrete uniform pdf is
$$y = f(x \mid N) = \frac{1}{N} I_{(1,\ldots,N)}(x)$$
Example and Plot of the Discrete Uniform Distribution. As for all discrete distributions,
the cdf is a step function. The plot shows the discrete uniform cdf for N = 10.
x = 0:10;
y = unidcdf(x,10);
stairs(x,y)
set(gca,'Xlim',[0 11])
To pick a random sample of 10 (drawn with replacement) from a list of 553 items:
numbers = unidrnd(553,1,10)
numbers =
293 372 5 213 37 231 380 326 515 468
Exponential Distribution
The following sections provide an overview of the exponential distribution.
Background of the Exponential Distribution. Like the chi-square distribution, the
exponential distribution is a special case of the gamma distribution (obtained
by setting a = 1)
$$y = f(x \mid a,b) = \frac{1}{b^a \Gamma(a)} x^{a-1} e^{-x/b}$$
where $\Gamma(\cdot)$ is the Gamma function.
The exponential distribution is special because of its utility in modeling events
that occur randomly over time. The main application area is in studies of
lifetimes.
Definition of the Exponential Distribution. The exponential pdf is
$$y = f(x \mid \mu) = \frac{1}{\mu} e^{-x/\mu}$$
Parameter Estimation for the Exponential Distribution. Suppose you are stress testing
light bulbs and collecting data on their lifetimes. You assume that these
lifetimes follow an exponential distribution. You want to know how long you
can expect the average light bulb to last. Parameter estimation is the process
of determining the parameters of the exponential distribution that fit this data
best in some sense.
One popular criterion of goodness is to maximize the likelihood function. The
likelihood has the same form as the exponential pdf above. But for the pdf, the
parameters are known constants and the variable is x. The likelihood function
reverses the roles of the variables. Here, the sample values (the x's) are already
observed, so they are the fixed constants. The variables are the unknown
parameters. MLE involves calculating the values of the parameters that give
the highest likelihood given the particular set of data.
The function expfit returns the MLEs and confidence intervals for the
parameters of the exponential distribution. Here is an example using random
numbers from the exponential distribution with $\mu$ = 700.
lifetimes = exprnd(700,100,1);
[muhat, muci] = expfit(lifetimes)
muhat =
672.8207
muci =
547.4338
810.9437
The MLE for parameter $\mu$ is 672, compared to the true value of 700. The 95%
confidence interval for $\mu$ goes from 547 to 811, which includes the true value.
In our life tests we do not know the true value of $\mu$, so it is nice to have a
confidence interval on the parameter to give a range of likely values.
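For the exponential distribution the MLE of $\mu$ has a closed form: it is the sample
mean. A minimal check, using made-up lifetimes rather than the random sample
from the example:

```matlab
lifetimes = [520 710 655 830 490]';     % hypothetical lifetimes
muhat = expfit(lifetimes);              % MLE returned by the toolbox
[muhat mean(lifetimes)]                 % the two values agree
```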
Example and Plot of the Exponential Distribution. For exponentially distributed
lifetimes, the probability that an item will survive an extra unit of time is
independent of the current age of the item. The example shows a specific case
of this special property.
l = 10:10:60;
lpd = l+0.1;
deltap = (expcdf(lpd,50)-expcdf(l,50))./(1-expcdf(l,50))
deltap =
0.0020 0.0020 0.0020 0.0020 0.0020 0.0020
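The constant value is no accident. Because the exponential survival function is
$e^{-x/\mu}$, the conditional probability of failing within the next 0.1 time units, given
survival to any age, reduces to $1 - e^{-0.1/\mu}$. A short check, using the same
parameter value of 50 as the example:

```matlab
mu = 50; dt = 0.1;
p_closed = 1 - exp(-dt/mu)      % closed-form conditional failure probability
p_cdf = expcdf(dt,mu)           % same quantity for a brand-new item
```

Both values round to the 0.0020 displayed for every age in deltap.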
The plot below shows the exponential pdf with its parameter (and mean), $\mu$, set
to 2.
x = 0:0.1:10;
y = exppdf(x,2);
plot(x,y)
F Distribution
The following sections provide an overview of the F distribution.
Background of the F distribution. The F distribution has a natural relationship with
the chi-square distribution. If $\chi_1$ and $\chi_2$ are independent chi-square random
variables with $\nu_1$ and $\nu_2$ degrees of freedom respectively, then the statistic F
below is F distributed.
$$F(\nu_1,\nu_2) = \frac{\chi_1/\nu_1}{\chi_2/\nu_2}$$
The two parameters, $\nu_1$ and $\nu_2$, are the numerator and denominator degrees of
freedom. That is, $\nu_1$ and $\nu_2$ are the number of independent pieces of information
used to calculate $\chi_1$ and $\chi_2$ respectively.
Definition of the F distribution. The pdf for the F distribution is
$$y = f(x \mid \nu_1,\nu_2) = \frac{\Gamma\left[\frac{\nu_1+\nu_2}{2}\right]}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} \frac{x^{(\nu_1-2)/2}}{\left[1+\left(\frac{\nu_1}{\nu_2}\right)x\right]^{(\nu_1+\nu_2)/2}}$$
Example and Plot of the F distribution. The most common application of the F
distribution is in standard tests of hypotheses in analysis of variance and
regression.
The plot shows that the F distribution exists on the positive real numbers and
is skewed to the right.
x = 0:0.01:10;
y = fpdf(x,5,3);
plot(x,y)
Noncentral F Distribution
The following sections provide an overview of the noncentral F distribution.
Background of the Noncentral F Distribution. As with the $\chi^2$ distribution, the
F distribution is a special case of the noncentral F distribution. The
F distribution is the result of taking the ratio of two $\chi^2$ random variables each
divided by its degrees of freedom.
If the numerator of the ratio is a noncentral chi-square random variable
divided by its degrees of freedom, the resulting distribution is the noncentral
F distribution.
The main application of the noncentral F distribution is to calculate the power
of a hypothesis test relative to a particular alternative.
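A sketch of such a power calculation, using illustrative degrees of freedom and an
assumed noncentrality parameter under the alternative: find the critical value of
the central F distribution, then evaluate the probability that a noncentral F
variate exceeds it.

```matlab
alpha = 0.05;                              % significance level of the test
v1 = 5; v2 = 20;                           % illustrative degrees of freedom
delta = 10;                                % assumed noncentrality under the alternative
fcrit = finv(1-alpha,v1,v2);               % critical value under the null hypothesis
power = 1 - ncfcdf(fcrit,v1,v2,delta)      % probability of rejecting under the alternative
```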
Definition of the Noncentral F Distribution. Similar to the noncentral $\chi^2$ distribution,
the toolbox calculates noncentral F distribution probabilities as a weighted
sum of incomplete beta functions using Poisson probabilities as the weights.
$$F(x \mid \nu_1,\nu_2,\delta) = \sum_{j=0}^{\infty} \left( \frac{(\delta/2)^j}{j!} e^{-\delta/2} \right) I\left( \frac{\nu_1 x}{\nu_2+\nu_1 x} \,\Big|\, \frac{\nu_1}{2}+j, \frac{\nu_2}{2} \right)$$
$I(x \mid a,b)$ is the incomplete beta function with parameters a and b, and $\delta$ is the
noncentrality parameter.
Example and Plot of the Noncentral F Distribution. The following commands generate
a plot of the noncentral F pdf (solid line) together with the central F pdf with
the same degrees of freedom (dashed line).
x = (0.01:0.1:10.01)';
p1 = ncfpdf(x,5,20,10);
p = fpdf(x,5,20);
plot(x,p,'--',x,p1,'-')
Gamma Distribution
The following sections provide an overview of the gamma distribution.
Background of the Gamma Distribution. The gamma distribution is a family of
curves based on two parameters. The chi-square and exponential distributions,
which are children of the gamma distribution, are one-parameter distributions
that fix one of the two gamma parameters.
The gamma distribution has the following relationship with the incomplete
Gamma function.
$$\Gamma(x \mid a,b) = \mathrm{gammainc}\left(\frac{x}{b}, a\right)$$
For b = 1 the functions are identical.
When a is large, the gamma distribution closely approximates a normal
distribution with the advantage that the gamma distribution has density only
for positive real numbers.
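The relationship with the incomplete Gamma function can be verified numerically,
under the assumption that the left-hand side denotes the gamma cdf as computed
by gamcdf. The parameter values are illustrative.

```matlab
x = 0:0.5:10; a = 3; b = 2;          % illustrative values
c1 = gamcdf(x,a,b);                  % gamma cdf from the toolbox
c2 = gammainc(x/b,a);                % incomplete Gamma function from MATLAB
maxdiff = max(abs(c1-c2))            % largest difference over the grid
```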
Definition of the Gamma Distribution. The gamma pdf is
$$y = f(x \mid a,b) = \frac{1}{b^a \Gamma(a)} x^{a-1} e^{-x/b}$$
Parameter Estimation for the Gamma Distribution. Suppose you are stress testing
computer memory chips and collecting data on their lifetimes. You assume that
these lifetimes follow a gamma distribution. You want to know how long you
can expect the average computer memory chip to last. Parameter estimation is
the process of determining the parameters of the gamma distribution that fit
this data best in some sense.
One popular criterion of goodness is to maximize the likelihood function. The
likelihood has the same form as the gamma pdf above. But for the pdf, the
parameters are known constants and the variable is x. The likelihood function
reverses the roles of the variables. Here, the sample values (the x's) are already
observed, so they are the fixed constants. The variables are the unknown
parameters. MLE involves calculating the values of the parameters that give
the highest likelihood given the particular set of data.
The function gamfit returns the MLEs and confidence intervals for the
parameters of the gamma distribution. Here is an example using random
numbers from the gamma distribution with a = 10 and b = 5.
lifetimes = gamrnd(10,5,100,1);
[phat, pci] = gamfit(lifetimes)
phat =
10.9821 4.7258
pci =
7.4001 3.1543
14.5640 6.2974
Note phat(1) = $\hat{a}$ and phat(2) = $\hat{b}$. The MLE for parameter a is 10.98,
compared to the true value of 10. The 95% confidence interval for a goes from
7.4 to 14.6, which includes the true value.
Similarly the MLE for parameter b is 4.7, compared to the true value of 5. The
95% confidence interval for b goes from 3.2 to 6.3, which also includes the true
value.
In our life tests we do not know the true value of a and b, so it is nice to have a
confidence interval on the parameters to give a range of likely values.
Example and Plot of the Gamma Distribution. In the example the gamma pdf is
plotted with the solid line. The normal pdf has a dashed line type.
x = gaminv((0.005:0.01:0.995),100,10);
y = gampdf(x,100,10);
y1 = normpdf(x,1000,100);
plot(x,y,'-',x,y1,'-.')
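The parameters of the normal curve in this example are the mean and standard
deviation of the gamma distribution being approximated. A brief sketch using
gamstat, which returns the mean and variance for given gamma parameters:

```matlab
[m,v] = gamstat(100,10);     % mean and variance of the gamma distribution with a = 100, b = 10
s = sqrt(v);                 % standard deviation
[m s]
```

The mean is ab and the variance is ab^2, which is where the values 1000 and 100
passed to normpdf come from.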
Geometric Distribution
The following sections provide an overview of the geometric distribution.
Background of the Geometric Distribution. The geometric distribution is discrete,
existing only on the nonnegative integers. It is useful for modeling the runs of
consecutive successes (or failures) in repeated independent trials of a system.
The geometric distribution models the number of successes before one failure
in an independent succession of tests where each test results in success or
failure.
Definition of the Geometric Distribution. The geometric pdf is
$$y = f(x \mid p) = p q^x I_{(0,1,\ldots)}(x)$$
where $q = 1 - p$.
Example and Plot of the Geometric Distribution. Suppose the probability of a
five-year-old battery failing in cold weather is 0.03. What is the probability of
starting 25 consecutive days during a long cold snap?
1 - geocdf(25,0.03)
ans =
0.4530
The plot shows the cdf for this scenario.
x = 0:25;
y = geocdf(x,0.03);
stairs(x,y)
Hypergeometric Distribution
The following sections provide an overview of the hypergeometric distribution.
Background of the Hypergeometric Distribution. The hypergeometric distribution
models the total number of successes in a fixed size sample drawn without
replacement from a finite population.
The distribution is discrete, existing only for nonnegative integers no greater than
the number of samples or the number of possible successes, whichever is smaller.
The hypergeometric distribution differs from the binomial only in that the
population is finite and the sampling from the population is without
replacement.
The hypergeometric distribution has three parameters that have direct
physical interpretations. M is the size of the population. K is the number of
items with the desired characteristic in the population. n is the number of
samples drawn. Sampling without replacement means that once a particular
sample is chosen, it is removed from the relevant population for all subsequent
selections.
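The effect of sampling without replacement can be seen by comparing the two
pdfs. In this sketch the population size, number of desired items, and sample size
are illustrative; with a population much larger than the sample, the
hypergeometric probabilities stay close to the binomial ones with p = K/M.

```matlab
M = 1000; K = 50; n = 20;           % illustrative population, desired items, sample size
x = 0:10;
ph = hygepdf(x,M,K,n);              % sampling without replacement
pb = binopdf(x,n,K/M);              % sampling with replacement
maxdiff = max(abs(ph-pb))           % largest pointwise difference between the pdfs
```

Shrinking M toward n while holding K/M fixed makes the difference grow.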
Definition of the Hypergeometric Distribution. The hypergeometric pdf is
$$y = f(x \mid M,K,n) = \frac{\binom{K}{x}\binom{M-K}{n-x}}{\binom{M}{n}}$$
Example and Plot of the Hypergeometric Distribution. The plot shows the cdf of an
experiment taking 20 samples from a group of 1000 where there are 50 items
of the desired type.
x = 0:10;
y = hygecdf(x,1000,50,20);
stairs(x,y)
Lognormal Distribution
The following sections provide an overview of the lognormal distribution.
Background of the Lognormal Distribution. The normal and lognormal distributions
are closely related. If X is distributed lognormal with parameters $\mu$ and $\sigma^2$, then
ln X is distributed normal with parameters $\mu$ and $\sigma^2$.
The lognormal distribution is applicable when the quantity of interest must be
positive, since ln X exists only when the random variable X is positive.
Economists often model the distribution of income using a lognormal
distribution.
Definition of the Lognormal Distribution. The lognormal pdf is
$$y = f(x \mid \mu,\sigma) = \frac{1}{x\sigma\sqrt{2\pi}} e^{-\frac{(\ln x-\mu)^2}{2\sigma^2}}$$
Example and Plot of the Lognormal Distribution. Suppose the income of a family of
four in the United States follows a lognormal distribution with $\mu$ = log(20,000)
and $\sigma^2$ = 1.0. Plot the income density.
x = (10:1000:125010)';
y = lognpdf(x,log(20000),1.0);
plot(x,y)
set(gca,'xtick',[0 30000 60000 90000 120000])
set(gca,'xticklabel',str2mat('0','$30,000','$60,000',...
'$90,000','$120,000'))
Negative Binomial Distribution
The following sections provide an overview of the negative binomial
distribution.
Background of the Negative Binomial Distribution. The geometric distribution is a
special case of the negative binomial distribution (also called the Pascal
distribution). The geometric distribution models the number of successes
before one failure in an independent succession of tests where each test results
in success or failure.
In the negative binomial distribution the number of failures is a parameter of
the distribution. The parameters are the probability of success, p, and the
number of failures, r.
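The special-case relationship can be checked numerically: with r = 1 the negative
binomial pdf computed by nbinpdf reduces to the geometric pdf computed by
geopdf. The value of p is illustrative.

```matlab
x = 0:10; p = 0.3;              % illustrative probability
y1 = nbinpdf(x,1,p);            % negative binomial with r = 1
y2 = geopdf(x,p);               % geometric
maxdiff = max(abs(y1-y2))       % difference is at the level of round-off
```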
Definition of the Negative Binomial Distribution. The negative binomial pdf is
$$y = f(x \mid r,p) = \binom{r+x-1}{x} p^r q^x I_{(0,1,\ldots)}(x)$$
where $q = 1 - p$.
Example and Plot of the Negative Binomial Distribution. The following commands
generate a plot of the negative binomial pdf.
x = (0:10);
y = nbinpdf(x,3,0.5);
plot(x,y,'+')
set(gca,'XLim',[-0.5,10.5])
Normal Distribution
The following sections provide an overview of the normal distribution.
Background of the Normal Distribution. The normal distribution is a two parameter
family of curves. The first parameter, $\mu$, is the mean. The second, $\sigma$, is the
standard deviation. The standard normal distribution (written $\Phi(x)$) sets $\mu$ to 0
and $\sigma$ to 1.
$\Phi(x)$ is functionally related to the error function, erf.
$$\mathrm{erf}(x) = 2\Phi(x\sqrt{2}) - 1$$
The first use of the normal distribution was as a continuous approximation to
the binomial.
The usual justification for using the normal distribution for modeling is the
Central Limit Theorem, which states (roughly) that the sum of independent
samples from any distribution with finite mean and variance converges to the
normal distribution as the sample size goes to infinity.
Definition of the Normal Distribution. The normal pdf is
$$y = f(x \mid \mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Parameter Estimation for the Normal Distribution. One of the first applications of the
normal distribution in data analysis was modeling the height of school
children. Suppose we want to estimate the mean, $\mu$, and the variance, $\sigma^2$, of the
heights of all the 4th graders in the United States.
We have already introduced MLEs. Another desirable criterion in a statistical
estimator is unbiasedness. A statistic is unbiased if the expected value of the
statistic is equal to the parameter being estimated. MLEs are not always
unbiased. For any data sample, there may be more than one unbiased
estimator of the parameters of the parent distribution of the sample. For
instance, every sample value is an unbiased estimate of the parameter $\mu$ of a
normal distribution. The Minimum Variance Unbiased Estimator (MVUE) is
the statistic that has the minimum variance of all unbiased estimators of a
parameter.
The MVUEs of parameters $\mu$ and $\sigma^2$ for the normal distribution are the sample
average and variance. The sample average is also the MLE for $\mu$. There are two
common textbook formulas for the variance.
They are
$$1)\quad s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2$$
$$2)\quad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$$
where
$$\bar{x} = \sum_{i=1}^{n}\frac{x_i}{n}$$
Equation 1 is the maximum likelihood estimator for $\sigma^2$, and equation 2 is the
MVUE.
The function normfit returns the MVUEs and confidence intervals for $\mu$ and
$\sigma^2$. Here is a playful example modeling the heights (inches) of a randomly
chosen 4th grade class.
height = normrnd(50,2,30,1); % Simulate heights.
[mu,s,muci,sci] = normfit(height)
mu =
50.2025
s =
1.7946
muci =
49.5210
50.8841
sci =
1.4292
2.4125
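The two variance formulas can be compared on a small sample. The heights below
are made-up values; var(x,1) and var(x) are the MATLAB calls that normalize by n
and by n-1 respectively.

```matlab
x = [48 50 51 49 53 50 52]';            % hypothetical heights in inches
n = length(x);
xbar = sum(x)/n;                        % sample average
s2_mle  = sum((x-xbar).^2)/n;           % equation 1, the maximum likelihood estimator
s2_mvue = sum((x-xbar).^2)/(n-1);       % equation 2, the MVUE
[s2_mle var(x,1); s2_mvue var(x)]       % each row shows matching values
```

The two estimates differ by the factor (n-1)/n, which matters less as the sample
grows.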
Example and Plot of the Normal Distribution. The plot shows the bell curve of the
standard normal pdf, with $\mu$ = 0 and $\sigma$ = 1.
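The guide shows only the figure here. Commands along the following lines,
modeled on the other plotting examples in this chapter, would produce such a
plot; the grid spacing is an arbitrary choice.

```matlab
x = -3:0.1:3;
y = normpdf(x,0,1);     % standard normal pdf
plot(x,y)
```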
Poisson Distribution
The following sections provide an overview of the Poisson distribution.
Background of the Poisson Distribution. The Poisson distribution is appropriate for
applications that involve counting the number of times a random event occurs
in a given amount of time, distance, area, etc. Sample applications that involve
Poisson distributions include the number of Geiger counter clicks per second,
the number of people walking into a store in an hour, and the number of flaws
per 1000 feet of video tape.
The Poisson distribution is a one parameter discrete distribution that takes
nonnegative integer values. The parameter, $\lambda$, is both the mean and the
variance of the distribution. Thus, as the size of the numbers in a particular
sample of Poisson random numbers gets larger, so does the variability of the
numbers.
As Poisson (1837) showed, the Poisson distribution is the limiting case of a
binomial distribution where N approaches infinity and p goes to zero while
Np = $\lambda$.
The Poisson and exponential distributions are related. If the number of counts
follows the Poisson distribution, then the interval between individual counts
follows the exponential distribution.
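The limiting relationship with the binomial distribution can be illustrated
numerically. In this sketch the Poisson parameter and the number of trials are
illustrative; the binomial success probability is set so that Np equals the Poisson
parameter.

```matlab
lambda = 5; x = 0:15;
N = 1000;                              % a large number of trials
pb = binopdf(x,N,lambda/N);            % binomial with Np = lambda
pp = poisspdf(x,lambda);               % Poisson limit
maxdiff = max(abs(pb-pp))              % small, and it shrinks as N grows
```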
Definition of the Poisson Distribution. The Poisson pdf is
$$y = f(x \mid \lambda) = \frac{\lambda^x}{x!} e^{-\lambda} I_{(0,1,\ldots)}(x)$$
Parameter Estimation for the Poisson Distribution. The MLE and the MVUE of the
Poisson parameter, $\lambda$, is the sample mean. The sum of independent Poisson
random variables is also Poisson distributed with the parameter equal to the
sum of the individual parameters. The Statistics Toolbox makes use of this fact
to calculate confidence intervals on $\lambda$. As $\lambda$ gets large the Poisson distribution
can be approximated by a normal distribution with $\mu = \lambda$ and $\sigma^2 = \lambda$. The
Statistics Toolbox uses this approximation for calculating confidence intervals
for values of $\lambda$ greater than 100.
Example and Plot of the Poisson Distribution. The plot shows the probability for each
nonnegative integer when $\lambda$ = 5.
x = 0:15;
y = poisspdf(x,5);
plot(x,y,'+')
Rayleigh Distribution
The following sections provide an overview of the Rayleigh distribution.
Background of the Rayleigh Distribution. The Rayleigh distribution is a special case
of the Weibull distribution. If A and B are the parameters of the Weibull
distribution, then the Rayleigh distribution with parameter b is equivalent to
the Weibull distribution with parameters $A = 1/(2b^2)$ and $B = 2$.
If the component velocities of a particle in the x and y directions are two
independent normal random variables with zero means and equal variances,
then the distance the particle travels per unit time is distributed Rayleigh.
Definition of the Rayleigh Distribution. The Rayleigh pdf is
$$y = f(x \mid b) = \frac{x}{b^2} e^{-\frac{x^2}{2b^2}}$$
Parameter Estimation for the Rayleigh Distribution. The raylfit function returns the
MLE of the Rayleigh parameter. This estimate is
$$b = \sqrt{\frac{1}{2n}\sum_{i=1}^{n} x_i^2}$$
Example and Plot of the Rayleigh Distribution. The following commands generate a
plot of the Rayleigh pdf.
x = [0:0.01:2];
p = raylpdf(x,0.5);
plot(x,p)
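The equivalence with the Weibull distribution stated in the background can be
checked by evaluating the Weibull pdf formula directly, in the parameterization
$a b x^{b-1} e^{-a x^b}$ that this guide uses, rather than through a toolbox function.
The Rayleigh parameter value is illustrative.

```matlab
x = 0.01:0.01:2;
b = 0.5;                                   % Rayleigh parameter
A = 1/(2*b^2); B = 2;                      % equivalent Weibull parameters
yw = A*B*x.^(B-1).*exp(-A*x.^B);           % Weibull pdf evaluated from its formula
yr = raylpdf(x,b);                         % Rayleigh pdf from the toolbox
maxdiff = max(abs(yw-yr))                  % difference is at the level of round-off
```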
Student's t Distribution
The following sections provide an overview of Student's t distribution.
Background of Student's t Distribution. The t distribution is a family of curves
depending on a single parameter $\nu$ (the degrees of freedom). As $\nu$ goes to
infinity, the t distribution converges to the standard normal distribution.
W. S. Gosset (1908) discovered the distribution through his work at the
Guinness brewery. At that time, Guinness did not allow its staff to publish, so
Gosset used the pseudonym Student.
If $\bar{x}$ and s are the mean and standard deviation of an independent random
sample of size n from a normal distribution with mean $\mu$, then
$$t(\nu) = \frac{\bar{x}-\mu}{s/\sqrt{n}}, \qquad \nu = n-1$$
Definition of Student's t Distribution. Student's t pdf is
$$y = f(x \mid \nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)} \frac{1}{\sqrt{\nu\pi}} \frac{1}{\left(1+\frac{x^2}{\nu}\right)^{\frac{\nu+1}{2}}}$$
Example and Plot of Student's t Distribution. The plot compares the t distribution
with $\nu$ = 5 (solid line) to the shorter tailed, standard normal distribution
(dashed line).
x = -5:0.1:5;
y = tpdf(x,5);
z = normpdf(x,0,1);
plot(x,y,'-',x,z,'-.')
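The convergence to the standard normal distribution can be made concrete by
measuring the largest gap between the two pdfs as the degrees of freedom grow.
The particular degrees of freedom below are arbitrary choices for illustration.

```matlab
x = -5:0.1:5;
z = normpdf(x,0,1);                        % standard normal pdf
v = [5 30 1000];                           % increasing degrees of freedom
d = zeros(size(v));
for k = 1:length(v)
    d(k) = max(abs(tpdf(x,v(k)) - z));     % largest gap between the t and normal pdfs
end
d
```

The entries of d decrease from left to right.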
Noncentral t Distribution
The following sections provide an overview of the noncentral t distribution.
Background of the Noncentral t Distribution. The noncentral t distribution is a
generalization of the familiar Student's t distribution.
If $\bar{x}$ and s are the mean and standard deviation of an independent random
sample of size n from a normal distribution with mean $\mu$, then
$$t(\nu) = \frac{\bar{x}-\mu}{s/\sqrt{n}}, \qquad \nu = n-1$$
Suppose that the mean of the normal distribution is not $\mu$. Then the ratio has
the noncentral t distribution. The noncentrality parameter is the difference
between the sample mean and $\mu$.
The noncentral t distribution allows us to determine the probability that we
would detect a difference between $\bar{x}$ and $\mu$ in a t test. This probability is the
power of the test. As $\bar{x} - \mu$ increases, the power of a test also increases.
Definition of the Noncentral t Distribution. The most general representation of the
noncentral t distribution is quite complicated. Johnson and Kotz (1970) give a
formula for the probability that a noncentral t variate falls in the range [-t, t].
$$\Pr\left((-t) < x < t \mid (\nu,\delta)\right) = \sum_{j=0}^{\infty} \left( \frac{\left(\frac{1}{2}\delta^2\right)^j}{j!} e^{-\frac{\delta^2}{2}} \right) I\left( \frac{t^2}{\nu+t^2} \,\Big|\, \frac{1}{2}+j, \frac{\nu}{2} \right)$$
$I(x \mid a,b)$ is the incomplete beta function with parameters a and b, $\delta$ is the
noncentrality parameter, and $\nu$ is the degrees of freedom.
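A sketch of a power calculation for a one-sided t test, along the lines described
in the background. The significance level, degrees of freedom, and noncentrality
parameter are assumed values, not taken from this guide.

```matlab
alpha = 0.05;                           % significance level of the test
v = 10;                                 % degrees of freedom
delta = 1;                              % assumed noncentrality parameter under the alternative
tcrit = tinv(1-alpha,v);                % one-sided critical value of the central t
power = 1 - nctcdf(tcrit,v,delta)       % probability of exceeding it under the alternative
```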
Example and Plot of the Noncentral t Distribution. The following commands generate
a plot of the noncentral t cdf (solid line) together with the central t cdf with
the same degrees of freedom (dashed line).
x = (-5:0.1:5)';
p1 = nctcdf(x,10,1);
p = tcdf(x,10);
plot(x,p,'--',x,p1,'-')
Uniform (Continuous) Distribution
The following sections provide an overview of the uniform distribution.
Background of the Uniform Distribution. The uniform distribution (also called
rectangular) has a constant pdf between its two parameters a (the minimum)
and b (the maximum). The standard uniform distribution (a = 0 and b = 1) is a
special case of the beta distribution, obtained by setting both of its parameters
to 1.
The uniform distribution is appropriate for representing the distribution of
round-off errors in values tabulated to a particular number of decimal places.
Definition of the Uniform Distribution. The uniform cdf is
$$p = F(x \mid a,b) = \frac{x-a}{b-a} I_{[a,b]}(x)$$
Parameter Estimation for the Uniform Distribution. The sample minimum and
maximum are the MLEs of a and b respectively.
Example and Plot of the Uniform Distribution. The example illustrates the inversion
method for generating normal random numbers using rand and norminv. Note
that the MATLAB function, randn, does not use inversion since it is not
efficient for this case.
u = rand(1000,1);
x = norminv(u,0,1);
hist(x)
Weibull Distribution
The following sections provide an overview of the Weibull distribution.
Background of the Weibull Distribution. Waloddi Weibull (1939) offered the
distribution that bears his name as an appropriate analytical tool for modeling
the breaking strength of materials. Current usage also includes reliability and
lifetime modeling. The Weibull distribution is more flexible than the
exponential for these purposes.
To see why, consider the hazard rate function (instantaneous failure rate). If
f(t) and F(t) are the pdf and cdf of a distribution, then the hazard rate is
$$h(t) = \frac{f(t)}{1-F(t)}$$
Substituting the pdf and cdf of the exponential distribution for f(t) and F(t)
above yields a constant. The example below shows that the hazard rate for the
Weibull distribution can vary.
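The constant hazard rate of the exponential distribution can be confirmed
numerically. In this sketch the mean is an illustrative value; for an exponential
distribution with mean mu the hazard rate works out to 1/mu at every age.

```matlab
t = 0:0.1:3;
mu = 0.6267;                                   % illustrative exponential mean
h = exppdf(t,mu) ./ (1-expcdf(t,mu));          % hazard rate f(t)/(1-F(t))
[min(h) max(h) 1/mu]                           % all three values agree
```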
Definition of the Weibull Distribution. The Weibull pdf is
$$y = f(x \mid a,b) = a b x^{b-1} e^{-a x^b} I_{(0,\infty)}(x)$$
Parameter Estimation for the Weibull Distribution. Suppose we want to model the
tensile strength of a thin filament using the Weibull distribution. The function
weibfit gives MLEs and confidence intervals for the Weibull parameters.
strength = weibrnd(0.5,2,100,1); % Simulated strengths.
[p,ci] = weibfit(strength)
p =
0.4746 1.9582
ci =
0.3851 1.6598
0.5641 2.2565
The default 95% confidence interval for each parameter contains the true
value.
Example and Plot of the Weibull Distribution. The exponential distribution has a
constant hazard function, which is not generally the case for the Weibull
distribution.
The plot shows the hazard functions for exponential (dashed line) and Weibull
(solid line) distributions having the same mean life. The Weibull hazard rate
here increases with age (a reasonable assumption).
t = 0:0.1:3;
h1 = exppdf(t,0.6267) ./ (1-expcdf(t,0.6267));
h2 = weibpdf(t,2,2) ./ (1-weibcdf(t,2,2));
plot(t,h1,'--',t,h2,'-')
Descriptive Statistics
Data samples can have thousands (even millions) of values. Descriptive
statistics are a way to summarize this data into a few numbers that contain
most of the relevant information. The following sections explore the features
provided by the Statistics Toolbox for working with descriptive statistics:
• Measures of Central Tendency (Location)
• Measures of Dispersion
• Functions for Data with Missing Values (NaNs)
• Function for Grouped Data
• Percentiles and Graphical Descriptions
• The Bootstrap
Measures of Central Tendency (Location)
The purpose of measures of central tendency is to locate the data values on the
number line. Another term for these statistics is measures of location.
The table gives the function names and descriptions.
Measures of Location
geomean     Geometric mean
harmmean    Harmonic mean
mean        Arithmetic average (in MATLAB)
median      50th percentile (in MATLAB)
trimmean    Trimmed mean
The average is a simple and popular estimate of location. If the data sample
comes from a normal distribution, then the sample average is also optimal
(MVUE of $\mu$).
Unfortunately, outliers, data entry errors, or glitches exist in almost all real
data. The sample average is sensitive to these problems. One bad data value
can move the average away from the center of the rest of the data by an
arbitrarily large distance.
The median and trimmed mean are two measures that are resistant (robust) to
outliers. The median is the 50th percentile of the sample, which will only
change slightly if you add a large perturbation to any value. The idea behind
the trimmed mean is to ignore a small percentage of the highest and lowest
values of a sample when determining the center of the sample.
The geometric mean and harmonic mean, like the average, are not robust to
outliers. They are useful when the sample is distributed lognormal or heavily
skewed.
The example below shows the behavior of the measures of location for a sample
with one outlier.
x = [ones(1,6) 100]
x =
1 1 1 1 1 1 100
locate = [geomean(x) harmmean(x) mean(x) median(x)...
trimmean(x,25)]
locate =
1.9307 1.1647 15.1429 1.0000 1.0000
You can see that the mean is far from any data value because of the influence
of the outlier. The median and trimmed mean ignore the outlying value and
describe the location of the rest of the data values.
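To make the definitions concrete, the following sketch recomputes the location measures for the same sample using only core MATLAB arithmetic. The trimming step simply drops one value from each end of the sorted sample; that is a simplification chosen for this illustration, not necessarily the rule that trimmean applies.

```matlab
x = [ones(1,6) 100];
n = numel(x);

gm = exp(mean(log(x)));    % geometric mean: nth root of the product
hm = n / sum(1 ./ x);      % harmonic mean: reciprocal of the mean reciprocal
am = sum(x) / n;           % arithmetic mean

xs = sort(x);
md = xs((n+1)/2);          % median of an odd-length sample: the middle value
tm = mean(xs(2:end-1));    % hand-rolled trim: drop one value from each end
```

Only the arithmetic mean is dragged far from the bulk of the data; the median and the trimmed value stay at 1.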
Measures of Dispersion
The purpose of measures of dispersion is to find out how spread out the data
values are on the number line. Another term for these statistics is measures of
spread.
The table gives the function names and descriptions.
Measures of Dispersion
iqr      Interquartile Range
mad      Mean Absolute Deviation
range    Range
std      Standard deviation (in MATLAB)
var      Variance (in MATLAB)
The range (the difference between the maximum and minimum values) is the
simplest measure of spread. But if there is an outlier in the data, it will be the
minimum or maximum value. Thus, the range is not robust to outliers.
The standard deviation and the variance are popular measures of spread that
are optimal for normally distributed samples. The sample variance is the
MVUE of the normal parameter σ². The standard deviation is the square root
of the variance and has the desirable property of being in the same units as the
data. That is, if the data is in meters, the standard deviation is in meters as
well. The variance is in meters², which is more difficult to interpret.
Neither the standard deviation nor the variance is robust to outliers. A data
value that is separate from the body of the data can increase the value of the
statistics by an arbitrarily large amount.
The Mean Absolute Deviation (MAD) is also sensitive to outliers. But the MAD
does not move quite as much as the standard deviation or variance in response
to bad data.
The Interquartile Range (IQR) is the difference between the 75th and 25th
percentile of the data. Since only the middle 50% of the data affects this
measure, it is robust to outliers.
The example below shows the behavior of the measures of dispersion for a
sample with one outlier.
x = [ones(1,6) 100]
x =
1 1 1 1 1 1 100
stats = [iqr(x) mad(x) range(x) std(x)]
stats =
0 24.2449 99.0000 37.4185
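As a rough check on these numbers, the sketch below recomputes the range, mean absolute deviation, and standard deviation from their definitions using only core MATLAB arithmetic. The variable names are chosen for this illustration.

```matlab
x = [ones(1,6) 100];
n = numel(x);
d = x - mean(x);                 % deviations from the sample mean

r  = max(x) - min(x);            % range
md = sum(abs(d)) / n;            % mean absolute deviation
v  = sum(d.^2) / (n - 1);        % sample variance (n-1 in the denominator)
s  = sqrt(v);                    % standard deviation, same units as x
```

The single outlier contributes half of the total absolute deviation but most of the squared deviation, which is why the standard deviation reacts more strongly than the MAD.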
Functions for Data with Missing Values (NaNs)
Most real-world data sets have one or more missing elements. It is convenient
to code missing entries in a matrix as NaN (Not a Number).
Here is a simple example.
m = magic(3);
m([1 5]) = [NaN NaN]
m =
NaN 1 6
3 NaN 7
4 9 2
Any arithmetic operation that involves the missing values in this matrix yields
NaN, as below.
sum(m)
ans =
NaN NaN 15
Removing cells with NaN would destroy the matrix structure. Removing whole
rows that contain NaN would discard real data. Instead, the Statistics Toolbox
has a variety of functions that are similar to other MATLAB functions, but that
treat NaN values as missing and therefore ignore them in the calculations.
nansum(m)
ans =
7 10 15
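The idea of ignoring missing entries column by column can be sketched with core MATLAB alone by masking with isnan. This is only an illustration of the behavior, not a statement of how the toolbox function is implemented; colsum is a name made up for this example.

```matlab
m = magic(3);
m([1 5]) = NaN;                       % mark two entries as missing

colsum = zeros(1, size(m,2));
for j = 1:size(m,2)
    col = m(:,j);
    colsum(j) = sum(col(~isnan(col)));  % add up only the non-missing entries
end
```

For this matrix the column sums of the non-missing entries are 7, 10, and 15; the third column has no missing values, so its sum matches the plain sum.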
NaN Functions
nanmax       Maximum ignoring NaNs
nanmean      Mean ignoring NaNs
nanmedian    Median ignoring NaNs
nanmin       Minimum ignoring NaNs
nanstd       Standard deviation ignoring NaNs
nansum       Sum ignoring NaNs
In addition, other Statistics Toolbox functions operate only on the numeric
values, ignoring NaNs. These include iqr, kurtosis, mad, prctile, range,
skewness, and trimmean.
Function for Grouped Data
As we saw in the previous section, the descriptive statistics functions can
compute statistics on each column in a matrix. Sometimes, however, you may
have your data arranged differently so that measurements appear in one
column or variable, and a grouping code appears in a second column or
variable. Although MATLAB's syntax makes it simple to apply functions to a
subset of an array, in this case it is simpler to use the grpstats function.
The grpstats function can compute the mean, standard error of the mean, and
count (number of observations) for each group defined by one or more grouping
variables. If you supply a significance level, it also creates a graph of the group
means with confidence intervals.
As an example, load the larger car data set. We can look at the average value
of MPG (miles per gallon) for cars grouped by org (location of the origin of the
car).
load carbig
grpstats(MPG,org,0.05)
ans =
20.084
27.891
30.451
[Figure: Means and Confidence Intervals for Each Group. Vertical axis: Mean; horizontal axis: Group (USA, Europe, Japan).]
We can also get the complete set of statistics for MPG grouped by three variables:
org, cyl4 (the engine has four cylinders or not), and when (when the car was
made).
[m,s,c,n] = grpstats(MPG,{org cyl4 when});
[n num2cell([m s c])]
ans =
'USA' 'Other' 'Early' [14.896] [0.33306] [77]
'USA' 'Other' 'Mid' [17.479] [0.30225] [75]
'USA' 'Other' 'Late' [21.536] [0.97961] [25]
'USA' 'Four' 'Early' [23.333] [0.87328] [12]
'USA' 'Four' 'Mid' [27.027] [0.75456] [22]
'USA' 'Four' 'Late' [29.734] [0.71126] [38]
'Europe' 'Other' 'Mid' [ 17.5] [ 0.9478] [ 4]
'Europe' 'Other' 'Late' [30.833] [ 3.1761] [ 3]
'Europe' 'Four' 'Early' [24.714] [0.73076] [21]
'Europe' 'Four' 'Mid' [26.912] [ 1.0116] [26]
'Europe' 'Four' 'Late' [ 35.7] [ 1.4265] [16]
'Japan' 'Other' 'Early' [ 19] [0.57735] [ 3]
'Japan' 'Other' 'Mid' [20.833] [0.92796] [ 3]
'Japan' 'Other' 'Late' [ 26.5] [ 2.0972] [ 4]
'Japan' 'Four' 'Early' [26.083] [ 1.1772] [12]
'Japan' 'Four' 'Mid' [ 29.5] [0.86547] [25]
'Japan' 'Four' 'Late' [ 35.3] [0.68346] [32]
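For comparison, here is a minimal sketch of the subset-by-subset approach that grpstats saves you from writing. It computes the mean, standard error of the mean, and count for each group with logical indexing. The data and group codes are made up for this illustration and are not taken from the car data set.

```matlab
% Hypothetical measurements and integer group codes (not the carbig data).
y = [10 12 14 20 22 30]';
g = [1 1 1 2 2 3]';

groups = unique(g);
m = zeros(size(groups));    % group means
s = zeros(size(groups));    % standard errors of the group means
c = zeros(size(groups));    % group counts
for k = 1:numel(groups)
    yk   = y(g == groups(k));        % observations belonging to group k
    c(k) = numel(yk);
    m(k) = mean(yk);
    s(k) = std(yk) / sqrt(c(k));     % standard error = std / sqrt(count)
end
```

A group with a single observation gets a standard error of zero here, because std of one value is zero; that is a property of this sketch rather than a meaningful estimate.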
Percentiles and Graphical Descriptions
Trying to describe a data sample with two numbers, a measure of location and
a measure of spread, is frugal but may be misleading.
Another option is to compute a reasonable number of the sample percentiles.
This provides information about the shape of the data as well as its location
and spread.
The example shows the result of looking at every quartile of a sample
containing a mixture of two distributions.
x = [normrnd(4,1,1,100) normrnd(6,0.5,1,200)];
p = 100*(0:0.25:1);
y = prctile(x,p);
z = [p;y]
z =
0 25.0000 50.0000 75.0000 100.0000
1.5172 4.6842 5.6706 6.1804 7.6035
Compare the first two quantiles to the rest.
The box plot is a graph for descriptive statistics. The graph below is a box plot
of the data above.
boxplot(x)
[Figure: box plot of x. Vertical axis: Values; horizontal axis: Column Number.]
The long lower tail and plus signs show the lack of symmetry in the sample
values. For more information on box plots, see "Statistical Plots" on page 1-128.
The histogram is a complementary graph.
hist(x)
[Figure: histogram of x.]
The Bootstrap
In recent years the statistical literature has examined the properties of
resampling as a means to acquire information about the uncertainty of
statistical estimators.
The bootstrap is a procedure that involves choosing random samples with
replacement from a data set and analyzing each sample the same way.
Sampling with replacement means that every sample is returned to the data set
after sampling. So a particular data point from the original data set could
appear multiple times in a given bootstrap sample. The number of elements in
each bootstrap sample equals the number of elements in the original data set.
The range of sample estimates we obtain allows us to establish the uncertainty
of the quantity we are estimating.
Here is an example taken from Efron and Tibshirani (1993) comparing Law
School Admission Test (LSAT) scores and subsequent law school grade point
average (GPA) for a sample of 15 law schools.
load lawdata
plot(lsat,gpa,'+')
lsline
[Figure: scatter plot of gpa against lsat with the least squares fit line.]
The least squares fit line indicates that higher LSAT scores go with higher law
school GPAs. But how sure are we of this conclusion? The plot gives us some
intuition but nothing quantitative.
We can calculate the correlation coefficient of the variables using the corrcoef
function.
rhohat = corrcoef(lsat,gpa)
rhohat =
1.0000 0.7764
0.7764 1.0000
Now we have a number, 0.7764, describing the positive connection between
LSAT and GPA, but though 0.7764 may seem large, we still do not know if it is
statistically significant.
Using the bootstrp function we can resample the lsat and gpa vectors as
many times as we like and consider the variation in the resulting correlation
coefficients.
Here is an example.
rhos1000 = bootstrp(1000,'corrcoef',lsat,gpa);
This command resamples the lsat and gpa vectors 1000 times and computes
the corrcoef function on each sample. Here is a histogram of the result.
hist(rhos1000(:,2),30)
[Figure: histogram of the 1000 bootstrap correlation estimates.]
Nearly all the estimates lie on the interval [0.4 1.0].
This is strong quantitative evidence that LSAT and subsequent GPA are
positively correlated. Moreover, it does not require us to make any strong
assumptions about the probability distribution of the correlation coefficient.
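The resampling step itself can be sketched with core MATLAB functions, which may help show what "sampling with replacement" means in practice. The paired data below are made up for this illustration (they are not the law school data), and the variable names and the choice of a middle-95% interval are assumptions of the sketch, not features of bootstrp.

```matlab
% Hypothetical paired measurements (made up for illustration).
u = (1:10)';
v = [1.2 1.9 3.4 3.8 5.1 6.3 6.8 8.4 8.9 10.2]';
n = numel(u);
nboot = 1000;

rhos = zeros(nboot,1);
for b = 1:nboot
    idx = ceil(n * rand(n,1));       % n row indices drawn with replacement
    r = corrcoef(u(idx), v(idx));    % 2-by-2 correlation matrix of the resample
    rhos(b) = r(1,2);                % keep the off-diagonal element
end

srt = sort(rhos);
interval = [srt(25) srt(975)];       % middle 95% of the bootstrap estimates
```

Each resample uses the same index vector for both variables, so the pairing between the two measurements is preserved. The results vary from run to run because the indices are random.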
Cluster Analysis
Cluster analysis, also called segmentation analysis or taxonomy analysis, is a
way to partition a set of objects into groups, or clusters, in such a way that the
profiles of objects in the same cluster are very similar and the profiles of objects
in different clusters are quite distinct.
Cluster analysis can be performed on many different types of data sets. For
example, a data set might contain a number of observations of subjects in a
study where each observation contains a set of variables.
Many different fields of study, such as engineering, zoology, medicine,
linguistics, anthropology, psychology, and marketing, have contributed to the
development of clustering techniques and the application of such techniques.
For example, cluster analysis can be used to find two similar groups for the
experiment and control groups in a study. In this way, if statistical differences
are found in the groups, they can be attributed to the experiment and not to
any initial difference between the groups.
The following sections explore the clustering features in the Statistics Toolbox:
Terminology and Basic Procedure
Finding the Similarities Between Objects
Defining the Links Between Objects
Evaluating Cluster Formation
Creating Clusters
Terminology and Basic Procedure
To perform cluster analysis on a data set using the Statistics Toolbox functions,
follow this procedure:
1 Find the similarity or dissimilarity between every pair of objects in the
data set. In this step, you calculate the distance between objects using the
pdist function. The pdist function supports many different ways to
compute this measurement. See "Finding the Similarities Between Objects"
on page 1-54 for more information.
2 Group the objects into a binary, hierarchical cluster tree. In this step,
you link together pairs of objects that are in close proximity using the
linkage function. The linkage function uses the distance information
generated in step 1 to determine the proximity of objects to each other. As
objects are paired into binary clusters, the newly formed clusters are
grouped into larger clusters until a hierarchical tree is formed. See "Defining
the Links Between Objects" on page 1-56 for more information.
3 Determine where to divide the hierarchical tree into clusters. In this
step, you divide the objects in the hierarchical tree into clusters using the
cluster function. The cluster function can create clusters by detecting
natural groupings in the hierarchical tree or by cutting off the hierarchical
tree at an arbitrary point. See "Creating Clusters" on page 1-64 for more
information.
The following sections provide more information about each of these steps.
Note The Statistics Toolbox includes a convenience function, clusterdata,
which performs all these steps for you. You do not need to execute the pdist,
linkage, or cluster functions separately. However, the clusterdata function
does not give you access to the options each of the individual routines offers.
For example, if you use the pdist function you can choose the distance
calculation method, whereas if you use the clusterdata function you cannot.
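The three steps can be chained as in the sketch below, which uses the five-point data set that the following sections work through. One assumption to note: the last call uses the name-value form cluster(Z,'maxclust',2), which is the syntax accepted by more recent toolbox releases; the release described in this guide uses the two-argument form cluster(Z,2) instead, so adjust the call to match your version.

```matlab
X = [1 2;2.5 4.5;2 2;4 1.5;4 2.5];   % five objects, one per row

Y = pdist(X);                 % step 1: distances between every pair of objects
Z = linkage(Y);               % step 2: binary, hierarchical cluster tree
T = cluster(Z,'maxclust',2);  % step 3: cut the tree into two clusters
                              %         (name-value syntax of newer releases)
```

T has one entry per object. The cluster numbers themselves are arbitrary labels; what matters is which objects share a label.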
Finding the Similarities Between Objects
You use the pdist function to calculate the distance between every pair of
objects in a data set. For a data set made up of m objects, there are
$m(m-1)/2$ pairs in the data set. The result of this computation is commonly
known as a similarity matrix (or dissimilarity matrix).
There are many ways to calculate this distance information. By default, the
pdist function calculates the Euclidean distance between objects; however,
you can specify one of several other options. See pdist for more information.
Note You can optionally normalize the values in the data set before
calculating the distance information. In a real-world data set, variables can be
measured against different scales. For example, one variable can measure
Intelligence Quotient (IQ) test scores and another variable can measure head
circumference. These discrepancies can distort the proximity calculations.
Using the zscore function, you can convert all the values in the data set to use
the same proportional scale. See zscore for more information.
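The usual meaning of this kind of standardization is to subtract each column's mean and divide by its standard deviation. The sketch below does that with core MATLAB functions on made-up values; it illustrates the idea and is not a description of how zscore is implemented.

```matlab
% Hypothetical data: column 1 is an IQ score, column 2 a head
% circumference in centimeters (values made up for illustration).
X = [100 55; 110 57; 120 59];
n = size(X,1);

mu    = mean(X);                   % column means
sigma = std(X);                    % column standard deviations
Zs = (X - repmat(mu,n,1)) ./ repmat(sigma,n,1);   % standardized columns
```

After this step both columns have mean 0 and standard deviation 1, so neither variable dominates the distance calculation merely because of its units.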
For example, consider a data set, X, made up of five objects where each object
is a set of x,y coordinates.
Object 1: 1, 2
Object 2: 2.5, 4.5
Object 3: 2, 2
Object 4: 4, 1.5
Object 5: 4, 2.5
You can define this data set as a matrix
X = [1 2;2.5 4.5;2 2;4 1.5;4 2.5]
and pass it to pdist. The pdist function calculates the distance between
object 1 and object 2, object 1 and object 3, and so on until the distances
between all the pairs have been calculated. The following figure plots these
objects in a graph. The distance between object 2 and object 3 is shown to
illustrate one interpretation of distance.
[Figure: the five objects plotted on x,y axes, with the distance between object 2 and object 3 marked.]
Returning Distance Information
The pdist function returns this distance information in a vector, Y, where each
element contains the distance between a pair of objects.
Y = pdist(X)
Y =
Columns 1 through 7
2.9155 1.0000 3.0414 3.0414 2.5495 3.3541 2.5000
Columns 8 through 10
2.0616 2.0616 1.0000
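The ordering of the elements in Y can be seen by computing the Euclidean distances by hand. The following sketch uses only core MATLAB and visits the pairs in the order (1,2), (1,3), ..., (1,5), (2,3), ..., (4,5); the loop variables are introduced for this illustration.

```matlab
X = [1 2;2.5 4.5;2 2;4 1.5;4 2.5];
m = size(X,1);

Y = zeros(1, m*(m-1)/2);     % one entry per pair of objects
k = 0;
for i = 1:m-1
    for j = i+1:m
        k = k + 1;
        Y(k) = sqrt(sum((X(i,:) - X(j,:)).^2));   % Euclidean distance
    end
end
```

For example, the first element is the distance between objects 1 and 2, sqrt(1.5^2 + 2.5^2), which is about 2.9155, and the last element is the distance between objects 4 and 5, which is 1.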
To make it easier to see the relationship between the distance information
generated by pdist and the objects in the original data set, you can reformat
the distance vector into a matrix using the squareform function. In this matrix,
element i,j corresponds to the distance between object i and object j in the
original data set. In the following example, element 1,1 represents the distance
between object 1 and itself (which is zero). Element 1,2 represents the distance
between object 1 and object 2, and so on.
squareform(Y)
ans =
0 2.9155 1.0000 3.0414 3.0414
2.9155 0 2.5495 3.3541 2.5000
1.0000 2.5495 0 2.0616 2.0616
3.0414 3.3541 2.0616 0 1.0000
3.0414 2.5000 2.0616 1.0000 0
Defining the Links Between Objects
Once the proximity between objects in the data set has been computed, you can
determine which objects in the data set should be grouped together into
clusters, using the linkage function. The linkage function takes the distance
information generated by pdist and links pairs of objects that are close
together into binary clusters (clusters made up of two objects). The linkage
function then links these newly formed clusters to other objects to create bigger
clusters until all the objects in the original data set are linked together in a
hierarchical tree.
For example, given the distance vector Y generated by pdist from the sample
data set of x and y coordinates, the linkage function generates a hierarchical
cluster tree, returning the linkage information in a matrix, Z.
Z = linkage(Y)
Z =
1.0000 3.0000 1.0000
4.0000 5.0000 1.0000
6.0000 7.0000 2.0616
8.0000 2.0000 2.5000
In this output, each row identifies a link. The first two columns identify the
objects that have been linked, that is, object 1, object 2, and so on. The third
column contains the distance between these objects. For the sample data set of
x and y coordinates, the linkage function begins by grouping together objects 1
and 3, which have the closest proximity (distance value = 1.0000). The linkage
function continues by grouping objects 4 and 5, which also have a distance
value of 1.0000.
The third row indicates that the linkage function grouped together objects 6
and 7. If our original sample data set contained only five objects, what are
objects 6 and 7? Object 6 is the newly formed binary cluster created by the
grouping of objects 1 and 3. When the linkage function groups two objects
together into a new cluster, it must assign the cluster a unique index value,
starting with the value m+1, where m is the number of objects in the original
data set. (Values 1 through m are already used by the original data set.)
Object 7 is the index for the cluster formed by objects 4 and 5.
As the final cluster, the linkage function grouped object 8, the newly formed
cluster made up of objects 6 and 7, with object 2 from the original data set. The
following figure graphically illustrates the way linkage groups the objects into
a hierarchy of clusters.
[Figure: the five objects plotted on x,y axes, with nested outlines showing clusters 6 (objects 1 and 3), 7 (objects 4 and 5), and 8 (clusters 6 and 7).]
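The m+1 indexing convention can be unpacked mechanically. The sketch below takes the linkage output shown above, typed in as a literal matrix, and lists the original objects contained in every cluster index. It uses only core MATLAB; the cell array members is a name made up for this illustration.

```matlab
Z = [1 3 1.0000
     4 5 1.0000
     6 7 2.0616
     8 2 2.5000];           % linkage output for the sample data set
m = size(Z,1) + 1;          % number of original objects (5)

members = cell(m + size(Z,1), 1);
for k = 1:m
    members{k} = k;         % indices 1..m are the original objects
end
for r = 1:size(Z,1)
    % row r creates cluster m+r from the two clusters named in columns 1-2
    members{m+r} = sort([members{Z(r,1)} members{Z(r,2)}]);
end
```

With this input, members{6} is [1 3], members{7} is [4 5], and members{8} is [1 3 4 5]. The last row creates index 9, which contains all five objects.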
The hierarchical, binary cluster tree created by the linkage function is most
easily understood when viewed graphically. The Statistics Toolbox includes the
dendrogram function that plots this hierarchical tree information as a graph,
as in the following example.
dendrogram(Z)
[Figure: dendrogram of the sample data set. Horizontal axis: object indices 1, 3, 4, 5, 2; vertical axis: link height from 0.5 to 2.5.]
In the figure, the numbers along the horizontal axis represent the indices of the
objects in the original data set. The links between objects are represented as
upside down U-shaped lines. The height of the U indicates the distance
between the objects. For example, the link representing the cluster containing
objects 1 and 3 has a height of 1. For more information about creating a
dendrogram diagram, see the dendrogram function reference page.
Evaluating Cluster Formation
After linking the objects in a data set into a hierarchical cluster tree, you may
want to verify that the tree represents significant similarity groupings. In
addition, you may want more information about the links between the objects.
The Statistics Toolbox provides functions to perform both these tasks, as
described in the following sections:
Verifying the Cluster Tree
Getting More Information About Cluster Links
Verifying the Cluster Tree
One way to measure the validity of the cluster information generated by the
linkage function is to compare it with the original proximity data generated by
the pdist function. If the clustering is valid, the linking of objects in the cluster
tree should have a strong correlation with the distances between objects in the
distance vector. The cophenet function compares these two sets of values and
computes their correlation, returning a value called the cophenetic correlation
coefficient. The closer the value of the cophenetic correlation coefficient is to 1,
the better the clustering solution.
You can use the cophenetic correlation coefficient to compare the results of
clustering the same data set using different distance calculation methods or
clustering algorithms.
For example, you can use the cophenet function to evaluate the clusters
created for the sample data set
c = cophenet(Z,Y)
c =
0.8573
where Z is the matrix output by the linkage function and Y is the distance
vector output by the pdist function.
Execute pdist again on the same data set, this time specifying the City Block
metric. After running the linkage function on this new pdist output, use the
cophenet function to evaluate the clustering using a different distance metric.
c = cophenet(Z,Y)
c =
0.9289
The cophenetic correlation coefficient shows a stronger correlation when the
City Block metric is used.
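The recomputation described above can be sketched as follows. The metric string 'cityblock' is the spelling accepted by more recent toolbox releases and is an assumption here; check the pdist reference page for the spelling your release expects. The value returned can also differ between releases, so no particular number is asserted.

```matlab
X = [1 2;2.5 4.5;2 2;4 1.5;4 2.5];

Y = pdist(X,'cityblock');   % city block distances (assumed metric spelling)
Z = linkage(Y);             % cluster tree built from the new distances
c = cophenet(Z,Y);          % cophenetic correlation for this metric
```

Note that Y and Z must both come from the same metric: passing the Euclidean distance vector together with a tree built from city block distances would compare unrelated quantities.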
Getting More Information About Cluster Links
One way to determine the natural cluster divisions in a data set is to compare
the length of each link in a cluster tree with the lengths of neighboring links
below it in the tree.
If a link is approximately the same length as neighboring links, it indicates
that there are similarities between the objects joined at this level of the
hierarchy. These links are said to exhibit a high level of consistency.
If the length of a link differs from neighboring links, it indicates that there are
dissimilarities between the objects at this level in the cluster tree. This link is
said to be inconsistent with the links around it. In cluster analysis,
inconsistent links can indicate the border of a natural division in a data set.
The cluster function uses a measure of inconsistency to determine where to
divide a data set into clusters. (See "Creating Clusters" on page 1-64 for more
information.)
The next section provides an example.
Example: Inconsistent Links. To illustrate, the following example creates a data set
of random numbers with three deliberate natural groupings. In the
dendrogram, note how the objects tend to collect into three groups. These three
groups are then connected by three longer links. These longer links are
inconsistent when compared with the links below them in the hierarchy.
rand('seed',3)
X = [rand(10,2)+1;rand(10,2)+2;rand(10,2)+3];
Y = pdist(X);
Z = linkage(Y);
dendrogram(Z);
[Figure: dendrogram of the 30 random objects. The short links within each of the three groups are labeled "These links show consistency." The long links joining the groups are labeled "These links show inconsistency when compared to links below them."]
The relative consistency of each link in a hierarchical cluster tree can be
quantified and expressed as the inconsistency coefficient. This value compares
the length of a link in a cluster hierarchy with the average length of
neighboring links. If a link is consistent with those around it, it will have a
low inconsistency coefficient. If a link is inconsistent with those around it,
it will have a higher inconsistency coefficient.
To generate a listing of the inconsistency coefficient for each link in the cluster
tree, use the inconsistent function. The inconsistent function compares
each link in the cluster hierarchy with adjacent links two levels below it in the
cluster hierarchy. This is called the depth of the comparison. Using the
inconsistent function, you can specify other depths. The objects at the bottom
of the cluster tree, called leaf nodes, that have no further objects below them,
have an inconsistency coefficient of zero.
For example, returning to the sample data set of x and y coordinates, we can
use the inconsistent function to calculate the inconsistency values for the
links created by the linkage function, described in "Defining the Links
Between Objects" on page 1-56.
I = inconsistent(Z)
I =
1.0000 0 1.0000 0
1.0000 0 1.0000 0
1.3539 0.8668 3.0000 0.8165
2.2808 0.3100 2.0000 0.7071
The inconsistent function returns data about the links in an (m-1)-by-4
matrix where each column provides data about the links.
Column  Description
1       Mean of the lengths of all the links included in the calculation
2       Standard deviation of all the links included in the calculation
3       Number of links included in the calculation
4       Inconsistency coefficient
In the sample output, the first row represents the link between objects 1 and 3.
(This cluster is assigned the index 6 by the linkage function.) Because this is a
leaf node, the inconsistency coefficient is zero. The second row represents the
link between objects 4 and 5, also a leaf node. (This cluster is assigned the
index 7 by the linkage function.)
The third row evaluates the link that connects these two leaf nodes, objects 6
and 7. (This cluster is called object 8 in the linkage output.) Column three
indicates that three links are considered in the calculation: the link itself and
the two links directly below it in the hierarchy. Column one represents the
mean of the lengths of these links. The inconsistent function uses the length
information output by the linkage function to calculate the mean. Column two
represents the standard deviation between the links. The last column contains
the inconsistency value for these links, 0.8165.
The following figure illustrates the links and lengths included in this
calculation.
[Figure: dendrogram of the sample data set with Link 1, Link 2, and Link 3 and their lengths marked.]
Row four in the output matrix describes the link between object 8 and object 2.
Column three indicates that two links are included in this calculation: the link
itself and the link directly below it in the hierarchy. The inconsistency
coefficient for this link is 0.7071.
The following figure illustrates the links and lengths included in this
calculation.
[Figure: dendrogram of the sample data set with Link 1 and Link 2 and their lengths marked.]
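The fourth row can be reproduced by hand. The sketch below assumes the usual definition of the inconsistency coefficient, namely the link's length minus the mean of the included link lengths, divided by their standard deviation, and applies it to the two lengths involved: 2.5000 for the link between object 8 and object 2, and 2.0616 for the link directly below it. The variable names are made up for this illustration.

```matlab
len = [2.5000 2.0616];      % the link itself, then the link directly below it

avg  = mean(len);           % column 1: mean of the included link lengths
sd   = std(len);            % column 2: their standard deviation
nlnk = numel(len);          % column 3: number of links included
coef = (len(1) - avg) / sd; % column 4: inconsistency coefficient (assumed formula)
```

With only two links the coefficient always comes out as 1/sqrt(2), about 0.7071, whatever the two lengths are (provided the upper link is the longer one), which is why this value says little on its own about how distinct object 2 is.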
Creating Clusters
After you create the hierarchical tree of binary clusters, you can divide the
hierarchy into larger clusters using the cluster function. The cluster
function lets you create clusters in two ways, as discussed in the following
sections:
Finding the Natural Divisions in the Data Set
Specifying Arbitrary Clusters
Finding the Natural Divisions in the Data Set
In the hierarchical cluster tree, the data set may naturally align itself into
clusters. This can be particularly evident in a dendrogram diagram where
groups of objects are densely packed in certain areas and not in others. The
inconsistency coefficient of the links in the cluster tree can identify these points
where the similarities between objects change. (See "Evaluating Cluster
Formation" on page 1-59 for more information about the inconsistency
coefficient.) You can use this value to determine where the cluster function
draws cluster boundaries.
For example, if you use the cluster function to group the sample data set into
clusters, specifying an inconsistency coefficient threshold of 0.9 as the value of
the cutoff argument, the cluster function groups all the objects in the sample
data set into one cluster. In this case, none of the links in the cluster hierarchy
had an inconsistency coefficient greater than 0.9.
T = cluster(Z,0.9)
T =
1
1
1
1
1
The cluster function outputs a vector, T, that is the same size as the original
data set. Each element in this vector contains the number of the cluster into
which the corresponding object from the original data set was placed.
If you lower the inconsistency coefficient threshold to 0.8, the cluster function
divides the sample data set into three separate clusters.
T = cluster(Z,0.8)
T =
1
3
1
2
2
This output indicates that objects 1 and 3 were placed in cluster 1, objects 4
and 5 were placed in cluster 2, and object 2 was placed in cluster 3.
Specifying Arbitrary Clusters
Instead of letting the cluster function create clusters determined by the
natural divisions in the data set, you can specify the number of clusters you
want created. In this case, the value of the cutoff argument specifies the point
in the cluster hierarchy at which to create the clusters.
For example, you can specify that you want the cluster function to divide the
sample data set into two clusters. In this case, the cluster function creates one
cluster containing objects 1, 3, 4, and 5 and another cluster containing object 2.
T = cluster(Z,2)
T =
1
2
1
1
1
To help you visualize how the cluster function determines how to create these
clusters, the following figure shows the dendrogram of the hierarchical cluster
tree. When you specify a value of 2, the cluster function draws an imaginary
horizontal line across the dendrogram that bisects two vertical lines. All the
objects below the line belong to one of these two clusters.
[Figure: dendrogram of the sample data set with a horizontal line marked cutoff = 2.]
If you specify a cutoff value of 3, the cluster function cuts off the hierarchy
at a lower point, bisecting three lines.
T = cluster(Z,3)
T =
1
3
1
2
2
This time, objects 1 and 3 are grouped in a cluster, objects 4 and 5 are grouped
in a cluster, and object 2 is placed into a cluster of its own, as seen in the
following figure.
[Figure: dendrogram of the sample data set with a horizontal line marked cutoff = 3.]
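Cutting the tree into a fixed number of clusters amounts to applying only the earliest links and leaving the last ones undone. The sketch below does this for the sample linkage output using core MATLAB only. It assumes the rows of Z are listed in order of nondecreasing link length, as they are in the sample output, and the cluster numbers it assigns are arbitrary labels that need not match the numbering chosen by the cluster function.

```matlab
Z = [1 3 1.0000
     4 5 1.0000
     6 7 2.0616
     8 2 2.5000];          % linkage output for the sample data set
m = size(Z,1) + 1;         % number of original objects
k = 3;                     % number of clusters wanted

id = (1:m)';               % current cluster index of each object
for r = 1:(m - k)          % apply only the first m-k links
    id(id == Z(r,1) | id == Z(r,2)) = m + r;
end
[labels,first,T] = unique(id);   % renumber the clusters as 1..k
T = T(:);
```

With k = 3 the first two links are applied, so objects 1 and 3 share a label, objects 4 and 5 share a label, and object 2 keeps a label of its own. Setting k = 2 also applies the third link and merges the first four of those objects into one cluster.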
Linear Models
Linear models represent the relationship between a continuous response
variable and one or more predictor variables (either continuous or categorical)
in the form

$$ y = X\beta + \varepsilon $$

where:
y is an n-by-1 vector of observations of the response variable.
X is the n-by-p design matrix determined by the predictors.
$\beta$ is a p-by-1 vector of parameters.
$\varepsilon$ is an n-by-1 vector of random disturbances, independent of each
other and usually having a normal distribution.
MATLAB uses this general form of the linear model to solve a variety of specific
regression and analysis of variance (ANOVA) problems. For example, for
polynomial and multiple regression problems, the columns of X are predictor
variable values or powers of such values. For one-way, two-way, and
higher-way ANOVA models, the columns of X are dummy (or indicator)
variables that encode the predictor categories. For analysis of covariance
(ANOCOVA) models, X contains values of a continuous predictor and codes for
a categorical predictor.
The following sections describe a number of functions for fitting various types
of linear models:
One-Way Analysis of Variance (ANOVA)
Two-Way Analysis of Variance (ANOVA)
N-Way Analysis of Variance
Multiple Linear Regression
Quadratic Response Surface Models
Stepwise Regression
Generalized Linear Models
Robust and Nonparametric Methods
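As a rough illustration of how the design matrix X changes with the problem, the sketch below builds X for a quadratic polynomial fit and for a one-way ANOVA layout. All data values and variable names here are made up for illustration and are not part of the toolbox examples; only base MATLAB operations are used.

```matlab
% Hypothetical data: one continuous predictor, one 3-level grouping variable.
x = [1 2 3 4 5 6]';                   % continuous predictor (made-up values)
g = [1 1 2 2 3 3]';                   % group labels (made-up values)
y = [2.1 3.9 6.2 8.1 9.8 12.2]';      % response (made-up values)

% Polynomial regression: the columns of X are powers of the predictor.
Xpoly = [ones(size(x)) x x.^2];

% One-way ANOVA: the columns of X are indicator (dummy) variables,
% one column per group.
Xanova = double([g==1 g==2 g==3]);

% Least squares estimates of the parameter vector for each design.
bpoly  = Xpoly \ y;
banova = Xanova \ y;                  % with pure indicators: the group means
```

With indicator columns only, the least squares estimates reduce to the group means, which is one way to see ANOVA as a special case of the linear model.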
See the sections below for a tour of some of the related graphical tools:
"The polytool Demo" on page 1-156
"The aoctool Demo" on page 1-161
"The rsmdemo Demo" on page 1-170
One-Way Analysis of Variance (ANOVA)
The purpose of one-way ANOVA is to find out whether data from several
groups have a common mean. That is, to determine whether the groups are
actually different in the measured characteristic.
One-way ANOVA is a simple special case of the linear model. The one-way
ANOVA form of the model is

$$ y_{ij} = \alpha_{.j} + \varepsilon_{ij} $$

where:
$y_{ij}$ is a matrix of observations in which each column represents a different
group.
$\alpha_{.j}$ is a matrix whose columns are the group means. (The "dot j" notation
means that $\alpha$ applies to all rows of the jth column. That is, the value
$\alpha_{ij}$ is the same for all i.)
$\varepsilon_{ij}$ is a matrix of random disturbances.
The model posits that the columns of y are a constant plus a random
disturbance. You want to know if the constants are all the same.
The following sections explore one-way ANOVA in greater detail:
Example: One-Way ANOVA
Multiple Comparisons
Example: One-Way ANOVA
The data below comes from a study by Hogg and Ledolter (1987) of bacteria
counts in shipments of milk. The columns of the matrix hogg represent
different shipments. The rows are bacteria counts from cartons of milk chosen
randomly from each shipment. Do some shipments have higher counts than
others?
load hogg
hogg
hogg =
24 14 11 7 19
15 7 9 7 24
21 12 7 4 19
27 17 13 7 15
33 14 12 12 10
23 16 18 18 20
[p,tbl,stats] = anova1(hogg);
p
p =
1.1971e-04
The standard ANOVA table has columns for the sums of squares, degrees of
freedom, mean squares (SS/df), F statistic, and p-value.
You can use the F statistic to do a hypothesis test to find out if the bacteria
counts are the same. anova1 returns the p-value from this hypothesis test.
In this case the p-value is about 0.0001, a very small value. This is a strong
indication that the bacteria counts from the different shipments are not the
same. An F statistic as extreme as the observed F would occur by chance only
about once in 10,000 times if the counts were truly equal.
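To make the table columns concrete, the following sketch recomputes the sums of squares, degrees of freedom, and F statistic directly from the hogg matrix shown above. It is an illustration of the standard one-way ANOVA arithmetic, not the internal code of anova1; the variable names are chosen here for readability. The last line uses fcdf, the toolbox's F cumulative distribution function, and should give a p-value close to the one anova1 reports.

```matlab
% Bacteria counts from the example (rows = cartons, columns = shipments).
hogg = [24 14 11  7 19
        15  7  9  7 24
        21 12  7  4 19
        27 17 13  7 15
        33 14 12 12 10
        23 16 18 18 20];

[n, k]     = size(hogg);          % n observations per group, k groups
groupMeans = mean(hogg);          % 1-by-k row of column means
grandMean  = mean(hogg(:));

% Between-group and within-group sums of squares.
ssBetween = n * sum((groupMeans - grandMean).^2);
ssWithin  = sum(sum((hogg - repmat(groupMeans, n, 1)).^2));

dfBetween = k - 1;
dfWithin  = k*(n - 1);

% F statistic: ratio of the two mean squares (SS/df).
F = (ssBetween/dfBetween) / (ssWithin/dfWithin);

% Upper-tail probability of the F distribution.
p = 1 - fcdf(F, dfBetween, dfWithin);
```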
The p-value returned by anova1 depends on assumptions about the random
disturbances $\varepsilon_{ij}$ in the model equation. For the p-value to be
correct, these disturbances need to be independent, normally distributed, and
have constant variance. See "Robust and Nonparametric Methods" on page 1-95
for a nonparametric function that does not require a normal assumption.
You can get some graphical assurance that the means are different by looking
at the box plots in the second figure window displayed by anova1.
Multiple Comparisons
Sometimes you need to determine not just if there are any differences among
the means, but specifically which pairs of means are significantly different. It
is tempting to perform a series of t tests, one for each pair of means, but this
procedure has a pitfall.
In a t test, we compute a t statistic and compare it to a critical value. The
critical value is chosen so that when the means are really the same (any
apparent difference is due to random chance), the probability that the t
statistic will exceed the critical value is small, say 5%. When the means are
different, the probability that the statistic will exceed the critical value is
larger.
In this example there are five means, so there are 10 pairs of means to compare.
It stands to reason that if all the means are the same, and if we have a 5%
chance of incorrectly concluding that there is a difference in one pair, then the
probability of making at least one incorrect conclusion among all 10 pairs is
much larger than 5%.
Fortunately, there are procedures known as multiple comparison procedures
that are designed to compensate for multiple tests.
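The size of the problem can be sketched with a short calculation. The count of pairs is exact; the error probability below additionally assumes that the 10 tests are independent, which is not strictly true for pairwise comparisons that share groups, so treat it as a rough indication only.

```latex
\binom{5}{2} = \frac{5 \cdot 4}{2} = 10 \text{ pairs},
\qquad
P(\text{at least one false difference})
  = 1 - (1 - 0.05)^{10}
  = 1 - 0.95^{10}
  \approx 0.40
```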
[Figure: box plots of the bacteria counts for each column of hogg, as displayed
by anova1; x-axis: Column Number, y-axis: Values]
Example: Multiple Comparisons. You can perform a multiple comparison test using
the multcompare function and supplying it with the stats output from anova1.
[c,m] = multcompare(stats)
c =
1.0000 2.0000 2.4953 10.5000 18.5047
1.0000 3.0000 4.1619 12.1667 20.1714
1.0000 4.0000 6.6619 14.6667 22.6714
1.0000 5.0000 -2.0047 6.0000 14.0047
2.0000 3.0000 -6.3381 1.6667 9.6714
2.0000 4.0000 -3.8381 4.1667 12.1714
2.0000 5.0000 -12.5047 -4.5000 3.5047
3.0000 4.0000 -5.5047 2.5000 10.5047
3.0000 5.0000 -14.1714 -6.1667 1.8381
4.0000 5.0000 -16.6714 -8.6667 -0.6619
m =
23.8333 1.9273
13.3333 1.9273
11.6667 1.9273
9.1667 1.9273
17.8333 1.9273
The first output from multcompare has one row for each pair of groups, with an
estimate of the difference in group means and a confidence interval for that
difference. For example, the second row has the values
1.0000 3.0000 4.1619 12.1667 20.1714
indicating that the mean of group 1 minus the mean of group 3 is estimated to
be 12.1667, and a 95% confidence interval for this difference is
[4.1619, 20.1714]. This interval does not contain 0, so we can conclude that the
means of groups 1 and 3 are different.
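The same reading can be applied to every row at once. The sketch below copies the matrix c printed above and keeps the pairs whose interval excludes zero; the variable names excludesZero and sigPairs are introduced here for illustration.

```matlab
% First output of multcompare from the example above.
% Columns: [group A, group B, lower bound, estimate, upper bound].
c = [1 2   2.4953  10.5000  18.5047
     1 3   4.1619  12.1667  20.1714
     1 4   6.6619  14.6667  22.6714
     1 5  -2.0047   6.0000  14.0047
     2 3  -6.3381   1.6667   9.6714
     2 4  -3.8381   4.1667  12.1714
     2 5 -12.5047  -4.5000   3.5047
     3 4  -5.5047   2.5000  10.5047
     3 5 -14.1714  -6.1667   1.8381
     4 5 -16.6714  -8.6667  -0.6619];

% A pair differs significantly when its interval lies entirely
% above zero or entirely below zero.
excludesZero = (c(:,3) > 0) | (c(:,5) < 0);
sigPairs = c(excludesZero, 1:2)
```

For this output the selected pairs are (1,2), (1,3), (1,4), and (4,5).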
The second output contains the mean and its standard error for each group.
It is easier to visualize the difference between group means by looking at the
graph that multcompare produces.
The graph shows that group 1 is significantly different from groups 2, 3, and 4.
By using the mouse to select group 4, you can determine that it is also
significantly different from group 5. Other pairs are not significantly different.
Two-Way Analysis of Variance (ANOVA)
The purpose of two-way ANOVA is to find out whether data from several
groups have a common mean. One-way ANOVA and two-way ANOVA differ in
that the groups in two-way ANOVA have two categories of defining
characteristics instead of one.
Suppose an automobile company has two factories, and each factory makes the
same three models of car. It is reasonable to ask if the gas mileage in the cars
varies from factory to factory as well as from model to model. We use two
predictors, factory and model, to explain differences in mileage.
There could be an overall difference in mileage due to a difference in the
production methods between factories. There is probably a difference in the
mileage of the different models (irrespective of the factory) due to differences
in design specifications. These effects are called additive.
Finally, a factory might make high mileage cars in one model (perhaps because
of a superior production line), but not be different from the other factory for
other models. This effect is called an interaction. It is impossible to detect an
interaction unless there are duplicate observations for some combination of
factory and car model.
Two-way ANOVA is a special case of the linear model. The two-way ANOVA
form of the model is

$$ y_{ijk} = \mu + \alpha_{.j} + \beta_{i.} + \gamma_{ij} + \varepsilon_{ijk} $$

where, with respect to the automobile example above:
$y_{ijk}$ is a matrix of gas mileage observations (with row index i, column
index j, and repetition index k).
$\mu$ is a constant matrix of the overall mean gas mileage.
$\alpha_{.j}$ is a matrix whose columns are the deviations of each car's gas
mileage (from the mean gas mileage $\mu$) that are attributable to the car's
model. All values in a given column of $\alpha_{.j}$ are identical, and the
values in each row of $\alpha_{.j}$ sum to 0.
$\beta_{i.}$ is a matrix whose rows are the deviations of each car's gas mileage
(from the mean gas mileage $\mu$) that are attributable to the car's factory.
All values in a given row of $\beta_{i.}$ are identical, and the values in each
column of $\beta_{i.}$ sum to 0.
$\gamma_{ij}$ is a matrix of interactions. The values in each row of
$\gamma_{ij}$ sum to 0, and the values in each column of $\gamma_{ij}$ sum to 0.
$\varepsilon_{ijk}$ is a matrix of random disturbances.
The next section provides an example of a two-way analysis.
Example: Two-Way ANOVA
The purpose of the example is to determine the effect of car model and factory
on the mileage rating of cars.
load mileage
mileage
mileage =
33.3000 34.5000 37.4000
33.4000 34.8000 36.8000
32.9000 33.8000 37.6000
32.6000 33.4000 36.6000
32.5000 33.7000 37.0000
33.0000 33.9000 36.7000
cars = 3;
[p,tbl,stats] = anova2(mileage,cars);
p
p =
0.0000 0.0039 0.8411
There are three models of cars (columns) and two factories (rows). The reason
there are six rows in mileage instead of two is that each factory provides three
cars of each model for the study. The data from the first factory is in the first
three rows, and the data from the second factory is in the last three rows.
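The row layout described above can be made explicit by reshaping the matrix so that repetition, factory, and model each get their own dimension. This is only a sketch for inspecting the data, using base MATLAB; the names reps, cube, and cellMeans are introduced here and are not part of anova2.

```matlab
% Mileage data from the example: rows 1-3 are factory 1, rows 4-6 are
% factory 2; columns are the three car models.
mileage = [33.3 34.5 37.4
           33.4 34.8 36.8
           32.9 33.8 37.6
           32.6 33.4 36.6
           32.5 33.7 37.0
           33.0 33.9 36.7];

reps       = 3;                          % cars per factory-model combination
nFactories = size(mileage, 1) / reps;
nModels    = size(mileage, 2);

% Dimensions of cube are (repetition, factory, model).
cube = reshape(mileage, reps, nFactories, nModels);

% Mean mileage for each factory-model cell (nFactories-by-nModels).
cellMeans    = squeeze(mean(cube, 1));
factoryMeans = mean(cellMeans, 2);       % one value per factory
modelMeans   = mean(cellMeans, 1);       % one value per model
```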
The standard ANOVA table has columns for the sums of squares,
degrees of freedom, mean squares (SS/df), F statistics, and p-values.
You can use the F statistics to do hypothesis tests to find out if the mileage is
the same across models, factories, and model-factory pairs (after adjusting for
the additive effects). anova2 returns the p-values from these tests.
The p-value for the model effect is zero to four decimal places. This is a strong
indication that the mileage varies from one model to another. An F statistic as
extreme as the observed F would occur by chance less than once in 10,000 times
if the gas mileage were truly equal from model to model. If you used the
multcompare function to perform a multiple comparison test, you would find
that each pair of the three models is significantly different.
The p-value for the factory effect is 0.0039, which is also highly significant.
This indicates that one factory is out-performing the other in the gas mileage
of the cars it produces. The observed p-value indicates that an F statistic as
extreme as the observed F would occur by chance about four out of 1000 times
if the gas mileage were truly equal from factory to factory.
There does not appear to be any interaction between factories and models. The
p-value, 0.8411, means that the observed result is quite likely (84 out of 100
times) given that there is no interaction.
The p-values returned by anova2 depend on assumptions about the random
disturbances $\varepsilon_{ijk}$ in the model equation. For the p-values to be
correct, these disturbances need to be independent, normally distributed, and
have constant variance. See "Robust and Nonparametric Methods" on page 1-95
for nonparametric methods that do not require a normal distribution.
In addition, anova2 requires that data be balanced, which in this case means
there must be the same number of cars for each combination of model and
factory. The next section discusses a function that supports unbalanced data
with any number of predictors.
N-Way Analysis of Variance
You can use N-way ANOVA to determine if the means in a set of data differ
when grouped by multiple factors. If they do differ, you can determine which
factors or combinations of factors are associated with the difference.
N-way ANOVA is a generalization of two-way ANOVA. For three factors, the
model can be written

$$ y_{ijkl} = \mu + \alpha_{.j.} + \beta_{i..} + \gamma_{..k}
   + (\alpha\beta)_{ij.} + (\alpha\gamma)_{i.k} + (\beta\gamma)_{.jk}
   + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkl} $$

In this notation parameters with two subscripts, such as $(\alpha\beta)_{ij.}$,
represent the interaction effect of two factors. The parameter
$(\alpha\beta\gamma)_{ijk}$ represents the three-way interaction. An ANOVA
model can have the full set of parameters or any subset, but conventionally it
does not include complex interaction terms unless it also includes all simpler
terms for those factors. For example, one would generally not include the
three-way interaction without also including all two-way interactions.
The anovan function performs N-way ANOVA. Unlike the anova1 and anova2
functions, anovan does not expect data in a tabular form. Instead, it expects a
vector of response measurements and a separate vector (or text array)
containing the values corresponding to each factor. This input data format is
more convenient than matrices when there are more than two factors or when
the number of measurements per factor combination is not constant.
The following examples explore anovan in greater detail:
Example: N-Way ANOVA with Small Data Set
Example: N-Way ANOVA with Large Data Set
Example: N-Way ANOVA with Small Data Set
Consider the following two-way example using anova2.
m = [23 15 20;27 17 63;43 3 55;41 9 90]
m =
23 15 20
27 17 63
43 3 55
41 9 90
anova2(m,2)
ans =
0.0197 0.2234 0.2663
The factor information is implied by the shape of the matrix m and the number
of measurements at each factor combination (2). Although anova2 does not
actually require arrays of factor values, for illustrative purposes we could
create them as follows.
cfactor = repmat(1:3,4,1)
cfactor =
1 2 3
1 2 3
1 2 3
1 2 3
rfactor = [ones(2,3); 2*ones(2,3)]
rfactor =
1 1 1
1 1 1
2 2 2
2 2 2
The cfactor matrix shows that each column of m represents a different level of
the column factor. The rfactor matrix shows that the top two rows of m
represent one level of the row factor, and the bottom two rows of m represent a
second level of the row factor. In other words, each value m(i,j) represents an
observation at column factor level cfactor(i,j) and row factor level
rfactor(i,j).
To solve the above problem with anovan, we need to reshape the matrices m,
cfactor, and rfactor to be vectors.
m = m(:);
cfactor = cfactor(:);
rfactor = rfactor(:);
[m cfactor rfactor]
ans =
23 1 1
27 1 1
43 1 2
41 1 2
15 2 1
17 2 1
3 2 2
9 2 2
20 3 1
63 3 1
55 3 2
90 3 2
anovan(m,{cfactor rfactor},2)
ans =
0.0197
0.2234
0.2663
Example: N-Way ANOVA with Large Data Set
In the previous example we used anova2 to study a small data set measuring
car mileage. Now we study a larger set of car data with mileage and other
information on 406 cars made between 1970 and 1982. First we load the data
set and look at the variable names.
load carbig
whos
Name Size Bytes Class
Acceleration 406x1 3248 double array
Cylinders 406x1 3248 double array
Displacement 406x1 3248 double array
Horsepower 406x1 3248 double array
MPG 406x1 3248 double array
Model 406x36 29232 char array
Model_Year 406x1 3248 double array
Origin 406x7 5684 char array
Weight 406x1 3248 double array
cyl4 406x5 4060 char array
org 406x7 5684 char array
when 406x5 4060 char array
We will focus our attention on four variables. MPG is the number of miles per
gallon for each of 406 cars (though some have missing values coded as NaN). The
other three variables are factors: cyl4 (four-cylinder car or not), org (car
originated in Europe, Japan, or the USA), and when (car was built early in the
period, in the middle of the period, or late in the period).
First we fit the full model, requesting up to three-way interactions and Type 3
sums of squares.
varnames = {'Origin';'4Cyl';'MfgDate'};
anovan(MPG,{org cyl4 when},3,3,varnames)
ans =
0.0000
NaN
0
0.7032
0.0001
0.2072
0.6990
Note that many terms are marked by a # symbol as not having full rank, and
one of them has zero degrees of freedom and is missing a p-value. This can
happen when there are missing factor combinations and the model has
higher-order terms. In this case, the cross-tabulation below shows that there
are no cars made in Europe during the early part of the period with other than
four cylinders, as indicated by the 0 in table(2,1,1).
[table,factorvals] = crosstab(org,when,cyl4)
table(:,:,1) =
82 75 25
0 4 3
3 3 4
table(:,:,2) =
12 22 38
23 26 17
12 25 32
factorvals =
'USA' 'Early' 'Other'
'Europe' 'Mid' 'Four'
'Japan' 'Late' []
Consequently it is impossible to estimate the three-way interaction effects, and
including the three-way interaction term in the model makes the fit singular.
Using even the limited information available in the ANOVA table, we can see
that the three-way interaction has a p-value of 0.699, so it is not significant. We
decide to request only two-way interactions this time.
[p,tbl,stats,termvec] = anovan(MPG,{org cyl4 when},2,3,varnames);
termvec'
ans =
1 2 4 3 5 6
Now all terms are estimable. The p-values for interaction term 4
(Origin*4Cyl) and interaction term 6 (4Cyl*MfgDate) are much larger than a
typical cutoff value of 0.05, indicating these terms are not significant. We could
choose to omit these terms and pool their effects into the error term. The output
termvec variable returns a vector of codes, each of which is a bit pattern
representing a term. We can omit terms from the model by deleting their
entries from termvec and running anovan again, this time supplying the
resulting vector as the model argument.
termvec([4 6]) = []
termvec =
1
2
4
5
anovan(MPG,{org cyl4 when},termvec,3,varnames)
Now we have a more parsimonious model indicating that the mileage of these
cars seems to be related to all three factors, and that the effect of the
manufacturing date depends on where the car was made.
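The term codes can be read off with a few lines of base MATLAB. The sketch below assumes that bit j of a code is set when factor j takes part in the term; that mapping is inferred from the example (code 3 is described as Origin*4Cyl and code 6 as 4Cyl*MfgDate) rather than stated explicitly in the text. The labels cell array is introduced here for illustration.

```matlab
varnames = {'Origin', '4Cyl', 'MfgDate'};   % factor names from the example
termvec  = [1 2 4 3 5 6];                   % term codes shown in the example

nFactors = numel(varnames);
labels   = cell(size(termvec));

for t = 1:numel(termvec)
    % Assumed encoding: bit j set => factor j is part of the term.
    bits  = bitget(termvec(t), 1:nFactors);
    names = varnames(logical(bits));

    % Join the factor names with '*' to form a readable term label.
    label = names{1};
    for j = 2:numel(names)
        label = [label '*' names{j}];
    end
    labels{t} = label;
    fprintf('term %d (code %d): %s\n', t, termvec(t), label);
end
```

Under this reading, the code 5 that remains after deleting terms 4 and 6 corresponds to Origin*MfgDate, which matches the conclusion that the effect of the manufacturing date depends on where the car was made.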
Multiple Linear Regression
The purpose of multiple linear regression is to establish a quantitative
relationship between a group of predictor variables (the columns of X) and a
response, y. This relationship is useful for:
Understanding which predictors have the greatest effect.
Knowing the direction of the effect (i.e., increasing x increases/decreases y).
Using the model to predict future values of the response when only the
predictors are currently known.
The following sections explain multiple linear regression in greater detail:
Mathematical Foundations of Multiple Linear Regression
Example: Multiple Linear Regression
Mathematical Foundations of Multiple Linear Regression
The linear model takes its common form

$$ y = X\beta + \varepsilon $$

where:
y is an n-by-1 vector of observations.
X is an n-by-p matrix of regressors.
$\beta$ is a p-by-1 vector of parameters.
$\varepsilon$ is an n-by-1 vector of random disturbances.
The solution to the problem is a vector, b, which estimates the unknown vector
of parameters, $\beta$. The least squares solution is

$$ b = \hat{\beta} = (X^T X)^{-1} X^T y $$

This equation is useful for developing later statistical formulas, but has poor
numeric properties. regress uses QR decomposition of X followed by the
backslash operator to compute b. The QR decomposition is not necessary for
computing b, but the matrix R is useful for computing confidence intervals.
You can plug b back into the model formula to get the predicted y values at the
data points.

$$ \hat{y} = Xb = Hy $$

$$ H = X (X^T X)^{-1} X^T $$

Statisticians use a hat (circumflex) over a letter to denote an estimate of a
parameter or a prediction from a model. The projection matrix H is called the
hat matrix, because it puts the "hat" on y.
The residuals are the difference between the observed and predicted y values.

$$ r = y - \hat{y} = (I - H) y $$

The residuals are useful for detecting failures in the model assumptions, since
they correspond to the errors, $\varepsilon$, in the model equation. By
assumption, these errors each have independent normal distributions with mean
zero and a constant variance.
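The formulas above can be checked numerically on a small problem. The sketch below uses made-up data and an economy-size QR decomposition followed by the backslash operator; it illustrates the approach described in the text and is not the source code of regress.

```matlab
% Made-up data for illustration: n = 8 observations, p = 3 parameters.
x1 = [1 2 3 4 5 6 7 8]';
x2 = [2 1 4 3 6 5 8 7]';
y  = [3.1 3.9 7.2 7.8 11.1 11.9 15.2 15.8]';
X  = [ones(8,1) x1 x2];          % column of ones for the intercept

% Economy-size QR decomposition, then solve R*b = Q'*y.
[Q, R] = qr(X, 0);
b = R \ (Q' * y);

% Hat matrix, fitted values, and residuals.
H    = Q * Q';                   % equals X*inv(X'*X)*X' for full-rank X
yhat = X * b;                    % same as H*y
r    = y - yhat;                 % same as (eye(8) - H)*y
```

Forming H explicitly is fine for a handful of observations; for large n only its diagonal is usually needed.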
The residuals, however, are correlated and have variances that depend on the
locations of the data points. It is a common practice to scale ("Studentize") the
residuals so they all have the same variance.
In the equation below, the scaled residual, $t_i$, has a Student's t distribution
with (n-p-1) degrees of freedom

$$ t_i = \frac{r_i}{\hat{\sigma}_{(i)} \sqrt{1 - h_i}} $$

where

$$ \hat{\sigma}^2_{(i)} = \frac{\| r \|^2}{n - p - 1}
   - \frac{r_i^2}{(n - p - 1)(1 - h_i)} $$

and:
$t_i$ is the scaled residual for the ith data point.
$r_i$ is the raw residual for the ith data point.
n is the sample size.
p is the number of parameters in the model.
$h_i$ is the ith diagonal element of H.
The left-hand side of the second equation is the estimate of the variance of the
errors excluding the ith data point from the calculation.
A hypothesis test for outliers involves comparing $t_i$ with the critical values
of the t distribution. If $t_i$ is large, this casts doubt on the assumption that
this residual has the same variance as the others.
A confidence interval for the mean of each error is

$$ c_i = r_i \pm t_{\left(1 - \frac{\alpha}{2},\, \nu\right)}
   \hat{\sigma}_{(i)} \sqrt{1 - h_i} $$

Confidence intervals that do not include zero are equivalent to rejecting the
hypothesis (at a significance probability of $\alpha$) that the residual mean is
zero. Such confidence intervals are good evidence that the observation is an
outlier for the given model.
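A minimal sketch of these calculations follows, again on made-up data. It takes the degrees of freedom of the t critical value to be n-p-1, an assumption consistent with the distribution stated for the scaled residuals, and uses tinv, the toolbox's inverse t cumulative distribution function. Variable names are illustrative.

```matlab
% Made-up data for illustration: n = 8 observations, p = 3 parameters.
x1 = [1 2 3 4 5 6 7 8]';
x2 = [2 1 4 3 6 5 8 7]';
y  = [3.1 3.9 7.2 7.8 11.1 11.9 15.2 15.8]';
X  = [ones(8,1) x1 x2];
[n, p] = size(X);

[Q, R] = qr(X, 0);
b = R \ (Q' * y);
r = y - X*b;                     % raw residuals
h = sum(Q.^2, 2);                % diagonal of the hat matrix H = Q*Q'

% Variance estimate that excludes the ith point, for every i at once.
s2i = sum(r.^2)/(n-p-1) - r.^2 ./ ((n-p-1)*(1-h));

% Scaled (Studentized) residuals.
t = r ./ (sqrt(s2i) .* sqrt(1-h));

% 95% intervals about each residual (assumed n-p-1 degrees of freedom).
alpha     = 0.05;
halfWidth = tinv(1 - alpha/2, n-p-1) .* sqrt(s2i) .* sqrt(1-h);
rint      = [r - halfWidth, r + halfWidth];
```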
Example: Multiple Linear Regression
The example comes from Chatterjee and Hadi (1986) in a paper on regression
diagnostics. The data set (originally from Moore (1975)) has five predictor
variables and one response.
load moore
X = [ones(size(moore,1),1) moore(:,1:5)];
Matrix X has a column of ones, and then one column of values for each of the
five predictor variables. The column of ones is necessary for estimating the
y-intercept of the linear model.
y = moore(:,6);
[b,bint,r,rint,stats] = regress(y,X);
The y-intercept is b(1), which corresponds to the column index of the column
of ones.
stats
stats =
0.8107 11.9886 0.0001
The elements of the vector stats are the regression $R^2$ statistic, the F
statistic (for the hypothesis test that all the regression coefficients are zero),
and the p-value associated with this F statistic.
$R^2$ is 0.8107, indicating the model accounts for over 80% of the variability in
the observations. The F statistic of about 12 and its p-value of 0.0001 indicate
that it is highly unlikely that all of the regression coefficients are zero.
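The two statistics are linked by a standard identity for models with an intercept. As a rough check, assume n = 20 observations (the case axis of the residual plot runs to 20; the text does not state n) and p = 6 columns in X:

```latex
F = \frac{R^2 / (p - 1)}{(1 - R^2) / (n - p)}
  = \frac{0.8107 / 5}{0.1893 / 14}
  \approx 12.0
```

This agrees with the reported value of 11.9886 up to the rounding of $R^2$.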
rcoplot(r,rint)
[Figure: residual case order plot with error bars; x-axis: Case Number,
y-axis: Residuals]
The plot shows the residuals plotted in case order (by row). The 95% confidence
intervals about these residuals are plotted as error bars. The first observation
is an outlier since its error bar does not cross the zero reference line.
In problems with just a single predictor, it is simpler to use the polytool
function (see "The polytool Demo" on page 1-156). This function can form an
X matrix with predictor values, their squares, their cubes, and so on.
Quadratic Response Surface Models
Response Surface Methodology (RSM) is a tool for understanding the
quantitative relationship between multiple input variables and one output
variable.
Consider one output, z, as a polynomial function of two inputs, x and y. The
function z = f(x,y) describes a two-dimensional surface in the space (x,y,z). Of
course, you can have as many input variables as you want and the resulting
surface becomes a hypersurface. You can have multiple output variables with
a separate hypersurface for each one.
For three inputs ($x_1$, $x_2$, $x_3$), the equation of a quadratic response
surface is

$$
\begin{aligned}
y = {} & b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 && \text{(linear terms)} \\
       & + b_{12} x_1 x_2 + b_{13} x_1 x_3 + b_{23} x_2 x_3 && \text{(interaction terms)} \\
       & + b_{11} x_1^2 + b_{22} x_2^2 + b_{33} x_3^2 && \text{(quadratic terms)}
\end{aligned}
$$

It is difficult to visualize a k-dimensional surface in k+1 dimensional space
for k>2. The function rstool is a graphical user interface (GUI) designed to
make this visualization more intuitive, as is discussed in the next section.
Exploring Graphs of Multidimensional Polynomials
The function rstool is useful for fitting response surface models. The purpose
of rstool is larger than just fitting and prediction for polynomial models. This
GUI provides an environment for exploration of the graph of a
multidimensional polynomial.
You can learn about rstool by trying the commands below. The chemistry
behind the data in reaction.mat deals with reaction kinetics as a function of
the partial pressure of three chemical reactants: hydrogen, n-pentane, and
isopentane.
load reaction
rstool(reactants,rate,'quadratic',0.01,xn,yn)
You will see a "vector" of three plots. The dependent variable of all three plots
is the reaction rate. The first plot has hydrogen as the independent variable.
The second and third plots have n-pentane and isopentane respectively.
Each plot shows the fitted relationship of the reaction rate to the independent
variable at a fixed value of the other two independent variables. The fixed
value of each independent variable is in an editable text box below each axis.
You can change the fixed value of any independent variable by either typing a
new value in the box or by dragging any of the three vertical lines to a new
position.
When you change the value of an independent variable, all the plots update to
show the current picture at the new point in the space of the independent
variables.
Note that while this example only uses three inputs (reactants) and one output
(rate), rstool can accommodate an arbitrary number of inputs and outputs.
Interpretability may be limited by the size of the monitor for large numbers of
inputs or outputs.
The GUI also has two pop-up menus. The Export menu facilitates saving
various important variables in the GUI to the base workspace. Below the
Export menu there is another menu that allows you to change the order of the
polynomial model from within the GUI. If you used the commands above, this
menu will have the string Full Quadratic. Other choices are:
Linear has the constant and first order terms only.
Pure Quadratic includes constant, linear, and squared terms.
Interactions includes constant, linear, and cross product terms.
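The four model choices differ only in which columns appear in the design matrix. The sketch below builds each one by hand for three inputs, using a made-up factorial design and a made-up noise-free response; the column order (constant, linear, interaction, squared) follows the equation given earlier and is a choice made here, not necessarily the order rstool uses internally.

```matlab
% Made-up design: 27-run full factorial in three inputs at levels -1, 0, 1.
[x1, x2, x3] = ndgrid(-1:1, -1:1, -1:1);
x1 = x1(:);  x2 = x2(:);  x3 = x3(:);

% Made-up noise-free response so that the fit can be checked by eye.
y = 5 + 2*x1 - x2 + 0.5*x3 + 1.5*x1.*x2 - 0.8*x3.^2;

const  = ones(size(x1));
linear = [x1 x2 x3];                      % first order terms
inter  = [x1.*x2 x1.*x3 x2.*x3];          % cross product terms
square = [x1.^2 x2.^2 x3.^2];             % squared terms

Xlinear        = [const linear];
Xinteractions  = [const linear inter];
Xpurequadratic = [const linear square];
Xfullquadratic = [const linear inter square];

% Least squares fit of the full quadratic model.
b = Xfullquadratic \ y;
```

Because the response was generated without noise from a subset of these terms, b recovers the generating coefficients, with zeros for the unused terms.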
The rstool GUI is used by the rsmdemo function to visualize the results of a
designed experiment for studying a chemical reaction. See "The rsmdemo
Demo" on page 1-170.
Stepwise Regression
Stepwise regression is a technique for choosing the variables to include in a
multiple regression model. Forward stepwise regression starts with no model
terms. At each step it adds the most statistically significant term (the one with
the highest F statistic or lowest p-value) until no statistically significant
terms are left to add. Backward stepwise regression starts with all the terms in
the model and removes the least significant terms until all the remaining terms
are statistically significant. It is also possible to start with a subset of all
the terms and then add significant terms or remove insignificant terms.
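The forward procedure can be sketched in a few lines. The example below is a simplified illustration on made-up data, not the algorithm used by the stepwise function: it always keeps an intercept, scores each candidate by its partial F statistic, and stops when the best candidate falls below a fixed threshold. The threshold Fenter = 4 and all variable names are hypothetical choices.

```matlab
% Made-up data: four candidate predictors; the response is built mainly
% from columns 1 and 3 plus a small deterministic perturbation.
n  = 20;
t  = (1:n)';
Xc = [sin(t) cos(2*t) mod(t,5) sqrt(t)];
y  = 3 + 2*Xc(:,1) + 1.5*Xc(:,3) + 0.1*cos(7*t);

Fenter  = 4;                          % hypothetical entry threshold
inModel = false(1, size(Xc, 2));      % which candidate terms are in the model

while true
    % Residual sum of squares of the current model (intercept + chosen terms).
    X0   = [ones(n,1) Xc(:, inModel)];
    rss0 = sum((y - X0*(X0\y)).^2);

    % Score every term that is not yet in the model.
    bestF = -Inf;
    bestJ = 0;
    for j = find(~inModel)
        X1   = [X0 Xc(:, j)];
        rss1 = sum((y - X1*(X1\y)).^2);
        dfe  = n - size(X1, 2);
        F    = (rss0 - rss1) / (rss1/dfe);   % partial F for adding term j
        if F > bestF
            bestF = F;
            bestJ = j;
        end
    end

    % Stop when no candidate remains or the best one is not worth adding.
    if bestJ == 0 || bestF < Fenter
        break
    end
    inModel(bestJ) = true;
end

selected = find(inModel)
```

Changing the starting model or the threshold can change which terms are selected, which is the sensitivity the discussion of multicollinearity warns about.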
An important assumption behind the method is that some input variables in a
multiple regression do not have an important explanatory effect on the
response. If this assumption is true, then it is a convenient simplification to
keep only the statistically significant terms in the model.
One common problem in multiple regression analysis is multicollinearity of the
input variables. The input variables may be as correlated with each other as
they are with the response. If this is the case, the presence of one input variable
in the model may mask the effect of another input. Stepwise regression used as
a canned procedure is a dangerous tool because the resulting model may
include different variables depending on the choice of starting model and
inclusion strategy.
The following example explores an interactive tool for stepwise regression.
Example: Stepwise Regression
The Statistics Toolbox provides an interactive graphical user interface (GUI) to
make comparison of competing models more understandable. You can explore
the GUI using the Hald (1960) data set. Here are the commands to get started.
load hald
stepwise(ingredients,heat)
The Hald data come from a study of the heat of reaction of various cement
mixtures. There are four components in each mixture, and the amount of heat
produced depends on the amount of each ingredient in the mixture.
The interface consists of three interactively linked figure windows. Two of
these are discussed in the following sections:
Stepwise Regression Plot
Stepwise Regression Diagnostics Table
All three windows have "hot" regions. When your mouse is above one of these
regions, the pointer changes from an arrow to a circle. Clicking on this point
initiates some activity in the interface.
Stepwise Regression Plot
This plot shows the regression coefficient and confidence interval for every
term (in or out of the model). The green lines represent terms in the model
while red lines indicate terms that are not currently in the model.
Statistically significant terms are solid lines. Dotted lines show that the fitted
coefficient is not significantly different from zero.
Clicking on a line in this plot toggles its state. That is, a term currently in the
model (green line) is removed (turns red), and a term currently not in the model
(red line) is added (turns green).
The coefficient for a term out of the model is the coefficient resulting from
adding that term to the current model.
Scale Inputs. Pressing this button centers and normalizes the columns of the
input matrix to have a standard deviation of one.
Export. This pop-up menu allows you to export variables from the stepwise
function to the base workspace.
Close. The Close button removes all the figure windows.
Stepw ise Regression Dia gnostics Ta ble
Thi s tabl e i s a quanti tati ve vi ew of the i nformati on i n the Stepwi se Regressi on
Pl ot. The tabl e shows the Hal d model wi th the second and thi rd terms removed.
Coefficients a nd Confidence Inter va ls. The tabl e at the top of the fi gure shows the
regressi on coeffi ci ent and confi dence i nterval for every term (i n or out of the
model .) The green rows i n the tabl e (on your moni tor) represent terms i n the
model whi l e red rows i ndi cate terms not currentl y i n the model .
Cl i cki ng on a row i n thi s tabl e toggl es the state of the correspondi ng term. That
i s, a term currentl y i n the model (green row) i s removed (turns red), and a term
currentl y not i n the model (red row) i s added to the model (turns green).
The coeffi ci ent for a term out of the model i s the coeffi ci ent resul ti ng from
addi ng that term to the current model .
Additiona l Dia gnostic Sta tistics. There are al so several di agnosti c stati sti cs at the
bottom of the tabl e:
RMSE the root mean squared error of the current model .
R-square the amount of response vari abi l i ty expl ai ned by the model .
F the overal l F stati sti c for the regressi on.
P the associ ated si gni fi cance probabi l i ty.
Close Button. Shuts down al l wi ndows.
Confidence Intervals
Column #   Parameter   Lower      Upper
1           1.44        1.02       1.86
2           0.4161     -0.1602     0.9924
3          -0.41       -1.029      0.2086
4          -0.614      -0.7615    -0.4664

RMSE     R-square   F       P
2.734    0.9725     176.6   1.581e-08
Help Button. Activates online help.
Stepwise History. This plot shows the RMSE and a confidence interval for every
model generated in the course of the interactive use of the other windows.
Recreating a Previous Model. Clicking on one of these lines recreates the current
model at that point in the analysis using a new set of windows. You can thus
compare the two candidate models directly.
Generalized Linear Models
So far, the functions in this section have dealt with models that have a linear
relationship between the response and one or more predictors. Sometimes you
may have a nonlinear relationship instead. To fit nonlinear models you can use
the functions described in "Nonlinear Regression Models" on page 1-100.
There are some nonlinear models, known as generalized linear models, that
you can fit using simpler linear methods. To understand generalized linear
models, first let's review the linear models we have seen so far. Each of these
models has the following three characteristics:
• The response has a normal distribution with mean μ.
• A coefficient vector b defines a linear combination X*b of the predictors X.
• The model equates the two as μ = X*b.
In generalized linear models, these characteristics are generalized as follows:
• The response has a distribution that may be normal, binomial, Poisson,
  gamma, or inverse Gaussian, with parameters including a mean μ.
• A coefficient vector b defines a linear combination X*b of the predictors X.
• A link function f(·) defines the link between the two as f(μ) = X*b.
The following example explores this in greater detail.
Example: Generalized Linear Models
For example, consider the following data derived from the carbig data set. We
have cars of various weights, and we record the total number of cars of each
weight and the number qualifying as poor-mileage cars because their miles per
gallon value is below some target. (Suppose we don't know the miles per gallon
for each car, only the number passing the test.) It might be reasonable to
assume that the value of the variable poor follows a binomial distribution with
parameter N=total and with a p parameter that depends on the car weight. A
plot shows that the proportion of poor-mileage cars follows a nonlinear
S-shape.
w = [2100 2300 2500 2700 2900 3100 3300 3500 3700 3900 4100 4300]';
poor = [1 2 0 3 8 8 14 17 19 15 17 21]';
total = [48 42 31 34 31 21 23 23 21 16 17 21]';
[w poor total]
ans =
2100 1 48
2300 2 42
2500 0 31
2700 3 34
2900 8 31
3100 8 21
3300 14 23
3500 17 23
3700 19 21
3900 15 16
4100 17 17
4300 21 21
plot(w,poor./total,'x')
This shape is typical of graphs of proportions, as they have natural boundaries
at 0.0 and 1.0.
A linear regression model would not produce a satisfactory fit to this graph. Not
only would the fitted line not follow the data points, it would produce invalid
proportions less than 0 for light cars, and higher than 1 for heavy cars.
There is a class of regression models for dealing with proportion data. The
logistic model is one such model. It defines the relationship between proportion
p and weight w to be
$$\log\left(\frac{p}{1-p}\right) = b_1 + b_2 w$$
Is this a good model for our data? It would be helpful to graph the data on this
scale, to see if the relationship appears linear. However, some of our
proportions are 0 and 1, so we cannot explicitly evaluate the left-hand side of
the equation. A useful trick is to compute adjusted proportions by adding small
increments to the poor and total values (say, a half observation to poor and
a full observation to total). This keeps the proportions within range. A graph
now shows a more nearly linear relationship.
padj = (poor+.5) ./ (total+1);
plot(w,log(padj./(1-padj)),'x')
We can use the glmfit function to fit this logistic model.
b = glmfit(w,[poor total],'binomial')
b =
-13.3801
0.0042
To use these coefficients to compute a fitted proportion, we have to invert the
logistic relationship. Some simple algebra shows that the logistic equation can
also be written as
$$p = \frac{1}{1 + \exp(-b_1 - b_2 w)}$$
Fortunately, the function glmval can decode this link function to compute the
fitted values. Using this function we can graph fitted proportions for a range of
car weights, and superimpose this curve on the original scatter plot.
x = 2100:100:4500;
y = glmval(b,x,'logit');
plot(w,poor./total,'x',x,y,'r-')
Generalized linear models can fit a variety of distributions with a variety of
relationships between the distribution parameters and the predictors. A full
description is beyond the scope of this document. For more information see
Dobson (1990), or McCullagh and Nelder (1990). Also see the reference
material for glmfit.
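As a rough check on the inverse logistic relationship, the fitted proportions can also be computed directly from the formula rather than through glmval. The sketch below assumes the rounded coefficients displayed by glmfit (-13.3801 and 0.0042), so its values only approximate those from the full-precision fit. The names eta, pfit, and check are illustrative.

```matlab
% Rounded coefficients as displayed by glmfit (an approximation).
b = [-13.3801; 0.0042];
x = (2100:100:4500)';          % car weights at which to evaluate the fit

eta  = b(1) + b(2)*x;          % linear predictor b1 + b2*w
pfit = 1 ./ (1 + exp(-eta));   % inverse logit: p = 1/(1 + exp(-b1 - b2*w))

% Applying the logit link to pfit should recover the linear predictor.
check = max(abs(log(pfit ./ (1 - pfit)) - eta));
```

Every value of pfit lies strictly between 0 and 1, which is the property the linear fit lacked.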
Robust and Nonparametric Methods
As mentioned in the previous sections, regression and analysis of variance
procedures depend on certain assumptions, such as a normal distribution for
the error term. Sometimes such an assumption is not warranted. For example,
if the distribution of the errors is asymmetric or prone to extreme outliers, that
is a violation of the assumption of normal errors.
The Statistics Toolbox has a robust regression function that is useful when
there may be outliers. Robust methods are designed to be relatively insensitive
to large changes in a small part of the data.
The Statistics Toolbox also has nonparametric versions of the one-way and
two-way analysis of variance functions. Unlike classical tests, nonparametric
tests make only mild assumptions about the data, and are appropriate when
the distribution of the data is not normal. On the other hand, they are less
powerful than classical methods for normally distributed data.
The following sections describe the robust regression and nonparametric
functions in greater detail:
• "Robust Regression"
• "Kruskal-Wallis Test"
• "Friedman's Test"
Both of the nonparametric functions described here can return a stats
structure that you can use as input to the multcompare function to perform
multiple comparisons.
Robust Regression
In "Example: Multiple Linear Regression" on page 1-85 we found an outlier
when we used ordinary least squares regression to model a response as a
function of five predictors. How did that outlier affect the results?
Let's estimate the coefficients using the robustfit function.
load moore
x = moore(:,1:5);
y = moore(:,6);
[br,statsr] = robustfit(x,y);
br
br =
-1.7742
0.0000
0.0009
0.0002
0.0062
0.0001
Compare these estimates to those we obtained from the regress function.
b
b =
-2.1561
-0.0000
0.0013
0.0001
0.0079
0.0001
To understand why the two differ, it is helpful to look at the weight variable
from the robust fit. It measures how much weight was given to each point
during the fit. In this case, the first point had a very low weight so it was
effectively ignored.
statsr.w'
ans =
Columns 1 through 7
0.0577 0.9977 0.9776 0.9455 0.9687 0.8734 0.9177
0.9990 0.9653 0.9679 0.9768 0.9882 0.9998 0.9979
0.8185 0.9757 0.9875 0.9991 0.9021 0.6953
For another example illustrating robust fitting, see "The robustdemo Demo" on
page 1-172.
Kruskal-Wallis Test
In "One-Way Analysis of Variance (ANOVA)" on page 1-69 we used one-way
analysis of variance to determine if the bacteria counts of milk varied from
shipment to shipment. Our one-way analysis rested on the assumption that the
measurements were independent, and that each had a normal distribution
with a common variance and with a mean that was constant in each column.
We concluded that the column means were not all the same. Let's repeat that
analysis using a nonparametric procedure.
The Kruskal-Wallis test is a nonparametric version of one-way analysis of
variance. The assumption behind this test is that the measurements come from
a continuous distribution, but not necessarily a normal distribution. The test
is based on an analysis of variance using the ranks of the data values, not the
data values themselves. Output includes a table similar to an anova table, and
a box plot.
We can run this test as follows.
p = kruskalwallis(hogg)
p =
0.0020
The low p-value means the Kruskal-Wallis test results agree with the one-way
analysis of variance results.
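To make the rank-based idea concrete, the sketch below computes the usual Kruskal-Wallis statistic by hand for a small hypothetical data set (not the hogg data) using only core MATLAB functions. It assumes equal group sizes and no tied values, so it omits the tie correction that a full implementation would need. The variable names are illustrative.

```matlab
% Hypothetical data: 4 observations (rows) in each of 3 groups (columns).
x = [6.1 7.4  9.8
     5.2 8.3  9.1
     6.7 7.9 10.4
     5.9 8.8  9.5];
[n, k] = size(x);
N = n*k;

% Rank all N values together (there are no ties in this data set).
[sorted, order] = sort(x(:));
r = zeros(N,1);
r(order) = (1:N)';

R = sum(reshape(r, n, k));     % rank sum for each group
H = 12/(N*(N+1)) * sum(R.^2 / n) - 3*(N+1);

% With k = 3 groups the reference chi-square distribution has 2 degrees
% of freedom, whose upper tail probability is exp(-H/2).
p = exp(-H/2);
```

Here the three groups do not overlap at all, so the rank sums are as far apart as they can be and the approximate p-value is small.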
Friedman's Test
In "Two-Way Analysis of Variance (ANOVA)" on page 1-73 we used two-way
analysis of variance to study the effect of car model and factory on car mileage.
We tested whether either of these factors had a significant effect on mileage,
and whether there was an interaction between these factors. We concluded
that there was no interaction, but that each individual factor had a significant
effect. Now we will see if a nonparametric analysis will lead to the same
conclusion.
Friedman's test is a nonparametric test for data having a two-way layout (data
grouped by two categorical factors). Unlike two-way analysis of variance,
Friedman's test does not treat the two factors symmetrically and it does not
test for an interaction between them. Instead, it is a test for whether the
columns are different after adjusting for possible row differences. The test is
based on an analysis of variance using the ranks of the data across categories
of the row factor. Output includes a table similar to an anova table.
We can run Friedman's test as follows.
p = friedman(mileage, 3)
p =
7.4659e-004
Recall the classical analysis of variance gave a p-value to test column effects,
row effects, and interaction effects. This p-value is for column effects. Using
either this p-value or the p-value from ANOVA (p < 0.0001), we conclude that
there are significant column effects.
In order to test for row effects, we need to rearrange the data to swap the roles
of the rows and columns. For a data matrix x with no replications, we could
simply transpose the data and type
p = friedman(x)
With replicated data it is slightly more complicated. A simple way is to
transform the matrix into a three-dimensional array with the first dimension
representing the replicates, swapping the other two dimensions, and restoring
the two-dimensional shape.
x = reshape(mileage, [3 2 3]);
x = permute(x, [1 3 2]);
x = reshape(x, [9 2])
x =
33.3000 32.6000
33.4000 32.5000
32.9000 33.0000
34.5000 33.4000
34.8000 33.7000
33.8000 33.9000
37.4000 36.6000
36.8000 37.0000
37.6000 36.7000
friedman(x, 3)
ans =
0.0082
Again, the conclusion is similar to the conclusion from the classical analysis of
variance. Both this p-value and the one from ANOVA (p = 0.0039) lead us to
conclude there are significant row effects.
You cannot use Friedman's test to test for interactions between the row and
column factors.
Nonlinear Regression Models
Response Surface Methodology (RSM) is an empirical modeling approach using
polynomials as local approximations to the true input/output relationship. This
empirical approach is often adequate for process improvement in an industrial
setting.
In scientific applications there is usually relevant theory that allows us to
make a mechanistic model. Often such models are nonlinear in the unknown
parameters. Nonlinear models are more difficult to fit, requiring iterative
methods that start with an initial guess of the unknown parameters. Each
iteration alters the current guess until the algorithm converges.
The Statistics Toolbox has functions for fitting nonlinear models of the form
$$y = f(X, \beta) + \varepsilon$$
where:
• y is an n-by-1 vector of observations.
• f is any function of X and β.
• X is an n-by-p matrix of input variables.
• β is a p-by-1 vector of unknown parameters to be estimated.
• ε is an n-by-1 vector of random disturbances.
This is explored further in the following example.
Example: Nonlinear Modeling
The Hougen-Watson model (Bates and Watts 1988) for reaction kinetics is one
specific example of this type. The form of the model is
$$\text{rate} = \frac{\beta_1 x_2 - x_3/\beta_5}{1 + \beta_2 x_1 + \beta_3 x_2 + \beta_4 x_3}$$
where β1, β2, ..., β5 are the unknown parameters, and x1, x2, and x3 are the
three input variables. The three inputs are hydrogen, n-pentane, and
isopentane. It is easy to see that the parameters do not enter the model
linearly.
The file reaction.mat contains simulated data from this reaction.
load reaction
who
Your variables are:
beta rate xn
model reactants yn
The variables are as follows:
• rate is a 13-by-1 vector of observed reaction rates.
• reactants is a 13-by-3 matrix of reactants.
• beta is a 5-by-1 vector of initial parameter estimates.
• model is a string containing the nonlinear function name.
• xn is a string matrix of the names of the reactants.
• yn is a string containing the name of the response.
The data and model are explored further in the following sections:
• "Fitting the Hougen-Watson Model"
• "Confidence Intervals on the Parameter Estimates"
• "Confidence Intervals on the Predicted Responses"
• "An Interactive GUI for Nonlinear Fitting and Prediction"
Fitting the Hougen-Watson Model
The Statistics Toolbox provides the function nlinfit for finding parameter
estimates in nonlinear modeling. nlinfit returns the least squares parameter
estimates. That is, it finds the parameters that minimize the sum of the
squared differences between the observed responses and their fitted values. It
uses the Gauss-Newton algorithm with Levenberg-Marquardt modifications
for global convergence.
nlinfit requires the input data, the responses, and an initial guess of the
unknown parameters. You must also supply the name of a function that takes
the input data and the current parameter estimate and returns the predicted
responses. In MATLAB terminology, nlinfit is called a "function" function.
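The iterative idea can be illustrated with a bare Gauss-Newton loop written in core MATLAB. This is only a sketch under stated assumptions, not the nlinfit implementation: it fits a hypothetical two-parameter exponential model to noise-free made-up data, starts from a guess assumed to be close to the solution, and omits the Levenberg-Marquardt modifications that make the method robust to poor starting values.

```matlab
% Hypothetical noise-free data from y = a*exp(-b*x) with a = 2, b = 0.5.
x = (0:0.5:4)';
y = 2*exp(-0.5*x);

beta = [1.9; 0.45];                       % initial guess (assumed close)
for iter = 1:20
    f = beta(1)*exp(-beta(2)*x);          % predicted responses
    r = y - f;                            % residuals
    % Jacobian of f with respect to [a b], one row per observation.
    J = [exp(-beta(2)*x), -beta(1)*x.*exp(-beta(2)*x)];
    step = J \ r;                         % least squares solution of J*step = r
    beta = beta + step;                   % update the current guess
    if norm(step) < 1e-10, break, end     % stop when the update is negligible
end
```

Each pass solves a linear least squares problem in which the Jacobian J plays the role of the design matrix X, which is why the Jacobian at the solution is useful for confidence intervals.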
Here is the hougen function.
function yhat = hougen(beta,x)
%HOUGEN Hougen-Watson model for reaction kinetics.
% YHAT = HOUGEN(BETA,X) gives the predicted values of the
% reaction rate, YHAT, as a function of the vector of
% parameters, BETA, and the matrix of data, X.
% BETA must have five elements and X must have three
% columns.
%
% The model form is:
% y = (b1*x2 - x3/b5)./(1+b2*x1+b3*x2+b4*x3)
b1 = beta(1);
b2 = beta(2);
b3 = beta(3);
b4 = beta(4);
b5 = beta(5);
x1 = x(:,1);
x2 = x(:,2);
x3 = x(:,3);
yhat = (b1*x2 - x3/b5)./(1+b2*x1+b3*x2+b4*x3);
To fit the reaction data, call the function nlinfit.
load reaction
betahat = nlinfit(reactants,rate,'hougen',beta)
betahat =
1.2526
0.0628
0.0400
0.1124
1.1914
nlinfit has two optional outputs. They are the residuals and Jacobian matrix
at the solution. The residuals are the differences between the observed and
fitted responses. The Jacobian matrix is the direct analog of the matrix X in the
standard linear regression model.
These outputs are useful for obtaining confidence intervals on the parameter
estimates and predicted responses.
Confidence Intervals on the Parameter Estimates
Using nlparci, form 95% confidence intervals on the parameter estimates,
betahat, from the reaction kinetics example.
[betahat,resid,J] = nlinfit(reactants,rate,'hougen',beta);
betaci = nlparci(betahat,resid,J)
betaci =
-0.7467 3.2519
-0.0377 0.1632
-0.0312 0.1113
-0.0609 0.2857
-0.7381 3.1208
Confidence Intervals on the Predicted Responses
Using nlpredci, form 95% confidence intervals on the predicted responses
from the reaction kinetics example.
[yhat,delta] = nlpredci('hougen',reactants,betahat,resid,J);
opd = [rate yhat delta]
opd =
8.5500 8.2937 0.9178
3.7900 3.8584 0.7244
4.8200 4.7950 0.8267
0.0200 -0.0725 0.4775
2.7500 2.5687 0.4987
14.3900 14.2227 0.9666
2.5400 2.4393 0.9247
4.3500 3.9360 0.7327
13.0000 12.9440 0.7210
8.5000 8.2670 0.9459
0.0500 -0.1437 0.9537
11.3200 11.3484 0.9228
3.1300 3.3145 0.8418
Matrix opd has the observed rates in column 1 and the predictions in column 2.
The 95% confidence interval is column 2 ± column 3. These are simultaneous
confidence intervals for the estimated function at each input value. They are
not intervals for new response observations at those inputs, even though most
of the confidence intervals do contain the original observations.
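The interval bounds can be formed explicitly from the prediction and half-width columns. The short sketch below copies the first three rows of opd from the output above (rounded as displayed) and checks whether each observed rate falls inside its interval. The names lower, upper, and inside are illustrative.

```matlab
% First three rows of opd as displayed: observed rate, prediction, half-width.
opd3 = [8.5500 8.2937 0.9178
        3.7900 3.8584 0.7244
        4.8200 4.7950 0.8267];

lower = opd3(:,2) - opd3(:,3);      % prediction minus half-width
upper = opd3(:,2) + opd3(:,3);      % prediction plus half-width

% 1 where the observed rate lies inside [lower, upper].
inside = opd3(:,1) >= lower & opd3(:,1) <= upper;
```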
An Interactive GUI for Nonlinear Fitting and Prediction
The function nlintool for nonlinear models is a direct analog of rstool for
polynomial models. nlintool calls nlinfit and requires the same inputs.
The purpose of nlintool is larger than just fitting and prediction for nonlinear
models. This GUI provides an environment for exploration of the graph of a
multidimensional nonlinear function.
If you have already loaded reaction.mat, you can start nlintool.
nlintool(reactants,rate,'hougen',beta,0.01,xn,yn)
You will see a vector of three plots. The dependent variable of all three plots
is the reaction rate. The first plot has hydrogen as the independent variable.
The second and third plots have n-pentane and isopentane respectively.
Each plot shows the fitted relationship of the reaction rate to the independent
variable at a fixed value of the other two independent variables. The fixed
value of each independent variable is in an editable text box below each axis.
You can change the fixed value of any independent variable by either typing a
new value in the box or by dragging any of the three vertical lines to a new
position.
When you change the value of an independent variable, all the plots update to
show the current picture at the new point in the space of the independent
variables.
Note that while this example only uses three reactants, nlintool can
accommodate an arbitrary number of independent variables. Interpretability
may be limited by the size of the monitor for large numbers of inputs.
Hypothesis Tests
A hypothesis test is a procedure for determining if an assertion about a
characteristic of a population is reasonable.
For example, suppose that someone says that the average price of a gallon of
regular unleaded gas in Massachusetts is $1.15. How would you decide
whether this statement is true? You could try to find out what every gas station
in the state was charging and how many gallons they were selling at that price.
That approach might be definitive, but it could end up costing more than the
information is worth.
A simpler approach is to find out the price of gas at a small number of randomly
chosen stations around the state and compare the average price to $1.15.
Of course, the average price you get will probably not be exactly $1.15 due to
variability in price from one station to the next. Suppose your average price
was $1.18. Is this three cent difference a result of chance variability, or is the
original assertion incorrect? A hypothesis test can provide an answer.
The following sections provide an overview of hypothesis testing with the
Statistics Toolbox:
• "Hypothesis Test Terminology"
• "Hypothesis Test Assumptions"
• "Example: Hypothesis Testing"
• "Available Hypothesis Tests"
Hypothesis Test Terminology
To get started, there are some terms to define and assumptions to make:
• The null hypothesis is the original assertion. In this case the null hypothesis
  is that the average price of a gallon of gas is $1.15. The notation is
  H0: μ = 1.15.
• There are three possibilities for the alternative hypothesis. You might only be
  interested in the result if gas prices were actually higher. In this case, the
  alternative hypothesis is H1: μ > 1.15. The other possibilities are H1: μ < 1.15
  and H1: μ ≠ 1.15.
• The significance level is related to the degree of certainty you require in order
  to reject the null hypothesis in favor of the alternative. By taking a small
  sample you cannot be certain about your conclusion. So you decide in
  advance to reject the null hypothesis if the probability of observing your
  sampled result is less than the significance level. For a typical significance
  level of 5%, the notation is α = 0.05. For this significance level, the
  probability of incorrectly rejecting the null hypothesis when it is actually
  true is 5%. If you need more protection from this error, then choose a lower
  value of α.
• The p-value is the probability of observing a sample result at least as extreme
  as the given one under the assumption that the null hypothesis is true. If the
  p-value is less than α, then you reject the null hypothesis. For example, if
  α = 0.05 and the p-value is 0.03, then you reject the null hypothesis.
  The converse is not true. If the p-value is greater than α, you have
  insufficient evidence to reject the null hypothesis.
• The outputs for many hypothesis test functions also include confidence
  intervals. Loosely speaking, a confidence interval is a range of values that
  have a chosen probability of containing the true hypothesized quantity.
  Suppose, in our example, 1.15 is inside a 95% confidence interval for the
  mean, μ. That is equivalent to being unable to reject the null hypothesis at a
  significance level of 0.05. Conversely, if the 100(1-α)% confidence interval does
  not contain 1.15, then you reject the null hypothesis at the α level of
  significance.
Hypothesis Test Assumptions
The difference between hypothesis test procedures often arises from
differences in the assumptions that the researcher is willing to make about the
data sample. For example, the Z-test assumes that the data represents
independent samples from the same normal distribution and that you know the
standard deviation, σ. The t-test has the same assumptions except that you
estimate the standard deviation using the data instead of specifying it as a
known quantity.
Both tests have an associated signal-to-noise ratio
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \quad \text{or} \quad T = \frac{\bar{x} - \mu}{s/\sqrt{n}}, \qquad \text{where } \bar{x} = \sum_{i=1}^{n} \frac{x_i}{n}$$
The signal is the difference between the average and the hypothesized mean.
The noise is the standard deviation of the average: the posited (σ) or estimated
(s) standard deviation divided by the square root of the number of data values.
If the null hypothesis is true, then Z has a standard normal distribution,
N(0,1). T has a Student's t distribution with the degrees of freedom, ν, equal to
one less than the number of data values.
Given the observed result for Z or T, and knowing the distribution of Z and T
assuming the null hypothesis is true, it is possible to compute the probability
(p-value) of observing this result. A very small p-value casts doubt on the truth
of the null hypothesis. For example, suppose that the p-value was 0.001,
meaning that the probability of observing the given Z or T was one in a
thousand. That should make you skeptical enough about the null hypothesis
that you reject it rather than believe that your result was just a lucky 999 to 1
shot.
There are also nonparametric tests that do not even require the assumption
that the data come from a normal distribution. In addition, there are functions
for testing whether the normal assumption is reasonable.
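The pieces of a Z-test can be assembled by hand in core MATLAB, which may help connect the terminology to the arithmetic. The sketch below uses ten hypothetical prices (not the gas.mat data) and assumes a known standard deviation of $0.04. It takes the noise term to be the standard deviation of the sample mean, sigma/sqrt(n), and obtains the two-sided p-value from the standard normal distribution through erfc. The constant 1.959964 is the 97.5th percentile of N(0,1).

```matlab
x = [1.19 1.12 1.17 1.16 1.21 1.13 1.18 1.15 1.20 1.14]';  % hypothetical prices ($)
mu0   = 1.15;      % hypothesized mean under the null hypothesis
sigma = 0.04;      % standard deviation, assumed known
n     = length(x);

se = sigma/sqrt(n);                 % standard deviation of the sample mean
z  = (mean(x) - mu0)/se;            % signal-to-noise ratio
p  = erfc(abs(z)/sqrt(2));          % two-sided p-value from N(0,1)

zcrit = 1.959964;                   % 97.5th percentile of N(0,1)
ci = [mean(x) - zcrit*se, mean(x) + zcrit*se];   % 95% confidence interval

reject = p < 0.05;                  % reject the null hypothesis at alpha = 0.05?
```

For these made-up prices the p-value exceeds 0.05 and the interval contains 1.15, so both views lead to the same decision not to reject.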
Example: Hypothesis Testing
This example uses the gasoline price data in gas.mat. There are two samples
of 20 observed gas prices for the months of January and February, 1993.
load gas
prices = [price1 price2];
As a first step, you may want to test whether the samples from each month
follow a normal distribution. As each sample is relatively small, you might
choose to perform a Lilliefors test (rather than a Jarque-Bera test):
lillietest(price1)
ans =
0
lillietest(price2)
ans =
0
The result of the hypothesis test is a Boolean value that is 0 when you do not
reject the null hypothesis, and 1 when you do reject that hypothesis. In each
case, there is no need to reject the null hypothesis that the samples have a
normal distribution.
Suppose it is historically true that the standard deviation of gas prices at gas
stations around Massachusetts is four cents a gallon. The Z-test is a procedure
for testing the null hypothesis that the average price of a gallon of gas in
January (price1) is $1.15.
[h,pvalue,ci] = ztest(price1/100,1.15,0.04)
h =
0
pvalue =
0.8668
ci =
1.1340 1.1690
The Boolean output is h = 0, so you do not reject the null hypothesis.
The result suggests that $1.15 is reasonable. The 95% confidence interval
[1.1340 1.1690] neatly brackets $1.15.
What about February? Try a t-test with price2. Now you are not assuming
that you know the standard deviation in price.
[h,pvalue,ci] = ttest(price2/100,1.15)
h =
1
pvalue =
4.9517e-04
ci =
1.1675 1.2025
With the Boolean result h = 1, you can reject the null hypothesis at the default
significance level, 0.05.
It looks like $1.15 is not a reasonable estimate of the gasoline price in
February. The low end of the 95% confidence interval is greater than 1.15.
The function ttest2 allows you to compare the means of the two data samples.
[h,sig,ci] = ttest2(price1,price2)
h =
1
sig =
0.0083
ci =
-5.7845 -0.9155
The confidence interval (ci above) indicates that gasoline prices were between
one and six cents lower in January than February.
If the two samples were not normally distributed but had similar shape, it
would have been more appropriate to use the nonparametric rank sum test in
place of the t-test. We can still use the rank sum test with normally distributed
data, but it is less powerful than the t-test.
[p,h,stats] = ranksum(price1, price2)
p =
0.0092
h =
1
stats =
zval: -2.6064
ranksum: 314
As might be expected, the rank sum test leads to the same conclusion but it is
less sensitive to the difference between samples (higher p-value).
The box plot below gives the same conclusion graphically. Note that the
notches have little, if any, overlap. Refer to "Statistical Plots" on page 1-128 for
more information about box plots.
boxplot(prices,1)
set(gca,'XtickLabel',str2mat('January','February'))
xlabel('Month')
ylabel('Prices ($0.01)')
[Figure: notched box plots of the January and February prices. x-axis: Month; y-axis: Prices ($0.01).]
Available Hypothesis Tests
The Statistics Toolbox has functions for performing the following tests.

Function     What it Tests
jbtest       Normal distribution for one sample
kstest       Any specified distribution for one sample
kstest2      Equal distributions for two samples
lillietest   Normal distribution for one sample
ranksum      Median of two unpaired samples
signrank     Median of two paired samples
signtest     Median of two paired samples
ttest        Mean of one normal sample
ttest2       Mean of two normal samples
ztest        Mean of normal sample with known standard deviation
Multivariate Statistics
Multivariate statistics is an omnibus term for a number of different statistical
methods. The defining characteristic of these methods is that they all aim to
understand a data set by considering a group of variables together rather than
focusing on only one variable at a time.
The Statistics Toolbox has functions for principal components analysis
(princomp), multivariate analysis of variance (manova1), and linear
discriminant analysis (classify). The following sections illustrate the first two
functions:
• "Principal Components Analysis"
• "Multivariate Analysis of Variance (MANOVA)"
Principal Components Analysis
One of the difficulties inherent in multivariate statistics is the problem of
visualizing multidimensionality. In MATLAB, the plot command displays a
graph of the relationship between two variables. The plot3 and surf
commands display different three-dimensional views. When there are more
than three variables, it stretches the imagination to visualize their
relationships.
Fortunately, in data sets with many variables, groups of variables often move
together. One reason for this is that more than one variable may be measuring
the same driving principle governing the behavior of the system. In many
systems there are only a few such driving forces. But an abundance of
instrumentation allows us to measure dozens of system variables. When this
happens, we can take advantage of this redundancy of information. We can
simplify our problem by replacing a group of variables with a single new
variable.
Principal components analysis is a quantitatively rigorous method for
achieving this simplification. The method generates a new set of variables,
called principal components. Each principal component is a linear combination
of the original variables. All the principal components are orthogonal to each
other so there is no redundant information. The principal components as a
whole form an orthogonal basis for the space of the data.
There are an infinite number of ways to construct an orthogonal basis for
several columns of data. What is so special about the principal component
basis?
The first principal component is a single axis in space. When you project each
observation on that axis, the resulting values form a new variable. And the
variance of this variable is the maximum among all possible choices of the first
axis.
The second principal component is another axis in space, perpendicular to the
first. Projecting the observations on this axis generates another new variable.
The variance of this variable is the maximum among all possible choices of this
second axis.
The full set of principal components is as large as the original set of variables.
But it is commonplace for the sum of the variances of the first few principal
components to exceed 80% of the total variance of the original data. By
examining plots of these few new variables, researchers often develop a deeper
understanding of the driving forces that generated the original data.
The following section provides an example.
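The properties described above can be checked on a small scale with core MATLAB functions alone. The sketch below is not the princomp implementation; it assumes a hypothetical two-variable data set and obtains the component axes from the singular value decomposition of the centered data. The names Xc, scores, and variances are illustrative.

```matlab
% Hypothetical data: 10 observations (rows) of 2 correlated variables.
X = [2.5 2.4; 0.5 0.7; 2.2 2.9; 1.9 2.2; 3.1 3.0
     2.3 2.7; 2.0 1.6; 1.0 1.1; 1.5 1.6; 1.1 0.9];
n = size(X,1);

Xc = X - repmat(mean(X), n, 1);     % center each column at zero
[U, S, V] = svd(Xc, 0);             % columns of V are the component axes

scores    = Xc * V;                 % project observations onto the axes
variances = diag(S).^2 / (n - 1);   % variance of each new variable

c = cov(scores);                    % off-diagonal entries should be near zero
```

The first entry of variances is the largest, the columns of V are orthonormal, and the variances add up to the total variance of the original columns, so no information is lost by the change of basis.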
Example: Principal Components Analysis
Let us look at a sample application that uses nine different indices of the
quality of life in 329 U.S. cities. These are climate, housing, health, crime,
transportation, education, arts, recreation, and economics. For each index,
higher is better; so, for example, a higher index for crime means a lower crime
rate.
We start by loading the data in cities.mat.
load cities
whos
categories 9x14 252 char array
names 329x43 28294 char array
ratings 329x9 23688 double array
The whos command generates a table of information about all the variables in
the workspace.
The cities data set contains three variables:
• categories, a string matrix containing the names of the indices.
• names, a string matrix containing the 329 city names.
• ratings, the data matrix with 329 rows and 9 columns.
Let's look at the value of the categories variable.
categories
categories =
climate
housing
health
crime
transportation
education
arts
recreation
economics
Now, let's look at the first five rows of the names variable.
first5 = names(1:5,:)
first5 =
Abilene, TX
Akron, OH
Albany, GA
Albany-Troy, NY
Albuquerque, NM
To get a quick impression of the ratings data, make a box plot.
boxplot(ratings,0,'+',0)
set(gca,'YTicklabel',categories)
These commands generate the plot below. Note that there is substantially more
variability in the ratings of the arts and housing than in the ratings of crime
and climate.
Ordinarily you might also graph pairs of the original variables, but there are
36 two-variable plots. Perhaps principal components analysis can reduce the
number of variables we need to consider.
Sometimes it makes sense to compute principal components for raw data. This
is appropriate when all the variables are in the same units. Standardizing the
data is reasonable when the variables are in different units or when the
variances of the different columns differ substantially (as in this case).
You can standardize the data by dividing each column by its standard
deviation.
stdr = std(ratings);
sr = ratings./repmat(stdr,329,1);
Now we are ready to find the principal components.
[pcs,newdata,variances,t2] = princomp(sr);
The following sections explain the four outputs from princomp:
• The Principal Components (First Output)
• The Component Scores (Second Output)
• The Component Variances (Third Output)
• Hotelling's T² (Fourth Output)
[Figure: horizontal box plot of the ratings; x-axis: Values (x 10^4), y-axis: Column Number, labeled climate, housing, health, crime, transportation, education, arts, recreation, economics]
The Principal Components (First Output)
The first output of the princomp function, pcs, contains the nine principal
components. These are the linear combinations of the original variables that
generate the new variables.
Let's look at the first three principal component vectors.
p3 = pcs(:,1:3)
p3 =
0.2064 0.2178 -0.6900
0.3565 0.2506 -0.2082
0.4602 -0.2995 -0.0073
0.2813 0.3553 0.1851
0.3512 -0.1796 0.1464
0.2753 -0.4834 0.2297
0.4631 -0.1948 -0.0265
0.3279 0.3845 -0.0509
0.1354 0.4713 0.6073
The largest weights in the first column (first principal component) are the third
and seventh elements, corresponding to the variables health and arts. All the
elements of the first principal component are the same sign, making it a
weighted average of all the variables.
To show the orthogonality of the principal components, note that
premultiplying them by their transpose yields the identity matrix.
I = p3'*p3
I =
1.0000 -0.0000 -0.0000
-0.0000 1.0000 -0.0000
-0.0000 -0.0000 1.0000
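The orthogonality check above can be reproduced without the toolbox. The following is a minimal sketch, not the princomp implementation: it assumes the components can be obtained as eigenvectors of the sample covariance matrix, and it uses a small made-up data matrix X rather than the cities data.

```matlab
% Hypothetical 6-by-3 data matrix (illustration only, not the cities data).
X = [2 1 3; 3 4 1; 5 2 6; 6 7 2; 9 6 8; 4 3 5];
n = size(X,1);
Xc = X - repmat(mean(X),n,1);          % center each column

% Eigen-decomposition of the sample covariance matrix.
[V,D] = eig(cov(X));
[variances,order] = sort(diag(D));     % ascending order
variances = flipud(variances);         % largest variance first
pcs = V(:,flipud(order));              % columns play the role of pcs

scores = Xc*pcs;                       % data in the new coordinate system
I = pcs'*pcs;                          % should be the identity matrix
```

The variance of each column of scores equals the corresponding entry of variances, which mirrors the relationship between the second and third outputs described in the following sections.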
The Component Scores (Second Output)
The second output, newdata, is the data in the new coordinate system defined
by the principal components. This output is the same size as the input data
matrix.
A plot of the first two columns of newdata shows the ratings data projected onto
the first two principal components.
plot(newdata(:,1),newdata(:,2),'+')
xlabel('1st Principal Component');
ylabel('2nd Principal Component');
Note the outlying points on the right side of the plot.
The function gname is useful for graphically identifying a few points in a plot
like this. You can call gname with a string matrix containing as many case
labels as points in the plot. The string matrix names works for labeling points
with the city names.
gname(names)
Move your cursor over the plot and click once near each outlying point.
As you click on each point, MATLAB labels it with the proper row from the
names string matrix. When you are finished labeling points, press the Return
key.
[Figure: scatter plot of the component scores; x-axis: 1st Principal Component, y-axis: 2nd Principal Component]
Here is the resulting plot.
The labeled cities are the biggest population centers in the United States.
Perhaps we should consider them as a completely separate group. If we call
gname without arguments, it labels each point with its row number.
[Figure: the same scatter plot with the outlying points labeled New York, NY; Los Angeles, Long Beach, CA; San Francisco, CA; Boston, MA; Washington, DC-MD-VA; Chicago, IL]
[Figure: the same scatter plot with points labeled by row number: 213, 179, 270, 43, 314, 65, 237, 234]
We can create an index variable containing the row numbers of all the
metropolitan areas we chose.
metro = [43 65 179 213 234 270 314];
names(metro,:)
ans =
Boston, MA
Chicago, IL
Los Angeles, Long Beach, CA
New York, NY
Philadelphia, PA-NJ
San Francisco, CA
Washington, DC-MD-VA
To remove these rows from the ratings matrix, type the following.
rsubset = ratings;
nsubset = names;
nsubset(metro,:) = [];
rsubset(metro,:) = [];
size(rsubset)
ans =
322 9
To practice, repeat the analysis using the variable rsubset as the new data
matrix and nsubset as the string matrix of labels.
The Component Variances (Third Output)
The third output, variances, is a vector containing the variance explained by
the corresponding column of newdata.
variances
variances =
3.4083
1.2140
1.1415
0.9209
0.7533
0.6306
0.4930
0.3180
0.1204
You can easily calculate the percent of the total variability explained by each
principal component.
percent_explained = 100*variances/sum(variances)
percent_explained =
37.8699
13.4886
12.6831
10.2324
8.3698
7.0062
5.4783
3.5338
1.3378
A scree plot is a Pareto plot of the percent variability explained by each
principal component.
pareto(percent_explained)
xlabel('Principal Component')
ylabel('Variance Explained (%)')
We can see that the first three principal components explain roughly two thirds
of the total variability in the standardized ratings.
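The "two thirds" figure can be checked by accumulating the percentages. This short sketch uses only base MATLAB functions and the variances printed above; the names cumulative, first3, and k80 are introduced here for illustration.

```matlab
% Component variances copied from the princomp output shown above.
variances = [3.4083 1.2140 1.1415 0.9209 0.7533 0.6306 0.4930 0.3180 0.1204]';
percent_explained = 100*variances/sum(variances);

cumulative = cumsum(percent_explained);   % running total, in percent
first3 = cumulative(3);                   % share captured by components 1 to 3
k80 = min(find(cumulative >= 80));        % fewest components reaching 80%
```

With these values the first three components account for about 64% of the total, and five components are needed before the running total passes 80%.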
Hotelling's T² (Fourth Output)
The last output of the princomp function, t2, is Hotelling's T², a statistical
measure of the multivariate distance of each observation from the center of the
data set. This is an analytical way to find the most extreme points in the data.
[st2, index] = sort(t2); % Sort in ascending order.
st2 = flipud(st2); % Values in descending order.
index = flipud(index); % Indices in descending order.
extreme = index(1)
extreme =
213
names(extreme,:)
ans =
New York, NY
It is not surprising that the ratings for New York are the furthest from the
average U.S. town.
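To make the idea of a multivariate distance concrete, here is a minimal sketch that assumes T² is the sum of squared component scores, each divided by the variance of its component. That definition is an assumption for illustration (the internals of princomp are not shown in this section), and the data matrix X is made up. The sketch also checks that the result matches the squared Mahalanobis distance computed directly from the covariance matrix.

```matlab
% Hypothetical 5-by-2 data matrix (illustration only).
X = [2 1; 3 4; 5 2; 6 7; 9 6];
[n,p] = size(X);
Xc = X - repmat(mean(X),n,1);            % center each column
S = cov(X);                              % sample covariance matrix

[V,D] = eig(S);
lambda = diag(D);                        % component variances
scores = Xc*V;                           % component scores

% Sum over components of (score^2 / component variance).
t2 = sum((scores.^2) ./ repmat(lambda',n,1), 2);

% Same quantity from the covariance matrix: (x - mean) * inv(S) * (x - mean)'.
t2_direct = sum((Xc/S).*Xc, 2);

[biggest, extreme] = max(t2);            % most extreme observation
```

Because every component is weighted by its own variance, an observation can be extreme along a low-variance direction even when it looks ordinary in a plot of the first two components.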
[Figure: scree (Pareto) plot; x-axis: Principal Component, y-axis: Variance Explained (%), with a cumulative percentage scale from 0% to 100%]
Multivariate Analysis of Variance (MANOVA)
We reviewed the analysis of variance technique in "One-Way Analysis of
Variance (ANOVA)" on page 1-69. With this technique we can take a set of
grouped data and determine whether the mean of a variable differs
significantly between groups. Often we have multiple variables, and we are
interested in determining whether the entire set of means is different from one
group to the next. There is a multivariate version of analysis of variance that
can address that problem, as illustrated in the following example.
Example: Multivariate Analysis of Variance
The carsmall data set has measurements on a variety of car models from the
years 1970, 1976, and 1982. Suppose we are interested in whether the
characteristics of the cars have changed over time.
First we load the data.
load carsmall
whos
Acceleration 100x1 800 double array
Cylinders 100x1 800 double array
Displacement 100x1 800 double array
Horsepower 100x1 800 double array
MPG 100x1 800 double array
Model 100x36 7200 char array
Model_Year 100x1 800 double array
Origin 100x7 1400 char array
Weight 100x1 800 double array
Five of these variables (Acceleration, Displacement, Horsepower, MPG, and
Weight) are continuous measurements on individual car models. The variable
Model_Year indicates the year in which the car was made. We can create a
grouped plot matrix of four of these variables using the gplotmatrix function.
x = [MPG Horsepower Displacement Weight];
gplotmatrix(x,[],Model_Year,[],'+xo')
(When the second argument of gplotmatrix is empty, the function graphs the
columns of the x argument against each other, and places histograms along the
diagonal. The empty fourth argument produces a graph with the default
colors. The fifth argument controls the symbols used to distinguish between
groups.)
It appears the cars do differ from year to year. The upper right plot, for
example, is a graph of MPG versus Weight. The 1982 cars appear to have higher
mileage than the older cars, and they appear to weigh less on average. But as
a group, are the three years significantly different from one another? The
manova1 function can answer that question.
[d,p,stats] = manova1(x,Model_Year)
d =
2
p =
1.0e-006 *
0
0.1141
[Figure: grouped plot matrix of MPG, Horsepower, Displacement, and Weight produced by gplotmatrix; legend groups: 70, 76, 82]
stats =
W: [4x4 double]
B: [4x4 double]
T: [4x4 double]
dfW: 90
dfB: 2
dfT: 92
lambda: [2x1 double]
chisq: [2x1 double]
chisqdf: [2x1 double]
eigenval: [4x1 double]
eigenvec: [4x4 double]
canon: [100x4 double]
mdist: [100x1 double]
gmdist: [3x3 double]
The manova1 function produces three outputs:
• The first output, d, is an estimate of the dimension of the group means. If the
means were all the same, the dimension would be 0, indicating that the
means are at the same point. If the means differed but fell along a line, the
dimension would be 1. In the example the dimension is 2, indicating that the
group means fall in a plane but not along a line. This is the largest possible
dimension for the means of three groups.
• The second output, p, is a vector of p-values for a sequence of tests. The first
p-value tests whether the dimension is 0, the next whether the dimension
is 1, and so on. In this case both p-values are small. That's why the estimated
dimension is 2.
• The third output, stats, is a structure containing several fields, described in
the following section.
The Fields of the stats Structure. The W, B, and T fields are matrix analogs to the
within, between, and total sums of squares in ordinary one-way analysis of
variance. The next three fields are the degrees of freedom for these matrices.
Fields lambda, chisq, and chisqdf are the ingredients of the test for the
dimensionality of the group means. (The p-values for these tests are the second
output argument of manova1.)
The next three fields are used to do a canonical analysis. Recall that in
principal components analysis ("Principal Components Analysis" on
page 1-112) we look for the combination of the original variables that has the
largest possible variation. In multivariate analysis of variance, we instead look
for the linear combination of the original variables that has the largest
separation between groups. It is the single variable that would give the most
significant result in a univariate one-way analysis of variance. Having found
that combination, we next look for the combination with the second highest
separation, and so on.
The eigenvec field is a matrix that defines the coefficients of the linear
combinations of the original variables. The eigenval field is a vector
measuring the ratio of the between-group variance to the within-group
variance for the corresponding linear combination. The canon field is a matrix
of the canonical variable values. Each column is a linear combination of the
mean-centered original variables, using coefficients from the eigenvec matrix.
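The following sketch shows how within, between, and total sums-of-squares matrices of the kind described above can be built by hand, and how between-to-within ratios arise as eigenvalues. It uses made-up data and base MATLAB only. Treating the ratios as the eigenvalues of W\B is an assumption for illustration; the scaling and ordering used inside manova1 may differ.

```matlab
% Hypothetical data: 9 observations on 2 variables in 3 groups.
x = [1 2; 2 1; 3 3; 6 5; 7 7; 8 6; 4 9; 5 8; 6 10];
g = [1 1 1 2 2 2 3 3 3]';
[n,p] = size(x);
xbar = mean(x);                          % grand mean

W = zeros(p);                            % within-groups sums of squares
B = zeros(p);                            % between-groups sums of squares
groups = unique(g);
for k = 1:length(groups)
    xk = x(g == groups(k),:);
    nk = size(xk,1);
    mk = mean(xk);                       % group mean
    dk = xk - repmat(mk,nk,1);           % deviations from the group mean
    W = W + dk'*dk;
    B = B + nk*(mk - xbar)'*(mk - xbar);
end

dT = x - repmat(xbar,n,1);
T = dT'*dT;                              % total sums of squares, equals W + B

% Between-to-within ratios for the canonical directions, largest first.
eigenval = flipud(sort(real(eig(W\B))));
```

The identity T = W + B is the matrix counterpart of the usual decomposition of the total sum of squares in one-way analysis of variance.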
A grouped scatter plot of the first two canonical variables shows more
separation between groups than a grouped scatter plot of any pair of original
variables. In this example it shows three clouds of points, overlapping but with
distinct centers. One point in the bottom right sits apart from the others. By
using the gname function, we can see that this is the 20th point.
c1 = stats.canon(:,1);
c2 = stats.canon(:,2);
gscatter(c2,c1,Model_Year,[],'oxs')
gname
[Figure: grouped scatter plot of the first two canonical variables; x-axis: c2, y-axis: c1; legend groups: 70, 76, 82; the outlying point is labeled 20]
Roughly speaking, the first canonical variable, c1, separates the 1982 cars
(which have high values of c1) from the older cars. The second canonical
variable, c2, reveals some separation between the 1970 and 1976 cars.
The final two fields of the stats structure are Mahalanobis distances. The
mdist field measures the distance from each point to its group mean. Points
with large values may be outliers. In this data set, the largest outlier is the one
we saw in the scatter plot, the Buick Estate station wagon. (Note that we could
have supplied the model name to the gname function above if we wanted to label
the point with its model name rather than its row number.)
max(stats.mdist)
ans =
31.5273
find(stats.mdist == ans)
ans =
20
Model(20,:)
ans =
buick_estate_wagon_(sw)
The gmdist field measures the distances between each pair of group means.
The following commands examine the group means and their distances:
grpstats(x, Model_Year)
ans =
1.0e+003 *
0.0177 0.1489 0.2869 3.4413
0.0216 0.1011 0.1978 3.0787
0.0317 0.0815 0.1289 2.4535
stats.gmdist
ans =
0 3.8277 11.1106
3.8277 0 6.1374
11.1106 6.1374 0
As might be expected, the multivariate distance between the extreme years
1970 and 1982 (11.1) is larger than the distances between more closely spaced
years (3.8 and 6.1). This is consistent with the scatter plots, where the points
seem to follow a progression as the year changes from 1970 through 1976 to
1982. If we had more groups, we might have found it instructive to use the
manovacluster function to draw a diagram that presents clusters of the
groups, formed using the distances between their means.
Statistical Plots
The Statistics Toolbox adds specialized plots to the extensive graphics
capabilities of MATLAB.
• Box plots are graphs for describing data samples. They are also useful for
graphic comparisons of the means of many samples (see "One-Way Analysis
of Variance (ANOVA)" on page 1-69).
• Distribution plots are graphs for visualizing the distribution of one or more
samples. They include normal and Weibull probability plots,
quantile-quantile plots, and empirical cumulative distribution plots.
• Scatter plots are graphs for visualizing the relationship between a pair of
variables or several such pairs. Grouped versions of these plots use different
plotting symbols to indicate group membership. The gname function can label
points on these plots with a text label or an observation number.
The plot types are described further in the following sections:
• Box Plots
• Distribution Plots
• Scatter Plots
Box Plots
The graph shows an example of a notched box plot.
[Figure: notched box plot; x-axis: Column Number, y-axis: Values]
This plot has several graphic elements:
• The lower and upper lines of the box are the 25th and 75th percentiles of
the sample. The distance between the top and bottom of the box is the
interquartile range.
• The line in the middle of the box is the sample median. If the median is not
centered in the box, that is an indication of skewness.
• The whiskers are lines extending above and below the box. They show the
extent of the rest of the sample (unless there are outliers). Assuming no
outliers, the maximum of the sample is the top of the upper whisker. The
minimum of the sample is the bottom of the lower whisker. By default, an
outlier is a value that is more than 1.5 times the interquartile range away
from the top or bottom of the box.
• The plus sign at the top of the plot is an indication of an outlier in the data.
This point may be the result of a data entry error, a poor measurement, or a
change in the system that generated the data.
• The notches in the box are a graphic confidence interval about the median of
a sample. Box plots do not have notches by default.
A side-by-side comparison of two notched box plots is the graphical equivalent
of a t-test. See "Hypothesis Tests" on page 1-105.
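The box edges and the default outlier rule listed above can be worked through on a small sample. This is a sketch under stated assumptions: the data are made up, and the quartiles are computed by linear interpolation between sorted values, which may not match the percentile convention the toolbox uses exactly.

```matlab
% Hypothetical sample with one unusually large value.
x = sort([2 4 4 5 6 7 8 30]);
n = length(x);

% Fractional positions of the 25th and 75th percentiles in the sorted data
% (assumed convention: the k-th sorted value is the 100*(k-0.5)/n percentile).
pos = n*[0.25 0.75] + 0.5;
lo = floor(pos);
hi = ceil(pos);
q = x(lo) + (pos - lo).*(x(hi) - x(lo));   % lower and upper box edges

iqrange = q(2) - q(1);                     % height of the box
fences = [q(1) - 1.5*iqrange, q(2) + 1.5*iqrange];
outliers = x(x < fences(1) | x > fences(2));
```

Here the box runs from 4 to 7.5, so any value above 12.75 or below -1.25 would be drawn as an outlier, and only the value 30 qualifies.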
Distribution Plots
There are several types of plots for examining the distribution of one or more
samples, as described in the following sections:
• Normal Probability Plots
• Quantile-Quantile Plots
• Weibull Probability Plots
• Empirical Cumulative Distribution Function (CDF)
Normal Probability Plots
A normal probability plot is a useful graph for assessing whether data comes
from a normal distribution. Many statistical procedures make the assumption
that the underlying distribution of the data is normal, so this plot can provide
some assurance that the assumption of normality is not being violated, or
provide an early warning of a problem with your assumptions.
This example shows a typical normal probability plot.
x = normrnd(10,1,25,1);
normplot(x)
The plot has three graphical elements. The plus signs show the empirical
probability versus the data value for each point in the sample. The solid line
connects the 25th and 75th percentiles of the data and represents a robust
linear fit (i.e., insensitive to the extremes of the sample). The dashed line
extends the solid line to the ends of the sample.
The scale of the y-axis is not uniform. The y-axis values are probabilities and,
as such, go from zero to one. The distance between the tick marks on the y-axis
matches the distance between the quantiles of a normal distribution. The
quantiles are close together near the median (probability = 0.5) and stretch out
symmetrically moving away from the median. Compare the vertical distance
from the bottom of the plot to the probability 0.25 with the distance from 0.25
to 0.50. Similarly, compare the distance from the top of the plot to the
probability 0.75 with the distance from 0.75 to 0.50.
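The nonuniform spacing can be seen numerically by converting a few tick probabilities to standard normal quantiles. This sketch uses the base MATLAB function erfinv instead of a toolbox function; the tick values are taken from the plot's y-axis and the variable names are illustrative.

```matlab
% A few of the probabilities marked on the y-axis of the plot.
p = [0.01 0.25 0.50 0.75 0.99];

% Standard normal quantiles via the inverse error function.
z = sqrt(2)*erfinv(2*p - 1);

% Vertical distances between neighboring ticks on the probability scale.
gaps = diff(z);
```

The quantiles come out near -2.33, -0.67, 0, 0.67, and 2.33, so the step from 0.01 to 0.25 is more than twice as tall as the step from 0.25 to 0.50, and the spacing is symmetric about the median.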
[Figure: Normal Probability Plot; x-axis: Data, y-axis: Probability]
If all the data points fall near the line, the assumption of normality is
reasonable. But, if the data is nonnormal, the plus signs may follow a curve, as
in the example using exponential data below.
x = exprnd(10,100,1);
normplot(x)
This plot is clear evidence that the underlying distribution is not normal.
Quantile-Quantile Plots
A quantile-quantile plot is useful for determining whether two samples come
from the same distribution (whether normally distributed or not).
The example shows a quantile-quantile plot of two samples from a Poisson
distribution.
x = poissrnd(10,50,1);
y = poissrnd(5,100,1);
qqplot(x,y);
[Figure: normal probability plot of the exponential sample; x-axis: Data, y-axis: Probability]
Even though the parameters and sample sizes are different, the straight line
relationship shows that the two samples come from the same family of
distributions.
Like the normal probability plot, the quantile-quantile plot has three graphical
elements. The pluses are the quantiles of each sample. By default the number
of pluses is the number of data values in the smaller sample. The solid line joins
the 25th and 75th percentiles of the samples. The dashed line extends the solid
line to the extent of the sample.
The example below shows what happens when the underlying distributions are
not the same.
x = normrnd(5,1,100,1);
y = weibrnd(2,0.5,100,1);
qqplot(x,y);
[Figure: quantile-quantile plot of the two Poisson samples; x-axis: X Quantiles, y-axis: Y Quantiles]
These samples clearly are not from the same distribution.
It is incorrect to interpret a linear plot as a guarantee that the two samples
come from the same distribution. But, for assessing the validity of a statistical
procedure that depends on the two samples coming from the same distribution
(e.g., ANOVA), a linear quantile-quantile plot should be sufficient.
Weibull Probability Plots
A Weibull probability plot is a useful graph for assessing whether data comes
from a Weibull distribution. Many reliability analyses make the assumption
that the underlying distribution of the lifetimes is Weibull, so this plot can
provide some assurance that this assumption is not being violated, or provide
an early warning of a problem with your assumptions.
The scale of the y-axis is not uniform. The y-axis values are probabilities and,
as such, go from zero to one. The distance between the tick marks on the y-axis
matches the distance between the quantiles of a Weibull distribution.
If the data points (pluses) fall near the line, the assumption that the data comes
from a Weibull distribution is reasonable.
[Figure: quantile-quantile plot of the normal and Weibull samples; x-axis: X Quantiles, y-axis: Y Quantiles]
This example shows a typical Weibull probability plot.
y = weibrnd(2,0.5,100,1);
weibplot(y)
Empirical Cumulative Distribution Function (CDF)
If you are not willing to assume that your data follows a specific probability
distribution, you can use the cdfplot function to graph an empirical estimate
of the cumulative distribution function (cdf). This function computes the
proportion of data points less than each x value, and plots the proportion as a
function of x. The y-axis scale is linear, not a probability scale for a specific
distribution.
This example shows the empirical cumulative distribution function for a
Weibull sample.
y = weibrnd(2,0.5,100,1);
cdfplot(y)
[Figure: Weibull Probability Plot; x-axis: Data (logarithmic scale), y-axis: Probability]
The plot shows a probability function that rises steeply near x=0 and levels off
for larger values. Over 80% of the observations are less than 1, with the
remaining values spread over the range [1 5].
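The empirical estimate itself is simple to compute by hand. The sketch below uses a small made-up sample and base MATLAB only. It evaluates the proportion of points at or below each sorted value, which is one common convention and may differ in detail from what cdfplot draws.

```matlab
% Hypothetical sample of five observations.
y = [0.3 1.2 0.1 0.7 2.5];

ys = sort(y);                 % x values where the estimate steps up
n = length(ys);
F = (1:n)/n;                  % proportion of points at or below each ys

% Proportion of observations less than 1, read off the same way.
below1 = sum(y < 1)/n;
```

Plotting F against ys as a staircase, for example with the base MATLAB stairs function, gives a curve on a linear y-axis like the one described above.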
Scatter Plots
A scatter plot is a simple plot of one variable against another. The MATLAB
plot and scatter functions can produce scatter plots. The MATLAB
plotmatrix function can produce a matrix of such plots showing the
relationship between several pairs of variables.
The Statistics Toolbox adds functions that produce grouped versions of these
plots. These are useful for determining whether the values of two variables or
the relationship between those variables is the same in each group.
Suppose we want to examine the weight and mileage of cars from three
different model years.
load carsmall
gscatter(Weight,MPG,Model_Year,'','xos')
[Figure: Empirical CDF; x-axis: x, y-axis: F(x)]
This shows that not only is there a strong relationship between the weight of a
car and its mileage, but also that newer cars tend to be lighter and have better
gas mileage than older cars.
(The default arguments for gscatter produce a scatter plot with the different
groups shown with the same symbol but different colors. The last two
arguments above request that all groups be shown in default colors and with
different symbols.)
The carsmall data set contains other variables that describe different aspects
of cars. We can examine several of them in a single display by creating a
grouped plot matrix.
xvars = [Weight Displacement Horsepower];
yvars = [MPG Acceleration];
gplotmatrix(xvars,yvars,Model_Year,'','xos')
[Figure: grouped scatter plot; x-axis: Weight, y-axis: MPG; legend groups: 70, 76, 82]
The upper right subplot displays MPG against Horsepower, and shows that over
the years the horsepower of the cars has decreased but the gas mileage has
improved.
The gplotmatrix function can also graph all pairs from a single list of
variables, along with histograms for each variable. See "Multivariate Analysis
of Variance (MANOVA)" on page 1-122.
[Figure: grouped plot matrix of MPG and Acceleration against Weight, Displacement, and Horsepower; legend groups: 70, 76, 82]
Statistical Process Control (SPC)
SPC is an omnibus term for a number of methods for assessing and monitoring
the quality of manufactured goods. These methods are simple, which makes
them easy to implement even in a production environment. The following
sections discuss some of the SPC features of the Statistics Toolbox:
• Control Charts
• Capability Studies
Control Charts
These graphs were popularized by Walter Shewhart in his work in the 1920s
at Western Electric. A control chart is a plot of measurements over time with
statistical limits applied. Strictly speaking, "control chart" is a slight misnomer:
the chart itself is a monitoring tool. The control activity may occur if the
chart indicates that the process is changing in an undesirable systematic
direction.
The Statistics Toolbox supports three common control charts, described in the
following sections:
• Xbar Charts
• S Charts
• EWMA Charts
Xbar Charts
Xbar charts are a plot of the average of a sample of a process taken at regular
intervals. Suppose we are manufacturing pistons to a tolerance of
0.5 thousandths of an inch. We measure the runout (deviation from circularity
in thousandths of an inch) at four points on each piston.
load parts
conf = 0.99;
spec = [-0.5 0.5];
xbarplot(runout,conf,spec)
The lines at the bottom and the top of the plot show the process specifications.
The central line is the average runout over all the pistons. The two lines
flanking the center line are the 99% statistical control limits. By chance only
one measurement in 100 should fall outside these lines. We can see that even
in this small run of 36 parts, there are several points outside the boundaries
(labeled by their observation numbers). This is an indication that the process
mean is not in statistical control. This might not be of much concern in practice,
since all the parts are well within specification.
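The center line and control limits can be sketched with base MATLAB functions. The data below are made up (four hypothetical parts, four measurements each) and are not the parts data set. The sketch assumes the process standard deviation is estimated by the average of the per-part standard deviations with no bias correction; the estimator that xbarplot actually uses is not described in this section and may differ.

```matlab
% Hypothetical runout measurements: one row per part, four points per part.
runs = [ 0.1  0.2  0.0  0.1
         0.3  0.1  0.2  0.2
        -0.1  0.0  0.1  0.0
         0.2  0.2  0.1  0.3];
[m,n] = size(runs);

xbar = mean(runs,2);              % sample mean for each part
center = mean(xbar);              % central line of the chart

s = mean(std(runs,0,2));          % assumed estimate of the process sigma
conf = 0.99;
z = sqrt(2)*erfinv(conf);         % two-sided normal multiplier for 99%

LCL = center - z*s/sqrt(n);       % lower control limit for a mean of n points
UCL = center + z*s/sqrt(n);       % upper control limit
outside = find(xbar < LCL | xbar > UCL);
```

The limits describe the chance variation of a sample mean and are independent of the specification limits, which is why a process can be out of statistical control while every part is still within specification.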
S Charts
The S chart is a plot of the standard deviation of a process taken at regular
intervals. The standard deviation is a measure of the variability of a process.
So, the plot indicates whether there is any systematic change in the process
variability. Continuing with the piston manufacturing example, we can look at
the standard deviation of each set of four measurements of runout.
schart(runout)
[Figure: Xbar Chart; x-axis: Samples, y-axis: Measurements; lines marked USL, UCL, LCL, and LSL; labeled points 1, 2, 21, 25, 26, 30]
The average runout is about 0.1 thousandths of an inch. There is no indication
of nonrandom variability.
EWMA Charts
The exponentially-weighted moving average (EWMA) chart is another chart
for monitoring the process average. It operates on slightly different
assumptions than the Xbar chart. The mathematical model behind the Xbar
chart posits that the process mean is actually constant over time and any
variation in individual measurements is due entirely to chance.
The EWMA model is a little looser. Here we assume that the mean may be
varying in time. Here is an EWMA chart of our runout example. Compare this
with the plot in "Xbar Charts" on page 1-138.
ewmaplot(runout,0.5,0.01,spec)
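The quantity plotted on an EWMA chart is a running weighted average of the sample means. The sketch below shows the standard recursion with the weight 0.5 used in the call above. The sample means are made up, and starting the recursion at the overall mean is an assumption; the way ewmaplot initializes the average and computes its limits is not shown here.

```matlab
xbar = [1 3 5 7];          % hypothetical sample means, one per time point
lambda = 0.5;              % weight given to the newest sample mean

z = zeros(size(xbar));
prev = mean(xbar);         % assumed starting value: the overall mean
for t = 1:length(xbar)
    % New average = lambda * newest mean + (1 - lambda) * previous average.
    z(t) = lambda*xbar(t) + (1 - lambda)*prev;
    prev = z(t);
end
```

For these values z is [2.5 2.75 3.875 5.4375]. The smoothed series follows the upward drift in the means but lags behind it, and a smaller lambda would increase the lag.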
[Figure: S Chart; x-axis: Sample Number, y-axis: Standard Deviation; lines marked UCL and LCL]
Capability Studies
Before going into full-scale production, many manufacturers run a pilot study
to determine whether their process can actually build parts to the
specifications demanded by the engineering drawing.
Using the data from these capability studies with a statistical model allows us
to get a preliminary estimate of the percentage of parts that will fall outside
the specifications.
[p,Cp,Cpk] = capable(mean(runout),spec)
p =
1.3940e-09
Cp =
2.3950
Cpk =
1.9812
[Figure: Exponentially Weighted Moving Average (EWMA) Chart; x-axis: Sample Number, y-axis: EWMA; lines marked USL, UCL, LCL, and LSL; labeled points 21, 25, 26]
The result above shows that the probability (p = 1.3940e-09) of observing an
unacceptable runout is extremely low. Cp and Cpk are two popular capability
indices.
Cp is the ratio of the range of the specifications to six times the estimate of the
process standard deviation, σ:

$$C_p = \frac{\mathrm{USL} - \mathrm{LSL}}{6\sigma}$$

For a process that has its average value on target, a Cp of 1 translates to a little
more than one defect per thousand. Recently many industries have set a
quality goal of one part per million. This would correspond to a Cp = 1.6. The
higher the value of Cp, the more capable the process.
Cpk is the ratio of the difference between the process mean and the closer
specification limit to three times the estimate of the process standard
deviation:

$$C_{pk} = \min\left(\frac{\mathrm{USL} - \mu}{3\sigma},\ \frac{\mu - \mathrm{LSL}}{3\sigma}\right)$$

where the process mean is μ. For processes that do not maintain their average
on target, Cpk is a more descriptive index of process capability.
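The two indices and the out-of-specification probability can be computed directly from their definitions. This sketch uses a hypothetical process mean and standard deviation rather than values estimated from the runout data, and it assumes the measurements follow a normal distribution. It relies only on the base MATLAB function erfc.

```matlab
mu = 0.1;                    % hypothetical process mean
sigma = 0.1;                 % hypothetical process standard deviation
LSL = -0.5;                  % lower specification limit
USL = 0.5;                   % upper specification limit

% Specification range relative to six standard deviations.
Cp = (USL - LSL)/(6*sigma);

% Distance from the mean to the closer limit, relative to three sigma.
Cpk = min((USL - mu)/(3*sigma), (mu - LSL)/(3*sigma));

% Normal tail areas above USL and below LSL.
p = 0.5*erfc((USL - mu)/(sigma*sqrt(2))) + 0.5*erfc((mu - LSL)/(sigma*sqrt(2)));
```

Because this hypothetical process is off target, Cpk (about 1.33) is lower than Cp (about 1.67), and nearly all of the defect probability comes from the specification limit that is closer to the mean.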
Design of Experiments (DOE)
There i s a worl d of di fference between data and i nformati on. To extract
i nformati on from data you have to make assumpti ons about the system that
generated the data. Usi ng these assumpti ons and physi cal theory you may be
abl e to devel op a mathemati cal model of the system.
General l y, even ri gorousl y formul ated model s have some unknown constants.
The goal of experi mentati on i s to acqui re data that al l ow us to esti mate these
constants.
But why do we need to experi ment at al l ? We coul d i nstrument the system we
want to study and just l et i t run. Sooner or l ater we woul d have al l the data we
coul d use.
I n fact, thi s i s a fai rl y common approach. There are three characteri sti cs of
hi stori cal data that pose probl ems for stati sti cal model i ng:
• Suppose we observe a change in the operating variables of a system followed by a change in the outputs of the system. That does not necessarily mean that the change in the system caused the change in the outputs.
• A common assumption in statistical modeling is that the observations are independent of each other. This is not the way a system in normal operation works.
• Controlling a system in operation often means changing system variables in tandem. But if two variables change together, it is impossible to separate their effects mathematically.
Designed experiments directly address these problems. The overwhelming advantage of a designed experiment is that you actively manipulate the system you are studying. With DOE you may generate fewer data points than by using passive instrumentation, but the quality of the information you get will be higher.
The Statistics Toolbox provides several functions for generating experimental designs appropriate to various situations. These are discussed in the following sections:
• Full Factorial Designs
• Fractional Factorial Designs
• D-Optimal Designs
Full Factorial Designs
Suppose you want to determine whether the variability of a machining process is due to the difference in the lathes that cut the parts or the operators who run the lathes.
If the same operator always runs a given lathe then you cannot tell whether the machine or the operator is the cause of the variation in the output. By allowing every operator to run every lathe you can separate their effects.
This is a factorial approach. fullfact is the function that generates the design. Suppose we have four operators and three machines. What is the factorial design?
d = fullfact([4 3])
d =
1 1
2 1
3 1
4 1
1 2
2 2
3 2
4 2
1 3
2 3
3 3
4 3
Each row of d represents one operator/machine combination. Note that there are 4*3 = 12 rows.
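The same enumeration can be sketched with base MATLAB alone, which makes the row count easy to check. This is an illustration of the idea, not a description of how fullfact is implemented; the variable names are chosen here for readability.

```matlab
% Build the 4-by-3 full factorial by hand (base MATLAB only).
nOperators = 4;
nMachines  = 3;

% ndgrid enumerates every pairing; the first factor varies fastest.
[op, mach] = ndgrid(1:nOperators, 1:nMachines);
d = [op(:) mach(:)];          % one row per operator/machine combination

nRuns = size(d, 1);           % equals prod([4 3])
allDistinct = size(unique(d, 'rows'), 1) == nRuns;
```

With the first factor varying fastest, the rows of d appear in the same order as the fullfact output shown above.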
One special subclass of factorial designs is when all the variables take only two values. Suppose you want to quickly determine the sensitivity of a process to high and low values of three variables.
d2 = ff2n(3)
d2 =
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
There are $2^3 = 8$ combinations to check.
Fractional Factorial Designs
One difficulty with factorial designs is that the number of combinations increases exponentially with the number of variables you want to manipulate.
For example, the sensitivity study discussed above might be impractical if there were seven variables to study instead of just three. A full factorial design would require $2^7 = 128$ runs!
If we assume that the variables do not act synergistically in the system, we can assess the sensitivity with far fewer runs. The theoretical minimum number is eight. A design known as the Plackett-Burman design uses a Hadamard matrix to define this minimal number of runs. To see the design (X) matrix for the Plackett-Burman design, we use the hadamard function.
X = hadamard(8)
X =
1 1 1 1 1 1 1 1
1 -1 1 -1 1 -1 1 -1
1 1 -1 -1 1 1 -1 -1
1 -1 -1 1 1 -1 -1 1
1 1 1 1 -1 -1 -1 -1
1 -1 1 -1 -1 1 -1 1
1 1 -1 -1 -1 -1 1 1
1 -1 -1 1 -1 1 1 -1
The last seven columns are the actual variable settings (-1 for low, 1 for high). The first column (all ones) allows us to measure the mean effect in the linear equation, $y = X\beta + \varepsilon$.
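The reason eight runs suffice for seven main effects is that the columns of the Hadamard matrix are mutually orthogonal. The sketch below checks this and recovers a set of coefficients from a noise-free response. The coefficient values are hypothetical and chosen only for illustration.

```matlab
X = hadamard(8);              % base MATLAB function

% Orthogonality: X'*X is 8 times the identity matrix.
G = X' * X;
isOrthogonal = isequal(G, 8*eye(8));

% Hypothetical true coefficients: a mean of 10 plus seven main effects.
betaTrue = [10; 2; 0; -1; 0; 0.5; 0; 0];
y = X * betaTrue;             % noise-free response, for illustration only

% Because X'*X = 8*I, least squares reduces to a scaled matrix product.
betaHat = (X' * y) / 8;
```

With real, noisy data the estimates would no longer be exact, but each would still be uncorrelated with the others because of the orthogonal columns.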
The Plackett-Burman design enables us to study the main (linear) effects of each variable with a small number of runs. It does this by using a fraction, in this case 8/128, of the runs required for a full factorial design. A drawback of this design is that if the effect of one variable does vary with the value of another variable, then the estimated effects will be biased (that is, they will tend to be off by a systematic amount).
At a cost of a somewhat larger design, we can find a fractional factorial that is much smaller than a full factorial, but that does allow estimation of main effects independent of interactions between pairs of variables. We can do this by specifying generators that control the confounding between variables.
As an example, suppose we create a design with the first four variables varying independently as in a full factorial, but with the other three variables formed by multiplying different triplets of the first four. With this design the effects of the last three variables are confounded with three-way interactions among the first four variables. The estimated effect of any single variable, however, is not confounded with (is independent of) interaction effects between any pair of variables. Interaction effects are confounded with each other. Box, Hunter, and Hunter (1978) present the properties of these designs and provide the generators needed to produce them.
The fracfact function can produce this fractional factorial design using the generator strings that Box, Hunter, and Hunter provide.
X = fracfact('a b c d abc bcd acd')
X =
-1 -1 -1 -1 -1 -1 -1
-1 -1 -1 1 -1 1 1
-1 -1 1 -1 1 1 1
-1 -1 1 1 1 -1 -1
-1 1 -1 -1 1 1 -1
-1 1 -1 1 1 -1 1
-1 1 1 -1 -1 -1 1
-1 1 1 1 -1 1 -1
1 -1 -1 -1 1 -1 1
1 -1 -1 1 1 1 -1
1 -1 1 -1 -1 1 -1
1 -1 1 1 -1 -1 1
1 1 -1 -1 -1 1 1
1 1 -1 1 -1 -1 -1
1 1 1 -1 1 -1 -1
1 1 1 1 1 1 1
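To see what the generator string means, the design can be rebuilt by hand in base MATLAB: the first four columns form a $2^4$ full factorial and the last three are element-wise products of triplets. The sketch below also checks the claim that no main effect is confounded with a two-factor interaction. It illustrates the construction only and makes no claim about how fracfact works internally.

```matlab
% 2^4 full factorial in -1/+1 coding for the basic factors a, b, c, d.
base = 2*(dec2bin(0:15) - '0') - 1;
a = base(:,1);  b = base(:,2);  c = base(:,3);  d = base(:,4);

% Generated factors: products of triplets of the basic factors.
X = [a b c d a.*b.*c b.*c.*d a.*c.*d];

% All 21 two-factor interaction columns among the seven variables.
pairs = nchoosek(1:7, 2);
inter = X(:, pairs(:,1)) .* X(:, pairs(:,2));   % 16-by-21

% Inner products between main-effect columns and interaction columns.
maxAbsDot = max(max(abs(X' * inter)));
```

A value of zero for maxAbsDot means every main-effect column is orthogonal to every two-factor interaction column, which is the property described above.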
D-Optimal Designs
All the designs above were in use by the early 20th century. In the 1970s statisticians started to use the computer in experimental design by recasting the design of experiments (DOE) in terms of optimization. A D-optimal design is one that maximizes the determinant of Fisher's information matrix, $X^T X$. This matrix is proportional to the inverse of the covariance matrix of the parameters. So maximizing $\det(X^T X)$ is equivalent to minimizing the determinant of the covariance of the parameters.
A D-optimal design minimizes the volume of the confidence ellipsoid of the regression estimates of the linear model parameters, $\beta$.
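The criterion is easy to evaluate directly. As a small sketch, the snippet below compares $\det(X^T X)$ for two hand-picked four-run designs under a two-input model with constant, linear, and cross-product terms. Both designs and the helper name modelMatrix are made up for this illustration.

```matlab
% Model matrix for two inputs: columns [1, x1, x2, x1*x2].
modelMatrix = @(s) [ones(size(s,1),1), s(:,1), s(:,2), s(:,1).*s(:,2)];

designA = [-1 -1; 1 -1; -1 1; 1 1];   % four corners of the square
designB = [-1 -1; 1 -1; -1 1; 0 0];   % one corner replaced by the center

XA = modelMatrix(designA);
XB = modelMatrix(designB);

% The D-criterion: larger determinants mean tighter parameter estimates.
detA = det(XA' * XA);   % 256
detB = det(XB' * XB);   % 16
```

Design A has the larger determinant, so it is the better of the two by the D-criterion. A search algorithm automates this comparison over many candidate designs.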
There are several functions in the Statistics Toolbox that generate D-optimal designs. These are cordexch, daugment, dcovary, and rowexch. The following sections explore D-optimal design in greater detail:
• Generating D-Optimal Designs
• Augmenting D-Optimal Designs
• Designing Experiments with Uncontrolled Inputs
Generating D-Optimal Designs
cordexch and rowexch are two competing optimization algorithms for computing a D-optimal design given a model specification.
Both cordexch and rowexch are iterative algorithms. They operate by improving a starting design by making incremental changes to its elements. In the coordinate exchange algorithm, the increments are the individual elements of the design matrix. In row exchange, the elements are the rows of the design matrix. Atkinson and Donev (1992) is a reference.
To generate a D-optimal design you must specify the number of inputs, the number of runs, and the order of the model you want to fit.
Both cordexch and rowexch take the following strings to specify the model:

'linear' or 'l'           the default model with constant and first order terms
'interaction' or 'i'      includes constant, linear, and cross product terms
'quadratic' or 'q'        interactions plus squared terms
'purequadratic' or 'p'    includes constant, linear, and squared terms

Alternatively, you can use a matrix of integers to specify the terms. Details are in the help for the utility function x2fx.
For a simple example using the coordinate-exchange algorithm, consider the problem of quadratic modeling with two inputs. The model form is

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \varepsilon$$

Suppose we want the D-optimal design for fitting this model with nine runs.
settings = cordexch(2,9,'q')
settings =
-1 1
1 1
0 1
1 -1
-1 -1
0 -1
1 0
0 0
-1 0
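One way to confirm that nine runs support the six-term quadratic model is to build the model matrix by hand from the settings shown in the example above and check its rank. This is a sketch in base MATLAB; the column order used here is a choice made for this illustration.

```matlab
% The nine settings from the example above (a 3-by-3 grid).
settings = [-1 1; 1 1; 0 1; 1 -1; -1 -1; 0 -1; 1 0; 0 0; -1 0];
x1 = settings(:,1);
x2 = settings(:,2);

% Columns: constant, linear, cross-product, and squared terms.
X = [ones(9,1), x1, x2, x1.*x2, x1.^2, x2.^2];

nParams = size(X, 2);
isEstimable = rank(X) == nParams;   % true when all six coefficients can be fit
D = det(X' * X);                    % the quantity a D-optimal design maximizes
```

For this grid the determinant works out to 5184.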
We can plot the columns of settings against each other to get a better picture of the design.
h = plot(settings(:,1),settings(:,2),'.');
set(gca,'Xtick',[-1 0 1])
set(gca,'Ytick',[-1 0 1])
set(h,'Markersize',20)
[Figure: plot of the design points, with both axes marked at -1, 0, and 1]
For a simple example using the row-exchange algorithm, consider the interaction model with two inputs. The model form is

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \varepsilon$$

Suppose we want the D-optimal design for fitting this model with four runs.
[settings, X] = rowexch(2,4,'i')
settings =
-1 1
-1 -1
1 -1
1 1
X =
1 -1 1 -1
1 -1 -1 1
1 1 -1 -1
1 1 1 1
The settings matrix shows how to vary the inputs from run to run. The X matrix is the design matrix for fitting the above regression model. The first column of X is for fitting the constant term. The last column is the element-wise product of the second and third columns.
The associated plot is simple but elegant.
h = plot(settings(:,1),settings(:,2),'.');
set(gca,'Xtick',[-1 0 1])
set(gca,'Ytick',[-1 0 1])
set(h,'Markersize',20)
Augmenting D-Optimal Designs
In practice, experimentation is an iterative process. We often want to add runs to a completed experiment to learn more about our system. The function daugment allows you to choose these extra runs optimally.
Suppose we have executed the eight-run design below for fitting a linear model to four input variables.
settings = cordexch(4,8)
settings =
1 -1 1 1
-1 -1 1 -1
-1 1 1 1
1 1 1 -1
-1 1 -1 1
1 -1 -1 1
-1 -1 -1 -1
1 1 -1 -1
This design is adequate to fit the linear model for four inputs, but cannot fit the six cross-product (interaction) terms. Suppose we are willing to do eight more runs to fit these extra terms. Here's how.
[augmented, X] = daugment(settings,8,'i');
augmented
augmented =
1 -1 1 1
-1 -1 1 -1
-1 1 1 1
1 1 1 -1
-1 1 -1 1
1 -1 -1 1
-1 -1 -1 -1
1 1 -1 -1
-1 -1 -1 1
1 1 1 1
-1 -1 1 1
-1 1 1 -1
1 -1 1 -1
1 -1 -1 -1
-1 1 -1 -1
1 1 -1 1
info = X'*X
info =
16 0 0 0 0 0 0 0 0 0 0
0 16 0 0 0 0 0 0 0 0 0
0 0 16 0 0 0 0 0 0 0 0
0 0 0 16 0 0 0 0 0 0 0
0 0 0 0 16 0 0 0 0 0 0
0 0 0 0 0 16 0 0 0 0 0
0 0 0 0 0 0 16 0 0 0 0
0 0 0 0 0 0 0 16 0 0 0
0 0 0 0 0 0 0 0 16 0 0
0 0 0 0 0 0 0 0 0 16 0
0 0 0 0 0 0 0 0 0 0 16
The augmented design is orthogonal, since X'*X is a multiple of the identity matrix. In fact, this design is the same as a $2^4$ factorial design.
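Both statements can be checked in base MATLAB. The sketch below builds a $2^4$ factorial in -1/+1 coding, forms the 11-column interaction model matrix by hand, and compares the factorial's runs with the augmented design printed above. The column order is a choice made for this illustration and need not match the X returned by daugment.

```matlab
% 2^4 full factorial in -1/+1 coding (16 runs, 4 inputs).
F = 2*(dec2bin(0:15) - '0') - 1;

% Interaction model: constant, 4 linear terms, 6 cross-product terms.
pairs = nchoosek(1:4, 2);
X = [ones(16,1), F, F(:,pairs(:,1)) .* F(:,pairs(:,2))];

info = X' * X;
isOrthogonal = isequal(info, 16*eye(11));

% The augmented design from the example above.
augmented = [ 1 -1  1  1; -1 -1  1 -1; -1  1  1  1;  1  1  1 -1; ...
             -1  1 -1  1;  1 -1 -1  1; -1 -1 -1 -1;  1  1 -1 -1; ...
             -1 -1 -1  1;  1  1  1  1; -1 -1  1  1; -1  1  1 -1; ...
              1 -1  1 -1;  1 -1 -1 -1; -1  1 -1 -1;  1  1 -1  1];

% Same set of runs, ignoring run order.
sameRuns = isequal(sortrows(augmented), sortrows(F));
```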
Designing Experiments with Uncontrolled Inputs
Sometimes it is impossible to control every experimental input. But you may know the values of some inputs in advance. An example is the time each run takes place. If a process is experiencing linear drift, you may want to include the time of each test run as a variable in the model.
The function dcovary allows you to choose the settings for each run in order to maximize your information despite a linear drift in the process.
Suppose we want to execute an eight-run experiment with three factors that is optimal with respect to a linear drift in the response over time. First we create our drift input variable. Note that drift is normalized to have mean zero. Its minimum is -1 and its maximum is 1.
drift = (linspace(-1,1,8))'
drift =
-1.0000
-0.7143
-0.4286
-0.1429
0.1429
0.4286
0.7143
1.0000
settings = dcovary(3,drift,'linear')
settings =
1.0000 1.0000 -1.0000 -1.0000
-1.0000 -1.0000 -1.0000 -0.7143
-1.0000 1.0000 1.0000 -0.4286
1.0000 -1.0000 1.0000 -0.1429
-1.0000 1.0000 -1.0000 0.1429
1.0000 1.0000 1.0000 0.4286
-1.0000 -1.0000 1.0000 0.7143
1.0000 -1.0000 -1.0000 1.0000
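A few properties of this result can be verified with base MATLAB. The sketch below re-enters the three factor columns printed above, appends the drift covariate, and checks that the five-term linear model (constant, three factors, drift) is estimable. It inspects the printed design only and does not reproduce the dcovary search.

```matlab
drift = linspace(-1, 1, 8)';

% The three factor columns printed in the example above.
factors = [ 1  1 -1; -1 -1 -1; -1  1  1;  1 -1  1; ...
           -1  1 -1;  1  1  1; -1 -1  1;  1 -1 -1];

% Linear model matrix with the drift as a known covariate.
X = [ones(8,1), factors, drift];

driftMean = mean(drift);           % zero up to rounding
nEstimable = rank(X);              % 5: constant, three factors, and drift
factorGram = factors' * factors;   % 8*eye(3): factor columns are orthogonal
```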
Demos
The Statistics Toolbox has demonstration programs that create an interactive environment for exploring the probability distributions, random number generation, curve fitting, and design of experiments functions. Most of them provide a graphical user interface that can be used with your real data, not just with the sample data provided.
The available demos are listed below.

Demo          Purpose
aoctool       Interactive graphic prediction of anocova fits
disttool      Graphic interaction with probability distributions
glmdemo       Generalized linear models slide show
nlintool      Interactive fitting of nonlinear models
polytool      Interactive graphic prediction of polynomial fits
randtool      Interactive control of random number generation
robustdemo    Interactive comparison of robust and least squares fits
rsmdemo       Design of experiments and regression modeling
rstool        Exploring graphs of multidimensional polynomials
stepwise      Interactive stepwise regression

Most of these functions are described below. The nlintool, rstool, and stepwise demos are discussed in earlier sections:
• nlintool: "An Interactive GUI for Nonlinear Fitting and Prediction" on page 1-104
• rstool: "Exploring Graphs of Multidimensional Polynomials" on page 1-86
• stepwise: "Example: Stepwise Regression" on page 1-88
The disttool Demo
disttool is a graphic environment for developing an intuitive understanding of probability distributions.
The disttool demo has the following features:
• A graph of the cdf (pdf) for the given parameters of a distribution.
• A pop-up menu for changing the distribution function.
• A pop-up menu for changing the function type (cdf <-> pdf).
• Sliders to change the parameter settings.
• Data entry boxes to choose specific parameter values.
• Data entry boxes to change the limits of the parameter sliders.
• Draggable horizontal and vertical reference lines to do interactive evaluation of the function at varying values.
• A data entry box to evaluate the function at a specific x-value.
• For cdf plots, a data entry box on the probability axis (y-axis) to find critical values corresponding to a specific probability.
• A Close button to end the demonstration.
[Figure: the disttool window, with callouts for the Distributions pop-up, the Function type pop-up, the cdf function, the cdf value, the x value, the draggable vertical and horizontal reference lines, the parameter value, the parameter control, and the upper and lower parameter bounds]
The polytool Demo
The polytool demo is an interactive graphic environment for polynomial curve fitting and prediction.
The polytool demo has the following features:
• A graph of the data, the fitted polynomial, and global confidence bounds on a new predicted value.
• y-axis text to display the predicted y-value and its uncertainty at the current x-value.
• A data entry box to change the degree of the polynomial fit.
• A data entry box to evaluate the polynomial at a specific x-value.
• A draggable vertical reference line to do interactive evaluation of the polynomial at varying x-values.
• Bounds and Method menus to control the confidence bounds and choose between least squares or robust fitting.
• An Export list box to store fit results into variables.
You can use polytool to do curve fitting and prediction for any set of x-y data, but, for the sake of demonstration, the Statistics Toolbox provides a data set (polydata.mat) to teach some basic concepts.
To start the demonstration, you must first load the data set.
load polydata
who
Your variables are:
x x1 y y1
The variables x and y are observations made with error from a cubic polynomial. The variables x1 and y1 are data points from the true function without error.
If you do not specify the degree of the polynomial, polytool does a linear fit to the data.
polytool(x,y)
The linear fit is not very good. The bulk of the data with x-values between zero and two has a steeper slope than the fitted line. The two points to the right are dragging down the estimate of the slope.
In the Degree box at the top, type 3 for a cubic model. Then, drag the vertical reference line to the x-value of 2 (or type 2 in the X Values text box).
[Figure: the polytool window, with callouts for the polynomial degree, the predicted value, the 95% confidence interval, the data points, the fitted line, the upper and lower confidence bounds, the draggable reference line, and the x-value]
This graph shows a much better fit to the data. The confidence bounds are closer together, indicating that there is less uncertainty in prediction. The data at both ends of the plot track the fitted curve.
The following sections explore additional aspects of the tool:
• Confidence Bounds
• Overfitting
Confidence Bounds
By default, the confidence bounds are nonsimultaneous bounds for a new observation. What does this mean? Let p(x) be the true but unknown function we want to estimate. The graph contains the following three curves:
• f(x), our fitted function
• l(x), the lower confidence bounds
• u(x), the upper confidence bounds
Suppose we plan to take a new observation at the value $x_{n+1}$. Call it $y_{n+1}(x_{n+1})$. This new observation has its own error $\varepsilon_{n+1}$, so it satisfies the equation

$$y_{n+1}(x_{n+1}) = p(x_{n+1}) + \varepsilon_{n+1}$$

What are the likely values for this new observation? The confidence bounds provide the answer. The interval [$l_{n+1}$, $u_{n+1}$] is a 95% confidence bound for $y_{n+1}(x_{n+1})$.
These are the default bounds, but the Bounds menu on the polytool figure window provides options for changing the meaning of these bounds. This menu has options that let you specify whether the bounds are to apply to the estimated function or to a new observation, and whether the bounds should be simultaneous or not. Using these options you can produce any of the following types of confidence bounds.

Simultaneous?      For Quantity    Yields Confidence Bounds for
Nonsimultaneous    Observation     $y_{n+1}(x_{n+1})$
Nonsimultaneous    Curve           $p(x_{n+1})$
Simultaneous       Observation     $y_{n+1}(x)$, globally for any x
Simultaneous       Curve           $p(x)$, simultaneously for all x

Overfitting
If the cubic polynomial is a good fit, it is tempting to try a higher order polynomial to see if even more precise predictions are possible.
Since the true function is cubic, this amounts to overfitting the data. Use the data entry box for degree and type 5 for a quintic model.
As measured by the confidence bounds, the fit is precise near the data points. But, in the region between the data groups, the uncertainty of prediction rises dramatically.
This bulge in the confidence bounds happens because the data really do not contain enough information to estimate the higher order polynomial terms precisely, so even interpolation using polynomials can be risky in some cases.
The aoctool Demo
The aoctool demo is an interactive graphical environment for fitting and prediction with analysis of covariance (anocova) models. It is similar to the polytool demo.
Analysis of covariance is a technique for analyzing grouped data having a response (y, the variable to be predicted) and a predictor (x, the variable used to do the prediction). Using analysis of covariance, you can model y as a linear function of x, with the coefficients of the line possibly varying from group to group. The aoctool function fits the following models for the ith group:

1  same mean         $y = \alpha + \varepsilon$
2  separate means    $y = (\alpha + \alpha_i) + \varepsilon$
3  same line         $y = \alpha + \beta x + \varepsilon$
4  parallel lines    $y = (\alpha + \alpha_i) + \beta x + \varepsilon$
5  separate lines    $y = (\alpha + \alpha_i) + (\beta + \beta_i)x + \varepsilon$

In the fourth model, for example, the intercept varies from one group to the next, but the slope is the same for each group. In the first model, there is a common intercept and no slope. In order to make the group coefficients well determined, we impose the constraints $\sum_i \alpha_i = \sum_i \beta_i = 0$.
The aoctool demo displays the results of the fit in three figure windows. One window displays estimates of the coefficients ($\alpha$, $\alpha_i$, $\beta$, $\beta_i$). A second displays an analysis of variance table that you can use to test whether a more complex model is significantly better than a simpler one. The third, main graphics window has the following features:
• A graph of the data with superimposed fitted lines and optional confidence bounds.
• y-axis text to display the predicted y-value and its uncertainty at the current x-value for the current group, if a group is currently selected.
• A data entry box to evaluate the fit at a specific x-value.
• A list box to evaluate the fit for a specific group or to display fitted lines for all groups.
• A draggable vertical reference line to do interactive evaluation of the fit at varying x-values.
• An Export list box to store fit results into variables.
The following section provides an illustrative example.
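To make the sum-to-zero constraint concrete, here is a minimal least squares sketch of the parallel-lines model in base MATLAB. The data are synthetic and noise-free, the coefficient values are made up, and the effect-coding scheme is one common way to impose the constraint. It is not claimed to be what aoctool does internally.

```matlab
% Synthetic, noise-free data for three groups (all values are made up).
x = [1 2 3 4  1 2 3 4  1 2 3 4]';
g = [1 1 1 1  2 2 2 2  3 3 3 3]';
alpha  = 10;
alphaI = [-2; 0; 2];                 % group offsets, summing to zero
beta   = -0.5;
y = alpha + alphaI(g) + beta*x;

% Effect coding enforces sum(alpha_i) = 0: the last group gets -1 in each column.
E = [(g == 1) - (g == 3), (g == 2) - (g == 3)];
X = [ones(size(x)), E, x];

b = X \ y;                           % least squares fit
alphaHat  = b(1);
alphaIHat = [b(2); b(3); -b(2) - b(3)];
betaHat   = b(4);
```

Only two offset columns are needed because the third group's offset is determined by the constraint. The separate-lines model adds two more columns, formed by multiplying each column of E by x.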
Example: aoctool with Sample Data
The Statistics Toolbox has a small data set named carsmall with information about cars. It is a good sample data set to use with aoctool. You can also use aoctool with your own data.
To start the demonstration, load the data set.
load carsmall
who
Your variables are:
Acceleration Horsepower Model_Year
Cylinders MPG Origin
Displacement Model Weight
Suppose we want to study the relationship between the weight of a car and its mileage, and whether this relationship has changed over the years.
Next, start up the tool.
[h,atab,ctab,stats] = aoctool(Weight,MPG,Model_Year);
Note: 6 observations with missing values have been removed.
The graphical output consists of the following main window, plus a table of coefficient estimates and an analysis of variance table.
The group of each data point is coded by its color and symbol, and the fit for each group has the same color as the data points.
The initial fit models the y variable, MPG, as a linear function of the x variable, Weight. Each group has a separate line. The coefficients of the three lines appear in the figure titled ANOCOVA Coefficients. You can see that the slopes are roughly -0.0078, with a small deviation for each group:

Model year 70:  $y = (45.9798 - 8.5805) + (-0.0078 + 0.002)x + \varepsilon$
Model year 76:  $y = (45.9798 - 3.8902) + (-0.0078 + 0.0011)x + \varepsilon$
Model year 82:  $y = (45.9798 + 12.4707) + (-0.0078 - 0.0031)x + \varepsilon$

Notice that the three fitted lines have slopes that are roughly similar. Could they really be the same? The Model_Year*Weight interaction expresses the difference in slopes, and the ANOVA table shows a test for the significance of this term. With an F statistic of 5.23 and a p-value of 0.0072, the slopes are significantly different.
To examine the fits when the slopes are constrained to be the same, return to the ANOCOVA Prediction Plot window and use the Model pop-up to select a Parallel Lines model. The window updates to show the graph below.
Though this fit looks reasonable, we know it is significantly worse than the Separate Lines model. Use the Model pop-up again to return to the original model.
The following sections focus on two other interesting aspects of aoctool:
• Confidence Bounds
• Multiple Comparisons
Confidence Bounds. Now we have estimates of the relationship between MPG and Weight for each Model_Year, but how accurate are they? We can superimpose confidence bounds on the fits by examining them one group at a time. In the Model_Year menu at the lower right of the figure, change the setting from All Groups to 82. The data and fits for the other groups are dimmed, and confidence bounds appear around the 82 fit.
The dashed lines form an envelope around the fitted line for model year 82. Under the assumption that the true relationship is linear, these bounds provide a 95% confidence region for the true line. Note that the fits for the other model years are well outside these confidence bounds for Weight values between 2000 and 3000.
Sometimes it is more valuable to be able to predict the response value for a new observation, not just estimate the average response value. Like the polytool function, the aoctool function has a Bounds menu to change the definition of the confidence bounds. Use that menu to change from Line to Observation. The resulting wider intervals reflect the uncertainty in the parameter estimates as well as the randomness of a new observation.
Also like the polytool function, the aoctool function has crosshairs you can use to manipulate the Weight and watch the estimate and confidence bounds along the y-axis update. These values appear only when a single group is selected, not when All Groups is selected.
Multiple Comparisons. We can perform a multiple comparison test by using the stats output from aoctool as input to the multcompare function. The multcompare function can test slopes, intercepts, or population marginal means (the heights of the fitted lines evaluated at the mean X value). In this example, we have already determined that the slopes are not all the same, but could it be that two are the same and only the other one is different? We can test that hypothesis.
multcompare(stats,0.05,'on','','s')
ans =
1.0000 2.0000 -0.0012 0.0008 0.0029
1.0000 3.0000 0.0013 0.0051 0.0088
2.0000 3.0000 0.0005 0.0042 0.0079
This matrix shows that the estimated difference between the slopes of groups 1 and 2 (1970 and 1976) is 0.0008, and a confidence interval for the difference is [-0.0012, 0.0029]. There is no significant difference between the two. There are significant differences, however, between the slope for 1982 and each of the other two. The graph shows the same information.
Note that the stats structure was created in the initial call to the aoctool function, so it is based on the initial model fit (typically a separate-lines model). If you change the model interactively and want to base your multiple comparisons on the new model, you need to run aoctool again to get another stats structure, this time specifying your new model as the initial model.
The randtool Demo
randtool is a graphic environment for generating random samples from various probability distributions and displaying the sample histogram.
The randtool demo has the following features:
• A histogram of the sample.
• A pop-up menu for changing the distribution function.
• Sliders to change the parameter settings.
• A data entry box to choose the sample size.
• Data entry boxes to choose specific parameter values.
• Data entry boxes to change the limits of the parameter sliders.
• An Output button to output the current sample to the variable ans.
• A Resample button to allow repetitive sampling with constant sample size and fixed parameters.
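[Figure: the randtool window, with callouts for the Distributions pop-up, the histogram, the sample size, the parameter value, the parameter control, the upper and lower parameter bounds, the button to draw again from the same distribution, and the button to output to the variable ans]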
The rsmdemo Demo
The rsmdemo utility is an interactive graphic environment that demonstrates the design of experiments and surface fitting through the simulation of a chemical reaction. The goal of the demo is to find the levels of the reactants needed to maximize the reaction rate.
There are two parts to the demo:
• Part 1: Compare data gathered through trial and error with data from a designed experiment.
• Part 2: Compare response surface (polynomial) modeling with nonlinear modeling.
Part 1
Begin the demo by using the sliders in the Reaction Simulator window to control the partial pressures of three reactants: Hydrogen, n-Pentane, and Isopentane. Each time you click the Run button, the levels for the reactants and results of the run are entered in the Trial and Error Data window.
Based on the results of previous runs, you can change the levels of the reactants to increase the reaction rate. (The results are determined using an underlying model that takes into account the noise in the process, so even if you keep all of the levels the same, the results will vary from run to run.) You are allotted a budget of 13 runs. When you have completed the runs, you can use the Plot menu on the Trial and Error Data window to plot the relationships between the reactants and the reaction rate, or click the Analyze button. When you click Analyze, rsmdemo calls the rstool function, which you can then use to try to optimize the results.
Next, perform another set of 13 runs, this time from a designed experiment. In the Experimental Design Data window, click the Do Experiment button. rsmdemo calls the cordexch function to generate a D-optimal design, and then, for each run, computes the reaction rate.
Now use the Plot menu on the Experimental Design Data window to plot the relationships between the levels of the reactants and the reaction rate, or click the Response Surface button to call rstool to find the optimal levels of the reactants.
Compare the analysis results for the two sets of data. It is likely (though not certain) that you'll find some or all of these differences:
• You can fit a full quadratic model with the data from the designed experiment, but the trial and error data may be insufficient for fitting a quadratic model or interactions model.
• Using the data from the designed experiment, you are more likely to be able to find levels for the reactants that result in the maximum reaction rate. Even if you find the best settings using the trial and error data, the confidence bounds are likely to be wider than those from the designed experiment.
Part 2
Now analyze the experimental design data with a polynomial model and a nonlinear model, and compare the results. The true model for the process, which is used to generate the data, is actually a nonlinear model. However, within the range of the data, a quadratic model approximates the true model quite well.
To see the polynomial model, click the Response Surface button on the Experimental Design Data window. rsmdemo calls rstool, which fits a full quadratic model to the data. Drag the reference lines to change the levels of the reactants, and find the optimal reaction rate. Observe the width of the confidence intervals.
Now click the Nonlinear Model button on the Experimental Design Data window. rsmdemo calls nlintool, which fits a Hougen-Watson model to the data. As with the quadratic model, you can drag the reference lines to change the reactant levels. Observe the reaction rate and the confidence intervals.
Compare the analysis results for the two models. Even though the true model is nonlinear, you may find that the polynomial model provides a good fit. Because polynomial models are much easier to fit and work with than nonlinear models, a polynomial model is often preferable even when modeling a nonlinear process. Keep in mind, however, that such models are unlikely to be reliable for extrapolating outside the range of the data.
The glmdemo Demo
The glmdemo function presents a simple slide show describing generalized
linear models. It presents examples of what functions and distributions are
available with generalized linear models. It presents an example where
traditional linear least squares fitting is not appropriate, and shows how to use
the glmfit function to fit a logistic regression model and the glmval function
to compute predictions from that model.
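As a rough sketch of the workflow that the slide show describes, the following fits a logistic regression with glmfit and computes predictions with glmval. The data values are illustrative assumptions and are not taken from glmdemo. The sketch also assumes that the third argument of glmfit names the distribution and that the third argument of glmval names the link function.

```matlab
% Illustrative data (assumed, not from glmdemo): at each predictor value x,
% n items were tested and y of them showed the response of interest.
x = [2100 2300 2500 2700 2900 3100 3300 3500 3700 3900 4100 4300]';
n = [48 42 31 34 31 21 23 23 21 16 17 21]';
y = [1 2 0 3 8 8 14 17 19 15 17 21]';

% Fit a logistic regression. For a binomial response, the response is
% supplied as a two-column matrix [successes trials].
b = glmfit(x, [y n], 'binomial');

% Predicted proportions at the observed x values, using the logit link.
phat = glmval(b, x, 'logit');

% Compare the observed proportions with the fitted curve.
plot(x, y./n, 'o', x, phat, '-')
```

A straight-line least squares fit to y./n could produce fitted proportions outside the interval from 0 to 1, which is one reason a logistic model is used here instead.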
The robustdemo Demo
The robustdemo function presents a simple comparison of least squares and
robust fits for a response and a single predictor. You can use robustdemo with
your own data or with the sample data provided.

To begin using robustdemo with the built-in sample data, simply type the
function name.
robustdemo

The resulting figure presents a scatter plot with two fitted lines. One line is the
fit from an ordinary least squares regression. The other is from a robust
regression. Along the bottom of the figure are the equations for the fitted line
and the estimated error standard deviation for each fit.

The effect of any point on the least squares fit depends on the residual and
leverage for that point. The residual is simply the vertical distance from the
point to the line. The leverage is a measure of how far the point is from the
center of the X data.
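The residual and leverage described above can be computed directly for a straight-line fit. The sketch below uses assumed illustrative data, not the robustdemo sample data, and uses the standard textbook leverage formula for a line fit. Whether robustdemo computes leverage in exactly this way is an assumption.

```matlab
% Illustrative data (assumed): the last point lies far from the other x values.
x = [1 2 3 4 5 6 7 8 9 20]';
y = [1.1 2.3 2.9 4.2 5.1 5.8 7.2 8.1 8.8 12.0]';
n = length(x);

% Ordinary least squares fit of y = b(1) + b(2)*x.
X = [ones(n,1) x];
b = X \ y;

% Residual: vertical distance from each point to the fitted line.
r = y - X*b;

% Leverage for a line fit: grows with the squared distance from mean(x).
h = 1/n + (x - mean(x)).^2 / sum((x - mean(x)).^2);

% Show x, residual, and leverage side by side.
[x r h]
```

The leverage values sum to 2, the number of fitted coefficients, and the point at x = 20 has the largest leverage because it is farthest from the center of the x data.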
The effect of any point on the robust fit also depends on the weight assigned to
the point. Points far from the line get lower weight.

You can use the right mouse button to click on any point and see its least
squares leverage and robust weight.
In this example, the rightmost point has a leverage value of 0.35. It is also far
from the line, so it exerts a large influence on the least squares fit. It has a
small weight, though, so it is effectively excluded from the robust fit.

Using the left mouse button, you can experiment to see how changes in the data
affect the two fits. Select any point, and drag it to a new location while holding
the left button down. When you release the point, both fits update.

Bringing the rightmost point closer to the line makes the two fitted lines nearly
identical. Now, the point has nearly full weight in the robust fit.
Selected Bibliography
Atkinson, A.C., and A.N. Donev. Optimum Experimental Designs. Oxford
Science Publications, 1992.
Bates, D., and D. Watts. Nonlinear Regression Analysis and Its Applications.
John Wiley and Sons, 1988. pp. 271-272.
Bernoulli, J. Ars Conjectandi. Basiliea: Thurnisius [11.19], 1713.
Box, G.E.P., W.G. Hunter, and J.S. Hunter. Statistics for Experimenters. Wiley,
New York, 1978.
Chatterjee, S., and A.S. Hadi. Influential Observations, High Leverage Points,
and Outliers in Linear Regression. Statistical Science, 1986. pp. 379-416.
Dobson, A.J. An Introduction to Generalized Linear Models. CRC Press, 1990.
Efron, B., and R.J. Tibshirani. An Introduction to the Bootstrap. Chapman and
Hall, New York, 1993.
Evans, M., N. Hastings, and B. Peacock. Statistical Distributions, Second
Edition. John Wiley and Sons, 1993.
Hald, A. Statistical Theory with Engineering Applications. John Wiley and
Sons, 1960. p. 647.
Hogg, R.V., and J. Ledolter. Engineering Statistics. MacMillan Publishing
Company, 1987.
Johnson, N., and S. Kotz. Distributions in Statistics: Continuous Univariate
Distributions. John Wiley and Sons, 1970.
McCullagh, P., and J.A. Nelder. Generalized Linear Models, 2nd edition.
Chapman and Hall, 1990.
Moore, J. Total Biochemical Oxygen Demand of Dairy Manures. Ph.D. thesis.
University of Minnesota, Department of Agricultural Engineering, 1975.
Poisson, S.D. Recherches sur la Probabilité des Jugements en Matière
Criminelle et en Matière Civile, Précédées des Règles Générales du Calcul des
Probabilités. Paris: Bachelier, Imprimeur-Libraire pour les Mathématiques,
1837.
Student. On the Probable Error of the Mean. Biometrika, 6:1908. pp. 1-25.
Weibull, W. A Statistical Theory of the Strength of Materials. Ingeniors
Vetenskaps Akademiens Handlingar, Royal Swedish Institute for Engineering
Research. Stockholm, Sweden, No. 153. 1939.

2 Reference
This chapter contains detailed descriptions of all the Statistics Toolbox
functions. It is divided into two sections:
- Function Category List: a list of functions, grouped by subject area
- Function descriptions in alphabetical order
Function Category List
The Statistics Toolbox provides several categories of functions.

The Statistics Toolbox's Main Categories of Functions

Category                        Description
Probability Distributions       Parameter Estimation
                                Cumulative Distribution Functions (cdf)
                                Probability Density Functions (pdf)
                                Inverse Cumulative Distribution Functions
                                Random Number Generators
                                Moments of Distribution Functions
Descriptive Statistics          Descriptive statistics for data samples
Statistical Plotting            Statistical plots
Statistical Process Control     Statistical Process Control
Cluster Analysis                Grouping items with similar characteristics
                                into clusters
Linear Models                   Fitting linear models to data
Nonlinear Regression            Fitting nonlinear regression models
Design of Experiments           Design of Experiments
Principal Components Analysis   Principal Components Analysis
Hypothesis Tests                Statistical tests of hypotheses
File I/O                        Reading data from and writing data to
                                operating-system files
Demonstrations                  Demonstrations
Data                            Data for examples
The following tables list the functions in each of these specific areas. The first
six tables contain probability distribution functions. The remaining tables
describe the other categories of functions.
Parameter Estimation
betafit       Parameter estimation for the beta distribution
betalike      Beta log-likelihood function
binofit       Parameter estimation for the binomial distribution
expfit        Parameter estimation for the exponential distribution
gamfit        Parameter estimation for the gamma distribution
gamlike       Gamma log-likelihood function
mle           Maximum likelihood estimation
normlike      Normal log-likelihood function
normfit       Parameter estimation for the normal distribution
poissfit      Parameter estimation for the Poisson distribution
unifit        Parameter estimation for the uniform distribution

Cumulative Distribution Functions (cdf)
betacdf       Beta cdf
binocdf       Binomial cdf
cdf           Parameterized cdf routine
chi2cdf       Chi-square cdf
expcdf        Exponential cdf
fcdf          F cdf
gamcdf        Gamma cdf
geocdf        Geometric cdf
hygecdf       Hypergeometric cdf
logncdf       Lognormal cdf
nbincdf       Negative binomial cdf
ncfcdf        Noncentral F cdf
nctcdf        Noncentral t cdf
ncx2cdf       Noncentral Chi-square cdf
normcdf       Normal (Gaussian) cdf
poisscdf      Poisson cdf
raylcdf       Rayleigh cdf
tcdf          Student's t cdf
unidcdf       Discrete uniform cdf
unifcdf       Continuous uniform cdf
weibcdf       Weibull cdf

Probability Density Functions (pdf)
betapdf       Beta pdf
binopdf       Binomial pdf
chi2pdf       Chi-square pdf
exppdf        Exponential pdf
fpdf          F pdf
gampdf        Gamma pdf
geopdf        Geometric pdf
hygepdf       Hypergeometric pdf
normpdf       Normal (Gaussian) pdf
lognpdf       Lognormal pdf
nbinpdf       Negative binomial pdf
ncfpdf        Noncentral F pdf
nctpdf        Noncentral t pdf
ncx2pdf       Noncentral Chi-square pdf
pdf           Parameterized pdf routine
poisspdf      Poisson pdf
raylpdf       Rayleigh pdf
tpdf          Student's t pdf
unidpdf       Discrete uniform pdf
unifpdf       Continuous uniform pdf
weibpdf       Weibull pdf

Inverse Cumulative Distribution Functions
betainv       Beta critical values
binoinv       Binomial critical values
chi2inv       Chi-square critical values
expinv        Exponential critical values
finv          F critical values
gaminv        Gamma critical values
geoinv        Geometric critical values
hygeinv       Hypergeometric critical values
logninv       Lognormal critical values
nbininv       Negative binomial critical values
ncfinv        Noncentral F critical values
nctinv        Noncentral t critical values
ncx2inv       Noncentral Chi-square critical values
icdf          Parameterized inverse distribution routine
norminv       Normal (Gaussian) critical values
poissinv      Poisson critical values
raylinv       Rayleigh critical values
tinv          Student's t critical values
unidinv       Discrete uniform critical values
unifinv       Continuous uniform critical values
weibinv       Weibull critical values

Random Number Generators
betarnd       Beta random numbers
binornd       Binomial random numbers
chi2rnd       Chi-square random numbers
exprnd        Exponential random numbers
frnd          F random numbers
gamrnd        Gamma random numbers
geornd        Geometric random numbers
hygernd       Hypergeometric random numbers
lognrnd       Lognormal random numbers
nbinrnd       Negative binomial random numbers
ncfrnd        Noncentral F random numbers
nctrnd        Noncentral t random numbers
ncx2rnd       Noncentral Chi-square random numbers
normrnd       Normal (Gaussian) random numbers
poissrnd      Poisson random numbers
raylrnd       Rayleigh random numbers
random        Parameterized random number routine
trnd          Student's t random numbers
unidrnd       Discrete uniform random numbers
unifrnd       Continuous uniform random numbers
weibrnd       Weibull random numbers

Moments of Distribution Functions
betastat      Beta mean and variance
binostat      Binomial mean and variance
chi2stat      Chi-square mean and variance
expstat       Exponential mean and variance
fstat         F mean and variance
gamstat       Gamma mean and variance
geostat       Geometric mean and variance
hygestat      Hypergeometric mean and variance
lognstat      Lognormal mean and variance
nbinstat      Negative binomial mean and variance
ncfstat       Noncentral F mean and variance
nctstat       Noncentral t mean and variance
ncx2stat      Noncentral Chi-square mean and variance
normstat      Normal (Gaussian) mean and variance
poisstat      Poisson mean and variance
raylstat      Rayleigh mean and variance
tstat         Student's t mean and variance
unidstat      Discrete uniform mean and variance
unifstat      Continuous uniform mean and variance
weibstat      Weibull mean and variance
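
The probability distribution tables follow a naming pattern: a short distribution prefix plus a suffix (pdf, cdf, inv, rnd, stat). The sketch below illustrates the pattern for the normal distribution. The parameter values are arbitrary assumptions chosen for illustration, and the argument order (value, mean, standard deviation) is assumed to be that of the norm functions.

```matlab
mu = 0; sigma = 2;              % illustrative parameter values (assumed)

d = normpdf(1, mu, sigma);      % density at x = 1
p = normcdf(1, mu, sigma);      % probability that X <= 1
x = norminv(p, mu, sigma);      % inverse cdf recovers x = 1
r = normrnd(mu, sigma, 3, 1);   % 3-by-1 vector of random numbers
[m, v] = normstat(mu, sigma);   % mean mu and variance sigma^2
```

The same five suffixes apply to the other prefixes in the tables, for example betapdf, betacdf, betainv, betarnd, and betastat.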

Descriptive Statistics
corrcoef      Correlation coefficients (in MATLAB)
cov           Covariance matrix (in MATLAB)
geomean       Geometric mean
harmmean      Harmonic mean
iqr           Interquartile range
kurtosis      Sample kurtosis
mad           Mean absolute deviation
mean          Arithmetic average (in MATLAB)
median        50th percentile (in MATLAB)
moment        Central moments of all orders
nanmax        Maximum ignoring missing data
nanmean       Average ignoring missing data
nanmedian     Median ignoring missing data
nanmin        Minimum ignoring missing data
nanstd        Standard deviation ignoring missing data
nansum        Sum ignoring missing data
prctile       Empirical percentiles of a sample
range         Sample range
skewness      Sample skewness
std           Standard deviation (in MATLAB)
trimmean      Trimmed mean
var           Variance

Statistical Plotting
boxplot       Box plots
errorbar      Error bar plot
fsurfht       Interactive contour plot of a function
gline         Interactive line drawing
gname         Interactive point labeling
lsline        Add least-squares fit line to plotted data
normplot      Normal probability plots
pareto        Pareto charts
qqplot        Quantile-Quantile plots
rcoplot       Regression case order plot
refcurve      Reference polynomial
refline       Reference line
surfht        Interactive interpolating contour plot
weibplot      Weibull plotting

Statistical Process Control
capable       Quality capability indices
capaplot      Plot of process capability
ewmaplot      Exponentially weighted moving average plot
histfit       Histogram and normal density curve
normspec      Plot normal density between limits
schart        Time plot of standard deviation
xbarplot      Time plot of means

Cluster Analysis
cluster       Create clusters from linkage output
clusterdata   Create clusters from a dataset
cophenet      Calculate the cophenetic correlation coefficient
dendrogram    Plot a hierarchical tree in a dendrogram graph
inconsistent  Calculate the inconsistency values of objects in a cluster
              hierarchy tree
linkage       Link objects in a dataset into a hierarchical tree of
              binary clusters
pdist         Calculate the pairwise distance between objects in a
              dataset
squareform    Reformat output of pdist function from vector to square
              matrix
zscore        Normalize a dataset before calculating the distance
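
Several of the cluster analysis functions are meant to be chained: pdist produces distances, squareform reshapes them, and linkage builds the tree. The following is a small sketch of that chain on assumed illustrative data, using the default distance and linkage method of each function.

```matlab
% Three illustrative points in the plane (assumed data), one per row.
X = [0 0; 3 4; 6 8];

% Pairwise Euclidean distances in the order (1,2), (1,3), (2,3).
Y = pdist(X);

% The same distances arranged as a symmetric 3-by-3 matrix.
D = squareform(Y);

% Hierarchical tree: each row of Z records one merge of two clusters.
Z = linkage(Y);
```

For these points Y is [5 10 5], so D has zeros on its diagonal and the value 10 in positions (1,3) and (3,1). The tree Z can then be passed to cluster or dendrogram.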

Linear Models
anova1        One-way Analysis of Variance (ANOVA)
anova2        Two-way Analysis of Variance
lscov         Regression given a covariance matrix (in MATLAB)
polyconf      Polynomial prediction with confidence intervals
polyfit       Polynomial fitting (in MATLAB)
polyval       Polynomial prediction (in MATLAB)
regress       Multiple linear regression
ridge         Ridge regression
rstool        Response surface tool
stepwise      Stepwise regression GUI

Nonlinear Regression
nlinfit       Nonlinear least-squares fitting
nlintool      Prediction graph for nonlinear fits
nlparci       Confidence intervals on parameters
nlpredci      Confidence intervals for prediction
nnls          Nonnegative least squares (in MATLAB)

Design of Experiments
cordexch      D-optimal design using coordinate exchange
daugment      D-optimal augmentation of designs
dcovary       D-optimal design with fixed covariates
ff2n          Two-level full factorial designs
fullfact      Mixed level full factorial designs
hadamard      Hadamard designs (in MATLAB)
rowexch       D-optimal design using row exchange

Principal Components Analysis
barttest      Bartlett's test
pcacov        PCA from covariance matrix
pcares        Residuals from PCA
princomp      PCA from raw data matrix

Hypothesis Tests
ranksum       Wilcoxon rank sum test
signrank      Wilcoxon signed rank test
signtest      Sign test for paired samples
ttest         One sample t-test
ttest2        Two sample t-test
ztest         Z-test

File I/O
caseread      Read casenames from a file
casewrite     Write casenames from a string matrix to a file
tblread       Retrieve tabular data from the file system
tblwrite      Write data in tabular form to the file system

Demonstrations
disttool      Interactive exploration of distribution functions
randtool      Interactive random number generation
polytool      Interactive fitting of polynomial models
rsmdemo       Interactive process experimentation and analysis

Data
census.mat    U.S. Population 1790 to 1980
cities.mat    Names of U.S. metropolitan areas
discrim.mat   Classification data
gas.mat       Gasoline prices
hald.mat      Hald data
hogg.mat      Bacteria counts from milk shipments
lawdata.mat   GPA versus LSAT for 15 law schools
mileage.mat   Mileage data for three car models from two factories
moore.mat     Five factor one response regression data
parts.mat     Dimensional runout on 36 circular parts
popcorn.mat   Data for popcorn example (anova2, friedman)
polydata.mat  Data for polytool demo
reaction.mat  Reaction kinetics data
sat.dat       ASCII data for tblread example

anova1
Purpose One-way Analysis of Variance (ANOVA).
Syntax p = anova1(X)
p = anova1(X,group)
p = anova1(X,group,'displayopt')
[p,table] = anova1(...)
[p,table,stats] = anova1(...)
Description p = anova1(X) performs a balanced one-way ANOVA for comparing the
means of two or more columns of data in the m-by-n matrix X, where each
column represents an independent sample containing m mutually independent
observations. The function returns the p-value for the null hypothesis that all
samples in X are drawn from the same population (or from different
populations with the same mean).

If the p-value is near zero, this casts doubt on the null hypothesis and suggests
that at least one sample mean is significantly different from the other sample
means. The choice of a critical p-value to determine whether the result is
judged statistically significant is left to the researcher. It is common to
declare a result significant if the p-value is less than 0.05 or 0.01.

The anova1 function displays two figures. The first figure is the standard
ANOVA table, which divides the variability of the data in X into two parts:
- Variability due to the differences among the column means (variability
  between groups)
- Variability due to the differences between the data in each column and the
  column mean (variability within groups)

The ANOVA table has six columns:
- The first shows the source of the variability.
- The second shows the Sum of Squares (SS) due to each source.
- The third shows the degrees of freedom (df) associated with each source.
- The fourth shows the Mean Squares (MS) for each source, which is the ratio
  SS/df.
- The fifth shows the F statistic, which is the ratio of the MSs.
- The sixth shows the p-value, which is derived from the cdf of F. As F
  increases, the p-value decreases.
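
The table entries described above can be reproduced by hand for a small balanced data set. The sketch below is a worked illustration on assumed data, not the internal implementation of anova1. It uses the standard one-way ANOVA formulas and the fcdf function from the category list for the p-value.

```matlab
% Illustrative balanced data (assumed): 4 observations in each of 3 columns.
X = [1 4 7;
     2 5 9;
     3 6 8;
     2 5 8];
[m, n] = size(X);

colMeans  = mean(X);            % one mean per column (group)
grandMean = mean(X(:));         % mean of all observations

% Sum of squares between groups and within groups.
ssBetween = m * sum((colMeans - grandMean).^2);
ssWithin  = sum(sum((X - repmat(colMeans, m, 1)).^2));

% Degrees of freedom for each source.
dfBetween = n - 1;
dfWithin  = n*(m - 1);

% Mean squares are SS/df; F is the ratio of the mean squares.
msBetween = ssBetween / dfBetween;
msWithin  = ssWithin / dfWithin;
F = msBetween / msWithin;

% The p-value is the upper tail probability of the F distribution.
p = 1 - fcdf(F, dfBetween, dfWithin);
```

For this data the column means are 2, 5, and 8, so ssBetween is 72, ssWithin is 6, and F is 54. The two sums of squares add up to the total sum of squares about the grand mean, 78.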
The second figure displays box plots of each column of X. Large differences in
the center lines of the box plots correspond to large values of F and
correspondingly small p-values.

p = anova1(X,group) uses the values in group (a character array or cell
array) as labels for the box plot of the samples in X, when X is a matrix. Each
row of group contains the label for the data in the corresponding column of X,
so group must have length equal to the number of columns in X.

When X is a vector, anova1 performs a one-way ANOVA on the samples
contained in X, as indexed by input group (a vector, character array, or cell
array). Each element in group identifies the group (i.e., sample) to which the
corresponding element in vector X belongs, so group must have the same length
as X. The labels contained in group are also used to annotate the box plot. The
vector-input form of anova1 does not require equal numbers of observations in
each sample, so it is appropriate for unbalanced data.

It is not necessary to label samples sequentially (1, 2, 3, ...). For example, if X
contains measurements taken at three different temperatures, -27, 65, and
110, you could use these numbers as the sample labels in group. If a row of
group contains an empty cell or empty string, that row and the corresponding
observation in X are disregarded. NaNs in either input are similarly ignored.

p = anova1(X,group,'displayopt') enables the ANOVA table and box plot
displays when 'displayopt' is 'on' (default) and suppresses the displays
when 'displayopt' is 'off'.

[p,table] = anova1(...) returns the ANOVA table (including column and
row labels) in cell array table. (You can copy a text version of the ANOVA table
to the clipboard by using the Copy Text item on the Edit menu.)

[p,table,stats] = anova1(...) returns a stats structure that you can use
to perform a follow-up multiple comparison test. The anova1 test evaluates the
hypothesis that the samples all have the same mean against the alternative
that the means are not all the same. Sometimes it is preferable to perform a
test to determine which pairs of means are significantly different, and which
are not. You can use the multcompare function to perform such tests by
supplying the stats structure as input.
Assumptions
The ANOVA test makes the following assumptions about the data in X:
- All sample populations are normally distributed.
- All sample populations have equal variance.
- All observations are mutually independent.
The ANOVA test is known to be robust to modest violations of the first two
assumptions.
Examples
Example 1
The five columns of X are the constants one through five plus a random normal
disturbance with mean zero and standard deviation one.
X = meshgrid(1:5)
X =
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
X = X + normrnd(0,1,5,5)
X =
2.1650 3.6961 1.5538 3.6400 4.9551
1.6268 2.0591 2.2988 3.8644 4.2011
1.0751 3.7971 4.2460 2.6507 4.2348
1.3516 2.2641 2.3610 2.7296 5.8617
0.3035 2.8717 3.5774 4.9846 4.9438
p = anova1(X)
p =
5.9952e-005
The very small p-value of 6e-5 indicates that differences between the column
means are highly significant. The probability of this outcome under the null
hypothesis (i.e., the probability that samples actually drawn from the same
population would have means differing by the amounts seen in X) is less than
6 in 100,000. The test therefore strongly supports the alternate hypothesis,
that one or more of the samples are drawn from populations with different
means.
Example 2
The following example comes from a study of the material strength of
structural beams in Hogg (1987). The vector strength measures the deflection
of a beam in thousandths of an inch under 3,000 pounds of force. Stronger
beams deflect less. The civil engineer performing the study wanted to
determine whether the strength of steel beams was equal to the strength of two
more expensive alloys. Steel is coded 'st' in the vector alloy. The other
materials are coded 'al1' and 'al2'.
[Figure: box plots for Example 1, one box per column of X; vertical axis
Values, horizontal axis Column Number]
strength = [82 86 79 83 84 85 86 87 74 82 78 75 76 77 79 ...
79 77 78 82 79];
alloy = {'st','st','st','st','st','st','st','st',...
'al1','al1','al1','al1','al1','al1',...
'al2','al2','al2','al2','al2','al2'};
Though alloy is sorted in this example, you do not need to sort the grouping
variable.
p = anova1(strength,alloy)
p =
1.5264e-004
The p-value indicates that the mean deflections of the three materials are
significantly different. The box plot confirms this graphically and shows that
the steel beams deflect more than the more expensive alloys.
[Figure: box plots of strength for the groups st, al1, and al2; vertical axis
Values]
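
As a possible follow-up to Example 2, the stats output can be passed to multcompare to see which pairs of materials differ. This is a sketch that assumes multcompare accepts the stats structure as its only required input. The output variable name tbl is a choice made here for illustration.

```matlab
% Beam deflection data and material codes from Example 2.
strength = [82 86 79 83 84 85 86 87 74 82 78 75 76 77 79 ...
            79 77 78 82 79];
alloy = {'st','st','st','st','st','st','st','st',...
         'al1','al1','al1','al1','al1','al1',...
         'al2','al2','al2','al2','al2','al2'};

% Request the stats structure; 'off' suppresses the table and box plot.
[p, tbl, stats] = anova1(strength, alloy, 'off');

% Each row of c describes the comparison of one pair of groups.
c = multcompare(stats)
```

With three groups there are three pairs to compare, so c has three rows. See the multcompare reference page for the meaning of each column.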
References Hogg, R. V., and J. Ledolter. Engineering Statistics. MacMillan Publishing
Company, 1987.
See Also anova2, anovan, boxplot, ttest

anova2
Purpose Two-way Analysis of Variance (ANOVA).
Syntax p = anova2(X,reps)
p = anova2(X,reps,'displayopt')
[p,table] = anova2(...)
[p,table,stats] = anova2(...)
Description anova2(X,reps) performs a balanced two-way ANOVA for comparing the
means of two or more columns and two or more rows of the observations in X.
The data in different columns represent changes in factor A. The data in
different rows represent changes in factor B. If there is more than one
observation for each combination of factors, input reps indicates the number of
replicates in each cell, which must be constant. (For unbalanced designs, use
anovan.)
The matrix below shows the format for a set-up where column factor A has two
levels, row factor B has three levels, and there are two replications (reps=2).
The subscripts indicate row, column, and replicate, respectively.

\[
\begin{array}{cc|l}
A=1 & A=2 & \\ \hline
x_{111} & x_{121} & B=1 \\
x_{112} & x_{122} & B=1 \\
x_{211} & x_{221} & B=2 \\
x_{212} & x_{222} & B=2 \\
x_{311} & x_{321} & B=3 \\
x_{312} & x_{322} & B=3
\end{array}
\]

When reps is 1 (default), anova2 returns two p-values in vector p:
1. The p-value for the null hypothesis, H0A, that all samples from factor A
   (i.e., all column-samples in X) are drawn from the same population
2. The p-value for the null hypothesis, H0B, that all samples from factor B
   (i.e., all row-samples in X) are drawn from the same population
When reps is greater than 1, anova2 returns a third p-value in vector p:
3. The p-value for the null hypothesis, H0AB, that the effects due to factors
   A and B are additive (i.e., that there is no interaction between factors
   A and B)
If any p-value is near zero, this casts doubt on the associated null hypothesis.
A sufficiently small p-value for H0A suggests that at least one column-sample
mean is significantly different from the other column-sample means; i.e., there
is a main effect due to factor A. A sufficiently small p-value for H0B suggests
that at least one row-sample mean is significantly different from the other
row-sample means; i.e., there is a main effect due to factor B. A sufficiently
small p-value for H0AB suggests that there is an interaction between factors A
and B. The choice of a limit for the p-value to determine whether a result is
statistically significant is left to the researcher. It is common to declare a
result significant if the p-value is less than 0.05 or 0.01.
anova2 also displays a figure showing the standard ANOVA table, which
divides the variability of the data in X into three or four parts depending on the
value of reps:
- The variability due to the differences among the column means
- The variability due to the differences among the row means
- The variability due to the interaction between rows and columns (if reps is
  greater than its default value of one)
- The remaining variability not explained by any systematic source

The ANOVA table has five columns:
- The first shows the source of the variability.
- The second shows the Sum of Squares (SS) due to each source.
- The third shows the degrees of freedom (df) associated with each source.
- The fourth shows the Mean Squares (MS), which is the ratio SS/df.
- The fifth shows the F statistics, which are the ratios of the mean squares.

p = anova2(X,reps,'displayopt') enables the ANOVA table display when
'displayopt' is 'on' (default) and suppresses the display when 'displayopt'
is 'off'.
[p,table] = anova2(...) returns the ANOVA table (including column and
row labels) in cell array table. (You can copy a text version of the ANOVA table
to the clipboard by using the Copy Text item on the Edit menu.)
[p,table,stats] = anova2(...) returns a stats structure that you can use
to perform a follow-up multiple comparison test.

The anova2 test evaluates the hypothesis that the row, column, and interaction
effects are all the same, against the alternative that they are not all the same.
Sometimes it is preferable to perform a test to determine which pairs of effects
are significantly different, and which are not. You can use the multcompare
function to perform such tests by supplying the stats structure as input.
Examples The data below come from a study of popcorn brands and popper type (Hogg
1987). The columns of the matrix popcorn are brands (Gourmet, National, and
Generic). The rows are popper type (Oil and Air). The study popped a batch of
each brand three times with each popper. The values are the yield in cups of
popped popcorn.
load popcorn
popcorn
popcorn =
5.5000 4.5000 3.5000
5.5000 4.5000 4.0000
6.0000 4.0000 3.0000
6.5000 5.0000 4.0000
7.0000 5.5000 5.0000
7.0000 5.0000 4.5000
p = anova2(popcorn,3)
p =
0.0000 0.0001 0.7462
The vector p shows the p-values for the three brands of popcorn, 0.0000, the
two popper types, 0.0001, and the interaction between brand and popper
type, 0.7462. These values indicate that both popcorn brand and popper type
affect the yield of popcorn, but there is no evidence of a synergistic (interaction)
effect of the two.

The conclusion is that you can get the greatest yield using the Gourmet brand
and an Air popper (the three values popcorn(4:6,1)).
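
The conclusion can be checked against simple averages of the data. The sketch below retypes the popcorn matrix shown above so that it runs without loading popcorn.mat. The variable names for the averages are choices made here for illustration.

```matlab
% The popcorn data from the example above: columns are brands
% (Gourmet, National, Generic); rows 1:3 are Oil, rows 4:6 are Air.
popcorn = [5.5 4.5 3.5;
           5.5 4.5 4.0;
           6.0 4.0 3.0;
           6.5 5.0 4.0;
           7.0 5.5 5.0;
           7.0 5.0 4.5];

brandMeans = mean(popcorn)                 % 6.25  4.75  4.00
oilMean    = mean(mean(popcorn(1:3,:)))    % 4.5
airMean    = mean(mean(popcorn(4:6,:)))    % 5.5

% Mean yield for each popper and brand combination (rows: Oil, Air).
cellMeans = [mean(popcorn(1:3,:)); mean(popcorn(4:6,:))]
```

The largest entry of cellMeans is in row 2, column 1 (Air popper, Gourmet brand), which agrees with the conclusion above.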
Reference Hogg, R. V., and J. Ledolter. Engineering Statistics. MacMillan Publishing
Company, 1987.
See Also anova1, anovan

anovan
Purpose N-way Analysis of Variance (ANOVA).
Syntax p = anovan(X,group)
p = anovan(X,group,'model')
p = anovan(X,group,'model',sstype)
p = anovan(X,group,'model',sstype,gnames)
p = anovan(X,group,'model',sstype,gnames,'displayopt')
[p,table] = anovan(...)
[p,table,stats] = anovan(...)
[p,table,stats,terms] = anovan(...)
Description p = anovan(X,group) performs a balanced or unbalanced multi-way ANOVA
for comparing the means of the observations in vector X with respect to N
different factors. The factors and factor levels of the observations in X are
assigned by the cell array group. Each of the N cells in group contains a list of
factor levels identifying the observations in X with respect to one of the N
factors. The list within each cell can be a vector, character array, or cell array
of strings, and must have the same number of elements as X.

As an example, consider the X and group inputs below.
X = [x1 x2 x3 x4 x5 x6 x7 x8];
group = {[1 2 1 2 1 2 1 2];...
['hi';'hi';'lo';'lo';'hi';'hi';'lo';'lo'];...
{'may' 'may' 'may' 'may' 'june' 'june' 'june' 'june'}};
In this case, anovan(X,group) is a three-way ANOVA with two levels of each
factor. Every observation in X is identified by a combination of factor levels in
group. If the factors are A, B, and C, then observation x1 is associated with:
- Level 1 of factor A
- Level 'hi' of factor B
- Level 'may' of factor C
Similarly, observation x6 is associated with:
- Level 2 of factor A
- Level 'hi' of factor B
- Level 'june' of factor C
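
A runnable version of this set-up needs numeric values in place of x1 through x8. The sketch below supplies assumed illustrative responses and keeps the group definition from the text unchanged.

```matlab
% Illustrative responses (assumed values) for the eight observations.
X = [10.2 12.1 9.8 11.5 13.0 14.9 12.2 14.1];

% Factor levels for factors A, B, and C, as in the text above.
group = {[1 2 1 2 1 2 1 2];...
         ['hi';'hi';'lo';'lo';'hi';'hi';'lo';'lo'];...
         {'may' 'may' 'may' 'may' 'june' 'june' 'june' 'june'}};

% Default model: main effects of the three factors only.
p = anovan(X, group)
```

The output p has one element per factor, in the order A, B, C.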
Output vector p contains p-values for the null hypotheses on the N main
effects. Element p(1) contains the p-value for the null hypothesis, H0A, that
samples at all levels of factor A are drawn from the same population,
element p(2) contains the p-value for the null hypothesis, H0B, that samples
at all levels of factor B are drawn from the same population, and so on.

If any p-value is near zero, this casts doubt on the associated null hypothesis.
For example, a sufficiently small p-value for H0A suggests that at least one
A-sample mean is significantly different from the other A-sample means;
i.e., there is a main effect due to factor A. The choice of a limit for the p-value
to determine whether a result is statistically significant is left to the
researcher. It is common to declare a result significant if the p-value is less
than 0.05 or 0.01.
anovan also displays a figure showing the standard ANOVA table, which by
default divides the variability of the data in X into:
- The variability due to differences between the levels of each factor accounted
  for in the model (one row for each factor)
- The remaining variability not explained by any systematic source

The ANOVA table has six columns:
- The first shows the source of the variability.
- The second shows the Sum of Squares (SS) due to each source.
- The third shows the degrees of freedom (df) associated with each source.
- The fourth shows the Mean Squares (MS), which is the ratio SS/df.
- The fifth shows the F statistics, which are the ratios of the mean squares.
- The sixth shows the p-values for the F statistics.

p = anovan(X,group,'model') performs the ANOVA using the model
specified by 'model', where 'model' can be 'linear', 'interaction', 'full',
or an integer or vector. The default 'linear' model computes only the p-values
for the null hypotheses on the N main effects. The 'interaction' model
computes the p-values for null hypotheses on the N main effects and the
$\binom{N}{2}$ two-factor interactions. The 'full' model computes the p-values
for null hypotheses on the N main effects and interactions at all levels.
For an integer value of 'model', k (k ≤ N), anovan computes all interaction
levels through the kth level. The values k=1 and k=2 are equivalent to the
'linear' and 'interaction' specifications, respectively, while the value k=N
is equivalent to the 'full' specification.

For more precise control over the main and interaction terms that anovan
computes, 'model' can specify a vector containing one element for each main
or interaction term to include in the ANOVA model. Each vector element
encodes the corresponding ANOVA term as the decimal equivalent of an N-bit
number, where N is the number of factors. The table below illustrates the
coding for a 3-factor ANOVA.

For example, if 'model' is the vector [2 4 6], then output vector p contains
the p-values for the null hypotheses on the main effects B and C and the
interaction effect BC, in that order. A simple way to generate the 'model'
vector is to modify the terms output, which codes the terms in the current
model using the format described above. If anovan returned [2 4 6] for terms,
for example, and there was no significant result for interaction BC, you could
recompute the ANOVA on just the main effects B and C by specifying [2 4] for
'model'.

p = anovan(X,group,'model',sstype) computes the ANOVA using the type
of sum-of-squares specified by sstype, which can be 1, 2, or 3 to designate
Type 1, Type 2, or Type 3 sum-of-squares, respectively. The default is 3. The
value of sstype only influences computations on unbalanced data.
3-bit Code Decimal Value Corresponding ANOVA Terms
[0 0 1] 1 Mai n term A
[0 1 0] 2 Mai n term B
[1 0 0] 4 Mai n term C
[0 1 1] 3 I nteracti on term AB
[1 1 0] 6 I nteracti on term BC
[1 0 1] 5 I nteracti on term AC
[1 1 1] 7 I nteracti on term ABC
anovan
2-30
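The bit coding used for the 'model' and terms vectors can be illustrated with a short sketch. It uses only base MATLAB functions and does not call anovan; the variable names factors, model, and names are chosen here for illustration and are not part of the toolbox.

```matlab
% Decode anovan-style term codes for a 3-factor design (illustrative only).
factors = {'A','B','C'};      % bit 1 = A, bit 2 = B, bit 3 = C
model   = [2 4 6];            % example 'model' vector from the text
names   = cell(size(model));
for k = 1:numel(model)
    bits     = bitget(model(k), 1:numel(factors));  % least significant bit first
    names{k} = [factors{logical(bits)}];            % concatenate the selected letters
    fprintf('%d -> %s\n', model(k), names{k});
end
```

The codes 2, 4, and 6 decode to B, C, and BC, matching the table of 3-bit codes.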
The sum of squares for any term is determined by comparing two models. The
Type 1 sum of squares for a term is the reduction in residual sum of squares
obtained by adding that term to a fit that already includes the terms listed
before it. The Type 2 sum of squares is the reduction in residual sum of squares
obtained by adding that term to a model consisting of all other terms that do
not contain the term in question. The Type 3 sum of squares is the reduction in
residual sum of squares obtained by adding that term to a model containing all
other terms, but with their effects constrained to obey the usual "sigma
restrictions" that make models estimable.
Suppose we are fitting a model with two factors and their interaction, and that
the terms appear in the order A, B, AB. Let R() represent the residual sum of
squares for a model, so for example R(A,B,AB) is the residual sum of squares
fitting the whole model, R(A) is the residual sum of squares fitting just the
main effect of A, and R(1) is the residual sum of squares fitting just the mean.
The three types of sums of squares are as follows:
Term    Type 1 SS           Type 2 SS           Type 3 SS
A       R(1)-R(A)           R(B)-R(A,B)         R(B,AB)-R(A,B,AB)
B       R(A)-R(A,B)         R(A)-R(A,B)         R(A,AB)-R(A,B,AB)
AB      R(A,B)-R(A,B,AB)    R(A,B)-R(A,B,AB)    R(A,B)-R(A,B,AB)
The models for Type 3 sum of squares have sigma restrictions imposed. This
means, for example, that in fitting R(B,AB), the array of AB effects is
constrained to sum to 0 over A for each value of B, and over B for each value
of A.
p = anovan(X,group,'model',sstype,gnames) uses the string values in
character array gnames to label the N experimental factors in the ANOVA
table. The array can be a string matrix with one row per factor, or a cell
array of strings with one element per factor. When gnames is not
specified, the default labels 'X1', 'X2', 'X3', ..., 'XN' are used.
p = anovan(X,group,'model',sstype,gnames,'displayopt') enables the
ANOVA table display when 'displayopt' is 'on' (default) and suppresses the
display when 'displayopt' is 'off'.
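Returning to the three types of sums of squares, the following sketch computes each R() quantity by ordinary least squares for a small, hypothetical unbalanced data set with two two-level factors. It is an illustration of the definitions, not the computation anovan itself performs; the data values and the helper rss are assumptions made for this example. Effect (+1/-1) coding is used because, for two-level factors, it satisfies the sigma restrictions needed for the Type 3 column.

```matlab
% Hypothetical unbalanced data: two factors with two levels each.
y = [10 12 11 15 14 20 22 19 25]';
A = [1 1 1 1 1 2 2 2 2]';
B = [1 1 1 2 2 1 2 2 2]';

% Effect (+1/-1) coding of the main effects and their interaction.
one = ones(size(y));
a   = 2*(A == 2) - 1;
b   = 2*(B == 2) - 1;
ab  = a .* b;

% Residual sum of squares for a least-squares fit with design matrix X.
rss = @(X) sum((y - X*(X\y)).^2);

R1    = rss(one);            % R(1)
RA    = rss([one a]);        % R(A)
RB    = rss([one b]);        % R(B)
RAB   = rss([one a b]);      % R(A,B)
Rfull = rss([one a b ab]);   % R(A,B,AB)

% Rows: A, B, AB.  Columns: Type 1, Type 2, Type 3.
ss = [R1 - RA,     RB - RAB,    rss([one b ab]) - Rfull;
      RA - RAB,    RA - RAB,    rss([one a ab]) - Rfull;
      RAB - Rfull, RAB - Rfull, RAB - Rfull];
disp(ss)
```

The Type 1 column always sums to R(1)-R(A,B,AB), whereas the Type 2 and Type 3 columns need not do so when the design is unbalanced.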
[p,table] = anovan(...) returns the ANOVA table (including factor labels)
in cell array table. (You can copy a text version of the ANOVA table to the
clipboard by using the Copy Text item on the Edit menu.)
[p,table,stats] = anovan(...) returns a stats structure that you can use
to perform a follow-up multiple comparison test.
The anovan test evaluates the hypothesis that the different levels of a factor (or
more generally, a term) have the same effect, against the alternative that they
do not all have the same effect. Sometimes it is preferable to perform a test to
determine which pairs of levels are significantly different, and which are not.
You can use the multcompare function to perform such tests by supplying the
stats structure as input.
[p,table,stats,terms] = anovan(...) returns the main and interaction
terms used in the ANOVA computations. The terms are encoded in output
vector terms using the same format described above for input 'model'. When
'model' itself is specified in this vector format, the vector returned in terms is
identical.
Examples In the previous section we used anova2 to analyze the effects of two factors on
a response in a balanced design. For a design that is not balanced, we can use
anovan instead.
The dataset carbig contains a number of measurements on 406 cars. Let's
study how the mileage depends on where and when the cars were made.
load carbig
anovan(MPG,{org when},2,3,{'Origin';'Mfg date'})
ans =
0
0
0.30587
The p-value for the interaction term is not small, indicating little evidence that
the effect of the car's year of manufacture (when) depends on where the car was
made (org). The linear effects of those two factors, though, are significant.
Reference Hogg, R. V. and J. Ledolter. Engineering Statistics. MacMillan Publishing
Company, 1987.
See Also anova1, anova2, multcompare
aoctool
Purpose Interactive plot for fitting and predicting analysis of covariance models.
Syntax aoctool(x,y,g)
aoctool(x,y,g,alpha)
aoctool(x,y,g,alpha,xname,yname,gname)
aoctool(x,y,g,alpha,xname,yname,gname,'displayopt')
aoctool(x,y,g,alpha,xname,yname,gname,'displayopt','model')
h = aoctool(...)
[h,atab,ctab] = aoctool(...)
[h,atab,ctab,stats] = aoctool(...)
Description aoctool(x,y,g) fits a separate line to the column vectors, x and y, for each
group defined by the values in the array g. These types of models are known as
one-way analysis of covariance (ANOCOVA) models. The output consists of
three figures:
• An interactive graph of the data and prediction curves
• An ANOVA table
• A table of parameter estimates
You can use the figures to change models and to test different parts of the
model. More information about interactive use of the aoctool function appears
in "The aoctool Demo" on page 1-161.
aoctool(x,y,g,alpha) determines the confidence levels of the prediction
intervals. The confidence level is 100*(1-alpha)%. The default value of alpha
is 0.05.
aoctool(x,y,g,alpha,xname,yname,gname) specifies the name to use for the
x, y, and g variables in the graph and tables. If you enter simple variable names
for the x, y, and g arguments, the aoctool function uses those names. If you
enter an expression for one of these arguments, you can specify a name to use
in place of that expression by supplying these arguments. For example, if you
enter m(:,2) as the x argument, you might choose to enter 'Col 2' as the
xname argument.
aoctool(x,y,g,alpha,xname,yname,gname,'displayopt') enables the
graph and table displays when 'displayopt' is 'on' (default) and suppresses
those displays when 'displayopt' is 'off'.
aoctool(x,y,g,alpha,xname,yname,gname,'displayopt','model')
specifies the initial model to fit. The value of 'model' can be any of the
following:
'same mean'         fit a single mean, ignoring grouping
'separate means'    fit a separate mean to each group
'same line'         fit a single line, ignoring grouping
'parallel lines'    fit a separate line to each group, but constrain the lines
                    to be parallel
'separate lines'    fit a separate line to each group, with no constraints
h = aoctool(...) returns a vector of handles to the line objects in the plot.
[h,atab,ctab] = aoctool(...) returns cell arrays containing the entries in
the ANOVA table (atab) and the table of coefficient estimates (ctab). (You can copy
a text version of either table to the clipboard by using the Copy Text item on
the Edit menu.)
[h,atab,ctab,stats] = aoctool(...) returns a stats structure that you
can use to perform a follow-up multiple comparison test. The ANOVA table
output includes tests of the hypotheses that the slopes or intercepts are all the
same, against a general alternative that they are not all the same. Sometimes
it is preferable to perform a test to determine which pairs of values are
significantly different, and which are not. You can use the multcompare
function to perform such tests by supplying the stats structure as input. You
can test either the slopes, the intercepts, or population marginal means (the
heights of the curves at the mean x value).
Example This example illustrates how to fit different models non-interactively. First, we
load the smaller car dataset and fit a separate-slopes model, then examine the
coefficient estimates.
load carsmall
[h,a,c,s] = aoctool(Weight,MPG,Model_Year,0.05,...
'','','','off','separate lines');
c(:,1:2)
ans =
'Term' 'Estimate'
'Intercept' [45.97983716833132]
' 70' [-8.58050531454973]
' 76' [-3.89017396094922]
' 82' [12.47067927549897]
'Slope' [-0.00780212907455]
' 70' [ 0.00195840368824]
' 76' [ 0.00113831038418]
' 82' [-0.00309671407243]
Roughly speaking, the lines relating MPG to Weight have an intercept close to
45.98 and a slope close to -0.0078. Each group's coefficients are offset from
these values somewhat. For instance, the intercept for the cars made in 1970
is 45.98-8.58 = 37.40.
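The same arithmetic can be carried out for every group at once. The sketch below copies the estimates from the coefficient table above into ordinary variables (the variable names are chosen here for illustration) and adds each group's offset to the overall intercept and slope.

```matlab
% Values copied from the separate-lines coefficient table above.
intercept       = 45.97983716833132;
interceptOffset = [-8.58050531454973 -3.89017396094922 12.47067927549897];
slope           = -0.00780212907455;
slopeOffset     = [0.00195840368824 0.00113831038418 -0.00309671407243];

years          = [70 76 82];
groupIntercept = intercept + interceptOffset;   % one intercept per model year
groupSlope     = slope + slopeOffset;           % one slope per model year
for k = 1:numel(years)
    fprintf('Model year %d: MPG = %.2f + (%.5f)*Weight\n', ...
            years(k), groupIntercept(k), groupSlope(k));
end
```

Note that the offsets within each block of the table sum to zero (up to rounding), so the 'Intercept' and 'Slope' rows are the averages of the group values.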
Next, we try a fit using parallel lines. (If we had examined the ANOVA table,
we would have found that the parallel-lines fit is significantly worse than the
separate-lines fit.)
[h,a,c,s] = aoctool(Weight,MPG,Model_Year,0.05,...
'','','','off','parallel lines');
c(:,1:2)
ans =
'Term' 'Estimate'
'Intercept' [43.38984085130596]
' 70' [-3.27948192983761]
' 76' [-1.35036234809006]
' 82' [ 4.62984427792768]
'Slope' [-0.00664751826198]
Here we again have separate intercepts for each group, but this time the slopes
are constrained to be the same.
See Also anova1, multcompare, polytool
barttest
Purpose Bartlett's test for dimensionality.
Syntax ndim = barttest(x,alpha)
[ndim,prob,chisquare] = barttest(x,alpha)
Description ndim = barttest(x,alpha) returns the number of dimensions necessary to
explain the nonrandom variation in the data matrix x, using the significance
probability alpha. The dimension is determined by a series of hypothesis tests.
The test for ndim=1 tests the hypothesis that the variances of the data values
along each principal component are equal, the test for ndim=2 tests the
hypothesis that the variances along the second through last components are
equal, and so on.
[ndim,prob,chisquare] = barttest(x,alpha) returns the number of
dimensions, the significance values for the hypothesis tests, and the $\chi^2$
values associated with the tests.
Example x = mvnrnd([0 0],[1 0.99; 0.99 1],20);
x(:,3:4) = mvnrnd([0 0],[1 0.99; 0.99 1],20);
x(:,5:6) = mvnrnd([0 0],[1 0.99; 0.99 1],20);
[ndim, prob] = barttest(x,0.05)
ndim =
3
prob =
0
0
0
0.5081
0.6618
See Also princomp, pcacov, pcares
betacdf
Purpose Beta cumulative distribution function (cdf).
Syntax p = betacdf(X,A,B)
Description p = betacdf(X,A,B) computes the beta cdf at each of the values in X using the
corresponding parameters in A and B. Vector or matrix inputs for X, A, and B
must all have the same size. A scalar input is expanded to a constant matrix
with the same dimensions as the other inputs. The parameters in A and B must
all be positive, and the values in X must lie on the interval [0 1].
The beta cdf for a given value x and given pair of parameters a and b is
$$p = F(x|a,b) = \frac{1}{B(a,b)} \int_0^x t^{a-1} (1-t)^{b-1} \, dt$$
where B( ) is the Beta function. The result, p, is the probability that a single
observation from a beta distribution with parameters a and b will fall in the
interval [0 x].
Examples x = 0.1:0.2:0.9;
a = 2;
b = 2;
p = betacdf(x,a,b)
p =
0.0280 0.2160 0.5000 0.7840 0.9720
a = [1 2 3];
p = betacdf(0.5,a,a)
p =
0.5000 0.5000 0.5000
See Also betafit, betainv, betalike, betapdf, betarnd, betastat, cdf
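As a cross-check of the first betacdf example, the beta cdf can be evaluated without the toolbox. The sketch below assumes only base MATLAB: betainc is the regularized incomplete beta function, which coincides with the integral in the betacdf definition, and for a = b = 2 the integral also has a simple closed form because B(2,2) = 1/6. The names p_inc and p_closed are illustrative.

```matlab
x = 0.1:0.2:0.9;
a = 2;  b = 2;
p_inc    = betainc(x, a, b);     % regularized incomplete beta function
p_closed = 3*x.^2 - 2*x.^3;      % integral of 6*t*(1-t) from 0 to x
disp([p_inc; p_closed])
```

Both rows agree with the values 0.0280, 0.2160, 0.5000, 0.7840, and 0.9720 shown in the example.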
betafit
Purpose Parameter estimates and confidence intervals for beta distributed data.
Syntax phat = betafit(x)
[phat,pci] = betafit(x,alpha)
Description phat = betafit(x) computes the maximum likelihood estimates of the beta
distribution parameters a and b from the data in vector x, where the beta cdf
is given by
$$F(x|a,b) = \frac{1}{B(a,b)} \int_0^x t^{a-1} (1-t)^{b-1} \, dt$$
and B( ) is the Beta function. The elements of x must lie in the interval (0 1).
[phat,pci] = betafit(x,alpha) returns confidence intervals on the a and b
parameters in the 2-by-2 matrix pci. The first column of the matrix contains
the lower and upper confidence bounds for parameter a, and the second column
contains the confidence bounds for parameter b. The optional input argument
alpha is a value in the range [0 1] specifying the width of the confidence
intervals. By default, alpha is 0.05, which corresponds to 95% confidence
intervals.
Example This example generates 100 beta distributed observations. The true a and b
parameters are 4 and 3, respectively. Compare these to the values returned
in p. Note that the columns of ci both bracket the true parameters.
r = betarnd(4,3,100,1);
[p,ci] = betafit(r,0.01)
p =
3.9010 2.6193
ci =
2.5244 1.7488
5.2776 3.4898
Reference Hahn, Gerald J., and Samuel S. Shapiro. Statistical Models in Engineering.
John Wiley & Sons, New York. 1994. p. 95.
See Also betalike, mle
betainv
Purpose Inverse of the beta cumulative distribution function.
Syntax X = betainv(P,A,B)
Description X = betainv(P,A,B) computes the inverse of the beta cdf with parameters
specified by A and B for the corresponding probabilities in P. Vector or matrix
inputs for P, A, and B must all have the same size. A scalar input is expanded
to a constant matrix with the same dimensions as the other inputs. The
parameters in A and B must all be positive, and the values in P must lie on the
interval [0 1].
The inverse beta cdf for a given probability p and a given pair of parameters
a and b is
$$x = F^{-1}(p|a,b) = \{x : F(x|a,b) = p\}$$
where
$$p = F(x|a,b) = \frac{1}{B(a,b)} \int_0^x t^{a-1} (1-t)^{b-1} \, dt$$
and B( ) is the Beta function. Each element of output X is the value whose
cumulative probability under the beta cdf defined by the corresponding
parameters in A and B is specified by the corresponding value in P.
Algorithm The betainv function uses Newton's method with modifications to constrain
steps to the allowable range for x, i.e., [0 1].
Examples p = [0.01 0.5 0.99];
x = betainv(p,10,5)
x =
0.3726 0.6742 0.8981
According to this result, for a beta cdf with a=10 and b=5, a value less than or
equal to 0.3726 occurs with probability 0.01. Similarly, values less than or
equal to 0.6742 and 0.8981 occur with respective probabilities 0.5 and 0.99.
See Also betafit, icdf
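The idea described under Algorithm for betainv can be sketched in a few lines of base MATLAB. This is only an illustration of Newton's method with a simple safeguard that keeps each iterate inside (0,1); the starting point, the halving rule, and the stopping tolerance are assumptions made here and are not taken from the toolbox source.

```matlab
p = 0.5;  a = 10;  b = 5;
x = a/(a+b);                                  % start at the distribution mean
for k = 1:100
    F = betainc(x, a, b);                     % beta cdf at x
    f = x^(a-1) * (1-x)^(b-1) / beta(a, b);   % beta pdf at x (derivative of F)
    xnew = x - (F - p)/f;                     % Newton step
    if xnew <= 0                              % step left the interval: halve
        xnew = x/2;                           % the distance to the boundary
    elseif xnew >= 1
        xnew = (1 + x)/2;
    end
    converged = abs(xnew - x) < 1e-12;
    x = xnew;
    if converged, break; end
end
disp(x)
```

For p = 0.5, a = 10, and b = 5 the iteration settles near 0.6742, the middle value in the example.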
betalike
Purpose Negative beta log-likelihood function.
Syntax logL = betalike(params,data)
[logL,avar] = betalike(params,data)
Description logL = betalike(params,data) returns the negative of the beta
log-likelihood function for the beta parameters a and b specified in vector
params and the observations specified in column vector data. The length of
logL is the length of data.
[logL,avar] = betalike(params,data) also returns avar, which is the
asymptotic variance-covariance matrix of the parameter estimates if the
values in params are the maximum likelihood estimates. avar is the inverse of
Fisher's information matrix. The diagonal elements of avar are the asymptotic
variances of their respective parameters.
betalike is a utility function for maximum likelihood estimation of the beta
distribution. The likelihood assumes that all the elements in the data sample
are mutually independent. Since betalike returns the negative beta
log-likelihood function, minimizing betalike using fminsearch is the same as
maximizing the likelihood.
Example This example continues the betafit example where we calculated estimates of
the beta parameters for some randomly generated beta distributed data.
r = betarnd(4,3,100,1);
[logl,avar] = betalike([3.9010 2.6193],r)
logl =
-33.0514
avar =
0.2856 0.1528
0.1528 0.1142
See Also betafit, fminsearch, gamlike, mle, weiblike
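The relationship between betalike and fminsearch can be sketched without the toolbox by writing the negative log-likelihood directly from the beta pdf. The data vector below is hypothetical, and the anonymous function nll and the log-parameter trick (which keeps a and b positive during the search) are choices made for this illustration rather than the toolbox's own method.

```matlab
% Hypothetical observations in (0,1).
x = [0.42 0.55 0.61 0.48 0.70 0.35 0.66 0.58 0.52 0.74]';

% Negative log-likelihood of independent beta(a,b) observations, q = [a b].
nll = @(q) numel(x)*betaln(q(1), q(2)) ...
           - (q(1)-1)*sum(log(x)) - (q(2)-1)*sum(log(1-x));

% Minimize over log-parameters so that a and b stay positive.
qhat = exp(fminsearch(@(t) nll(exp(t)), [0 0]));
fprintf('a = %.3f, b = %.3f, negative log-likelihood = %.4f\n', ...
        qhat(1), qhat(2), nll(qhat));
```

Minimizing nll is the same as maximizing the likelihood, which is the role betalike plays inside maximum likelihood fitting.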
betapdf
Purpose Beta probability density function (pdf).
Syntax Y = betapdf(X,A,B)
Description Y = betapdf(X,A,B) computes the beta pdf at each of the values in X using the
corresponding parameters in A and B. Vector or matrix inputs for X, A, and B
must all have the same size. A scalar input is expanded to a constant matrix
with the same dimensions as the other inputs. The parameters in A and B must
all be positive, and the values in X must lie on the interval [0 1].
The beta probability density function for a given value x and given pair of
parameters a and b is
$$y = f(x|a,b) = \frac{1}{B(a,b)} x^{a-1} (1-x)^{b-1} I_{(0,1)}(x)$$
where B( ) is the Beta function. The result, y, is the probability density at x
for a single observation from a beta distribution with parameters a and b.
The indicator function $I_{(0,1)}(x)$ ensures that only values of x in the range (0 1)
have nonzero probability. The uniform distribution on (0 1) is a degenerate case
of the beta pdf where a = 1 and b = 1.
A likelihood function is the pdf viewed as a function of the parameters.
Maximum likelihood estimators (MLEs) are the values of the parameters that
maximize the likelihood function for a fixed value of x.
Examples a = [0.5 1; 2 4]
a =
0.5000 1.0000
2.0000 4.0000
y = betapdf(0.5,a,a)
y =
0.6366 1.0000
1.5000 2.1875
See Also betacdf, betafit, betainv, betalike, betarnd, betastat, pdf
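The betapdf example can be reproduced directly from the density formula using the base MATLAB beta function. This is a sketch for checking the formula at points inside (0 1); it omits the indicator function, so it should not be used for x outside that interval.

```matlab
a = [0.5 1; 2 4];
x = 0.5;
% Density formula applied elementwise, with b equal to a as in the example.
y = x.^(a-1) .* (1-x).^(a-1) ./ beta(a, a)
```

The result matches the matrix returned by betapdf(0.5,a,a) in the example, including the value 1 for the uniform case a = b = 1.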
betarnd
Purpose Random numbers from the beta distribution.
Syntax R = betarnd(A,B)
R = betarnd(A,B,m)
R = betarnd(A,B,m,n)
Description R = betarnd(A,B) generates random numbers from the beta distribution with
parameters specified by A and B. Vector or matrix inputs for A and B must have
the same size, which is also the size of R. A scalar input for A or B is expanded
to a constant matrix with the same dimensions as the other input.
R = betarnd(A,B,m) generates a matrix of size m containing random numbers
from the beta distribution with parameters A and B, where m is a 1-by-2 vector
containing the row and column dimensions of R.
R = betarnd(A,B,m,n) generates an m-by-n matrix containing random
numbers from the beta distribution with parameters A and B.
Examples a = [1 1;2 2];
b = [1 2;1 2];
r = betarnd(a,b)
r =
0.6987 0.6139
0.9102 0.8067
r = betarnd(10,10,[1 5])
r =
0.5974 0.4777 0.5538 0.5465 0.6327
r = betarnd(4,2,2,3)
r =
0.3943 0.6101 0.5768
0.5990 0.2760 0.5474
See Also betacdf, betafit, betainv, betalike, betapdf, betastat, rand, randtool
betastat
Purpose Mean and variance for the beta distribution.
Syntax [M,V] = betastat(A,B)
Description [M,V] = betastat(A,B) returns the mean and variance for the beta
distribution with parameters specified by A and B. Vector or matrix inputs for
A and B must have the same size, which is also the size of M and V. A scalar
input for A or B is expanded to a constant matrix with the same dimensions as
the other input.
The mean of the beta distribution with parameters a and b is $a/(a+b)$ and
the variance is
$$\frac{ab}{(a+b+1)(a+b)^2}$$
Examples If parameters a and b are equal, the mean is 1/2.
a = 1:6;
[m,v] = betastat(a,a)
m =
0.5000 0.5000 0.5000 0.5000 0.5000 0.5000
v =
0.0833 0.0500 0.0357 0.0278 0.0227 0.0192
See Also betacdf, betafit, betainv, betalike, betapdf, betarnd
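The mean and variance formulas for the beta distribution are simple enough to evaluate directly. The following sketch applies them elementwise in base MATLAB, as a check on the betastat example rather than a replacement for the function.

```matlab
a = 1:6;
b = a;                                   % equal parameters, as in the example
m = a ./ (a + b)                         % mean: a/(a+b)
v = a.*b ./ ((a + b + 1) .* (a + b).^2)  % variance: ab/((a+b+1)(a+b)^2)
```

With a = b the mean is 1/2 for every element, and the variance shrinks as the parameters grow.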
binocdf
Purpose Binomial cumulative distribution function (cdf).
Syntax Y = binocdf(X,N,P)
Description binocdf(X,N,P) computes a binomial cdf at each of the values in X using the
corresponding parameters in N and P. Vector or matrix inputs for X, N, and P
must all have the same size. A scalar input is expanded to a constant matrix
with the same dimensions as the other inputs. The values in N must all be
positive integers, the values in X must lie on the interval [0 N], and the
values in P must lie on the interval [0 1].
The binomial cdf for a given value x and given pair of parameters n and p is
$$y = F(x|n,p) = \sum_{i=0}^{x} \binom{n}{i} p^{i} q^{(n-i)} I_{(0,1,\ldots,n)}(i)$$
where q = 1-p. The result, y, is the probability of observing up to x successes
in n independent trials, where the probability of success in any given trial is p.
The indicator function $I_{(0,1,\ldots,n)}(i)$ ensures that x only adopts values
of 0, 1, ..., n.
Examples If a baseball team plays 162 games in a season and has a 50-50 chance of
winning any game, then the probability of that team winning more than 100
games in a season is:
1 - binocdf(100,162,0.5)
The result is 0.001 (i.e., 1-0.999). If a team wins 100 or more games in a season,
this result suggests that it is likely that the team's true probability of winning
any game is greater than 0.5.
See Also binofit, binoinv, binopdf, binornd, binostat, cdf
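The sum that defines the binomial cdf can be evaluated term by term. The sketch below uses base MATLAB only; it computes each term through gammaln so that the large binomial coefficients for n = 162 do not lose precision, which is a choice made for this illustration and not a description of how binocdf is implemented.

```matlab
n = 162;  p = 0.5;
i = 0:n;
% log of nchoosek(n,i) * p^i * (1-p)^(n-i), computed via log-gamma
logpmf = gammaln(n+1) - gammaln(i+1) - gammaln(n-i+1) ...
         + i*log(p) + (n-i)*log(1-p);
pmf  = exp(logpmf);
cdf  = cumsum(pmf);          % cdf(k+1) is P(X <= k)
tail = 1 - cdf(100+1)        % P(X > 100), more than 100 wins
```

The tail probability is roughly 0.001, in line with the baseball example.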
binofit
Purpose Parameter estimates and confidence intervals for binomial data.
Syntax phat = binofit(x,n)
[phat,pci] = binofit(x,n)
[phat,pci] = binofit(x,n,alpha)
Description phat = binofit(x,n) returns a maximum likelihood estimate of the
probability of success in a given binomial trial based on the number of
successes, x, observed in n independent trials. A scalar value for x or n is
expanded to the same size as the other input.
[phat,pci] = binofit(x,n) returns the probability estimate, phat, and the
95% confidence intervals, pci.
[phat,pci] = binofit(x,n,alpha) returns the 100(1-alpha)% confidence
intervals. For example, alpha = 0.01 yields 99% confidence intervals.
Example First we generate a binomial sample of 100 elements, where the probability of
success in a given trial is 0.6. Then, we estimate this probability from the
outcomes in the sample.
r = binornd(100,0.6);
[phat,pci] = binofit(r,100)
phat =
0.5800
pci =
0.4771 0.6780
The 95% confidence interval, pci, contains the true value, 0.6.
Reference Johnson, N. L., S. Kotz, and A. W. Kemp, Univariate Discrete Distributions,
Second Edition, Wiley 1992. pp. 124-130.
See Also binocdf, binoinv, binopdf, binornd, binostat, mle
binoinv
Purpose Inverse of the binomial cumulative distribution function (cdf).
Syntax X = binoinv(Y,N,P)
Description X = binoinv(Y,N,P) returns the smallest integer X such that the binomial cdf
evaluated at X is equal to or exceeds Y. You can think of Y as the probability of
observing X successes in N independent trials where P is the probability of
success in each trial. Each X is a positive integer less than or equal to N.
Vector or matrix inputs for Y, N, and P must all have the same size. A scalar
input is expanded to a constant matrix with the same dimensions as the other
inputs. The parameters in N must be positive integers, and the values in both
P and Y must lie on the interval [0 1].
Examples If a baseball team has a 50-50 chance of winning any game, what is a
reasonable range of games this team might win over a season of 162 games? We
assume that a surprising result is one that occurs by chance once in a decade.
binoinv([0.05 0.95],162,0.5)
ans =
71 91
This result means that in 90% of baseball seasons, a .500 team should win
between 71 and 91 games.
See Also binocdf, binofit, binopdf, binornd, binostat, icdf
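The definition of binoinv, the smallest integer whose cdf value equals or exceeds Y, can be sketched in base MATLAB by tabulating the cdf and searching it. The tabulation approach and the variable names are assumptions made for this illustration; they are not taken from the toolbox.

```matlab
n = 162;  p = 0.5;  y = [0.05 0.95];
i = 0:n;
pmf = exp(gammaln(n+1) - gammaln(i+1) - gammaln(n-i+1) ...
          + i*log(p) + (n-i)*log(1-p));
cdf = cumsum(pmf);                      % cdf(k+1) is P(X <= k)
x = zeros(size(y));
for k = 1:numel(y)
    x(k) = find(cdf >= y(k), 1) - 1;    % index 1 corresponds to 0 successes
end
disp(x)
```

For the baseball example this search returns 71 and 91.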
binopdf
Purpose Binomial probability density function (pdf).
Syntax Y = binopdf(X,N,P)
Description Y = binopdf(X,N,P) computes the binomial pdf at each of the values in X
using the corresponding parameters in N and P. Vector or matrix inputs for X,
N, and P must all have the same size. A scalar input is expanded to a constant
matrix with the same dimensions as the other inputs.
The parameters in N must be positive integers, and the values in P must lie on
the interval [0 1].
The binomial probability density function for a given value x and given pair of
parameters n and p is
$$y = f(x|n,p) = \binom{n}{x} p^{x} q^{(n-x)} I_{(0,1,\ldots,n)}(x)$$
where q = 1-p. The result, y, is the probability of observing x successes in n
independent trials, where the probability of success in any given trial is p. The
indicator function $I_{(0,1,\ldots,n)}(x)$ ensures that x only adopts values of 0, 1, ..., n.
Examples A Quality Assurance inspector tests 200 circuit boards a day. If 2% of the
boards have defects, what is the probability that the inspector will find no
defective boards on any given day?
binopdf(0,200,0.02)
ans =
0.0176
What is the most likely number of defective boards the inspector will find?
y = binopdf([0:200],200,0.02);
[x,i] = max(y);
i
i =
5
Because y(1) corresponds to zero defective boards, the index i = 5 means that
the most likely number of defective boards is 4.
See Also binocdf, binofit, binoinv, binornd, binostat, pdf
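The circuit-board example can be checked directly from the binomial pdf formula. The sketch below uses base MATLAB and evaluates the formula through gammaln for numerical stability; it also converts the index of the maximum into a count, since element 1 of the vector corresponds to zero defects. The variable name mostLikely is illustrative.

```matlab
n = 200;  p = 0.02;
x = 0:n;
% nchoosek(n,x) * p^x * (1-p)^(n-x), computed on the log scale
y = exp(gammaln(n+1) - gammaln(x+1) - gammaln(n-x+1) ...
        + x*log(p) + (n-x)*log(1-p));
[ymax, idx] = max(y);
mostLikely = x(idx)      % number of defective boards, not the index
```

Here y(1) reproduces the 0.0176 probability of finding no defective boards, and mostLikely is 4.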
binornd
Purpose Random numbers from the binomial distribution.
Syntax R = binornd(N,P)
R = binornd(N,P,mm)
R = binornd(N,P,mm,nn)
Description R = binornd(N,P) generates random numbers from the binomial distribution
with parameters specified by N and P. Vector or matrix inputs for N and P must
have the same size, which is also the size of R. A scalar input for N or P is
expanded to a constant matrix with the same dimensions as the other input.
R = binornd(N,P,mm) generates a matrix of size mm containing random
numbers from the binomial distribution with parameters N and P, where mm is
a 1-by-2 vector containing the row and column dimensions of R.
R = binornd(N,P,mm,nn) generates an mm-by-nn matrix containing random
numbers from the binomial distribution with parameters N and P.
Algorithm The binornd function uses the direct method using the definition of the
binomial distribution as a sum of Bernoulli random variables.
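The direct method mentioned under Algorithm can be illustrated with base MATLAB. This is a minimal sketch of the idea for a single draw, with parameter values chosen here for illustration; it is not the binornd source code.

```matlab
n = 10;  p = 0.3;            % illustrative parameter values
trials = rand(n,1) < p;      % n independent Bernoulli(p) outcomes
r = sum(trials)              % one binomial(n,p) random number
```

Each element of trials is 1 with probability p, so their sum counts the successes in n trials.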
Examples n = 10:10:60;
r1 = binornd(n,1./n)
r1 =
2 1 0 1 1 2
r2 = binornd(n,1./n,[1 6])
r2 =
0 1 2 1 3 1
r3 = binornd(n,1./n,1,6)
r3 =
0 1 1 1 0 3
See Also binocdf, binofit, binoinv, binopdf, binostat, rand, randtool
binostat
Purpose Mean and variance for the binomial distribution.
Syntax [M,V] = binostat(N,P)
Description [M,V] = binostat(N,P) returns the mean and variance for the binomial
distribution with parameters specified by N and P. Vector or matrix inputs for
N and P must have the same size, which is also the size of M and V. A scalar
input for N or P is expanded to a constant matrix with the same dimensions as
the other input.
The mean of the binomial distribution with parameters n and p is np. The
variance is npq, where q = 1-p.
Examples n = logspace(1,5,5)
n =
10 100 1000 10000 100000
[m,v] = binostat(n,1./n)
m =
1 1 1 1 1
v =
0.9000 0.9900 0.9990 0.9999 1.0000
[m,v] = binostat(n,1/2)
m =
5 50 500 5000 50000
v =
1.0e+04 *
0.0003 0.0025 0.0250 0.2500 2.5000
See Also binocdf, binofit, binoinv, binopdf, binornd
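The first binostat example follows directly from the formulas np and npq. A short base MATLAB sketch of that calculation:

```matlab
n = logspace(1,5,5);     % 10, 100, ..., 100000
p = 1./n;
m = n.*p                 % mean: np
v = n.*p.*(1-p)          % variance: npq with q = 1-p
```

The mean is 1 in every case, and the variance approaches 1 as n grows because q approaches 1.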
bootstrp
Purpose Bootstrap statistics through resampling of data.
Syntax bootstat = bootstrp(nboot,'bootfun',d1,d2,...)
[bootstat,bootsam] = bootstrp(...)
Description bootstat = bootstrp(nboot,'bootfun',d1,d2,...) draws nboot bootstrap
samples from each of the input data sets, d1, d2, etc., and passes the bootstrap
samples to function bootfun for analysis. nboot must be a positive integer, and
each input data set must contain the same number of rows, n. Each bootstrap
sample contains n rows chosen randomly (with replacement) from the
corresponding input data set (d1, d2, etc.).
Each row of the output, bootstat, contains the results of applying bootfun to
one set of bootstrap samples. If bootfun returns multiple outputs, only the first
is stored in bootstat. If the first output from bootfun is a matrix, the matrix
is reshaped to a row vector for storage in bootstat.
[bootstat,bootsam] = bootstrp(...) returns a matrix of bootstrap
indices, bootsam. Each of the nboot columns in bootsam contains indices of the
values that were drawn from the original data sets to constitute the
corresponding bootstrap sample. For example, if d1, d2, etc., each contain 16
values, and nboot = 4, then bootsam is a 16-by-4 matrix. The first column
contains the indices of the 16 values drawn from d1, d2, etc., for the first of the
four bootstrap samples, the second column contains the indices for the second
of the four bootstrap samples, and so on. (The bootstrap indices are the same
for all input data sets.)
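The resampling scheme described above can be sketched in base MATLAB. The example below is an illustration under stated assumptions: the paired data d1 and d2 are made up, the statistic is a hand-written correlation coefficient, and the index matrix is generated with randi. It mirrors the roles of bootsam and bootstat but is not the bootstrp implementation.

```matlab
% Hypothetical paired data with 15 rows each.
d1 = (1:15)';
d2 = d1 + 4*sin(d1);
n = numel(d1);  nboot = 200;

bootsam  = randi(n, n, nboot);   % each column: n row indices drawn with replacement
bootstat = zeros(nboot, 1);
for k = 1:nboot
    xs = d1(bootsam(:,k));       % the same indices are applied to
    ys = d2(bootsam(:,k));       % every input data set
    xc = xs - mean(xs);
    yc = ys - mean(ys);
    bootstat(k) = sum(xc.*yc) / sqrt(sum(xc.^2) * sum(yc.^2));  % correlation
end
```

Each column of bootsam defines one bootstrap sample, and each element of bootstat holds the statistic computed from that sample.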
Example Correlate the LSAT scores and law-school GPA for 15 students. These 15 data
points are resampled to create 1000 different data sets, and the correlation
between the two variables is computed for each dataset.
load lawdata
[bootstat,bootsam] = bootstrp(1000,'corrcoef',lsat,gpa);
bootstat(1:5,:)
ans =
1.0000 0.3021 0.3021 1.0000
1.0000 0.6869 0.6869 1.0000
1.0000 0.8346 0.8346 1.0000
1.0000 0.8711 0.8711 1.0000
1.0000 0.8043 0.8043 1.0000
bootsam(:,1:5)
ans =
4 7 5 12 8
1 11 10 8 4
11 9 12 4 2
11 14 15 5 15
15 13 6 6 2
6 8 4 3 8
8 2 15 8 6
13 10 11 14 5
1 7 12 14 14
1 11 10 1 8
8 14 2 14 7
11 12 10 8 15
1 4 14 8 1
6 1 5 5 12
2 12 7 15 12
hist(bootstat(:,2))
[Figure: histogram of the bootstrap correlation coefficients, bootstat(:,2)]
The histogram shows the variation of the correlation coefficient across all the
bootstrap samples. The sample minimum is positive, indicating that the
relationship between LSAT score and GPA is not accidental.
boxplot
Purpose Box plots of a data sample.
Syntax boxplot(X)
boxplot(X,notch)
boxplot(X,notch,'sym')
boxplot(X,notch,'sym',vert)
boxplot(X,notch,'sym',vert,whis)
Description boxplot(X) produces a box and whisker plot for each column of X. The box has
lines at the lower quartile, median, and upper quartile values. The whiskers
are lines extending from each end of the box to show the extent of the rest of
the data. Outliers are data with values beyond the ends of the whiskers. If
there is no data outside the whisker, a dot is placed at the bottom whisker.
boxplot(X,notch) with notch = 1 produces a notched-box plot. Notches graph
a robust estimate of the uncertainty about the means for box-to-box
comparison. The default, notch = 0, produces a rectangular box plot.
boxplot(X,notch,'sym') where sym is a plotting symbol, affords control of the
symbol for outliers. The default is '+'. See MATLAB's LineSpec property for
information about the available marker symbols.
boxplot(X,notch,'sym',vert) with vert = 0 creates horizontal boxes rather
than the default vertical boxes (vert = 1).
boxplot(X,notch,'sym',vert,whis) enables you to specify the length of the
whiskers. whis defines the length of the whiskers as a function of the
inter-quartile range (default = 1.5*IQR). If whis = 0, then boxplot displays all
data values outside the box using the plotting symbol, 'sym'.
Examples x1 = normrnd(5,1,100,1);
x2 = normrnd(6,1,100,1);
x = [x1 x2];
boxplot(x,1)
[Figure: notched box plots of the two columns of x; x-axis: Column Number, y-axis: Values]
The difference between the means of the two columns of x is 1. We can detect
this difference graphically by observing that the notches in the boxplot do not
overlap.
See Also anova1, kruskalwallis
capable
Purpose Process capability indices.
Syntax p = capable(data,specs)
[p,Cp,Cpk] = capable(data,specs)
Description p = capable(data,specs) computes the probabi l i ty that a sampl e, data, from
some process fal l s outsi de the bounds speci fi ed i n specs, a 2-el ement vector of
the form [lower upper].
The assumpti ons are that the measured val ues i n the vector data are normal l y
di stri buted wi th constant mean and vari ance and that the measurements are
stati sti cal l y i ndependent.
[p,Cp,Cpk] = capable(data,specs) addi ti onal l y returns the capabi l i ty
i ndi ces Cp and Cpk.
$C_p$ is the ratio of the range of the specifications to six times the estimate of the
process standard deviation:
$$C_p = \frac{USL - LSL}{6\sigma}$$
For a process that has its average value on target, a $C_p$ of 1 translates to a little
more than one defect per thousand in each tail (about 2.7 per thousand in total).
Recently, many industries have set a quality goal of one part per million. This
would correspond to $C_p$ = 1.6. The higher the value of $C_p$, the more capable
the process.
$C_{pk}$ is the ratio of the difference between the process mean and the closer
specification limit to three times the estimate of the process standard
deviation:
$$C_{pk} = \min\left(\frac{USL - \mu}{3\sigma},\ \frac{\mu - LSL}{3\sigma}\right)$$
where the process mean is $\mu$. For processes that do not maintain their average
on target, $C_{pk}$ is a more descriptive index of process capability.
Example Imagine a machined part with specifications requiring a dimension to be
within three thousandths of an inch of nominal. Suppose that the machining
process cuts too thick by one thousandth of an inch on average and also has a
standard deviation of one thousandth of an inch. What are the capability
indices of this process?
data = normrnd(1,1,30,1);
[p,Cp,Cpk] = capable(data,[-3 3]);
indices = [p Cp Cpk]
indices =
0.0172 1.1144 0.7053
We expect 17 parts out of a thousand to be out-of-specification. Cpk is less than
Cp because the process is not centered.
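The index definitions can also be checked against the population values for this example. The following is a minimal sketch in base MATLAB, assuming the process mean and standard deviation are known exactly (mu = 1, sigma = 1) rather than estimated from a random sample. The variable names are illustrative and are not part of capable.

```matlab
% Population capability indices for the machining example.
% Assumption: mu and sigma are known, not estimated from data.
mu    = 1;     % process mean, in thousandths of an inch
sigma = 1;     % process standard deviation
LSL   = -3;    % lower specification limit
USL   = 3;     % upper specification limit

Cp  = (USL - LSL) / (6*sigma);
Cpk = min((USL - mu)/(3*sigma), (mu - LSL)/(3*sigma));

% Standard normal cdf written with erfc: Phi(z) = 0.5*erfc(-z/sqrt(2)).
zL = (LSL - mu)/sigma;
zU = (USL - mu)/sigma;
pBelow = 0.5 * erfc(-zL/sqrt(2));        % P(X < LSL)
pAbove = 1 - 0.5 * erfc(-zU/sqrt(2));    % P(X > USL)
p = pBelow + pAbove;                     % probability of being out of spec
```

Under these assumptions Cp is 1, Cpk is 2/3, and p is about 0.023. The sample-based values returned by capable vary from run to run around these figures.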
Reference Montgomery, D., Introduction to Statistical Quality Control, John Wiley &
Sons 1991. pp. 369-374.
See Also capaplot, histfit
capaplot
Purpose Process capability plot.
Syntax p = capaplot(data,specs)
[p,h] = capaplot(data,specs)
Description p = capaplot(data,specs) estimates the mean and variance of the
observations in input vector data, and plots the pdf of the resulting
T distribution. The observations in data are assumed to be normally
distributed. The output, p, is the probability that a new observation from the
estimated distribution will fall within the range specified by the two-element
vector specs. The portion of the distribution between the lower and upper
bounds specified in specs is shaded in the plot.
[p,h] = capaplot(data,specs) additionally returns handles to the plot
elements in h.
Example Imagine a machined part with specifications requiring a dimension to be
within 3 thousandths of an inch of nominal. Suppose that the machining
process cuts too thick by one thousandth of an inch on average and also has a
standard deviation of one thousandth of an inch.
data = normrnd(1,1,30,1);
p = capaplot(data,[-3 3])
p =
0.9784
The probability of a new observation being within specs is 97.84%.
[Figure: pdf of the estimated distribution with the region between the specification limits shaded. Title: Probability Between Limits is 0.9784.]
See Also capable, histfit
caseread
Purpose Read casenames from a file.
Syntax names = caseread('filename')
names = caseread
Description names = caseread('filename') reads the contents of filename and returns a
string matrix of names. filename is the name of a file in the current directory,
or the complete pathname of any file elsewhere. caseread treats each line as a
separate case.
names = caseread displays the Select File to Open dialog box for interactive
selection of the input file.
Example Read the file months.dat created using the function casewrite, described in
the next entry.
type months.dat
January
February
March
April
May
names = caseread('months.dat')
names =
January
February
March
April
May
See Also tblread, gname, casewrite, tdfread
casewrite
Purpose Write casenames from a string matrix to a file.
Syntax casewrite(strmat,'filename')
casewrite(strmat)
Description casewrite(strmat,'filename') writes the contents of string matrix strmat
to filename. Each row of strmat represents one casename. filename is the
name of a file in the current directory, or the complete pathname of any file
elsewhere. casewrite writes each name to a separate line in filename.
casewrite(strmat) displays the Select File to Write dialog box for interactive
specification of the output file.
Example strmat = str2mat('January','February','March','April','May')
strmat =
January
February
March
April
May
casewrite(strmat,'months.dat')
type months.dat
January
February
March
April
May
See Also gname, caseread, tblwrite, tdfread
cdf
Purpose Computes a chosen cumulative distribution function (cdf).
Syntax P = cdf('name',X,A1,A2,A3)
Description P = cdf('name',X,A1,A2,A3) returns a matrix of probabilities, where name is
a string containing the name of the distribution, X is a matrix of values, and A1,
A2, and A3 are matrices of distribution parameters. Depending on the
distribution, some of these parameters may not be necessary.
Vector or matrix inputs for X, A1, A2, and A3 must have the same size, which is
also the size of P. A scalar input for X, A1, A2, or A3 is expanded to a constant
matrix with the same dimensions as the other inputs.
cdf is a utility routine allowing you to access all the cdfs in the Statistics
Toolbox by using the name of the distribution as a parameter. See "Overview
of the Distributions" on page 1-12 for the list of available distributions.
Examples p = cdf('Normal',-2:2,0,1)
p =
0.0228 0.1587 0.5000 0.8413 0.9772
p = cdf('Poisson',0:5,1:6)
p =
0.3679 0.4060 0.4232 0.4335 0.4405 0.4457
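In the Poisson example, each value in X is paired with the parameter in the same position of A1. The following sketch recomputes those probabilities from the definition of the Poisson cdf using only base MATLAB. It is an illustration of the element-by-element pairing, not the toolbox implementation, and the variable names are hypothetical.

```matlab
% Recompute cdf('Poisson',0:5,1:6) from the definition
%   P(X <= k) = sum_{n=0}^{k} exp(-lambda) * lambda^n / n!
k      = 0:5;              % values at which the cdf is evaluated
lambda = 1:6;              % Poisson parameters, one per value in k
p = zeros(size(k));
for idx = 1:numel(k)
    n = 0:k(idx);          % support points up to k(idx)
    % gamma(n+1) equals n! for nonnegative integers n
    p(idx) = sum(exp(-lambda(idx)) .* lambda(idx).^n ./ gamma(n + 1));
end
```

The resulting vector p agrees with the output shown above to the four displayed digits.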
See Also betacdf, binocdf, chi2cdf, expcdf, fcdf, gamcdf, geocdf, hygecdf, icdf,
logncdf, mle, nbincdf, ncfcdf, nctcdf, ncx2cdf, normcdf, pdf, poisscdf,
random, raylcdf, tcdf, unidcdf, unifcdf, weibcdf
cdfplot
Purpose Plot of empirical cumulative distribution function.
Syntax cdfplot(X)
h = cdfplot(X)
[h,stats] = cdfplot(X)
Description cdfplot(X) displays a plot of the empirical cumulative distribution function
(cdf) for the data in the vector X. The empirical cdf $F(x)$ is defined as the
proportion of X values less than or equal to x.
This plot, like those produced by hist and normplot, is useful for examining
the distribution of a sample of data. You can overlay a theoretical cdf on the
same plot to compare the empirical distribution of the sample to the theoretical
distribution.
The kstest, kstest2, and lillietest functions compute test statistics that
are derived from the empirical cdf. You may find the empirical cdf plot
produced by cdfplot useful in helping you to understand the output from those
functions.
h = cdfplot(X) returns a handle to the cdf curve.
[h,stats] = cdfplot(X) also returns a stats structure with the following
fields.
Field          Contents
stats.min      Minimum value
stats.max      Maximum value
stats.mean     Sample mean
stats.median   Sample median (50th percentile)
stats.std      Sample standard deviation
Examples Generate a normal sample and an empirical cdf plot of the data.
x = normrnd(0,1,50,1);
cdfplot(x)
[Figure: Empirical CDF. Horizontal axis: x, from -3 to 3; vertical axis: F(x), from 0 to 1.]
See Also hist, kstest, kstest2, lillietest, normplot
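The definition of the empirical cdf can be written out directly. The sketch below uses base MATLAB only and a small made-up sample; it illustrates the definition and is not the code used by cdfplot.

```matlab
% Empirical cdf: F(x0) is the proportion of sample values <= x0.
x  = [3 1 4 1 5 9 2 6];     % illustrative sample, not from the guide
n  = numel(x);
xs = sort(x(:));            % sorted sample values
F  = (1:n)' / n;            % cumulative proportions at the sorted values
                            % (for tied values, the last entry is the step height)

x0 = 4;
F0 = sum(x <= x0) / n;      % proportion of values less than or equal to 4

% stairs(xs,F) would draw a step plot of these values.
```

Here five of the eight values are at most 4, so F0 is 0.625.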
chi2cdf
Purpose Chi-square ($\chi^2$) cumulative distribution function (cdf).
Syntax P = chi2cdf(X,V)
Description P = chi2cdf(X,V) computes the $\chi^2$ cdf at each of the values in X using the
corresponding parameters in V. Vector or matrix inputs for X and V must have
the same size. A scalar input is expanded to a constant matrix with the same
dimensions as the other input. The degrees of freedom parameters in V must be
positive integers, and the values in X must lie in the interval [0, Inf).
The $\chi^2$ cdf for a given value x and degrees of freedom $\nu$ is
$$p = F(x|\nu) = \int_0^x \frac{t^{(\nu-2)/2}\, e^{-t/2}}{2^{\nu/2}\,\Gamma(\nu/2)}\, dt$$
where $\Gamma(\cdot)$ is the Gamma function. The result, p, is the probability that a
single observation from a $\chi^2$ distribution with $\nu$ degrees of freedom will fall in
the interval [0 x].
The $\chi^2$ density function with $\nu$ degrees of freedom is the same as the gamma
density function with parameters $\nu/2$ and 2.
Examples probability = chi2cdf(5,1:5)
probability =
0.9747 0.9179 0.8282 0.7127 0.5841
probability = chi2cdf(1:5,1:5)
probability =
0.6827 0.6321 0.6084 0.5940 0.5841
See Also cdf, chi2inv, chi2pdf, chi2rnd, chi2stat
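Because the $\chi^2$ distribution is a gamma distribution with parameters $\nu/2$ and 2, its cdf can be written with the regularized lower incomplete gamma function. The sketch below uses the base MATLAB function gammainc to reproduce the first chi2cdf example; it is offered as a cross-check under that identity, not as a description of how chi2cdf is implemented.

```matlab
% Chi-square cdf through the incomplete gamma function:
%   F(x|nu) = gammainc(x/2, nu/2)
nu = 1:5;                          % degrees of freedom
x  = 5 * ones(size(nu));           % evaluate every cdf at x = 5
p  = gammainc(x/2, nu/2);
```

For nu = 2 the integral has the closed form 1 - exp(-x/2), which gives 0.9179 at x = 5, in agreement with the second entry of the output above.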
chi2inv
Purpose Inverse of the chi-square ($\chi^2$) cumulative distribution function (cdf).
Syntax X = chi2inv(P,V)
Description X = chi2inv(P,V) computes the inverse of the $\chi^2$ cdf with parameters
specified by V for the corresponding probabilities in P. Vector or matrix inputs
for P and V must have the same size. A scalar input is expanded to a constant
matrix with the same dimensions as the other input.
The degrees of freedom parameters in V must be positive integers, and the
values in P must lie in the interval [0 1].
The inverse $\chi^2$ cdf for a given probability p and $\nu$ degrees of freedom is
$$x = F^{-1}(p|\nu) = \{x : F(x|\nu) = p\}$$
where
$$p = F(x|\nu) = \int_0^x \frac{t^{(\nu-2)/2}\, e^{-t/2}}{2^{\nu/2}\,\Gamma(\nu/2)}\, dt$$
and $\Gamma(\cdot)$ is the Gamma function. Each element of output X is the value whose
cumulative probability under the $\chi^2$ cdf defined by the corresponding degrees
of freedom parameter in V is specified by the corresponding value in P.
Examples Find a value that exceeds 95% of the samples from a $\chi^2$ distribution with
10 degrees of freedom.
x = chi2inv(0.95,10)
x =
18.3070
You would observe values greater than 18.3 only 5% of the time by chance.
See Also chi2cdf, chi2pdf, chi2rnd, chi2stat, icdf
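The inverse cdf is defined implicitly as the solution of F(x) = p, so it can be found numerically by root bracketing. The sketch below inverts the cdf by bisection, writing the cdf with the base MATLAB function gammainc. The bracket [0, 1000] is an assumption that is wide enough for this example; the method shown is one simple choice and is not claimed to be the algorithm used by chi2inv.

```matlab
% Invert the chi-square cdf by bisection on F(x|nu) = gammainc(x/2, nu/2).
p  = 0.95;         % target cumulative probability
nu = 10;           % degrees of freedom
lo = 0;            % F(lo) = 0 <= p
hi = 1000;         % assumed upper bracket with F(hi) > p
for iter = 1:100
    mid = (lo + hi)/2;
    if gammainc(mid/2, nu/2) < p
        lo = mid;  % the solution lies above mid
    else
        hi = mid;  % the solution lies at or below mid
    end
end
x = (lo + hi)/2;
```

The loop converges to the same value as the chi2inv example, approximately 18.307.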
chi2pdf
Purpose Chi-square ($\chi^2$) probability density function (pdf).
Syntax Y = chi2pdf(X,V)
Description Y = chi2pdf(X,V) computes the $\chi^2$ pdf at each of the values in X using the
corresponding parameters in V. Vector or matrix inputs for X and V must have
the same size, which is also the size of output Y. A scalar input is expanded to
a constant matrix with the same dimensions as the other input.
The degrees of freedom parameters in V must be positive integers, and the
values in X must lie in the interval [0, Inf).
The $\chi^2$ pdf for a given value x and $\nu$ degrees of freedom is
$$y = f(x|\nu) = \frac{x^{(\nu-2)/2}\, e^{-x/2}}{2^{\nu/2}\,\Gamma(\nu/2)}$$
where $\Gamma(\cdot)$ is the Gamma function. The result, y, is the probability density
at x for a single observation from a $\chi^2$ distribution with $\nu$ degrees of freedom.
If x is standard normal, then $x^2$ is distributed $\chi^2$ with one degree of freedom. If
$x_1, x_2, \ldots, x_n$ are n independent standard normal observations, then the sum of
the squares of the x's is distributed $\chi^2$ with n degrees of freedom (and is
equivalent to the gamma density function with parameters $\nu/2$ and 2).
Examples nu = 1:6;
x = nu;
y = chi2pdf(x,nu)
y =
0.2420 0.1839 0.1542 0.1353 0.1220 0.1120
The mean of the $\chi^2$ distribution is the value of the degrees of freedom
parameter, nu. The above example shows that the probability density at the
mean falls as nu increases.
See Also chi2cdf, chi2inv, chi2rnd, chi2stat, pdf
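The density formula can be evaluated term by term in base MATLAB, which gives a quick check of the chi2pdf example. This is a direct transcription of the formula for illustration; it makes no claim about how chi2pdf itself is coded.

```matlab
% Evaluate the chi-square density from its formula:
%   f(x|nu) = x^((nu-2)/2) * exp(-x/2) / (2^(nu/2) * gamma(nu/2))
nu = 1:6;          % degrees of freedom
x  = nu;           % evaluate each density at its own mean
y  = x.^((nu - 2)/2) .* exp(-x/2) ./ (2.^(nu/2) .* gamma(nu/2));
```

For nu = 2 the formula reduces to exp(-x/2)/2, which is 0.1839 at x = 2, matching the second entry of the output above.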
chi2rnd
Purpose Random numbers from the chi-square ($\chi^2$) distribution.
Syntax R = chi2rnd(V)
R = chi2rnd(V,m)
R = chi2rnd(V,m,n)
Description R = chi2rnd(V) generates random numbers from the $\chi^2$ distribution with
degrees of freedom parameters specified by V. R is the same size as V.
R = chi2rnd(V,m) generates a matrix of size m containing random numbers
from the $\chi^2$ distribution with degrees of freedom parameter V, where m is a
1-by-2 vector containing the row and column dimensions of R.
R = chi2rnd(V,m,n) generates an m-by-n matrix containing random numbers
from the $\chi^2$ distribution with degrees of freedom parameter V.
Examples Note that the first and third commands are the same, but are different from the
second command.
r = chi2rnd(1:6)
r =
0.0037 3.0377 7.8142 0.9021 3.2019 9.0729
r = chi2rnd(6,[1 6])
r =
6.5249 2.6226 12.2497 3.0388 6.3133 5.0388
r = chi2rnd(1:6,1,6)
r =
0.7638 6.0955 0.8273 3.2506 1.5469 10.9197
See Also chi2cdf, chi2inv, chi2pdf, chi2stat
chi2stat
Purpose Mean and variance for the chi-square ($\chi^2$) distribution.
Syntax [M,V] = chi2stat(NU)
Description [M,V] = chi2stat(NU) returns the mean and variance for the $\chi^2$ distribution
with degrees of freedom parameters specified by NU.
The mean of the $\chi^2$ distribution is $\nu$, the degrees of freedom parameter, and the
variance is $2\nu$.
Example nu = 1:10;
nu = nu'*nu;
[m,v] = chi2stat(nu)
m =
1 2 3 4 5 6 7 8 9 10
2 4 6 8 10 12 14 16 18 20
3 6 9 12 15 18 21 24 27 30
4 8 12 16 20 24 28 32 36 40
5 10 15 20 25 30 35 40 45 50
6 12 18 24 30 36 42 48 54 60
7 14 21 28 35 42 49 56 63 70
8 16 24 32 40 48 56 64 72 80
9 18 27 36 45 54 63 72 81 90
10 20 30 40 50 60 70 80 90 100
v =
2 4 6 8 10 12 14 16 18 20
4 8 12 16 20 24 28 32 36 40
6 12 18 24 30 36 42 48 54 60
8 16 24 32 40 48 56 64 72 80
10 20 30 40 50 60 70 80 90 100
12 24 36 48 60 72 84 96 108 120
14 28 42 56 70 84 98 112 126 140
16 32 48 64 80 96 112 128 144 160
18 36 54 72 90 108 126 144 162 180
20 40 60 80 100 120 140 160 180 200
See Also chi2cdf, chi2inv, chi2pdf, chi2rnd
classify
Purpose Linear discriminant analysis.
Syntax class = classify(sample,training,group)
Description class = classify(sample,training,group) assigns each row of the data in
sample to one of the groups into which the training set, training, is already
divided. sample and training must have the same number of columns.
The vector group contains integers, from one to the number of groups, that
identify the group to which each row of the training set belongs. group and
training must have the same number of rows.
The function returns class, a vector with the same number of rows as sample.
Each element of class identifies the group to which the corresponding element
of sample has been assigned. The classify function determines the group into
which each row in sample is classified by computing the Mahalanobis distance
between each row in sample and each row in training.
Example load discrim
sample = ratings(idx,:);
training = ratings(1:200,:);
g = group(1:200);
class = classify(sample,training,g);
first5 = class(1:5)
first5 =
2
2
2
2
2
See Also mahal
cluster
Purpose Construct clusters from linkage output.
Syntax T = cluster(Z,cutoff)
T = cluster(Z,cutoff,depth,flag)
Description T = cluster(Z,cutoff) constructs clusters from the hierarchical cluster
tree, Z, generated by the linkage function. Z is a matrix of size (m-1)-by-3,
where m is the number of observations in the original data.
cutoff is a threshold value that determines how the cluster function creates
clusters. The value of cutoff determines how cluster interprets it.
Value            Meaning
0 < cutoff < 2   cutoff is interpreted as the threshold for the
                 inconsistency coefficient. The inconsistency coefficient
                 quantifies the degree of difference between objects in
                 the hierarchical cluster tree. If the inconsistency
                 coefficient of a link is greater than the threshold, the
                 cluster function uses the link as a boundary for a
                 cluster grouping. For more information about the
                 inconsistency coefficient, see the inconsistent
                 function.
cutoff >= 2      cutoff is interpreted as the maximum number of
                 clusters to retain in the hierarchical tree.
T = cluster(Z,cutoff,depth,flag) constructs clusters from cluster tree Z.
The depth argument specifies the number of levels in the hierarchical cluster
tree to include in the inconsistency coefficient computation. (The inconsistency
coefficient compares a link between two objects in the cluster tree with
neighboring links up to a specified depth. See the inconsistent function for
more information.) When the depth argument is specified, cutoff is always
interpreted as the inconsistency coefficient threshold.
The flag argument overrides the default meaning of the cutoff argument. If
flag is 'inconsistent', then cutoff is interpreted as a threshold for the
inconsistency coefficient. If flag is 'clusters', then cutoff is the maximum
number of clusters.
The output, T, is a vector of size m that identifies, by number, the cluster in
which each object was grouped. To find out which objects from the original
dataset are contained in cluster i, use find(T==i).
Example The example uses the pdist function to calculate the distance between items
in a matrix of random numbers and then uses the linkage function to compute
the hierarchical cluster tree based on the matrix. The output of the linkage
function is passed to the cluster function. The cutoff value 3 indicates that
you want to group the items into three clusters. The example uses the find
function to list all the items grouped into cluster 3.
rand('seed', 0);
X = [rand(10,3); rand(10,3)+1; rand(10,3)+2];
Y = pdist(X);
Z = linkage(Y);
T = cluster(Z,3);
find(T==3)
ans =
11
12
13
14
15
16
17
18
19
20
See Also clusterdata, cophenet, dendrogram, inconsistent, linkage, pdist,
squareform
clusterdata
Purpose Construct clusters from data.
Syntax T = clusterdata(X,cutoff)
Description T = clusterdata(X,cutoff) constructs clusters from the data matrix X. X is a
matrix of size m by n, interpreted as m observations of n variables.
cutoff is a threshold value that determines how the cluster function creates
clusters. The value of cutoff determines how clusterdata interprets it.
Value            Meaning
0 < cutoff < 1   cutoff is interpreted as the threshold for the
                 inconsistency coefficient. The inconsistency coefficient
                 quantifies the degree of difference between objects in
                 the hierarchical cluster tree. If the inconsistency
                 coefficient of a link is greater than the threshold, the
                 cluster function uses the link as a boundary for a
                 cluster grouping. For more information about the
                 inconsistency coefficient, see the inconsistent
                 function.
cutoff >= 1      cutoff is interpreted as the maximum number of
                 clusters to retain in the hierarchical tree.
The output, T, is a vector of size m that identifies, by number, the cluster in
which each object was grouped.
T = clusterdata(X,cutoff) is the same as
Y = pdist(X,'euclid');
Z = linkage(Y,'single');
T = cluster(Z,cutoff);
Follow this sequence to use nondefault parameters for pdist and linkage.
Example The example first creates a sample dataset of random numbers. The example
then uses the clusterdata function to compute the distances between items in
the dataset and create a hierarchical cluster tree from the dataset. Finally, the
clusterdata function groups the items in the dataset into three clusters. The
example uses the find function to list all the items in cluster 2.
rand('seed',12);
X = [rand(10,3); rand(10,3)+1.2; rand(10,3)+2.5];
T = clusterdata(X,3);
find(T==2)
ans =
21
22
23
24
25
26
27
28
29
30
See Also cluster, cophenet, dendrogram, inconsistent, linkage, pdist, squareform
combnk
Purpose Enumeration of all combinations of n objects k at a time.
Syntax C = combnk(v,k)
Description C = combnk(v,k) returns all combinations of the n elements in v taken k at a
time.
C = combnk(v,k) produces a matrix C with k columns and n!/(k!(n-k)!) rows,
where each row contains k of the elements in the vector v.
It is not practical to use this function if v has more than about 15 elements.
Example Combinations of characters from a string.
C = combnk('tendril',4);
last5 = C(31:35,:)
last5 =
tedr
tenl
teni
tenr
tend
Combinations of elements from a numeric vector.
c = combnk(1:4,2)
c =
3 4
2 4
2 3
1 4
1 3
1 2
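The row count n!/(k!(n-k)!) explains why the last five rows of C in the first example are rows 31 to 35, and why the second example returns six rows. The sketch below checks both counts in base MATLAB; nchoosek is a base MATLAB function used here only to confirm the arithmetic.

```matlab
% Number of rows returned by combnk(v,k): n! / (k! (n-k)!)
n = 7;  k = 4;     % 'tendril' has 7 characters, taken 4 at a time
rowsString = factorial(n) / (factorial(k) * factorial(n - k));

n2 = 4; k2 = 2;    % the vector 1:4, taken 2 at a time
rowsNumeric = factorial(n2) / (factorial(k2) * factorial(n2 - k2));

% nchoosek(n,k) returns the same counts.
sameCounts = (nchoosek(n,k) == rowsString) && (nchoosek(n2,k2) == rowsNumeric);
```

The count grows quickly with n, which is the reason for the practical limit of about 15 elements mentioned above.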
cophenet
Purpose Cophenetic correlation coefficient.
Syntax c = cophenet(Z,Y)
Description c = cophenet(Z,Y) computes the cophenetic correlation coefficient which
compares the distance information in Z, generated by linkage, and the
distance information in Y, generated by pdist. Z is a matrix of size (m-1)-by-3,
with distance information in the third column. Y is a vector of size
$m(m-1)/2$.
For example, given a group of objects {1, 2, ..., m} with distances Y, the function
linkage produces a hierarchical cluster tree. The cophenet function measures
the distortion of this classification, indicating how readily the data fits into the
structure suggested by the classification.
The output value, c, is the cophenetic correlation coefficient. The magnitude of
this value should be very close to 1 for a high-quality solution. This measure
can be used to compare alternative cluster solutions obtained using different
algorithms.
The cophenetic correlation between Z(:,3) and Y is defined as
$$c = \frac{\sum_{i<j} (Y_{ij} - y)(Z_{ij} - z)}{\sqrt{\sum_{i<j} (Y_{ij} - y)^2 \, \sum_{i<j} (Z_{ij} - z)^2}}$$
where:
$Y_{ij}$ is the distance between objects i and j in Y.
$Z_{ij}$ is the distance between objects i and j in Z(:,3).
y and z are the average of Y and Z(:,3), respectively.
Example rand('seed',12);
X = [rand(10,3);rand(10,3)+1;rand(10,3)+2];
Y = pdist(X);
Z = linkage(Y,'centroid');
c = cophenet(Z,Y)
c =
0.6985
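The defining formula is an ordinary correlation coefficient taken over all pairs i < j. The sketch below evaluates it in base MATLAB for two made-up vectors of pairwise distances for m = 4 objects (six pairs). It assumes the tree distances have already been arranged in the same pair order as Y; extracting them from the linkage output Z is not shown, and the numbers are illustrative only.

```matlab
% Cophenetic correlation from two matching vectors of pairwise distances.
% Yd: original distances; Zd: tree distances for the same pairs (assumed given).
Yd = [1.0 2.0 2.5 3.0 3.5 1.5];
Zd = [1.0 2.5 2.5 2.5 2.5 1.5];
y  = mean(Yd);
z  = mean(Zd);
c  = sum((Yd - y).*(Zd - z)) / sqrt(sum((Yd - y).^2) * sum((Zd - z).^2));

% The same number comes from the base MATLAB corrcoef function.
R = corrcoef(Yd, Zd);
```

Writing the formula this way makes it clear that c lies between -1 and 1 and equals R(1,2).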
See Also cluster, dendrogram, inconsistent, linkage, pdist, squareform
cordexch
Purpose D-optimal design of experiments: coordinate exchange algorithm.
Syntax settings = cordexch(nfactors,nruns)
[settings,X] = cordexch(nfactors,nruns)
[settings,X] = cordexch(nfactors,nruns,'model')
Description settings = cordexch(nfactors,nruns) generates the factor settings matrix,
settings, for a D-optimal design using a linear additive model with a constant
term. settings has nruns rows and nfactors columns.
[settings,X] = cordexch(nfactors,nruns) also generates the associated
design matrix X.
[settings,X] = cordexch(nfactors,nruns,'model') produces a design for
fitting a specified regression model. The input, 'model', can be one of these
strings:
'interaction' includes constant, linear, and cross-product terms.
'quadratic' includes interactions and squared terms.
'purequadratic' includes constant, linear and squared terms.
Example The D-optimal design for two factors in nine runs using a quadratic model is the
$3^2$ factorial as shown below:
settings = cordexch(2,9,'quadratic')
settings =
-1 1
1 1
0 1
1 -1
-1 -1
0 -1
1 0
0 0
-1 0
See Also rowexch, daugment, dcovary, hadamard, fullfact, ff2n
corrcoef
Purpose Correlation coefficients.
Syntax R = corrcoef(X)
Description R = corrcoef(X) returns a matrix of correlation coefficients calculated from
an input matrix whose rows are observations and whose columns are variables.
Element i,j of the matrix R is related to the corresponding element of the
covariance matrix C = cov(X) by
$$R(i,j) = \frac{C(i,j)}{\sqrt{C(i,i)\,C(j,j)}}$$
The corrcoef function is part of the standard MATLAB language.
See Also cov, mean, std, var
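The relation between R and C can be verified numerically with base MATLAB functions. The data matrix below is made up for illustration.

```matlab
% Build the correlation matrix from the covariance matrix:
%   R(i,j) = C(i,j) / sqrt(C(i,i) * C(j,j))
X = [1 2 4; 2 1 3; 3 5 5; 4 3 9; 5 6 8];   % 5 observations of 3 variables
C = cov(X);
d = sqrt(diag(C));        % standard deviations of the columns
R = C ./ (d * d');        % divide each C(i,j) by d(i)*d(j)

% Largest difference from the built-in result.
maxDiff = max(max(abs(R - corrcoef(X))));
```

The diagonal of R is all ones because each variable is perfectly correlated with itself.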
cov
Purpose Covariance matrix.
Syntax C = cov(X)
C = cov(x,y)
Description C = cov(X) computes the covariance matrix. For a single vector, cov(x)
returns a scalar containing the variance. For matrices, where each row is an
observation, and each column a variable, cov(X) is the covariance matrix.
The variance function, var(X) is the same as diag(cov(X)).
The standard deviation function, std(X) is equivalent to sqrt(diag(cov(X))).
cov(x,y), where x and y are column vectors of equal length, gives the same
result as cov([x y]).
The cov function is part of the standard MATLAB language.
Algorithm The algorithm for cov is
[n,p] = size(X);
X = X - ones(n,1)*mean(X);
Y = X'*X/(n-1);
See Also corrcoef, mean, std, var
xcov, xcorr (Signal Processing Toolbox)
crosstab
Purpose Cross-tabulation of several vectors.
Syntax table = crosstab(col1,col2)
table = crosstab(col1,col2,col3,...)
[table,chi2,p] = crosstab(col1,col2)
[table,chi2,p,label] = crosstab(col1,col2)
Description table = crosstab(col1,col2) takes two vectors of positive integers and
returns a matrix, table, of cross-tabulations. The ijth element of table
contains the count of all instances where col1 = i and col2 = j.
Alternatively, col1 and col2 can be vectors containing noninteger values,
character arrays, or cell arrays of strings. crosstab implicitly assigns a
positive integer group number to each distinct value in col1 and col2, and
creates a cross-tabulation using those numbers.
table = crosstab(col1,col2,col3,...) returns table as an n-dimensional
array, where n is the number of arguments you supply. The value of
table(i,j,k,...) is the count of all instances where col1 = i, col2 = j,
col3 = k, and so on.
[table,chi2,p] = crosstab(col1,col2) also returns the chi-square statistic,
chi2, for testing the independence of the rows and columns of table. The
scalar p is the significance level of the test. Values of p near zero cast doubt on
the assumption of independence of the rows and columns of table.
[table,chi2,p,label] = crosstab(col1,col2) also returns a cell array
label that has one column for each input argument. The value in label(i,j)
is the value of colj that defines group i in the jth dimension.
Examples Example 1
We generate 2 columns of 50 discrete uniform random numbers. The first
column has numbers from 1 to 3. The second has only the numbers 1 and 2. The
two columns are independent so we would be surprised if p were near zero.
r1 = unidrnd(3,50,1);
r2 = unidrnd(2,50,1);
[table,chi2,p] = crosstab(r1,r2)
table =
10 5
8 8
6 13
chi2 =
4.1723
p =
0.1242
The result, 0.1242, is not a surprise. A very small value of p would make us
suspect the randomness of the random number generator.
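The values of chi2 and p in this example can be reproduced from the table alone with the usual Pearson chi-square calculation. The sketch below is written in base MATLAB and assumes that this is the statistic being reported, which is consistent with the numbers shown; the variable names are illustrative.

```matlab
% Pearson chi-square test of independence for the table in Example 1.
counts = [10 5; 8 8; 6 13];                      % observed counts
N = sum(counts(:));
expected = sum(counts,2) * sum(counts,1) / N;    % row total * column total / N
chi2 = sum(sum((counts - expected).^2 ./ expected));
df = (size(counts,1) - 1) * (size(counts,2) - 1);

% Upper tail probability of the chi-square distribution with df degrees
% of freedom, written with the incomplete gamma function.
p = 1 - gammainc(chi2/2, df/2);
```

With these counts the statistic is about 4.1723 on 2 degrees of freedom, and p is about 0.1242.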
Example 2
We have data collected on several cars over a period of time. How many
four-cylinder cars were made in the USA during the late part of this period?
[t,c,p,l] = crosstab(cyl4,when,org);
l
l =
'Other' 'Early' 'USA'
'Four' 'Mid' 'Europe'
[] 'Late' 'Japan'
t(2,3,1)
ans =
38
See Also tabulate
daugment
Purpose D-optimal augmentation of an experimental design.
Syntax settings = daugment(startdes,nruns)
[settings,X] = daugment(startdes,nruns,'model')
Description settings = daugment(startdes,nruns) augments an initial experimental
design, startdes, with nruns new tests.
[settings,X] = daugment(startdes,nruns,'model') also supplies the
design matrix, X. The input, 'model', controls the order of the regression
model. By default, daugment assumes a linear additive model. Alternatively,
'model' can be any of these:
'interaction' includes constant, linear, and cross product terms.
'quadratic' includes interactions plus squared terms.
'purequadratic' includes constant, linear, and squared terms.
daugment uses the coordinate exchange algorithm.
Example We add 5 runs to a $2^2$ factorial design to allow us to fit a quadratic model.
startdes = [-1 -1; 1 -1; -1 1; 1 1];
settings = daugment(startdes,5,'quadratic')
settings =
-1 -1
1 -1
-1 1
1 1
1 0
-1 0
0 1
0 0
0 -1
The result is a $3^2$ factorial design.
See Also cordexch, dcovary, rowexch
dcovary
Purpose D-optimal design with specified fixed covariates.
Syntax settings = dcovary(factors,covariates)
[settings,X] = dcovary(factors,covariates,'model')
Description settings = dcovary(factors,covariates,'model') creates a D-optimal
design subject to the constraint of fixed covariates for each run. factors is
the number of experimental variables you want to control.
[settings,X] = dcovary(factors,covariates,'model') also creates the
associated design matrix, X. The input, 'model', controls the order of the
regression model. By default, dcovary assumes a linear additive model.
Alternatively, 'model' can be any of these:
'quadratic' includes interactions plus squared terms.
'purequadratic' includes constant, linear, and squared terms.
Example Suppose we want to block an eight run experiment into 4 blocks of size 2 to fit
a linear model on two factors.
covariates = dummyvar([1 1 2 2 3 3 4 4]);
settings = dcovary(2,covariates(:,1:3),'linear')
settings =
1 1 1 0 0
-1 -1 1 0 0
-1 1 0 1 0
1 -1 0 1 0
1 1 0 0 1
-1 -1 0 0 1
-1 1 0 0 0
1 -1 0 0 0
The first two columns of the output matrix contain the settings for the two
factors. The last three columns are dummy variable codings for the four blocks.
See Also daugment, cordexch
dendrogram
Purpose Plot dendrogram graphs.
Syntax H = dendrogram(Z)
H = dendrogram(Z,p)
[H,T] = dendrogram(...)
Description H = dendrogram(Z) generates a dendrogram plot of the hierarchical, binary
cluster tree, Z. Z is an (m-1)-by-3 matrix, generated by the linkage function,
where m is the number of objects in the original dataset.
A dendrogram consists of many upside-down, U-shaped lines connecting
objects in a hierarchical tree. Except for the Ward linkage (see linkage), the
height of each U represents the distance between the two objects being
connected. The output, H, is a vector of line handles.
H = dendrogram(Z,p) generates a dendrogram with only the top p nodes. By
default, dendrogram uses 30 as the value of p. When there are more than 30
initial nodes, a dendrogram may look crowded. To display every node, set p = 0.
[H,T] = dendrogram(...) generates a dendrogram and returns T, a vector of
size m that contains the cluster number for each object in the original dataset.
T provides access to the nodes of a cluster hierarchy that are not displayed in
the dendrogram because they fall below the cutoff value p. For example, to find
out which objects are contained in leaf node k of the dendrogram, use
find(T==k). Leaf nodes are the nodes at the bottom of the dendrogram that
have no other nodes below them.
When there are fewer than p objects in the original data, all objects are
displayed in the dendrogram. In this case, T is the identical map, i.e.,
T = (1:m)', where each node contains only itself.
Example X = rand(100,2);
Y = pdist(X,'cityblock');
Z = linkage(Y,'average');
[H,T] = dendrogram(Z);
find(T==20)
ans =
20
49
62
65
73
96
This output indicates that leaf node 20 in the dendrogram contains the original
data points 20, 49, 62, 65, 73, and 96.
See Also cluster, clusterdata, cophenet, inconsistent, linkage, pdist, squareform
[Figure: dendrogram of the example data showing 30 leaf nodes. Vertical axis: link height, from 0 to 0.7.]
disttool
Purpose Interactive graph of cdf (or pdf) for many probability distributions.
Syntax disttool
Description The disttool command displays a graphical user interface for exploring the
effects of changing parameters on the plot of a cdf or pdf. Clicking and dragging
a vertical line on the plot allows you to interactively evaluate the function over
its entire domain.
Evaluate the plotted function by typing a value in the x-axis edit box or
dragging the vertical reference line on the plot. For cdfs, you can evaluate the
inverse function by typing a value in the y-axis edit box or dragging the
horizontal reference line on the plot. The shape of the pointer changes from an
arrow to a crosshair when it is over the vertical or horizontal line to indicate
that the reference line is draggable.
To change the distribution function, choose an option from the menu of
functions at the top left of the figure. To change from cdfs to pdfs, choose an
option from the menu at the top right of the figure.
To change the parameter settings, move the sliders or type a value in the edit
box under the name of the parameter. To change the limits of a parameter, type
a value in the edit box at the top or bottom of the parameter slider.
To close the tool, press the Close button.
See Also randtool
dummyvar
Purpose Matrix of 0-1 "dummy" variables.
Syntax D = dummyvar(group)
Description D = dummyvar(group) generates a matrix, D, of 0-1 columns. D has one column
for each unique value in each column of the matrix group. Each column of
group contains positive integers that indicate the group membership of an
individual row.
Example Suppose we are studying the effects of two machines and three operators on a
process. The first column of group would have the values 1 or 2 depending on
which machine was used. The second column of group would have the values
1, 2, or 3 depending on which operator ran the machine.
group = [1 1;1 2;1 3;2 1;2 2;2 3];
D = dummyvar(group)
D =
1 0 1 0 0
1 0 0 1 0
1 0 0 0 1
0 1 1 0 0
0 1 0 1 0
0 1 0 0 1
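The structure of D can be reproduced by hand, which makes the column layout explicit: the first two columns encode the machine and the last three encode the operator. The following is a minimal sketch using only core MATLAB functions. It assumes the levels in each column of group run from 1 to the column maximum, and it is not a description of how dummyvar is implemented.

```matlab
group = [1 1;1 2;1 3;2 1;2 2;2 3];

D = [];
for j = 1:size(group,2)
    nLevels = max(group(:,j));              % levels assumed to be 1..nLevels
    Dj = zeros(size(group,1), nLevels);
    for level = 1:nLevels
        Dj(:,level) = (group(:,j) == level); % 1 where the row is in this level
    end
    D = [D Dj];                             % append the block for column j
end
disp(D)
```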
See Also pinv, regress
errorbar
Purpose Plot error bars along a curve.
Syntax errorbar(X,Y,L,U,symbol)
errorbar(X,Y,L)
errorbar(Y,L)
Description errorbar(X,Y,L,U,symbol) plots X versus Y with error bars specified by L
and U. X, Y, L, and U must be the same length. If X, Y, L, and U are matrices, then
each column produces a separate line. The error bars are each drawn a distance
of U(i) above and L(i) below the points in (X,Y). symbol is a string that
controls the line type, plotting symbol, and color of the error bars.
errorbar(X,Y,L) plots X versus Y with symmetric error bars about Y.
errorbar(Y,L) plots Y with error bars [Y-L Y+L].
The errorbar function is a part of the standard MATLAB language.
Example lambda = (0.1:0.2:0.5);
r = poissrnd(lambda(ones(50,1),:));
[p,pci] = poissfit(r,0.001);
L = p - pci(1,:)
U = pci(2,:) - p
errorbar(1:3,p,L,U,'+')
L =
0.1200 0.1600 0.2600
U =
0.2000 0.2200 0.3400
ewmaplot
Purpose Exponentially Weighted Moving Average (EWMA) chart for Statistical Process
Control (SPC).
Syntax ewmaplot(data)
ewmaplot(data,lambda)
ewmaplot(data,lambda,alpha)
ewmaplot(data,lambda,alpha,specs)
h = ewmaplot(...)
Description ewmaplot(data) produces an EWMA chart of the grouped responses in data.
The rows of data contain replicate observations taken at a given time. The rows
should be in time order.
ewmaplot(data,lambda) produces an EWMA chart of the grouped responses in
data, and specifies how much the current prediction is influenced by past
observations. Higher values of lambda give less weight to past observations
and more weight to the current observation. By default, lambda = 0.4;
lambda must be between 0 and 1.
ewmaplot(data,lambda,alpha) produces an EWMA chart of the grouped
responses in data, and specifies the significance level of the upper and lower
plotted confidence limits. alpha is 0.0027 by default. This value produces
three-sigma limits:
norminv(1-0.0027/2)
ans =
3
To get k-sigma limits, use the expression 2*(1-normcdf(k)). For example, the
correct alpha value for 2-sigma limits is 0.0455, as shown below.
k = 2;
2*(1-normcdf(k))
ans =
0.0455
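The same conversion can be tabulated for several values of k without the toolbox. This is a minimal sketch that relies on the identity 2*(1-normcdf(k)) = erfc(k/sqrt(2)), which follows from the definition of the standard normal cdf; erfc is a core MATLAB function.

```matlab
k = 1:3;                       % number of sigmas for the control limits
alpha = erfc(k ./ sqrt(2));    % equivalent to 2*(1-normcdf(k))

for i = 1:numel(k)
    fprintf('%d-sigma limits: alpha = %.4f\n', k(i), alpha(i));
end
```

For k = 2 and k = 3 this gives the values 0.0455 and 0.0027 quoted above.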
ewmaplot(data,lambda,alpha,specs) produces an EWMA chart of the
grouped responses in data, and specifies a two-element vector, specs, for the
lower and upper specification limits of the response.
h = ewmaplot(...) returns a vector of handles to the plotted lines.
Example Consider a process with a slowly drifting mean. An EWMA chart is preferable
to an x-bar chart for monitoring this kind of process. The simulation below
demonstrates an EWMA chart for a slow linear drift.
t = (1:28)';
r = normrnd(10+0.02*t(:,ones(4,1)),0.5);
ewmaplot(r,0.4,0.01,[9.75 10.75])
The EWMA value for group 28 is higher than would be expected purely by
chance. If we had been monitoring this process continuously, we would have
detected the drift when group 28 was collected, and we would have had an
opportunity to investigate its cause.
Reference Montgomery, D., Introduction to Statistical Quality Control, John Wiley &
Sons 1991. p. 299.
See Also xbarplot, schart
[Figure: Exponentially Weighted Moving Average (EWMA) Chart. Horizontal axis: Sample Number (0 to 30). Vertical axis: EWMA (9.6 to 10.8). The chart shows the CL, UCL, LCL, USL, and LSL lines, with group 28 flagged.]
expcdf
Purpose Exponential cumulative distribution function (cdf).
Syntax P = expcdf(X,MU)
Description P = expcdf(X,MU) computes the exponential cdf at each of the values in X
using the corresponding parameters in MU. Vector or matrix inputs for X and MU
must have the same size. A scalar input is expanded to a constant matrix with
the same dimensions as the other input. The parameters in MU must be positive.
The exponential cdf is
$$p = F(x \mid \mu) = \int_0^x \frac{1}{\mu} e^{-t/\mu} \, dt = 1 - e^{-x/\mu}$$
The result, p, is the probability that a single observation from an exponential
distribution will fall in the interval [0 x].
Examples The median of the exponential distribution is $\mu \log(2)$. Demonstrate this fact.
mu = 10:10:60;
p = expcdf(log(2)*mu,mu)
p =
0.5000 0.5000 0.5000 0.5000 0.5000 0.5000
What is the probability that an exponential random variable will be less than
or equal to the mean, $\mu$?
mu = 1:6;
x = mu;
p = expcdf(x,mu)
p =
0.6321 0.6321 0.6321 0.6321 0.6321 0.6321
See Also cdf, expfit, expinv, exppdf, exprnd, expstat
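Both expcdf examples follow directly from the closed form of the exponential cdf, 1 - exp(-x/mu). The following is a minimal sketch that evaluates that closed form with core MATLAB functions only; the name expcdf_sketch is a hypothetical helper introduced for illustration, not a toolbox function.

```matlab
% Closed form of the exponential cdf (hypothetical helper, not expcdf itself).
expcdf_sketch = @(x,mu) 1 - exp(-x ./ mu);

mu = 10:10:60;
p_median = expcdf_sketch(log(2)*mu, mu);   % cdf at mu*log(2): always 0.5
p_mean   = expcdf_sketch(mu, mu);          % cdf at the mean: 1 - exp(-1)

disp(p_median)
disp(p_mean)
```

The second result does not depend on mu because the cdf evaluated at x = mu reduces to 1 - exp(-1), which is approximately 0.6321.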
expfit
Purpose Parameter estimates and confidence intervals for exponential data.
Syntax muhat = expfit(x)
[muhat,muci] = expfit(x)
[muhat,muci] = expfit(x,alpha)
Description muhat = expfit(x) returns the estimate of the parameter, $\mu$, of the
exponential distribution given data x.
[muhat,muci] = expfit(x) also returns the 95% confidence interval in muci.
[muhat,muci] = expfit(x,alpha) gives 100(1-alpha)% confidence
intervals.
Example We generate 100 independent samples of exponential data with $\mu$ = 3. muhat is
an estimate of true_mu and muci is a 99% confidence interval around muhat.
Notice that muci contains true_mu.
true_mu = 3;
r = exprnd(true_mu,100,1);
[muhat,muci] = expfit(r,0.01)
muhat =
2.8835
muci =
2.1949
3.6803
See Also expcdf, expinv, exppdf, exprnd, expstat, betafit, binofit, gamfit, normfit,
poissfit, unifit, weibfit
expinv
Purpose Inverse of the exponential cumulative distribution function (cdf).
Syntax X = expinv(P,MU)
Description X = expinv(P,MU) computes the inverse of the exponential cdf with
parameters specified by MU for the corresponding probabilities in P. Vector or
matrix inputs for P and MU must have the same size. A scalar input is expanded
to a constant matrix with the same dimensions as the other input. The
parameters in MU must be positive and the values in P must lie on the interval
[0 1].
The inverse of the exponential cdf is
$$x = F^{-1}(p \mid \mu) = -\mu \ln(1 - p)$$
The result, x, is the value such that an observation from an exponential
distribution with parameter $\mu$ will fall in the range [0 x] with probability p.
Examples Let the lifetime of light bulbs be exponentially distributed with $\mu$ = 700 hours.
What is the median lifetime of a bulb?
expinv(0.50,700)
ans =
485.2030
So, suppose you buy a box of "700 hour" light bulbs. If 700 hours is the mean
life of the bulbs, then half of them will burn out in less than 500 hours.
See Also expcdf, expfit, exppdf, exprnd, expstat, icdf
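The median lifetime in the expinv example can be checked against the closed form of the inverse cdf, x = -mu*log(1-p). The following is a minimal sketch using core MATLAB functions only; expinv_sketch is a hypothetical helper name, not a toolbox function.

```matlab
% Closed form of the inverse exponential cdf (hypothetical helper).
expinv_sketch = @(p,mu) -mu .* log(1 - p);

mu = 700;                          % mean bulb lifetime in hours
x_median = expinv_sketch(0.50, mu);

% Round trip: applying the cdf to the result recovers the probability.
p_back = 1 - exp(-x_median / mu);

fprintf('median = %.4f hours, cdf at median = %.4f\n', x_median, p_back);
```

The median equals 700*log(2), which is why it falls below the mean of 700 hours.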
exppdf
Purpose Exponential probability density function (pdf).
Syntax Y = exppdf(X,MU)
Description exppdf(X,MU) computes the exponential pdf at each of the values in X using the
corresponding parameters in MU. Vector or matrix inputs for X and MU must be
the same size. A scalar input is expanded to a constant matrix with the same
dimensions as the other input. The parameters in MU must be positive.
The exponential pdf is
$$y = f(x \mid \mu) = \frac{1}{\mu} e^{-x/\mu}$$
The exponential pdf is the gamma pdf with its first parameter equal to 1.
The exponential distribution is appropriate for modeling waiting times when
the probability of waiting an additional period of time is independent of how
long you've already waited. For example, the probability that a light bulb will
burn out in its next minute of use is relatively independent of how many
minutes it has already burned.
Examples y = exppdf(5,1:5)
y =
0.0067 0.0410 0.0630 0.0716 0.0736
y = exppdf(1:5,1:5)
y =
0.3679 0.1839 0.1226 0.0920 0.0736
See Also expcdf, expfit, expinv, exprnd, expstat, pdf
exprnd
Purpose Random numbers from the exponential distribution.
Syntax R = exprnd(MU)
R = exprnd(MU,m)
R = exprnd(MU,m,n)
Description R = exprnd(MU) generates exponential random numbers with mean MU. The
size of R is the size of MU.
R = exprnd(MU,m) generates exponential random numbers with mean MU,
where m is a 1-by-2 vector that contains the row and column dimensions of R.
R = exprnd(MU,m,n) generates exponential random numbers with mean MU,
where scalars m and n are the row and column dimensions of R.
Examples n1 = exprnd(5:10)
n1 =
7.5943 18.3400 2.7113 3.0936 0.6078 9.5841
n2 = exprnd(5:10,[1 6])
n2 =
3.2752 1.1110 23.5530 23.4303 5.7190 3.9876
n3 = exprnd(5,2,3)
n3 =
24.3339 13.5271 1.8788
4.7932 4.3675 2.6468
See Also expcdf, expfit, expinv, exppdf, expstat
expstat
Purpose Mean and variance for the exponential distribution.
Syntax [M,V] = expstat(MU)
Description [M,V] = expstat(MU) returns the mean and variance for the exponential
distribution with parameters MU. The mean of the exponential distribution is $\mu$,
and the variance is $\mu^2$.
Examples [m,v] = expstat([1 10 100 1000])
m =
1 10 100 1000
v =
1 100 10000 1000000
See Also expcdf, expfit, expinv, exppdf, exprnd
fcdf
Purpose F cumulative distribution function (cdf).
Syntax P = fcdf(X,V1,V2)
Description P = fcdf(X,V1,V2) computes the F cdf at each of the values in X using the
corresponding parameters in V1 and V2. Vector or matrix inputs for X, V1, and
V2 must all be the same size. A scalar input is expanded to a constant matrix
with the same dimensions as the other inputs. The parameters in V1 and V2
must be positive integers.
The F cdf is
$$p = F(x \mid \nu_1,\nu_2) = \int_0^x \frac{\Gamma\left[\frac{\nu_1+\nu_2}{2}\right]}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\frac{\nu_1}{2}} \frac{t^{\frac{\nu_1-2}{2}}}{\left[1+\left(\frac{\nu_1}{\nu_2}\right)t\right]^{\frac{\nu_1+\nu_2}{2}}} \, dt$$
The result, p, is the probability that a single observation from an F distribution
with parameters $\nu_1$ and $\nu_2$ will fall in the interval [0 x].
Examples This example illustrates an important and useful mathematical identity for the
F distribution.
nu1 = 1:5;
nu2 = 6:10;
x = 2:6;
F1 = fcdf(x,nu1,nu2)
F1 =
0.7930 0.8854 0.9481 0.9788 0.9919
F2 = 1 - fcdf(1./x,nu2,nu1)
F2 =
0.7930 0.8854 0.9481 0.9788 0.9919
See Also cdf, finv, fpdf, frnd, fstat
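The identity in the fcdf example, F(x | nu1, nu2) = 1 - F(1/x | nu2, nu1), can also be checked without the toolbox. The following is a minimal sketch that assumes the standard relation between the F cdf and the regularized incomplete beta function, evaluated with the core MATLAB function betainc. The helper name fcdf_sketch is hypothetical, and this is not a description of how fcdf is implemented.

```matlab
% F cdf via the regularized incomplete beta function (hypothetical helper):
% F(x | v1, v2) = I_z(v1/2, v2/2) with z = v1*x / (v1*x + v2).
fcdf_sketch = @(x,v1,v2) betainc(v1.*x ./ (v1.*x + v2), v1/2, v2/2);

nu1 = 1:5;
nu2 = 6:10;
x = 2:6;

F1 = fcdf_sketch(x, nu1, nu2);
F2 = 1 - fcdf_sketch(1./x, nu2, nu1);   % degrees of freedom swapped

disp([F1; F2])
```

The two rows agree because the reciprocal of an F random variable with (nu1, nu2) degrees of freedom is an F random variable with (nu2, nu1) degrees of freedom.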
ff2n
Purpose Two-level full-factorial designs.
Syntax X = ff2n(n)
Description X = ff2n(n) creates a two-level full-factorial design, X, where n is the desired
number of columns of X. The number of rows in X is $2^n$.
Example X = ff2n(3)
X =
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
X is the binary representation of the numbers from 0 to $2^n - 1$.
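The binary-representation property gives a direct way to build the same matrix with core MATLAB functions. This is a minimal sketch of that observation, not a description of how ff2n is implemented.

```matlab
n = 3;

% dec2bin returns one row of '0'/'1' characters per number;
% subtracting '0' converts the characters to numeric 0/1 values.
X = dec2bin(0:2^n-1, n) - '0';

disp(X)
```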
See Also fracfact, fullfact
finv
Purpose Inverse of the F cumulative distribution function (cdf).
Syntax X = finv(P,V1,V2)
Description X = finv(P,V1,V2) computes the inverse of the F cdf with numerator degrees
of freedom V1 and denominator degrees of freedom V2 for the corresponding
probabilities in P. Vector or matrix inputs for P, V1, and V2 must all be the same
size. A scalar input is expanded to a constant matrix with the same dimensions
as the other inputs.
The parameters in V1 and V2 must all be positive integers, and the values in P
must lie on the interval [0 1].
The F inverse function is defined in terms of the F cdf as
$$x = F^{-1}(p \mid \nu_1,\nu_2) = \{ x : F(x \mid \nu_1,\nu_2) = p \}$$
where
$$p = F(x \mid \nu_1,\nu_2) = \int_0^x \frac{\Gamma\left[\frac{\nu_1+\nu_2}{2}\right]}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\frac{\nu_1}{2}} \frac{t^{\frac{\nu_1-2}{2}}}{\left[1+\left(\frac{\nu_1}{\nu_2}\right)t\right]^{\frac{\nu_1+\nu_2}{2}}} \, dt$$
Examples Find a value that should exceed 95% of the samples from an F distribution with
5 degrees of freedom in the numerator and 10 degrees of freedom in the
denominator.
x = finv(0.95,5,10)
x =
3.3258
You would observe values greater than 3.3258 only 5% of the time by chance.
See Also fcdf, fpdf, frnd, fstat, icdf
fpdf
Purpose F probability density function (pdf).
Syntax Y = fpdf(X,V1,V2)
Description Y = fpdf(X,V1,V2) computes the F pdf at each of the values in X using the
corresponding parameters in V1 and V2. Vector or matrix inputs for X, V1,
and V2 must all be the same size. A scalar input is expanded to a constant
matrix with the same dimensions as the other inputs. The parameters in V1
and V2 must all be positive integers, and the values in X must lie on the interval
[0 ∞).
The probability density function for the F distribution is
$$y = f(x \mid \nu_1,\nu_2) = \frac{\Gamma\left[\frac{\nu_1+\nu_2}{2}\right]}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\frac{\nu_1}{2}} \frac{x^{\frac{\nu_1-2}{2}}}{\left[1+\left(\frac{\nu_1}{\nu_2}\right)x\right]^{\frac{\nu_1+\nu_2}{2}}}$$
Examples y = fpdf(1:6,2,2)
y =
0.2500 0.1111 0.0625 0.0400 0.0278 0.0204
z = fpdf(3,5:10,5:10)
z =
0.0689 0.0659 0.0620 0.0577 0.0532 0.0487
See Also fcdf, finv, frnd, fstat, pdf
fracfact
Purpose Generate fractional factorial design from generators.
Syntax x = fracfact('gen')
[x,conf] = fracfact('gen')
Description x = fracfact('gen') generates a fractional factorial design as specified by
the generator string gen, and returns a matrix x of design points. The input
string gen is a generator string consisting of "words" separated by spaces. Each
word describes how a column of the output design should be formed from
columns of a full factorial. Typically gen will include single-letter words for the
first few factors, plus additional multiple-letter words describing how the
remaining factors are confounded with the first few.
The output matrix x is a fraction of a two-level full-factorial design. Suppose
there are m words in gen, and that each word is formed from a subset of the
first n letters of the alphabet. The output matrix x has $2^n$ rows and m columns.
Let F represent the two-level full-factorial design as produced by ff2n(n). The
values in column j of x are computed by multiplying together the columns of F
corresponding to letters that appear in the jth word of the generator string.
[x,conf] = fracfact('gen') also returns a cell array, conf, that describes
the confounding pattern among the main effects and all two-factor
interactions.
Example 1
We want to run an experiment to study the effects of four factors on a response,
but we can only afford eight runs. (A run is a single repetition of the experiment
at a specified combination of factor values.) Our goal is to determine which
factors affect the response. There may be interactions between some pairs of
factors.
A total of sixteen runs would be required to test all factor combinations.
However, if we are willing to assume there are no three-factor interactions, we
can estimate the main factor effects in just eight runs.
[x,conf] = fracfact('a b c abc')
x =
-1 -1 -1 -1
-1 -1 1 1
-1 1 -1 1
-1 1 1 -1
1 -1 -1 1
1 -1 1 -1
1 1 -1 -1
1 1 1 1
conf =
'Term' 'Generator' 'Confounding'
'X1' 'a' 'X1'
'X2' 'b' 'X2'
'X3' 'c' 'X3'
'X4' 'abc' 'X4'
'X1*X2' 'ab' 'X1*X2 + X3*X4'
'X1*X3' 'ac' 'X1*X3 + X2*X4'
'X1*X4' 'bc' 'X1*X4 + X2*X3'
'X2*X3' 'bc' 'X1*X4 + X2*X3'
'X2*X4' 'ac' 'X1*X3 + X2*X4'
'X3*X4' 'ab' 'X1*X2 + X3*X4'
The first three columns of the x matrix form a full-factorial design. The final
column is formed by multiplying the other three. The confounding pattern
shows that the main effects for all four factors are estimable, but the two-factor
interactions are not. For example, the X1*X2 and X3*X4 interactions are
confounded, so it is not possible to estimate their effects separately.
After conducting the experiment, we may find out that the 'ab' effect is
significant. In order to determine whether this effect comes from X1*X2 or
X3*X4 we would have to run the remaining eight runs. We can obtain those
runs by reversing the sign of the final generator.
fracfact('a b c -abc')
ans =
-1 -1 -1 1
-1 -1 1 -1
-1 1 -1 -1
-1 1 1 1
1 -1 -1 -1
1 -1 1 1
1 1 -1 1
1 1 1 -1
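The two half-fractions above can be rebuilt by hand, which makes the confounding visible. The following is a minimal sketch using core MATLAB functions only. It assumes the -1/+1 coding and row order shown in the outputs above, and it is not a description of how fracfact is implemented.

```matlab
n = 3;

% Full factorial in columns a, b, c, recoded from 0/1 to -1/+1.
F = 2*(dec2bin(0:2^n-1, n) - '0') - 1;

x1 = [F,  prod(F,2)];    % generators 'a b c abc'
x2 = [F, -prod(F,2)];    % generators 'a b c -abc'

% X1*X2 and X3*X4 give identical columns in x1, so they are confounded.
confounded = isequal(x1(:,1).*x1(:,2), x1(:,3).*x1(:,4));

% Together the two halves contain every one of the 2^4 factor combinations.
nDistinct = size(unique([x1; x2], 'rows'), 1);

fprintf('confounded: %d, distinct runs: %d\n', confounded, nDistinct);
```

In the second half the product X3*X4 changes sign relative to X1*X2, which is what allows the two interactions to be separated once all sixteen runs are available.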
Example 2
Suppose now we need to study the effects of eight factors. A full factorial would
require 256 runs. By clever choice of generators, we can find a sixteen-run
design that can estimate those eight effects with no confounding from
two-factor interactions.
[x,c] = fracfact('a b c d abc acd abd bcd');
c(1:10,:)
ans =
'X1' 'a' 'X1'
'X2' 'b' 'X2'
'X3' 'c' 'X3'
'X4' 'd' 'X4'
'X5' 'abc' 'X5'
'X6' 'acd' 'X6'
'X7' 'abd' 'X7'
'X8' 'bcd' 'X8'
'X1*X2' 'ab' 'X1*X2 + X3*X5 + X4*X7 + X6*X8'
This confounding pattern shows that the main effects are not confounded with
two-factor interactions. The final row shown reveals that a group of four
two-factor interactions is confounded. Other choices of generators would not
have the same desirable property.
[x,c] = fracfact('a b c d ab cd ad bc');
c(1:10,:)
ans =
'X1' 'a' 'X1 + X2*X5 + X4*X7'
'X2' 'b' 'X2 + X1*X5 + X3*X8'
'X3' 'c' 'X3 + X2*X8 + X4*X6'
'X4' 'd' 'X4 + X1*X7 + X3*X6'
'X5' 'ab' 'X5 + X1*X2'
'X6' 'cd' 'X6 + X3*X4'
'X7' 'ad' 'X7 + X1*X4'
'X8' 'bc' 'X8 + X2*X3'
'X1*X2' 'ab' 'X5 + X1*X2'
Here all the main effects are confounded with one or more two-factor
interactions.
References Box, G. E. P., W. G. Hunter, and J. S. Hunter (1978), Statistics for
Experimenters, Wiley, New York.
See Also ff2n, fullfact, hadamard
friedman
Purpose Friedman's nonparametric two-way Analysis of Variance (ANOVA).
Syntax p = friedman(X,reps)
p = friedman(X,reps,'displayopt')
[p,table] = friedman(...)
[p,table,stats] = friedman(...)
Description p = friedman(X,reps) performs the nonparametric Friedman's test to
compare the means of the columns of X. Friedman's test is similar to classical
two-way ANOVA, but it tests only for column effects after adjusting for possible
row effects. It does not test for row effects or interaction effects. Friedman's test
is appropriate when columns represent treatments that are under study, and
rows represent nuisance effects (blocks) that need to be taken into account but
are not of any interest.
The different columns represent changes in factor A. The different rows
represent changes in the blocking factor B. If there is more than one
observation for each combination of factors, input reps indicates the number of
replicates in each "cell," which must be constant.
The matrix below illustrates the format for a set-up where column factor A has
three levels, row factor B has two levels, and there are two replicates (reps=2).
The subscripts indicate row, column, and replicate, respectively.
$$\begin{bmatrix} x_{111} & x_{121} & x_{131} \\ x_{112} & x_{122} & x_{132} \\ x_{211} & x_{221} & x_{231} \\ x_{212} & x_{222} & x_{232} \end{bmatrix}$$
Friedman's test assumes a model of the form
$$x_{ijk} = \mu + \alpha_i + \beta_j + \epsilon_{ijk}$$
where $\mu$ is an overall location parameter, $\alpha_i$ represents the column effect,
$\beta_j$ represents the row effect, and $\epsilon_{ijk}$ represents the error. This test ranks the
data within each level of B, and tests for a difference across levels of A. The p
that friedman returns is the p-value for the null hypothesis that $\alpha_i = 0$. If the
p-value is near zero, this casts doubt on the null hypothesis. A sufficiently
small p-value suggests that at least one column-sample mean is significantly
different from the other column-sample means; i.e., there is a main effect due
to factor A. The choice of a limit for the p-value to determine whether a result
is "statistically significant" is left to the researcher. It is common to declare a
result significant if the p-value is less than 0.05 or 0.01.
friedman also displays a figure showing an ANOVA table, which divides the
variability of the ranks into two or three parts:
• The variability due to the differences among the column means
• The variability due to the interaction between rows and columns (if reps is
greater than its default value of 1)
In the ANOVA table, the fifth column shows Friedman's chi-square statistic and
the sixth column shows the p-value for the chi-square statistic.
p = friedman(X,reps,'displayopt') enables the ANOVA table display
when 'displayopt' is 'on' (default) and suppresses the display when
'displayopt' is 'off'.
[p,table] = friedman(...) returns the ANOVA table (including column and
row labels) in the cell array table. You can copy a text version of the ANOVA table
to the clipboard by selecting Copy Text from the Edit menu.
[p,table,stats] = friedman(...) returns a stats structure that you can
use to perform a follow-up multiple comparison test. The friedman test
evaluates the hypothesis that the column effects are all the same against the
alternative that they are not all the same. Sometimes it is preferable to
perform a test to determine which pairs of column effects are significantly
different, and which are not. You can use the multcompare function to perform
such tests by supplying the stats structure as input.
Examples Let's repeat the example from the anova2 function, this time applying
Friedman's test. Recall that the data below come from a study of popcorn
brands and popper type (Hogg 1987). The columns of the matrix popcorn are
brands (Gourmet, National, and Generic). The rows are popper type (Oil and
Air). The study popped a batch of each brand three times with each popper. The
values are the yield in cups of popped popcorn.
load popcorn
popcorn
popcorn =
5.5000 4.5000 3.5000
5.5000 4.5000 4.0000
6.0000 4.0000 3.0000
6.5000 5.0000 4.0000
7.0000 5.5000 5.0000
7.0000 5.0000 4.5000
p = friedman(popcorn,3)
p =
0.0010
The small p-value of 0.001 indicates the popcorn brand affects the yield of
popcorn. This is consistent with the results from anova2.
We could also test popper type by permuting the popcorn array as described in
"Friedman's Test" on page 1-97 and repeating the test.
References Hogg, R. V. and J. Ledolter. Engineering Statistics. MacMillan Publishing
Company, 1987.
Hollander, M. and D. A. Wolfe. Nonparametric Statistical Methods. Wiley,
1973.
See Also anova2, multcompare
frnd
Purpose Random numbers from the F distribution.
Syntax R = frnd(V1,V2)
R = frnd(V1,V2,m)
R = frnd(V1,V2,m,n)
Description R = frnd(V1,V2) generates random numbers from the F distribution with
numerator degrees of freedom V1 and denominator degrees of freedom V2.
Vector or matrix inputs for V1 and V2 must have the same size, which is also
the size of R. A scalar input for V1 or V2 is expanded to a constant matrix with
the same dimensions as the other input.
R = frnd(V1,V2,m) generates random numbers from the F distribution with
parameters V1 and V2, where m is a 1-by-2 vector that contains the row and
column dimensions of R.
R = frnd(V1,V2,m,n) generates random numbers from the F distribution
with parameters V1 and V2, where scalars m and n are the row and column
dimensions of R.
Examples n1 = frnd(1:6,1:6)
n1 =
0.0022 0.3121 3.0528 0.3189 0.2715 0.9539
n2 = frnd(2,2,[2 3])
n2 =
0.3186 0.9727 3.0268
0.2052 148.5816 0.2191
n3 = frnd([1 2 3;4 5 6],1,2,3)
n3 =
0.6233 0.2322 31.5458
2.5848 0.2121 4.4955
See Also fcdf, finv, fpdf, fstat
fstat
Purpose Mean and variance for the F distribution.
Syntax [M,V] = fstat(V1,V2)
Description [M,V] = fstat(V1,V2) returns the mean and variance for the F distribution
with parameters specified by V1 and V2. Vector or matrix inputs for V1 and V2
must have the same size, which is also the size of M and V. A scalar input for V1
or V2 is expanded to a constant matrix with the same dimensions as the other
input.
The mean of the F distribution for values of $\nu_2$ greater than 2 is
$$\frac{\nu_2}{\nu_2 - 2}$$
The variance of the F distribution for values of $\nu_2$ greater than 4 is
$$\frac{2\nu_2^2(\nu_1 + \nu_2 - 2)}{\nu_1(\nu_2 - 2)^2(\nu_2 - 4)}$$
The mean of the F distribution is undefined if $\nu_2$ is less than 3. The variance is
undefined for $\nu_2$ less than 5.
Examples fstat returns NaN when the mean and variance are undefined.
[m,v] = fstat(1:5,1:5)
m =
NaN NaN 3.0000 2.0000 1.6667
v =
NaN NaN NaN NaN 8.8889
See Also fcdf, finv, fpdf, frnd
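The values in the fstat example can be reproduced from the two formulas for the mean and variance. The following is a minimal sketch with core MATLAB functions only, setting the results to NaN where the formulas do not apply. It illustrates the formulas and is not a description of the fstat source code.

```matlab
v1 = 1:5;    % numerator degrees of freedom
v2 = 1:5;    % denominator degrees of freedom

% Mean: v2/(v2-2), defined only for v2 > 2.
m = v2 ./ (v2 - 2);
m(v2 <= 2) = NaN;

% Variance: 2*v2^2*(v1+v2-2) / (v1*(v2-2)^2*(v2-4)), defined only for v2 > 4.
v = 2*v2.^2 .* (v1 + v2 - 2) ./ (v1 .* (v2 - 2).^2 .* (v2 - 4));
v(v2 <= 4) = NaN;

disp(m)
disp(v)
```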
fsurfht
Purpose Interactive contour plot of a function.
Syntax fsurfht('fun',xlims,ylims)
fsurfht('fun',xlims,ylims,p1,p2,p3,p4,p5)
Description fsurfht('fun',xlims,ylims) is an interactive contour plot of the function
specified by the text variable fun. The x-axis limits are specified by xlims in
the form [xmin xmax], and the y-axis limits are specified by ylims in the form
[ymin ymax].
fsurfht('fun',xlims,ylims,p1,p2,p3,p4,p5) allows for five optional
parameters that you can supply to the function fun.
The intersection of the vertical and horizontal reference lines on the plot
defines the current x-value and y-value. You can drag these reference lines and
watch the calculated z-values (at the top of the plot) update simultaneously.
Alternatively, you can type the x-value and y-value into editable text fields on
the x-axis and y-axis.
Example Plot the Gaussian likelihood function for the gas.mat data.
load gas
Create a function containing the following commands, and name it
gauslike.m.
function z = gauslike(mu,sigma,p1)
n = length(p1);
z = ones(size(mu));
for i = 1:n
z = z .* (normpdf(p1(i),mu,sigma));
end
The gauslike function calls normpdf, treating the data sample as fixed and the
parameters $\mu$ and $\sigma$ as variables. Assume that the gas prices are normally
distributed, and plot the likelihood surface of the sample.
fsurfht('gauslike',[112 118],[3 5],price1)
The sample mean is the x-value at the maximum, but the sample standard
deviation is not the y-value at the maximum.
mumax = mean(price1)
mumax =
115.1500
sigmamax = std(price1)*sqrt(19/20)
sigmamax =
3.7719
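The factor sqrt(19/20) in the fsurfht example converts the sample standard deviation, which divides by n-1, into the maximum likelihood estimate of the standard deviation, which divides by n (here n = 20). The following is a minimal sketch of that relationship with core MATLAB functions only. The data vector is hypothetical and is not the price1 sample from gas.mat.

```matlab
% Hypothetical sample (not the gas.mat data).
x = [119 117 115 116 112 121 115 122 116 118]';
n = numel(x);

% Maximum likelihood estimate of sigma: root mean squared deviation.
sigma_mle = sqrt(mean((x - mean(x)).^2));

% Same value obtained by rescaling the usual sample standard deviation.
sigma_scaled = std(x) * sqrt((n-1)/n);

fprintf('%.6f  %.6f\n', sigma_mle, sigma_scaled);
```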
fullfact
Purpose Full-factorial experimental design.
Syntax design = fullfact(levels)
Description design = fullfact(levels) gives the factor settings for a full factorial design.
Each element in the vector levels specifies the number of unique values in the
corresponding column of design.
For example, if the first element of levels is 3, then the first column of design
contains only integers from 1 to 3.
Example If levels = [2 4], fullfact generates an eight-run design with two levels in
the first column and four in the second column.
d = fullfact([2 4])
d =
1 1
2 1
1 2
2 2
1 3
2 3
1 4
2 4
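The same eight-run layout can be generated with the core MATLAB function ndgrid, which shows that the first factor varies fastest. This is a minimal sketch for the two-factor case only, not a description of how fullfact is implemented.

```matlab
levels = [2 4];

% ndgrid enumerates every combination; the first output varies fastest
% when the grids are read out column by column.
[f1, f2] = ndgrid(1:levels(1), 1:levels(2));
d = [f1(:) f2(:)];

disp(d)
```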
See Also ff2n, dcovary, daugment, cordexch
gamcdf
Purpose Gamma cumulative distribution function (cdf).
Syntax P = gamcdf(X,A,B)
Description gamcdf(X,A,B) computes the gamma cdf at each of the values in X using the
corresponding parameters in A and B. Vector or matrix inputs for X, A, and B
must all be the same size. A scalar input is expanded to a constant matrix with
the same dimensions as the other inputs. The parameters in A and B must be
positive.
The gamma cdf is
$$p = F(x \mid a,b) = \frac{1}{b^a \Gamma(a)} \int_0^x t^{a-1} e^{-t/b} \, dt$$
The result, p, is the probability that a single observation from a gamma
distribution with parameters a and b will fall in the interval [0 x].
gammainc is the gamma distribution with b fixed at 1.
Examples a = 1:6;
b = 5:10;
prob = gamcdf(a.*b,a,b)
prob =
0.6321 0.5940 0.5768 0.5665 0.5595 0.5543
The mean of the gamma distribution is the product of the parameters, ab. In
this example, the mean approaches the median as it increases (i.e., the
distribution becomes more symmetric).
See Also cdf, gamfit, gaminv, gamlike, gampdf, gamrnd, gamstat
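Because gammainc corresponds to the gamma cdf with b fixed at 1, the gamcdf example can be reproduced by rescaling x by b. The following is a minimal sketch with the core MATLAB function gammainc. It assumes the relation gamcdf(x,a,b) = gammainc(x./b,a), which follows from substituting u = t/b in the integral that defines the cdf.

```matlab
a = 1:6;
b = 5:10;
x = a.*b;                    % evaluate the cdf at the mean, a*b

% Gamma cdf with scale b, written in terms of the b = 1 case.
prob = gammainc(x./b, a);

disp(prob)
```

Since x./b equals a here, the result depends only on a, which explains why the first value matches the exponential result 1 - exp(-1).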
gamfit
Purpose Parameter estimates and confidence intervals for gamma distributed data.
Syntax phat = gamfit(x)
[phat,pci] = gamfit(x)
[phat,pci] = gamfit(x,alpha)
Description phat = gamfit(x) returns the maximum likelihood estimates (MLEs) for the
parameters of the gamma distribution given the data in vector x.
[phat,pci] = gamfit(x) returns MLEs and 95% confidence
intervals. The first row of pci is the lower bound of the confidence intervals;
the last row is the upper bound.
[phat,pci] = gamfit(x,alpha) returns 100(1-alpha)% confidence
intervals.
Example Note that the 95% confidence intervals in the example below bracket the true
parameter values of 2 and 4.
a = 2; b = 4;
r = gamrnd(a,b,100,1);
[p,ci] = gamfit(r)
p =
2.1990 3.7426
ci =
1.6840 2.8298
2.7141 4.6554
Reference Hahn, G. J. and S. S. Shapiro. Statistical Models in Engineering. John Wiley &
Sons, New York. 1994. p. 88.
See Also gamcdf, gaminv, gamlike, gampdf, gamrnd, gamstat, betafit, binofit, expfit,
normfit, poissfit, unifit, weibfit
gaminv
Purpose Inverse of the gamma cumulative distribution function (cdf).
Syntax X = gaminv(P,A,B)
Description X = gaminv(P,A,B) computes the inverse of the gamma cdf with parameters A
and B for the corresponding probabilities in P. Vector or matrix inputs for P, A,
and B must all be the same size. A scalar input is expanded to a constant matrix
with the same dimensions as the other inputs. The parameters in A and B must
all be positive, and the values in P must lie on the interval [0 1].
The gamma inverse function in terms of the gamma cdf is
$$x = F^{-1}(p \mid a,b) = \{ x : F(x \mid a,b) = p \}$$
where
$$p = F(x \mid a,b) = \frac{1}{b^a \Gamma(a)} \int_0^x t^{a-1} e^{-t/b} \, dt$$
Algorithm There is no known analytical solution to the integral equation above. gaminv
uses an iterative approach (Newton's method) to converge on the solution.
Examples This example shows the relationship between the gamma cdf and its inverse
function.
a = 1:5;
b = 6:10;
x = gaminv(gamcdf(1:5,a,b),a,b)
x =
1.0000 2.0000 3.0000 4.0000 5.0000
See Also gamcdf, gamfit, gamlike, gampdf, gamrnd, gamstat, icdf
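The Newton iteration mentioned under Algorithm can be sketched for a single scalar case. The following is a minimal illustration with core MATLAB functions only. The starting point, step safeguard, tolerance, and iteration limit are assumptions chosen for this sketch, not details of the gaminv implementation.

```matlab
a = 2; b = 7;                 % gamma parameters for this illustration
x_true = 2;
p = gammainc(x_true/b, a);    % target probability, i.e. the cdf at x_true

x = a*b;                      % assumed starting guess: the distribution mean
for iter = 1:100
    F = gammainc(x/b, a);                              % cdf at current x
    f = x^(a-1) * exp(-x/b) / (b^a * gamma(a));        % pdf at current x
    x_new = x - (F - p)/f;                             % Newton update
    if x_new <= 0
        x_new = x/2;          % assumed safeguard: stay inside (0, Inf)
    end
    converged = abs(x_new - x) < 1e-10;
    x = x_new;
    if converged
        break
    end
end

fprintf('recovered x = %.6f after %d iterations\n', x, iter);
```

The update divides the cdf error by the pdf because the pdf is the derivative of the cdf with respect to x.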
gamlike
Purpose Negative gamma log-likelihood function.
Syntax logL = gamlike(params,data)
[logL,avar] = gamlike(params,data)
Description logL = gamlike(params,data) returns the negative of the gamma
log-likelihood function for the parameters, params, given data. The length of
output vector logL is the length of vector data.
[logL,avar] = gamlike(params,data) also returns avar, which is the
asymptotic variance-covariance matrix of the parameter estimates when the
values in params are the maximum likelihood estimates.
gamlike is a utility function for maximum likelihood estimation of the gamma
distribution. Since gamlike returns the negative gamma log-likelihood
function, minimizing gamlike using fminsearch is the same as maximizing the
likelihood.
Example This example continues the example for gamfit.
a = 2; b = 3;
r = gamrnd(a,b,100,1);
[logL,info] = gamlike([2.1990 2.8069],r)
logL =
267.5585
info =
0.0690 -0.0790
-0.0790 0.1220
See Also betalike, gamcdf, gamfit, gaminv, gampdf, gamrnd, gamstat, mle, weiblike
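The quantity that gamlike reports can be written out from the gamma density. The sketch below uses core MATLAB functions only; the sample values are made up for illustration, and the closed-form expression is derived here from the pdf rather than taken from the toolbox source.

```matlab
% Sketch: negative gamma log-likelihood computed two ways.
a = 2; b = 3;                         % shape and scale (example values)
data = [1.5 4.0 7.2 2.3 9.1];         % hypothetical sample
n = numel(data);
% Closed form: -sum(log f(x_i)) with f the gamma pdf
nll = n*gammaln(a) + n*a*log(b) - (a-1)*sum(log(data)) + sum(data)/b;
% Direct evaluation from the pdf, as a cross-check
pdfvals   = data.^(a-1) .* exp(-data/b) ./ (b^a * gamma(a));
nll_check = -sum(log(pdfvals));
```

Minimizing nll over a and b gives the maximum likelihood estimates.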
gampdf
Purpose Gamma probability density function (pdf).
Syntax Y = gampdf(X,A,B)
Description gampdf(X,A,B) computes the gamma pdf at each of the values in X using the
corresponding parameters in A and B. Vector or matrix inputs for X, A, and B
must all be the same size. A scalar input is expanded to a constant matrix with
the same dimensions as the other inputs. The parameters in A and B must all
be positive, and the values in X must lie on the interval [0 ∞).
The gamma pdf is
$$y = f(x \mid a,b) = \frac{1}{b^{a}\,\Gamma(a)}\, x^{a-1} e^{-x/b}$$
The gamma probability density function is useful in reliability models of
lifetimes. The gamma distribution is more flexible than the exponential
distribution in that the probability of a product surviving an additional period
may depend on its current age. The exponential and $\chi^2$ distributions are special
cases of the gamma distribution.
Examples The exponential distribution is a special case of the gamma distribution.
mu = 1:5;
y = gampdf(1,1,mu)
y =
0.3679 0.3033 0.2388 0.1947 0.1637
y1 = exppdf(1,mu)
y1 =
0.3679 0.3033 0.2388 0.1947 0.1637
See Also gamcdf, gamfit, gaminv, gamlike, gamrnd, gamstat, pdf
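The exponential special case can be checked directly from the pdf formula, without calling the toolbox. This is a small sketch using core MATLAB only; the variable names are chosen here for illustration.

```matlab
% Sketch: with a = 1 the gamma pdf reduces to the exponential pdf with mean b.
mu = 1:5;
x = 1; a = 1;
y  = x.^(a-1) .* exp(-x./mu) ./ (mu.^a .* gamma(a));   % gamma pdf with a = 1, b = mu
y1 = exp(-x./mu) ./ mu;                                 % exponential pdf with mean mu
```

Both vectors agree with the values printed in the gampdf example.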
gamrnd
Purpose Random numbers from the gamma distribution.
Syntax R = gamrnd(A,B)
R = gamrnd(A,B,m)
R = gamrnd(A,B,m,n)
Description R = gamrnd(A,B) generates gamma random numbers with parameters A
and B. Vector or matrix inputs for A and B must have the same size, which is
also the size of R. A scalar input for A or B is expanded to a constant matrix with
the same dimensions as the other input.
R = gamrnd(A,B,m) generates gamma random numbers with parameters A
and B, where m is a 1-by-2 vector that contains the row and column dimensions
of R.
R = gamrnd(A,B,m,n) generates gamma random numbers with parameters A
and B, where scalars m and n are the row and column dimensions of R.
Examples n1 = gamrnd(1:5,6:10)
n1 =
9.1132 12.8431 24.8025 38.5960 106.4164
n2 = gamrnd(5,10,[1 5])
n2 =
30.9486 33.5667 33.6837 55.2014 46.8265
n3 = gamrnd(2:6,3,1,5)
n3 =
12.8715 11.3068 3.0982 15.6012 21.6739
See Also gamcdf, gamfit, gaminv, gamlike, gampdf, gamstat
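For an integer shape parameter, gamma random numbers can be built from uniform random numbers, because the sum of a independent exponential variables with mean b has a gamma distribution with parameters a and b. The sketch below illustrates that fact with core MATLAB only. It is not claimed to be the method gamrnd uses, and it does not cover non-integer a.

```matlab
% Sketch: gamma(a,b) draws for integer a as sums of exponential draws.
a = 3; b = 2; n = 1000;      % example parameters and sample size
u = rand(a, n);              % a-by-n uniform random numbers
r = -b * sum(log(u), 1);     % each column sums a exponential(b) draws
```

The sample mean of r should be close to a*b and the sample variance close to a*b^2.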
gamstat
Purpose Mean and variance for the gamma distribution.
Syntax [M,V] = gamstat(A,B)
Description [M,V] = gamstat(A,B) returns the mean and variance for the gamma
distribution with parameters specified by A and B. Vector or matrix inputs for
A and B must have the same size, which is also the size of M and V. A scalar input
for A or B is expanded to a constant matrix with the same dimensions as the
other input.
The mean of the gamma distribution with parameters a and b is $ab$. The
variance is $ab^2$.
Examples [m,v] = gamstat(1:5,1:5)
m =
1 4 9 16 25
v =
1 8 27 64 125
[m,v] = gamstat(1:5,1./(1:5))
m =
1 1 1 1 1
v =
1.0000 0.5000 0.3333 0.2500 0.2000
See Also gamcdf, gamfit, gaminv, gamlike, gampdf, gamrnd
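The two formulas can be evaluated elementwise to reproduce the first gamstat example. This is a minimal sketch in core MATLAB; it only restates the mean and variance expressions given in the description.

```matlab
% Sketch: gamma mean a*b and variance a*b^2, elementwise.
a = 1:5; b = 1:5;
m = a .* b;        % means
v = a .* b.^2;     % variances
```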
geocdf
Purpose Geometric cumulative distribution function (cdf).
Syntax Y = geocdf(X,P)
Description geocdf(X,P) computes the geometric cdf at each of the values in X using the
corresponding probabilities in P. Vector or matrix inputs for X and P must be
the same size. A scalar input is expanded to a constant matrix with the same
dimensions as the other input. The parameters in P must lie on the interval
[0 1].
The geometric cdf is
$$y = F(x \mid p) = \sum_{i=0}^{\lfloor x \rfloor} p q^{i}$$
where $q = 1 - p$.
The result, y, is the probability of observing up to x trials before a success, when
the probability of success in any given trial is p.
Examples Suppose you toss a fair coin repeatedly. If the coin lands face up (heads), that
is a success. What is the probability of observing three or fewer tails before
getting a heads?
p = geocdf(3,0.5)
p =
0.9375
See Also cdf, geoinv, geopdf, geornd, geostat
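The sum in the geometric cdf is a finite geometric series, so it also has the closed form 1 - q^(floor(x)+1). The sketch below evaluates both forms for the coin-toss example using core MATLAB only; the variable names are illustrative.

```matlab
% Sketch: geometric cdf at x = 3 for p = 0.5, by summation and in closed form.
p = 0.5; q = 1 - p; x = 3;
y_sum    = sum(p * q.^(0:floor(x)));   % term-by-term sum of p*q^i
y_closed = 1 - q^(floor(x) + 1);       % closed form of the same series
```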
geoinv
Purpose Inverse of the geometric cumulative distribution function (cdf).
Syntax X = geoinv(Y,P)
Description X = geoinv(Y,P) returns the smallest nonnegative integer X such that the
geometric cdf evaluated at X is equal to or exceeds Y. You can think of Y as the
probability of observing X successes in a row in independent trials where P is
the probability of success in each trial.
Vector or matrix inputs for P and Y must have the same size, which is also the
size of X. A scalar input for P or Y is expanded to a constant matrix with the
same dimensions as the other input. The values in P and Y must lie on the
interval [0 1].
Examples The probability of correctly guessing the result of 10 coin tosses in a row is less
than 0.001 (unless the coin is not fair).
psychic = geoinv(0.999,0.5)
psychic =
9
The example below shows the inverse method for generating random numbers
from the geometric distribution.
rndgeo = geoinv(rand(2,5),0.5)
rndgeo =
0 1 3 1 0
0 1 0 2 0
See Also geocdf, geopdf, geornd, geostat, icdf
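The definition of geoinv (the smallest integer at which the cdf reaches the target probability) can be followed literally with a short search. This sketch uses core MATLAB and the closed-form cdf 1 - (1-p)^(x+1); it handles one scalar pair and is meant only to illustrate the definition.

```matlab
% Sketch: smallest x >= 0 with geometric cdf(x) >= y.
y = 0.999; p = 0.5;
x = 0;
while 1 - (1 - p)^(x + 1) < y
    x = x + 1;               % step to the next integer until the cdf reaches y
end
```

For these inputs the loop stops at x = 9, matching the first geoinv example.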
geomean
Purpose Geometric mean of a sample.
Syntax m = geomean(X)
Description geomean calculates the geometric mean of a sample. For vectors, geomean(x) is
the geometric mean of the elements in x. For matrices, geomean(X) is a row
vector containing the geometric means of each column.
The geometric mean is
$$m = \left[ \prod_{i=1}^{n} x_i \right]^{1/n}$$
Examples The sample average is greater than or equal to the geometric mean.
x = exprnd(1,10,6);
geometric = geomean(x)
geometric =
0.7466 0.6061 0.6038 0.2569 0.7539 0.3478
average = mean(x)
average =
1.3509 1.1583 0.9741 0.5319 1.0088 0.8122
See Also mean, median, harmmean, trimmean
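The geometric mean formula can be evaluated either as a product or through logarithms, which avoids overflow for long samples. The sketch below uses core MATLAB and a small made-up sample of positive values.

```matlab
% Sketch: geometric mean of a positive sample, two equivalent ways.
x = [1 2 4 8];                  % hypothetical sample
n = numel(x);
g_prod = prod(x)^(1/n);         % direct use of the product formula
g_log  = exp(mean(log(x)));     % same value via the mean of the logs
```

Here both results equal 2*sqrt(2), which is below the arithmetic mean 3.75.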
geopdf
Purpose Geometric probability density function (pdf).
Syntax Y = geopdf(X,P)
Description geopdf(X,P) computes the geometric pdf at each of the values in X using the
corresponding probabilities in P. Vector or matrix inputs for X and P must be
the same size. A scalar input is expanded to a constant matrix with the same
dimensions as the other input. The parameters in P must lie on the interval
[0 1].
The geometric pdf is
$$y = f(x \mid p) = p q^{x} I_{(0,1,\ldots)}(x)$$
where $q = 1 - p$.
Examples Suppose you toss a fair coin repeatedly. If the coin lands face up (heads), that
is a success. What is the probability of observing exactly three tails before
getting a heads?
p = geopdf(3,0.5)
p =
0.0625
See Also geocdf, geoinv, geornd, geostat, pdf
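The pdf formula is simple enough to evaluate directly. This core-MATLAB sketch computes the probabilities of 0 through 3 tails before the first head; the variable names are illustrative.

```matlab
% Sketch: geometric pdf p*q^x for x = 0..3 with a fair coin.
p = 0.5; q = 1 - p;
x = 0:3;
y = p * q.^x;      % probabilities of exactly x failures before the first success
```

The last element is 0.0625, as in the geopdf example, and the elements sum to 0.9375, the value of the geometric cdf at 3.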
geornd
Purpose Random numbers from the geometric distribution.
Syntax R = geornd(P)
R = geornd(P,m)
R = geornd(P,m,n)
Description The geometric distribution is useful when you want to model the number of
successive failures preceding a success, where the probability of success in any
given trial is the constant P.
R = geornd(P) generates geometric random numbers with probability
parameter P. The size of R is the size of P.
R = geornd(P,m) generates geometric random numbers with probability
parameter P, where m is a 1-by-2 vector that contains the row and column
dimensions of R.
R = geornd(P,m,n) generates geometric random numbers with probability
parameter P, where scalars m and n are the row and column dimensions of R.
The parameters in P must lie on the interval [0 1].
Examples r1 = geornd(1 ./ 2.^(1:6))
r1 =
2 10 2 5 2 60
r2 = geornd(0.01,[1 5])
r2 =
65 18 334 291 63
r3 = geornd(0.5,1,6)
r3 =
0 7 1 3 1 0
See Also geocdf, geoinv, geopdf, geostat
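Geometric random numbers can be produced from uniform random numbers by inverting the cdf in closed form. The sketch below shows that inverse-transform idea with core MATLAB only. It is an illustration, not a statement of how geornd is implemented, and it assumes 0 < p < 1.

```matlab
% Sketch: inverse-transform sampling for the geometric distribution.
p = 0.5;
u = rand(1, 6);                       % uniform draws on (0,1)
r = floor(log(u) ./ log(1 - p));      % number of failures before the first success
```

The event r >= k occurs exactly when u <= (1-p)^k, which has probability (1-p)^k, as required for the geometric distribution.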
geostat
Purpose Mean and variance for the geometric distribution.
Syntax [M,V] = geostat(P)
Description [M,V] = geostat(P) returns the mean and variance for the geometric
distribution with parameters specified by P.
The mean of the geometric distribution with parameter p is $q/p$, where $q = 1 - p$.
The variance is $q/p^2$.
Examples [m,v] = geostat(1./(1:6))
m =
0 1.0000 2.0000 3.0000 4.0000 5.0000
v =
0 2.0000 6.0000 12.0000 20.0000 30.0000
See Also geocdf, geoinv, geopdf, geornd
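The mean and variance formulas can be applied elementwise to reproduce the geostat example. This is a minimal core-MATLAB sketch of the two expressions in the description.

```matlab
% Sketch: geometric mean q/p and variance q/p^2 for p = 1, 1/2, ..., 1/6.
p = 1 ./ (1:6);
q = 1 - p;
m = q ./ p;         % means: 0 1 2 3 4 5 (up to rounding)
v = q ./ p.^2;      % variances: 0 2 6 12 20 30 (up to rounding)
```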
gline
Purpose Interactively draw a line in a figure.
Syntax gline(fig)
h = gline(fig)
gline
Description gline(fig) allows you to draw a line segment in the figure fig by clicking the
pointer at the two end-points. A rubber band line tracks the pointer movement.
h = gline(fig) returns the handle to the line in h.
gline with no input arguments draws in the current figure.
See Also refline, gname
glmdemo
Purpose Demo of generalized linear models.
Syntax glmdemo
Description glmdemo begins a slide show demonstration of generalized linear models. The
slides indicate when generalized linear models are useful, how to fit
generalized linear models using the glmfit function, and how to make
predictions using the glmval function.
See Also glmfit, glmval
glmfit
Purpose Generalized linear model fitting.
Syntax b = glmfit(X,Y,'distr')
b = glmfit(X,Y,'distr','link','estdisp',offset,pwts,'const')
[b,dev,stats] = glmfit(...)
Description b = glmfit(X,Y,'distr') fits the generalized linear model for response Y,
predictor variable matrix X, and distribution 'distr'. The following
distributions are available: 'binomial', 'gamma', 'inverse gaussian',
'lognormal', 'normal' (the default), and 'poisson'. In most cases Y is a
vector of response measurements, but for the binomial distribution Y is a
two-column array having the measured number of counts in the first column
and the number of trials (the binomial N parameter) in the second column. X is
a matrix having the same number of rows as Y and containing the values of the
predictor variables for each observation. The output b is a vector of coefficient
estimates. This syntax uses the canonical link (see below) to relate the
distribution parameter to the predictors.
b = glmfit(X,Y,'distr','link','estdisp',offset,pwts,'const')
provides additional control over the fit. The 'link' argument specifies the
relationship between the distribution parameter (µ) and the fitted linear
combination of predictor variables (xb). In most cases 'link' is one of the
following:
link            Meaning                  Default (Canonical) Link
'identity'      µ = xb                   'normal'
'log'           log(µ) = xb              'poisson'
'logit'         log(µ/(1-µ)) = xb        'binomial'
'probit'        norminv(µ) = xb
'comploglog'    log(-log(1-µ)) = xb
'logloglink'    log(-log(µ)) = xb
'reciprocal'    1/µ = xb                 'gamma'
p (a number)    µ^p = xb                 'inverse gaussian' (with p = -2)
Alternatively, you can write functions to define your own custom link. You
specify the link argument as a three-element cell array containing functions
that define the link function, its derivative, and its inverse, in that order. For
example, suppose you want to define a reciprocal square root link using inline
functions. You could define the variable mylinks to use as your 'link' argument by
writing:
FL = inline('x.^-.5')
FD = inline('-.5*x.^-1.5')
FI = inline('x.^-2')
mylinks = {FL FD FI}
Alternatively, you could define functions named FL, FD, and FI in their own
M-files, and then specify mylinks in the form
mylinks = {@FL @FD @FI}
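Before passing a custom link to glmfit, it can be worth checking that the three functions are mutually consistent. The sketch below does this for the reciprocal square root link using core MATLAB only. It assumes a MATLAB release that supports anonymous functions (an assumption; the text above uses inline objects), and the test points and step size are arbitrary choices made here.

```matlab
% Sketch: consistency checks for a custom link, its derivative, and its inverse.
FL = @(x) x.^-0.5;          % link
FD = @(x) -0.5*x.^-1.5;     % derivative of the link
FI = @(x) x.^-2;            % inverse of the link
mu = [0.5 1 2 4];           % arbitrary positive test values
h  = 1e-6;                  % step for a central finite difference
roundtrip_err  = max(abs(FI(FL(mu)) - mu));                        % inverse undoes link
derivative_err = max(abs((FL(mu+h) - FL(mu-h))/(2*h) - FD(mu)));   % FD matches slope of FL
```

Both errors should be close to zero. A large value usually means the functions were defined or ordered incorrectly.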
The 'estdisp' argument can be 'on' to estimate a dispersion parameter for
the binomial or Poisson distribution, or 'off' (the default) to use the
theoretical value of 1.0 for those distributions. The glmfit function always
estimates dispersion parameters for other distributions.
The offset and pwts parameters can be vectors of the same length as Y, or can
be omitted (or specified as an empty vector). The offset vector is a special
predictor variable whose coefficient is known to be 1.0. As an example, suppose
that you are modeling the number of defects on various surfaces, and you want
to construct a model in which the expected number of defects is proportional to
the surface area. You might use the number of defects as your response, along
with the Poisson distribution, the log link function, and the log surface area as
an offset.
The pwts argument is a vector of prior weights. As an example, if the response
value Y(i) is the average of f(i) measurements, you could use f as a vector of
prior weights.
The 'const' argument can be 'on' (the default) to estimate a constant term,
or 'off' to omit the constant term. If you want the constant term, use this
argument rather than specifying a column of ones in the X matrix.
[b,dev,stats] = glmfit(...) returns the additional outputs dev and stats.
dev is the deviance at the solution vector. The deviance is a generalization of
the residual sum of squares. It is possible to perform an analysis of deviance to
compare several models, each a subset of the other, and to test whether the
model with more terms is significantly better than the model with fewer terms.
stats is a structure with the following fields:
stats.dfe = degrees of freedom for error
stats.s = theoretical or estimated dispersion parameter
stats.sfit = estimated dispersion parameter
stats.estdisp = 1 if dispersion is estimated, 0 if fixed
stats.beta = vector of coefficient estimates (same as b)
stats.se = vector of standard errors of the coefficient estimates b
stats.coeffcorr = correlation matrix for b
stats.t = t statistics for b
stats.p = p-values for b
stats.resid = vector of residuals
stats.residp = vector of Pearson residuals
stats.residd = vector of deviance residuals
stats.resida = vector of Anscombe residuals
If you estimate a dispersion parameter for the binomial or Poisson distribution,
then stats.s is set equal to stats.sfit. Also, the elements of stats.se differ
by the factor stats.s from their theoretical values.
Example We have data on cars weighing between 2100 and 4300 pounds. For each car
weight we have the total number of cars of that weight, and the number that
can be considered to get poor mileage according to some test. For example, 8
out of 21 cars weighing 3100 pounds get poor mileage according to a
measurement of the miles they can travel on a gallon of gasoline.
w = (2100:200:4300)';
poor = [1 2 0 3 8 8 14 17 19 15 17 21]';
total = [48 42 31 34 31 21 23 23 21 16 17 21]';
We can compare several fits to these data. First, let's try fitting logit and probit
models:
[bl,dl,sl] = glmfit(w,[poor total],'binomial');
[bp,dp,sp] = glmfit(w,[poor total],'binomial','probit');
dl
dl =
6.4842
dp
dp =
7.5693
The deviance for the logit model is smaller than for the probit model. Although
this is not a formal test, it leads us to prefer the logit model.
We can do a formal test comparing two logit models. We already fit one model
using w as a linear predictor. Let's fit another logit model using both linear and
squared terms in w. If there is no true effect for the squared term, the difference
in their deviances should be small compared with a chi-square distribution
having one degree of freedom.
[b2,d2,s2] = glmfit([w w.^2],[poor total],'binomial')
dl-d2
ans =
0.7027
chi2cdf(dl-d2,1)
ans =
0.5981
A difference of 0.7027 is not at all unusual for a chi-square distribution with
one degree of freedom, so the quadratic model does not give a significantly
better fit than the simpler linear model.
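The chi-square probability used in this comparison can be cross-checked with core MATLAB, since the chi-square cdf with v degrees of freedom equals the regularized incomplete gamma function gammainc(x/2, v/2). The sketch below plugs in the deviance difference printed in the example; the variable names are chosen here.

```matlab
% Sketch: chi-square cdf and upper-tail probability for the deviance difference.
d = 0.7027;                  % deviance difference dl-d2 from the example output
c = gammainc(d/2, 1/2);      % chi-square cdf with 1 degree of freedom
pval = 1 - c;                % probability of a difference at least this large
```

The cdf value is about 0.598, so the upper-tail probability is about 0.40, far from any conventional significance threshold.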
The following are the coefficient estimates, their standard errors, t-statistics,
and p-values for the linear model:
[bl sl.se sl.t sl.p]
ans =
-13.3801 1.3940 -9.5986 0.0000
0.0042 0.0004 9.4474 0.0000
This shows that we cannot simplify the model any further. Both the intercept
and slope coefficients are significantly different from 0, as indicated by
p-values that are 0.0000 to four decimal places.
See Also glmval, glmdemo, nlinfit, regress, regstats
References Dobson, A. J. An Introduction to Generalized Linear Models. 1990, CRC Press.
McCullagh, P. and J. A. Nelder. Generalized Linear Models. 2nd edition, 1990,
Chapman and Hall.
glmval
Purpose Compute predictions for generalized linear model.
Syntax yfit = glmval(b,X,'link')
[yfit,dlo,dhi] = glmval(b,X,'link',stats,clev)
[yfit,dlo,dhi] = glmval(b,X,'link',stats,clev,N,offset,'const')
Description yfit = glmval(b,X,'link') computes the predicted distribution parameters
for observations with predictor values X using the coefficient vector b and link
function 'link'. Typically, b is a vector of coefficient estimates computed by
the glmfit function. The value of 'link' must be the same as that used in
glmfit. The result yfit is the value of the inverse of the link function at the
linear combination X*b.
[yfit,dlo,dhi] = glmval(b,X,'link',stats,clev) returns confidence
bounds for the predicted values when you supply the stats structure returned
from glmfit, and optionally specify a confidence level as the clev argument.
(The default confidence level is 0.95 for 95% confidence.) The interval
[yfit-dlo, yfit+dhi] is a confidence bound for the true parameter value at
the specified X values.
[yfit,dlo,dhi] = glmval(b,X,'link',stats,clev,N,offset,'const')
specifies three additional arguments that may be needed if you used certain
arguments to glmfit. If you fit a binomial distribution using glmfit, specify N
as the value of the binomial N parameter for the predictions. If you included an
offset variable, specify offset as the new value of this variable. Use the same
'const' value ('on' or 'off') that you used with glmfit.
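The core computation, applying the inverse link to the linear combination X*b, can be sketched by hand for the logit link. The example below uses core MATLAB only. The coefficients are the rounded values printed for the linear logit model in the glmfit example, so the results are approximate. Adding the column of ones for the constant term and scaling by N to get an expected count are assumptions made here to mirror the 'const' and N arguments.

```matlab
% Sketch: predictions under a logit link, computed without glmval.
b = [-13.3801; 0.0042];                  % rounded intercept and slope from glmfit example
wnew = (3000:100:4000)';                 % new car weights
eta  = [ones(size(wnew)) wnew] * b;      % linear predictor, constant term included
yfit = 1 ./ (1 + exp(-eta));             % inverse of the logit link: a probability
N = 30;
expected = N * yfit;                     % expected number of poor-mileage cars out of N
```

Confidence bounds such as dlo and dhi need the coefficient covariance information in the stats structure and are not reproduced in this sketch.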
Example Let's model the number of cars with poor gasoline mileage using the binomial
distribution. First we use the binomial distribution with the default logit link
to model the probability of having poor mileage as a function of the weight and
squared weight of the cars. Then we compute a vector wnew of new car weights
at which we want to make predictions. Next we compute the expected number
of cars, out of a total of 30 cars of each weight, that would have poor mileage.
Finally we graph the predicted values and 95% confidence bounds as a function
of weight.
w = [2100 2300 2500 2700 2900 3100 3300 3500 3700 3900 4100 4300]';
poor = [1 2 0 3 8 8 14 17 19 15 17 21]';
total = [48 42 31 34 31 21 23 23 21 16 17 21]';
[b2,d2,s2] = glmfit([w w.^2],[poor total],'binomial')
wnew = (3000:100:4000)';
[yfit,dlo,dhi] = glmval(b2,[wnew wnew.^2],'logit',s2,0.95,30)
errorbar(wnew,yfit,dlo,dhi);
See Also glmfit, glmdemo
[Figure: predicted number of cars with poor mileage (out of 30) versus car weight, with 95% confidence bounds shown as error bars.]
gname
Purpose Label plotted points with their case names or case number.
Syntax gname('cases')
gname
h = gname('cases',line_handle)
Description gname('cases') displays a figure window, displays cross-hairs, and waits for
a mouse button or keyboard key to be pressed. Position the cross-hair with the
mouse and click once near each point to label that point. Input 'cases' is a
string matrix with each row the case name of a data point. You can also click
and drag a selection rectangle to label all points within the rectangle. When
you are done, press the Enter or Escape key.
gname with no arguments labels each case with its case number.
h = gname('cases',line_handle) returns a vector of handles to the text
objects on the plot. Use the scalar line_handle to identify the correct line if
there is more than one line object on the plot.
You can use gname to label plots created by the plot, scatter, gscatter,
plotmatrix, and gplotmatrix functions.
Example Let's use the city ratings data sets to find out which cities are the best and
worst for education and the arts. We create a graph, call the gname function,
and click on the points at the extreme left and at the top.
load cities
education = ratings(:,6);
arts = ratings(:,7);
plot(education,arts,'+')
gname(names)
See Also gplotmatrix, gscatter, gtext, plot, plotmatrix, scatter
[Figure: scatter plot of the arts rating versus the education rating, with the points for Pascagoula, MS and New York, NY labeled.]
gplotmatrix
Purpose Plot matrix of scatter plots by group.
Syntax gplotmatrix(x,y,g)
gplotmatrix(x,y,g,'clr','sym',siz)
gplotmatrix(x,y,g,'clr','sym',siz,'doleg')
gplotmatrix(x,y,g,'clr','sym',siz,'doleg','dispopt')
gplotmatrix(x,y,g,'clr','sym',siz,'doleg','dispopt','xnam','ynam')
[h,ax,bigax] = gplotmatrix(...)
Description gplotmatrix(x,y,g) creates a matrix of scatter plots. Each individual set of
axes in the resulting figure contains a scatter plot of a column of x against a
column of y. All plots are grouped by the grouping variable g.
x and y are matrices with the same number of rows. If x has p columns and y
has q columns, the figure contains a p-by-q matrix of scatter plots. If you omit
y or specify it as the empty matrix, [], gplotmatrix creates a square matrix of
scatter plots of columns of x against each other.
g is a grouping variable that can be a vector, string array, or cell array of
strings. g must have the same number of rows as x and y. Points with the same
value of g are placed in the same group, and appear on the graph with the same
marker and color. Alternatively, g can be a cell array containing several
grouping variables (such as {G1 G2 G3}); in that case, observations are in the
same group if they have common values of all grouping variables.
gplotmatrix(x,y,g,'clr','sym',siz) specifies the color, marker type, and
size for each group. 'clr' is a string array of colors recognized by the plot
function. The default is 'clr' = 'bgrcmyk'. 'sym' is a string array of symbols
recognized by the plot command, with the default value '.'. siz is a vector of
sizes, with the default determined by the 'defaultlinemarkersize' property.
If you do not specify enough values for all groups, gplotmatrix cycles through
the specified values as needed.
gplotmatrix(x,y,g,'clr','sym',siz,'doleg') controls whether a legend is
displayed on the graph ('doleg' = 'on', the default) or not ('doleg' = 'off').
gplotmatrix(x,y,g,'clr','sym',siz,'doleg','dispopt') controls what
appears along the diagonal of a plot matrix of x versus x. Allowable values are
'none' to leave the diagonals blank, 'hist' (the default) to plot histograms, or
'variable' to write the variable names.
gplotmatrix(x,y,g,'clr','sym',siz,'doleg','dispopt','xnam','ynam')
specifies the names of the columns in the x and y arrays. These names are used
to label the x- and y-axes. 'xnam' and 'ynam' must be character arrays with
one row for each column of x and y, respectively.
[h,ax,bigax] = gplotmatrix(...) returns three arrays of handles. h is an
array of handles to the lines on the graphs. ax is a matrix of handles to the axes
of the individual plots. bigax is a handle to big (invisible) axes framing the
entire plot matrix. These are left as the current axes, so a subsequent title,
xlabel, or ylabel command will produce labels that are centered with respect
to the entire plot matrix.
Example Load the cities data. The ratings array has ratings of the cities in nine
categories (category names are in the array categories). group is a code whose
value is 2 for the largest cities. We can make scatter plots of the first three
categories against the next four, grouped by the city size code.
load discrim
gplotmatrix(ratings(:,1:3),ratings(:,4:7),group)
The output figure (not shown) has an array of graphs with each city group
represented by a different color. The graphs are a little easier to read if we
specify colors and plotting symbols, label the axes with the rating categories,
and move the legend off the graphs.
gplotmatrix(ratings(:,1:3),ratings(:,4:7),group,...
'br','.o',[],'on','',categories(1:3,:),...
categories(4:7,:))
See Also grpstats, gscatter, plotmatrix
[Figure: matrix of scatter plots of the arts, education, transportation, and crime ratings against the climate, housing, and health ratings, grouped by city size code (1 or 2).]
grpstats
Purpose Summary statistics by group.
Syntax means = grpstats(X,group)
[means,sem,counts,name] = grpstats(X,group)
grpstats(x,group,alpha)
Description means = grpstats(X,group) returns the means of each column of X by group,
where X is a matrix of observations. group is an array that defines the grouping
such that two elements of X are in the same group if their corresponding group
values are the same. The grouping variable group can be a vector, string array,
or cell array of strings. It can also be a cell array containing several grouping
variables (such as {G1 G2 G3}); in that case observations are in the same group
if they have common values of all grouping variables.
[means,sem,counts,name] = grpstats(X,group) supplies the
standard error of the mean in sem, the number of elements in each group in
counts, and the name of each group in name. name is useful to identify and label
the groups when the input group values are not simple group numbers.
grpstats(x,group,alpha) plots 100(1-alpha)% confidence intervals around
each mean.
Example We assign 100 observations to one of four groups. For each observation we
measure five quantities with true means from 1 to 5. grpstats allows us to
compute the means for each group.
group = unidrnd(4,100,1);
true_mean = 1:5;
true_mean = true_mean(ones(100,1),:);
x = normrnd(true_mean,1);
means = grpstats(x,group)
means =
0.7947 2.0908 2.8969 3.6749 4.6555
0.9377 1.7600 3.0285 3.9484 4.8169
1.0549 2.0255 2.8793 4.0799 5.3740
0.7107 1.9264 2.8232 3.8815 4.9689
See Also tabulate, crosstab
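The per-group quantities that grpstats reports can be reproduced for a single column with the core MATLAB function accumarray. The sketch below uses a tiny made-up data set with integer group numbers. It takes the standard error of the mean to be the group standard deviation divided by the square root of the group size, which is the usual definition and is assumed here rather than taken from the toolbox.

```matlab
% Sketch: group counts, means, and standard errors with accumarray.
x     = [1; 2; 3; 4; 5; 6];       % hypothetical observations
group = [1; 1; 2; 2; 3; 3];       % group number for each observation
counts = accumarray(group, 1);                            % elements per group
means  = accumarray(group, x) ./ counts;                  % group means
sem    = accumarray(group, x, [], @std) ./ sqrt(counts);  % standard errors of the means
```

For this data the group means are 1.5, 3.5, and 5.5, and each standard error is 0.5.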
gscatter
Purpose Scatter plot by group.
Syntax gscatter(x,y,g)
gscatter(x,y,g,'clr','sym',siz)
gscatter(x,y,g,'clr','sym',siz,'doleg')
gscatter(x,y,g,'clr','sym',siz,'doleg','xnam','ynam')
h = gscatter(...)
Description gscatter(x,y,g) creates a scatter plot of x and y, grouped by g, where x and y
are vectors with the same size and g can be a vector, string array, or cell array
of strings. Points with the same value of g are placed in the same group, and
appear on the graph with the same marker and color. Alternatively, g can be a
cell array containing several grouping variables (such as {G1 G2 G3}); in that
case, observations are in the same group if they have common values of all
grouping variables.
gscatter(x,y,g,'clr','sym',siz) specifies the color, marker type, and size
for each group. 'clr' is a string array of colors recognized by the plot function.
The default is 'clr' = 'bgrcmyk'. 'sym' is a string array of symbols recognized
by the plot command, with the default value '.'. siz is a vector of sizes, with
the default determined by the 'defaultlinemarkersize' property. If you do
not specify enough values for all groups, gscatter cycles through the specified
values as needed.
gscatter(x,y,g,'clr','sym',siz,'doleg') controls whether a legend is
displayed on the graph ('doleg' = 'on', the default) or not ('doleg' = 'off').
gscatter(x,y,g,'clr','sym',siz,'doleg','xnam','ynam') specifies the
name to use for the x-axis and y-axis labels. If the x and y inputs are simple
variable names and xnam and ynam are omitted, gscatter labels the axes with
the variable names.
h = gscatter(...) returns an array of handles to the lines on the graph.
Example Load the cities data and look at the relationship between the ratings for
climate (first column) and housing (second column) grouped by city size. We'll
also specify the colors and plotting symbols.
load discrim
gscatter(ratings(:,1),ratings(:,2),group,'br','xo')
See Also gplotmatrix, grpstats, scatter
[Figure: scatter plot of the housing rating versus the climate rating, grouped by city size code (1 or 2).]
harmmean
Purpose Harmonic mean of a sample of data.
Syntax m = harmmean(X)
Description m = harmmean(X) calculates the harmonic mean of a sample. For vectors,
harmmean(x) is the harmonic mean of the elements in x. For matrices,
harmmean(X) is a row vector containing the harmonic means of each column.
The harmonic mean is
$$m = \frac{n}{\sum_{i=1}^{n} \dfrac{1}{x_i}}$$
Examples The sample average is greater than or equal to the harmonic mean.
x = exprnd(1,10,6);
harmonic = harmmean(x)
harmonic =
0.3382 0.3200 0.3710 0.0540 0.4936 0.0907
average = mean(x)
average =
1.3509 1.1583 0.9741 0.5319 1.0088 0.8122
See Also mean, median, geomean, trimmean
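The harmonic mean formula translates into one line of core MATLAB. The sketch below uses a small made-up sample of positive values so the result can be checked by hand.

```matlab
% Sketch: harmonic mean n / sum(1/x_i) of a positive sample.
x = [1 2 4];               % hypothetical sample
n = numel(x);
h = n / sum(1 ./ x);       % 3 / (1 + 0.5 + 0.25) = 12/7
```

The result, about 1.714, is below the arithmetic mean of about 2.333, as the example text states.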
hist
Purpose Plot histograms.
Syntax hist(y)
hist(y,nb)
hist(y,x)
[n,x] = hist(y,...)
Description hist(y) draws a 10-bin histogram for the data in vector y. The bins are equally
spaced between the minimum and maximum values in y.
hist(y,nb) draws a histogram with nb bins.
hist(y,x) draws a histogram using the bins in the vector x.
[n,x] = hist(y,...) does not draw a graph, but returns vectors n and x
containing the frequency counts and the bin locations such that bar(x,n) plots
the histogram. This is useful in situations where more control is needed over
the appearance of a graph, for example, to combine a histogram into a more
elaborate plot statement.
The hist function is a part of the standard MATLAB language.
Examples Generate bell-curve histograms from Gaussian data.
x = -2.9:0.1:2.9;
y = normrnd(0,1,1000,1);
hist(y,x)
[Figure: histogram of the 1000 normal random numbers in y, with bin centers from -2.9 to 2.9.]
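The two-output form of hist can be tried on a small sample where the counts are easy to verify by eye. The data below are made up for illustration; the sketch only computes the counts and bin centers and does not draw anything.

```matlab
% Sketch: frequency counts and bin centers without drawing a histogram.
y = [1 2 2 3 3 3 4 4 5];     % hypothetical data
[n, x] = hist(y, 5);         % 5 equally spaced bins between min(y) and max(y)
total = sum(n);              % every observation falls in exactly one bin
```

Here n is [1 2 3 2 1], and calling bar(x,n) afterward would draw the same histogram that hist(y,5) draws.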
histfit
Purpose Histogram with superimposed normal density.
Syntax histfit(data)
histfit(data,nbins)
h = histfit(data,nbins)
Description histfit(data,nbins) plots a histogram of the values in the vector data using
nbins bars in the histogram. When nbins is omitted, its value is set to the
square root of the number of elements in data.
h = histfit(data,nbins) returns a vector of handles to the plotted lines,
where h(1) is the handle to the histogram and h(2) is the handle to the density
curve.
Example r = normrnd(10,1,100,1);
histfit(r)
See Also hist, normfit
[Figure: histogram of r with the fitted normal density curve superimposed.]
hougen
Purpose Hougen-Watson model for reaction kinetics.
Syntax yhat = hougen(beta,x)
Description yhat = hougen(beta,x) returns the predicted values of the reaction rate,
yhat, as a function of the vector of parameters, beta, and the matrix of data, x.
beta must have five elements and x must have three columns.
hougen is a utility function for rsmdemo.
The model form is:
$$\hat{y} = \frac{\beta_1 x_2 - x_3/\beta_5}{1 + \beta_2 x_1 + \beta_3 x_2 + \beta_4 x_3}$$
Reference Bates, D., and D. Watts. Nonlinear Regression Analysis and Its Applications.
Wiley 1988. pp. 271-272.
See Also rsmdemo
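The Hougen-Watson model form can be evaluated column by column. The sketch below uses core MATLAB with made-up parameter and data values chosen so the arithmetic is easy to follow. It illustrates the model equation and is not the toolbox's hougen.m.

```matlab
% Sketch: evaluate the Hougen-Watson rate for one observation.
beta = [2 0.5 0.25 0.125 4];      % hypothetical parameter values beta1..beta5
x    = [2 4 8];                   % one row of data: x1, x2, x3
num  = beta(1)*x(:,2) - x(:,3)/beta(5);                         % 2*4 - 8/4 = 6
den  = 1 + beta(2)*x(:,1) + beta(3)*x(:,2) + beta(4)*x(:,3);    % 1 + 1 + 1 + 1 = 4
yhat = num ./ den;                                              % 1.5
```

Because the expressions index columns of x, the same lines work unchanged when x has many rows.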
hygecdf
Purpose Hypergeometric cumulative distribution function (cdf).
Syntax P = hygecdf(X,M,K,N)
Description hygecdf(X,M,K,N) computes the hypergeometric cdf at each of the values in X
using the corresponding parameters in M, K, and N. Vector or matrix inputs for
X, M, K, and N must all have the same size. A scalar input is expanded to a
constant matrix with the same dimensions as the other inputs.
The hypergeometric cdf is
$$p = F(x \mid M,K,N) = \sum_{i=0}^{x} \frac{\binom{K}{i}\binom{M-K}{N-i}}{\binom{M}{N}}$$
The result, p, is the probability of drawing up to x of a possible K items in N
drawings without replacement from a group of M objects.
Examples Suppose you have a lot of 100 floppy disks and you know that 20 of them are
defective. What is the probability of drawing zero to two defective floppies if
you select 10 at random?
p = hygecdf(2,100,20,10)
p =
0.6812
See Also cdf, hygeinv, hygepdf, hygernd, hygestat
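The sum in the cdf definition can be evaluated term by term with base MATLAB. This sketch uses nchoosek for the binomial coefficients and the floppy disk numbers from the example; it illustrates the formula and is not a description of how hygecdf is implemented.

```matlab
M = 100; K = 20; N = 10;     % lot size, defective disks in the lot, sample size
x = 0:2;                     % zero, one, or two defective disks in the sample
% Terms C(K,i)*C(M-K,N-i)/C(M,N) for i = 0, 1, 2
terms = arrayfun(@(i) nchoosek(K,i) * nchoosek(M-K,N-i), x) / nchoosek(M,N);
p = sum(terms)               % approximately 0.6812
```

For much larger M the binomial coefficients exceed the range in which nchoosek is exact, so a log-based evaluation (for example with gammaln) would be the safer choice.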
hygeinv
Purpose Inverse of the hypergeometric cumulative distribution function (cdf).
Syntax X = hygeinv(P,M,K,N)
Description hygeinv(P,M,K,N) returns the smallest integer X such that the
hypergeometric cdf evaluated at X equals or exceeds P. You can think of P as the
probability of observing X or fewer defective items in N drawings without replacement
from a group of M items where K are defective.
Examples Suppose you are the Quality Assurance manager for a floppy disk
manufacturer. The production line turns out floppy disks in batches of 1,000.
You want to sample 50 disks from each batch to see if they have defects. You
want to accept 99% of the batches if there are no more than 10 defective disks
in the batch. What is the maximum number of defective disks you should allow
in your sample of 50?
x = hygeinv(0.99,1000,10,50)
x =
3
What is the median number of defective floppy disks in samples of 50 disks
from batches with 10 defective disks?
x = hygeinv(0.50,1000,10,50)
x =
0
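The definition "smallest integer X such that the cdf equals or exceeds P" can be followed literally by accumulating pdf terms until the target is reached. The sketch below is one way to do that in base MATLAB; the helper logC is hypothetical, and gammaln is used only to keep the large binomial coefficients from overflowing.

```matlab
% Hypothetical helper: log of the binomial coefficient C(n,k).
logC = @(n,k) gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1);

M = 1000; K = 10; N = 50; P = 0.99;   % parameters of the first example
x = 0;
F = exp(logC(K,x) + logC(M-K,N-x) - logC(M,N));   % cdf at x = 0
while F < P                                       % stop at the first x with F(x) >= P
    x = x + 1;
    F = F + exp(logC(K,x) + logC(M-K,N-x) - logC(M,N));
end
x                                                 % 3 for these parameters
```

Setting P = 0.50 instead stops immediately at x = 0, because the probability of finding no defective disks in the sample is already about 0.6.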
See Also hygecdf, hygepdf, hygernd, hygestat, icdf
hygepdf
Purpose Hypergeometric probability density function (pdf).
Syntax Y = hygepdf(X,M,K,N)
Description Y = hygepdf(X,M,K,N) computes the hypergeometric pdf at each of the values
in X using the corresponding parameters in M, K, and N. Vector or matrix inputs
for X, M, K, and N must all have the same size. A scalar input is expanded to a
constant matrix with the same dimensions as the other inputs.
The parameters in M, K, and N must all be positive integers, with N ≤ M. The
values in X must be less than or equal to all the parameter values.
The hypergeometric pdf is
$$y = f(x \mid M,K,N) = \frac{\binom{K}{x}\binom{M-K}{N-x}}{\binom{M}{N}}$$
The result, y, is the probability of drawing exactly x of a possible K items in N
drawings without replacement from a group of M objects.
Examples Suppose you have a lot of 100 floppy disks and you know that 20 of them are
defective. What is the probability of drawing 0 through 5 defective floppy disks
if you select 10 at random?
p = hygepdf(0:5,100,20,10)
p =
0.0951 0.2679 0.3182 0.2092 0.0841 0.0215
See Also hygecdf, hygeinv, hygernd, hygestat, pdf
hygernd
Purpose Random numbers from the hypergeometric distribution.
Syntax R = hygernd(M,K,N)
R = hygernd(M,K,N,mm)
R = hygernd(M,K,N,mm,nn)
Description R = hygernd(M,K,N) generates hypergeometric random numbers with
parameters M, K, and N. Vector or matrix inputs for M, K, and N must have the
same size, which is also the size of R. A scalar input for M, K, or N is expanded to
a constant matrix with the same dimensions as the other inputs.
R = hygernd(M,K,N,mm) generates hypergeometric random numbers with
parameters M, K, and N, where mm is a 1-by-2 vector that contains the row and
column dimensions of R.
R = hygernd(M,K,N,mm,nn) generates hypergeometric random numbers with
parameters M, K, and N, where scalars mm and nn are the row and column
dimensions of R.
Examples numbers = hygernd(1000,40,50)
numbers =
1
See Also hygecdf, hygeinv, hygepdf, hygestat
hygestat
Purpose Mean and variance for the hypergeometric distribution.
Syntax [MN,V] = hygestat(M,K,N)
Description [MN,V] = hygestat(M,K,N) returns the mean and variance for the
hypergeometric distribution with parameters specified by M, K, and N. Vector or
matrix inputs for M, K, and N must have the same size, which is also the size of
MN and V. A scalar input for M, K, or N is expanded to a constant matrix with the
same dimensions as the other inputs.
The mean of the hypergeometric distribution with parameters M, K, and N is
NK/M, and the variance is
$$N \, \frac{K}{M} \, \frac{M-K}{M} \, \frac{M-N}{M-1}$$
Examples The hypergeometric distribution approaches the binomial distribution, where
p = K/M, as M goes to infinity.
[m,v] = hygestat(10.^(1:4),10.^(0:3),9)
m =
0.9000 0.9000 0.9000 0.9000
v =
0.0900 0.7445 0.8035 0.8094
[m,v] = binostat(9,0.1)
m =
0.9000
v =
0.8100
See Also hygecdf, hygeinv, hygepdf, hygernd
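The mean and variance formulas are simple enough to evaluate directly, which also shows where the binomial limit comes from: the last factor, (M-N)/(M-1), tends to 1 as M grows. This sketch reuses the parameters of the example; the variable names are illustrative.

```matlab
M = 10.^(1:4); K = 10.^(0:3); N = 9;               % same parameters as the example
mn = N .* K ./ M;                                  % mean NK/M, 0.9 in every case
v  = N .* (K./M) .* ((M-K)./M) .* ((M-N)./(M-1));  % hypergeometric variance
p  = K(1)/M(1);                                    % success probability K/M = 0.1
vBinomial = N*p*(1-p);                             % binomial variance, 0.81
correction = (M-N)./(M-1)                          % finite-population factor, tends to 1
```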
icdf
Purpose Inverse of a specified cumulative distribution function (icdf).
Syntax X = icdf('name',P,A1,A2,A3)
Description X = icdf('name',P,A1,A2,A3) returns a matrix of critical values, X, where
'name' is a string containing the name of the distribution. P is a matrix of
probabilities, and A1, A2, and A3 are matrices of distribution parameters.
Depending on the distribution, some of the parameters may not be necessary.
Vector or matrix inputs for P, A1, A2, and A3 must all have the same size. A
scalar input is expanded to a constant matrix with the same dimensions as the
other inputs.
icdf is a utility routine allowing you to access all the inverse cdfs in the
Statistics Toolbox using the name of the distribution as a parameter. See
"Overview of the Distributions" on page 1-12 for the list of available
distributions.
Examples x = icdf('Normal',0.1:0.2:0.9,0,1)
x =
-1.2816 -0.5244 0 0.5244 1.2816
x = icdf('Poisson',0.1:0.2:0.9,1:5)
x =
1 1 3 5 8
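For the 'Normal' case, the critical values can be cross-checked without the toolbox. The sketch below assumes the usual identity between the normal inverse cdf and the inverse complementary error function; mu and sigma stand for the A1 and A2 arguments of the first call above.

```matlab
p = 0.1:0.2:0.9;                          % probabilities from the first example
mu = 0; sigma = 1;                        % A1 and A2 in icdf('Normal',p,0,1)
x = mu - sigma * sqrt(2) * erfcinv(2*p)   % approximately -1.2816 -0.5244 0 0.5244 1.2816
```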
See Also betainv, binoinv, cdf, chi2inv, expinv, finv, gaminv, geoinv, hygeinv,
logninv, nbininv, ncfinv, nctinv, ncx2inv, norminv, pdf, poissinv, random,
raylinv, tinv, unidinv, unifinv, weibinv
inconsistent
Purpose Calculate the inconsistency coefficient of a cluster tree.
Syntax Y = inconsistent(Z)
Y = inconsistent(Z,d)
Description Y = inconsistent(Z) computes the inconsistency coefficient for each link of
the hierarchical cluster tree Z, where Z is an (m-1)-by-3 matrix generated by the
linkage function. The inconsistency coefficient characterizes each link in a
cluster tree by comparing its length with the average length of other links at
the same level of the hierarchy. The higher the value of this coefficient, the less
similar the objects connected by the link.
Y = inconsistent(Z,d) computes the inconsistency coefficient for each link
in the hierarchical cluster tree Z to depth d, where d is an integer denoting the
number of levels of the cluster tree that are included in the calculation. By
default, d=2.
The output, Y, is an (m-1)-by-4 matrix formatted as follows.
Column  Description
1       Mean of the lengths of all the links included in the calculation.
2       Standard deviation of all the links included in the calculation.
3       Number of links included in the calculation.
4       Inconsistency coefficient.
For each link, k, the inconsistency coefficient is calculated as:
$$Y(k,4) = \frac{Z(k,3) - Y(k,1)}{Y(k,2)}$$
For leaf nodes, nodes that have no further nodes under them, the inconsistency
coefficient is set to 0.
Example X = rand(10,2);
Y = pdist(X);
Z = linkage(Y,'centroid');
W = inconsistent(Z,3)
W =
0.0423 0 1.0000 0
0.1406 0 1.0000 0
0.1163 0.1047 2.0000 0.7071
0.2101 0 1.0000 0
0.2054 0.0886 3.0000 0.6792
0.1742 0.1762 3.0000 0.6568
0.2336 0.1317 4.0000 0.6408
0.3081 0.2109 5.0000 0.7989
0.4610 0.3728 4.0000 0.8004
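The four columns of one row can be reproduced by hand when the link lengths entering the calculation are known. In this sketch the two lengths are hypothetical (the Z matrix itself is not shown above); whenever exactly two links of different length are included, the coefficient for the longer one is sqrt(2)/2, which is the 0.7071 seen in the row with a link count of 2.

```matlab
% Hypothetical link lengths: one link below, then the link being scored (last).
lengths = [0.0423 0.1903];
y1 = mean(lengths);               % column 1: mean length
y2 = std(lengths);                % column 2: standard deviation (n-1 normalization)
y3 = numel(lengths);              % column 3: number of links
y4 = (lengths(end) - y1) / y2     % column 4: inconsistency coefficient, about 0.7071
```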
See Also cluster, cophenet, clusterdata, dendrogram, linkage, pdist, squareform
iqr
Purpose Interquartile range (IQR) of a sample.
Syntax y = iqr(X)
Description y = iqr(X) computes the difference between the 75th and the 25th percentiles
of the sample in X. The IQR is a robust estimate of the spread of the data, since
changes in the upper and lower 25% of the data do not affect it.
If there are outliers in the data, then the IQR is more representative than the
standard deviation as an estimate of the spread of the body of the data. The
IQR is less efficient than the standard deviation as an estimate of the spread
when the data is all from the normal distribution.
Multiply the IQR by 0.7413 to estimate σ (the second parameter of the normal
distribution).
Examples This Monte Carlo simulation shows the relative efficiency of the IQR to the
sample standard deviation for normal data.
x = normrnd(0,1,100,100);
s = std(x);
s_IQR = 0.7413 * iqr(x);
efficiency = (norm(s - 1)./norm(s_IQR - 1)).^2
efficiency =
0.3297
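The constant 0.7413 follows from the normal quartiles: for a normal distribution the IQR equals 2 × 0.6745 × σ, so dividing by that width recovers σ. A short base-MATLAB check, using erfcinv in place of norminv:

```matlab
z75 = -sqrt(2) * erfcinv(2*0.75);   % 75th percentile of the standard normal, about 0.6745
factor = 1 / (2*z75)                % about 0.7413
```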
See Also std, mad, range
jbtest
Purpose Jarque-Bera test for goodness-of-fit to a normal distribution.
Syntax H = jbtest(X)
H = jbtest(X,alpha)
[H,P,JBSTAT,CV] = jbtest(X,alpha)
Description H = jbtest(X) performs the Jarque-Bera test on the input data vector X and
returns H, the result of the hypothesis test. The result is H=1 if we can reject the
hypothesis that X has a normal distribution, or H=0 if we cannot reject that
hypothesis. We reject the hypothesis if the test is significant at the 5% level.
The Jarque-Bera test evaluates the hypothesis that X has a normal distribution
with unspecified mean and variance, against the alternative that X does not
have a normal distribution. The test is based on the sample skewness and
kurtosis of X. For a true normal distribution, the sample skewness should be
near 0 and the sample kurtosis should be near 3. The Jarque-Bera test
determines whether the sample skewness and kurtosis are unusually different
from their expected values, as measured by a chi-square statistic.
The Jarque-Bera test is an asymptotic test, and should not be used with small
samples. You may want to use lillietest in place of jbtest for small samples.
H = jbtest(X,alpha) performs the Jarque-Bera test at the 100*alpha% level
rather than the 5% level, where alpha must be between 0 and 1.
[H,P,JBSTAT,CV] = jbtest(X,alpha) returns three additional outputs. P is
the p-value of the test, JBSTAT is the value of the test statistic, and CV is the
critical value for determining whether to reject the null hypothesis.
Example We can use jbtest to determine if car weights follow a normal distribution.
load carsmall
[h,p,j] = jbtest(Weight)
h =
1
p =
0.026718
j =
7.2448
With a p-value of 2.67%, we reject the hypothesis that the distribution is
normal. With a log transformation, the distribution becomes closer to normal
but is still significantly different at the 5% level.
[h,p,j] = jbtest(log(Weight))
h =
1
p =
0.043474
j =
6.2712
See lillietest for a different test of the same hypothesis.
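To make the link between skewness, kurtosis, and the chi-square statistic concrete, here is a sketch that assumes the standard textbook form of the Jarque-Bera statistic, n/6 × (s² + (k − 3)²/4), referred to a chi-square distribution with 2 degrees of freedom. The text above does not state the exact formula jbtest uses, so treat this as an illustration; the reported values are at least consistent with it, since exp(-7.2448/2) is about 0.0267.

```matlab
x = [1 2 3 4 5];                   % illustrative data only; far too small for a real test
n = numel(x);
d = x - mean(x);
m2 = mean(d.^2);                   % second central moment (biased)
s = mean(d.^3) / m2^1.5;           % sample skewness
k = mean(d.^4) / m2^2;             % sample kurtosis
jb = n/6 * (s^2 + (k - 3)^2/4)     % Jarque-Bera statistic under the assumed form
p = exp(-jb/2)                     % chi-square upper tail probability, 2 degrees of freedom

pWeight = exp(-7.2448/2)           % about 0.0267, from the JBSTAT reported for Weight
```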
Reference Judge, G. G., R. C. Hill, W. E. Griffiths, H. Lütkepohl, and T.-C. Lee.
Introduction to the Theory and Practice of Econometrics. New York, Wiley.
See Also hist, kstest2, lillietest
kruskalwallis
Purpose Kruskal-Wallis nonparametric one-way Analysis of Variance (ANOVA).
Syntax p = kruskalwallis(X)
p = kruskalwallis(X,group)
p = kruskalwallis(X,group,'displayopt')
[p,table] = kruskalwallis(...)
[p,table,stats] = kruskalwallis(...)
Description p = kruskalwallis(X) performs a Kruskal-Wallis test for comparing the
means of columns of the m-by-n matrix X, where each column represents an
independent sample containing m mutually independent observations. The
Kruskal-Wallis test is a nonparametric version of the classical one-way
ANOVA. The function returns the p-value for the null hypothesis that all
samples in X are drawn from the same population (or from different
populations with the same mean).
If the p-value is near zero, this casts doubt on the null hypothesis and suggests
that at least one sample mean is significantly different from the other sample
means. The choice of a critical p-value to determine whether the result is
judged "statistically significant" is left to the researcher. It is common to
declare a result significant if the p-value is less than 0.05 or 0.01.
The kruskalwallis function displays two figures. The first figure is a standard
ANOVA table, calculated using the ranks of the data rather than their numeric
values. Ranks are found by ordering the data from smallest to largest across all
groups, and taking the numeric index of this ordering. The rank for a tied
observation is equal to the average rank of all observations tied with it. For
example, the following table shows the ranks for a small sample.
X value  1.4  2.7  1.6  1.6  3.3  0.9  1.1
Rank     3    6    4.5  4.5  7    1    2
The entries in the ANOVA table are the usual sums of squares, degrees of
freedom, and other quantities calculated on the ranks. The usual F statistic is
replaced by a chi-square statistic. The p-value measures the significance of the
chi-square statistic.
The second figure displays box plots of each column of X (not the ranks of X).
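The tie-averaged ranks in the small table above can be reproduced in a few lines of base MATLAB. This is a sketch of the ranking rule only, not of how kruskalwallis computes ranks internally.

```matlab
x = [1.4 2.7 1.6 1.6 3.3 0.9 1.1];   % the small sample from the rank table
r = zeros(size(x));
for k = 1:numel(x)
    nLess = sum(x < x(k));            % observations strictly smaller than x(k)
    nTied = sum(x == x(k));           % observations tied with x(k), itself included
    r(k) = nLess + (nTied + 1)/2;     % average of ranks nLess+1 through nLess+nTied
end
r                                     % 3 6 4.5 4.5 7 1 2
```

The two observations equal to 1.6 would occupy ranks 4 and 5, so each receives the average, 4.5.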
p = kruskalwallis(X,group) uses the values in group (a character array or
cell array) as labels for the box plot of the samples in X, when X is a matrix.
Each row of group contains the label for the data in the corresponding column
of X, so group must have length equal to the number of columns in X.
When X is a vector, kruskalwallis performs a Kruskal-Wallis test on the
samples contained in X, as indexed by input group (a vector, character array,
or cell array). Each element in group identifies the group (i.e., sample) to which
the corresponding element in vector X belongs, so group must have the same
length as X. The labels contained in group are also used to annotate the box
plot.
It is not necessary to label samples sequentially (1, 2, 3, ...). For example, if X
contains measurements taken at three different temperatures, -27, 65, and
110, you could use these numbers as the sample labels in group. If a row of
group contains an empty cell or empty string, that row and the corresponding
observation in X are disregarded. NaNs in either input are similarly ignored.
p = kruskalwallis(X,group,'displayopt') enables the table and box plot
displays when 'displayopt' is 'on' (default) and suppresses the displays
when 'displayopt' is 'off'.
[p,table] = kruskalwallis(...) returns the ANOVA table (including
column and row labels) in cell array table. (You can copy a text version of the
ANOVA table to the clipboard by using the Copy Text item on the Edit menu.)
[p,table,stats] = kruskalwallis(...) returns a stats structure that you
can use to perform a follow-up multiple comparison test. The kruskalwallis
test evaluates the hypothesis that all samples have the same mean, against the
alternative that the means are not all the same. Sometimes it is preferable to
perform a test to determine which pairs of means are significantly different,
and which are not. You can use the multcompare function to perform such tests
by supplying the stats structure as input.
Assumptions
The Kruskal-Wallis test makes the following assumptions about the data in X:
• All sample populations have the same continuous distribution, apart from a
possibly different location.
• All observations are mutually independent.
The classical one-way ANOVA test replaces the first assumption with the
stronger assumption that the populations have normal distributions.
Example Let's revisit the same material strength study that we used with the anova1
function, to see if the nonparametric Kruskal-Wallis procedure leads to the
same conclusion. Recall we are studying the strength of beams made from
three alloys:
strength = [82 86 79 83 84 85 86 87 74 82 78 75 76 77 79 ...
79 77 78 82 79];
alloy = {'st','st','st','st','st','st','st','st',...
'al1','al1','al1','al1','al1','al1',...
'al2','al2','al2','al2','al2','al2'};
This time we try both classical and Kruskal-Wallis ANOVA, omitting displays:
anova1(strength,alloy,'off')
ans =
1.5264e-004
kruskalwallis(strength,alloy,'off')
ans =
0.0018
Both tests find that the three alloys are significantly different, though the
result is less significant according to the Kruskal-Wallis test. It is typical that
when a dataset has a reasonable fit to the normal distribution, the classical
ANOVA test will be more sensitive to differences between groups.
To understand when a nonparametric test may be more appropriate, let's see
how the tests behave when the distribution is not normal. We can simulate this
by replacing one of the values by an extreme value (an outlier).
strength(20)=120;
anova1(strength,alloy,'off')
ans =
0.2501
kruskalwallis(strength,alloy,'off')
ans =
0.0060
Now the classical ANOVA test does not find a significant difference, but the
nonparametric procedure does. This illustrates one of the properties of
nonparametric procedures: they are often not severely affected by changes in
a small portion of the data.
Reference Hollander, M., and D. A. Wolfe, Nonparametric Statistical Methods, Wiley,
1973.
See Also anova1, boxplot, multcompare
kstest
Purpose Kolmogorov-Smirnov test of the distribution of one sample.
Syntax H = kstest(X)
H = kstest(X,cdf)
H = kstest(X,cdf,alpha,tail)
[H,P,KSSTAT,CV] = kstest(X,cdf,alpha,tail)
Description H = kstest(X) performs a Kolmogorov-Smirnov test to compare the values in
the data vector X with a standard normal distribution (that is, a normal
distribution having mean 0 and variance 1). The null hypothesis for the
Kolmogorov-Smirnov test is that X has a standard normal distribution. The
alternative hypothesis is that X does not have that distribution. The result H is 1
if we can reject the hypothesis that X has a standard normal distribution, or 0
if we cannot reject that hypothesis. We reject the hypothesis if the test is
significant at the 5% level.
For each potential value x, the Kolmogorov-Smirnov test compares the
proportion of values less than x with the expected proportion predicted by the
standard normal distribution. The kstest function uses the maximum
difference over all x values as its test statistic. Mathematically, this can be
written as
$$\max(|F(x) - G(x)|)$$
where F(x) is the proportion of X values less than or equal to x and G(x) is the
standard normal cumulative distribution function evaluated at x.
H = kstest(X,cdf) compares the distribution of X to the hypothesized
distribution defined by the two-column matrix cdf. Column one contains a set
of possible x values, and column two contains the corresponding hypothesized
cumulative distribution function values G(x). If possible, you should define
cdf so that column one contains the values in X. If there are values in X not
found in column one of cdf, kstest will approximate G(X) by interpolation. All
values in X must lie in the interval between the smallest and largest values in
the first column of cdf. If the second argument is empty (cdf = []), kstest uses
the standard normal distribution as if there were no second argument.
The Kolmogorov-Smirnov test requires that cdf be predetermined. It is not
accurate if cdf is estimated from the data. To test X against a normal
distribution without specifying the parameters, use lillietest instead.
H = kstest(X,cdf,alpha,tail) specifies the significance level alpha and a
code tail for the type of alternative hypothesis. If tail = 0 (the default),
kstest performs a two-sided test with the general alternative $F \ne G$. If
tail = -1, the alternative is that $F < G$. If tail = 1, the alternative is $F > G$.
The form of the test statistic depends on the value of tail as follows.
tail = 0:   $\max(|F(x) - G(x)|)$
tail = -1:  $\max(G(x) - F(x))$
tail = 1:   $\max(F(x) - G(x))$
[H,P,KSSTAT,CV] = kstest(X,cdf,alpha,tail) also returns the observed
p-value P, the observed Kolmogorov-Smirnov statistic KSSTAT, and the cutoff
value CV for determining if KSSTAT is significant. If the return value of CV is NaN,
then kstest determined the significance by calculating a p-value according to an
asymptotic formula rather than by comparing KSSTAT to a critical value.
Examples Example 1
Let's generate some evenly spaced numbers and perform a
Kolmogorov-Smirnov test to see how well they fit to a normal distribution:
x = -2:1:4
x =
-2 -1 0 1 2 3 4
[h,p,k,c] = kstest(x,[],0.05,0)
h =
0
p =
0.13632
k =
0.41277
c =
0.48342
We cannot reject the null hypothesis that the values come from a standard
normal distribution. Although intuitively it seems that these evenly-spaced
integers could not follow a normal distribution, this example illustrates the
difficulty in testing normality in small samples.
To understand the test, it is helpful to generate an empirical cumulative
distribution plot and overlay the theoretical normal distribution.
xx = -3:.1:5;
cdfplot(x)
hold on
plot(xx,normcdf(xx),'r--')
[Figure: Empirical CDF of x with the standard normal cdf overlaid; horizontal axis x, vertical axis F(x).]
The Kolmogorov-Smirnov test statistic is the maximum difference between
these curves. It appears that this maximum of 0.41277 occurs as we approach
x = 1.0 from below. We can see that the empirical curve has the value 3/7 here,
and we can easily verify that the difference between the curves is 0.41277.
normcdf(1) - 3/7
ans =
0.41277
We can also perform a one-sided test. By setting tail = -1 we indicate that our
alternative is $F < G$, so the test statistic counts only points where this
inequality is true.
[h,p,k] = kstest(x, [], .05, -1)
h =
0
p =
0.068181
k =
0.41277
The test statistic is the same as before because in fact $F < G$ at x = 1.0.
However, the p-value is smaller for the one-sided test. If we carry out the other
one-sided test, we see that the test statistic changes, and is the difference
between the two curves near x = -1.0.
[h,p,k] = kstest(x,[],0.05,1)
h =
0
p =
0.77533
k =
0.12706
2/7 - normcdf(-1)
ans =
0.12706
Example 2
Now let's generate random numbers from a Weibull distribution, and test
against that Weibull distribution and an exponential distribution.
x = weibrnd(1, 2, 100, 1);
kstest(x, [x weibcdf(x, 1, 2)])
ans =
0
kstest(x, [x expcdf(x, 1)])
ans =
1
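The three forms of the test statistic can be checked by hand for the evenly spaced sample of Example 1. This sketch uses only base MATLAB (erfc stands in for normcdf) and evaluates the empirical cdf just before and just after each jump, which is where the extremes can occur. It covers the statistic only, not the p-value or the cutoff value.

```matlab
x = sort(-2:1:4);                   % the evenly spaced sample from Example 1
n = numel(x);
G = 0.5 * erfc(-x / sqrt(2));       % standard normal cdf at each data point
Fhi = (1:n) / n;                    % empirical cdf at each point, after its jump
Flo = (0:n-1) / n;                  % empirical cdf just below each point
ksLess    = max(G - Flo);           % tail = -1: largest G(x) - F(x)
ksGreater = max(Fhi - G);           % tail =  1: largest F(x) - G(x)
ksTwo     = max(ksLess, ksGreater)  % tail =  0: largest |F(x) - G(x)|
```

The values agree with the k outputs shown above: about 0.41277 for tail = 0 and tail = -1, and about 0.12706 for tail = 1.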
See Also kstest2, lillietest
kstest2
Purpose Kolmogorov-Smirnov test to compare the distribution of two samples.
Syntax H = kstest2(X1,X2)
H = kstest2(X1,X2,alpha,tail)
[H,P,KSSTAT] = kstest2(X1,X2,alpha,tail)
Description H = kstest2(X1,X2) performs a two-sample Kolmogorov-Smirnov test to
compare the distributions of values in the two data vectors X1 and X2. The null
hypothesis for this test is that X1 and X2 have the same continuous
distribution. The alternative hypothesis is that they have different continuous
distributions. The result H is 1 if we can reject the hypothesis that the
distributions are the same, or 0 if we cannot reject that hypothesis. We reject
the hypothesis if the test is significant at the 5% level.
For each potential value x, the Kolmogorov-Smirnov test compares the
proportion of X1 values less than x with the proportion of X2 values less than x. The
kstest2 function uses the maximum difference over all x values as its test
statistic. Mathematically, this can be written as
$$\max(|F1(x) - F2(x)|)$$
where F1(x) is the proportion of X1 values less than or equal to x and F2(x) is
the proportion of X2 values less than or equal to x.
H = kstest2(X1,X2,alpha,tail) specifies the significance level alpha and a
code tail for the type of alternative hypothesis. If tail = 0 (the default),
kstest2 performs a two-sided test with the general alternative $F1 \ne F2$. If
tail = -1, the alternative is that $F1 < F2$. If tail = 1, the alternative is
$F1 > F2$. The form of the test statistic depends on the value of tail as follows:
tail = 0:   $\max(|F1(x) - F2(x)|)$
tail = -1:  $\max(F2(x) - F1(x))$
tail = 1:   $\max(F1(x) - F2(x))$
[H,P,KSSTAT] = kstest2(X1,X2,alpha,tail) also returns the observed
p-value P and the observed Kolmogorov-Smirnov statistic KSSTAT.
Examples Let's compare the distributions of a small evenly-spaced sample and a larger
normal sample:
x = -1:1:5
y = randn(20,1);
[h,p,k] = kstest2(x,y)
h =
1
p =
0.0403
k =
0.5714
The difference between their distributions is significant at the 5% level
(p = 4%). To visualize the difference, we can overlay plots of the two empirical
cumulative distribution functions. The Kolmogorov-Smirnov statistic is the
maximum difference between these functions. After changing the color and line
style of one of the two curves, we can see that the maximum difference appears
to be near x = 1.9. We can also verify that the difference equals the k value that
kstest2 reports:
cdfplot(x)
hold on
cdfplot(y)
h = findobj(gca,'type','line');
set(h(1),'linestyle',':','color','r')
1 - 3/7
ans =
0.5714
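Because both empirical cdfs are step functions that change only at data points, the two-sample statistic can be found by comparing them at the pooled sample values. The sketch below uses small illustrative samples (not the x and y above, since y is random) and computes the two-sided statistic only.

```matlab
x1 = [1 2 3 4];                          % illustrative samples
x2 = [3 4 5 6];
pts = sort([x1 x2]);                     % the empirical cdfs change only at these points
F1 = arrayfun(@(t) mean(x1 <= t), pts);  % proportion of x1 values <= t
F2 = arrayfun(@(t) mean(x2 <= t), pts);  % proportion of x2 values <= t
ks = max(abs(F1 - F2))                   % 0.5 for these samples
```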
[Figure: Empirical CDF plots of x and y; horizontal axis x, vertical axis F(x).]
See Also kstest, lillietest
kurtosis
Purpose Sample kurtosis.
Syntax k = kurtosis(X)
k = kurtosis(X,flag)
Description k = kurtosis(X) returns the sample kurtosis of X. For vectors, kurtosis(x) is
the kurtosis of the elements in the vector x. For matrices, kurtosis(X) returns
the sample kurtosis for each column of X.
Kurtosis is a measure of how outlier-prone a distribution is. The kurtosis of the
normal distribution is 3. Distributions that are more outlier-prone than the
normal distribution have kurtosis greater than 3; distributions that are less
outlier-prone have kurtosis less than 3.
The kurtosis of a distribution is defined as
$$k = \frac{E(x - \mu)^4}{\sigma^4}$$
where μ is the mean of x, σ is the standard deviation of x, and E(t) represents
the expected value of the quantity t.
Note Some definitions of kurtosis subtract 3 from the computed value, so
that the normal distribution has kurtosis of 0. The kurtosis function does not
use this convention.
k = kurtosis(X,flag) specifies whether to correct for bias (flag = 0) or not
(flag = 1, the default). When X represents a sample from a population, the
kurtosis of X is biased, that is, it will tend to differ from the population kurtosis
by a systematic amount that depends on the size of the sample. You can set
flag = 0 to correct for this systematic bias.
Example X = randn([5 4])
X =
1.1650 1.6961 -1.4462 -0.3600
0.6268 0.0591 -0.7012 -0.1356
0.0751 1.7971 1.2460 -1.3493
0.3516 0.2641 -0.6390 -1.2704
-0.6965 0.8717 0.5774 0.9846
k = kurtosis(X)
k =
2.1658 1.2967 1.6378 1.9589
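The first entry of k can be reproduced from the definition by replacing the expectations with sample averages. This sketch uses the rounded values printed for the first column of X, so it matches only to about three decimal places, and it applies no bias correction (the flag = 1 behavior).

```matlab
x = [1.1650 0.6268 0.0751 0.3516 -0.6965];   % first column of X, as printed above
d = x - mean(x);                             % deviations from the sample mean
k = mean(d.^4) / mean(d.^2)^2                % about 2.166, uncorrected sample kurtosis
```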
See Also mean, moment, skewness, std, var
leverage
Purpose Leverage values for a regression.
Syntax h = leverage(data)
h = leverage(data,'model')
Description h = leverage(data) finds the leverage of each row (point) in the matrix data
for a linear additive regression model.
h = leverage(data,'model') finds the leverage on a regression, using a
specified model type, where 'model' can be one of these strings:
'interaction'    includes constant, linear, and cross product terms
'quadratic'      includes interactions and squared terms
'purequadratic'  includes constant, linear, and squared terms
Leverage is a measure of the influence of a given observation on a regression
due to its location in the space of the inputs.
Example One rule of thumb is to compare the leverage to 2p/n where n is the number of
observations and p is the number of parameters in the model. For the Hald
dataset this value is 0.7692.
load hald
h = max(leverage(ingredients,'linear'))
h =
0.7004
Since 0.7004 < 0.7692, there are no high leverage points using this rule.
Algorithm [Q,R] = qr(x2fx(data,'model'),0);
leverage = (sum(Q'.*Q'))'
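The QR computation only gives meaningful leverages with the economy-size decomposition: with a full square Q, every row has unit norm and every leverage would come out as 1. The sketch below builds a small hypothetical design matrix directly (instead of calling x2fx) and checks the result against the closed form for simple linear regression, 1/n + (x - mean(x))^2 / sum((x - mean(x)).^2).

```matlab
X = [ones(5,1) (1:5)'];    % hypothetical design matrix: constant plus one linear term
[Q,R] = qr(X,0);           % economy-size QR: Q is 5-by-2 with orthonormal columns
h = sum(Q.^2, 2)           % leverage of each row: 0.6 0.3 0.2 0.3 0.6
p = size(X,2);             % number of parameters
cutoff = 2*p/size(X,1)     % rule-of-thumb threshold 2p/n, here 0.8
```

The leverages always sum to p, so 2p/n is twice the average leverage.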
Reference Goodall, C. R. (1993). Computation using the QR decomposition. Handbook in
Statistics, Volume 9. Statistical Computing (C. R. Rao, ed.). Amsterdam, NL:
Elsevier/North-Holland.
See Also regstats
lillietest
Purpose Lilliefors test for goodness of fit to a normal distribution.
Syntax H = lillietest(X)
H = lillietest(X,alpha)
[H,P,LSTAT,CV] = lillietest(X,alpha)
Description H = lillietest(X) performs the Lilliefors test on the input data vector X and
returns H, the result of the hypothesis test. The result H is 1 if we can reject the
hypothesis that X has a normal distribution, or 0 if we cannot reject that
hypothesis. We reject the hypothesis if the test is significant at the 5% level.
The Lilliefors test evaluates the hypothesis that X has a normal distribution
with unspecified mean and variance, against the alternative that X does not
have a normal distribution. This test compares the empirical distribution of X
with a normal distribution having the same mean and variance as X. It is
similar to the Kolmogorov-Smirnov test, but it adjusts for the fact that the
parameters of the normal distribution are estimated from X rather than
specified in advance.
H = lillietest(X,alpha) performs the Lilliefors test at the 100*alpha%
level rather than the 5% level. alpha must be between 0.01 and 0.2.
[H,P,LSTAT,CV] = lillietest(X,alpha) returns three additional outputs. P
is the p-value of the test, obtained by linear interpolation in a set of tables
created by Lilliefors. LSTAT is the value of the test statistic. CV is the critical
value for determining whether to reject the null hypothesis. If the value of
LSTAT is outside the range of the Lilliefors table, P is returned as NaN but H
indicates whether to reject the hypothesis.
Example Do car weights follow a normal distribution? Not exactly, because weights are
always positive, and a normal distribution allows both positive and negative
values. However, perhaps the normal distribution is a reasonable
approximation.
load carsmall
[h p l c] = lillietest(Weight);
[h p l c]
ans =
1.0000 0.0232 0.1032 0.0886
The Lilliefors test statistic of 0.10317 is larger than the cutoff value of 0.0886
for a 5% level test, so we reject the hypothesis of normality. In fact, the p-value
of this test is approximately 0.02.
To visualize the distribution, we can make a histogram. This graph shows that
the distribution is skewed to the right: from the peak near 2250, the
frequencies drop off abruptly to the left but more gradually to the right.
hist(Weight)
[Figure: histogram of Weight; horizontal axis from 1500 to 5000, counts from 0 to 18.]
Sometimes it is possible to transform a variable to make its distribution more
nearly normal. A log transformation, in particular, tends to compensate for
skewness to the right.
[h p l c] = lillietest(log(Weight));
[h p l c]
ans =
0 0.13481 0.077924 0.0886
Now the p-value is approximately 0.13, so we do not reject the hypothesis.
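The statistic behind this test can be sketched as a Kolmogorov-Smirnov distance to a normal cdf whose mean and standard deviation are estimated from the sample. The code below is an illustration under that assumption, with made-up data. It does not reproduce the Lilliefors tables, so it yields the statistic only, not the p-value or cutoff.

```matlab
x = [2.1 3.4 1.9 5.6 4.2 3.8 2.7];           % illustrative data
z = sort((x - mean(x)) / std(x));            % standardize with the sample mean and std
n = numel(z);
G = 0.5 * erfc(-z / sqrt(2));                % fitted normal cdf at each data point
lstat = max([(1:n)/n - G, G - (0:n-1)/n])    % largest gap between empirical and fitted cdf
```

Because the data are standardized first, the statistic is unchanged by shifting or rescaling x, which is why the test needs no prior knowledge of the mean and variance.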
Reference Conover, W. J. (1980). Practical Nonparametric Statistics. New York, Wiley.
See Also hist, jbtest, kstest2
linkage
Purpose Create hierarchical cluster tree.
Syntax Z = linkage(Y)
Z = linkage(Y,'method')
Description Z = linkage(Y) creates a hierarchical cluster tree, using the Single Linkage
algorithm. The input matrix, Y, is the distance vector output by the pdist
function, a vector of length ((m-1)*m/2)-by-1, where m is the number of
objects in the original dataset.
Z = linkage(Y,'method') computes a hierarchical cluster tree using the
algorithm specified by 'method', where 'method' can be any of the following
character strings that identify ways to create the cluster hierarchy. Their
definitions are explained in "Mathematical Definitions" below.
String        Meaning
'single'      Shortest distance (default)
'complete'    Largest distance
'average'     Average distance
'centroid'    Centroid distance
'ward'        Incremental sum of squares
The output, Z, is an (m-1)-by-3 matrix containing cluster tree information. The
leaf nodes in the cluster hierarchy are the objects in the original dataset,
numbered from 1 to m. They are the singleton clusters from which all higher
clusters are built. Each newly formed cluster, corresponding to row i in Z, is
assigned the index m+i, where m is the total number of initial leaves.
Columns 1 and 2, Z(i,1:2), contain the indices of the objects that were linked
in pairs to form a new cluster. This new cluster is assigned the index value m+i.
There are m-1 higher clusters that correspond to the interior nodes of the
hierarchical cluster tree.
Column 3, Z(i,3), contains the corresponding linkage distances between the
objects paired in the clusters at each row i.
For example, consider a case with 30 initial nodes. If the tenth cluster formed
by the linkage function combines object 5 and object 7 and their distance is
1.5, then row 10 of Z will contain the values (5, 7, 1.5). This newly formed
cluster will have the index 10+30=40. If cluster 40 shows up in a later row, that
means this newly formed cluster is being combined again into some bigger
cluster.
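The m+i indexing rule can be checked on a concrete tree. The sketch below copies the 6-by-3 matrix Z printed in the Example for this function (7 objects) and decodes which entries are leaves and which are earlier clusters. The variable names newIndex, isLeaf, and fromRow are illustrative only and are not part of the toolbox.

```matlab
% Z as printed in the linkage example (m = 7 original objects).
Z = [ 2  5 0.2000
      3  4 0.5000
      8  6 0.5099
      1  7 0.7000
     11  9 1.2806
     12 10 1.3454];

m        = size(Z,1) + 1;          % number of leaves: Z has m-1 rows
newIndex = m + (1:size(Z,1))';     % index assigned to the cluster formed in row i
isLeaf   = Z(:,1:2) <= m;          % true where the entry is an original object
fromRow  = Z(:,1:2) - m;           % for non-leaf entries: the row of Z that formed it

disp([Z(:,1:2) newIndex])          % pairs joined and the index of the result
```

Row 3, for instance, joins cluster 8 (formed in row 1 from objects 2 and 5) with object 6, and the result is given index 10.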
Mathematical Definitions
The 'method' argument is a character string that specifies the algorithm used
to generate the hierarchical cluster tree information. These linkage algorithms
are based on various measurements of proximity between two groups of objects.
If $n_r$ is the number of objects in cluster r and $n_s$ is the number of objects in
cluster s, and $x_{ri}$ is the ith object in cluster r, the definitions of these various
measurements are as follows:
Single linkage, also called nearest neighbor, uses the smallest distance
between objects in the two groups.
$$d(r,s) = \min\bigl(\mathrm{dist}(x_{ri}, x_{sj})\bigr), \quad i \in (1,\ldots,n_r),\ j \in (1,\ldots,n_s)$$
Complete linkage, also called furthest neighbor, uses the largest distance
between objects in the two groups.
$$d(r,s) = \max\bigl(\mathrm{dist}(x_{ri}, x_{sj})\bigr), \quad i \in (1,\ldots,n_r),\ j \in (1,\ldots,n_s)$$
Average linkage uses the average distance between all pairs of objects in
cluster r and cluster s.
$$d(r,s) = \frac{1}{n_r n_s} \sum_{i=1}^{n_r} \sum_{j=1}^{n_s} \mathrm{dist}(x_{ri}, x_{sj})$$
Centroid linkage uses the distance between the centroids of the two groups.
$$d(r,s) = d(\bar{x}_r, \bar{x}_s)$$
where
$$\bar{x}_r = \frac{1}{n_r} \sum_{i=1}^{n_r} x_{ri}$$
and $\bar{x}_s$ is defined similarly.
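The single, complete, average, and centroid definitions above can be evaluated directly for two small groups. The following is a minimal sketch that assumes Euclidean distance for dist and uses made-up points; it relies on base MATLAB only and does not call linkage or pdist.

```matlab
% Two small clusters of 2-D points (illustrative values, one point per row).
r = [1 1; 1.2 1; 1.1 1.5];
s = [3 1.7; 3 1];
nr = size(r,1);
ns = size(s,1);

% D(i,j) = Euclidean distance between object i of r and object j of s.
D = zeros(nr,ns);
for i = 1:nr
    for j = 1:ns
        D(i,j) = sqrt(sum((r(i,:) - s(j,:)).^2));
    end
end

dSingle   = min(D(:));                                % nearest neighbor
dComplete = max(D(:));                                % furthest neighbor
dAverage  = sum(D(:)) / (nr*ns);                      % mean over all pairs
dCentroid = sqrt(sum((mean(r,1) - mean(s,1)).^2));    % distance between centroids
```

For any pair of groups the single-linkage distance is at most the average-linkage distance, which is at most the complete-linkage distance.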
Ward linkage uses the incremental sum of squares; that is, the increase in
the total within-group sum of squares as a result of joining groups r and s. It
is given by
$$d(r,s) = \frac{n_r n_s d_{rs}^2}{n_r + n_s}$$
where $d_{rs}^2$ is the squared distance between cluster r and cluster s, with the
distance defined as in the Centroid linkage. The within-group sum of squares of a
cluster is defined as the sum of the squares of the distance between all objects
in the cluster and the centroid of the cluster.
Example X = [3 1.7; 1 1; 2 3; 2 2.5; 1.2 1; 1.1 1.5; 3 1];
Y = pdist(X);
Z = linkage(Y)
Z =
2.0000 5.0000 0.2000
3.0000 4.0000 0.5000
8.0000 6.0000 0.5099
1.0000 7.0000 0.7000
11.0000 9.0000 1.2806
12.0000 10.0000 1.3454
See Also cluster, clusterdata, cophenet, dendrogram, inconsistent, pdist,
squareform
logncdf
Purpose Lognormal cumulative distribution function.
Syntax P = logncdf(X,MU,SIGMA)
Description P = logncdf(X,MU,SIGMA) computes the lognormal cdf at each of the values in
X using the corresponding means in MU and standard deviations in SIGMA.
Vector or matrix inputs for X, MU, and SIGMA must have the same size, which is
also the size of P. A scalar input for X, MU, or SIGMA is expanded to a constant
matrix with the same dimensions as the other inputs.
The lognormal cdf is
$$p = F(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_0^x \frac{e^{-(\ln(t) - \mu)^2 / (2\sigma^2)}}{t}\, dt$$
Example x = (0:0.2:10);
y = logncdf(x,0,1);
plot(x,y); grid;
xlabel('x'); ylabel('p');
[Figure: plot of the lognormal cdf p against x, for x from 0 to 10 and p from 0 to 1]
Reference Evans, M., N. Hastings, and B. Peacock, Statistical Distributions, Second
Edition, John Wiley and Sons, 1993. pp. 102-105.
See Also cdf, logninv, lognpdf, lognrnd, lognstat
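The integral defining the lognormal cdf has a closed form in terms of the complementary error function, because ln(t) is normally distributed. The sketch below evaluates that closed form with base MATLAB only; it illustrates the formula and is not a description of how logncdf is implemented.

```matlab
% Lognormal cdf via erfc: F(x) = 0.5*erfc(-(ln(x) - mu)/(sigma*sqrt(2))).
mu    = 0;
sigma = 1;
x     = (0:0.2:10);
p     = 0.5 * erfc(-(log(x) - mu) ./ (sigma*sqrt(2)));

% log(0) is -Inf, so the first element evaluates to erfc(Inf)/2 = 0.
```

At x = exp(mu) the cdf equals 0.5, since exp(mu) is the median of the lognormal distribution.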
logninv
Purpose Inverse of the lognormal cumulative distribution function (cdf).
Syntax X = logninv(P,MU,SIGMA)
Description X = logninv(P,MU,SIGMA) computes the inverse lognormal cdf with mean MU
and standard deviation SIGMA, at the corresponding probabilities in P. Vector
or matrix inputs for P, MU, and SIGMA must have the same size, which is also the
size of X. A scalar input for P, MU, or SIGMA is expanded to a constant matrix with
the same dimensions as the other inputs.
We define the lognormal inverse function in terms of the lognormal cdf as
$$x = F^{-1}(p \mid \mu, \sigma) = \{x : F(x \mid \mu, \sigma) = p\}$$
where
$$p = F(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_0^x \frac{e^{-(\ln(t) - \mu)^2 / (2\sigma^2)}}{t}\, dt$$
Example p = (0.005:0.01:0.995);
crit = logninv(p,1,0.5);
plot(p,crit)
xlabel('Probability');ylabel('Critical Value'); grid
[Figure: plot of Critical Value (0 to 10) against Probability (0 to 1)]
See Also icdf, logncdf, lognpdf, lognrnd, lognstat
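Inverting the closed-form cdf gives an explicit expression for the quantiles. The sketch below, using base MATLAB only, computes them with erfcinv for the same parameters as the example and then maps them back through the cdf as a round-trip check. The variable pBack is illustrative, and this is not presented as the toolbox's implementation.

```matlab
% Lognormal inverse cdf: x = exp(mu - sigma*sqrt(2)*erfcinv(2*p)).
mu    = 1;
sigma = 0.5;
p     = (0.005:0.01:0.995);
x     = exp(mu - sigma*sqrt(2) * erfcinv(2*p));

% Round trip: applying the cdf to x should recover p.
pBack = 0.5 * erfc(-(log(x) - mu) ./ (sigma*sqrt(2)));
```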
lognpdf
Purpose Lognormal probability density function (pdf).
Syntax Y = lognpdf(X,MU,SIGMA)
Description Y = lognpdf(X,MU,SIGMA) computes the lognormal pdf at each of the values
in X using the corresponding means in MU and standard deviations in SIGMA.
Vector or matrix inputs for X, MU, and SIGMA must have the same size, which is
also the size of Y. A scalar input for X, MU, or SIGMA is expanded to a constant
matrix with the same dimensions as the other inputs.
The lognormal pdf is
$$y = f(x \mid \mu, \sigma) = \frac{1}{x\sigma\sqrt{2\pi}}\, e^{-(\ln(x) - \mu)^2 / (2\sigma^2)}$$
Example x = (0:0.02:10);
y = lognpdf(x,0,1);
plot(x,y); grid;
xlabel('x'); ylabel('p')
[Figure: plot of the lognormal pdf p against x, for x from 0 to 10 and p from 0 to 0.8]
Reference Mood, A. M., F.A. Graybill, and D.C. Boes, Introduction to the Theory of
Statistics, Third Edition, McGraw-Hill 1974 pp. 540-541.
See Also logncdf, logninv, lognrnd, lognstat, pdf
lognrnd
Purpose Random matrices from the lognormal distribution.
Syntax R = lognrnd(MU,SIGMA)
R = lognrnd(MU,SIGMA,m)
R = lognrnd(MU,SIGMA,m,n)
Description R = lognrnd(MU,SIGMA) generates lognormal random numbers with
parameters MU and SIGMA. Vector or matrix inputs for MU and SIGMA must have
the same size, which is also the size of R. A scalar input for MU or SIGMA is
expanded to a constant matrix with the same dimensions as the other input.
R = lognrnd(MU,SIGMA,m) generates lognormal random numbers with
parameters MU and SIGMA, where m is a 1-by-2 vector that contains the row and
column dimensions of R.
R = lognrnd(MU,SIGMA,m,n) generates lognormal random numbers with
parameters MU and SIGMA, where scalars m and n are the row and column
dimensions of R.
Example r = lognrnd(0,1,4,3)
r =
3.2058 0.4983 1.3022
1.8717 5.4529 2.3909
1.0780 1.0608 0.2355
1.4213 6.0320 0.4960
See Also random, logncdf, logninv, lognpdf, lognstat
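A lognormal variate is the exponential of a normal variate with mean MU and standard deviation SIGMA. A minimal sketch of that relationship with base randn follows; it is not necessarily how lognrnd generates its numbers, and the values will differ from the example output because the random stream differs.

```matlab
% Draw a 4-by-3 matrix of lognormal numbers by exponentiating normal draws.
mu    = 0;
sigma = 1;
m     = 4;                           % rows
n     = 3;                           % columns
r     = exp(mu + sigma * randn(m,n));
```

Every element is strictly positive, as expected for lognormal data.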
lognstat
Purpose Mean and variance for the lognormal distribution.
Syntax [M,V] = lognstat(MU,SIGMA)
Description [M,V] = lognstat(MU,SIGMA) returns the mean and variance of the
lognormal distribution with parameters MU and SIGMA. Vector or matrix inputs
for MU and SIGMA must have the same size, which is also the size of M and V. A
scalar input for MU or SIGMA is expanded to a constant matrix with the same
dimensions as the other input.
The mean of the lognormal distribution with parameters $\mu$ and $\sigma$ is
$$e^{\mu + \sigma^2/2}$$
and the variance is
$$e^{2\mu + 2\sigma^2} - e^{2\mu + \sigma^2}$$
Example [m,v]= lognstat(0,1)
m =
1.6487
v =
4.6708
Reference Mood, A. M., F.A. Graybill, and D.C. Boes, Introduction to the Theory of
Statistics, Third Edition, McGraw-Hill 1974 pp. 540-541.
See Also logncdf, logninv, lognpdf, lognrnd
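The two moment formulas can be evaluated directly. This sketch uses base MATLAB only and mirrors the call lognstat(0,1); the names m and v follow the example.

```matlab
% Mean and variance of the lognormal distribution from its parameters.
mu    = 0;
sigma = 1;
m = exp(mu + sigma^2/2);                            % mean
v = exp(2*mu + 2*sigma^2) - exp(2*mu + sigma^2);    % variance
```

With mu = 0 and sigma = 1 these reduce to exp(0.5) and exp(2) - exp(1).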
lsline
Purpose Least squares fit line(s).
Syntax lsline
h = lsline
Description lsline superimposes the least squares line on each line object in the current
axes (except LineStyles '-','--','.-').
h = lsline returns the handles to the line objects.
Example y = [2 3.4 5.6 8 11 12.3 13.8 16 18.8 19.9]';
plot(y,'+');
lsline;
See Also polyfit, polyval
[Figure: the data plotted as '+' markers with the least squares line superimposed; x-axis from 0 to 10, y-axis from 0 to 20]
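The line that lsline draws for this data can be approximated by hand with polyfit and polyval, the two functions listed under See Also. The sketch assumes the x values are the indices 1 to 10, which is what plot(y,'+') uses, and the names c and yfit are illustrative.

```matlab
% Least squares line for the example data, fitted against the index 1..10.
y = [2 3.4 5.6 8 11 12.3 13.8 16 18.8 19.9]';
x = (1:numel(y))';
c    = polyfit(x, y, 1);     % c(1) is the slope, c(2) the intercept
yfit = polyval(c, x);        % fitted values on the line
```

Calling plot(x,y,'+',x,yfit,'-') afterwards draws the points and the fitted line together.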
mad
Purpose Mean absolute deviation (MAD) of a sample of data.
Syntax y = mad(X)
Description y = mad(X) computes the average of the absolute differences between a set of
data and the sample mean of that data. For vectors, mad(x) returns the mean
absolute deviation of the elements of x. For matrices, mad(X) returns the MAD
of each column of X.
The MAD is less efficient than the standard deviation as an estimate of the
spread when the data is all from the normal distribution.
Multiply the MAD by 1.3 to estimate $\sigma$ (the second parameter of the normal
distribution).
Examples This example shows a Monte Carlo simulation of the relative efficiency of the
MAD to the sample standard deviation for normal data.
x = normrnd(0,1,100,100);
s = std(x);
s_MAD = 1.3 * mad(x);
efficiency = (norm(s - 1)./norm(s_MAD - 1)).^2
efficiency =
0.5972
See Also std, range
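The definition in the Description (average absolute difference from the sample mean) can be written out in a couple of lines. This is a sketch in base MATLAB on made-up data, not the toolbox source; repmat is used so the column-wise version does not depend on implicit expansion.

```matlab
% Mean absolute deviation of a vector.
x = [1 2 3 4 10]';
y = mean(abs(x - mean(x)));                       % 2.4 for this vector

% Column-wise version for a matrix: one MAD per column.
X  = [x, 2*x];
Xc = X - repmat(mean(X,1), size(X,1), 1);         % subtract each column mean
Y  = mean(abs(Xc), 1);
```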
mahal
Purpose Mahalanobis distance.
Syntax d = mahal(Y,X)
Description mahal(Y,X) computes the Mahalanobis distance of each point (row) of the
matrix Y from the sample in the matrix X.
The number of columns of Y must equal the number of columns in X, but the
number of rows may differ. The number of rows in X must exceed the number
of columns.
The Mahalanobis distance is a multivariate measure of the separation of a data
set from a point in space. It is the criterion minimized in linear discriminant
analysis.
Example The Mahalanobis distance of a matrix r when applied to itself is a way to find
outliers.
r = mvnrnd([0 0],[1 0.9;0.9 1],100);
r = [r;10 10];
d = mahal(r,r);
last6 = d(96:101)
last6 =
1.1036
2.2353
2.0219
0.3876
1.5571
52.7381
The last element is clearly an outlier.
See Also classify
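For intuition, the Mahalanobis distance of a point from a sample can be computed from the sample mean and covariance. The sketch below computes the squared form, delta * inv(S) * delta'. Whether mahal reports the squared or the unsquared value is not stated on this page, so treat that choice as an assumption. The data are made up.

```matlab
% Squared Mahalanobis distance of each row of Y from the sample X.
X = [1 2; 2 1; 3 4; 4 3; 5 5];     % reference sample, one observation per row
Y = [3 3; 10 0];                   % points to measure
mu = mean(X,1);                    % sample mean (row vector)
S  = cov(X);                       % sample covariance matrix
d2 = zeros(size(Y,1),1);
for k = 1:size(Y,1)
    delta = Y(k,:) - mu;
    d2(k) = delta * (S \ delta');  % solves S*z = delta' instead of forming inv(S)
end
```

The first point coincides with the sample mean, so its distance is zero; the second lies far outside the sample and gets a large value.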
manova1
Purpose One-way Multivariate Analysis of Variance (MANOVA).
Syntax d = manova1(X,group)
d = manova1(X,group,alpha)
[d,p] = manova1(...)
[d,p,stats] = manova1(...)
Description d = manova1(X,group) performs a one-way Multivariate Analysis of Variance
(MANOVA) for comparing the multivariate means of the columns of X, grouped
by group. X is an m-by-n matrix of data values, and each row is a vector of
measurements on n variables for a single observation. group is a grouping
variable defined as a vector, string array, or cell array of strings. Two
observations are in the same group if they have the same value in the group
array. The observations in each group represent a sample from a population.
The function returns d, an estimate of the dimension of the space containing
the group means. manova1 tests the null hypothesis that the means of each
group are the same n-dimensional multivariate vector, and that any difference
observed in the sample X is due to random chance. If d = 0, there is no evidence
to reject that hypothesis. If d = 1, then you can reject the null hypothesis at the
5% level, but you cannot reject the hypothesis that the multivariate means lie
on the same line. Similarly, if d = 2 the multivariate means may lie on the same
plane in n-dimensional space, but not on the same line.
d = manova1(X,group,alpha) gives control of the significance level, alpha.
The return value d will be the smallest dimension having p > alpha, where p is
a p-value for testing whether the means lie in a space of that dimension.
[d,p] = manova1(...) also returns p, a vector of p-values for testing
whether the means lie in a space of dimension 0, 1, and so on. The largest
possible dimension is either the dimension of the space, or one less than the
number of groups. There is one element of p for each dimension up to, but not
including, the largest.
If the ith p-value is near zero, this casts doubt on the hypothesis that the group
means lie on a space of i-1 dimensions. The choice of a critical p-value to
determine whether the result is judged statistically significant is left to the
researcher and is specified by the value of the input argument alpha. It is
common to declare a result significant if the p-value is less than 0.05 or 0.01.
[d,p,stats] = manova1(...) also returns stats, a structure containing
additional MANOVA results. The structure contains the following fields.
Field      Contents
W          Within-groups sum of squares and cross-products matrix
B          Between-groups sum of squares and cross-products matrix
T          Total sum of squares and cross-products matrix
dfW        Degrees of freedom for W
dfB        Degrees of freedom for B
dfT        Degrees of freedom for T
lambda     Vector of values of Wilks' lambda test statistic for testing
           whether the means have dimension 0, 1, etc.
chisq      Transformation of lambda to an approximate chi-square
           distribution
chisqdf    Degrees of freedom for chisq
eigenval   Eigenvalues of $W^{-1}B$
eigenvec   Eigenvectors of $W^{-1}B$; these are the coefficients for the
           canonical variables C, and they are scaled so the within-group
           variance of the canonical variables is 1
canon      Canonical variables C, equal to XC*eigenvec, where XC is X with
           columns centered by subtracting their means
mdist      A vector of Mahalanobis distances from each point to the mean
           of its group
gmdist     A matrix of Mahalanobis distances between each pair of group
           means
The canonical variables C are linear combinations of the original variables,
chosen to maximize the separation between groups. Specifically, C(:,1) is the
linear combination of the X columns that has the maximum separation between
groups. This means that among all possible linear combinations, it is the one
with the most significant F statistic in a one-way analysis of variance.
C(:,2) has the maximum separation subject to it being orthogonal to C(:,1),
and so on.
You may find it useful to use the outputs from manova1 along with other
functions to supplement your analysis. For example, you may want to start
with a grouped scatter plot matrix of the original variables using gplotmatrix.
You can use gscatter to visualize the group separation using the first two
canonical variables. You can use manovacluster to graph a dendrogram
showing the clusters among the group means.
Assumptions
The MANOVA test makes the following assumptions about the data in X:
The populations for each group are normally distributed.
The variance-covariance matrix is the same for each population.
Example We can use manova1 to determine whether there are differences in the averages
of four car characteristics, among groups defined by the country where the cars
were made.
load carbig
[d,p] = manova1([MPG Acceleration Weight Displacement],Origin)
d =
3
p =
0
0.0000
0.0075
0.1934
There are four dimensions in the input matrix, so the group means must lie in
a four-dimensional space. manova1 shows that we cannot reject the hypothesis
that the means lie in a three-dimensional subspace.
References Krzanowski, W. J. Principles of Multivariate Analysis. Oxford University
Press, 1988.
See Also anova1, gscatter, gplotmatrix, manovacluster
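The W, B, and T fields of the stats structure are sums of squares and cross-products matrices, and they satisfy T = W + B. The sketch below builds the three matrices by hand for a tiny two-group data set. The data and variable names are illustrative and the code does not call manova1.

```matlab
% Within-groups (W), between-groups (B), and total (T) SSCP matrices.
X     = [1 2; 2 1; 3 3; 6 5; 7 7; 8 6];   % 6 observations, 2 variables
group = [1 1 1 2 2 2]';
grand = mean(X,1);                        % overall mean of each variable
n     = size(X,2);
W = zeros(n);
B = zeros(n);
labels = unique(group);
for g = 1:numel(labels)
    Xg = X(group == labels(g), :);
    mg = mean(Xg,1);                                   % group mean
    Rg = Xg - repmat(mg, size(Xg,1), 1);               % deviations from group mean
    W  = W + Rg' * Rg;
    B  = B + size(Xg,1) * (mg - grand)' * (mg - grand);
end
Rt = X - repmat(grand, size(X,1), 1);                  % deviations from grand mean
T  = Rt' * Rt;
```

The eigenvalues of W\B computed from these matrices correspond to the eigenval field described in the table of fields.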
manovacluster
Purpose Plot dendrogram showing group mean clusters after MANOVA.
Syntax manovacluster(stats)
manovacluster(stats,'method')
H = manovacluster(stats)
Description manovacluster(stats) generates a dendrogram plot of the group means after
a multivariate analysis of variance (MANOVA). stats is the output stats
structure from manova1. The clusters are computed by applying the single
linkage method to the matrix of Mahalanobis distances between group means.
See dendrogram for more information on the graphical output from this
function. The dendrogram is most useful when the number of groups is large.
manovacluster(stats,'method') uses the specified method in place of single
linkage. 'method' can be any of the following character strings that identify
ways to create the cluster hierarchy. See linkage for further explanation.
String        Meaning
'single'      Shortest distance (default)
'complete'    Largest distance
'average'     Average distance
'centroid'    Centroid distance
'ward'        Incremental sum of squares
H = manovacluster(stats,'method') returns a vector of handles to the lines
in the figure.
Example Let's analyze the larger car dataset to determine which countries produce cars
with the most similar characteristics.
load carbig
X = [MPG Acceleration Weight Displacement];
[d,p,stats] = manova1(X,Origin);
manovacluster(stats)
[Figure: dendrogram of the group means with leaves ordered Japan, Germany, Italy, France, Sweden, England, USA; vertical axis from 0 to 3]
See Also cluster, dendrogram, linkage, manova1
mean
Purpose Average or mean value of vectors and matrices.
Syntax m = mean(X)
Description m = mean(X) calculates the sample average
$$\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}$$
For vectors, mean(x) is the mean value of the elements in vector x. For
matrices, mean(X) is a row vector containing the mean value of each column.
The mean function is part of the standard MATLAB language.
Example These commands generate five samples of 100 normal random numbers with
mean zero and standard deviation one. The sample averages in xbar are
much less variable (0.00 ± 0.10).
x = normrnd(0,1,100,5);
xbar = mean(x)
xbar =
0.0727 0.0264 0.0351 0.0424 0.0752
See Also median, std, cov, corrcoef, var
median
Purpose Median value of vectors and matrices.
Syntax m = median(X)
Description m = median(X) calculates the median value, which is the 50th percentile of a
sample. The median is a robust estimate of the center of a sample of data, since
outliers have little effect on it.
For vectors, median(x) is the median value of the elements in vector x. For
matrices, median(X) is a row vector containing the median value of each
column. Since median is implemented using sort, it can be costly for large
matrices.
The median function is part of the standard MATLAB language.
Examples xodd = 1:5;
modd = median(xodd)
modd =
3
xeven = 1:4;
meven = median(xeven)
meven =
2.5000
This example shows robustness of the median to outliers.
xoutlier = [xeven 10000];
moutlier = median(xoutlier)
moutlier =
3
See Also mean, std, cov, corrcoef
mle
Purpose Maximum likelihood estimation.
Syntax phat = mle('dist',data)
[phat,pci] = mle('dist',data)
[phat,pci] = mle('dist',data,alpha)
[phat,pci] = mle('dist',data,alpha,p1)
Description phat = mle('dist',data) returns the maximum likelihood estimates (MLEs)
for the distribution specified in 'dist' using the sample in the vector, data.
See "Overview of the Distributions" on page 1-12 for the list of available
distributions.
[phat,pci] = mle('dist',data) returns the MLEs and 95% confidence
intervals.
[phat,pci] = mle('dist',data,alpha) returns the MLEs and
100(1-alpha)% confidence intervals given the data and the specified alpha.
[phat,pci] = mle('dist',data,alpha,p1) is used for the binomial
distribution only, where p1 is the number of trials.
Example rv = binornd(20,0.75)
rv =
16
[p,pci] = mle('binomial',rv,0.05,20)
p =
0.8000
pci =
0.5634
0.9427
See Also betafit, binofit, expfit, gamfit, normfit, poissfit, weibfit
moment
Purpose Central moment of all orders.
Syntax m = moment(X,order)
Description m = moment(X,order) returns the central moment of X specified by the
positive integer order. For vectors, moment(x,order) returns the central
moment of the specified order for the elements of x. For matrices,
moment(X,order) returns the central moment of the specified order for each
column.
Note that the first central moment is zero, and the second central moment is
the variance computed using a divisor of n rather than n-1, where n is the
length of the vector x or the number of rows in the matrix X.
The central moment of order k of a distribution is defined as
$$m_k = E(x - \mu)^k$$
where E(x) is the expected value of x and $\mu$ is the mean of x.
Example X =
1.1650 0.0591 1.2460 -1.2704 -0.0562
0.6268 1.7971 -0.6390 0.9846 0.5135
0.0751 0.2641 0.5774 -0.0449 0.3967
0.3516 0.8717 -0.3600 -0.7989 0.7562
-0.6965 -1.4462 -0.1356 -0.7652 0.4005
1.6961 -0.7012 -1.3493 0.8617 -1.3414
m = moment(X,3)
m =
-0.0282 0.0571 0.1253 0.1460 -0.4486
See Also kurtosis, mean, skewness, std, var
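The sample version of the central moment replaces the expectation with an average. A minimal sketch in base MATLAB on a made-up vector follows; it also confirms the two remarks in the Description about the first and second central moments. The names m1, m2, and m3 are illustrative.

```matlab
% Sample central moments of a vector, computed from the definition.
x  = [1 2 3 4 10]';
xc = x - mean(x);          % deviations from the sample mean
m1 = mean(xc);             % first central moment: zero
m2 = mean(xc.^2);          % second central moment: variance with divisor n
m3 = mean(xc.^3);          % third central moment
```

Here m2 matches var(x,1), the variance normalized by n rather than n-1.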
multcompare
Purpose Multiple comparison test of means or other estimates.
Syntax c = multcompare(stats)
c = multcompare(stats,alpha)
c = multcompare(stats,alpha,'displayopt')
c = multcompare(stats,alpha,'displayopt','ctype')
c = multcompare(stats,alpha,'displayopt','ctype','estimate')
c = multcompare(stats,alpha,'displayopt','ctype','estimate',dim)
[c,m] = multcompare(...)
[c,m,h] = multcompare(...)
Description c = multcompare(stats) performs a multiple comparison test using the
information in the stats structure, and returns a matrix c of pairwise
comparison results. It also displays an interactive figure presenting a
graphical representation of the test.
In a one-way analysis of variance, you compare the means of several groups to
test the hypothesis that they are all the same, against the general alternative
that they are not all the same. Sometimes this alternative may be too general.
You may need information about which pairs of means are significantly
different, and which are not. A test that can provide such information is called
a multiple comparison procedure.
When you perform a simple t-test of one group mean against another, you
specify a significance level that determines the cutoff value of the t statistic.
For example, you can specify the value alpha = 0.05 to ensure that when there
is no real difference, you will incorrectly find a significant difference no more
than 5% of the time. When there are many group means, there are also many
pairs to compare. If you applied an ordinary t-test in this situation, the alpha
value would apply to each comparison, so the chance of incorrectly finding a
significant difference would increase with the number of comparisons. Multiple
comparison procedures are designed to provide an upper bound on the
probability that any comparison will be incorrectly found significant.
The output c contains the results of the test in the form of a five-column matrix.
Each row of the matrix represents one test, and there is one row for each pair
of groups. The entries in the row indicate the means being compared, the
estimated difference in means, and a confidence interval for the difference.
For example, suppose one row contains the following entries.
2.0000 5.0000 1.9442 8.2206 14.4971
These numbers indicate that the mean of group 2 minus the mean of group 5 is
estimated to be 8.2206, and a 95% confidence interval for the true difference
of the means is [1.9442, 14.4971].
In this example the confidence interval does not contain 0.0, so the difference
is significant at the 0.05 level. If the confidence interval did contain 0.0, the
difference would not be significant at the 0.05 level.
The multcompare function also displays a graph with each group mean
represented by a symbol and an interval around the symbol. Two means are
significantly different if their intervals are disjoint, and are not significantly
different if their intervals overlap. You can use the mouse to select any group,
and the graph will highlight any other groups that are significantly different
from it.
c = multcompare(stats,alpha) determines the confidence levels of the
intervals in the c matrix and in the figure. The confidence level is
100*(1-alpha)%. The default value of alpha is 0.05.
c = multcompare(stats,alpha,'displayopt') enables the graph display
when 'displayopt' is 'on' (default) and suppresses the display when
'displayopt' is 'off'.
c = multcompare(stats,alpha,'displayopt','ctype') specifies the critical
value to use for the multiple comparison, which can be any of the following.
ctype          Meaning
'hsd'          Use Tukey's honestly significant difference criterion.
               This is the default, and it is based on the Studentized
               range distribution. It is optimal for balanced one-way
               ANOVA and similar procedures with equal sample sizes.
               It has been proven to be conservative for one-way
               ANOVA with different sample sizes. According to the
               unproven Tukey-Kramer conjecture, it is also accurate
               for problems where the quantities being compared are
               correlated, as in analysis of covariance with unbalanced
               covariate values.
'lsd'          Use Tukey's least significant difference procedure. This
               procedure is a simple t-test. It is reasonable if the
               preliminary test (say, the one-way ANOVA F statistic)
               shows a significant difference. If it is used
               unconditionally, it provides no protection against
               multiple comparisons.
'bonferroni'   Use critical values from the t distribution, after a
               Bonferroni adjustment to compensate for multiple
               comparisons. This procedure is conservative, but usually
               less so than the Scheffé procedure.
'dunn-sidak'   Use critical values from the t distribution, after an
               adjustment for multiple comparisons that was proposed
               by Dunn and proved accurate by Šidák. This procedure is
               similar to, but less conservative than, the Bonferroni
               procedure.
'scheffe'      Use critical values from Scheffé's S procedure, derived
               from the F distribution. This procedure provides a
               simultaneous confidence level for comparisons of all
               linear combinations of the means, and it is conservative
               for comparisons of simple differences of pairs.
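The relationship between the 'bonferroni' and 'dunn-sidak' options can be seen from the per-comparison significance level each adjustment implies. The sketch below uses the standard textbook forms of the two adjustments, alpha/k and 1-(1-alpha)^(1/k), for k pairwise comparisons. It is an illustration only; the exact critical values multcompare uses are not shown on this page.

```matlab
% Per-comparison significance levels for k pairwise comparisons.
alpha   = 0.05;
ngroups = 3;
k = ngroups*(ngroups-1)/2;            % number of pairs of groups
alphaBonf  = alpha / k;               % Bonferroni adjustment
alphaSidak = 1 - (1 - alpha)^(1/k);   % Dunn-Sidak adjustment
```

The Dunn-Šidák level is slightly larger than the Bonferroni level, which is why that procedure is described as less conservative.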
c = multcompare(stats,alpha,'displayopt','ctype','estimate')
specifies the estimate to be compared. The allowable values of estimate depend
on the function that was the source of the stats structure, according to the
following table.
Source            Allowable Values of Estimate
'anova1'          Ignored. Always compare the group means.
'anova2'          Either 'column' (the default) or 'row' to compare
                  column or row means.
'anovan'          Ignored. Always compare the population marginal
                  means as specified by the dim argument.
'aoctool'         Either 'slope', 'intercept', or 'pmm' to compare
                  slopes, intercepts, or population marginal means. If
                  the analysis of covariance model did not include
                  separate slopes, then 'slope' is not allowed. If it did
                  not include separate intercepts, then no comparisons
                  are possible.
'friedman'        Ignored. Always compare average column ranks.
'kruskalwallis'   Ignored. Always compare average group ranks.
c = multcompare(stats,alpha,'displayopt','ctype','estimate',dim)
specifies the population marginal means to be compared. This argument is
used only if the input stats structure was created by the anovan function. For
n-way ANOVA with n factors, you can specify dim as a scalar or a vector of
integers between 1 and n. The default value is 1.
For example, if dim = 1, the estimates that are compared are the means for
each value of the first grouping variable, adjusted by removing effects of the
other grouping variables as if the design were balanced. If dim = [1 3],
population marginal means are computed for each combination of the first and
third grouping variables, removing effects of the second grouping variable. If
you fit a singular model, some cell means may not be estimable and any
population marginal means that depend on those cell means will have the
value NaN.
Population marginal means are described by Milliken and Johnson (1992) and
by Searle, Speed, and Milliken (1980). The idea behind population marginal
means is to remove any effect of an unbalanced design by fixing the values of
the factors specified by dim, and averaging out the effects of other factors as if
each factor combination occurred the same number of times. The definition of
population marginal means does not depend on the number of observations at
each factor combination. For designed experiments where the number of
observations at each factor combination has no meaning, population marginal
means can be easier to interpret than simple means ignoring other factors. For
surveys and other studies where the number of observations at each
combination does have meaning, population marginal means may be harder to
interpret.
[c,m] = multcompare(...) returns an additional matrix m. The first column
of m contains the estimated values of the means (or whatever statistics are
being compared) for each group, and the second column contains their standard
errors.
[c,m,h] = multcompare(...) returns a handle h to the comparison graph.
Note that the title of this graph contains instructions for interacting with the
graph, and the x-axis label contains information about which means are
significantly different from the selected mean. If you plan to use this graph for
presentation, you may want to omit the title and the x-axis label. You can
remove them using interactive features of the graph window, or you can use the
following commands.
title('')
xlabel('')
Example Let's revisit the anova1 example testing the material strength in structural
beams. From the anova1 output we found significant evidence that the three
types of beams are not equivalent in strength. Now we can determine where
those differences lie. First we create the data arrays and we perform one-way
ANOVA.
strength = [82 86 79 83 84 85 86 87 74 82 78 75 76 77 79 ...
79 77 78 82 79];
alloy = {'st','st','st','st','st','st','st','st',...
'al1','al1','al1','al1','al1','al1',...
'al2','al2','al2','al2','al2','al2'};
[p,a,s] = anova1(strength,alloy);
Among the outputs is a structure that we can use as input to multcompare.
multcompare(s)
ans =
1.0000 2.0000 3.6064 7.0000 10.3936
1.0000 3.0000 1.6064 5.0000 8.3936
2.0000 3.0000 -5.6280 -2.0000 1.6280
The third row of the output matrix shows that the difference in strength
between the two alloys is not significant. A 95% confidence interval for the
difference is [-5.6, 1.6], so we cannot reject the hypothesis that the true
difference is zero.
The first two rows show that both comparisons involving the first group (steel)
have confidence intervals that do not include zero. In other words, those
differences are significant. The graph shows the same information.
See Also anova1, anova2, anovan, aoctool, friedman, kruskalwallis
[Figure: multcompare comparison graph with one interval for each group (al2, al1, st). Title: "Click on the group you want to test". x-axis label: "2 groups have slopes significantly different from st".]
References Hochberg, Y., and A. C. Tamhane, Multiple Comparison Procedures, 1987,
Wiley.
Milliken, G. A., and D. E. Johnson, Analysis of Messy Data, Volume 1: Designed
Experiments, 1992, Chapman & Hall.
Searle, S. R., F. M. Speed, and G. A. Milliken, "Population marginal means in
the linear model: an alternative to least squares means," American
Statistician, 1980, pp. 216-221.
mvnrnd
Purpose Random matrices from the multivariate normal distribution.
Syntax r = mvnrnd(mu,SIGMA,cases)
Description r = mvnrnd(mu,SIGMA,cases) returns a matrix of random numbers chosen
from the multivariate normal distribution with mean vector mu and covariance
matrix SIGMA. cases specifies the number of rows in r.
SIGMA is a symmetric positive definite matrix with size equal to the length
of mu.
Example mu = [2 3];
sigma = [1 1.5; 1.5 3];
r = mvnrnd(mu,sigma,100);
plot(r(:,1),r(:,2),'+')
See Also normrnd
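One informal way to check a call like the one in the example is to draw a larger sample and compare its moments with the inputs. The following is a minimal sketch that assumes the Statistics Toolbox is on the path. The sample size of 5000 and the names mhat and shat are illustrative choices, not part of the toolbox.

```matlab
mu = [2 3];
sigma = [1 1.5; 1.5 3];
r = mvnrnd(mu,sigma,5000);   % 5000 rows, one column per element of mu
mhat = mean(r)               % sample mean, expected to be close to mu
shat = cov(r)                % sample covariance, expected to be close to sigma
```

Because the numbers are random, the estimates differ from run to run and only approximate mu and sigma.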
mvtrnd
Purpose Random matrices from the multivariate t distribution.
Syntax r = mvtrnd(C,df,cases)
Description r = mvtrnd(C,df,cases) returns a matrix of random numbers chosen from
the multivariate t distribution, where C is a correlation matrix. df is the
degrees of freedom and is either a scalar or a vector with cases elements. If
p is the number of columns in C, then the output r has cases rows and p
columns.
Let t represent a row of r. Then the distribution of t is that of a vector having
a multivariate normal distribution with mean 0, variance 1, and covariance
matrix C, divided by the square root of s/df, where s is an independent
chi-square random value having df degrees of freedom. The rows of r are
independent.
C must be a square, symmetric, and positive definite matrix. If its diagonal
elements are not all 1 (that is, if C is a covariance matrix rather than a
correlation matrix), mvtrnd computes the equivalent correlation matrix before
generating the random numbers.
Example sigma = [1 0.8;0.8 1];
r = mvtrnd(sigma,3,100);
plot(r(:,1),r(:,2),'+')
See Also mvnrnd, trnd
nanmax
Purpose Maximum ignoring NaNs.
Syntax m = nanmax(a)
[m,ndx] = nanmax(a)
m = nanmax(a,b)
Description m = nanmax(a) returns the maximum with NaNs treated as missing. For
vectors, nanmax(a) is the largest non-NaN element in a. For matrices,
nanmax(A) is a row vector containing the maximum non-NaN element from each
column.
[m,ndx] = nanmax(a) also returns the indices of the maximum values in
vector ndx.
m = nanmax(a,b) returns the larger of a or b, which must match in size.
Example m = magic(3);
m([1 6 8]) = [NaN NaN NaN]
m =
NaN 1 6
3 5 NaN
4 NaN 2
[nmax,maxidx] = nanmax(m)
nmax =
4 5 6
maxidx =
3 2 1
See Also nanmin, nanmean, nanmedian, nanstd, nansum
nanmean
Purpose Mean ignoring NaNs
Syntax y = nanmean(X)
Description y = nanmean(X) is the average computed by treating NaNs as missing values.
For vectors, nanmean(x) is the mean of the non-NaN elements of x. For matrices,
nanmean(X) is a row vector containing the mean of the non-NaN elements in
each column.
Example m = magic(3);
m([1 6 8]) = [NaN NaN NaN]
m =
NaN 1 6
3 5 NaN
4 NaN 2
nmean = nanmean(m)
nmean =
3.5000 3.0000 4.0000
See Also nanmin, nanmax, nanmedian, nanstd, nansum
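The "treat NaNs as missing" rule can be reproduced with logical indexing in base MATLAB. The sketch below is not the toolbox implementation. It only illustrates the column-by-column behavior described above, and the loop variable col is an illustrative name.

```matlab
m = magic(3);
m([1 6 8]) = NaN;                        % same missing entries as the example
nmean = zeros(1,size(m,2));
for j = 1:size(m,2)
    col = m(:,j);
    nmean(j) = mean(col(~isnan(col)));   % average only the non-NaN entries
end
nmean
```

For this matrix the loop gives 3.5, 3, and 4, the same values that nanmean returns in the example.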
nanmedian
Purpose Median ignoring NaNs
Syntax y = nanmedian(X)
Description y = nanmedian(X) is the median computed by treating NaNs as missing values.
For vectors, nanmedian(x) is the median of the non-NaN elements of x. For
matrices, nanmedian(X) is a row vector containing the median of the non-NaN
elements in each column of X.
Example m = magic(4);
m([1 6 9 11]) = [NaN NaN NaN NaN]
m =
NaN 2 NaN 13
5 NaN 10 8
9 7 NaN 12
4 14 15 1
nmedian = nanmedian(m)
nmedian =
5.0000 7.0000 12.5000 10.0000
See Also nanmin, nanmax, nanmean, nanstd, nansum
nanmin
Purpose Minimum ignoring NaNs
Syntax m = nanmin(a)
[m,ndx] = nanmin(a)
m = nanmin(a,b)
Description m = nanmin(a) is the minimum computed by treating NaNs as missing values.
For vectors, nanmin(a) is the smallest non-NaN element in a. For matrices,
nanmin(A) is a row vector containing the minimum non-NaN element from each
column.
[m,ndx] = nanmin(a) also returns the indices of the minimum values in
vector ndx.
m = nanmin(a,b) returns the smaller of a or b, which must match in size.
Example m = magic(3);
m([1 6 8]) = [NaN NaN NaN]
m =
NaN 1 6
3 5 NaN
4 NaN 2
[nmin,minidx] = nanmin(m)
nmin =
3 1 2
minidx =
2 1 3
See Also nanmax, nanmean, nanmedian, nanstd, nansum
nanstd
Purpose Standard deviation ignoring NaNs.
Syntax y = nanstd(X)
Description y = nanstd(X) is the standard deviation computed by treating NaNs as
missing values.
For vectors, nanstd(x) is the standard deviation of the non-NaN elements of x.
For matrices, nanstd(X) is a row vector containing the standard deviations of
the non-NaN elements in each column of X.
Example m = magic(3);
m([1 6 8]) = [NaN NaN NaN]
m =
NaN 1 6
3 5 NaN
4 NaN 2
nstd = nanstd(m)
nstd =
0.7071 2.8284 2.8284
See Also nanmax, nanmin, nanmean, nanmedian, nansum
nansum
Purpose Sum ignoring NaNs.
Syntax y = nansum(X)
Description y = nansum(X) is the sum computed by treating NaNs as missing values.
For vectors, nansum(x) is the sum of the non-NaN elements of x. For matrices,
nansum(X) is a row vector containing the sum of the non-NaN elements in each
column of X.
Example m = magic(3);
m([1 6 8]) = [NaN NaN NaN]
m =
NaN 1 6
3 5 NaN
4 NaN 2
nsum = nansum(m)
nsum =
7 6 8
See Also nanmax, nanmin, nanmean, nanmedian, nanstd
nbincdf
Purpose Negative binomial cumulative distribution function.
Syntax Y = nbincdf(X,R,P)
Description Y = nbincdf(X,R,P) computes the negative binomial cdf at each of the values
in X using the corresponding parameters in R and P. Vector or matrix inputs for
X, R, and P must have the same size, which is also the size of Y. A scalar input
for X, R, or P is expanded to a constant matrix with the same dimensions as the
other inputs.
The negative binomial cdf is
$$y = F(x \mid r,p) = \sum_{i=0}^{x} \binom{r+i-1}{i} p^{r} q^{i} I_{(0,1,\ldots)}(i)$$
The motivation for the negative binomial is the case of successive trials, each
having a constant probability P of success. What you want to find out is how
many extra trials you must do to observe a given number R of successes.
Example x = (0:15);
p = nbincdf(x,3,0.5);
stairs(x,p)
See Also cdf, nbininv, nbinpdf, nbinrnd, nbinstat
nbininv
Purpose Inverse of the negative binomial cumulative distribution function (cdf).
Syntax X = nbininv(Y,R,P)
Description X = nbininv(Y,R,P) returns the inverse of the negative binomial cdf with
parameters R and P at the corresponding probabilities in Y. Since the negative
binomial distribution is discrete, nbininv returns the least integer X such that
the negative binomial cdf evaluated at X equals or exceeds Y. Vector or matrix
inputs for Y, R, and P must have the same size, which is also the size of X. A
scalar input for Y, R, or P is expanded to a constant matrix with the same
dimensions as the other inputs.
The negative binomial cdf models consecutive trials, each having a constant
probability P of success. The parameter R is the number of successes required
before stopping.
Example How many times would you need to flip a fair coin to have a 99% probability of
having observed 10 heads?
flips = nbininv(0.99,10,0.5) + 10
flips =
33
Note that you have to flip at least 10 times to get 10 heads. That is why the
second term on the right side of the equals sign is a 10.
See Also icdf, nbincdf, nbinpdf, nbinrnd, nbinstat
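The "least integer" rule in the description can be checked by evaluating the cdf at the returned value and one step below it. This is a small sketch that assumes the Statistics Toolbox is on the path. The variable names atX and belowX are illustrative.

```matlab
x = nbininv(0.99,10,0.5);        % extra flips beyond the 10 required heads
atX    = nbincdf(x,10,0.5)       % expected to be at least 0.99
belowX = nbincdf(x-1,10,0.5)     % expected to fall short of 0.99
```

If both conditions hold, x is the smallest count whose cumulative probability reaches the requested level.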
nbinpdf
Purpose Negative binomial probability density function.
Syntax Y = nbinpdf(X,R,P)
Description Y = nbinpdf(X,R,P) returns the negative binomial pdf at each of the values
in X using the corresponding parameters in R and P. Vector or matrix inputs for
X, R, and P must have the same size, which is also the size of Y. A scalar input
for X, R, or P is expanded to a constant matrix with the same dimensions as the
other inputs. Note that the density function is zero unless the values in X are
integers.
The negative binomial pdf is
$$y = f(x \mid r,p) = \binom{r+x-1}{x} p^{r} q^{x} I_{(0,1,\ldots)}(x)$$
The negative binomial pdf models consecutive trials, each having a constant
probability P of success. The parameter R is the number of successes required
before stopping.
Example x = (0:10);
y = nbinpdf(x,3,0.5);
plot(x,y,'+')
set(gca,'Xlim',[-0.5,10.5])
See Also nbincdf, nbininv, nbinrnd, nbinstat, pdf
nbinrnd
Purpose Random matrices from a negative binomial distribution.
Syntax RND = nbinrnd(R,P)
RND = nbinrnd(R,P,m)
RND = nbinrnd(R,P,m,n)
Description RND = nbinrnd(R,P) is a matrix of random numbers chosen from a negative
binomial distribution with parameters R and P. Vector or matrix inputs for R
and P must have the same size, which is also the size of RND. A scalar input for
R or P is expanded to a constant matrix with the same dimensions as the other
input.
RND = nbinrnd(R,P,m) generates random numbers with parameters R and P,
where m is a 1-by-2 vector that contains the row and column dimensions of RND.
RND = nbinrnd(R,P,m,n) generates random numbers with parameters R
and P, where scalars m and n are the row and column dimensions of RND.
The negative binomial distribution models consecutive trials, each having a
constant probability P of success. The parameter R is the number of successes
required before stopping.
Example Suppose you want to simulate a process that has a defect probability of 0.01.
How many units might Quality Assurance inspect before finding three
defective items?
r = nbinrnd(3,0.01,1,6) + 3
r =
496 142 420 396 851 178
See Also nbincdf, nbininv, nbinpdf, nbinstat
nbinstat
Purpose Mean and variance of the negative binomial distribution.
Syntax [M,V] = nbinstat(R,P)
Description [M,V] = nbinstat(R,P) returns the mean and variance of the negative
binomial distribution with parameters R and P. Vector or matrix inputs for R
and P must have the same size, which is also the size of M and V. A scalar input
for R or P is expanded to a constant matrix with the same dimensions as the
other input.
The mean of the negative binomial distribution with parameters r and p is rq/p,
where q = 1-p. The variance is rq/p^2.
Example p = 0.1:0.2:0.9;
r = 1:5;
[R,P] = meshgrid(r,p);
[M,V] = nbinstat(R,P)
M =
9.0000 18.0000 27.0000 36.0000 45.0000
2.3333 4.6667 7.0000 9.3333 11.6667
1.0000 2.0000 3.0000 4.0000 5.0000
0.4286 0.8571 1.2857 1.7143 2.1429
0.1111 0.2222 0.3333 0.4444 0.5556
V =
90.0000 180.0000 270.0000 360.0000 450.0000
7.7778 15.5556 23.3333 31.1111 38.8889
2.0000 4.0000 6.0000 8.0000 10.0000
0.6122 1.2245 1.8367 2.4490 3.0612
0.1235 0.2469 0.3704 0.4938 0.6173
See Also nbincdf, nbininv, nbinpdf, nbinrnd
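The closed-form mean rq/p and variance rq/p^2 can be evaluated directly and set beside the nbinstat output. The sketch below assumes the Statistics Toolbox is on the path. The values r = 3 and p = 0.5 and the names mformula and vformula are illustrative.

```matlab
r = 3; p = 0.5;
q = 1 - p;
[m,v] = nbinstat(r,p);
mformula = r*q/p;        % 3
vformula = r*q/p^2;      % 6
[m mformula; v vformula] % each row should contain two matching values
```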
ncfcdf
Purpose Noncentral F cumulative distribution function (cdf).
Syntax P = ncfcdf(X,NU1,NU2,DELTA)
Description P = ncfcdf(X,NU1,NU2,DELTA) computes the noncentral F cdf at each of the
values in X using the corresponding numerator degrees of freedom in NU1,
denominator degrees of freedom in NU2, and positive noncentrality parameters
in DELTA. Vector or matrix inputs for X, NU1, NU2, and DELTA must have the same
size, which is also the size of P. A scalar input for X, NU1, NU2, or DELTA is
expanded to a constant matrix with the same dimensions as the other inputs.
The noncentral F cdf is
$$F(x \mid \nu_1,\nu_2,\delta) = \sum_{j=0}^{\infty} \left( \frac{\left(\frac{1}{2}\delta\right)^{j}}{j!} e^{-\delta/2} \right) I\!\left( \frac{\nu_1 x}{\nu_2 + \nu_1 x} \,\Big|\, \frac{\nu_1}{2} + j, \frac{\nu_2}{2} \right)$$
where I(x|a,b) is the incomplete beta function with parameters a and b.
Example Compare the noncentral F cdf with δ = 10 to the F cdf with the same number of
numerator and denominator degrees of freedom (5 and 20 respectively).
x = (0.01:0.1:10.01)';
p1 = ncfcdf(x,5,20,10);
p = fcdf(x,5,20);
plot(x,p,'--',x,p1,'-')
References Johnson, N., and S. Kotz, Distributions in Statistics: Continuous Univariate
Distributions-2, John Wiley and Sons, 1970, pp. 189-200.
See Also cdf, ncfpdf, ncfinv, ncfrnd, ncfstat
ncfinv
Purpose Inverse of the noncentral F cumulative distribution function (cdf).
Syntax X = ncfinv(P,NU1,NU2,DELTA)
Description X = ncfinv(P,NU1,NU2,DELTA) returns the inverse of the noncentral F cdf
with numerator degrees of freedom NU1, denominator degrees of freedom NU2,
and positive noncentrality parameter DELTA for the corresponding probabilities
in P. Vector or matrix inputs for P, NU1, NU2, and DELTA must have the same
size, which is also the size of X. A scalar input for P, NU1, NU2, or DELTA is
expanded to a constant matrix with the same dimensions as the other inputs.
Example One hypothesis test for comparing two sample variances is to take their ratio
and compare it to an F distribution. If the numerator and denominator degrees
of freedom are 5 and 20 respectively, then you reject the hypothesis that the
first variance is equal to the second variance if their ratio is greater than that
computed below.
critical = finv(0.95,5,20)
critical =
2.7109
Suppose the truth is that the first variance is twice as big as the second
variance. How likely is it that you would detect this difference?
prob = 1 - ncfcdf(critical,5,20,2)
prob =
0.1297
References Evans, M., N. Hastings, and B. Peacock, Statistical Distributions, Second
Johnson, N., and S. Kotz, Distributions in Statistics: Continuous Univariate
See Also icdf, ncfcdf, ncfpdf, ncfrnd, ncfstat
ncfpdf
Purpose Noncentral F probability density function.
Syntax Y = ncfpdf(X,NU1,NU2,DELTA)
Description Y = ncfpdf(X,NU1,NU2,DELTA) computes the noncentral F pdf at each of the
values in X using the corresponding numerator degrees of freedom in NU1,
denominator degrees of freedom in NU2, and positive noncentrality parameters
in DELTA. Vector or matrix inputs for X, NU1, NU2, and DELTA must have the same
size, which is also the size of Y. A scalar input for X, NU1, NU2, or DELTA is
expanded to a constant matrix with the same dimensions as the other inputs.
The F distribution is a special case of the noncentral F where δ = 0. As δ
increases, the distribution flattens like the plot in the example.
Example Compare the noncentral F pdf with δ = 10 to the F pdf with the same number
of numerator and denominator degrees of freedom (5 and 20 respectively).
x = (0.01:0.1:10.01)';
p1 = ncfpdf(x,5,20,10);
p = fpdf(x,5,20);
plot(x,p,'--',x,p1,'-')
See Also ncfcdf, ncfinv, ncfrnd, ncfstat, pdf
ncfrnd
Purpose Random matrices from the noncentral F distribution.
Syntax R = ncfrnd(NU1,NU2,DELTA)
R = ncfrnd(NU1,NU2,DELTA,m)
R = ncfrnd(NU1,NU2,DELTA,m,n)
Description R = ncfrnd(NU1,NU2,DELTA) returns a matrix of random numbers chosen from
the noncentral F distribution with parameters NU1, NU2, and DELTA. Vector or
matrix inputs for NU1, NU2, and DELTA must have the same size, which is also
the size of R. A scalar input for NU1, NU2, or DELTA is expanded to a constant
matrix with the same dimensions as the other inputs.
R = ncfrnd(NU1,NU2,DELTA,m) returns a matrix of random numbers with
parameters NU1, NU2, and DELTA, where m is a 1-by-2 vector that contains the
row and column dimensions of R.
R = ncfrnd(NU1,NU2,DELTA,m,n) generates random numbers with
parameters NU1, NU2, and DELTA, where scalars m and n are the row and column
dimensions of R.
Example Compute six random numbers from a noncentral F distribution with 10
numerator degrees of freedom, 100 denominator degrees of freedom, and a
noncentrality parameter, δ, of 4.0. Compare this to the F distribution with the
same degrees of freedom.
r = ncfrnd(10,100,4,1,6)
r =
2.5995 0.8824 0.8220 1.4485 1.4415 1.4864
r1 = frnd(10,100,1,6)
r1 =
0.9826 0.5911 1.0967 0.9681 2.0096 0.6598
See Also ncfcdf, ncfinv, ncfpdf, ncfstat
ncfstat
Purpose Mean and variance of the noncentral F distribution.
Syntax [M,V] = ncfstat(NU1,NU2,DELTA)
Description [M,V] = ncfstat(NU1,NU2,DELTA) returns the mean and variance of the
noncentral F pdf with NU1 and NU2 degrees of freedom and noncentrality
parameter DELTA. Vector or matrix inputs for NU1, NU2, and DELTA must have
the same size, which is also the size of M and V. A scalar input for NU1, NU2, or
DELTA is expanded to a constant matrix with the same dimensions as the other
inputs.
The mean of the noncentral F distribution with parameters ν1, ν2, and δ is
$$\frac{\nu_2(\delta+\nu_1)}{\nu_1(\nu_2-2)}$$
where ν2 > 2.
The variance is
$$2\left(\frac{\nu_2}{\nu_1}\right)^{2} \left[ \frac{(\delta+\nu_1)^{2} + (2\delta+\nu_1)(\nu_2-2)}{(\nu_2-2)^{2}(\nu_2-4)} \right]$$
where ν2 > 4.
Example [m,v]= ncfstat(10,100,4)
m =
1.4286
v =
0.4252
See Also ncfcdf, ncfinv, ncfpdf, ncfrnd
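As a worked check, the standard closed-form expressions for the noncentral F mean and variance can be evaluated in base MATLAB for 10 and 100 degrees of freedom and a noncentrality parameter of 4. The variable names nu1, nu2, and delta are illustrative stand-ins for NU1, NU2, and DELTA. This sketch does not call ncfstat itself.

```matlab
nu1 = 10; nu2 = 100; delta = 4;
% Mean, defined for nu2 > 2
m = nu2*(delta + nu1) / (nu1*(nu2 - 2))
% Variance, defined for nu2 > 4
v = 2*(nu2/nu1)^2 * ((delta + nu1)^2 + (2*delta + nu1)*(nu2 - 2)) ...
    / ((nu2 - 2)^2 * (nu2 - 4))
```

Writing the arithmetic out this way makes it easy to see how each parameter enters the result.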
nctcdf
Purpose Noncentral T cumulative distribution function.
Syntax P = nctcdf(X,NU,DELTA)
Description P = nctcdf(X,NU,DELTA) computes the noncentral T cdf at each of the values
in X using the corresponding degrees of freedom in NU and noncentrality
parameters in DELTA. Vector or matrix inputs for X, NU, and DELTA must have
the same size, which is also the size of P. A scalar input for X, NU, or DELTA is
expanded to a constant matrix with the same dimensions as the other inputs.
Example Compare the noncentral T cdf with DELTA = 1 to the T cdf with the same
number of degrees of freedom (10).
x = (-5:0.1:5)';
p1 = nctcdf(x,10,1);
p = tcdf(x,10);
plot(x,p,'--',x,p1,'-')
See Also cdf, nctcdf, nctinv, nctpdf, nctrnd, nctstat
nctinv
Purpose Inverse of the noncentral T cumulative distribution.
Syntax X = nctinv(P,NU,DELTA)
Description X = nctinv(P,NU,DELTA) returns the inverse of the noncentral T cdf with NU
degrees of freedom and noncentrality parameter DELTA for the corresponding
probabilities in P. Vector or matrix inputs for P, NU, and DELTA must have the
same size, which is also the size of X. A scalar input for P, NU, or DELTA is
expanded to a constant matrix with the same dimensions as the other inputs.
Example x = nctinv([0.1 0.2],10,1)
x =
-0.2914 0.1618
See Also icdf, nctcdf, nctpdf, nctrnd, nctstat
nctpdf
Purpose Noncentral T probability density function (pdf).
Syntax Y = nctpdf(X,V,DELTA)
Description Y = nctpdf(X,V,DELTA) computes the noncentral T pdf at each of the values
in X using the corresponding degrees of freedom in V and noncentrality
parameters in DELTA. Vector or matrix inputs for X, V, and DELTA must have the
same size, which is also the size of Y. A scalar input for X, V, or DELTA is
expanded to a constant matrix with the same dimensions as the other inputs.
Example Compare the noncentral T pdf with DELTA = 1 to the T pdf with the same
number of degrees of freedom (10).
x = (-5:0.1:5)';
p1 = nctpdf(x,10,1);
p = tpdf(x,10);
plot(x,p,'--',x,p1,'-')
See Also nctcdf, nctinv, nctrnd, nctstat, pdf
nctrnd
Purpose Random matrices from noncentral T distribution.
Syntax R = nctrnd(V,DELTA)
R = nctrnd(V,DELTA,m)
R = nctrnd(V,DELTA,m,n)
Description R = nctrnd(V,DELTA) returns a matrix of random numbers chosen from the
noncentral T distribution with parameters V and DELTA. Vector or matrix
inputs for V and DELTA must have the same size, which is also the size of R. A
scalar input for V or DELTA is expanded to a constant matrix with the same
dimensions as the other input.
R = nctrnd(V,DELTA,m) returns a matrix of random numbers with
parameters V and DELTA, where m is a 1-by-2 vector that contains the row and
column dimensions of R.
R = nctrnd(V,DELTA,m,n) generates random numbers with parameters V and
DELTA, where scalars m and n are the row and column dimensions of R.
Example nctrnd(10,1,5,1)
ans =
1.6576
1.0617
1.4491
0.2930
3.6297
See Also nctcdf, nctinv, nctpdf, nctstat
nctstat
Purpose Mean and variance for the noncentral t distribution.
Syntax [M,V] = nctstat(NU,DELTA)
Description [M,V] = nctstat(NU,DELTA) returns the mean and variance of the
noncentral t pdf with NU degrees of freedom and noncentrality parameter
DELTA. Vector or matrix inputs for NU and DELTA must have the same size, which
is also the size of M and V. A scalar input for NU or DELTA is expanded to a
constant matrix with the same dimensions as the other input.
The mean of the noncentral t distribution with parameters ν and δ is
$$\frac{\delta\,(\nu/2)^{1/2}\,\Gamma((\nu-1)/2)}{\Gamma(\nu/2)}$$
where ν > 1.
The variance is
$$\frac{\nu}{\nu-2}\,(1+\delta^{2}) - \frac{\nu}{2}\,\delta^{2} \left[ \frac{\Gamma((\nu-1)/2)}{\Gamma(\nu/2)} \right]^{2}$$
where ν > 2.
Example [m,v] = nctstat(10,1)
m =
1.0837
v =
1.3255
See Also nctcdf, nctinv, nctpdf, nctrnd
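The standard closed-form expressions for the noncentral t mean and variance can be evaluated with the base MATLAB gamma function. This is a minimal sketch for 10 degrees of freedom and a noncentrality parameter of 1. The names nu, delta, and g are illustrative, and the code does not call nctstat.

```matlab
nu = 10; delta = 1;
g = gamma((nu - 1)/2) / gamma(nu/2);   % ratio of gamma functions
m = delta * sqrt(nu/2) * g             % mean, defined for nu > 1
v = nu/(nu - 2) * (1 + delta^2) - (nu/2) * delta^2 * g^2   % variance, nu > 2
```

The results agree with the values 1.0837 and 1.3255 shown in the example to four decimal places.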
ncx2cdf
Purpose Noncentral chi-square cumulative distribution function (cdf).
Syntax P = ncx2cdf(X,V,DELTA)
Description P = ncx2cdf(X,V,DELTA) computes the noncentral chi-square cdf at each of
the values in X using the corresponding degrees of freedom in V and positive
noncentrality parameters in DELTA. Vector or matrix inputs for X, V, and DELTA
must have the same size, which is also the size of P. A scalar input for X, V, or
DELTA is expanded to a constant matrix with the same dimensions as the other
inputs.
Some texts refer to this distribution as the generalized Rayleigh,
Rayleigh-Rice, or Rice distribution.
The noncentral chi-square cdf is
$$F(x \mid \nu,\delta) = \sum_{j=0}^{\infty} \left( \frac{\left(\frac{1}{2}\delta\right)^{j}}{j!} e^{-\delta/2} \right) \Pr\left[ \chi^{2}_{\nu+2j} \le x \right]$$
Example x = (0:0.1:10)';
p1 = ncx2cdf(x,4,2);
p = chi2cdf(x,4);
plot(x,p,'--',x,p1,'-')
See Also cdf, ncx2inv, ncx2pdf, ncx2rnd, ncx2stat
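The noncentral chi-square cdf can be written as a Poisson-weighted sum of central chi-square cdfs with v + 2j degrees of freedom. The sketch below truncates that sum and compares it with ncx2cdf. It assumes the Statistics Toolbox is on the path. The cutoff of 50 terms and the names w and F are illustrative choices, and this is not necessarily how ncx2cdf is implemented.

```matlab
x = 3; v = 4; delta = 2;
j = 0:50;                                         % truncate the infinite sum
w = exp(-delta/2) * (delta/2).^j ./ gamma(j+1);   % Poisson weights
F = sum(w .* chi2cdf(x,v + 2*j));                 % weighted central chi-square cdfs
[F ncx2cdf(x,v,delta)]                            % the two values should agree closely
```

For small noncentrality parameters the weights decay quickly, so a short truncated sum is usually adequate.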
ncx2inv
Purpose Inverse of the noncentral chi-square cdf.
Syntax X = ncx2inv(P,V,DELTA)
Description X = ncx2inv(P,V,DELTA) returns the inverse of the noncentral chi-square cdf
with parameters V and DELTA at the corresponding probabilities in P. Vector or
matrix inputs for P, V, and DELTA must have the same size, which is also the size
of X. A scalar input for P, V, or DELTA is expanded to a constant matrix with the
same dimensions as the other inputs.
Algorithm ncx2inv uses Newton's method to converge to the solution.
Example ncx2inv([0.01 0.05 0.1],4,2)
ans =
0.4858 1.1498 1.7066
See Also icdf, ncx2cdf, ncx2pdf, ncx2rnd, ncx2stat
ncx2pdf
Purpose Noncentral chi-square probability density function (pdf).
Syntax Y = ncx2pdf(X,V,DELTA)
Description Y = ncx2pdf(X,V,DELTA) computes the noncentral chi-square pdf at each of
the values in X using the corresponding degrees of freedom in V and positive
noncentrality parameters in DELTA. Vector or matrix inputs for X, V, and DELTA
must have the same size, which is also the size of Y. A scalar input for X, V, or
DELTA is expanded to a constant matrix with the same dimensions as the other
inputs.
Some texts refer to this distribution as the generalized Rayleigh,
Rayleigh-Rice, or Rice distribution.
Example As the noncentrality parameter δ increases, the distribution flattens as shown
in the plot.
x = (0:0.1:10)';
p1 = ncx2pdf(x,4,2);
p = chi2pdf(x,4);
plot(x,p,'--',x,p1,'-')
See Also ncx2cdf, ncx2inv, ncx2rnd, ncx2stat, pdf
ncx2rnd
Purpose Random matrices from the noncentral chi-square distribution.
Syntax R = ncx2rnd(V,DELTA)
R = ncx2rnd(V,DELTA,m)
R = ncx2rnd(V,DELTA,m,n)
Description R = ncx2rnd(V,DELTA) returns a matrix of random numbers chosen from the
noncentral chi-square distribution with parameters V and DELTA. Vector or
matrix inputs for V and DELTA must have the same size, which is also the size
of R. A scalar input for V or DELTA is expanded to a constant matrix with the
same dimensions as the other input.
R = ncx2rnd(V,DELTA,m) returns a matrix of random numbers with
parameters V and DELTA, where m is a 1-by-2 vector that contains the row and
column dimensions of R.
R = ncx2rnd(V,DELTA,m,n) generates random numbers with parameters V and
DELTA, where scalars m and n are the row and column dimensions of R.
Example ncx2rnd(4,2,6,3)
ans =
6.8552 5.9650 11.2961
5.2631 4.2640 5.9495
9.1939 6.7162 3.8315
10.3100 4.4828 7.1653
2.1142 1.9826 4.6400
3.8852 5.3999 0.9282
See Also ncx2cdf, ncx2inv, ncx2pdf, ncx2stat
ncx2stat
Purpose Mean and variance for the noncentral chi-square distribution.
Syntax [M,V] = ncx2stat(NU,DELTA)
Description [M,V] = ncx2stat(NU,DELTA) returns the mean and variance of the noncentral
chi-square pdf with NU degrees of freedom and noncentrality parameter DELTA.
Vector or matrix inputs for NU and DELTA must have the same size, which is also
the size of M and V. A scalar input for NU or DELTA is expanded to a constant
matrix with the same dimensions as the other input.
The mean of the noncentral chi-square distribution with parameters ν and δ is
ν + δ, and the variance is 2(ν + 2δ).
Example [m,v] = ncx2stat(4,2)
m =
6
v =
16
See Also ncx2cdf, ncx2inv, ncx2pdf, ncx2rnd
nlinfit
Purpose Nonlinear least-squares data fitting by the Gauss-Newton method.
Syntax [beta,r,J] = nlinfit(X,y,FUN,beta0)
Description beta = nlinfit(X,y,FUN,beta0) returns the coefficients of the nonlinear
function described in FUN. FUN can be a function handle specified using @, an
inline function, or a quoted text string containing the name of a function.
The function FUN has the form $\hat{y} = f(\beta, X)$. It returns the predicted values of y
given initial parameter estimates β and the independent variable X.
The matrix X has one column per independent variable. The response, y, is a
column vector with the same number of rows as X.
[beta,r,J] = nlinfit(X,y,FUN,beta0) returns the fitted coefficients, beta,
the residuals, r, and the Jacobian, J, for use with nlintool to produce error
estimates on predictions.
Example load reaction
betafit = nlinfit(reactants,rate,@hougen,beta)
betafit =
1.2526
0.0628
0.0400
0.1124
1.1914
See Also nlintool
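To illustrate the calling convention for FUN, here is a minimal self-contained sketch that fits a two-parameter exponential decay. The model, the synthetic data, and the names btrue, model, and bhat are all hypothetical. The anonymous-function syntax is an assumption about the MATLAB release in use, since older releases need an inline function or a function file instead.

```matlab
x = (0:0.5:5)';                                  % one independent variable
btrue = [2; 0.7];                                % parameters used to build the data
y = btrue(1)*exp(-btrue(2)*x) + 0.01*sin(7*x);   % small deterministic perturbation
model = @(b,x) b(1)*exp(-b(2)*x);                % FUN(beta,X) returns predicted y
[bhat,r,J] = nlinfit(x,y,model,[1; 1]);          % [1; 1] is the initial guess beta0
bhat
```

The fitted coefficients should land near btrue. The residuals r and the Jacobian J have one row per observation, which is the shape that nlparci and nlpredci expect.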
nlintool
Purpose Fits a nonlinear equation to data and displays an interactive graph.
Syntax nlintool(x,y,FUN,beta0)
nlintool(x,y,FUN,beta0,alpha)
nlintool(x,y,FUN,beta0,alpha,'xname','yname')
Description nlintool(x,y,FUN,beta0) is a prediction plot that provides a nonlinear curve
fit to (x,y) data. It plots a 95% global confidence interval for predictions as two
red curves. beta0 is a vector containing initial guesses for the parameters.
nlintool(x,y,FUN,beta0,alpha) plots a 100(1-alpha)% confidence
interval for predictions.
nlintool displays a vector of plots, one for each column of the matrix of
inputs, x. The response variable, y, is a column vector that matches the number
of rows in x.
The default value for alpha is 0.05, which produces 95% confidence intervals.
nlintool(x,y,FUN,beta0,alpha,'xname','yname') labels the plot using the
string matrix 'xname' for the x variables and the string 'yname' for the y
variable.
You can drag the dotted white reference line and watch the predicted values
update simultaneously. Alternatively, you can get a specific prediction by
typing the value for x into an editable text field. Use the pop-up menu labeled
Export to move specified variables to the base workspace. You can change the
type of confidence bounds using the Bounds menu.
Example See "Nonlinear Regression Models" on page 1-100.
See Also nlinfit, rstool
nlparci
Purpose Confidence intervals on estimates of parameters in nonlinear models.
Syntax ci = nlparci(beta,r,J)
Description nlparci(beta,r,J) returns the 95% confidence interval ci on the nonlinear
least squares parameter estimates beta, given the residuals r and the
Jacobian matrix J at the solution. The confidence interval calculation is valid
for systems where the number of rows of J exceeds the length of beta.
nlparci uses the outputs of nlinfit for its inputs.
Example Continuing the example from nlinfit:
load reaction
[beta,resids,J] = nlinfit(reactants,rate,'hougen',beta);
ci = nlparci(beta,resids,J)
ci =
-1.0798 3.3445
-0.0524 0.1689
-0.0437 0.1145
-0.0891 0.2941
-1.1719 3.7321
See Also nlinfit, nlintool, nlpredci
nlpredci
Purpose Confidence intervals on predictions of nonlinear models.
Syntax ypred = nlpredci(FUN,inputs,beta,r,J)
[ypred,delta] = nlpredci(FUN,inputs,beta,r,J)
ypred = nlpredci(FUN,inputs,beta,r,J,alpha,'simopt','predopt')
Description ypred = nlpredci(FUN,inputs,beta,r,J) returns the predicted responses,
ypred, given the fitted parameters beta, residuals r, and the Jacobian
matrix J. inputs is a matrix of values of the independent variables in the
nonlinear function.
[ypred,delta] = nlpredci(FUN,inputs,beta,r,J) also returns the
half-width, delta, of confidence intervals for the nonlinear least squares
predictions. The confidence interval calculation is valid for systems where the
length of r exceeds the length of beta and J is of full column rank. The interval
[ypred-delta,ypred+delta] is a 95% non-simultaneous confidence interval
for the true value of the function at the specified input values.
ypred = nlpredci(FUN,inputs,beta,r,J,alpha,'simopt','predopt')
controls the type of confidence intervals. The confidence level is
100(1-alpha)%. 'simopt' can be 'on' for simultaneous intervals or 'off' (the
default) for non-simultaneous intervals. 'predopt' can be 'curve' (the
default) for confidence intervals for the function value at the inputs, or
'observation' for confidence intervals for a new response value.
nlpredci uses the outputs of nlinfit for its inputs.
Example Continuing the example from nlinfit, we can determine the predicted
function value at [100 300 80] and the half-width of a confidence interval for
it.
load reaction
[beta,resids,J] = nlinfit(reactants,rate,@hougen,beta);
[ypred,delta] = nlpredci(@hougen,[100 300 80],beta,resids,J)
ypred =
13
delta =
1.4277
See Also nlinfit, nlintool, nlparci
normcdf
2-242
2normcdf
Purpose Normal cumulative distribution function (cdf).
Syntax P = normcdf(X,MU,SIGMA)
Description normcdf(X,MU,SIGMA) computes the normal cdf at each of the values in X using
the corresponding parameters in MU and SIGMA. Vector or matrix inputs for X,
MU, and SIGMA must all have the same size. A scalar input is expanded to a
constant matrix with the same dimensions as the other inputs. The parameters
in SIGMA must be positive.
The normal cdf is
$$p = F(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{(t-\mu)^2}{2\sigma^2}} \, dt$$
The result, p, is the probability that a single observation from a normal
distribution with parameters μ and σ will fall in the interval (-∞ x].
The standard normal distribution has μ = 0 and σ = 1.
Examples What is the probability that an observation from a standard normal
distribution will fall on the interval [-1 1]?
p = normcdf([-1 1]);
p(2) - p(1)
ans =
0.6827
More generally, about 68% of the observations from a normal distribution fall
within one standard deviation, σ, of the mean, μ.
See Also cdf, normfit, norminv, normpdf, normplot, normrnd, normspec, normstat
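The normal cdf can also be written with the complementary error function, which is part of base MATLAB. The sketch below is only an illustration of the integral above, not the toolbox implementation; the helper name normal_cdf is hypothetical.

```matlab
% Normal cdf via the identity F(x) = 0.5*erfc(-(x-mu)/(sigma*sqrt(2))).
% normal_cdf is a hypothetical helper, not a Statistics Toolbox function.
normal_cdf = @(x, mu, sigma) 0.5 * erfc(-(x - mu) ./ (sigma * sqrt(2)));

% Probability that a standard normal observation falls in [-1 1].
p = normal_cdf([-1 1], 0, 1);
prob_within_one_sigma = p(2) - p(1);   % about 0.6827
```

The result should agree with the normcdf example for this entry to the displayed precision.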
normfit
Purpose Parameter estimates and confidence intervals for normal data.
Syntax [muhat,sigmahat,muci,sigmaci] = normfit(X)
[muhat,sigmahat,muci,sigmaci] = normfit(X,alpha)
Description [muhat,sigmahat,muci,sigmaci] = normfit(X) returns estimates muhat
and sigmahat of the normal distribution parameters μ and σ, given the matrix
of data X. muci and sigmaci are 95% confidence intervals and have two rows
and as many columns as matrix X. The top row is the lower bound of the
confidence interval and the bottom row is the upper bound.
[muhat,sigmahat,muci,sigmaci] = normfit(X,alpha) gives estimates and
100(1-alpha)% confidence intervals. For example, alpha = 0.01 gives 99%
confidence intervals.
Example In this example the data is a two-column random normal matrix. Both columns
have μ = 10 and σ = 2. Note that the confidence intervals below contain the
true values.
r = normrnd(10,2,100,2);
[mu,sigma,muci,sigmaci] = normfit(r)
mu =
10.1455 10.0527
sigma =
1.9072 2.1256
muci =
9.7652 9.6288
10.5258 10.4766
sigmaci =
1.6745 1.8663
2.2155 2.4693
See Also normcdf, norminv, normpdf, normplot, normrnd, normspec, normstat, betafit,
binofit, expfit, gamfit, poissfit, unifit, weibfit
norminv
Purpose Inverse of the normal cumulative distribution function (cdf).
Syntax X = norminv(P,MU,SIGMA)
Description X = norminv(P,MU,SIGMA) computes the inverse of the normal cdf with
parameters MU and SIGMA at the corresponding probabilities in P. Vector or
matrix inputs for P, MU, and SIGMA must all have the same size. A scalar input
is expanded to a constant matrix with the same dimensions as the other inputs.
The parameters in SIGMA must be positive, and the values in P must lie on the
interval [0 1].
We define the normal inverse function in terms of the normal cdf as
$$x = F^{-1}(p \mid \mu, \sigma) = \{x : F(x \mid \mu, \sigma) = p\}$$
where
$$p = F(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{(t-\mu)^2}{2\sigma^2}} \, dt$$
The result, x, is the solution of the integral equation above where you supply
the desired probability, p.
Examples Find an interval that contains 95% of the values from a standard normal
distribution.
x = norminv([0.025 0.975],0,1)
x =
-1.9600 1.9600
Note that the interval x is not the only such interval, but it is the shortest.
xl = norminv([0.01 0.96],0,1)
xl =
-2.3263 1.7507
The interval xl also contains 95% of the probability, but it is longer than x.
See Also icdf, normcdf, normfit, normpdf, normplot, normrnd, normspec, normstat
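As a rough cross-check of the definition above, the normal inverse can be expressed with erfcinv from base MATLAB. This is a sketch under that identity, not the toolbox code; normal_inv is a hypothetical helper name.

```matlab
% Inverse normal cdf via x = mu - sigma*sqrt(2)*erfcinv(2*p).
% normal_inv is a hypothetical helper, not a Statistics Toolbox function.
normal_inv = @(p, mu, sigma) mu - sigma * sqrt(2) * erfcinv(2 * p);

x  = normal_inv([0.025 0.975], 0, 1);   % symmetric 95% interval
xl = normal_inv([0.01 0.96], 0, 1);     % another 95% interval, not symmetric

width_x  = x(2) - x(1);     % about 3.92
width_xl = xl(2) - xl(1);   % about 4.08, so xl is longer than x
```

Comparing the two widths shows why the symmetric interval is the shortest of the two.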
normpdf
Purpose Normal probability density function (pdf).
Syntax Y = normpdf(X,MU,SIGMA)
Description normpdf(X,MU,SIGMA) computes the normal pdf at each of the values in X using
the corresponding parameters in MU and SIGMA. Vector or matrix inputs for X,
MU, and SIGMA must all have the same size. A scalar input is expanded to a
constant matrix with the same dimensions as the other inputs. The parameters
in SIGMA must be positive.
The normal pdf is
$$y = f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
The likelihood function is the pdf viewed as a function of the parameters.
Maximum likelihood estimators (MLEs) are the values of the parameters that
maximize the likelihood function for a fixed value of x.
The standard normal distribution has μ = 0 and σ = 1.
If x is standard normal, then xσ + μ is also normal with mean μ and standard
deviation σ. Conversely, if y is normal with mean μ and standard deviation σ,
then x = (y-μ) / σ is standard normal.
Examples mu = [0:0.1:2];
[y i] = max(normpdf(1.5,mu,1));
MLE = mu(i)
MLE =
1.5000
See Also normcdf, normfit, norminv, normplot, normrnd, normspec, normstat, pdf
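The density formula and the grid search for the MLE can be reproduced with base MATLAB alone. The following is a minimal sketch; normal_pdf is a hypothetical helper that simply evaluates the formula above.

```matlab
% Evaluate the normal pdf directly from its formula.
% normal_pdf is a hypothetical helper, not a Statistics Toolbox function.
normal_pdf = @(x, mu, sigma) exp(-(x - mu).^2 ./ (2 * sigma.^2)) ./ (sigma * sqrt(2*pi));

% Likelihood of a single observation x = 1.5 over a grid of candidate means.
mu_grid = 0:0.1:2;
[peak, idx] = max(normal_pdf(1.5, mu_grid, 1));
mle = mu_grid(idx);    % the grid value closest to the observation, 1.5
```

With one observation and known σ, the likelihood peaks where the candidate mean equals the observation, which is what the grid search finds.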
normplot
Purpose Normal probability plot for graphical normality testing.
Syntax normplot(X)
h = normplot(X)
Description normplot(X) displays a normal probability plot of the data in X. For matrix X,
normplot displays a line for each column of X.
The plot has the sample data displayed with the plot symbol '+'.
Superimposed on the plot is a line joining the first and third quartiles of each
column of X (a robust linear fit of the sample order statistics). This line is
extrapolated out to the ends of the sample to help evaluate the linearity of the
data.
If the data does come from a normal distribution, the plot will appear linear.
Other probability density functions will introduce curvature in the plot.
h = normplot(X) returns a handle to the plotted lines.
Examples Generate a normal sample and a normal probability plot of the data.
x = normrnd(0,1,50,1);
h = normplot(x);
[Figure: normal probability plot of the sample. x-axis: Data; y-axis: Probability (0.01 to 0.99)]
The plot is linear, indicating that you can model the sample by a normal
distribution.
See Also cdfplot, hist, normcdf, normfit, norminv, normpdf, normrnd, normspec,
normstat
normrnd
Purpose Random numbers from the normal distribution.
Syntax R = normrnd(MU,SIGMA)
R = normrnd(MU,SIGMA,m)
R = normrnd(MU,SIGMA,m,n)
Description R = normrnd(MU,SIGMA) generates normal random numbers with mean MU
and standard deviation SIGMA. Vector or matrix inputs for MU and SIGMA must
have the same size, which is also the size of R. A scalar input for MU or SIGMA is
expanded to a constant matrix with the same dimensions as the other input.
R = normrnd(MU,SIGMA,m) generates normal random numbers with
parameters MU and SIGMA, where m is a 1-by-2 vector that contains the row and
column dimensions of R.
R = normrnd(MU,SIGMA,m,n) generates normal random numbers with
parameters MU and SIGMA, where scalars m and n are the row and column
dimensions of R.
Examples n1 = normrnd(1:6,1./(1:6))
n1 =
2.1650 2.3134 3.0250 4.0879 4.8607 6.2827
n2 = normrnd(0,1,[1 5])
n2 =
0.0591 1.7971 0.2641 0.8717 -1.4462
n3 = normrnd([1 2 3;4 5 6],0.1,2,3)
n3 =
0.9299 1.9361 2.9640
4.1246 5.0577 5.9864
See Also normcdf, normfit, norminv, normpdf, normplot, normspec, normstat
normspec
Purpose Plot normal density between specification limits.
Syntax p = normspec(specs,mu,sigma)
[p,h] = normspec(specs,mu,sigma)
Description p = normspec(specs,mu,sigma) plots the normal density between a lower
and upper limit defined by the two elements of the vector specs, where mu and
sigma are the parameters of the plotted normal distribution.
[p,h] = normspec(specs,mu,sigma) returns the probability p of a sample
falling between the lower and upper limits. h is a handle to the line objects.
If specs(1) is -Inf, there is no lower limit, and similarly if specs(2) = Inf,
there is no upper limit.
Example Suppose a cereal manufacturer produces 10 ounce boxes of corn flakes.
Variability in the process of filling each box with flakes causes a 1.25 ounce
standard deviation in the true weight of the cereal in each box. The average box
of cereal has 11.5 ounces of flakes. What percentage of boxes will have less than
10 ounces?
normspec([10 Inf],11.5,1.25)
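[Figure: normal density between the limits 10 and Inf. x-axis: Critical Value; y-axis: Density; title: Probability Between Limits is 0.8849]
The plotted probability, 0.8849, is the proportion of boxes with at least 10
ounces, so about 1 - 0.8849 = 0.1151, or roughly 11.5%, of boxes have less than
10 ounces.
See Also capaplot, disttool, histfit, normcdf, normfit, norminv, normpdf, normplot,
normrnd, normstat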
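The probability between the specification limits can be checked numerically without plotting. The sketch below uses erfc from base MATLAB; normal_cdf is a hypothetical helper, and the numbers are the ones from the cereal example.

```matlab
% Probability mass of N(11.5, 1.25^2) on either side of the 10 ounce limit.
% normal_cdf is a hypothetical helper, not a Statistics Toolbox function.
normal_cdf = @(x, mu, sigma) 0.5 * erfc(-(x - mu) ./ (sigma * sqrt(2)));

mu = 11.5;
sigma = 1.25;
p_under_10    = normal_cdf(10, mu, sigma);   % about 0.1151
p_at_least_10 = 1 - p_under_10;              % about 0.8849, the value normspec reports
```

The value reported by normspec([10 Inf],11.5,1.25) is the upper-tail probability, so the answer to the "less than 10 ounces" question is its complement.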
normstat
Purpose Mean and variance for the normal distribution.
Syntax [M,V] = normstat(MU,SIGMA)
Description [M,V] = normstat(MU,SIGMA) returns the mean and variance for the normal
distribution with parameters MU and SIGMA. Vector or matrix inputs for MU and
SIGMA must have the same size, which is also the size of M and V. A scalar input
for MU or SIGMA is expanded to a constant matrix with the same dimensions as
the other input.
The mean of the normal distribution with parameters μ and σ is μ, and the
variance is σ².
Examples n = 1:5;
[m,v] = normstat(n'*n,n'*n)
m =
1 2 3 4 5
2 4 6 8 10
3 6 9 12 15
4 8 12 16 20
5 10 15 20 25
v =
1 4 9 16 25
4 16 36 64 100
9 36 81 144 225
16 64 144 256 400
25 100 225 400 625
See Also normcdf, normfit, norminv, normpdf, normplot, normrnd, normspec
pareto
Purpose Pareto charts for Statistical Process Control.
Syntax pareto(y)
pareto(y,names)
h = pareto(...)
Description pareto(y,names) displays a Pareto chart where the values in the vector y are
drawn as bars in descending order. Each bar is labeled with the associated
value in the string matrix names. pareto(y) labels each bar with the index of
the corresponding element in y.
The line above the bars shows the cumulative percentage.
pareto(y,names) labels each bar with the row of the string matrix names that
corresponds to the plotted element of y.
h = pareto(...) returns a combination of patch and line handles.
Example Create a Pareto chart from data measuring the number of manufactured parts
rejected for various types of defects.
% char pads the shorter names with blanks so that all rows have equal length
defects = char('pits','cracks','holes','dents');
quantity = [5 3 19 25];
pareto(quantity,defects)
See Also bar, capaplot, ewmaplot, hist, histfit, schart, xbarplot
[Figure: Pareto chart with bars in the order dents, holes, pits, cracks; y-axis from 0 to 60]
pcacov
Purpose Principal Components Analysis (PCA) using the covariance matrix.
Syntax pc = pcacov(X)
[pc,latent,explained] = pcacov(X)
Description [pc,latent,explained] = pcacov(X) takes the covariance matrix X and
returns the principal components in pc, the eigenvalues of the covariance
matrix of X in latent, and the percentage of the total variance in the
observations explained by each eigenvector in explained.
Example load hald
covx = cov(ingredients);
[pc,variances,explained] = pcacov(covx)
pc =
0.0678 -0.6460 0.5673 -0.5062
0.6785 -0.0200 -0.5440 -0.4933
-0.0290 0.7553 0.4036 -0.5156
-0.7309 -0.1085 -0.4684 -0.4844
variances =
517.7969
67.4964
12.4054
0.2372
explained =
86.5974
11.2882
2.0747
0.0397
References Jackson, J. E., A User's Guide to Principal Components, John Wiley and Sons,
Inc. 1991. pp. 125.
See Also barttest, pcares, princomp
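The explained output is the share of the total variance carried by each eigenvalue. As a small sketch, the percentages in the example can be recomputed from the printed variances using base MATLAB only; the numbers below are copied from the example output, so small rounding differences are expected.

```matlab
% Eigenvalues (variances) as printed in the pcacov example.
variances = [517.7969; 67.4964; 12.4054; 0.2372];

% Percentage of the total variance explained by each component.
explained = 100 * variances / sum(variances);
% roughly [86.5974; 11.2882; 2.0747; 0.0397]
```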
pcares
Purpose Residuals from a Principal Components Analysis.
Syntax residuals = pcares(X,ndim)
Description pcares(X,ndim) returns the residuals obtained by retaining ndim principal
components of X. Note that ndim is a scalar and must be less than the number
of columns in X. Use the data matrix, not the covariance matrix, with this
function.
Example This example shows the drop in the residuals from the first row of the Hald
data as the number of component dimensions increases from one to three.
load hald
r1 = pcares(ingredients,1);
r2 = pcares(ingredients,2);
r3 = pcares(ingredients,3);
r11 = r1(1,:)
r11 =
2.0350 2.8304 -6.8378 3.0879
r21 = r2(1,:)
r21 =
-2.4037 2.6930 -1.6482 2.3425
r31 = r3(1,:)
r31 =
0.2008 0.1957 0.2045 0.1921
Reference Jackson, J. E., A User's Guide to Principal Components, John Wiley and Sons,
Inc. 1991. pp. 125.
See Also barttest, pcacov, princomp
pdf
Purpose Probability density function (pdf) for a specified distribution.
Syntax Y = pdf('name',X,A1,A2,A3)
Description pdf('name',X,A1,A2,A3) returns a matrix of densities, where 'name' is a
string containing the name of the distribution. X is a matrix of values, and A1,
A2, and A3 are matrices of distribution parameters. Depending on the
distribution, some of the parameters may not be necessary.
Vector or matrix inputs for X, A1, A2, and A3 must all have the same size. A
scalar input is expanded to a constant matrix with the same dimensions as the
other inputs.
pdf is a utility routine allowing access to all the pdfs in the Statistics Toolbox
using the name of the distribution as a parameter. See "Overview of the
Distributions" on page 1-12 for the list of available distributions.
Examples p = pdf('Normal',-2:2,0,1)
p =
0.0540 0.2420 0.3989 0.2420 0.0540
p = pdf('Poisson',0:4,1:5)
p =
0.3679 0.2707 0.2240 0.1954 0.1755
See Also betapdf, binopdf, cdf, chi2pdf, exppdf, fpdf, gampdf, geopdf, hygepdf,
lognpdf, nbinpdf, ncfpdf, nctpdf, ncx2pdf, normpdf, poisspdf, raylpdf,
tpdf, unidpdf, unifpdf, weibpdf
pdist
Purpose Pairwise distance between observations.
Syntax Y = pdist(X)
Y = pdist(X,'metric')
Y = pdist(X,'minkowski',p)
Description Y = pdist(X) computes the Euclidean distance between pairs of objects in
m-by-n matrix X, which is treated as m vectors of size n. For a dataset made up
of m objects, there are (m-1)·m/2 pairs.
The output, Y, is a vector of length (m-1)·m/2, containing the distance
information. The distances are arranged in the order (1,2), (1,3), ..., (1,m),
(2,3), ..., (2,m), ..., ..., (m-1,m). Y is also commonly known as a similarity matrix
or dissimilarity matrix.
To save space and computation time, Y is formatted as a vector. However, you
can convert this vector into a square matrix using the squareform function so
that element i,j in the matrix corresponds to the distance between objects i and
j in the original dataset.
Y = pdist(X,'metric') computes the distance between objects in the data
matrix, X, using the method specified by 'metric', where 'metric' can be any
of the following character strings that identify ways to compute the distance.
String         Meaning
'Euclid'       Euclidean distance (default)
'SEuclid'      Standardized Euclidean distance
'Mahal'        Mahalanobis distance
'CityBlock'    City Block metric
'Minkowski'    Minkowski metric
Y = pdist(X,'minkowski',p) computes the distance between objects in the
data matrix, X, using the Minkowski metric. p is the exponent used in the
Minkowski computation which, by default, is 2.
Mathematical Definitions of Methods
Given an m-by-n data matrix X, which is treated as m (1-by-n) row vectors x_1,
x_2, ..., x_m, the various distances between the vectors x_r and x_s are defined as
follows:
Euclidean distance
$$d_{rs}^2 = (x_r - x_s)(x_r - x_s)'$$
Standardized Euclidean distance
$$d_{rs}^2 = (x_r - x_s) D^{-1} (x_r - x_s)'$$
where D is the diagonal matrix with diagonal elements given by $v_j^2$, which
denotes the variance of the variable X_j over the m objects.
Mahalanobis distance
$$d_{rs}^2 = (x_r - x_s) V^{-1} (x_r - x_s)'$$
where V is the sample covariance matrix.
City Block metric
$$d_{rs} = \sum_{j=1}^{n} \left| x_{rj} - x_{sj} \right|$$
Minkowski metric
$$d_{rs} = \left\{ \sum_{j=1}^{n} \left| x_{rj} - x_{sj} \right|^{p} \right\}^{1/p}$$
Notice that for the special case of p = 1, the Minkowski metric gives the City
Block metric, and for the special case of p = 2, the Minkowski metric gives
the Euclidean distance.
Examples X = [1 2; 1 3; 2 2; 3 1]
X =
1 2
1 3
2 2
3 1
Y = pdist(X,'mahal')
Y =
2.3452 2.0000 2.3452 1.2247 2.4495 1.2247
Y = pdist(X)
Y =
1.0000 1.0000 2.2361 1.4142 2.8284 1.4142
squareform(Y)
ans =
0 1.0000 1.0000 2.2361
1.0000 0 1.4142 2.8284
1.0000 1.4142 0 1.4142
2.2361 2.8284 1.4142 0
See Also cluster, clusterdata, cophenet, dendrogram, inconsistent, linkage,
squareform
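The pair ordering and the Euclidean and Mahalanobis definitions can be traced with a short loop in base MATLAB. This is an illustrative sketch for the example matrix X, not the pdist implementation; the variable names are chosen here for illustration.

```matlab
% Pairwise distances for the example matrix, in pdist order
% (1,2), (1,3), (1,4), (2,3), (2,4), (3,4).
X = [1 2; 1 3; 2 2; 3 1];
m = size(X, 1);
V = cov(X);                          % sample covariance matrix

Y_euclid = zeros(1, m*(m-1)/2);
Y_mahal  = zeros(1, m*(m-1)/2);
k = 0;
for r = 1:m-1
    for s = r+1:m
        k = k + 1;
        d = X(r,:) - X(s,:);                 % 1-by-n difference of two rows
        Y_euclid(k) = sqrt(d * d');          % Euclidean distance
        Y_mahal(k)  = sqrt((d / V) * d');    % Mahalanobis: d * inv(V) * d'
    end
end
```

For the four points above, the loop produces six distances, matching the m(m-1)/2 count and the values printed in the example.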
perms
Purpose All permutations.
Syntax P = perms(v)
Description P = perms(v) where v is a row vector of length n, creates a matrix whose rows
consist of all possible permutations of the n elements of v. The matrix P
contains n! rows and n columns.
perms is only practical when n is less than 8 or 9.
Example perms([2 4 6])
ans =
6 4 2
4 6 2
6 2 4
2 6 4
4 2 6
2 4 6
poisscdf
Purpose Poisson cumulative distribution function (cdf).
Syntax P = poisscdf(X,LAMBDA)
Description poisscdf(X,LAMBDA) computes the Poisson cdf at each of the values in X using
the corresponding parameters in LAMBDA. Vector or matrix inputs for X and
LAMBDA must be the same size. A scalar input is expanded to a constant matrix
with the same dimensions as the other input. The parameters in LAMBDA must
be positive.
The Poisson cdf is
$$p = F(x \mid \lambda) = e^{-\lambda} \sum_{i=0}^{\lfloor x \rfloor} \frac{\lambda^{i}}{i!}$$
Examples For example, consider a Quality Assurance department that performs random
tests of individual hard disks. Their policy is to shut down the manufacturing
process if an inspector finds more than four bad sectors on a disk. What is the
probability of shutting down the process if the mean number of bad sectors (λ)
is two?
probability = 1 - poisscdf(4,2)
probability =
0.0527
About 5% of the time, a normally functioning manufacturing process will
produce more than four flaws on a hard disk.
Suppose the average number of flaws (λ) increases to four. What is the
probability of finding fewer than five flaws on a hard drive?
probability = poisscdf(4,4)
probability =
0.6288
This means that this faulty manufacturing process continues to operate after
this first inspection almost 63% of the time.
See Also cdf, poissfit, poissinv, poisspdf, poissrnd, poisstat
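The two probabilities in the hard disk example can be reproduced by summing the Poisson terms directly. The following base MATLAB sketch assumes scalar inputs; poisson_cdf is a hypothetical helper, not the toolbox function.

```matlab
% Poisson cdf as a finite sum of exp(-lambda)*lambda^i/i! for i = 0..floor(x).
% poisson_cdf is a hypothetical helper for scalar x and lambda.
poisson_cdf = @(x, lambda) exp(-lambda) * ...
    sum(lambda.^(0:floor(x)) ./ factorial(0:floor(x)));

p_shutdown = 1 - poisson_cdf(4, 2);   % more than four flaws when lambda = 2, about 0.0527
p_continue = poisson_cdf(4, 4);       % fewer than five flaws when lambda = 4, about 0.6288
```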
poissfit
Purpose Parameter estimates and confidence intervals for Poisson data.
Syntax lambdahat = poissfit(X)
[lambdahat,lambdaci] = poissfit(X)
[lambdahat,lambdaci] = poissfit(X,alpha)
Description poissfit(X) returns the maximum likelihood estimate (MLE) of the
parameter of the Poisson distribution, λ, given the data X.
[lambdahat,lambdaci] = poissfit(X) also gives 95% confidence intervals in
lambdaci.
[lambdahat,lambdaci] = poissfit(X,alpha) gives 100(1-alpha)%
confidence intervals. For example, alpha = 0.001 yields 99.9% confidence
intervals.
The sample average is the MLE of λ.
$$\hat{\lambda} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
Example r = poissrnd(5,10,2);
[l,lci] = poissfit(r)
l =
7.4000 6.3000
lci =
5.8000 4.8000
9.1000 7.9000
See Also betafit, binofit, expfit, gamfit, poisscdf, poissinv, poisspdf,
poissrnd, poisstat, unifit, weibfit
poissinv
Purpose Inverse of the Poisson cumulative distribution function (cdf).
Syntax X = poissinv(P,LAMBDA)
Description poissinv(P,LAMBDA) returns the smallest value X such that the Poisson cdf
evaluated at X equals or exceeds P.
Examples If the average number of defects (λ) is two, what is the 95th percentile of the
number of defects?
poissinv(0.95,2)
ans =
5
What i s the medi an number of defects?
median_defects = poissinv(0.50,2)
median_defects =
2
See Also icdf, poisscdf, poissfit, poisspdf, poissrnd, poisstat
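The "smallest X such that the cdf equals or exceeds P" rule can be sketched with a cumulative sum over a range of counts. The search range 0:20 is an assumption chosen to be wide enough for λ = 2; it is not part of poissinv.

```matlab
% Tabulate the Poisson cdf for lambda = 2 on a finite grid of counts.
lambda = 2;
x = 0:20;                                    % assumed search range, ample for lambda = 2
cdf = cumsum(exp(-lambda) * lambda.^x ./ factorial(x));

% Smallest count whose cdf equals or exceeds the requested probability.
pct95 = min(x(cdf >= 0.95));                 % 5
median_defects = min(x(cdf >= 0.50));        % 2
```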
poisspdf
Purpose Poisson probability density function (pdf).
Syntax Y = poisspdf(X,LAMBDA)
Description poisspdf(X,LAMBDA) computes the Poisson pdf at each of the values in X using
the corresponding parameters in LAMBDA. Vector or matrix inputs for X and
LAMBDA must be the same size. A scalar input is expanded to a constant matrix
with the same dimensions as the other input. The parameters in LAMBDA must
all be positive.
The Poisson pdf is
$$y = f(x \mid \lambda) = \frac{\lambda^{x}}{x!} e^{-\lambda} I_{(0,1,\ldots)}(x)$$
where x can be any nonnegative integer. The density function is zero unless x
is an integer.
Examples A computer hard disk manufacturer has observed that flaws occur randomly in
the manufacturing process at the average rate of two flaws in a 4 Gb hard disk
and has found this rate to be acceptable. What is the probability that a disk will
be manufactured with no defects?
In this problem, λ = 2 and x = 0.
p = poisspdf(0,2)
p =
0.1353
See Also pdf, poisscdf, poissfit, poissinv, poissrnd, poisstat
poissrnd
Purpose Random numbers from the Poisson distribution.
Syntax R = poissrnd(LAMBDA)
R = poissrnd(LAMBDA,m)
R = poissrnd(LAMBDA,m,n)
Description R = poissrnd(LAMBDA) generates Poisson random numbers with mean
LAMBDA. The size of R is the size of LAMBDA.
R = poissrnd(LAMBDA,m) generates Poisson random numbers with mean
LAMBDA, where m is a 1-by-2 vector that contains the row and column
dimensions of R.
R = poissrnd(LAMBDA,m,n) generates Poisson random numbers with mean
LAMBDA, where scalars m and n are the row and column dimensions of R.
Examples Generate a random sample of 10 pseudo-observations from a Poisson
distribution with λ = 2.
lambda = 2;
random_sample1 = poissrnd(lambda,1,10)
random_sample1 =
1 0 1 2 1 3 4 2 0 0
random_sample2 = poissrnd(lambda,[1 10])
random_sample2 =
1 1 1 5 0 3 2 2 3 4
random_sample3 = poissrnd(lambda(ones(1,10)))
random_sample3 =
3 2 1 1 0 0 4 0 2 0
See Also poisscdf, poissfit, poissinv, poisspdf, poisstat
poisstat
Purpose Mean and variance for the Poisson distribution.
Syntax M = poisstat(LAMBDA)
[M,V] = poisstat(LAMBDA)
Description M = poisstat(LAMBDA) returns the mean of the Poisson distribution with
parameter LAMBDA. The size of M is the size of LAMBDA.
[M,V] = poisstat(LAMBDA) also returns the variance V of the Poisson
distribution.
For the Poisson distribution with parameter λ, both the mean and variance are
equal to λ.
Examples Find the mean and variance for Poisson distributions with λ = 1, 2, 3, and 4.
[m,v] = poisstat([1 2; 3 4])
m =
1 2
3 4
v =
1 2
3 4
See Also poisscdf, poissfit, poissinv, poisspdf, poissrnd
polyconf
Purpose Polynomial evaluation and confidence interval estimation.
Syntax [Y,DELTA] = polyconf(p,X,S)
[Y,DELTA] = polyconf(p,X,S,alpha)
Description [Y,DELTA] = polyconf(p,X,S) uses the optional output S generated by
polyfit to give 95% confidence intervals Y ± DELTA. This assumes the errors in
the data input to polyfit are independent normal with constant variance.
[Y,DELTA] = polyconf(p,X,S,alpha) gives 100(1-alpha)% confidence
intervals. For example, alpha = 0.1 yields 90% intervals.
If p is a vector whose elements are the coefficients of a polynomial in
descending powers, such as those output from polyfit, then polyconf(p,X) is
the value of the polynomial evaluated at X. If X is a matrix or vector, the
polynomial is evaluated at each of the elements.
Examples This example gives predictions and 90% confidence intervals for computing
time for LU factorizations of square matrices with 100 to 200 columns.
n = [100 100:20:200];
for i = n
A = rand(i,i);
tic
B = lu(A);
t(ceil((i-80)/20)) = toc;
end
[p,S] = polyfit(n(2:7),t,3);
[time,delta_t] = polyconf(p,n(2:7),S,0.1)
time =
0.0829 0.1476 0.2277 0.3375 0.4912 0.7032
delta_t =
0.0064 0.0057 0.0055 0.0055 0.0057 0.0064
polyfit
Purpose Polynomial curve fitting.
Syntax [p,S] = polyfit(x,y,n)
Description p = polyfit(x,y,n) finds the coefficients of a polynomial p(x) of degree n
that fits the data, p(x(i)) to y(i), in a least-squares sense. The result p is a
row vector of length n+1 containing the polynomial coefficients in descending
powers.
$$p(x) = p_1 x^{n} + p_2 x^{n-1} + \cdots + p_n x + p_{n+1}$$
[p,S] = polyfit(x,y,n) returns polynomial coefficients p and matrix S for
use with polyval to produce error estimates on predictions. If the errors in the
data, y, are independent normal with constant variance, polyval will produce
error bounds which contain at least 50% of the predictions.
You may omit S if you are not going to pass it to polyval or polyconf for
calculating error estimates.
The polyfit function is part of the standard MATLAB language.
Example [p,S] = polyfit(1:10,[1:10] + normrnd(0,1,1,10),1)
p =
1.0300 0.4561
S =
-19.6214 -2.8031
0 -1.4639
8.0000 0
2.3180 0
See Also polyval, polytool, polyconf
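To see the descending-powers convention without random noise, the sketch below fits a line to exact data and evaluates it with polyval. The data are made up for illustration and are not from the guide.

```matlab
% Noise-free line y = 2x + 1, chosen only for illustration.
x = 1:5;
y = 2*x + 1;

p = polyfit(x, y, 1);     % coefficients in descending powers, close to [2 1]
y6 = polyval(p, 6);       % p(1)*6 + p(2), close to 13
```

Because the data lie exactly on a line, the fitted slope and intercept agree with the generating values up to rounding error.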
polytool
Purpose Interactive plot for prediction of fitted polynomials.
Syntax polytool(x,y)
polytool(x,y,n)
polytool(x,y,n,alpha)
Description polytool(x,y) fits a line to the column vectors x and y and displays an
interactive plot of the result. This plot is a graphic user interface for exploring
the effects of changing the polynomial degree of the fit. The plot shows the
fitted curve and 95% global confidence intervals on a new predicted value for
the curve. Text with the current predicted value of y and its uncertainty appears to
the left of the y-axis.
polytool(x,y,n) initially fits a polynomial of order n.
polytool(x,y,n,alpha) plots 100(1-alpha)% confidence intervals on the
predicted values.
polytool fits by least-squares using the regression model
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^{2} + \cdots + \beta_n x_i^{n} + \varepsilon_i$$
$$\varepsilon_i \sim N(0, \sigma^2) \quad \forall i, \qquad \mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0 \quad \forall i \neq j$$
Evaluate the function by typing a value in the x-axis edit box or by dragging
the vertical reference line on the plot. The shape of the pointer changes from
an arrow to a cross hair when you are over the vertical line to indicate that the
line can be dragged. The predicted value of y will update as you drag the
reference line.
The argument n controls the degree of the polynomial fit. To change the degree
of the polynomial, choose from the pop-up menu at the top of the figure. To
change the type of confidence intervals, use the Bounds menu. To change from
least squares to a robust fitting method, use the Method menu.
When you are done, press the Close button.
polyval
Purpose Polynomial evaluation.
Syntax Y = polyval(p,X)
[Y,DELTA] = polyval(p,X,S)
Description Y = polyval(p,X) returns the predicted value of a polynomial given its
coefficients, p, at the values in X.
[Y,DELTA] = polyval(p,X,S) uses the optional output S generated by
polyfit to generate error estimates, Y ± DELTA. If the errors in the data input
to polyfit are independent normal with constant variance, Y ± DELTA contains
at least 50% of the predictions.
If p is a vector whose elements are the coefficients of a polynomial in
descending powers, then polyval(p,X) is the value of the polynomial
evaluated at X. If X is a matrix or vector, the polynomial is evaluated at each of
the elements.
The polyval function is part of the standard MATLAB language.
Examples Simulate the function y = x, adding normal random errors with a standard
deviation of 0.1. Then use polyfit to estimate the polynomial coefficients. Note
that predicted Y values are within DELTA of the integer X in every case.
[p,S] = polyfit(1:10,(1:10) + normrnd(0,0.1,1,10),1);
X = magic(3);
[Y,D] = polyval(p,X,S)
Y =
8.0696 1.0486 6.0636
3.0546 5.0606 7.0666
4.0576 9.0726 2.0516
D =
0.0889 0.0951 0.0861
0.0889 0.0861 0.0870
0.0870 0.0916 0.0916
See Also polyfit, polytool, polyconf
prctile
Purpose Percentiles of a sample.
Syntax Y = prctile(X,p)
Description Y = prctile(X,p) calculates a value that is greater than p percent of the
values in X. The values of p must lie in the interval [0 100].
For vectors, prctile(X,p) is the pth percentile of the elements in X. For
instance, if p = 50 then Y is the median of X.
For matrix X and scalar p, prctile(X,p) is a row vector containing the pth
percentile of each column. If p is a vector, the ith row of Y is the p(i)th
percentile of each column of X.
Examples x = (1:5)'*(1:5)
x =
1 2 3 4 5
2 4 6 8 10
3 6 9 12 15
4 8 12 16 20
5 10 15 20 25
y = prctile(x,[25 50 75])
y =
1.7500 3.5000 5.2500 7.0000 8.7500
3.0000 6.0000 9.0000 12.0000 15.0000
4.2500 8.5000 12.7500 17.0000 21.2500
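The first column of the output can be reproduced with linear interpolation between the sorted values. The convention used below, which places the kth of n sorted values at the 100(k-0.5)/n percentile, is an assumption that happens to match the printed numbers; this section does not state how prctile interpolates.

```matlab
% First column of the example matrix.
x = (1:5)';
n = numel(x);

% Assumed convention: the kth sorted value sits at the 100*(k-0.5)/n percentile.
pct_of_sorted = 100 * ((1:n) - 0.5) / n;      % 10 30 50 70 90

% Interpolate linearly between sorted values at the requested percentiles.
y = interp1(pct_of_sorted, sort(x)', [25 50 75]);   % 1.75 3.00 4.25
```

This simple sketch only covers percentiles inside the range of pct_of_sorted; values outside it would need separate handling.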
princomp
Purpose Principal Components Analysis (PCA).
Syntax PC = princomp(X)
[PC,SCORE,latent,tsquare] = princomp(X)
Description [PC,SCORE,latent,tsquare] = princomp(X) takes a data matrix X and
returns the principal components in PC, the so-called Z-scores in SCORE, the
eigenvalues of the covariance matrix of X in latent, and Hotelling's T² statistic
for each data point in tsquare.
The Z-scores are the data formed by transforming the original data into the
space of the principal components. The values of the vector, latent, are the
variance of the columns of SCORE. Hotelling's T² is a measure of the
multivariate distance of each observation from the center of the data set.
Example Compute principal components for the ingredients data in the Hald dataset,
and the variance accounted for by each component.
load hald;
[pc,score,latent,tsquare] = princomp(ingredients);
pc,latent
pc =
0.0678 -0.6460 0.5673 -0.5062
0.6785 -0.0200 -0.5440 -0.4933
-0.0290 0.7553 0.4036 -0.5156
-0.7309 -0.1085 -0.4684 -0.4844
latent =
517.7969
67.4964
12.4054
0.2372
Reference Jackson, J. E., A User's Guide to Principal Components, John Wiley and Sons,
Inc. 1991. pp. 125.
See Also barttest, pcacov, pcares
qqplot
Purpose Quantile-quantile plot of two samples.
Syntax qqplot(X)
qqplot(X,Y)
qqplot(X,Y,pvec)
h = qqplot(...)
Description qqplot(X) displays a quantile-quantile plot of the sample quantiles of X versus
theoretical quantiles from a normal distribution. If the distribution of X is
normal, the plot will be close to linear.
qqplot(X,Y) displays a quantile-quantile plot of two samples. If the samples
do come from the same distribution, the plot will be linear.
For matrix X and Y, qqplot displays a separate line for each pair of columns.
The plotted quantiles are the quantiles of the smaller dataset.
The plot has the sample data displayed with the plot symbol '+'.
Superimposed on the plot is a line joining the first and third quartiles of each
distribution (this is a robust linear fit of the order statistics of the two samples).
This line is extrapolated out to the ends of the sample to help evaluate the
linearity of the data.
Use qqplot(X,Y,pvec) to specify the quantiles in the vector pvec.
h = qqplot(X,Y,pvec) returns handles to the lines in h.
Examples Generate two normal samples with different means and standard deviations.
Then make a quantile-quantile plot of the two samples.
x = normrnd(0,1,100,1);
y = normrnd(0.5,2,50,1);
qqplot(x,y);
See Also normplot
[Figure: quantile-quantile plot of the two samples. x-axis: X Quantiles; y-axis: Y Quantiles]
random
Purpose Random numbers from a specified distribution.
Syntax y = random('name',A1,A2,A3,m,n)
Description y = random('name',A1,A2,A3,m,n) returns a matrix of random numbers,
where 'name' is a string containing the name of the distribution, and A1, A2,
and A3 are matrices of distribution parameters. Depending on the distribution,
some of the parameters may not be necessary.
Vector or matrix inputs must all have the same size. A scalar input is expanded
to a constant matrix with the same dimensions as the other inputs.
The last two parameters, m and n, are the size of the matrix y. If the
distribution parameters are matrices, then these parameters are optional, but
they must match the size of the other matrix arguments (see second example).
random is a utility routine allowing you to access all the random number
generators in the Statistics Toolbox using the name of the distribution as a
parameter. See "Overview of the Distributions" on page 1-12 for the list of
available distributions.
Examples rn = random('Normal',0,1,2,4)
rn =
1.1650 0.0751 -0.6965 0.0591
0.6268 0.3516 1.6961 1.7971
rp = random('Poisson',1:6,1,6)
rp =
0 0 1 2 5 7
See Also betarnd, binornd, cdf, chi2rnd, exprnd, frnd, gamrnd, geornd, hygernd, icdf,
lognrnd, nbinrnd, ncfrnd, nctrnd, ncx2rnd, normrnd, pdf, poissrnd, raylrnd,
trnd, unidrnd, unifrnd, weibrnd
randtool
Purpose Interactive random number generation using histograms for display.
Syntax randtool
r = randtool('output')
Description The randtool command sets up a graphic user interface for exploring the
effects of changing parameters and sample size on the histogram of random
samples from the supported probability distributions.
The M-file calls itself recursively using the action and flag parameters. For
general use call randtool without parameters.
To output the current set of random numbers, press the Output button. The
results are stored in the variable ans. Alternatively, use the following
command.
r = randtool('output') places the sample of random numbers in the
vector r.
To sample repetitively from the same distribution, press the Resample button.
To change the distribution function, choose from the pop-up menu of functions
at the top of the figure.
To change the parameter settings, move the sliders or type a value in the edit
box under the name of the parameter. To change the limits of a parameter, type
a value in the edit box at the top or bottom of the parameter slider.
To change the sample size, type a number in the Sample Size edit box.
When you are done, press the Close button.
For an extensive discussion, see "The randtool Demo" on page 1-169.
See Also disttool
range
Purpose Sample range.
Syntax y = range(X)
Description range(X) returns the difference between the maximum and the minimum of a
sample. For vectors, range(x) is the range of the elements. For matrices,
range(X) is a row vector containing the range of each column of X.
The range is an easily calculated estimate of the spread of a sample. Outliers
have an undue influence on this statistic, which makes it an unreliable
estimator.
Example The range of a large sample of standard normal random numbers is
approximately six. This is the motivation for the process capability indices C_p
and C_pk in statistical quality control applications.
rv = normrnd(0,1,1000,5);
near6 = range(rv)
near6 =
6.1451 6.4986 6.2909 5.8894 7.0002
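The definition of the range as maximum minus minimum can be checked directly with core MATLAB functions. The matrix below is hypothetical data, used only for illustration.

```matlab
X = [1 5 9; 4 2 7; 3 8 6];     % hypothetical 3-by-3 sample
r_manual  = max(X) - min(X);   % column-wise range from core functions
r_toolbox = range(X);          % toolbox function described in this entry
disp([r_manual; r_toolbox])
```

Both rows of the displayed result should be identical, one range per column of X.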
See Also std, iqr, mad
ranksum
Purpose Wilcoxon rank sum test that two populations are identical.
Syntax p = ranksum(x,y,alpha)
[p,h] = ranksum(x,y,alpha)
[p,h,stats] = ranksum(x,y,alpha)
Description p = ranksum(x,y,alpha) returns the significance probability that the
populations generating two independent samples, x and y, are identical. x and
y are both vectors, but can have different lengths. alpha is the desired level of
significance and must be a scalar between zero and one.
[p,h] = ranksum(x,y,alpha) also returns the result of the hypothesis test, h.
h is zero if the populations of x and y are not significantly different. h is one if
the two populations are significantly different.
p is the probability of observing a result at least as extreme as the one
observed in the data (x and y) if the null hypothesis is true. If p is near zero,
this casts doubt on the null hypothesis.
[p,h,stats] = ranksum(x,y,alpha) also returns a structure containing the
field stats.ranksum whose value is equal to the rank sum statistic. For large
samples, it also contains stats.zval, the value of the normal (Z) statistic
used to compute p.
Example This example tests the hypothesis of equality of means for two samples
generated with poissrnd.
x = poissrnd(5,10,1);
y = poissrnd(2,20,1);
[p,h] = ranksum(x,y,0.05)
p =
0.0027
h =
1
See Also signrank, signtest, ttest2
raylcdf
Purpose Rayleigh cumulative distribution function (cdf).
Syntax P = raylcdf(X,B)
Description P = raylcdf(X,B) computes the Rayleigh cdf at each of the values in X using
the corresponding parameters in B. Vector or matrix inputs for X and B must
have the same size, which is also the size of P. A scalar input for X or B is
expanded to a constant matrix with the same dimensions as the other input.
The Rayleigh cdf is
$$y = F(x \mid b) = \int_0^x \frac{t}{b^2} \, e^{-t^2/(2b^2)} \, dt$$
Example x = 0:0.1:3;
p = raylcdf(x,1);
plot(x,p)
[Figure: plot of the Rayleigh cdf for b = 1 over x = 0 to 3.]
Edition, Wiley 1993. pp. 134-136.
See Also cdf, raylinv, raylpdf, raylrnd, raylstat
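The integral defining the Rayleigh cdf has the closed form 1 - exp(-x^2/(2b^2)), which follows from the substitution u = t^2/(2b^2). The sketch below compares that closed form with raylcdf; the grid of x values is an arbitrary choice for illustration.

```matlab
x = 0:0.5:3;
b = 1;
p_closed  = 1 - exp(-x.^2 ./ (2*b^2));   % closed form of the integral
p_toolbox = raylcdf(x, b);               % toolbox function from this entry
max_diff  = max(abs(p_closed - p_toolbox));
disp(max_diff)
```

The difference should be at the level of floating-point rounding.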
raylinv
Purpose Inverse of the Rayleigh cumulative distribution function.
Syntax X = raylinv(P,B)
Description X = raylinv(P,B) returns the inverse of the Rayleigh cumulative distribution
function with parameter B at the corresponding probabilities in P. Vector or
matrix inputs for P and B must have the same size, which is also the size of X.
A scalar input for P or B is expanded to a constant matrix with the same
dimensions as the other input.
Example x = raylinv(0.9,1)
x =
2.1460
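The value in the example can be reproduced from the closed form of the Rayleigh cdf, p = 1 - exp(-x^2/(2b^2)), by solving for x. This is a sketch using core MATLAB functions only; it is not a description of how raylinv is implemented.

```matlab
p = 0.9;
b = 1;
x = b * sqrt(-2 * log(1 - p));   % solve p = 1 - exp(-x^2/(2*b^2)) for x
disp(x)
```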
See Also icdf, raylcdf, raylpdf, raylrnd, raylstat
raylpdf
Purpose Rayleigh probability density function.
Syntax Y = raylpdf(X,B)
Description Y = raylpdf(X,B) computes the Rayleigh pdf at each of the values in X using
the corresponding parameters in B. Vector or matrix inputs for X and B must
have the same size, which is also the size of Y. A scalar input for X or B is
expanded to a constant matrix with the same dimensions as the other input.
The Rayleigh pdf is
$$y = f(x \mid b) = \frac{x}{b^2} \, e^{-x^2/(2b^2)}$$
Example x = 0:0.1:3;
p = raylpdf(x,1);
plot(x,p)
[Figure: plot of the Rayleigh pdf for b = 1 over x = 0 to 3.]
See Also pdf, raylcdf, raylinv, raylrnd, raylstat
raylrnd
Purpose Random matrices from the Rayleigh distribution.
Syntax R = raylrnd(B)
R = raylrnd(B,m)
R = raylrnd(B,m,n)
Description R = raylrnd(B) returns a matrix of random numbers chosen from the
Rayleigh distribution with parameter B. The size of R is the size of B.
R = raylrnd(B,m) returns a matrix of random numbers chosen from the
Rayleigh distribution with parameter B, where m is a 1-by-2 vector that
contains the row and column dimensions of R.
R = raylrnd(B,m,n) returns a matrix of random numbers chosen from the
Rayleigh distribution with parameter B, where scalars m and n are the row and
column dimensions of R.
Example r = raylrnd(1:5)
r =
1.7986 0.8795 3.3473 8.9159 3.5182
See Also random, raylcdf, raylinv, raylpdf, raylstat
raylstat
Purpose Mean and variance for the Rayleigh distribution.
Syntax M = raylstat(B)
[M,V] = raylstat(B)
Description [M,V] = raylstat(B) returns the mean and variance of the Rayleigh
distribution with parameter B.
The mean of the Rayleigh distribution with parameter b is $b\sqrt{\pi/2}$ and the
variance is $\frac{4-\pi}{2}\,b^2$.
Example [mn,v] = raylstat(1)
mn =
1.2533
v =
0.4292
See Also raylcdf, raylinv, raylpdf, raylrnd
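The two formulas can be evaluated directly to reproduce the example output. This sketch uses only core MATLAB arithmetic.

```matlab
b  = 1;
mn = b * sqrt(pi/2);       % mean of the Rayleigh distribution
v  = (4 - pi)/2 * b^2;     % variance of the Rayleigh distribution
disp([mn v])
```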
rcoplot
Purpose Residual case order plot.
Syntax rcoplot(r,rint)
Description rcoplot(r,rint) displays an errorbar plot of the confidence intervals on the
residuals from a regression. The residuals appear in the plot in case order.
Inputs r and rint are outputs from the regress function.
Example X = [ones(10,1) (1:10)'];
y = X*[10;1] + normrnd(0,0.1,10,1);
[b,bint,r,rint] = regress(y,X,0.05);
rcoplot(r,rint);
[Figure: residual case order plot; x-axis: Case Number, y-axis: Residuals.]
The figure shows a plot of the residuals with error bars showing 95% confidence
intervals on the residuals. All the error bars pass through the zero line,
indicating that there are no outliers in the data.
See Also regress
refcurve
Purpose Add a polynomial curve to the current plot.
Syntax h = refcurve(p)
Description refcurve adds a graph of the polynomial p to the current axes. The function for
a polynomial of degree n is:
$$y = p_1 x^n + p_2 x^{n-1} + \cdots + p_n x + p_{n+1}$$
Note that $p_1$ goes with the highest order term.
h = refcurve(p) returns the handle to the curve.
Example Plot data for the height of a rocket against time, and add a reference curve
showing the theoretical height (assuming no air friction). The initial velocity of
the rocket is 100 m/sec.
h = [85 162 230 289 339 381 413 437 452 458 456 440 400 356];
plot(h,'+')
refcurve([-4.9 100 0])
[Figure: rocket height data with the reference curve; x-axis from 0 to 14, y-axis from 0 to 500.]
See Also polyfit, polyval, refline
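The coefficient ordering used by refcurve (highest power first) is the same convention that polyval uses, so the theoretical heights in the example can be evaluated numerically. The time grid t is an assumption made for illustration.

```matlab
p = [-4.9 100 0];         % coefficients, highest power first
t = 0:14;                 % hypothetical time grid in seconds
height = polyval(p, t);   % evaluates -4.9*t.^2 + 100*t + 0
disp(height(1:3))
```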
refline
Purpose Add a reference line to the current axes.
Syntax refline(slope,intercept)
refline(slope)
h = refline(slope,intercept)
refline
Description refline(slope,intercept) adds a reference line with the given slope and
intercept to the current axes.
refline(slope), where slope is a two-element vector, adds the line
y = slope(2) + slope(1)*x
to the figure.
h = refline(slope,intercept) returns the handle to the line.
refline with no input arguments superimposes the least squares line on each
line object in the current figure (except LineStyles '-','--','.-'). This
behavior is equivalent to lsline.
Example y = [3.2 2.6 3.1 3.4 2.4 2.9 3.0 3.3 3.2 2.1 2.6]';
plot(y,'+')
refline(0,3)
See Also lsline, polyfit, polyval, refcurve
[Figure: data points with a horizontal reference line at y = 3; x-axis from 0 to 12, y-axis from 2 to 3.5.]
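The two-element form of refline matches the coefficient order returned by polyfit for a first-degree fit, so a fitted line can be passed straight through. The data below are hypothetical and noise-free so that the fitted coefficients are easy to predict.

```matlab
x = (1:10)';
y = 2*x + 1;              % hypothetical noise-free data
p = polyfit(x, y, 1);     % p(1) is the slope, p(2) is the intercept
plot(x, y, '+')
refline(p)                % adds the line y = p(2) + p(1)*x
```

Calling refline(p(1),p(2)) should draw the same line.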
regress
Purpose Multiple linear regression.
Syntax b = regress(y,X)
[b,bint,r,rint,stats] = regress(y,X)
[b,bint,r,rint,stats] = regress(y,X,alpha)
Description b = regress(y,X) returns the least squares fit of y on X by solving the linear
model
$$y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I)$$
for $\beta$, where:
y is an n-by-1 vector of observations
X is an n-by-p matrix of regressors
$\beta$ is a p-by-1 vector of parameters
$\varepsilon$ is an n-by-1 vector of random disturbances
[b,bint,r,rint,stats] = regress(y,X) returns an estimate of $\beta$ in b and a 95%
confidence interval for $\beta$ in the p-by-2 matrix bint. The residuals are returned
in r and a 95% confidence interval for each residual is returned in the n-by-2
matrix rint. The vector stats contains the $R^2$ statistic along with the F and p
values for the regression.
[b,bint,r,rint,stats] = regress(y,X,alpha) gives 100(1-alpha)%
confidence intervals for bint and rint. For example, alpha = 0.2 gives 80%
confidence intervals.
Examples Suppose the true model is
$$y = 10 + x + \varepsilon, \qquad \varepsilon \sim N(0, 0.01 I)$$
where I is the identity matrix.
X = [ones(10,1) (1:10)']
X =
1 1
1 2
1 3
1 4
1 5
1 6
1 7
1 8
1 9
1 10
y = X*[10;1] + normrnd(0,0.1,10,1)
y =
11.1165
12.0627
13.0075
14.0352
14.9303
16.1696
17.0059
18.1797
19.0264
20.0872
[b,bint] = regress(y,X,0.05)
b =
10.0456
1.0030
bint =
9.9165 10.1747
0.9822 1.0238
Compare b to [10 1]'. Note that bint includes the true model values.
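For a full-rank design matrix, the coefficient estimates from regress should agree with MATLAB's backslash operator, which also solves the least squares problem. The sketch below repeats the example setup, substituting the core function randn for normrnd so that the data generation does not depend on the toolbox.

```matlab
X = [ones(10,1) (1:10)'];
y = X*[10;1] + 0.1*randn(10,1);   % noise with standard deviation 0.1
b_regress   = regress(y, X);      % toolbox function from this entry
b_backslash = X \ y;              % least squares via core MATLAB
disp([b_regress b_backslash])
```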
Reference Chatterjee, S. and A. S. Hadi. Influential Observations, High Leverage Points,
and Outliers in Linear Regression. Statistical Science, 1986. pp. 379-416.
regstats
Purpose Regression diagnostics graphical user interface.
Syntax regstats(responses,DATA)
regstats(responses,DATA,'model')
Description regstats(responses,DATA) generates regression diagnostics for a linear
additive model with a constant term. The dependent variable is the vector
responses. Values of the independent variables are in the matrix DATA.
The function creates a figure with a group of check boxes that save diagnostic
statistics to the base workspace using variable names you can specify.
regstats(responses,DATA,'model') controls the order of the regression
model, where 'model' can be one of these strings:
The literature suggests many diagnostic statistics for evaluating multiple
linear regression. regstats provides these diagnostics:
Q from QR decomposition
R from QR decomposition
Regression coefficients
Covariance of regression coefficients
Fitted values of the response data
Residuals
Mean squared error
Leverage
Hat matrix
Delete-1 variance
Delete-1 coefficients
Standardized residuals
Studentized residuals
Change in regression coefficients
Change in fitted values
Scaled change in fitted values
Change in covariance
Cook's distance
For more detail, press the Help button in the regstats window. This provides
formulae and interpretations for each of these regression diagnostics.
Algorithm The usual regression model is $y = X\beta + \varepsilon$, where:
y is an n-by-1 vector of responses
X is an n-by-p matrix of predictors
$\beta$ is a p-by-1 vector of parameters
$\varepsilon$ is an n-by-1 vector of random disturbances
Let X = Q*R where Q and R come from a QR decomposition of X. Q is orthogonal
and R is triangular. Both of these matrices are useful for calculating many
regression diagnostics (Goodall 1993).
The standard textbook equation for the least squares estimator of $\beta$ is
$$\hat{\beta} = b = (X'X)^{-1} X'y$$
However, this definition has poor numeric properties. Particularly dubious is
the computation of $(X'X)^{-1}$, which is both expensive and imprecise.
Numerically stable MATLAB code for $\beta$ is
b = R\(Q'*y);
Reference Goodall, C. R. (1993). Computation using the QR decomposition. Handbook in
Statistics, Volume 9. Statistical Computing (C. R. Rao, ed.). Amsterdam, NL:
Elsevier/North-Holland.
See Also leverage, stepwise, regress
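The QR-based estimator in the Algorithm section can be compared with the textbook normal-equations form on a small, well-conditioned problem. The data are hypothetical, and the economy-size call qr(X,0) is one way to obtain Q and R; the source does not say which form regstats uses internally.

```matlab
X = [ones(10,1) (1:10)'];          % hypothetical design matrix
y = X*[10;1] + 0.1*randn(10,1);    % hypothetical response
[Q,R] = qr(X,0);                   % economy-size QR decomposition
b_qr = R\(Q'*y);                   % estimator from the Algorithm section
b_ne = (X'*X)\(X'*y);              % textbook normal equations
disp([b_qr b_ne])
```

On a well-conditioned X the two estimates agree closely; the QR form is preferred because it avoids forming X'*X, whose condition number is the square of that of X.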
ridge
Purpose Parameter estimates for ridge regression.
Syntax b = ridge(y,X,k)
Description b = ridge(y,X,k) returns the ridge regression coefficients b for the linear
model $y = X\beta + \varepsilon$, where:
X is an n-by-p matrix
y is the n-by-1 vector of observations
k is a scalar constant (the ridge parameter)
The ridge estimator of $\beta$ is
$$b = (X'X + kI)^{-1} X'y$$
When k = 0, b is the least squares estimator. For increasing k, the bias of b
increases, but the variance of b falls. For poorly conditioned X, the drop in the
variance more than compensates for the bias.
Example This example shows how the coefficients change as the value of k increases,
using data from the hald dataset.
load hald;
b = zeros(4,100);
kvec = 0.01:0.01:1;
count = 0;
for k = 0.01:0.01:1
count = count + 1;
b(:,count) = ridge(heat,ingredients,k);
end
plot(kvec',b'),xlabel('k'),ylabel('b','FontName','Symbol')
[Figure: ridge coefficient estimates plotted against k, for k from 0 to 1.]
See Also regress, stepwise
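The ridge estimator formula can be evaluated directly on a tiny hypothetical problem. This is a sketch of the formula only: it assumes the formula is applied to X exactly as given, and the ridge function may preprocess X (for example by centering and scaling the columns), in which case its output would differ from these values.

```matlab
X = [1 0; 0 1; 1 1];      % hypothetical 3-by-2 design matrix
y = [1; 2; 3];            % hypothetical observations
p = size(X,2);
b0 = (X'*X + 0*eye(p)) \ (X'*y);   % k = 0: ordinary least squares
b1 = (X'*X + 1*eye(p)) \ (X'*y);   % k = 1: coefficients shrink toward zero
disp([b0 b1])
```

Here b0 is [1; 2] and b1 is [0.875; 1.375], showing the shrinkage that the bias and variance discussion describes.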
robustdemo
Purpose Demo of robust regression.
Syntax robustdemo
robustdemo(X,Y)
Description robustdemo demonstrates robust regression and ordinary least squares regression
on a sample dataset. The function creates a figure window containing a scatter
plot of sample data vectors X and Y, along with two fitted lines calculated using
least squares and the robust bisquare method. The bottom of the figure shows
the equations of the lines and the estimated error standard deviations for each
fit. If you use the left mouse button to select a point and move it to a new
location, both fits will update. If you hold down the right mouse button over any
point, the point will be labeled with the leverage of that point on the least
squares fit, and the weight of that point in the robust fit.
robustdemo(X,Y) performs the same demonstration using the X and Y values that
you specify.
Example See "The robustdemo Demo" on page 1-172.
See Also robustfit, leverage
robustfit
Purpose Robust regression.
Syntax b = robustfit(X,Y)
[b,stats] = robustfit(X,Y)
[b,stats] = robustfit(X,Y,'wfun',tune,'const')
Description b = robustfit(X,Y) uses robust regression to fit Y as a function of the
columns of X, and returns the vector b of coefficient estimates. The robustfit
function uses an iteratively reweighted least squares algorithm, with the
weights at each iteration calculated by applying the bisquare function to the
residuals from the previous iteration. This algorithm gives lower weight to
points that do not fit well. The results are less sensitive to outliers in the data
as compared with ordinary least squares regression.
[b,stats] = robustfit(X,Y) also returns a stats structure with the
following fields:
stats.ols_s      sigma estimate (rmse) from least squares fit
stats.robust_s   robust estimate of sigma
stats.mad_s      estimate of sigma computed using the median absolute
                 deviation of the residuals from their median; used for
                 scaling residuals during the iterative fitting
stats.s          final estimate of sigma, the larger of robust_s and a
                 weighted average of ols_s and robust_s
stats.se         standard error of coefficient estimates
stats.t          ratio of b to stats.se
stats.p          p-values for stats.t
stats.coeffcorr  estimated correlation of coefficient estimates
stats.w          vector of weights for robust fit
stats.h          vector of leverage values for least squares fit
stats.dfe        degrees of freedom for error
stats.R          R factor in QR decomposition of X matrix
The robustfit function estimates the variance-covariance matrix of the
coefficient estimates as V = inv(X'*X)*stats.s^2. The standard errors and
correlations are derived from V.
[b,stats] = robustfit(X,Y,'wfun',tune,'const') specifies a weight
function, a tuning constant, and the presence or absence of a constant term.
The weight function 'wfun' can be any of the names listed in the following
table.
Weight function   Meaning                              Tuning constant
'andrews'         w = (abs(r)<pi) .* sin(r) ./ r       1.339
'bisquare'        w = (abs(r)<1) .* (1 - r.^2).^2      4.685
'cauchy'          w = 1 ./ (1 + r.^2)                  2.385
'fair'            w = 1 ./ (1 + abs(r))                1.400
'huber'           w = 1 ./ max(1, abs(r))              1.345
'logistic'        w = tanh(r) ./ r                     1.205
'talwar'          w = 1 * (abs(r)<1)                   2.795
'welsch'          w = exp(-(r.^2))                     2.985
The value r in the weight function expression is equal to
resid/(tune*s*sqrt(1-h))
where resid is the vector of residuals from the previous iteration, tune is the
tuning constant, h is the vector of leverage values from a least squares fit, and
s is an estimate of the standard deviation of the error term.
s = MAD/0.6745
The quantity MAD is the median absolute deviation of the residuals from their
median. The constant 0.6745 makes the estimate unbiased for the normal
distribution. If there are p columns in the X matrix (including the constant
term, if any), the smallest p-1 absolute deviations are excluded when
computing their median.
In addition to the function names listed above, 'wfun' can be 'ols' to perform
unweighted ordinary least squares.
The argument tune overrides the default tuning constant from the table. A
smaller tuning constant tends to downweight large residuals more severely,
and a larger tuning constant downweights large residuals less severely. The
default tuning constants, shown in the table, yield coefficient estimates that
are approximately 95% as efficient as least squares estimates, when the
response has a normal distribution with no outliers. The value of 'const' can
be 'on' (the default) to add a constant term or 'off' to omit it. If you want a
constant term, you should set 'const' to 'on' rather than adding a column of
ones to your X matrix.
As an alternative to specifying one of the named weight functions shown above,
you can write your own weight function that takes a vector of scaled residuals
as input and produces a vector of weights as output. You can specify 'wfun'
using @ (for example, @myfun) or as an inline function.
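The scaling and weighting steps described above can be sketched in a few lines of core MATLAB. The residuals and leverage values are hypothetical, and the MAD computation is simplified: it does not exclude the smallest p-1 absolute deviations as robustfit is described as doing. The handle myfun is likewise a hypothetical example of a user-written weight function.

```matlab
resid = [0.1; -0.2; 0.05; 10; -0.15];    % hypothetical residuals, one outlier
h     = 0.2 * ones(5,1);                 % hypothetical leverage values
tune  = 4.685;                           % default tuning constant for 'bisquare'
% Simplified MAD: the exclusion of the smallest p-1 deviations is skipped.
MAD = median(abs(resid - median(resid)));
s   = MAD / 0.6745;                      % estimate of the error standard deviation
r   = resid ./ (tune * s * sqrt(1 - h)); % scaled residuals
w   = (abs(r) < 1) .* (1 - r.^2).^2;     % bisquare weights from the table
myfun = @(r) 1 ./ (1 + r.^2);            % hypothetical custom weight function
w_custom = myfun(r);
disp([w w_custom])
```

The outlier in the fourth position receives a bisquare weight of zero, while the remaining points keep weights close to one.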
Example Let's see how a single erroneous point affects least squares and robust fits.
First we generate a simple dataset following the equation y = 10-2*x plus some
random noise. Then we change one y value to simulate an outlier that could be
an erroneous measurement.
x = (1:10)';
y = 10 - 2*x + randn(10,1);
y(10) = 0;
We use both ordinary least squares and robust fitting to estimate the equations
of a straight line fit.
bls = regress(y,[ones(10,1) x])
bls =
8.6305
-1.4721
brob = robustfit(x,y)
brob =
10.5089
-1.9844
A scatter plot with both fitted lines shows that the robust fit (solid line) fits
most of the data points well but ignores the outlier. The least squares fit (dotted
line) is pulled toward the outlier.
scatter(x,y)
hold on
plot(x,bls(1)+bls(2)*x,'g:')
plot(x,brob(1)+brob(2)*x,'r-')
[Figure: scatter plot of the data with the least squares fit (dotted line) and the robust fit (solid line).]
See Also regress, robustdemo
References DuMouchel, W.H., and F.L. O'Brien (1989), "Integrating a robust option into a
multiple regression computing environment," Computer Science and Statistics:
Proceedings of the 21st Symposium on the Interface, Alexandria, VA: American
Statistical Association.
Holland, P.W., and R.E. Welsch (1977), "Robust regression using iteratively
reweighted least-squares," Communications in Statistics: Theory and Methods,
A6, 813-827.
Huber, P.J. (1981), Robust Statistics, New York: Wiley.
Street, J.O., R.J. Carroll, and D. Ruppert (1988), "A note on computing robust
regression estimates via iteratively reweighted least squares," The American
Statistician, 42, 152-154.
rowexch
Purpose D-optimal design of experiments: row exchange algorithm.
Syntax settings = rowexch(nfactors,nruns)
[settings,X] = rowexch(nfactors,nruns)
[settings,X] = rowexch(nfactors,nruns,'model')
Description settings = rowexch(nfactors,nruns) generates the factor settings matrix,
settings, for a D-optimal design using a linear additive model with a constant
term. settings has nruns rows and nfactors columns.
[settings,X] = rowexch(nfactors,nruns) also generates the associated
design matrix X.
[settings,X] = rowexch(nfactors,nruns,'model') produces a design for
fitting a specified regression model. The input, 'model', can be one of these
strings:
'interaction' includes constant, linear, and cross product terms.
'quadratic' interactions plus squared terms.
'purequadratic' includes constant, linear and squared terms.
Example This example illustrates that the D-optimal design for three factors in eight
runs, using an interactions model, is a two-level full-factorial design.
s = rowexch(3,8,'interaction')
s =
-1 -1 1
1 -1 -1
1 -1 1
-1 -1 -1
-1 1 1
1 1 1
-1 1 -1
1 1 -1
See Also cordexch, daugment, dcovary, fullfact, ff2n, hadamard
rsmdemo
Purpose Demo of design of experiments and surface fitting.
Syntax rsmdemo
Description rsmdemo creates a GUI that simulates a chemical reaction. To start, you have
a budget of 13 test reactions. Try to find out how changes in each reactant affect
the reaction rate. Determine the reactant settings that maximize the reaction
rate. Estimate the run-to-run variability of the reaction. Now run a designed
experiment using the model pop-up. Compare your previous results with the
output from response surface modeling or nonlinear modeling of the reaction.
The GUI has the following elements:
A Run button to perform one reactor run at the current settings
An Export button to export the x and y data to the base workspace
Three sliders with associated data entry boxes to control the partial
pressures of the chemical reactants: Hydrogen, n-Pentane, and Isopentane
A text box to report the reaction rate
A text box to keep track of the number of test reactions you have left
Example See "The rsmdemo Demo" on page 1-170.
See Also rstool, nlintool, cordexch
rstool
Purpose Interactive fitting and visualization of a response surface.
Syntax rstool(x,y)
rstool(x,y,'model')
rstool(x,y,'model',alpha,'xname','yname')
Description rstool(x,y) displays an interactive prediction plot with 95% global confidence
intervals. This plot results from a multiple regression of (x,y) data using a
linear additive model.
rstool(x,y,'model') allows control over the initial regression model, where
'model' can be one of the following strings:
'purequadratic' includes constant, linear and squared terms
rstool(x,y,'model',alpha) plots a 100(1-alpha)% global confidence interval
for predictions as two red curves. For example, alpha = 0.01 gives 99%
confidence intervals.
rstool displays a vector of plots, one for each column of the matrix of
inputs x. The response variable, y, is a column vector that matches the number
of rows in x.
rstool(x,y,'model',alpha,'xname','yname') labels the graph using the
string matrix 'xname' for the labels to the x-axes and the string, 'yname', to
label the y-axis common to all the plots.
Drag the dotted white reference line and watch the predicted values update
simultaneously. Alternatively, you can get a specific prediction by typing the
value of x into an editable text field. Use the pop-up menu labeled Model to
interactively change the model. Use the pop-up menu labeled Export to move
specified variables to the base workspace.
Example See "Quadratic Response Surface Models" on page 1-86.
See Also nlintool
schart
Purpose Chart of standard deviation for Statistical Process Control.
Syntax schart(DATA)
schart(DATA,conf)
schart(DATA,conf,specs)
[outliers,h] = schart(DATA,conf,specs)
Description schart(DATA) displays an S chart of the grouped responses in DATA. The rows
of DATA contain replicate observations taken at a given time. The rows must be
in time order. The graph contains the sample standard deviation s for each
group, a center line at the average s value, and upper and lower control limits.
The limits are placed at a three-sigma distance on either side of the center line,
where sigma is an estimate of the standard deviation of s. If the process is in
control, fewer than 3 out of 1000 observations would be expected to fall outside
the control limits by random chance. So, if you observe points outside the
limits, you can take this as evidence that the process is not in control.
schart(DATA,conf) allows control of the confidence level of the upper and
lower plotted control limits. The default conf = 0.9973 produces three-sigma
limits.
norminv(1 - (1-.9973)/2)
ans =
3
To get k-sigma limits, use the expression 1-2*(1-normcdf(k)). For example,
the correct conf value for 2-sigma limits is 0.9545, as shown below.
k = 2;
1-2*(1-normcdf(k))
ans =
0.9545
schart(DATA,conf,specs) plots the specification limits in the two-element
vector specs.
[outliers,h] = schart(DATA,conf,specs) returns outliers, a vector of
indices to the rows where the standard deviation of DATA is out of control, and h,
a vector of handles to the plotted lines.
Example This example plots an S chart of measurements on newly machined parts,
taken at one-hour intervals for 36 hours. Each row of the runout matrix
contains the measurements for 4 parts chosen at random. The values indicate,
in thousandths of an inch, the amount the part radius differs from the target
radius.
load parts
schart(runout)
[Figure: S Chart; x-axis: Sample Number, y-axis: Standard Deviation; lines marked UCL, CL, and LCL.]
All points are within the control limits, so the variability within subgroups is
consistent with what would be expected by random chance. There is no
evidence that the process is out of control.
Reference Montgomery, D., Introduction to Statistical Quality Control, John Wiley and
Sons 1991. p. 235.
See Also capaplot, ewmaplot, histfit, xbarplot
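The plotted statistic and center line described for the S chart can be sketched with core MATLAB functions. The data matrix is hypothetical, and the sketch stops at the center line: the exact computation that schart uses for the control limits is not given in this entry, so it is not reproduced here.

```matlab
DATA = [1 2 3; 2 2 2; 0 4 8];   % hypothetical data: 3 subgroups of 3 parts
s  = std(DATA, 0, 2);           % sample standard deviation of each row
cl = mean(s);                   % center line at the average s value
disp(s')
disp(cl)
```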
signrank
Purpose Wilcoxon signed rank test of equality of medians.
Syntax p = signrank(x,y,alpha)
[p,h] = signrank(x,y,alpha)
[p,h,stats] = signrank(x,y,alpha)
Description p = signrank(x,y,alpha) returns the significance probability that the
medians of two matched samples, x and y, are equal. x and y must be vectors
of equal length. y may also be a scalar; in this case, signrank computes the
probability that the median of x is different from the constant y. alpha is the
desired level of significance, and must be a scalar between zero and one.
[p,h] = signrank(x,y,alpha) also returns the result of the hypothesis
test, h. h is zero if the difference in medians of x and y is not significantly
different from zero. h is one if the two medians are significantly different.
p is the probability of observing a result at least as extreme as the one
observed in the data (x and y) if the null hypothesis is true. p is calculated using the
rank values for the differences between corresponding elements in x and y. If p
is near zero, this casts doubt on the null hypothesis.
[p,h,stats] = signrank(x,y,alpha) also returns a structure stats
containing the field stats.signedrank whose value is the signed rank
statistic. For large samples, it also contains stats.zval, the value of the
normal (Z) statistic used to compute p.
Example This example tests the hypothesis of equality of medians for two samples
generated with normrnd. The samples have the same theoretical median but
different standard deviations. (For the normal distribution, the mean and
median are the same.)
x = normrnd(0,1,20,1);
y = normrnd(0,2,20,1);
[p,h] = signrank(x,y,0.05)
p =
0.2959
h =
0
See Also ranksum, signtest, ttest
signtest
Purpose Sign test for paired samples.
Syntax p = signtest(x,y,alpha)
[p,h] = signtest(x,y,alpha)
[p,h,stats] = signtest(x,y,alpha)
Description p = signtest(x,y,alpha) returns the significance probability that the
medians of two matched samples, x and y, are equal. x and y must be vectors
of equal length. y may also be a scalar; in this case, signtest computes the
probability that the median of x is different from the constant y. alpha is the
desired level of significance and must be a scalar between zero and one.
[p,h] = signtest(x,y,alpha) also returns the result of the hypothesis test,
h. h is 0 if the difference in medians of x and y is not significantly different from
zero. h is 1 if the two medians are significantly different.
p is the probability of observing a result at least as extreme as the one
observed in the data (x and y) if the null hypothesis is true. p is calculated using the
signs (plus or minus) of the differences between corresponding elements in x
and y. If p is near zero, this casts doubt on the null hypothesis.
[p,h,stats] = signtest(x,y,alpha) also returns a structure stats
containing the field stats.sign whose value is the sign statistic. For large
samples, it also contains stats.zval, the value of the normal (Z) statistic used
to compute p.
Example This example tests the hypothesis of equality of medians for two samples
generated with normrnd. The samples have the same theoretical median but
different standard deviations. (For the normal distribution, the mean and
median are the same.)
x = normrnd(0,1,20,1);
y = normrnd(0,2,20,1);
[p,h] = signtest(x,y,0.05)
p =
0.2632
h =
0
See Also ranksum, signrank, ttest
skewness
Purpose Sample skewness.
Syntax y = skewness(X)
y = skewness(X,flag)
Description y = skewness(X) returns the sample skewness of X. For vectors, skewness(x)
is the skewness of the elements of x. For matrices, skewness(X) is a row vector
containing the sample skewness of each column.
Skewness is a measure of the asymmetry of the data around the sample mean.
If skewness is negative, the data are spread out more to the left of the mean
than to the right. If skewness is positive, the data are spread out more to the
right. The skewness of the normal distribution (or any perfectly symmetric
distribution) is zero.
The skewness of a distribution is defined as
$$y = \frac{E(x-\mu)^3}{\sigma^3}$$
where $\mu$ is the mean of x, $\sigma$ is the standard deviation of x, and E(t) represents
the expected value of the quantity t.
y = skewness(X,flag) specifies whether to correct for bias (flag = 0) or not
(flag = 1, the default). When X represents a sample from a population, the
skewness of X is biased; that is, it will tend to differ from the population
skewness by a systematic amount that depends on the size of the sample. You
can set flag = 0 to correct for this systematic bias.
Example X =
1.1650 1.6961 -1.4462 -0.3600
0.6268 0.0591 -0.7012 -0.1356
0.0751 1.7971 1.2460 -1.3493
0.3516 0.2641 -0.6390 -1.2704
-0.6965 0.8717 0.5774 0.9846
y = skewness(X)
y =
-0.2933 0.0482 0.2735 0.4641
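The definition of skewness can be applied to a sample by replacing the expectations with sample averages, which corresponds to the uncorrected case (flag = 1). The sketch below uses core MATLAB functions and a small hypothetical sample.

```matlab
x  = [1 2 3 4 10];      % hypothetical right-skewed sample
d  = x - mean(x);       % deviations from the sample mean
m2 = mean(d.^2);        % second central moment (divides by n)
m3 = mean(d.^3);        % third central moment
y  = m3 / m2^1.5;       % uncorrected sample skewness
disp(y)
```

The result is positive, consistent with the single large value to the right of the mean; skewness(x) with the default flag would be expected to return the same value.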
See Also kurtosis, mean, moment, std, var
squareform
Purpose Reformat the output of pdist into a square matrix.
Syntax S = squareform(Y)
Description S = squareform(Y) reformats the distance information returned by pdist
from a vector into a square matrix. In this format, S(i,j) denotes the distance
between the ith and jth observations in the original data.
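A small example shows the relationship between the vector returned by pdist and the square matrix returned by squareform. The three points are hypothetical and chosen so that the Euclidean distances are whole numbers.

```matlab
X = [0 0; 3 4; 6 8];    % three hypothetical observations in the plane
Y = pdist(X);           % pairwise distances in the order (1,2), (1,3), (2,3)
S = squareform(Y);      % symmetric 3-by-3 matrix with zeros on the diagonal
disp(S)
```

Here Y is [5 10 5], and S(1,3) and S(3,1) both hold the distance 10 between the first and third observations.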
See Also pdist
std
Purpose Standard deviation of a sample.
Syntax y = std(X)
Description y = std(X) computes the sample standard deviation of the data in X. For
vectors, std(x) is the standard deviation of the elements in x. For matrices,
std(X) is a row vector containing the standard deviation of each column of X.
std normalizes by n-1, where n is the sequence length. For normally distributed
data, the square of the standard deviation is the minimum variance unbiased
estimator of $\sigma^2$ (the second parameter).
The standard deviation is
$$s = \left( \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \right)^{\frac{1}{2}}$$
where the sample average is $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.
The std function is part of the standard MATLAB language.
Examples In each column, the expected value of y is one.
x = normrnd(0,1,100,6);
y = std(x)
y =
0.9536 1.0628 1.0860 0.9927 0.9605 1.0254
y = std(-1:2:1)
y =
1.4142
See Also cov, var
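The n-1 normalization described for std can be checked by hand. The sketch below evaluates the formula term by term on an invented data vector and compares the result with the built-in std; the variable names are illustrative only.

```matlab
% Sketch: standard deviation from the n-1 formula, compared with std.
x = [2 4 4 4 5 5 7 9];                   % illustrative data
n = numel(x);
xbar = sum(x) / n;                       % sample average
s = sqrt(sum((x - xbar).^2) / (n - 1));  % normalize by n-1
d = abs(s - std(x))                      % difference from the built-in result
```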
stepwise
Purpose Interactive environment for stepwise regression.
Syntax stepwise(X,y)
stepwise(X,y,inmodel)
stepwise(X,y,inmodel,alpha)
Description stepwise(X,y) fits a regression model of y on the columns of X. It displays
three figure windows for interactively controlling the stepwise addition and
removal of model terms.
stepwise(X,y,inmodel) allows control of the terms in the original regression
model. The values of the vector inmodel are the indices of the columns of the
matrix X to include in the initial model.
stepwise(X,y,inmodel,alpha) allows control of the length of the confidence
intervals on the fitted coefficients. alpha is the significance for testing each
term in the model. By default, alpha = 1 - (1 - 0.025)^(1/p), where p is the number
of columns in X. This translates to plotted 95% simultaneous confidence
intervals (Bonferroni) for all the coefficients.
The least squares coefficient is plotted with a green filled circle. A coefficient is
not significantly different from zero if its confidence interval crosses the white
zero line. Significant model terms are plotted using solid lines. Terms not
significantly different from zero are plotted with dotted lines.
Click on the confidence interval lines to toggle the state of the model
coefficients. If the confidence interval line is green, the term is in the model. If
the confidence interval line is red, the term is not in the model.
Use the Export menu to move variables to the base workspace.
Example See "Stepwise Regression" on page 1-88.
Reference Draper, N. and H. Smith, Applied Regression Analysis, Second Edition, John
Wiley and Sons, Inc. 1981 pp. 307-312.
See Also regstats, regress, rstool
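The default alpha quoted for stepwise is easy to evaluate numerically. This sketch assumes a hypothetical model with p = 4 columns and shows that the per-term level, compounded over p terms, returns the 0.025 level from which it was derived.

```matlab
% Sketch: default per-term significance for a hypothetical p = 4.
p = 4;                          % assumed number of columns in X
alpha = 1 - (1 - 0.025)^(1/p)   % per-term significance level
joint = (1 - alpha)^p           % compounds back to 1 - 0.025
```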
surfht
Purpose Interactive contour plot.
Syntax surfht(Z)
surfht(x,y,Z)
Description surfht(Z) is an interactive contour plot of the matrix Z, treating the values in
Z as height above the plane. The x-values are the column indices of Z, while the
y-values are the row indices of Z.
surfht(x,y,Z), where x and y are vectors, specifies the x-axis and y-axis values on the
contour plot. The length of x must match the number of columns in Z, and the
length of y must match the number of rows in Z.
There are vertical and horizontal reference lines on the plot whose intersection
defines the current x-value and y-value. You can drag these dotted white
reference lines and watch the interpolated z-value (at the top of the plot)
update simultaneously. Alternatively, you can get a specific interpolated
z-value by typing the x-value and y-value into editable text fields on the x-axis
and y-axis respectively.
tabulate
Purpose Frequency table.
Syntax table = tabulate(x)
tabulate(x)
Description table = tabulate(x) takes a vector of positive integers, x, and returns a
matrix, table.
The first column of table contains the values of x. The second contains the
number of instances of this value. The last column contains the percentage of
each value.
tabulate with no output arguments displays a formatted table in the
command window.
Example tabulate([1 2 4 4 3 4])
Value Count Percent
1 1 16.67%
2 1 16.67%
3 1 16.67%
4 3 50.00%
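The three columns described above can be reproduced with base MATLAB. This is only a sketch of the counting step, not the toolbox implementation; it uses accumarray and assumes x contains positive integers, as tabulate requires.

```matlab
% Sketch: value, count, and percent columns for a vector of positive integers.
x = [1 2 4 4 3 4];
values = (1:max(x))';               % every integer from 1 to the maximum
counts = accumarray(x(:), 1);       % occurrences of each integer
percent = 100 * counts / numel(x);  % share of each value, in percent
tbl = [values counts percent]
```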
See Also pareto
tblread
Purpose Read tabular data from the file system.
Syntax [data,varnames,casenames] = tblread
[data,varnames,casenames] = tblread('filename')
[data,varnames,casenames] = tblread('filename','delimiter')
Description [data,varnames,casenames] = tblread displays the File Open dialog box for
interactive selection of the tabular data file. The file format has variable names
in the first row, case names in the first column and data starting in the (2,2)
position.
[data,varnames,casenames] = tblread(filename) allows command line
specification of the name of a file in the current directory, or the complete
pathname of any file.
[data,varnames,casenames] = tblread(filename,'delimiter') allows
specification of the field 'delimiter' in the file. Accepted values are 'tab',
'space', or 'comma'.
tblread returns the data read in three values.
Return Value   Description
data           Numeric matrix with a value for each variable-case pair.
varnames       String matrix containing the variable names in the first row.
casenames      String matrix containing the names of each case in the first column.
Example [data,varnames,casenames] = tblread('sat.dat')
data =
470 530
520 480
varnames =
Male
Female
casenames =
Verbal
Quantitative
See Also caseread, tblwrite, tdfread
tblwrite
Purpose Writes tabular data to the file system.
Syntax tblwrite(data,'varnames','casenames')
tblwrite(data,'varnames','casenames','filename')
Description tblwrite(data,'varnames','casenames') displays the File Open dialog box
for interactive specification of the tabular data output file. The file format has
variable names in the first row, case names in the first column and data
starting in the (2,2) position.
'varnames' is a string matrix containing the variable names. 'casenames' is
a string matrix containing the names of each case in the first column. data is
a numeric matrix with a value for each variable-case pair.
tblwrite(data,'varnames','casenames','filename') allows command line
specification of a file in the current directory, or the complete pathname of any
file in the string 'filename'.
Example Continuing the example from tblread:
tblwrite(data,varnames,casenames,'sattest.dat')
type sattest.dat
Male Female
Verbal 470 530
Quantitative 520 480
See Also casewrite, tblread
tcdf
Purpose Student's t cumulative distribution function (cdf).
Syntax P = tcdf(X,V)
Description P = tcdf(X,V) computes Student's t cdf at each of the values in X using the
corresponding degrees of freedom in V. Vector or matrix inputs for X and V must
be the same size. A scalar input is expanded to a constant matrix with the same
dimensions as the other inputs. The parameters in V must be positive integers.
The t cdf is
$$p = F(x \mid \nu) = \int_{-\infty}^{x} \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)} \frac{1}{\sqrt{\nu\pi}} \frac{1}{\left(1 + \frac{t^2}{\nu}\right)^{\frac{\nu+1}{2}}} \, dt$$
The result, p, is the probability that a single observation from the t distribution
with $\nu$ degrees of freedom will fall in the interval $(-\infty, x]$.
Examples Suppose 10 samples of Guinness beer have a mean alcohol content of 5.5% by
volume and the standard deviation of these samples is 0.5%. What is the
probability that the true alcohol content of Guinness beer is less than 5%?
t = (5.0 - 5.5) / 0.5;
probability = tcdf(t,10 - 1)
probability =
0.1717
See Also cdf, tinv, tpdf, trnd, tstat
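The cdf integral can be evaluated numerically as a cross-check on the Guinness example. The sketch below builds the integrand from the formula and integrates it with the base MATLAB function integral; it is an independent check under the stated formula, not a description of how tcdf computes its result.

```matlab
% Sketch: Student's t cdf at x = -1 with nu = 9 by numerical integration.
nu = 9;                                              % degrees of freedom
x = -1;                                              % upper limit of integration
c = gamma((nu + 1)/2) / (gamma(nu/2) * sqrt(nu*pi)); % normalizing constant
f = @(t) c ./ (1 + t.^2/nu).^((nu + 1)/2);           % t pdf as a function handle
p = integral(f, -Inf, x)                             % area under the pdf up to x
```

The value should agree with the 0.1717 shown in the example to the displayed precision.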
tdfread
Purpose Read file containing tab-delimited numeric and text values.
Syntax tdfread
tdfread('filename')
tdfread('filename','delimiter')
Description tdfread displays the File Open dialog box for interactive selection of the data
file. The file should consist of columns of values, separated by tabs, and with
column names in the first line of the file. Each column is read from the file and
assigned to a variable with the specified name. If all values for a column are
numeric, the variable is converted to numbers; otherwise the variable is a
string matrix. After all values are imported, tdfread displays information
about the imported values using the format of the whos command.
tdfread('filename') allows command line specification of the name of a file
in the current directory, or the complete pathname of any file.
tdfread('filename','delimiter') indicates that the character specified by
'delimiter' separates columns in the file. Accepted values are:
' ' or 'space'
'\t' or 'tab'
',' or 'comma'
';' or 'semi'
'|' or 'bar'
The default delimiter is 'tab'.
Example type sat2.dat
Test,Gender,Score
Verbal,Male,470
Verbal,Female,530
Quantitative,Male,520
Quantitative,Female,480
tdfread('sat2.dat',',')
Gender 4x6 48 char array
Score 4x1 32 double array
Test 4x12 96 char array
Grand total is 76 elements using 176 bytes
See Also tblread
tinv
Purpose Inverse of the Student's t cumulative distribution function (cdf).
Syntax X = tinv(P,V)
Description X = tinv(P,V) computes the inverse of Student's t cdf with parameter V for
the corresponding probabilities in P. Vector or matrix inputs for P and V must
be the same size. A scalar input is expanded to a constant matrix with the same
dimensions as the other inputs. The degrees of freedom in V must be positive
integers, and the values in P must lie on the interval [0 1].
The t inverse function in terms of the t cdf is
$$x = F^{-1}(p \mid \nu) = \{ x : F(x \mid \nu) = p \}$$
where
$$p = F(x \mid \nu) = \int_{-\infty}^{x} \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)} \frac{1}{\sqrt{\nu\pi}} \frac{1}{\left(1 + \frac{t^2}{\nu}\right)^{\frac{\nu+1}{2}}} \, dt$$
The result, x, is the solution of the cdf integral with parameter $\nu$, where you
supply the desired probability p.
Examples What is the 99th percentile of the t distribution for one to six degrees of
freedom?
percentile = tinv(0.99,1:6)
percentile =
31.8205 6.9646 4.5407 3.7469 3.3649 3.1427
See Also icdf, tcdf, tpdf, trnd, tstat
tpdf
Purpose Student's t probability density function (pdf).
Syntax Y = tpdf(X,V)
Description Y = tpdf(X,V) computes Student's t pdf at each of the values in X using the
corresponding degrees of freedom in V. Vector or matrix inputs for X and V must
be the same size. A scalar input is expanded to a constant matrix with the same
dimensions as the other inputs. The degrees of freedom in V must be positive
integers.
Student's t pdf is
$$y = f(x \mid \nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)} \frac{1}{\sqrt{\nu\pi}} \frac{1}{\left(1 + \frac{x^2}{\nu}\right)^{\frac{\nu+1}{2}}}$$
Examples The mode of the t distribution is at x = 0. This example shows that the value of
the function at the mode is an increasing function of the degrees of freedom.
tpdf(0,1:6)
ans =
0.3183 0.3536 0.3676 0.3750 0.3796 0.3827
The t distribution converges to the standard normal distribution as the degrees
of freedom approaches infinity. How good is the approximation for v = 30?
difference = tpdf(-2.5:2.5,30) - normpdf(-2.5:2.5)
difference =
0.0035 -0.0006 -0.0042 -0.0042 -0.0006 0.0035
See Also pdf, tcdf, tinv, trnd, tstat
trimmean
Purpose Mean of a sample of data excluding extreme values.
Syntax m = trimmean(X,percent)
Description m = trimmean(X,percent) calculates the mean of a sample X excluding the
highest and lowest percent/2 of the observations. The trimmed mean is a
robust estimate of the location of a sample. If there are outliers in the data, the
trimmed mean is a more representative estimate of the center of the body of the
data. If the data is all from the same probability distribution, then the trimmed
mean is less efficient than the sample average as an estimator of the location
of the data.
Examples This example shows a Monte Carlo simulation of the efficiency of the 10%
trimmed mean relative to the sample average for normal data.
x = normrnd(0,1,100,100);
m = mean(x);
trim = trimmean(x,10);
sm = std(m);
strim = std(trim);
efficiency = (sm/strim).^2
efficiency =
0.9702
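The trimming step itself can be sketched in a few lines of base MATLAB. The rounding rule used to turn percent/2 into a whole number of observations is an assumption made for this illustration and may differ from what trimmean does internally.

```matlab
% Sketch: trimmed mean of a vector with one outlier.
x = [1 2 3 4 5 6 7 8 9 100];       % illustrative data; 100 is an outlier
percent = 20;                      % trim 10% from each end
n = numel(x);
k = round(n * (percent/100) / 2);  % observations dropped per end (assumed rule)
xs = sort(x);
m = mean(xs(k+1:n-k))              % mean of the remaining observations
```

Here the ordinary mean is 14.5, while the trimmed mean stays near the center of the bulk of the data.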
See Also mean, median, geomean, harmmean
trnd
Purpose Random numbers from Student's t distribution.
Syntax R = trnd(V)
R = trnd(V,m)
R = trnd(V,m,n)
Description R = trnd(V) generates random numbers from Student's t distribution with V
degrees of freedom. The size of R is the size of V.
R = trnd(V,m) generates random numbers from Student's t distribution with
V degrees of freedom, where m is a 1-by-2 vector that contains the row and
column dimensions of R.
R = trnd(V,m,n) generates random numbers from Student's t distribution
with V degrees of freedom, where scalars m and n are the row and column
dimensions of R.
Examples noisy = trnd(ones(1,6))
noisy =
19.7250 0.3488 0.2843 0.4034 0.4816 -2.4190
numbers = trnd(1:6,[1 6])
numbers =
-1.9500 -0.9611 -0.9038 0.0754 0.9820 1.0115
numbers = trnd(3,2,6)
numbers =
-0.3177 -0.0812 -0.6627 0.1905 -1.5585 -0.0433
0.2536 0.5502 0.8646 0.8060 -0.5216 0.0891
See Also tcdf, tinv, tpdf, tstat
tstat
Purpose Mean and variance for the Student's t distribution.
Syntax [M,V] = tstat(NU)
Description [M,V] = tstat(NU) returns the mean and variance for Student's t distribution
with parameters specified by NU. M and V are the same size as NU.
The mean of the Student's t distribution with parameter $\nu$ is zero for values of
$\nu$ greater than 1. If $\nu$ is one, the mean does not exist. The variance for values of
$\nu$ greater than 2 is $\frac{\nu}{\nu - 2}$.
Examples Find the mean and variance for 1 to 30 degrees of freedom.
[m,v] = tstat(reshape(1:30,6,5))
m =
NaN 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
v =
NaN 1.4000 1.1818 1.1176 1.0870
NaN 1.3333 1.1667 1.1111 1.0833
3.0000 1.2857 1.1538 1.1053 1.0800
2.0000 1.2500 1.1429 1.1000 1.0769
1.6667 1.2222 1.1333 1.0952 1.0741
1.5000 1.2000 1.1250 1.0909 1.0714
Note that the variance does not exist for one and two degrees of freedom.
See Also tcdf, tinv, tpdf, trnd
ttest
Purpose Hypothesis testing for a single sample mean.
Syntax h = ttest(x,m)
h = ttest(x,m,alpha)
[h,sig,ci] = ttest(x,m,alpha,tail)
Description h = ttest(x,m) performs a t-test at significance level 0.05 to determine
whether a sample from a normal distribution (in x) could have mean m when
the standard deviation is unknown.
h = ttest(x,m,alpha) gives control of the significance level, alpha. For
example, if alpha = 0.01 and the result, h, is 1, you can reject the null
hypothesis at the significance level 0.01. If h is 0, you cannot reject the null
hypothesis at the alpha level of significance.
[h,sig,ci] = ttest(x,m,alpha,tail) allows specification of one- or
two-tailed tests. tail is a flag that specifies one of three alternative
hypotheses:
tail = 0 specifies the alternative that the mean of x is not equal to m (default)
tail = 1 specifies the alternative that the mean of x is greater than m
tail = -1 specifies the alternative that the mean of x is less than m
Output sig is the p-value associated with the T-statistic
$$T = \frac{\bar{x} - m}{s / \sqrt{n}}$$
where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, and $n$ is the
number of observations in the sample. sig is the probability that the observed
value of T could be as large or larger by chance under the null hypothesis that
the mean of x is equal to m.
ci is a 100(1-alpha)% confidence interval for the true mean.
Example This example generates 100 normal random numbers with theoretical mean
zero and standard deviation one. The observed mean and standard deviation
are different from their theoretical values, of course. We test the hypothesis
that there is no true difference.
Normal random number generator test.
x = normrnd(0,1,1,100);
[h,sig,ci] = ttest(x,0)
h =
0
sig =
0.4474
ci =
-0.1165 0.2620
The result h = 0 means that we cannot reject the null hypothesis. The
p-value, sig, is 0.4474, which means that by chance we would have
observed values of T more extreme than the one in this example in 45 of 100
similar experiments. A 95% confidence interval on the mean is
[-0.1165 0.2620], which includes the theoretical (and hypothesized) mean of
zero.
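The T-statistic that ttest evaluates can be formed directly from the sample mean, the sample standard deviation, and the sample size. The sketch below does this with base MATLAB functions on an invented data vector; converting T to a p-value would additionally require the t cdf and is not shown.

```matlab
% Sketch: one-sample T-statistic for the hypothesized mean m.
x = [2 4 4 4 5 5 7 9];                   % illustrative sample
m = 4;                                   % hypothesized mean
n = numel(x);
T = (mean(x) - m) / (std(x) / sqrt(n))   % std normalizes by n-1
```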
ttest2
Purpose Hypothesis testing for the difference in means of two samples.
Syntax [h,significance,ci] = ttest2(x,y)
[h,significance,ci] = ttest2(x,y,alpha)
[h,significance,ci] = ttest2(x,y,alpha,tail)
Description h = ttest2(x,y) performs a t-test to determine whether two samples from a
normal distribution (in x and y) could have the same mean when the standard
deviations are unknown but assumed equal.
The result, h, is 1 if you can reject the null hypothesis at the 0.05 significance
level and 0 otherwise.
The significance is the p-value associated with the T-statistic
$$T = \frac{\bar{x} - \bar{y}}{s \sqrt{\frac{1}{n} + \frac{1}{m}}}$$
where s is the pooled sample standard deviation and n and m are the numbers
of observations in the x and y samples. significance is the probability that the
observed value of T could be as large or larger by chance under the null
hypothesis that the mean of x is equal to the mean of y.
ci is a 95% confidence interval for the true difference in means.
[h,significance,ci] = ttest2(x,y,alpha) gives control of the significance
level alpha. For example, if alpha = 0.01 and the result, h, is 1, you can reject
the null hypothesis at the significance level 0.01. ci in this case is a
100(1-alpha)% confidence interval for the true difference in means.
ttest2(x,y,alpha,tail) allows specification of one- or two-tailed tests,
where tail is a flag that specifies one of three alternative hypotheses:
tail = 0 specifies the alternative $\mu_x \neq \mu_y$ (default)
tail = 1 specifies the alternative $\mu_x > \mu_y$
tail = -1 specifies the alternative $\mu_x < \mu_y$
Examples This example generates 100 normal random numbers with theoretical mean 0
and standard deviation 1. We then generate 100 more normal random numbers
with theoretical mean 1/2 and standard deviation 1. The observed means and
standard deviations are different from their theoretical values, of course. We
test the hypothesis that there is no true difference between the two means.
Notice that the true difference is only one half of the standard deviation of the
individual observations, so we are trying to detect a signal that is only one half
the size of the inherent noise in the process.
x = normrnd(0,1,100,1);
y = normrnd(0.5,1,100,1);
[h,significance,ci] = ttest2(x,y)
h =
1
significance =
0.0017
ci =
-0.7352 -0.1720
The result h = 1 means that we can reject the null hypothesis. The
significance is 0.0017, which means that by chance we would have observed
values of T more extreme than the one in this example in only 17 of 10,000
similar experiments. A 95% confidence interval on the difference in means is
[-0.7352 -0.1720], which includes the theoretical difference of -0.5 and
excludes the hypothesized difference of zero.
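The two-sample T-statistic can be assembled by hand from the pooled standard deviation. The sketch below assumes the usual pooled estimate, in which the two sample variances are weighted by their degrees of freedom; the data vectors are invented for the illustration.

```matlab
% Sketch: two-sample T-statistic with a pooled standard deviation.
x = [1 2 3 4 5];                  % illustrative first sample
y = [2 4 6 8];                    % illustrative second sample
n = numel(x);
m = numel(y);
% Pooled standard deviation (assumed form: variances weighted by n-1 and m-1).
s = sqrt(((n - 1)*var(x) + (m - 1)*var(y)) / (n + m - 2));
T = (mean(x) - mean(y)) / (s * sqrt(1/n + 1/m))
```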
unidcdf
Purpose Discrete uniform cumulative distribution function (cdf).
Syntax P = unidcdf(X,N)
Description P = unidcdf(X,N) computes the discrete uniform cdf at each of the values in X
using the corresponding parameters in N. Vector or matrix inputs for X and N
must have the same size. A scalar input is expanded to a constant matrix with
the same dimensions as the other inputs. The maximum observable values in
N must be positive integers.
The discrete uniform cdf is
$$p = F(x \mid N) = \frac{\mathrm{floor}(x)}{N} I_{(1,\ldots,N)}(x)$$
The result, p, is the probability that a single observation from the discrete
uniform distribution with maximum N will be a positive integer less than or
equal to x. The values x do not need to be integers.
Examples What is the probability of drawing a number 20 or less from a hat with the
numbers from 1 to 50 inside?
probability = unidcdf(20,50)
probability =
0.4000
See Also cdf, unidinv, unidpdf, unidrnd, unidstat
unidinv
Purpose Inverse of the discrete uniform cumulative distribution function.
Syntax X = unidinv(P,N)
Description X = unidinv(P,N) returns the smallest positive integer X such that the
discrete uniform cdf evaluated at X is equal to or exceeds P. You can think of P
as the probability of drawing a number as large as X out of a hat with the
numbers 1 through N inside.
Vector or matrix inputs for N and P must have the same size, which is also the
size of X. A scalar input for N or P is expanded to a constant matrix with the
same dimensions as the other input. The values in P must lie on the interval
[0 1] and the values in N must be positive integers.
Examples x = unidinv(0.7,20)
x =
14
y = unidinv(0.7 + eps,20)
y =
15
A small change in the first parameter produces a large jump in output. The cdf
and its inverse are both step functions. The example shows what happens at a
step.
See Also icdf, unidcdf, unidpdf, unidrnd, unidstat
unidpdf
Purpose Discrete uniform probability density function (pdf).
Syntax Y = unidpdf(X,N)
Description unidpdf(X,N) computes the discrete uniform pdf at each of the values in X
using the corresponding parameters in N. Vector or matrix inputs for X and N
must have the same size. A scalar input is expanded to a constant matrix with
the same dimensions as the other inputs. The parameters in N must be positive
integers.
The discrete uniform pdf is
$$y = f(x \mid N) = \frac{1}{N} I_{(1,\ldots,N)}(x)$$
You can think of y as the probability of observing any one number between 1
and n.
Examples For fixed n, the uniform discrete pdf is a constant.
y = unidpdf(1:6,10)
y =
0.1000 0.1000 0.1000 0.1000 0.1000 0.1000
Now fix x, and vary n.
likelihood = unidpdf(5,4:9)
likelihood =
0 0.2000 0.1667 0.1429 0.1250 0.1111
See Also pdf, unidcdf, unidinv, unidrnd, unidstat
unidrnd
Purpose Random numbers from the discrete uniform distribution.
Syntax R = unidrnd(N)
R = unidrnd(N,mm)
R = unidrnd(N,mm,nn)
Description The discrete uniform distribution arises from experiments equivalent to
drawing a number from one to N out of a hat.
R = unidrnd(N) generates discrete uniform random numbers with
maximum N. The parameters in N must be positive integers. The size of R is the
size of N.
R = unidrnd(N,mm) generates discrete uniform random numbers with
maximum N, where mm is a 1-by-2 vector that contains the row and column
dimensions of R.
R = unidrnd(N,mm,nn) generates discrete uniform random numbers with
maximum N, where scalars mm and nn are the row and column dimensions of R.
Examples In the Massachusetts lottery, a player chooses a four-digit number. Generate
random numbers for Monday through Saturday.
numbers = unidrnd(10000,1,6) - 1
numbers =
2189 470 6788 6792 9346
See Also unidcdf, unidinv, unidpdf, unidstat
unidstat
Purpose Mean and variance for the discrete uniform distribution.
Syntax [M,V] = unidstat(N)
Description [M,V] = unidstat(N) returns the mean and variance for the discrete uniform
distribution with parameter N.
The mean of the discrete uniform distribution with parameter N is $\frac{N+1}{2}$.
The variance is $\frac{N^2 - 1}{12}$.
Examples [m,v] = unidstat(1:6)
m =
1.0000 1.5000 2.0000 2.5000 3.0000 3.5000
v =
0 0.2500 0.6667 1.2500 2.0000 2.9167
See Also unidcdf, unidinv, unidpdf, unidrnd
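The two closed-form expressions can be confirmed by treating the integers 1 through N as an equally weighted population. The sketch below uses base MATLAB only and picks N = 6 (a fair die) as an illustrative value.

```matlab
% Sketch: check the discrete uniform mean and variance formulas for N = 6.
N = 6;
x = 1:N;                  % every outcome, each with probability 1/N
m = mean(x);              % population mean
v = var(x, 1);            % population variance (normalized by N)
mFormula = (N + 1) / 2;
vFormula = (N^2 - 1) / 12;
[m mFormula; v vFormula]  % each row should show two equal values
```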
unifcdf
Purpose Continuous uniform cumulative distribution function (cdf).
Syntax P = unifcdf(X,A,B)
Description P = unifcdf(X,A,B) computes the uniform cdf at each of the values in X using
the corresponding parameters in A and B (the minimum and maximum values,
respectively). Vector or matrix inputs for X, A, and B must all have the same
size. A scalar input is expanded to a constant matrix with the same dimensions
as the other inputs.
The uniform cdf is
$$p = F(x \mid a,b) = \frac{x - a}{b - a} I_{[a,b]}(x)$$
The standard uniform distribution has A = 0 and B = 1.
Examples What is the probability that an observation from a standard uniform
distribution will be less than 0.75?
probability = unifcdf(0.75)
probability =
0.7500
What is the probability that an observation from a uniform distribution with
a = -1 and b = 1 will be less than 0.75?
probability = unifcdf(0.75,-1,1)
probability =
0.8750
See Also cdf, unifinv, unifit, unifpdf, unifrnd, unifstat
unifinv
Purpose Inverse continuous uniform cumulative distribution function (cdf).
Syntax X = unifinv(P,A,B)
Description X = unifinv(P,A,B) computes the inverse of the uniform cdf with parameters
A and B (the minimum and maximum values, respectively) at the corresponding
probabilities in P. Vector or matrix inputs for P, A, and B must all have the same
size. A scalar input is expanded to a constant matrix with the same dimensions
as the other inputs.
The inverse of the uniform cdf is
$$x = F^{-1}(p \mid a,b) = a + p(b - a) I_{[0,1]}(p)$$
Examples What is the median of the standard uniform distribution?
median_value = unifinv(0.5)
median_value =
0.5000
What is the 99th percentile of the uniform distribution between -1 and 1?
percentile = unifinv(0.99,-1,1)
percentile =
0.9800
See Also icdf, unifcdf, unifit, unifpdf, unifrnd, unifstat
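Because the uniform cdf is linear on its support, the inverse has a closed form that is simple to evaluate by hand. The sketch below reproduces the 99th percentile example for the interval from -1 to 1 and then maps the result back through the cdf formula.

```matlab
% Sketch: inverse uniform cdf and a round trip through the cdf.
a = -1;                     % minimum value
b = 1;                      % maximum value
p = 0.99;                   % desired probability
x = a + p*(b - a)           % 99th percentile
pback = (x - a)/(b - a)     % cdf evaluated at x recovers p
```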
unifit
Purpose Parameter estimates for uniformly distributed data.
Syntax [ahat,bhat] = unifit(X)
[ahat,bhat,ACI,BCI] = unifit(X)
[ahat,bhat,ACI,BCI] = unifit(X,alpha)
Description [ahat,bhat] = unifit(X) returns the maximum likelihood estimates (MLEs)
of the parameters of the uniform distribution given the data in X.
[ahat,bhat,ACI,BCI] = unifit(X) also returns 95% confidence intervals,
ACI and BCI, which are matrices with two rows. The first row contains the
lower bound of the interval for each column of the matrix X. The second row
contains the upper bound of the interval.
[ahat,bhat,ACI,BCI] = unifit(X,alpha) allows control of the confidence
level alpha. For example, if alpha = 0.01 then ACI and BCI are 99% confidence
intervals.
Example r = unifrnd(10,12,100,2);
[ahat,bhat,aci,bci] = unifit(r)
ahat =
10.0154 10.0060
bhat =
11.9989 11.9743
aci =
9.9551 9.9461
10.0154 10.0060
bci =
11.9989 11.9743
12.0592 12.0341
See Also betafit, binofit, expfit, gamfit, normfit, poissfit, unifcdf, unifinv,
unifpdf, unifrnd, unifstat, weibfit
unifpdf
Purpose Continuous uniform probability density function (pdf).
Syntax Y = unifpdf(X,A,B)
Description Y = unifpdf(X,A,B) computes the continuous uniform pdf at each of the
values in X using the corresponding parameters in A and B. Vector or matrix
inputs for X, A, and B must all have the same size. A scalar input is expanded
to a constant matrix with the same dimensions as the other inputs. The
parameters in B must be greater than those in A.
The continuous uniform distribution pdf is
$$y = f(x \mid a,b) = \frac{1}{b - a} I_{[a,b]}(x)$$
Examples For fixed a and b, the uniform pdf is constant.
x = 0.1:0.1:0.6;
y = unifpdf(x)
y =
1 1 1 1 1 1
What if x is not between a and b?
y = unifpdf(-1,0,1)
y =
0
See Also pdf, unifcdf, unifinv, unifrnd, unifstat
unifrnd
Purpose Random numbers from the continuous uniform distribution.
Syntax R = unifrnd(A,B)
R = unifrnd(A,B,m)
R = unifrnd(A,B,m,n)
Description R = unifrnd(A,B) generates uniform random numbers with parameters A
and B.
R = unifrnd(A,B,m) generates uniform random numbers with parameters A
and B, where m is a 1-by-2 vector that contains the row and column dimensions
of R.
R = unifrnd(A,B,m,n) generates uniform random numbers with parameters
A and B, where scalars m and n are the row and column dimensions of R.
Examples random = unifrnd(0,1:6)
random =
0.2190 0.0941 2.0366 2.7172 4.6735 2.3010
random = unifrnd(0,1:6,[1 6])
random =
0.5194 1.6619 0.1037 0.2138 2.6485 4.0269
random = unifrnd(0,1,2,3)
random =
0.0077 0.0668 0.6868
0.3834 0.4175 0.5890
See Also unifcdf, unifinv, unifpdf, unifstat
unifstat
Purpose Mean and variance for the continuous uniform distribution.
Syntax [M,V] = unifstat(A,B)
Description [M,V] = unifstat(A,B) returns the mean and variance for the continuous
uniform distribution with parameters specified by A and B. Vector or matrix
inputs for A and B must have the same size, which is also the size of M and V. A
scalar input for A or B is expanded to a constant matrix with the same
dimensions as the other input.
The mean of the continuous uniform distribution with parameters a and b is
$\frac{a + b}{2}$, and the variance is $\frac{(b - a)^2}{12}$.
Examples a = 1:6;
b = 2.*a;
[m,v] = unifstat(a,b)
m =
1.5000 3.0000 4.5000 6.0000 7.5000 9.0000
v =
0.0833 0.3333 0.7500 1.3333 2.0833 3.0000
See Also unifcdf, unifinv, unifpdf, unifrnd
var
Purpose Variance of a sample.
Syntax y = var(X)
y = var(X,1)
y = var(X,w)
Description y = var(X) computes the variance of the data in X. For vectors, var(x) is the
variance of the elements in x. For matrices, var(X) is a row vector containing
the variance of each column of X.
y = var(x) normalizes by n-1, where n is the sequence length. For normally
distributed data, this makes var(x) the minimum variance unbiased estimator
(MVUE) of $\sigma^2$ (the second parameter).
y = var(x,1) normalizes by n and yields the second moment of the sample
data about its mean (moment of inertia).
y = var(X,w) computes the variance using the vector of positive weights w.
The number of elements in w must equal the number of rows in the matrix X.
For vector x, w and x must match in length.
var supports both common definitions of variance. Let SS be the sum of
the squared deviations of the elements of a vector x from their mean. Then,
var(x) = SS/(n-1) is the MVUE, and var(x,1) = SS/n is the maximum
likelihood estimator (MLE) of $\sigma^2$.
Examples x = [-1 1];
w = [1 3];
v1 = var(x)
v1 =
2
v2 = var(x,1)
v2 =
1
v3 = var(x,w)
v3 =
0.7500
See Also cov, std
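The weighted result in the example (v3 = 0.7500) can be reproduced by hand. The sketch below assumes that the weights are first normalized to sum to one and then applied to both the mean and the squared deviations; this assumption is consistent with the example output but is not stated in the text above.

```matlab
% Sketch: weighted variance of x = [-1 1] with weights w = [1 3].
x = [-1 1];
w = [1 3];
wn = w / sum(w);              % normalize the weights (assumed convention)
mu = sum(wn .* x);            % weighted mean
v3 = sum(wn .* (x - mu).^2)   % weighted second moment about the mean
```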
weibcdf
Purpose Weibull cumulative distribution function (cdf).
Syntax P = weibcdf(X,A,B)
Description P = weibcdf(X,A,B) computes the Weibull cdf at each of the values in X using
the corresponding parameters in A and B. Vector or matrix inputs for X, A, and
B must all have the same size. A scalar input is expanded to a constant matrix
with the same dimensions as the other inputs. The parameters in A and B must
be positive.
The Weibull cdf is
$$p = F(x \mid a,b) = \int_0^x a b t^{b-1} e^{-a t^b} \, dt = 1 - e^{-a x^b} I_{(0,\infty)}(x)$$
Examples What is the probability that a value from a Weibull distribution with
parameters a = 0.15 and b = 0.24 is less than 500?
probability = weibcdf(500,0.15,0.24)
probability =
0.4865
How sensitive is this result to small changes in the parameters?
[A,B] = meshgrid(0.1:0.05:0.2,0.2:0.05:0.3);
probability = weibcdf(500,A,B)
probability =
0.2929 0.4054 0.5000
0.3768 0.5080 0.6116
0.4754 0.6201 0.7248
See Also cdf, weibfit, weibinv, weiblike, weibpdf, weibplot, weibrnd, weibstat
weibfit
Purpose Parameter estimates and confidence intervals for Weibull data.
Syntax phat = weibfit(x)
[phat,pci] = weibfit(x)
[phat,pci] = weibfit(x,alpha)
Description phat = weibfit(x) returns the maximum likelihood estimates, phat, of the
parameters of the Weibull distribution given the values in vector x, which must
be positive. phat is a two-element row vector: phat(1) estimates the Weibull
parameter a, and phat(2) estimates the Weibull parameter b in the pdf
$$y = f(x \mid a,b) = a b x^{b-1} e^{-a x^b} I_{(0,\infty)}(x)$$
[phat,pci] = weibfit(x) also returns 95% confidence intervals in the
two-row matrix pci. The first row contains the lower bound of the confidence
interval, and the second row contains the upper bound. The columns of pci
correspond to the columns of phat.
[phat,pci] = weibfit(x,alpha) allows control over the confidence interval
returned, 100(1-alpha)%.
Example r = weibrnd(0.5,0.8,100,1);
[phat,pci] = weibfit(r)
phat =
0.4746 0.7832
pci =
0.3851 0.6367
0.5641 0.9298
See Also betafit, binofit, expfit, gamfit, normfit, poissfit, unifit, weibcdf,
weibinv, weiblike, weibpdf, weibplot, weibrnd, weibstat
weibinv
Purpose Inverse of the Weibull cumulative distribution function.
Syntax X = weibinv(P,A,B)
Description X = weibinv(P,A,B) computes the inverse of the Weibull cdf with parameters
A and B for the corresponding probabilities in P. Vector or matrix inputs for P,
A, and B must all have the same size. A scalar input is expanded to a constant
matrix with the same dimensions as the other inputs. The parameters in A and
B must be positive.
The inverse of the Weibull cdf is
$$x = F^{-1}(p \mid a,b) = \left[\frac{1}{a}\,\ln\!\left(\frac{1}{1-p}\right)\right]^{1/b} I_{[0,1]}(p)$$
Examples A batch of light bulbs has lifetimes (in hours) that follow a Weibull distribution with
parameters a = 0.15 and b = 0.24. What is the median lifetime of the bulbs?
life = weibinv(0.5,0.15,0.24)
life =
588.4721
What is the 90th percentile?
life = weibinv(0.9,0.15,0.24)
life =
8.7536e+04
See Also icdf, weibcdf, weibfit, weiblike, weibpdf, weibplot, weibrnd, weibstat
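The inverse cdf formula can be evaluated directly in base MATLAB, which gives a quick way to reproduce the light bulb example and to confirm that the formula really inverts the cdf. This is a minimal sketch; the handle names winv and wcdf are hypothetical and are not toolbox functions.

```matlab
% Inverse cdf and cdf written from the formulas (valid for 0 <= p < 1, x > 0).
winv = @(p,a,b) ((1 ./ a) .* log(1 ./ (1 - p))).^(1 ./ b);
wcdf = @(x,a,b) 1 - exp(-a .* x.^b);

a = 0.15;
b = 0.24;

median_life = winv(0.5,a,b);   % median lifetime in hours
p90_life    = winv(0.9,a,b);   % 90th percentile in hours

% Round trip: applying the cdf to the quantiles recovers the probabilities.
p = [0.1 0.5 0.9];
roundtrip = wcdf(winv(p,a,b),a,b);

disp([median_life p90_life])
disp(roundtrip)
```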
weiblike
Purpose Weibull negative log-likelihood function.
Syntax logL = weiblike(params,data)
[logL,avar] = weiblike(params,data)
Description logL = weiblike(params,data) returns the Weibull negative log-likelihood with
parameters params(1) = a and params(2) = b given the data $x_i$.
[logL,avar] = weiblike(params,data) also returns avar, which is the
asymptotic variance-covariance matrix of the parameter estimates if the
values in params are the maximum likelihood estimates.
The Weibull negative log-likelihood is
$$-\log L = -\log \prod_{i=1}^{n} f(a,b \mid x_i) = -\sum_{i=1}^{n} \log f(a,b \mid x_i)$$
weiblike is a utility function for maximum likelihood estimation.
Example This example continues the example from weibfit.
r = weibrnd(0.5,0.8,100,1);
[logL,info] = weiblike([0.4746 0.7832],r)
logL =
203.8216
info =
0.0021 0.0022
0.0022 0.0056
Reference Patel, J. K., C. H. Kapadia, and D. B. Owen, Handbook of Statistical
Distributions, Marcel Dekker, 1976.
See Also betalike, gamlike, mle, weibcdf, weibfit, weibinv, weibpdf, weibplot,
weibrnd, weibstat
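To make the role of the negative log-likelihood in maximum likelihood estimation concrete, the sketch below writes it out from the Weibull pdf and minimizes it with the base MATLAB function fminsearch. It is an illustration under stated assumptions, not the algorithm used by weibfit or weiblike: the data are hypothetical (exact Weibull quantiles on an evenly spaced probability grid, so the result is repeatable), and optimizing over log(a) and log(b) is our choice to keep both parameters positive.

```matlab
% Hypothetical data: exact quantiles of a Weibull with a = 0.5, b = 0.8.
a_true = 0.5;
b_true = 0.8;
n = 200;
p = ((1:n)' - 0.5) / n;
x = (-log(1 - p) / a_true).^(1 / b_true);

% Negative log-likelihood, using
% log f(x|a,b) = log(a) + log(b) + (b-1)*log(x) - a*x^b.
nll = @(params) -sum(log(params(1)) + log(params(2)) ...
    + (params(2) - 1) .* log(x) - params(1) .* x.^params(2));

% Minimize over t = log([a b]); start from a = 1, b = 1.
theta = fminsearch(@(t) nll(exp(t)), [0 0]);
phat = exp(theta);

disp(phat)                          % estimates of [a b]
disp([nll(phat) nll([a_true b_true])])
```

The estimates should land close to the generating values 0.5 and 0.8, and the negative log-likelihood at phat should not exceed its value at the generating parameters by more than the optimizer tolerance.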
weibpdf
Purpose Weibull probability density function (pdf).
Syntax Y = weibpdf(X,A,B)
Description Y = weibpdf(X,A,B) computes the Weibull pdf at each of the values in X using
the corresponding parameters in A and B. Vector or matrix inputs for X, A, and
B must all have the same size. A scalar input is expanded to a constant matrix
with the same dimensions as the other inputs. The parameters in A and B must
all be positive.
The Weibull pdf is
$$y = f(x \mid a,b) = a b\, x^{b-1} e^{-a x^{b}} I_{(0,\infty)}(x)$$
Some references refer to the Weibull distribution with a single parameter. This
corresponds to weibpdf with A = 1.
Examples The exponential distribution is a special case of the Weibull distribution.
lambda = 1:6;
y = weibpdf(0.1:0.1:0.6,lambda,1)
y =
0.9048 1.3406 1.2197 0.8076 0.4104 0.1639
y1 = exppdf(0.1:0.1:0.6,1./lambda)
y1 =
0.9048 1.3406 1.2197 0.8076 0.4104 0.1639
Reference Devroye, L., Non-Uniform Random Variate Generation. Springer-Verlag. New
York, 1986.
See Also pdf, weibcdf, weibfit, weibinv, weiblike, weibplot, weibrnd, weibstat
weibplot
Purpose Weibull probability plot.
Syntax weibplot(X)
h = weibplot(X)
Description weibplot(X) displays a Weibull probability plot of the data in X. If X is a
matrix, weibplot displays a plot for each column.
h = weibplot(X) returns handles to the plotted lines.
The purpose of a Weibull probability plot is to graphically assess whether the
data in X could come from a Weibull distribution. If the data are Weibull, the
plot will be linear. Other distribution types may introduce curvature in the
plot.
Example r = weibrnd(1.2,1.5,50,1);
weibplot(r)
See Also normplot, weibcdf, weibfit, weibinv, weiblike, weibpdf, weibrnd, weibstat
[Figure: Weibull Probability Plot. x-axis: Data (log scale, 10^-1 to 10^0); y-axis: Probability (0.01 to 0.99).]
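The reason Weibull data fall on a straight line follows from the cdf p = 1 - exp(-a*x^b): taking logarithms twice gives log(-log(1-p)) = log(a) + b*log(x), which is linear in log(x) with slope b. The sketch below checks this relationship in base MATLAB. The plotting positions (i - 0.5)/n are an assumption made for illustration, not necessarily the ones weibplot uses, and exact quantiles stand in for sorted data so that the result is repeatable.

```matlab
a = 1.2;
b = 1.5;
n = 50;

p = ((1:n)' - 0.5) / n;              % assumed plotting positions
x = (-log(1 - p) / a).^(1 / b);      % exact Weibull quantiles (stand-in for sorted data)

u = log(x);                          % horizontal coordinate on the transformed scale
v = log(-log(1 - p));                % vertical coordinate on the transformed scale

coef = polyfit(u, v, 1);             % straight-line fit: v = coef(1)*u + coef(2)
b_line = coef(1);                    % slope recovers b
a_line = exp(coef(2));               % intercept recovers log(a)

disp([a_line b_line])
```

With real data the points scatter around the line, and systematic curvature suggests a different distribution.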
weibrnd
Purpose Random numbers from the Weibull distribution.
Syntax R = weibrnd(A,B)
R = weibrnd(A,B,m)
R = weibrnd(A,B,m,n)
Description R = weibrnd(A,B) generates Weibull random numbers with parameters A
and B. The size of R is the common size of A and B.
R = weibrnd(A,B,m) generates Weibull random numbers with parameters A
and B, where m is a 1-by-2 vector that contains the row and column dimensions
of R.
R = weibrnd(A,B,m,n) generates Weibull random numbers with parameters
A and B, where scalars m and n are the row and column dimensions of R.
Devroye refers to the Weibull distribution with a single parameter; this is
weibrnd with A = 1.
Examples n1 = weibrnd(0.5:0.5:2,0.5:0.5:2)
n1 =
0.0093 1.5189 0.8308 0.7541
n2 = weibrnd(1/2,1/2,[1 6])
n2 =
29.7822 0.9359 2.1477 12.6402 0.0050 0.0121
Reference Devroye, L., Non-Uniform Random Variate Generation. Springer-Verlag. New
York, 1986.
See Also weibcdf, weibfit, weibinv, weiblike, weibpdf, weibplot, weibstat
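One standard way to generate Weibull random numbers is the inverse transform method: draw U uniformly on (0,1) and apply the inverse cdf. Whether weibrnd works this way internally is an assumption; the sketch below only shows that the method produces values with the expected behavior, using base MATLAB functions and the parameterization of the weibcdf formula.

```matlab
a = 0.5;
b = 0.5;
m = 1;
n = 6;

% Inverse transform: x = (-log(1-U)/a)^(1/b). Because 1-U and U have the
% same distribution, log(U) can be used in place of log(1-U).
u = rand(m, n);
r = (-log(u) ./ a).^(1 ./ b);
disp(r)

% Sanity check on a large sample: the sample median should be close to the
% theoretical median (log(2)/a)^(1/b).
big = (-log(rand(1e5, 1)) ./ a).^(1 ./ b);
med_sample = median(big);
med_theory = (log(2) / a)^(1 / b);
disp([med_sample med_theory])
```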
weibstat
Purpose Mean and variance for the Weibull distribution.
Syntax [M,V] = weibstat(A,B)
Description [M,V] = weibstat(A,B) returns the mean and variance for the Weibull
distribution with parameters specified by A and B. Vector or matrix inputs for
A and B must have the same size, which is also the size of M and V. A scalar input
for A or B is expanded to a constant matrix with the same dimensions as the
other input.
The mean of the Weibull distribution with parameters a and b is
$$a^{-1/b}\,\Gamma\!\left(1 + b^{-1}\right)$$
and the variance is
$$a^{-2/b}\left[\Gamma\!\left(1 + 2b^{-1}\right) - \Gamma^{2}\!\left(1 + b^{-1}\right)\right]$$
Examples [m,v] = weibstat(1:4,1:4)
m =
1.0000 0.6267 0.6192 0.6409
v =
1.0000 0.1073 0.0506 0.0323
weibstat(0.5,0.7)
ans =
3.4073
See Also weibcdf, weibfit, weibinv, weiblike, weibpdf, weibplot, weibrnd
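The mean and variance formulas can be evaluated with the base MATLAB gamma function, which reproduces the numbers in the examples. The handle names wmean and wvar are hypothetical; this is a sketch of the formulas, not the toolbox source.

```matlab
% Mean and variance written from the formulas (element-wise in a and b).
wmean = @(a,b) a.^(-1 ./ b) .* gamma(1 + 1 ./ b);
wvar  = @(a,b) a.^(-2 ./ b) .* (gamma(1 + 2 ./ b) - gamma(1 + 1 ./ b).^2);

m = wmean(1:4, 1:4);
v = wvar(1:4, 1:4);
m1 = wmean(0.5, 0.7);

disp(m)
disp(v)
disp(m1)
```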
x2fx
Purpose Transform a factor settings matrix to a design matrix.
Syntax D = x2fx(X)
D = x2fx(X,'model')
Description D = x2fx(X) transforms a matrix of system inputs, X, to a design matrix for a
linear additive model with a constant term.
D = x2fx(X,'model') allows control of the order of the regression
model. 'model' can be one of these strings:
'interaction' includes constant, linear, and cross-product terms
'quadratic' includes interaction terms plus squared terms
'purequadratic' includes constant, linear, and squared terms
Alternatively, model can be a matrix of terms. In this case, each row of model
represents one term, and each column of model corresponds to the same column
of X. The value in a given row and column is the exponent to which that column
of X is raised in that term. This allows for models with polynomial terms of
arbitrary order.
x2fx is a utility function for rstool, regstats, and cordexch.
Example x = [1 2 3;4 5 6]'; model = 'quadratic';
D = x2fx(x,model)
D =
1 1 4 4 1 16
1 2 5 10 4 25
1 3 6 18 9 36
Let $x_1$ be the first column of x and $x_2$ be the second. Then the first column of D
is the constant term, the second column is $x_1$, the third column is $x_2$, the fourth
column is $x_1 x_2$, the fifth column is $x_1^2$, and the last column is $x_2^2$.
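The matrix-of-terms form of the model can be illustrated by building the same quadratic design matrix by hand. In the sketch below the exponent matrix named model is our own construction for the two-factor quadratic case (one row per term, one column per factor), and the loop is a plain base MATLAB illustration rather than the x2fx source.

```matlab
x = [1 2 3; 4 5 6]';        % 3 runs, 2 factors (same data as the example)

% Hypothetical exponent matrix: constant, x1, x2, x1*x2, x1^2, x2^2.
model = [0 0; 1 0; 0 1; 1 1; 2 0; 0 2];

nruns = size(x, 1);
nterms = size(model, 1);
D = zeros(nruns, nterms);
for t = 1:nterms
    % Raise each column of x to this term's exponent, then multiply
    % across columns to form the term.
    powers = x .^ repmat(model(t, :), nruns, 1);
    D(:, t) = prod(powers, 2);
end

disp(D)
```

The result matches the matrix D shown in the example, column by column.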
See Also rstool, cordexch, rowexch, regstats
xbarplot
Purpose X-bar chart for Statistical Process Control.
Syntax xbarplot(DATA)
xbarplot(DATA,conf)
xbarplot(DATA,conf,specs,'sigmaest')
[outlier,h] = xbarplot(...)
Description xbarplot(DATA) displays an x-bar chart of the grouped responses in DATA. The
rows of DATA contain replicate observations taken at a given time, and must be
in time order. The graph contains the sample mean for each group, a center
line at the average value, and upper and lower control limits. The limits are
placed at a three-sigma distance on either side of the center line, where sigma
is an estimate of the standard deviation of $\bar{x}$. If the process is in control, fewer
than 3 out of 1000 observations would be expected to fall outside the control
limits by random chance. So if you observe points outside the limits, you can
take this as evidence that the process is not in control.
xbarplot(DATA,conf) allows control of the confidence level of the upper and
lower plotted confidence limits. The default conf = 0.9973 produces
three-sigma limits.
norminv(1 - (1-.9973)/2)
ans =
3
To get k-sigma limits, use the expression 1-2*(1-normcdf(k)). For example,
the correct conf value for 2-sigma limits is 0.9545, as shown below.
k = 2;
1-2*(1-normcdf(k))
ans =
0.9545
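The conversion between a sigma multiple k and the confidence level conf can also be written with the base MATLAB error function, because 1-2*(1-normcdf(k)) equals erf(k/sqrt(2)). The sketch below shows both directions without toolbox functions; it is offered as a cross-check, not as a replacement for the expressions above.

```matlab
k = [2 3];

% Confidence level covered by +/- k standard deviations of a normal variable.
conf = erf(k ./ sqrt(2));

% Inverse direction: the sigma multiple that corresponds to a given conf.
k_back = sqrt(2) .* erfinv(conf);

disp(conf)
disp(k_back)
```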
xbarplot(DATA,conf,specs) plots the specification limits in the two-element
vector specs.
xbarplot(DATA,conf,specs,'sigmaest') specifies how xbarplot should
estimate the standard deviation. Acceptable values are:
's' use the average of the group standard deviations (default)
'v' use the square root of a pooled variance estimate
'r' use the average range within each group; requires 25 or fewer
observations per group
[outlier,h] = xbarplot(DATA,conf,specs) returns outlier, a vector of
indices to the rows where the mean of DATA is out of control, and h, a vector of
handles to the plotted lines.
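The quantities drawn on an x-bar chart can be computed by hand in a few lines of base MATLAB. The sketch below uses a small hypothetical data set and a pooled variance estimate in the spirit of the 'v' option; details such as bias corrections in xbarplot itself may differ, so treat the numbers as an illustration of the idea rather than a reproduction of the function's output.

```matlab
% Hypothetical data: 10 groups (rows) of 3 replicate observations each.
% The last group is shifted well away from the others.
DATA = [repmat([-1 0 1], 9, 1); 9 10 11];

k = 3;                                  % three-sigma limits
n = size(DATA, 2);                      % observations per group
xbar = mean(DATA, 2);                   % sample mean of each group
CL = mean(xbar);                        % center line

% Pooled estimate of the process standard deviation (equal group sizes),
% then the standard deviation of a group mean is sigma/sqrt(n).
sigma = sqrt(mean(var(DATA, 0, 2)));
UCL = CL + k * sigma / sqrt(n);         % upper control limit
LCL = CL - k * sigma / sqrt(n);         % lower control limit

% Rows whose mean falls outside the control limits.
outlier = find(xbar > UCL | xbar < LCL);

disp([LCL CL UCL])
disp(outlier)
```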
Example Plot an x-bar chart of measurements on newly machined parts, taken at
one-hour intervals for 36 hours. Each row of the runout matrix contains the
measurements for four parts chosen at random. The values indicate, in
thousandths of an inch, the amount the part radius differs from the target
radius.
load parts
xbarplot(runout,0.999,[-0.5 0.5])
The points in groups 21 and 25 are out of control, so the mean in those groups
was higher than would be expected by random chance alone. There is evidence
that the process was not in control when those measurements were collected.
[Figure: Xbar Chart. x-axis: Samples (0 to 40); y-axis: Measurements (-0.5 to 0.5). The chart marks the center line (CL), the control limits (UCL, LCL), and the specification limits (USL, LSL); points 21 and 25 are labeled.]
See Also capaplot, histfit, ewmaplot, schart
zscore
Purpose Standardized Z score.
Syntax Z = zscore(D)
Description Z = zscore(D) returns the deviation of each column of D from its mean,
normalized by its standard deviation. This is known as the Z score of D.
For column vector V, the Z score is Z = (V-mean(V))./std(V).
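For a matrix, the same formula is applied to each column separately. The sketch below spells this out in base MATLAB with a small hypothetical data matrix; repmat is used to expand the column means and standard deviations to the size of D, which is our choice so that the code does not depend on implicit expansion.

```matlab
% Hypothetical data: 3 observations (rows) of 2 variables (columns).
D = [1 10; 2 20; 3 60];

nrows = size(D, 1);
mu = mean(D);                 % column means
sd = std(D);                  % column standard deviations

% Subtract each column's mean and divide by its standard deviation.
Z = (D - repmat(mu, nrows, 1)) ./ repmat(sd, nrows, 1);

disp(Z)
disp([mean(Z); std(Z)])       % each column now has mean 0 and std 1
```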
ztest
Purpose Hypothesis testing for the mean of one sample with known variance.
Syntax h = ztest(x,m,sigma)
h = ztest(x,m,sigma,alpha)
[h,sig,ci,zval] = ztest(x,m,sigma,alpha,tail)
Description h = ztest(x,m,sigma) performs a Z test at significance level 0.05 to
determine whether a sample x from a normal distribution with standard
deviation sigma could have mean m.
h = ztest(x,m,sigma,alpha) gives control of the significance level alpha. For
example, if alpha = 0.01 and the result is h = 1, you can reject the null
hypothesis at the significance level 0.01. If h = 0, you cannot reject the null
hypothesis at the alpha level of significance.
[h,sig,ci] = ztest(x,m,sigma,alpha,tail) allows specification of one- or
two-tailed tests, where tail is a flag that specifies one of three alternative
hypotheses:
tail = 0 specifies the alternative $\bar{x} \neq m$ (default)
tail = 1 specifies the alternative $\bar{x} > m$
tail = -1 specifies the alternative $\bar{x} < m$
zval is the value of the Z statistic
$$z = \frac{\bar{x} - m}{\sigma / \sqrt{n}}$$
where $n$ is the number of observations in the sample.
sig is the probability that the observed value of Z could be as large or larger by
chance under the null hypothesis that the mean of x is equal to m.
ci is a 1-alpha confidence interval for the true mean.
Example This example generates 100 normal random numbers with theoretical mean
zero and standard deviation one. The observed mean and standard deviation
are different from their theoretical values, of course. We test the hypothesis
that there is no true difference.
x = normrnd(0,1,100,1);
m = mean(x)
m =
0.0727
[h,sig,ci] = ztest(x,0,1)
h =
0
sig =
0.4669
ci =
-0.1232 0.2687
The result, h = 0, means that we cannot reject the null hypothesis. The
p-value, sig, is 0.4669, which means that by chance we would have
observed values of Z more extreme than the one in this example in about 47 of 100
similar experiments. A 95% confidence interval on the mean is
[-0.1232 0.2687], which includes the theoretical (and hypothesized) mean of
zero.
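The two-tailed Z test can be reproduced step by step with base MATLAB functions, which makes the relationship between zval, sig, ci, and h explicit. The sketch below uses a small hypothetical sample so that the results are repeatable; it uses erfc and erfcinv in place of the toolbox normal distribution functions and is not the ztest source.

```matlab
% Hypothetical sample, with hypothesized mean m and known sigma.
x = [0.3 -0.1 0.5 0.2 -0.4 0.1 0.6 -0.2 0.0 0.4]';
m = 0;
sigma = 1;
alpha = 0.05;

n = numel(x);
zval = (mean(x) - m) / (sigma / sqrt(n));    % Z statistic

% Two-tailed p-value: 2*(1 - Phi(|z|)) written with the complementary
% error function.
sig = erfc(abs(zval) / sqrt(2));

% Standard normal quantile at 1 - alpha/2, then the confidence interval.
zcrit = sqrt(2) * erfcinv(alpha);
ci = mean(x) + [-1 1] * zcrit * sigma / sqrt(n);

% Reject the null hypothesis when the p-value is below alpha.
h = double(sig < alpha);

disp([h sig zval])
disp(ci)
```

Here h = 0 and the confidence interval contains m, so the two ways of reading the test agree.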