You are on page 1of 369

# Preface

This solution manual was prepared as an aid for instrctors who wil benefit by

having solutions available. In addition to providing detailed answers to most of the
problems in the book, this manual can help the instrctor determne which of the
problems are most appropriate for the class.
The vast majority of the problems have been solved with the help of available
the problems have been solved with
software (SAS, S~Plus, Minitab). A few of
computer
hand calculators. The reader should keep in mind that round-off errors can occurparcularly in those problems involving long chains of arthmetic calculations.

We would like to take this opportnity to acknowledge the contrbution of many
students, whose homework formd the basis for many of the solutions. In paricular, we
would like to thank Jorge Achcar, Sebastiao Amorim, W. K. Cheang, S. S. Cho, S. G.
Chow, Charles Fleming, Stu Janis, Richard Jones, Tim Kramer, Dennis Murphy, Rich
Raubertas, David Steinberg, T. J. Tien, Steve Verril, Paul Whitney and Mike Wincek.
Dianne Hall compiled most of the material needed to make this current solutions manual
consistent with the sixth edition of the book.
The solutions are numbered in the same manner as the exercises in the book.
Thus, for example, 9.6 refers to the 6th exercise of chapter 9.
We hope this manual is a useful aid for adopters of our Applied Multivariate
Statistical Analysis, 6th edition, text. The authors have taken a litte more active role in
the preparation of the current solutions manual. However, it is inevitable that an error or
two has slipped through so please bring remaining errors to our attention. Also,
comments and suggestions are always welcome.
Richard A. Johnson
Dean W. Wichern

Chapter 1
1.1

Xl =" 4.29

X2 = 15.29

51i = 4.20

522 = 3.56

S12 = 3.70

1.2 a)
Scatter Plot and Marginal Dot Plots

.
.

.

.

.

.

.

.

.
.
.

.

17.5

.

15.0

I'

.

.

.

.

.

12~5

.

.

.

)C

10.0

.

7.5

.
.

.
.
.

.
.

5.0
0

4

2

6

10

8

12

xl

b) SlZ is negative

c)
Xi =5.20 x2 = 12.48 sii = 3.09 S22 = 5.27

SI2 = -15.94 'i2 = -.98
Large Xl occurs with small Xz and vice versa.
d)
x = 12.48

(5.20 )

Sn --

-15.94)

-15.94
( 3.09

5.27

R =(
1 -.98)
-.98 1

.

2

1.3

SnJ6 : -~::J

x =

-

-. 40~

UJ

R =

. 3~OJ

.(1
(synet:;
.577c )

L (synetric) 2 .

1.4 a) There isa positive correlation between Xl and Xi. Since sample size is

small, hard to be definitive about nature of marginal distributions.
However, marginal distribution of Xi appears to be skewed to the right. .
The marginal distribution of Xi seems reasonably symmetrc.
.....'....._.,..'....,...,..'.":

SCëtter.PJot andMarginaldøøt:~llôt!;

. .

. . .

.

25

20

.

I'
)C

15

.

.
50

.

.

.

.

..

.

.

10

.

. .

.

.
.
.

.

.

.

.

.
100

;..

150

xl

200

250

300

b)
Xi = 155.60 x2 = 14.70 sii = 82.03 S22 = 4.85

SI2 = 273.26 'i2 = .69
Large profits (X2) tend to be associated with large sales (Xi); small profits

with small sales.

3
1.5 a) There is negative correlation between X2 and X3 and negative correlation

between Xl and X3. The marginal distribution of Xi appears to be skewed to

the right. The marginal distribution of X2 seems reasonably symmetric.
The marginal distribution of X3 also appears to be skewed to the right.

Sêåttêr'Plotäl'(i'Marginal.DotPiØ_:i..~sxli.

.

.

. .

.

.

. .

1600

)C

.

.
.

1200
M

.

.

.
.
.
. .

.

.
.

.800

.

400

.

.

.

.

0

25

20

15

10

. . . .

x2

. '-'

. .Scatiêr;Plötànd:Marginal.alÎ.'llÎi:lîtfjtì.I...

. .

. . .

1600

)C

.

.

. .
.

.

1200
M

.

..

.
.

800

.

400

.

.
0

50

100

150

xl

200

250

.
300

.
.
.
.

.

.

. . .

4

1.5 b)
273.26

Sn = 273.26
(- 32018.36
82.03

x = 14.70

710.91
(155.60J
1

-.85
1.6

- 948.45

461.90

-.85)
-.42

.69
R = ( ~69

4.85

-32018.36)
-948.45

-.42

1

a) Hi stograms

Xs

Xi
NUMBER OF

HIDDLE OF

HIDIILE OF
INTERVAL
co

OBSERVATioNS
*****
5

INTERVAL

5.
6.
7.
8.
9.

6.

********

S

******

X2
NUHBER OF
OBSERVATIONS
1
*

HIDDLE OF
INTERVAL

30.
40.
50.
60.
70.
80.
90.

J

2
3
10

12
a

100.
110.

2
1

s

9.
10.
11.
12.
13.
14.
15.
16.
17.

un*

6
4
4

**********
************
********

0

2.
4.
6.

1*

OBSERVA T I'ONS

1.

.... .

:3 .

4.

s.

NUMBER OF.
OBSERVA T IONS

1 J ***$********* 15 *************** a ******** 5 1 ui** * J **** *** 4 7. ******* 5 n*** 2 2 ** u 1* 2 1 ** * 20. 22. 24. 26. X4 NUltEiER OF OI4SERVATIOllS 7 ******* B ******** 10. 12. 14. 16. 18. NUHBER OF S u*** INTERVAL * 1 I' I DOLE .OF 19 ******************* 9 ********* .3 U* HIDDLE OF * 1 Xl :s ***** 4. 5. 6. 7. * 0 a. J. * 1 n * X3 2. **** 1 0 INTE"RVAL HIDIILE OF INTERVAL 4 0 19. 20. 21. ** *** u** Uu.* .S LS. n* ***** ***** ****** **** S a. *********** 11 5 6 U* J 7. u***** 7 10. oJ . NUMB£R. OF 08SERVATIONS 2 ** o o X7 HIDDLE OF INTERVAL 2. J. 4. s. NUMBER OF OBSERVATIONS 7 ******* 9 ********* 1* 25 ************************* 5 1.6 2.440 7.5 b) 293.360 73 .857 - x -2 . 714 4.548 = 2. 191 -.369 3.816 1.486 S = -.452 - . 571 -2.1 79 .,.67 -1 .354 30.058 2.755 :609 .658 6.602 2.260 1 . 154 1 . 062 -.7-91 .172 11. 093 3.052 1 .019 30.241 .580 n 1 0 . 048 9.405 3.095 .138 .467 (syrtric) The pair x3' x4 exhibits a small to moderate positive correlation and so does the pair x3' xs' Most of the entries are small. 1.7 ill b) 3 x2 . 4 4 .. . 2 . Xl 2 2 4 Scatter.p1'Ot (vari ab 1 e space) ~ ~tem space.) 1 -6 1.8 Using (1-12) d(P,Q) = 1(-1-1 )2+(_1_0)2; = /5 = 2.236 Using (1-20) d(P.Q)' /~H-1 )'+2(l)(-1-1 )(-1-0) '2t(-~0);' =j~~ = 1.38S Using (1-20) the locus of points a c~nstant squared distance 1 from Q = (1,0) is given by the expression t(xi-n2+ ~ (x1-1 )x2 + 2t x~ = 1. To sketch the locus of points defined by this equation, we first obtain the coordinates of some points sati sfyi ng the equation: (-1,1.5), (0,-1.5), (0,3), (1,-2.6), (1,2.6), (2,-3), (2,1.5), (3,-1.5) The resulting ellipse is: X1 1.9 a) sl1 = 20.48 s 12 = 9.09 s 22 = 6. 19 X2 . 5 . . . 0 . -"5 . . . 5 xi 10 a12 = ~.11 d(P.?o. . .x2) = (0.(x1-Yfx2+Y2):1 + 3(Xi-Yi):1.10 a) This equation is of the fonn (1-19) with aii = 1.)(yZ-x2) + (xz-Yz):¿' = d(Q.Q) ~O. and aZ2 = 4.)(x2-y2) + (x2-YZ): = =.?0 so d(P. b) In order for this expression to be a distance it has to be non-negative for 2. :¿ all values xl' xz' Since. for (xl .-yi)2. 1.1) we have xl-2xZ = -Z. we conclude that this is not a validdistan~e function.pr.essi'on only if xl = Y1 and then the first is zero only if x. xl + 4xZ + x1x2 = (xl + r'2) + T x2 . Therefore this is a distance for correlated variables if it is non-negative easily if we write for all values of xl' xz' But this follows 2.2 = YZ.-Yi)4 + Z(-l )(x1-Yl )(x2-YZ) + (x2-Y2):¿' = 14(Y1-xi):¿ + 2(-i)(yi-x. 2. 4(x.P) Next.7 1.2(xi-y.Q) = 14(X. The s€cond term is zero in this last ex. 1 1 15 2. .1 1 . . 1 -1 7 x.13 Place the faci'ity at C-3..lxpl)' 1.P) = max(lx. is X2 .141) = 4 b) The locus of points whosesquar~d distance from (n..I.8 1.lx21.12 a) If P = (-3.4) then d(Q.O) is ..P) =max (1-31.. -1 c) The generalization to p-dimensions is given by d(Q. .80 .9 1. 1 3 -27 . 320.+ +______+_____+-------------+------~. 94 221 '. .38 (synetric) 337. s group.' 230.53 = 1146.+ 200. b) Mul tipl e-scl eros. . . . 42 .72 -218. 1:5:5.+ . 07 179.. . .60 82.. 2:5:5. . 280.. .+ .+ .35 865. .16 116. 180.48 286...14 a) 360.31 236.* I: I: 160..32 3 as .64 x = 12. No obvious "unusual" observations.. 240. ..+ )(4 . 65 812. . . . . )(2 130.78 -20. Strong positive correlation. 93 90.91 Sn 61 .62 13.10 61 . 20:5. 375 .123 .21 i = 1 .244 1 .896 .78 273.200 1 1 R -.56 1 95.132 .35 2. (syietric) 1 .892 . 37 .167 -.28 2.15 2. 04 .49 2.13 sn = 1 01 .139 .61 95.727 . H)6 .10 .548 1 R = (symmetric) .28 1.22 .454 .32 183 .438 .57 1. 67 3.133 = 1 ( synetrit: ) 1 Non multiple-sclerosis group.134 1 .114 1 .08 11 0.173 1 .62 5.127 .239 . 99 147.84 1.2u 1 03 . . . 1 I . . .. .. .l - . . 1 1 I -- ~ . . . . . . . '...S11 .. . l 1 . .75 G.. ... .. \1 . . . . . . ~ e . + .llfl . . . . "'. . 3. . . 3. l ..2 I t 1 .z.'_ 1 X:i .14 ~.2 .. . +.. t I : .. . 1. .1) 3.25 ". 1 I 1 . . I .27 . .. 1 . . . .88 t. 1 . I 1 \1 . ..11 1. .. . . .75 . .P. 1 t . ~ . . . . I -. . .". . + . . . . 3. . 1 .. 1 t I i .. t 1 1 . . . + . . . + .. . + . ~ I I +. ..1 3 . . . .80 .81 x = 2. 1 . I . . . 1. J 1I 2 1 . . + t 1 . ..7'5 3. 1 1 .. 2. 1 ..58 1. .. . . J - . . . . . 75f) .. .--..25 .0 . .21 2. .- 2. 1 1 III 2. . . 1 . I 1 . .. . . . t.o . . . 1 I .25 1. + . . 2 . . . . . . . . ..25 Z "~A ACTIVITY X% b) 3. . . I 1 -- . 1I. . 1 - E E . . . . . 1 .15 a) Scatterplot of x2 and x3..54 1. 1. it . .. . cl . . . . . i J .~~ . (synetric) 1 .11 .85 .09 1.57 . .15 Sn .551 1 .61 1.346 1 (syretric) .071 1 The largest correlation is between appetite and amount of food eaten.386 . appetite and activity have a moderate positive correl a tion.~6 .21 = .12 .035 1 -.02 -.535 .187 1 R = .455 .704 -.12 4.01 .61 .34 .11 .537 . 01 0 ..496 . A1 so.11 .02 .15 -.92 .85 . 077 ' .27 .39 .362 .. Both activity and appetite have moderate positive correlations with symptoms.58 .156 . 0192822 0.820 .400 . the 1 OOm and 200m dashes.72889 1.652 28.0111057 0.400 .544 10.89365 1.674 .368 1.74218 0.820 .732 .500 1.000 .eation is 0.72889 0.62555 1.230 .6938 0.85181 0.12 .508 .61192 1.67789 0.19 9.00000 0. for example.74909 0.02 4.13 1.ti2555 0.875 2.500 .871 .941 1.197 1. .69146 0.0214560 0. 0.065 .0085522 0.74909 0.0081886 0.00000 0.000 .0202555 0.36 .00000 0.0641052 0.000 .867 .801 .74369 0.854 .368 1.55222 0.508 265.89365 bewteen Dominant hemero and Hemeru.065 .0123332 0.4420 between Dominant humeru and Ulna.0099633 0.0076395 0.00000 0.8183 1.000 .0109612 0.875 .799 .806 1.00000 -0.212 .80980 1.680 .677 .0081886 0.0123332 0.00000 x- 0.720 .0161219 0.905 . 11.669 .544 1.000 3.199 . 7927 1.197 3.0161635 0.909 .69146 0.0087559 0.027 .193 51.89365 0.0076395 0.905 1.073 .082 .17 There are large positive correlations among all variables. and the highest corr.254 23.61192 0.152 .85181 1.720.677 R = .0771429 0.230 4.0667051 0.67789 0.0641052 0.973 .178 .199 .7044 0.801 .0124815 0.0170261 0.867 .0161635 0.060 .027 .669 .61882 0.8438 0.4420 0.082 .0077483 0.782 .152 6.474 10.08 153.021 .0170261 0.0177938 0.0099633 0. The lowest correlation is .806 .060 .66826 0.799 1.0202555 0. 0.0087559 0.791 .338 .99 .265 x = 2.941.728 .0085522 0.621 .338 .44020 0. and the 1500m and 3000m runs.973 1.809 .61882 0.680 .854 .782 .791 . Paricularly large correlations occur between running events that are "similar".178 .254 10.02145tiO Sn - 0.474 10.0161219 0. 7348 .674 .0192822 0.007 .909 1.80980 0.212 .55222 0.000 .000 .732 .74369 0.0077483 0.0177938 0.021 . R = 0. .847 2.66826 0.0101752 1.62 So= 4.152 .74218 0.871 .193 28.16 There are signficant positive correlations among al variable.809 .728 . 081 .102 .177 .972 1.000 .797 .71 .776 .17.60 Sn = .672 .866 .906 .906 1.100 .075 .147 .675 .000 .806 .144 .108 .906 1.147 .081 . The correlation matrix for running events measured in meters per second is very similar to the correlation matrix for the running event times given in Exercise 1.000 .167 1.729 .108 .114 .972 .114 .731 . Notice the correlations decrease as the distances between pairs of running events increase (see the first column of the correlation matrx R).18 There are positive correlations among all variables.816 .875 .000 .094 5.741 .675 .105 .074 .824 .138 .096 .54 4.096 .096 .938 .660 .000 .741 .082 .66 . 8.824 .092 .875 .99 .81 .086 .118 .124 .000 .081 .62 .804 .082 .14 1.094 .804 1.095 .093 .854 .081 8.100 .806 .672 R = .105 .144 .866 .852 .660 .065 .097 .095 .118 5.852 .115 .816 .000 .938 1.065 .102 x = 6.776 .096 .906 .092 .797 .694 .075 .093 7.086 .731 .854 1.729 .097 .091 .694 . i :: ..QI . . : o.C co co ...... Q I c: i- C .- UI ......'" - -...: o c: en .. .¡c:. -I: . . 00 '. '" o ...0 - QI CI ". .~ -... c: . z: ... .. . '" .. ". -0 c: C CD c. . ..' " " . c: . QI t- " ... 00 00.....I: ~ . . .. : . 00 " .. : .. . .C co .- .. .: .I: .. : .I: II ~ " o' " .: ... . - .. z: ..... .. :. o en . .. z- C .I: i = c: ..15 1.' . . en " o' .. in .. 0. -z .¡c:: . :: . .~.0 - - . .. - . I o' '0 .. ".' " CI -..... c . t- ~ CI - .....19 (a) o _R A 0 IUS RADIUS LHUI. c: i- .o'' 0... c: . C .... . o' : '.... .. .. . = C ~ .. ... .....' . ....II 0' i:: . ...ERUS tlUME~US ILULNA ULNA c- .............: C . . l. ... . " ". . .: . ... A... .:. .~: . .. ~ \. .. 'L ~. .. .. t ... . .. .' \ . .. . ... ... -it ..c ..- :. ..~.-. ll. . " . .. . ~: ....l . ~.lý .. .( .~. .. . . . . .i -...... t- .~. ..!. . . . . \.. . . .. . . -\1: . ~ . -l . . ..8...' ~.1fiii: '..' .~..'\: . . .. .t:.I...~.. ~. \..... :. .' !t ~ 1" ~. . ' tl. . . .-. . .1... .. :i~ .16 1." . . . . .. .. ...~ "~ .. .. ..... \. .~ ~. i l 0 \. ... .. .19 (b) ~.... ..0' .. . . -Ii.to l .:~ t. " . ". l. -i-.t':".~ . P. . . ~. . ... :. . .. .. .... ... . . .. .. . .. _... · . .¡ .. . ..... \. . . . . . . · ~c. . . .. ~ . . .o. . f . . ~l t.. . 'l . ..1.. . I. .. i:_. .1.. .... . "-:f' I! ..... :. ..~~ ..~.. .. . \ l'ø.. L _ _ (. . ." r ."" i. . . '.. .. .. (b) .. . .' A L_-l_ X .. . .. besides the strung out pattern to the right.".. Some observations. . . . . . in the lower left hand part could be outliers. \ . (a1 The plot looks like a cigar shape. . . From the highlighted plot in (b) (actually non-bankrupt group not highlighted). '" ~'~t . . ~'T · '\\ . .. \ X3 X1 . .20 Xl (a) .. but bent. .. .. . there is one outlier in the nonbankruptgroup. . .. ~ . . .. .tion.. \. x1 ..l l ..17 1. . . which is apparently located in the bankrupt group..ö . . (ll) The dotted line in the plot would be an orientation for the classificà.. ~ x3 . .-.. .. .... .... .. .. . . . .21 o (a) o (b) Outlier Outlier ~~ô 0'" . The observation in the upper right is the outlier. . . .. . . . . there is an orientation to classify into two groups. ... .. X1 . .. . . tfe' Ó.. . As indiCated in the plot.. . .. .l. .. .~.. X3 .... . G~~ . .. . .. ..... . ... . . . . .. .18 1. ... ..e Outlier Q (a) There are two outliers in the upper right and lower right corners of the plot.. X1 . (b) Only the points in the gasoline group are highlighted. . ... . . . . . . ./ . )l" .. . ./.. ../ . . / \. .. . Outliers ~ . . ~. .... ... .. Xz .22 possible outliers are indicated. .. . x. . ~~e .~ . fi .et ot)~ ... . . . . . . . x. · ). . ./ . . . . . .. . . . ../ ~e~~\. Xz . . . . t I ~~\e X1 .." . il . fi .. ../ .... . G Outlier ~ø ~~ ~ø. . .s. . ..e . .19 1.... .. ..I . .. . .. . . .../ / .... X1 Xz/ · .. . .~a: i-n: +J l-0 .. ci V) Co = V) i. ~ i-aci . ~ciVI en :: ..c.. z: Cd CJ I) .G M N. ci . c: .u VI C s.. '" +J VI ra ci . ~VIa.:: oi :: V) s..VI Q.. . VI CI VI ci u . ci to :: iCo .a:u 4. s.. 20 . ci .. ~ci :: iu VI II ai ci ~VI:: .c :: en i- Cd ci VI . II C' Cd .. V) oi = V) s...I -a: ra :: VI s.- -u Cd ci Q.c u N oi c: -= s. ci VI en :: a.c:s.. s... . . U . 0 IØ U s.. c( . iu ci ~VI i-U:: iVI -~ ..u :: "' e a. c: fa s. s.Q -0 VI CI . ~ C" ~ :: iU ~ci ~ ~s.I¡ Cd a c: s. --i. -u:: cc: s.Q c: ci ~ s. s 1 .21 Cl uster 1 1.24 20 10 13 4 C1 uster 2 3 9 14 19 18 C1 uster 3 22 . the same manner as those in Example 1. other groupings are~qually . for instance. . utilities 9 and 18 l1ight be swit. Note. plausible.22 Clust~r 4 16 8 11 Cl uster 5 21 5 Cluster 6 17 12 2 C1 uster 7 We have cluster~d these faces in.ched from 7.12. however. '5 Cluster 2 toC1 uster 3 and so forth. ¡.- .. 10 4 -.: l......0: ...'-1 ." . 13 '-a.i .1 ¡..l ".~... f ~ 20 ".- (not . / .emai ni ng stars cl usters.... '..l.. .. ..~.."-....'... -..25 We illustrate one cluster of "stars.23 1..." I: . The shown) can be gr~uped in 3 or 4 additional r. ... -. -. ..... . '/ .. .": ... ..615 0.409 0..00 0. 940 -0.~66 1555.-. 2 4 6 8 §! .~15 0.2895 0.168 -0. a.78 15.479 0. .3158 0.~'.0 8.26 Bull data R (a) XBAR Breed SalePr YrHgt FtFrBody PrctFFB Frame BkFat SaleHt SaleWt 4.260 0. . -:- . .17 4.487 0. .000 -0. .02 Breed YrHgt FtFrBody PrctFFB Frame BkFat 2..116 0.. .. . ..317 o . 26 206 .94 128.l .116 -0. . . 860 0 . . .41 82.85 -0.. .44 -0.27 1.32 25308 . .. . :.94 128...488 0.000 -0. . .423 1 .41 82. . . .368 0. . ~ g 2 4 6 8 50 52 54 1i 58 60 . . ..000 Sn SalePr -429.000 0. . 75 10.1. . BkFat . .23 -0. .56 2.00 480. 56 2.555 0.14 -0. .368 0. .0 .32 25308. ~ .92 206.: Frame :i . ..000 0.96 2. . .521 0. . . 17 -1.. FtFrB I . .699 0. ...37 1.2 0.277 0. 3816 1742. . d O.488 -0.74 2.277 -0.3 0. . .46 2. 368 0.55 4.24 450. \. .472 0. 44 81.55 SaleHt 3. .434 -0.49 51. CD CD on SaleHt l: -:. . .38 145.01 480 .. . .860 0. . .02 383026.: . 75 51. .409 0.000 O..482 1. . . 525 0 .566 1.. .05 -0. . .224 0..208 0. .35 16628.5 .73 1.QOO ~0.09 -226. . .198 0.605 0.92 1.801 -0. 72 6592. .44 81. .02 15.. .81 8481.i'. '. N '" . 47 2. . .423 0.487 0. .434 0..97 145 .317 0.282 0. ..05 5813. .0 7. .49 -0...02 0...113 0. ~ . o CD . ~ .24 -0.81 2.390 0.72 6592. 1263 0. '" d '" d .82 43.624 1. .224 1.479 0.. .23 3.00 3.24 1. 35 90 1100 t30 5.37 1. .79 98. 523 0 .000 0. . t 0.940 0..208 £4.. . .64 450. .801 0.5224 995.691 0. . 624 0 . .482 -0..102 -0.. . . .344 0 . . -. CD CD . .9474 70. . . .525 0.260 1. . .28 1.44 1. . . I.4342 50. .38 9. 27 -1. . Breed .-'.472 -0.38 116.05 46.113 0.23 0.74 2.605 -0. . .344 -0. . .94 -0. .. . .5££ 0. .699 0. l.78 1.23 3.4 0.368 -0. . 8 --¡.09 98.691 1.46 272. . .47 -0. .47 'SaleWt 46.28 -226. ~ .198 0. . .82 43..102 0. . .8816 6.79 116. cci .73 -429 ..168 0.282 1. .I .0 6..05 3.14 272.390 0. 38 -0.47 5813. .1967 1.521 0. . Breed .523 0. . This single point has reasonably large effect on correlation reducing the positive correlation by more than half when added to the national park data set.1000 . Correlation with this park removed is r = .. .25 1. . . 0 0 1 . Gæct '5lio\£~ "' . . 500 . 2 3 4 5 6 7 8 9 Visitors (b) Great Smoky is unusual park. .391. (c) The correlation coefficient is a dimensionless measure of association. 1500 iI üi .173 Scatterplot of Size Y5 viSitors 2500 . . . . 2000 . The correlation in (b) would not change if size were measured in square miles instead of acres.27 (a) Correlation r = . . . i . I. .. ~.~. 1 ~ I ~ . :.. . .1 a) .y = Iß = 5.i :=-: -j.A : 'JO " : i i . i i : l : =i i I 1 I i 02-2: .. ~ .. . . '. i . : I . l-' : .. -ii.. .. ~ =. :i:-'. I . ILl / i I. : '7 ~~ /.. /l oJ : : . I ! ! i/'i : g ¡ .. .26 Chapter 2 2. /" 1-. i : I i .. ø ':.. . i ! : : 1 I I I . -"-. . I . ::t i i I . . . : . : . I i I I I . i . i I J : .s lt~i is1is x = i 3š x = 7~35'35 (1 1 31' c) ': : i ...i" 1 .I~ -. ~ . . .' :. 1 i) . .051 ..:': . I .:/ _~.~ . : ... : i .1 . i :3 . .. I !./_. . . r I . :i 'i i I .1'- . i . .. i' i.t I b) i) il) Lx = RX cas(e) = . x. I . ---. "T ~-:~~-'.' . . i I ... ------ II i i .1 3 d~j = . I . . :I Ii I i -tI ! ii i 7.1 i. .. i . .051 ) proJection of L on x . I ! ... -~'"' K (-~ ' "'7' .. 'i. ! " I I . i . ..- e . _ . . . . : ! I I . -A : t.1 .1 ' ..-. ./ : '-' .--_./ == i . i !! I i : I ._~-:.'-i-' ...! .--- I ..__ " . . ./ I .:~.-. ¡ i ~.9l'i 1 = = 19. . .. . 870 = arc cos ( .621 LxLy . ... .2 a) b) SA = 20 -6 r-6 1a ( -~ -q SA = . jth ... (~ (A I) = A' = A (C'f"l~ :l .1~ iÕ 4 -iÕJ i2 B'A' (12.)th k c. A 2. a) AI b) C' c i so 3.3 = C'B U ':11 ) 11 (~ ~) - = (~ (AB) i ':) 11 (i ~j )th entry k a.. . = i aitb1j 1 =Ja"b1.(: 1a -ìa lõ. (t''-' (C' J' 'l.=1 Consequently..th row (b..b2i ~'" IbkiJ 1 and A' lias.i 31 -1 = J) 10 c) 8 ' 10 = 4 (1: AB = n (i has -Tõ 7)' (AB) .. J R. (AB) i has entry. ('1 ~J'.. 1=1 has . 1 +Ja'2b2' 1 J+.b1i .b. i . -7) -: ) 1).27 15) 2. = Jl Next ßI I ajR.+ 1 a.~ -9 c) d) = AIBI -1 (-1 : e) No. = d) il). A(~Bi~)A-i = AA-l = I .l: 9xi ...4x1 X2 + 6X2 x' Ax .i. Å2 = 5 .5 IT QQI = -12 13 2.x2)2 + 5(x. ¡s 2.7 a) Eigenvalues: Ål = 10.. and 1= (A-1A)' = A1(A-l)l. a) 5i nce b) Since the quadratic form A = AI.I Consequently. 1 :J .+xi) ~ 0 for tX.894) .aj2"".447.~ ) I = (A-l)' A. (xi . 2/15) = (. (A-l)1 is the inverse of Al or (AI r' = (A-l)'.447) ~2 = (1/15. (f1A)B .ajk)1 so SIAl has ~i~j)th entry k bliaji+b2ibj2+. . -2 -:)(::1~ (2Xi..894~ -.B-1S' I so AS has inverse (AS)-1 · I bl (S-lA-l)AS _ B-1 B-1 A. 2.6 12l r _121 r IT IT 13 = 1 69 5 12 i3 IT 5 i3 a 169 ~1 A' is symetric. 2. Thus I i = I = (AA .28 column (aji.. It was suff1 ci ent to check for a 1 eft inverse but we may also verify AB(B-1A-l) =. Nonnalized eigenvectors: ':1 = (2/15~ -1/15)= (.lx2) -l (O~O) we conclude that A is positive definite.+bk1~jk = t~l ajtb1i = cji 51 nce i and j were arbi trary choices ~ (AB) i = B i A I . = QIQ .4 a) I = II and AA-l = I = A-1A.x2J ( 9 . genvectors: .z~ eigenvectors: .1 Normal. -1//5 + 5 (1/1S1 (1/IS.9 -2 1/15 -2/~ a) A-1 = 1(-2)-2(2) 1 .1 = (2/ß. 1//5J _ir 1/15) (1//5.z = (i/ß~ -2/I5J cJ A-l =(t 11 = 1 (2/15) (2/15.~ 2/15J . l/I5J .-1 (-2 -2) -2 =i1131 11 3 6 b) Eigenvalues: l1 = 1/2~ l2 = -1/3 Nonna1iz.ed eigenvectors: .29 b) A' V-2 -2 ) . 1/15J _ 3( 1/1S)(1//s' -2/151 ' A · (: 2.04 . (012 c) -1 A = 2//5 0041 1 9(6)-( -2)( -2) (: 9 . zed e. -2/ß1 -1 2 1/15 3L-21 5 .8 Ei genva1 ues: l1 = 2 ~ l2 = -3 Norma 1.1 = (1/¡.2. -2/15 J 2) = 2 (2//5) (2/15. . 1 fIlS r2/1S.~ = (2/15 ~ l/~ J =~ = (1/15. -1I/5J 2.18 d) Eigenval ues: ll = . l2 = .z =: (2/15~. 9-1/~ 2/~ 2) . + o..10 B-1.OOl)~ -4.000 -4 .OÒZOCl -: 00011 1 ( 4..0(0) = aiia22 Proceeding by induction~we assume the result holds for any (p-i)x(p-l) diagonal matrix Aii' Then writing aii = A (pxp) a a a .002 Thus A-1 ~ (_3)B-1 with p=2. app' ..002)~(4. .OOl)~ ~4.30 2. a2Za33 .OOl .. ~.0011 = 333..11 With p=l~ laii\ = aii a a a22 = a11a2Z . app) = al1a22a33 .24 to find IAI = aii I Aii I + 0 + .. aii. ~pp by the induction hypothesis~ IAI = al'(a2Za33.001 -: 00011 = -1. Aii a we expand IAI according to Definition 2A.4(4. S~nce IAnl =.D02001 _ 1 r 4..002001 -44.000.and 2..002 -1 A = 4(4.. 001 .333 -4 . 001 ( 4. .0011 )-(4. ( 4. 31 2.12 By (2-20), A = PApl with ppi = pip = 1. From Result 2A.l1(e) IAI = ¡pi IAI Ipil = ¡AI. Since A is a diagonal matrix wlth p p diagonal elements Ài,À2~...,À , we can apply Exercise 2.11 to get I A I = I A I = n À , . '1 1= 1 2.14 Let À be ,an eigenvalue of A, Thus a = tA-U I. If Q ,is orthogona 1, QQ i = I and I Q II Q i I = 1 by Exerci se 2.13. . Us; ng Result 2A.11(e) we can then write a = I Q I I A-U I I Q i I = I QAQ i -ÀI I and it follows that À is also an eigenvalue of QAQ' if Q is orthogona 1 . 2.16 show; ng A i A ; s symetric. (A i A) i = A i (A i ) I = A i A Yl Y = Y 2 = Ax. p _.. .. .. Then a s Y12+y22+ ,.. + y2 = yay = x'A1Ax yp and AlA is non-negative definite by definition. 2.18 Write c2 = xlAx with A = r 4 -n1. Theeigenvalue..nonnalized - - tl2 3 eigenvector pairs for A are: Ài = 2 ~ Å2 = 5, '=1 = (.577 ~ ,816) ':2 = (.81 6, -, 577) 'For c2 = 1, the hal f 1 engths of the major and minor axes of the elllpse of constant distance are ~1 12 ~ ~ ~ = -i = ,707 and ~ =.. = .447 respectively, These axes 1 ie in the directions of the vectors ~1 and =2 r~spectively, 32 For c2 = 4~ th,e hal f lengths of the major and mlnor axes are c 2 ' ñ:, .f c _ 2 _ - = - = 1.414 and -- - -- - .894 . ñ:2 ' IS As c2 increases the lengths of, the major and mi~or axes ; ncrease. 2.20 Using matrx A in Exercise 2.3, we determne Ài = , ,382, :1 = (,8507, - .5257) i À2 = 3.6'8~ :2 = (.5257., .8507)1 We know ,325) A '/2 = Ifl :1:1 + 1r2 :2:2 ,325 __(' .376 1. 701 - .1453 J A-1/2 = -i e el + -- e el _ ( ,7608 If, -1 -1 Ir -2 _2 ~ -,1453 We check Al/ A-1/2 =(: ~) . A-l/2 Al/2 .6155 33 2,21 (a) A' A = r 1 _2 2 J r ~ -~ J = r 9 1 J l1 22 l2 2 l19 0= IA'A-A I I = (9-A)2- 1 = (lu- A)(8-A) , so Ai = 10 and A2 = 8. Next, U;J ¡::J ¡ i ~ J ¡:~ J - 10 ¡:~ J - 8 ¡:~J gives gives ei - . 1/.; - (W2J ¡ 1/.; J e2 = -1/.; (b) AA'= ¡~-n U -; n = ¡n ~J o = /AA' - AI 12-A I - .1 0 80- À40 4 0 8-A = (2 - A)(8 - A)2 - 42(8 - A) = (8 - A)(A -lO)A so Ai = 10, A2 = 8, and A3 = O. (~ ~ ~ J ¡ ~ J - 10 (~J .gves ¡~ gives so ei= ~(~J 4e3 - 8ei 8e2 - lOe2 0 8 0 ~J 4e3 4ei Also, e3 = 1-2/V5,O, 1/V5 J' - 8 (~J ¡ :: J - Gei U so e,= (!J 34 \C) u -~ J - Vi ( l, J ( J" J, 1 + VB (! J (to, - J, I 2,22 (a) AA' = r 4 8 8 J l 3 6 -9 r : ~ J = r 144 -12 J l8 -9 L -12 126 o = IAA' - À I I = (144 - À)(126 - À) - (12)2 = (150 - À)(120 - À) , so Ài = 150 and À2 =' 120. Next, r 144 -12) r ei J = .150 r ei J L -12 126 L e2 le2 . r 2/.; ) gives ei = L -1/.; . and À2 = 120 gives e2 = f1/v512/.;)'. (b) AI A = r: ~ J l8 -9 r438 8J - r ~~ i~~ i~ J l 6-9 25 - À 505 l 5 10 145 0= IA'A - ÀI 1= 50 100 - À 10 = (150 - A)(A - 120)A 5 10 145 - À so Ai = 150, A2 = 120, and Ag = 0, Next, ¡ 25 50 5 J 50 100 10 5 10 145 gives r ei J' r ei J l :: = 150 l:: -120ei + 60e2 0 1 ( J -25ei + 5eg VùU O or ei = 'W0521 lD 145 ( ~5 i~~ i~ J eg e2 ( :~ J = 120 (:~ J 35 gives -l~~~ ~ -2:~: ~ or., = ~ ( j J Also, ea = (2/J5, -l/J5, 0)'. (c) 3 68 -9 (4 8J = Ý150 ( _~ J (J. vk j, J + Ý120 ( ~ J (to ~ - to J 2.24 a) ;-1 = ~ '9 ( 1 c) For ~-l +: À1 = 4, a 1 a b) n À1 = 1/4, À2 = 9 ~ À3 = 1, ':1 = (1 ,O,~) i À2 = 1 /.9, ':2 = (0 ~ 1 ,0) , À3 = 1, el -3 = (OlO~l)1 =l=('~O,OJ' =2 = (0,1,0)' =3 = (0,0,1)' 36 2.25 Vl/2 "(: a) a 3 4/15 1/6 1 (: 0 0Jt 1 -1/5 4flJ (5 2 a -1/5, a 3 4/15 1/6 = (~: a) -.2 .26~ - 2 1 il .1'67 ' 1 67 i " ~:i'67 V 1/2 .e v 1/2 = b) 2.26 2 OJ ( 1 -1/5 4fl5J o 'if.= ,-1/5 1 1/6= a 1 ° OJ i5 -1 1/6 0 2 ° = -2/5 2 1 a a 3 4/5 1/2 4/3) (5 a 1/3 a 2 3 a 0 :J -2 4 n =f 1 1/2 i /2 P13 = °13/°11 °22, = 4/13 ¡q = 4l15 = ,2£7 b) Write Xl = 1 'Xl + O'X2 + O-X3 = ~~~. with ~~ = (1 ~O~O) 1 1 i , i 1 1 2 x2 + 2 x3 = ~2 ~ W1 th ~2 = (0 i 2' 2" J Then Var(Xi) =al1 = 25. By (2-43), ~ 1X 1X ,+ 1 2 1 .19 Var(2" 2 +2" 3) =':2 + ~2 =4 a22 + 4 a23 + '4 °33 = 1 + 2+ 4 15 = T = 3.75 By (2-45) ~ (see al so hi nt to Exerc,ise 2.28), 1 1 i 1 1 Cov(X, ~ 2Xi + 2 Xi) = ~l r ~2 = "'0'12 +"2 °13 = -1 + 2 = 1 ~o 37 1Xl +1'2 X2) = Corr(X1 ~ '2 2.27 , 1 COy(X" "2X, + '2X2) 1 .103 ~r(Xi) har(~ Xl + ~ X2) =Sl3 := a) iii - 2iiZ ~ aii b) -lll + 3iZ ~ aii + 9a22 - 6a12 c) iii + \12 + \13' d) ii, +~2\12 -. \13, + 4a22 - 4012 aii + a22 + a3i + 2a12 + 2a13 +2a23 aii' +~a22 + a33 + 402 - 2a,.3 - 4023 e) 3i1 - 4iiZ' 9a11 + 16022 since a12 = a . -: J (-~ ~ J . X(2)) = ¡ ~ ~ J (j) COV(AX(1)..38 2.(2) (~ -iJ ¡ n = ¡ n (g) COV(X(2) ) = E22 = ¡ -.(l) = ¡ 1 -'1 1 ¡ ~ J = 1 (c) COV(X(l) ) = Eii = ¡ ~ ~ J (d) COV(AX(l) ) = AEiiA' = ¡i -1 i ¡ ~ n ¡ -iJ = 4 (e) E(X(2)J = ¡.2) = ¡ n tf) B¡.BX(2))=AE12B'=(1 -1) ¡~ ~J ¡ _~ n=(O 21 . -: J (h) COV(BX(2)) = BE22B' = ¡ ~ -~ J (-.31 (a) E¡X(l)J = ¡.(~: -~ J 0) COV(X(l).(l) = ¡ :i (b) A¡. ¡ i ~ J (e) E(Xl2)j = IL(. -~ J ( -~ i = ¡ -.X(2)) = ¡ l ::J ~ J (j) COV(AX(l) i BX(2)) = AE12B' ¡ 12 9 J 9 24 . = U i -~ J (j ~ -~ J U -n 0) CoV(X(1).39 2.32 ~a) EIX(l)j = ILll) = ¡ ~ J (b) AIL(l) = ¡ ~ -~ J ¡ ~ J = ¡ -~ J (c) Co(X(l) ) = En = l-i -~ J td) COV(AX(l)) = AEnA' = ¡ ~ -¡ J ¡ -i -~ J L ~~ ~ J . J (g) COV(X(2) ) = ~22 = 1 4 ( -1 6 10 -~1 i (h) COV(BX(2) ) = BE22B' .) = ( -~ J (f) BIL(2) = ¡ ~ . ) (g) Co( X(2) ) = E" = ¡ ¿ n (h) CoV(BX.I 1~ ~ J .~ J .l ~ ~ J 2.2) ) = BE"B' = U .40 .î ) L ¿ ~ J D .U j J H =l n (¡ j J . ¡234) = (î -~ ~) (-¡ -~!J (-~ n - 4 63 (e) E(X(2)J = ti(2) = ¡ ~ ) (f) Bti(2) = ¡ ~ -î J ¡ ~ J = I .33 (a) E(X(l)j = Li(l) = ( _~ J (b) Ati(l) = L î -~ ~ J ( _~ J .~ .¡ ~ J (c) Cov(X(l¡ ) = Eii = .~ ( 4 i 6-i~J (d) COV(AX(l) ) = Ai:iiA' . X(2))= -1 0 1 -1 ( _1 0 J ü) COV(AX(l).41 (i) COv(X(1).~ J . BX~2)) = A:E12B1 = ¡ 2 -1 0 J (=!O J 1 1 3 i1 -1 0 ¡ ~ .~ 4.~ J = ¡ -4. 2.)2 .447).42 2.7 ~ Ài = 10 and -1 x I x Fl Exercise 2. ). -. (. ).1 = 18.2 = 9. Hence the maximum is 18 and the minimum is 9. Therefore max xlAx = 10 andth1s maximum is attained for : = ~1. we have from Exercise el . max x'x=l .34 bib = 4 + 1 + 16 + a = 21. 2.3 = 9.1 = 7. (4 . = x'Ax wher A = (: ~).6. .35 bid . ).38 Using computer.- X i Ax = max ~ 'A! ~13 = À1 ~fQ where À1 is the largest eigenvalue of A.2 = 1.-did .).= 15 and bid = -2-3-8+0 = -13 (ÉI~)Z = 169 ~ 21 (15) = 315 2.32 = 0 gives ).17 2.37 From (2~51). For A given in 2. Hence the maximum is 7 and the minimum is 1.36 4x~ + 4x~ + 6xix. 3) = L: -:J (-:14 23) ( -~ J · 125 (-~ J 2/6 ) il ) d I B-1 d = (1~1) 2/6 11/6 = 2/6 1 ( 5/6 --' so 1 = (bld)Z s 125 (11/6)" = 229.894.).- biBb - = -4 + 3 = -1 = (-4. .43 2. 2.41 (8) E(AX) = AE(X) = APX = m o OJ (b) Cav(AX) = ACov(X)A' = ALXA' = (~ 18 0 o 36 (c) All pairs of linear combinations have zero covarances.42 (8) E(AX) = AE(X)= Apx =(i o OJ (b) Cov(AX) = ACov(X)A' = ALxA' = ( ~ 12 0 o 24 (c) All pairs of linear combinations have zero covariances. :.S.! = (4 tOt -4) i ':2 = ~z . 2.xl! = (-'.866 . Consequently So =( Z -3) and R= ( 1 .2 = :~ -:2 or :5'2 = -9/3 = -3. Therefore n 31. 0) I c) et L = m.44 Chapter 3 3. .xz! = (3. n 5'2 = ~i':2 or slZ = -4/3.86'6 1 J .. L = 12 . n ši. -11' ~2 = l2 .x2! = (-1 t '.i.. -.5 1 (32/3 -413) "( 1" -. L =11 ':1 ~2 Let e be the angle between ':1 and ~2' then eOs (e) = -9/16 x 18 = . = L!l or s" = 6/3 = 2. 22 ~2 Therefore n s" = L2 or$" = 32/3. and :2' then èos ~e) ~ -4//32 x 2 = -. Conse- quently S = and n -4/3 2/3R =-.1 :2 Let e be the angl e between . Also. r1Z = ~"s (e) = _ .5 :. Also.J b) :1 = II .5) 3. -3.1 a) ~ = (:) b) ~. = ~. n S = i2 or S22 = 2/3.8661 -3 '6 .2 a) g = (..8:6'6. 0)' c) L =/6. n 522 ~ l~ or szi = = 18/3 = "6. riZ = cos (e) = -.

.7/4 -9/2 9 .3 xl ! II = (1..N 3 -1 2 N 3.-- + 3 3 4 3.')' ~2J S . 4)'. . Thus d'i = (-3. 1 J' i l' · (. ... !2 = to.2) .) . so II . 1: = .Ši = .ii ! = (-2. 3).6 a) X'. ( 3 -9/2 J -1 3 ~ -3 -1 0 2 (-31 -3 -:J e -9) = -9 18 and Isl = 2. 1. -4 ( 32 1 -:J and l sIc: l2 -2 6 b) ii ! + (ll . 0.o1. .l6 0 =e -:)E ~.l1& X :) . (34 -2 4 .3. = (3..92 = 23' the matrx of deviations is not offull ra. 1.xl l) 1 2S=(X-xl CX-xl .1 x' = r -~ ~ -~ J. -1) and ~/3 = (-3. 2 S = (X . -3).45 3. 1..5 _2J 3 1 5 5 . 4.. .(~ 1 .J 1 :J ~ ". 3 (: i :J _.1 x')' ( X-I ?) = so S =. Thus li = 4 = a) l'. Since.

~2 = (.The 3-dimensional volume .707.. -. . !~ = (. -..1 ~ (-:~: 519 Eigenvalue-normalized eigenv~ctor pairs for 5-1 are: À1 = 1.enclosed by ~he deviation vectors 1 s zero.7Ð7) are: . 1 X')' X-I-xl) = ( ~ ~ "i ( øw 15 So S = -3/2 1 -1/2 ( 15/2 9 -3/2. .707. s = -4 5 ( 5 -4) -1 . The major axis of ell ipse 1 ies in the direction of ~2. c) Total sample varia-nce = 9 + 1 + 7 = 17 . -4/9J S".7~7. 4/9) 4/9 (5/9 Eigenvalue-normal ized eigenvectors for 5/9 5-1 Ài = 1..7n7) Half lengths of axes . if) For s= . .- of ellipse (x-x)'S-l(X-X) S 1 are l/Ir = 1 and l/~ = 3 respectively.707) i À2 = 1/9.1 = (. .707) À2 = 1/9. :~ = (. Isl = 0 (Verify).. the minor axis 1 ies in the direction of :1.46 15J -3 b) 2 S = (X -. .707. The 3 deviation vectors lie in a 2-dimensional subspace. 2 -1 -1 l4 1'5/2) -1/2 7 . . - 3.7 All e11 ipses are 'centered at i) For S = (: : J ' -x .

G 1 0 Isl = 1 ~J..x) ~ 1 are. lJ axes Half lengths of (x. -1/2 For S =(-1~2 -1/2 1 . S-l = (1/3 OJ Eigenvalue-normalized eigenvector pairs for 5-1 are: ).. fn fact.2 = 1/3. o 3 0 l/3 iii) For S = (3 0). . (x . ~i = (1. 1 Isl = 0 .l/2 -1/2J -1/2 . !~ = (0..es in the direction of ':2. 3.. Note that ~2 here is =1 in "part (i) above and =1 here is =2 in part (i) above... 0) ). Here. the salid ellipse. a solid sphere..8 a) Total sample variance in both cases is 3.x)'S-l . The major axes ellipse li. Major and minor " axes of ellipse can be taken to lie in the directions of the coordinate axes. . = 3.47 Half l~ngths ~faxes of ell ipse (x.1 = 1 13.is.x)' 5-1_.. 0 b) For S.x) s 1 of ellipse are equal and given by l/ir = l/lr = 13. Notice for aii three cases 1s1 = 9.. (x . l/lr = 1 and 1/1. of the again.. the minor axis lieS' in the directi~n of =1.

5 ( c) Check.5 18.5)3 _ 0 -+(2.5) As above in a) Sa = ( 3 ~ . -1 J' gives Xca = O. -1 J' giv. 1.5 .53 -= 5~~ J .18.~ 5~~ so S = -(13)2(2.5(2. .5 2.5) = 0 13 5.5 2.2.5 " ( J I I 10(2.18. 1.es Xca = O.5)(18.52.5) _ 9(18.5) + 39(15.¡o~ J 13 + 5. ~b) S =.0 2.5) + 39(15.3 J' and -2 -1 -3 1 Xc= -1 2 0 2 0 -2 1 3 -1 and we notice coh( Xc)+ ~012( Xc) = cOli( Xc) 0 1 so a = iI.5 .5)3 I . (b) S = 1~ 2.48 3.10 (a) VVe calculate æ = (5.9 (8) Vve calculate æ = (16.5 soI-S-(2.55 J Using the save coeffient vector a as in Part a) Sa = O.34 l and -4 -1 -5 2 2 4 Xc= -2 -2 -4 4 0 0 4 1 1 and we notice coh( Xc)+ coh( Xc) = cOli( Xc) so a = fl.5) -55(5.5)2 0 + 0 = i) ( 2. 3. 0 2.

1 4213 J 3. by the first equation in the fil"t set. a2 = O.11 Con~equently S = 14213 i14808 15538 o) 09:70) .49 (c:) Setting Xa = 0. 01/2 = (121 ~6881 R = 124 . . The columns of the data matrix are linearly independent.6515 ( 09:70 and 0-1/2 = (" 0:82 00:0 J The relationships R = 0-1/2 S 0-1/2 and S = 0'12 R 01/2 can now be verifi ed by direct matrix multiplication. 3ai + a2 = 0 7ai + 3ag = 0 so 5ai + 3a2 + 4ag = 0 ai 5ai g -"jag 3(3ai) + 4ag = 0 so we must have ai = as = 0 but then.

50 3. a) From fi rst pri nciples we hav. b) ~-I= (5 .8. 2) Using (3-36) and S · ( ~: -12 J .e f ~l · (2 3) (~J' 21 - Similarly Ë' ~2 = 19 and Ë' ~3 = 8 so sample mean = 2l+19+8 = 16 3 sampl~ vari ance = (21_16)1+(19-16)2+(8-16)2 = 49 2 I Also :' ~1 · (-1 2) (~J = -7. C -2 X= _ 1 and :' ~3 = 3 so sampl e mean = -1 sampl~ variance = 28 Finally sample covariance = (21-16)(-7+1)+(19-16)(1+1)+(8-16)(3+1) = -2.14.

SJ -1.28 '" sample covariance of .. 3. c' X :b'Sc=(23) " . and.5 (13 1 sampl e mean of -b.-1 sample variance of b' X · ~' S~ · (2 3) e: -121(: 1 = 49 sample variance of C' X = :' S:..samp1 e variance of b' X = l2 sampl e vari ance of c' X = 43 sample covariance of b' X and c' X = -3 1.5 3 .1 · . b' X. =-28 .5 -1.5 E · (.51 sample mean of b' X =~' ~. n part (a). l6 -2-2J1 (-11 2 Resul ts same as those .' (-1 21 C: -121 (".5 1.-..-X= 12 sample mean of c. S = -2.15 -2.1. (2 3) (:1 = 16 sample mean of :' ~ = (-1 2) (:1. X = -1 .

3. s~ =(1 -1 0 O)S(1 -1 0 0) =.154 ..873 .!:VJ:V -: !:V~V + ~V!:V = E(~') . Then y=(i -1 0 0)x=. . ' we have E(VV') = * + !. 'E(V' )" .E(~)!:V .!:V!..V.V!. .~V~ +~VJ:V .~V ..18 (a) Let y = Xi+X2+X3+X4 be the total energy consumption.~V _ +~V!:V :: E(~ ) .52 3.258 .: E(~ ) . s~ =(1 1 1 I)S(1 1 1 1) =3. I I) = E(~ .16 S 1 nee tv =E(~ -~V)(~ -~V)' I . Then y=(1 1 1 l)x=1.~ .913 (b) Let y = Xl -X2 be the excess of petroleum consumption over natural gas consumption.

B81' and has half-length ý' c = \I.9 . Ellpse centered at (0. The major ax lies in the direction e = I.Xl . .Xl .i)(X2 . haif-length .366\11.72.3) + -(X2 .8 x J2 50 I E I = .72 -(b) .8 x V2i J ¡i=(.J E=¡ -.94. I' = (n E =( 2 1 V2 and 2 L-l = v' ~) so I E I = 3/2 -4 .1)(x2 .72. The minor axis lies in the direction e =i-Aß-O .-Xi~X2 .S3 Chapter 4 4.39 = 1.2 ) ~c) c2 = x~(.9 .81.72 and E-1 = ~: i (i 1 2 2V2 2 ) (27l) .1 + .3)2) 1 ( )2 2V2( 2 2 .2)' with the major ax liav- in.460)'.39.5) = 1.ô34v'1.( -(Xi . V2"J 2 -T :i = (27l)'¡3/2 3Xi -23Xl(X2 + 3 X2 .72 2 .¡ c = \12.2 ta) We are given p = 2 .SSg.3) + -(X2 . .3) 4.S9 = .2) + -(X2 .2)4()2 (b) 2 2 2V2 4 2 3 3 3 -Xi . .1 (a) We are given p = 2 i 2 -.72 ( 4: i V2) :7 I(:i) = V: exp -.2) ) I( ) i (exp1-2"(2 2V2.1) + -(Xl .

_ ~ ).3 with A = _~ 1 to see that the covari anc. 1 +1a. (0 . 013 = 023 = 0 d) Yes. . by Result 4.5 that relates zero covariance to statisti~a1 in- dependence a) No.e i~ 10 and not o..3 We apply Result 4. (X1+XZ)/Z and X3 are jointly normal and their covariance is210 = 0.54 Constant density contour that contains 50% of the probability oc? I. 012 1 0 b) Yes.3. by Result 4.o -3 -2 -1 o 1 2 3 x1 4. form A * A i .. 3 2 ¿3 e) No.I. 023 = 0 c) Yes. ~ C' x ~0 .

As an example.7 (a) XilX3 is N(l + "&(X3 .X2-aiXi-a3X3) = : .5) .3X3 are dependent since they have nonzero covariance au + 20"12 .2X2 + X3 is N03. Since)(2 is independent of Xi.i. 1) c) x3lxi .(b) Xilx2.2) .5 = (3-2a3.5) . conditioning further on X2 does not change the answer from Part a).9) b) Require Cov (X2.2a3 = O.ss 4.3. te) Xl and Xl + 2X2 . Thus any ~ i = tai .a3J of the fonn ~ i requirement.4 a) 3Xi . . 3.x3 is N(-2xi-5. . ~c) X2 and X3 are independent sin-ce they have a bivariate normal distribution with covariance 0"23 = O.!) 4. X3 and X2 are independent since they have a trivariate normal distribution where al2 = û and a32 = o.a.5(xa . a) Xi/x2 is N(l'(XZ-2).x2 is N(¥x1+X2+3) .X3 is N(l + .1).~) b) X2/xi .6 (a) Xl and X2 are independent since they have a bivariate normal distribution with covariance 0"12 = O.a3J wi 11 meet the -a' = (1.3a13 = 4 + 2(0) . (d) Xl.2) . (b) Xl and X3 are dependent since they have nonzero covariancea13 = ..3( -1) = 7 4. 4.

Similarly.. ( ~: J is distributed N. with Cl = C3 = 1/4. (l. l~ J ) so the joint density of the 2p variables is I( v¡.) = (21l)pf lE I exp ( . ) E = iE.4 we have Ej=1 CjtLj = a and ( E1=1 c..) + -( -) + -( .for .=i bj/-j = l/. setting b1 = b2 = 1/4 and b3 = b4 = -1/4.5 we find that V 1 has mean EJ=1 Cj tLj = tL and covariance matrix ( E. ) L = fE. VI is N(O. l~ r (:: J ) 1 .S6 4.16 (a) By Result 4.Jl -( .8.s( VI E Vl + V2 E V2 4. iL)..(-)5. setting bi = b3 = bs = 1/5 and b2 = b4 = -1/5 we fid that V2 has mean ì:.p (0.8. Consequently. we know that Vi and V2 have covariance 4 (1 1 -1 1 1 1 -1 1 1 1) 1 (~b'c. (1 i -1 i -l ) ) = (27l)pl lE I exp . lL). we know that Viand V 2 are jointly multivariate normal with covariance 4 (1 1 -1 1 1 -1 -1 -1 ) ( L bjcj ) L = -( -) + -( .17 By Result 4. with Cl = C2 = C3 = C4 = Cs = 1/5 and /-j . we find that V2 is N(a. Again by Result 4. . C2 = C4 .8.) E = 0 j=1 4 4 4 4 4 4 4 4 That is.tL for j = 1. v. J (l.. (b) A.gain by Result 4. v.and covariance matrix ( ¿:J=1 b.~(v.L = lL. j = 1.=1 cJ ) .)+-( .)L= -( -)5+-5.. Similarly... J 1 "5 5 -)5+-( 5"5 '5 5-) ~=-E 25 .-1/4 and tLj = /.8. ...

~¥o -i). 195 has a Wishart distribution with 19 d.1) entry =011 + ~22 + tf33 .f.-J.57 4.6) i 1 L .11 we know that the maximum 1 He1 i hood estimat.24 +tf34 -'ZOlS +:tZ5 +-r35 +?'l ô .f.G13 + Z'23 l' (1 .(~JfP 1)1 b) From (4-23).Nô(~.io t). Then ~-~ .°46 .(GJ-m)(~H~J) '.t) c) From (4-23).m(i j) .2) entry = 0ô6 + :t55 + tf44 . 4.es of II and and t are x = (4.io t) and finally I2 (~-~) .-x = t tmH~J)(m-mHm-(~J)((:1-(m' -J .f13'ô (2. ~ .012 .N~(~.18 By Result 4.20 8(195)B' is a 2x2 matrix distributed as W19('1 BtBt) with 19 d.N~(~.2) entry = -r14 of :t. where 1 1 1 1 1 1 1 1'1 1 1 i a) BtB i has (1.-)' n _n j=l (x. = t tc~J Gi aj.za26 .-x)(x.((~J-m)mimn . b) stB' °31 =l °11 G33J .°S6 + zc45 °131 .

)'( n-1L )-l( X . ) is distributed as chi-square with p degrees of freedom. the sample size is 1 arge .Jl )'~-l( X .J.J. )'L-1( Xl .1 n-l~ ) (b) Xl . Since i L can be replaced by S.21 (a) X is distributed N4(J.J.J.J.S8 4.J. (c) Using Part a)i ( X . ) = n( X . ) is distributed as chi-square with p degrees of freedom.is distributed N4\OI L ) so ( Xl . (d) Approximately distributed as chi-square with p degrees of freedom.

) By (4-28) we ~onclude that ýn(X-~) S ~-~ is approximately b) X2 p.9351 (see Table 4. Q-Q Plot for Dow Jones Data 30 . Since TQ = .95 and n = 10. 20 . 0 . but the sample size n = 10 is small. 4.59 4. -20 -2 -1 0 1 q(i) (b) TQ = . cannot reject hypothesis of normality at the 10% leveL.95 ~ .23 (a) The Q-Q plot shown below is not particularly straight. . c i -1(.t) and that ~ is approximately Np(~'~ t).22' a) We see that n = 75 is a sufficiently lar"ge sample (compared with p)and apply R. .13 to get Iñ(~-!:) is approximately Hp(~. -C 10 . Diffcult to determine if data are normally distributed from the plot. . . 2 .esult 4. )C . -10 .2).

100 50 -2 -1 o 2 1 q(i) .'I.24 (a) Q-Q plots for sales and profits are given below. 150 . 10 -2 ~1 o 1 2 q(i) (b) The critical point for n = i 0 when a = . we cannot reject normality in either case. Q"4 P1Ót for l)rofits . although Q-Q plot for profits appears to be "straighter" than plot for sales. i 0 is .935 i. 200 ~ . TQ = . . Q-Q Plot for Sales 300 250 a. . TQ = . Plots not particularly straight. For sales.940 and for profits. Difficult to assess normality from plots with such a small sample size (n = 10).9351.968..60 4. lS . Since the values for both of these correlations are greater than .

7978 1.8365 4.3520 4.2831 4.5 Ordered SqDist .6416 2. 4 5 6 303.25 The chi-square plot for the world's largest companies data is shown below.0411 3.8 237054 Chi-square Ouantiles .8147 7 8 .1095 2.7 S = 303.3170 7.4073 1.1891 4.6 -35576 J x = 14.3142 1.2894 1. 5 4 1i is 3 g '" l! u 2 '! o 1 o o 2 1 3ChiSqQuantii.61 4. Information leading to the construction of this plot is also displayed.1083 5.2125 1.3518 . The plot is reasonably straight and it would be difficult to reject multivariate normality given the small sample size of n = i O.6J (-35576 7476.9091 26.6 710.2 -1053.0195 3.9 (155.8 -l053.6418 2.6430 3.

62 x=( 12.7539 1..1898 . . (b) Since xi(.2153 50% contour. . 3.48 5.7102 J s-I 1.8165.5) = 1.7329. . 1.0203. 2.2569 4. 2 (d) Given the results in pars (b) and (c) and the small number of observations (n = 10).2569 =(2. .7102 30.0176.3105. 4. . .8544' J Thus dJ = 1.6222 -17. . CÍ1i~squåre pløt for .8753.20J s=( 10. it is diffcult to reject bivarate normality. . 5 observations (50%) are within the (c) The chi-square plot is shown below.9009. 2.3753.26 (a) ' -17.39.7353. .

5 i. .5 \'a(i) %. *3* **2* :..a = . \ -2. * 2 2 2 so.5 Since r-q = .S -0.4. S i I -1.2 for n = 40 and . 63 * * * 100.27 q-~ plot is shown below.' 40..970 -i . *".05) t we would rejet the hypothesis of normality at the 5% leveL. * * * * * :\ 20.3 3 2* 2* 60.973 (See Table 4.5 0..

= 1.8162 1 .7783 8.30 (a) ~ = 0.3159 2.9351 the critical point for testing normality with n = 10 and a = . 47£ 1 1. 1849 1. 7647 3. We cannot reject the hypothesis of normality at the 10% level (and.3771 0.1901 o .1472 0.64 4. Generalized distances are as follows. (b) ~ = 1 (i.6392 0.10.1 O. 6592 10.4584 1.29 (a).= 1 so these values are not ruled out by the data. 1225 1 .1388 0. not at the 5% level).1489 1 .0360 0. For ~ = 1. TQ = .4135 o . 7874 O.2489 3.8856 o .8985 2 .9351 the critical point for testing normality with n = 10 and a = . (c). 80 14 (b).ti) = 1. 1388 7 .4731 1. 6283 0.4135 1. CHI-SQUARE PLOT FOR (X1 X2) 8 w a: 8 ~ ~ 4 c 2 0 0 2 4 6 8 10 ~saUARE 4. 8988 4. 7032 0. (c) The likelihood function 1~Â" --) is fairly flat in the region of Â.4607 1 . x = (~~4~:~~:~)' s = (11. So the proportion is 26/42=0. not at the 5% level).e.4607 o . These results are consistent with those in parts (a) and (b). no transformation). -. 2. 6592 O.0089 6.39 is 26. 0857 o .6190.1225 o .6494 o .3566 o .0360 2.6370 o . For ~ = 1.363531 3~:~~:~~~).4438 0. consequently.5 but ~ = 1 (i.e.7741 0.6228 5.981 ~.971 ~. n-n niot~ follow . The number of observations whose generalized distances are less than X2\O. TQ = . 1380 O. consequently. We cannot reject the hypothesis of normality at the 10% level (and. no transformation) not ruled out by data.

95986* 0. Those for (Xh X2).98098- Transformation (Xl + 0. given in the next page.USïO for n=98).4 0.59 x.S 0.ûå5)0.2S 4 X6 Xs 0.99057 - 0.005)°.9970 0.96341 X4 0.'(05)°. The multiple-sclerosis group: rQ - - - (X5 + 0.32 Xl X2 X3 X4 rQ 0.98079 *: significant at 5 % level (the ci"itical point = 'Ü. 4.s 1 rQ (Xs + 0.94482X-o.96133Xi3.9640 for n=29).32 Transformation *: significant at 5 % level (the critical point = 0.¡0.94526- 0.97209 0.005)°.4 0.21 (X3 + 0.91137 0.978-69 Xs 0. .65 4. (Xh X3).95039- X3 0. Transformations of X3 and X4 do not improve the approximaii-on to normality V~l"y much because there are too many zeros. 4.98464 - 0.005)°.31 The non-multiple-scle"rosis group: X2 X3 X4 Xs 0.005)-0.95585(X3 + 0.26 Transformation *: significant at 5 % level (the critical point = 0.9826 for n=69).79523- X4 0.92779(Xs + 0.94446- Xl X2 X3 0.91574X¡3.Sl (the critical point = O.49 *: significant at 5 % level - XO.33 Marginal Normality: rQ Xl X2 0.9652 for n=30).84135- Xi 0. Bivariate Normality: the X2 plots are (X31 X4) appear reasonably straight.

X4) 8 8 6 w a: w ~ c ~. :f CJ 2 2 0 0 0 5 10 e-SOUARE 15 0 2 " e-SORE 6 a .¿ ~ ~ 8 i: c " is is 2 2 0 0 2 0 " 8 6 CHI-SQUARE PLOT FOR (X2.X3) CHI-SQUARE PLOT FOR tX1 .X3) CHI-SQUARE PLOT FOR (X1.X4) 8 8 8 w 6 a: c ~ i: ~ 8 e-SOARE e-SOARE w 10 e-SORE e-SOUARE i: c~ 8 6 " 2 0 10 " " :f (.¿ " :f 6 " is CJ 2 2 0 0 0 2 " 8 8 " 2 0 12 10 8 10 12 CHI-SQUARE PLOT FOR (X3.X2) 8 8 6 w " ~ .X4) CHI-SQUARE PLOT FOR (X2.66 CHI-SQUARE PLOT FOR (X1.

34 Mar.97209 0. rQ increases to .979 for density.ginal Normality: Xl rQ. 0.95162- X2 X3 X4 0.991 .9591 for n==25).974 for n = 41 From the chi-square plot (see below). 4.897* * significant at the 5% level. Bivariate Normalitv: Omitted. Moreover. the chi-square plot is considerably more "straight line like" and it is difficult to reject a hypothesis of multivariate normality.i .(CrossDir) Xl (Density) .924* rQ I . it is virtually unchanged (.98421 0. Chi-square Plot 3S :l 25 :! 15 Chi-square Plot without observation 25 10 6 10 12 2 4 6 B 10 12 .99011 Xs 0.99404 *: significant at 5 % level (the critical point == 0.35 Marginal normality: & (MachDir) X. If this observation is removed.67- 4. it is obvious that observation #25 is a multivariate outlier.926).98124 X6 0.992) for machine direction and cross direction (. critical point = .

984 . There are several multivariate outliers. times measured in meters/second for the various distances are more nearly marginally normal than times measured in seconds or minutes (see Exercise 4.978 for n = 54 Notice how the values of rQ decrease with increasing distance.921* * significant at the 5% level.985 . 4.37 Marginal normality: 100m 200m 400m 800m rQ I .36).866* . .968* 1500m 3000m Marathon .909* .929* .976* .947* . There are several multivariate outliers.68 4. The chi-square plot is not consistent with multivariate normality. Notice the values of rQ decrease with increasing distance.978 for n = S4 As measured by rQ. The chi-square plot is not consistent with multivariate normality. the distribution of times becomes increasingly skewed to the right. In this case.989 . As the distance increases. critical point = .952* 1500m 3000m Marathon .969* .859* * significant at the 5% level.36 Marginal normality: 100m 200m 400m 800m rQ I .983 . critical point = . as the distance increases the distribution of times becomes increasingly skewed to the left.

Marginal and multivariate normality of bull data Normaliy of Bull Data A chi-square plot of the ordered distances o C\ r:l/ .0 .. 'C CI ~0 -ë ...... .69 4. 2 4 6 8 10 12 14 16 18 qchisq . c: ai I/ u...38... o lt . .. -2 .C' -2 -1 0 1 Quantiles of Standard Nonnal 2 -2 -1 0 1 Ouantiles of Standard Norml 2 . iu.2 -1 0 1 2 Quantiles of Standard Nonnal 00 .¡ ... II 0010 =- :i ai 0 .01 r = 0. . d lt co -2 0 co II 0 1 Quantiles of Standard Nonnal -1 2 ...g ¡¡ 00 en I/ ..9934 normal 00 _ is: . Gl _I/ :i CI co Æ ..9847 nonnal r = 0.' ~'" .C' C\ _I/ 01 "8 0 .9631 0 -1 Quantlles of 1 2 Standard Nonnal I/ c: r = 0. 0I/ ~ :: u. I/ r = 0. ~~ oX 0 ai 0i- C\ . C\ I/ 0 I/ 00 . .1 0 2 1 -2 Quantiles of Standard Norml 0 II not nonnal r = 0. .9956 normal lt r = 0..9376 not nonnal . ~ a.9916 normal 00 .

1213 4.3870 1.3982 10. 7797 34 4.6937 67 9 .4472 5 .1011 3 . 1129 5 .1263 73 11.8108 6680.9929 3.4599 4.9286 64 8.1430 -0.2891 5 . 4963 62 8 .1986 5 .6795 33 4.0080 Ordered dsq qchisq 51 52 53 54 55 56 57 58 59 6 .3191 69 10.9273 11.6197 5.0560 2.3115 15 3.2766 14 3.or the Q . 6097 3 .6725 2.0189 2.8618 4 . 7748 8 .9254 2.9254 -0.0534 209.0413 4 .2522 4 . 3982 2.6287 8. We reject the hypothesis of multivariate normality at a = 0.9839.4411 3 .7081 o . 3004 5.5808 2. 8587 5.9078 4.1430 Ordered dsq qchisq 26 3.5453 3 .6958 10. 9863 7 .8037 11.4130 147.4024 5 . 7603 4. 1896 5 .7225 75 17. 1263 1555. 3669 17 3.2896 16850. 1405 7 .4744 13.5743 2.4017 5 .6149 15.6485 3.5751 17. with a = 0. 7007 3 . 5279 2 .0677 76 21.3830 30 4. 9980 100 .9831 82.5938 6. 9118 2.Q plot correlation coeffcient test for normality is 0.9826 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 4.5512 1 .5099 5 . 0873 8 .1286 1 . 3048 3.5801 32 4 .4577 7 .8449 5 . 7313 5 . 1276 7 .2896 83.6681 3 .5044 -0 . 1876 5.9401 6680 .7551 2. 2763 6 .7832 2. the critical point f.2851 29 4. 8957 3.1967 54.6751 6.05. 7751 1.57'60 6. 3088 YrHgt 5-0.2828 4 .0902 27 3 .0534 -1.2524 6.4130 4. 8667 4. 6693 6 .9474 70. 0855 5.9401 82.2021 1 .5906 2.3721 18 3.2895 1 2 3 4 5 6 7 8 2 .3793 2.7754 6.70 XBAR S FtFrBody 100.1875 28 3 . 0495 2.6491 6.5044 -1.6829 70 10.0506 SaleHt SaleWt 2. 1763 7 .8168 7 .4141 19 3 .3518 5 .6027 22 23 24 25 3.0783 5. because some marginals are not normaL.8108 129.4197 66 9 .4812 31 4. 3088 3.9971 3. 8439 6. 7236 3. 1657 65 9. 7604 2.430 1 7.881-6 0. 2036 3 .7470 1.3189 3.9281 7. 2679 13 3.1911 2. 1445 4.0506 2.7062 63 8 .8912 2.8160 74 12. 7762 1. 3396 0. 7983 3. 9600 -0.6430 8 .6060 6 .6618 PrctFFB BkFat 2 .5587 9 10 11 12 3 .05 and n = 76.9917 68 10.321$12.3006 12. 3264 6.4049 20 3.7395 3.4142 83 .5224 995.3973 9. 0936 71 10. 1305 Ordered dsq qchisq 1 .6333 6. 5041 21 3 . 8806 35 4. 9600 209.9836 6 . 3989 9.1305 8594. .3470 16 3 .5816 8.7940 9.9605 5.1085 7 . 7554 7.2975 60 61 8 .5665 72 11.9831 129.6917 -0.6254 10. 2949 5 .6748 6 .0180 147.6524 9. 2244 4.4142 -0.8649 From Table 4.5896 7 .3215 6.3439 2 .2. ..e.71 4. Certainly if this observation is deleted would be hard to argue against multivariate normality. supp...984* .991 * significant at the 5% level. Plot is straight with the exception of observation #60. ".990 for n = 130 (b) The chi-square plot is shown below. . critical point = . ~& .993 . leader 15 .5).. o o 2 4 6 8 10 12 14 16 18 q(u-. rQ = .998 and we cannot reject normality at the 5% leveL. normality is rejected at the 5% level for leadership. benev. conform. . 1 = 0.997 ..997 rQ I . . . 10 . du)A2 5 .39 (a) Marginal normality: independence support benevolence conformity leadership . Chi-square plot for indep. If leadership is transformed by taking the square root (i.5)/130) (c) Using the rQ statistic. 960 after transformation.e. square root) makes the size observations more nearly normaL.01/.9389.u. 10 -1 1 (c) The power transformation ~ = 0 (i. Great Smoky park is an outlier.given next. The Q-Q plot for the transformd observations is given below.72 4.. rQ = ..' . logarithm) makes the visitor observations more nearly normaL..904 before transformation and rQ = .40 (a) Scatterplot is shown below.e.9389. The 5% critical point with n = 15 for the hypothesis of normality is ..t . o G. The 5% critical point with n = 1"5 for the hypothesis of normality is . cO "Visitors 5 6 8 7 9 (b) The power transformation -l = 0.975 after transformation.5 (i.837 before transformation and rQ == . . The Q-Q plot for the transformed observations is .. . rQ = .0': 500 o -.. ..... .... Given the small sample size (n = i 5).... transformed nat Chi-square plot for .. 2 3 4 5 Chi~square quantiles 6 7 .-.. ........ ".-. "... the plot is reasonably straight and it would be hard to reject bivarate normality... -..73 (d) A chi-square plot for the transformed observations is shown below......-........ o Ð 1 ... .'.. _. .:.'... ... Dutå60n Q-QPlatfotNatural Log 3. (b) The power transformation ~ = 0 (i.0 2. There do not appear to be any outliers with the possible exception of observation #21.958 before transformation and TQ = .74 4.41 (a) Scatterplot is shown below.9591.5 1.5 (I 5' o~ 2.0 -2 -1 o q(i) 1 2 .989 after transformation. The 5% critical point with n = 25 for the hypothesis of normality is . The Q-Q plot for the transformed observations is given below.e. logarithm) makes the duration observations more nearly normaL.0 1. TQ = . .::-...~ quanti 6 7 8 ..::'_'..:.... The Q-Q plot for the transformed observations is given next.:. ...snow... . ..'.:'...'__d"":.939 before transformation and rQ = ..:.'d. ". . ... ' .. The plot is straight and it would be difficult to reject bivariate normality.:.'._. . ManlMachinl! Time . .0 0'0 10 8 6 o ..._'"':::'..""" .. _ "".991 after transformation.7S (c) The power transformation t = -0. Ci.-.":':'"::-_'..' "'.."'.' _ '..-." 'J..:':::._.'..QeQ'.. -. reciprocal of square root) makes the man/machine time observations more nearly normaL.5 (i. . ..:::. rQ = .rernovat data . 2 .'.e. . The 5% critical point with n = 25 for the hypothesis of normality is ._:....'g'~ Chi-square plot for transformed. . ..c_'. 3 4 5 Chi-squa.plot for Reê:iprocal of Square Root of . .. ....9591. o .2 -1 o 2 1 q(i) (d) A chi-square plot for the transformed observations is shown below.. .) ~ " (i60) ..0325 .- b) li . we do not reject HO at the a" .g1 (:j-~O)(:j-~O)'! 5. 5.1 since \11 = (..60) is inside the ellipse.-i)(xJ.05.19.40( .60).3 a) TZ .64 (.3 = 13:64 !j=l r (x.I j~i (~r~~H~j-~o)' i/ Wil ks i 1 ambda '" A2/n = A1/Z = '.23) = 6. do not reject H1l at the a .05) .2 (see (5-5)) c) HO :~.2. 2~~) F2.40( .11) a =.05 so F2. s" (-i-:/3 f2 = 1 SO/ll = 13.05) =. FZ ..0325 (44)2 ..-i)'! ..00 Since T2 _ 13. .2(.17 a -. -J .'55. 3.05 level.5 HO:~' = (.17 (.1803 5. The r. n ..' 3FZ.23 Since TZ '" 1..1 .2(.7-6 Chapter 5 -i:13).(n-1) = 3(~4) . TZ = 1.J..62..05(3.05) = 3(19) =57.64 b) T2 is 3ri. ~ (7.05~ .'5'5.05 1 eve1 n (n-1)!.(I Jïi (~j-~)(~j-~) 'I 'r = 244 =.esult is ~onsistent with the 9Si confidence ellipse - for ~ pi~tured in Figure 5. ) el X-\1 .92.56.025/6) = 2.04.: .(.49. 19.SA - 42(~.17.74) d) Because ofthe high positive correlation between weight (Xi) and girt~X4).61.939 (See confidence ellpsoid in par d.CI-S -0 5. 115.002799 .5'5 J ) -1.. 61.9a'"J .27.52 -Pi) Beginng at the center x' = (95.J12.£.u4 .. -1.48) Gii1h: (86.8181 t18L818 212.76) Head length: (16.343 :t . 100.29) Body leng: (16.J3695.014) .728 (Alternative multiplier is z(.P4 L.Ji 2. ) (.01. .636 -1 9091 .41) lengt: (152.59(. 60.006927J(9S.59/61 = .93.52.'3L.00. than the 95% Bonferroni rectangle.55.909j 5.909 = (2.01 i71f2. ':J .22) b) 95% confidence region determined by all Pi. 33.48) Girt: (83. 103.5 9(' 939) .0117 .0144 .52. the 95% confidence ellpse is smaller.2064 .39 ~93. 173.Sti4 J -( . 19.343) . 32.0146jL-i.J45.56.603 .) c) Bonferroni 95% simultaneous confidence intervals (m = 6): 160 (.52 ...51) Body lengt: (155.0g) Neck: (51.636 J tZ = n(~'(~-~O))Z = a' .006927--.P4 such that (95. 176. (75.up93.6361 1. the axes of the 95% confidence ellpsoid are: major axis . more informative.39 12.025 / 6) = 2.8 f227.9 a) Large sample 95% T simultaeous confdence intervals: Weight: (69.77) Head width: (29.52.273 -181.31 = TZ r2.121 J .37) Head width: (29.019248 .60 ((. 121.638) Weight: .39).77 -1(. minor axis :t .~.003 2 = 1.59) Head Neck: (49. . ..tls ) follows.us:: 13.-.. ... .. ..025/7) = 2. . . ..690) n 61 x6 -xs :tt60 . . .. large sample simultaneous Bonferroni ..0 LO 0 .. . . . i I : i : :. .95 or 12.. . . . . . .u6 . .78 5.. .. LO CD I .98) +_ 2. . . ..-~ -------.49:: tl6 -.81 ..---.. .Continued) Large sample 95% confidence regions.(0036) S66 . . .'. ....025/7) = 2.88) + 9.0 "' -. .783 (Alternative multiplier is z(... . . (m = 7 to allow for new statement and statements about individual means): t60 (.9 ... . "l . ... .78~~2i. . ~ 0 CD 60 70 80 100 90 110 120 130 x1 e) Bonferroni 95% simultaneous confidence interval for difference between mean head width and mean head lengt (. 0 . . .2sS6 + sss = (31...--. I..26 -2(13. ... .---- LO )( C' 0 C' I : .13 -17. J . . . -' . . . 43) i1Lngth4-5: (-7.33.96 .24) ~Lngt-4: (-22.70.10 a) 95% T simultanous confidence intervals: Lngt: (13D..) d) Bonferroni 95% simultaneous confdence intervals (m = 7): Lngt: (137.65.J33.53.4-tl4_s ( i. 33.4).895) :tv157.Oll024 .tl4-S such that . 155. 198. 185.~~. 28.40) Lngth5: (166.00.97) .58) Lngt5: (155.96. 16-tl2_3.55. ( .55) i1Lngt3-4: (-3.37. 50.J72.95) Lngt3: (127. 149. 30.18.69.21) Lngt4: (167.895 447) (See confidence ellpsoid in par e.25. 15.43..447 mior axis :t . 53.025135 4 .95. the 95% confidence Beginnng at the center x' = (16.6Lngth2-3: (-1.42 .simultaneous intervals for change in lengt (ALngt): ~Lngth2-3: (-21.009386 . and tl4-S is the mean increase in length from year 4 to 5.91) b) 95% T.96/7=10.u2-3 is the mean increase in length from year 2 to 3. 174. 191. .u2-3) . 179.8 72.14) Lngth3: (144.37.. 187.69) c) 95% confidenceregon determined by all tl2-3.24.93) Lngt4: (160.33) .009386J(16. the axes of ellpsoid are: maior axis .14..79 5.~72.u4-S where .42) ~Lngth4-5: (-20. .. I : I . . .. . . .. ... -20 o 20 J. I ..0 :: 0 0. J.. ..... .10 (Continued) e) The Bonferroni 95% confidence rectangle is much smaller and more informative than the 95% confidence ellpse. .... vI .. . I .... .. . ... .. .. .. . 95% confidence regions..----------------~ .... : I i : I I I --~--------..2-3 40 ..... .. ..... .. ...~.. . o "l simultaneous T"2 o C" Bonferroni .. . .. .'80 -5.... . . .. ... . . .. .... 0C\ . .. . I 0C\ I I I I I I I I I I I . . .. .. . . ......... .... .... ...... . .. 45 Confidence Region 45 40 35 ~ L.0508 ~ .10) = T (3.n_p(.42 = 15. 36.0276 J -.1856.88.25) Sr: (-4.2412J.(5.094 £1 = (.97) (.-.10) =: 7 F2. . i. 17..8493 .0042 . S-1 =( 287.2412 527.759 ~ -- A .49.30.87. 16.11 a) E' =. 287. I -10 J x1 ( C r ) b) 90% T intervals for the full data set: Cr: (-6.0276 .49) §i 16 Fp.26) = 7. V) 'O N )( !' 15 I 20 25 30 35 40 45 -10 - . = (.0700) S = (176. 1 OJ' is a plausible value for i.0169 Eigenvalues and eigenvectors of S: .87) .t = 688.r- .83.81 5.7(. . -l. . 8. -UL .01 normty for d) With data point (40. 7I eo 50 u.5 1.0 1. -1.0..68) removed.0 0.0 os 1.3786 S =b r .627 we rejec the hypothesis of 80 .5 nomsrSr Since r= 0. .. 73. 0149 -T p1n-p 'I 1.47.. ii = (.9) Sc: (.53.10) '" 164 (3..0 .818 we rejec the hypothesis of this varable at a = 0.5 0. S-1·(2.11 (Continued) c) Q-Q pJotsfor the margial distributions of both varables .07 90% r intervals: Cr: (..8598..4'6) ~. 1.0303 1.5 nomscor normty for ths varable at a = 0. 17. .15. .0 1.0303 J 69..01 Since r = 0. . 40 30 20 10 0 ..10)= 7(62t F" 6(..8688). 10 .0.27) . F (.7675. -1.0406 J . oi 30 020 o.0406 -.7518 .82 5. . 8.5 0. 0. 5.5 i The first revised estimates are 'ß = 6. (0.0 0. 2. 0. 5.PiPIi =-PiPk.7799 0. 0.p¡)2(1 . the 95 % confidence intervals for Pi. 5. .8546 25 0. ( 4 i .P2 is (-0.6449 0. 0.8691 100 0. (b).7847 5.12 Initial estimates are 2 1.370). P3 = 0. 0.6911 0.8745 m 4 10 0.500 0.585.0 0.6836 0.084.219. Using Pj:l vx'5(O.310.a..16 (6).ßi :l Vx3.7749 0.(0.05)V(pi(1 .098.6.05)VPj(1 .105.401).13 The X2distribution with 3 degrees of freeom. Xkj) = E(XijXik) .Pi) (b).pi)2pi +(0 .6042 0.0833 i -( 0.2ßiPi) In.Pi) = Pi(1 .14 Length of one-at-a time t-interval / Length of n 2 15 0.682).7644 D.8125 i 5. There is no significant difference in two proportions.029.83 5. 0. respectively.Pi)fn.2500 1.5 'ß .6983 "0.0394). E = 2.9375 ( 4.-D5)VPj(1 . Using Pj:: vx3. the 95 %.221.488.ßi) .pj)ln.confidence intervals for Pi. 11.(0. 0.15 (0).0. . Var(Xij) = (1 . (0.0 .112).044.0000 . P2. P4.(0.412). E(Xij) = (l)Pi + (0)(1 . the 95 % confdence interval for Pi . ('0.6678 0.118.ßi) + ßi(1 .7489 0.E(Xij)E(Xkj) = 0 . Using Pi .17 ßi = 0. (0.5 0.Pi) = Pi.198) respectively. Bonferroni interval = tn_i(a/2)/tn_i(a/2m). Ps are (0.8632 -50 0. P2.258.(0. ~ . ßi = 0. 0.lô6).8718 00 0. COV(Xij.\0.1667 0.217).2. 11 are "(0. .oi o.e : . .'1. . -x __ (18£0. .19 a) The summary statistics are: 361"621 .473.. . .031 n = 30. That is...Lf xM i J ii o..50.. .30)'.: :. 0.. 1. 700 600 -.. 700 70 / 60 I' . .. . :..t .0 .-. 50' . 60 . N x I .. 0.03 348"6330. .~... i.. .o 30 40 50 60 70 15 20 2S 30 35 X2 X3 15 20 2S 30 35 X3 5.. 50 40 . . 1 0 I. ..- 25 20 30 -2 - ..103 .2 2 1 2 1 NORMAL SCORE NORMA SCORE NORMA SCORE 0 -1 ..._ .c ."1 -1 0 - . i .13 and s = (124055.999 ) ( -0.. ..17 361621 . . .33.. .. .. .- 30 J 50 40 400 35 ~ -1 0 .18 \lo).... Hotellng's T2 = 223. . . : ... ... . The group of students represented by scores are significantly different from average college students. x 500 400 I... 50J ~354 ..9'0 J ...994 .. .C ~ ..84 5. .037 0. . ..-.0 . .. The lengths of three axes are 23. We reject Ho : fl = (500.. 01 . ..038 ) ( 0.010 .. 30 ..183... (b). 0.....0. And directions of corresponding ax.! . 0 0 0 70 700 . .o .I- . -- 15 -2 2 1 -. 400 .. ..a.31..a. ... .:. .-. .. 500 I" ir 60 I' I . .. . 60 .. . . .05) is 8. .104 0.-I. . .. I.995 .. Data look fairly normaL.(c)...2. . . The critical point for the statistic (0: = 0. .006 ) ( -0. .730. .. 0. .. . - - . are -0. . . I ! . . . I '. 'v. : i : : ~ I : i : '.1 . .. I ! i . : .._. . .. - is given by the set of \1 a 95% confidence region for ~ - (1860. . : .5tl-~lJ . .~ = 138 ~ 1. ! : I i ! . 361621 . ... : i ~E . 2306 .13-~2) . . .000 - . /' i : .-.03 348633tl.fJ"w : .aoø. .. ~ ._- -~ Ì" /. : ! i . .__.4 and The l. n. - . IOQ" i 2. i i : ß~5'4.. .17 3'61~21. '1 .994394. ¡ .Fp n_p(a) = 3~ 2~i) F2 2St ..2306. since 1 p(.2300 Ir = 886. . .. i . : -- : f j . --_. : .øo. i . i .:~) . i . Th~refore the ell ipse has the form -------_. 2. i .. . l.tl5) = ..1 I 1 .. ' .. --_. ... . .. I ! i .2306 half lengths of the axes of this ellipse are 1. ' Then. .'50-\11' 8354. -----------_.13 .. . 3öOO ' I '.. ". ~ . . ! l : 11 .ft Xl : . . .105740. I : . ..13-~2 (124055.~~. 90 83~4 .994394) _1 Å2 = 82748 !2 = (.So .------_.03J' ~1(1860.1 0574~) . :¿1. i : .85 wher~ S has e i g~nva 1 ues and e. . : '. : .. ._. . i "J JI : ! . : . ¡ '- I 10000 : . ! i J . i : . g~nv ect~rs Å1 = 3407292 e~ = (. :: -- - l I . _--_. :i -2..989 and the correlation ...._--... ***** . .989 0...-. Finally.-------***" * .2--.0 1._-- * * * 6000. Ai so . the bivariate normal distribution is a plausible probability model for these data... we would reJect the hypothesis HO:~ = ~O at the 5% level. on .' . _._-"- .'0 t:rr._. coefficients for the test of normal ity are ..-- ** 8000 . the scatter diagram (see below) does not indicate departure from bivariate normality.0 I. -1.0 ------_.-_..86 b) Since ~O = (2000. the data analyz~d are not consistent with these values. ! ._-* *** ---. .._..~ 2._- ._--._--- . Q-Q Plot-Bend i n9 Strength X2 12000... Thus. l . . * . So.990 respectively so that we fail to reject even at the ii signifi- cance level..._...e 1 at.. 10000)' does not fall within the 9Siconfidence ellipse. c) The Q-Q plots for both stiffness and bending strength (see below) show that the marginal normal ity is not seri ously viol ated.. .._----- * * * 10000. ... ** ** .0 3. * 4000. -.. .. _..0 _ ____....._____. ***2 *2 ** ** **** 1600 .___ ._--_ -_. _.0 ._-.. .-.."__0"" .---" ._---" 1200.__._-_ _=J._~ ...9. .~... .9. * * 800._----------. 2.0 . ._ .990 .._...Correlation . I 0.-... * * * 2400 .._ _ _ _.. -..-_. -2. * * * .87 Q-Q Plot-Stiffness Xi 2800... * * *** * ::ooo. ..!.. 1. .._.__.._ .. ---~------I--:.0 . . ** * * * * * . . _.__.. * * * 2000. _.. ......... ._-.- ........ X2 ....-------. * ** .. 10000...._-~._.. * * . '. . 12000. ... ...---. -. 6000._... -. 2400 . . --. .-.'88 Sea tter 01 agram -_.. ..---.. i 4000 .... ... -..----_._..-~r..-:- ..*... I 4000 ... ..__...._..__.__. . .-_._-..-- I 80ÖO...... ..~. ._-_.._---- 1200. * * * * * **** 1600.._...-_...- ** * * *...... --------- 800.. ._-.. .1---0. 823 274.822 UPPER 197 .- 0 NORMAL SCORE 2 .: . 260 ..774 LOWER Bonferroni C.~ ii i ii.. ..o i .. i. ..20 (6)... iiu.. .. ¥ i" i' . 25S 285. . . . 280 270 .89 5. 200 .nsforming XI. 310 . .. . i. 189 ..299 Simultaneous confidence intervals are larger than Bonferroni's confidence intervals. Could try tra.. ii.. 'ii .. they are plausible since the hypothesized vector eo (denoted as . i . 300 . 110 II .: 189.782 284..2 280 270 .. . 180 r . in the plot) is inside the "95% confidence region. . . 250 -2 -1 0 NOMA SC-QRE 2 ISO 20 X1 . xN . ... (11)... )( 190 ./ . "7 ... 260 250 -1 . i. 96li S1mullJeouB Cooldence Region for Wean Veclor iiI ii. 1'. . 300 J 290 .. 423 274. . 210 -. Q~Q plots suggests non-normality of (Xii X2). --- . taneous confidence intervals wil touch the simultaneous confidence region Q-Q PLOT FOR X1 Q-Q PLOT FOR X2 310 -. Simulfrom outside.. " i.: Simultaneous C. i Ui i' . . (c). 29 xN . '..422 197 . . Yes. . 778 .723 1.465.88 and .79268 1.73484 0. t ( .81832 1.10756 0.642 .11402 0.69384 STDEV 0.10685 0.602 .9 .742 . 887 .28347 0.046 Bonferroni the T2 intevals use the constant 4.21 HOTELLING T SQUARE .635 .10295 . 95"6 1. 970 .766 .909 .26360 0.00417 ) TO 1. 583 1.753 .608 .786 .70440 0.800 The intervals use BONFERRONI TO - 2.~218 P-VALUE 0. 499 .84380 0.757 .540 1.914 2. 629 1.946 .90 5.880 1.3616 T2 INTERVAL xl x2 x3 x4 x5 x6 N 2S 25 25 25 25 25 MEAN 0. . 4 0 . .'.. X2 -2 ~ xM o.. 8 8 6 4 4 5 20 15 25 . X1 4 ~ ... 4 .. . .. x 14 12 10 .. . the approximation to normality is improved. . . o.X is 20 25 a: W .- . 2 4 6 8 10 X2 14 .. ø..' . 6 0 2 .. :. . 8 10 X XI x ..... . .'-- 8 6 0 .. -2 NOMA SCRE .1 2 -2 NOMA SCORE :: ~ ... .. -2 iI 8 14 . 4 2 . -- x.22 91 la). . 10 . After eliminating outliers. . '" . .l 8 . a-a PLOT FOR X1 a-a PL-DT FOR X2 a-a PLOT FOR X3 30 18 2S 15 20 X C/ W . ."" ..l:: a l:: a:: . .. o .. 5 ... 2 4 II . 10 5 . . '. .... . 14 12 10 8 .. 6 II 4 4 4 8 8 10 X1 14 -I 0 2 NOMA SCRE 111 12 10 . .1 14 '" . 10 111 . 30 10 S 1S a-a PLOT FOR X1 a-a PLOT FOR X2 a-a PLOT FOR X3 12 10 8 6 . .. :: 0 ~ .. . . . . ... . . . 6 .. .. ILL .. . .. . .... . . 18 16 14 12 10 8 4 NORMA SCRE II 10 I. .2 14 .. 2 0 -1 . 15 l- -I -2 0 .. 0 -1 2 NORMA SCORE l- 15 ~ . 16 14 12 10 Xi 14 X 18 30 18 CJ 18 18 14 12 10 I. '" NOM4 SCORE l- 12 10 8 -_. 5 2 .... .... _....- ~ .'.. .... . ..5. 5 ..' 10 a: . 14 12 10 .. 55 8.: Simultaneous C.79 15.67 12.78 10.44 9.96 11.34 Simul taneous C.41 13.33 5.24 10. Outliers remov.23 8. I.72 8.: 9.63 5.65 12.edi~ LOWER UPPER Bonferroni c. i. i.24 8.25 4.87 9.87 .09 12. (b) Full data set: Bonferroni C.76 Simultaneous confidence intervals are larger than ßonferroni's confidence intervals.92 l.21 15.19 12.16 5.82 12.: 9. I.: Lower Upper 9. . ..fe.= . . . . 55 110 - -i s: . . . .. 120 2 NScMB -2 -1 0 2 NScBH ~= . C) :i .c 't CD x t' :2 130 - .. . . I i 0 1 . . ¡ 10 . . . ... C) Ul lU m 100 - .23 a) The data appear to be multivanate normal as shown by the "straightness" of the Q-Q plòts and chi-square plot below. . . 45 - . I i "U 5 . . . -1 i -2 -1 . . .994 i. . .93 5. .. 140 ..4l. . .( -. . ... . . . .. . 120 - . . -2 -1 0 1 2 . . . . .. 140 - . or 0) :i Ul lU 130 ID .. . . . . .= . 90 - . .c Ul lU 50 - Z . . .. . . .. . -T . . .5)/30) i 2 NScNH i.1 I I . -2 '. - . . . = . I -1 0 ! r. . . . .995 10 - . . . NScBL d¿) .992 o - . 5 _. - .97'6 . . 02) (49.43.66) (130.05/8) = 2.94 5.17.31) The Bonferroni intervals are slightly shorter than the T intervals.89) 95% T simultaneous confidence intervals: 4(29) F (. 52. 136. 51.47) (95.496 26 4. 134.87) (131.05) = 3.23 (Continued) b) Bonferroni 95% simultaneous confidence intervals (m = p = 4): t29 (.42. 102.26 MaxBrt: BasHgt: BasLngt: NasHgth: (128. 102.87. .663 MaxBrt: BasHgth: BasLngth: NasHgt: (128.75.32. 133.91) (48. 135.08.78) (96.73. .....5 0 0 0 ...........9 1207.6 1303............ 12 14 16 ...5 0 ...8 ô06.0 MeetOT 1738........... . 0 9654. 0 0 0 Q) . 2 6300 ........a....070...... 2 4 6 8 10 12 14 16 Observation Number The XBAR chart for x4 = COA hours ai :: ii . ..........4 1182...........__.......2 800...... ---------------------------------------- ..8 -2........9 Holdover 2676......0 0 a(" .4 1478.............. .. Wisconsin.. Police Department data LegalOT ExtraOT xbar s 3557.--............. ....... ....9S 5... ...................7 COA 13563.... ........... .....0 CD 0 0 0 (" 0 ......: ii:: "C ":..0 0 .... .............. 1 -622.2 2222 .... . .. 1 use use L-CL = 0 LCL = 0 use LCL=O The XBAR chart for x3 = holdover hours 0 ai :: ii .. 'e .... ..y...... . ...... 1 -946 ... ...... .. 'e .... ..... ...... .. ....24 Individual X charts for the Madison... .0 5026.....0 17473. .. 2 4 6 8 10 Observation Number Both holdover and COA hours are stable and in control....................... ... .0 474.. . .........: ii:: "C ":...5 LeL UCL 5377....... ......13564) =5........... ......__. ..._..._..... ....e in control.80 X 1O-6(x3 ... .....37x 10-6(X3 .....+ .. ... .n.. quality control 95% ellpse is All points ar.. i: in t! 'It' C\ o .. . 0 0 0 C\ ...25 Quality ellpse and T2 chart for the holdover and COA overtime hours.. J: c( 0 u0 0 t'0 .991 ci ..96 5.. '''.... The 1... co ....2677)2 + 1.18 x 10-6(X4 ...99.2677)(X4 .. . 00 0 . .... 00 0 T- ..... . .._......It 0 in 0'I0 :i 0 .. . The quality control 95% ellipse for holdover hours and COA hours 00 0 r.13564)2 +1.. T- -1000 0 1000 3000 5000 Holdover Hours The 95% Tsq chart for holdover hours and COA hours a: r- UCL = 5.. ... 0 00 ...... ... ... ....2677)2 + 1........97 5...........13564) = 8. The 95% control ellpse for future holdover hours and COA hours 0 . ..v c( 0 () 0 ......... The 99% Tsq chart based on x1....2677)(X4 .27 The 95% prediction ellpse for X3 = holdover hours and X4 = COA hours is 1....... X2 . ..37x 10-6(x3 .......... x2 and x3 o ....+ .extraordinary event overtime hours......... -1000 0 1000 3000 Holdover Hours 5000 ...18 x 1O-6(x4 .. 0 ........................ 0N0 o o o o ...... ..... ............. . ..... CD C' ~ co v N o 5. and X3 = holdover overtime hours..... 00 ... 0co0 0j !! :z .....51...13564)2 +1.......80x 1O-6(x3 ....... All points are in control.........26 T2 chart using the data on Xl = legal appearances overtime hours........ 0366 -.0221 . (b) Multivariate observations 20.0626 .0049 .207" -.0198 0.0008 .0083 .0146 .0066 .0031 -.0078 . Point Variable P-Value Grea ter Than UCL 20 33 Xl X2 X3 X4 X5 X6 X4 X6 O.0211 -.0083 -. 000'0 0.0268 -.98 5.0211 .0228 . 0088 \).39 and 40 exceed the upper control The individual variables that contribute significantly to the out of control data points are indicated in the table below.0078 .0031 .0474 .0008 .0146 -.0049 .0066 .0221 .0001 0.36.0000 36 Xl 39 X2 X3 X4 X2 X4 X5 X6 40 XL 0.0924 .0155 .032 s= The fl char follows.062 .0054 o .0197 .506 .0155 -.1446 .1086 .0000 0. 0000 0. 33.0197 .0616 .0474 .0343 0.0616 .0210 0.3428 .00.0105 0.698 .QO 0. limit.01 0.0032 .0-013 .0114 0.0228 .0268 . 0000 \) .0088 O.28 (a) x= -. OO. 0000 o .065 .0.0000 X2 X3 X4 O. 207/.429) Since the multiplier.282 -7 (.438:t2.J) = .508:t3.072.258:t .J) = .(.51) = 18.508:t. Since T = 12.838) Coal: .873:t .873 :t.05/2(6)):: z(.010.508:t 2.(. 1.F6.J) = .4141.085.081(.635.873:t 3.(.635(.(.863 -.330 -. we do not reject H 0 : ¡.087. 3. .J) = .753/.2 .J) = 1.161:t3. .635(1. 5. .766:t.635(.155 -7 (.081(.362.404 -.99 2 472' 2 29(6) .873:t 2.635(. .258:t 3.161:t 2. for the Bonferroni intervals and everything else for a . Multiplier is ~%.49 = 3. the r intervals wil be wider than the Bonferroni intervals.9251.J) = .161:t.766:t 2.182 -.635(.620) Nuclear: .283.Natural Gas: .J) = .(.258:t.J) = .076 -7 (. for the 95% simultaneous r intervals is larger than given interval is the same.111) Natural Gas: .978/.J) = .736) Petroleum . 2.421.081(.9251.508:t .611) Petroleum .30 (a) Large sample 95% Bonferroni intervals for the indicated means follow.171-.24 (.112.(.081(. = 0 at the 5% leveL.472 c: -.226.392/.790) Coal: .081 Petroleum: . 1.635 Petroleum: .766:t .738 -7 (1.J) = 1.J) = .Natural Gas: . 2.753/. Multiplier is t49 (. .438:t.(.345 -7 (.237) Total: 1.J9.J) =.089 -. the multiplier.161 :t.978/.178.0042) = 2.414/.250) Total: 1. 5.258:t2.29 T = 12.438:t3.766:t3.146 -.170) Natural Gas: .05) = .081.25(2. .135.081(.635(.256. .593) Nuclear: .404) (b) Large sample 95% simultaneous r intervals for the indicated means follow. 2.392/.05) = 7.207/.081(1. .(1. .438:t. indicates the confidence ellpse is elongated in the ei direction.240:t.391. Â.00160 with corresponding eigenvectors ei = (. .0125)¡.0058J l.208el . = . reciprocal of square root) makes the man/machine time observations more nearly normaL.= p. . .240:t2. The power transformation t = -0.341) .0058 .524 23. (See Exercise 4.OS)e2 =:t.e.240 s = r . r: 2(24) :tvÂ. = .171:t.241.0018 .1513 -.99925 .101 ~ (. the axes of the 95% confidence ellpsoid are maior axis: :! IT v Â.101) Y2 :tt24 (.J.) For the transformed observations. .527 The eigenvalues for S are Â.391..042 = 9.021e2 mInor axis: The ratio of 25(23) .05) ei = :t. so the 95% confidence intervals for the two component the transformed observations) are: means (of Yi :tt24(.391.139. F223(.31 (a) The power transformation ~ = 0 (i.2(24) F2 23 (.171:t2..03866). say Yi = In Xi' Y2 = 1/'¡ where Xl is duration and X2 is man/machine time.0018 =.1513 = 2. S-i .171J Y l .r 7.41.05/2(2)) = 2.416/.e.J. = 2. the major and minor axes.905J l23. the lengths of 25(23) . logarthm) makes the duration observations more nearly normaL.9.100 5.03866 .. 3. e2 = (.930 ~ (1.0125)'¡ =.905 624. (b) t24 (.15153.99925l Beginning at center y.5 (i. 50 22.27).333. I.1 WI Ei9~nvalues andei9~nvectnrs of Sd are: "1 = 449. are consistent with this result. I. After the elimination of the outlier. we reject Ho : '6 = O. the difference between pairs became significant.6338. . 30 -5 .Chapter 6 ii.0l25) = 2.36.r~ctions.: -20 .08 -3.25 Simultaneous confidence intervals are larger than Bonferroni's confidence intervals.52 3. 57 -2.3 The 95% Bonferroni intervals are LOWER Bonferroni 'C.I.est answers the question: Is ô = 0 ins1tfe the 95i confi- dence e 11 ipse 1 6. ~~ = (.egion.92 -2. Yes.: Simultaneous C.97 Simul taneous 'C.2 Using a critical value tn_i(cr/2p) = tio(O.778.85 29.56 -23.57 units.082. Half length of minor axis is 12. respetively. 45 -5.943) "2 = 168.73 32.70 -~ .. UlWER Bonferroni ~.: -22 .70 UPPER 1.943.70 Since the hypothesize vector '6 = 0 (denoted as * in the plot) is outside the joint confidenæ r. !1 = (. Bonferroni C.31) 20.333) Ellipse cent~rl!d at r = (-9. Half length of major axis is 20.!. the t. i. : UPPER -21.13. Major and minor axes lie in :1 and !2 d.58 units. -. 6. . 0 -o.69 ..3 0 MU11-MU21 o 6.o. Bonferroni C."'1 -it Figure 1: 95% Confidence Ellpse and Diffence 'Simultaneous T2 Interv for the Mea ..09 -0.R -1. ld .3 0 M 20 U 1 2 10 M U 2 0 2 . (b).4 -0. . 1.0 Lower Uoner -1. I. -0... 0. we reject 0.7 _0..: T Simultaneous C.1 .4 -0.' -0. 0.95% Simultaneous Conidence Region (or Della Vector 102 ..07 0." -1.' .4'59.64 95% Cofidence Slips Ab the Me Vecor . -0. o...05 is 9. .0.10 -20 -10 -.1i -1...215.2 -0..S .02 -1..' 0.. 1.: .04 -0.4 (a)....3 -I..' (o~O) .S -0.. -0.0 0..5 -1..2 -e.. '.a ..10 0.. Since the critical point with cr Ho : . HoteHing's T2 _ 10.4 -1...& -0..8 -0.. ó =.3 Problem 6.18 -0. 0 .-..i 7 . Marginal normality cannot be rejected for either variable...76 ~ 0. /.103 (c) The Q-Q plots for In(DiffBOD) and In(DiffSS) are shown below..25 _0. it is diffcult to a-a Plot o. //~' ./ // _2.25 CI is .... -'- 4 ../ .5 ..50 /////" /-.& _1....s // -./"./ _0.// ...1~.60 .. 3 .5 lr' Qlt.%2 plot is not straight (with at least one apparent lJivariate outlier) and../ ///// ../ .1_ Chi -squa-e Aot d th OderedDistcnce d . . 0.. ... .--~/ ~/- ê ../. .25 1.... 0...-///- o ... ..//-' ~. r"''/ // . 1_ Q-Q Plots 1. ///.- ////_1.S /' .~ o 0.. ~/ .. i5 / . ôo _o.. (n =11) is small.. / // //"/ _../.... The.00 . 0-11 ..... although the sample size "argue for bivariate normality. -I....5 ....... . 2 :! 3.0-- 6. . q = 3 ((~~~li)l) Fq_l.3) :! -/6.3 The means are all different from one another.104 -1 . n = 40.- T2 = n(Ci)' (ese' )-1 (ei) = 90.3 111 .9 -32.67 J5~õ5 = -11.9 :! 3.1-2: (46.1 .4.05) = (3~~2 (3.0 1-2 .25) = 6. ~.1-3: -4.3 :! 3. = (~1'~2'~3) · -32.6 66.6.1-3: ti.5 a) H: Cii = 0 wher. CSt' = (55.6) i:x = (-11.n_a+i(.4 .57.2).4 ~ 6.67 o-- Since T2 = 90.67 reject H :Cii = 0 b) 951 simultaneous confidence intervals: 111 .5 .e C = ('0 1 -~ ). ! 21505.7 TZ = (74. c).~6 "1 +n2-p. -4/3 J 4/3 r~13 Spoled = -1. 4-2) ((1 + 1) -1. JJ-1( J (ni +"2-2)P 1 FP'"l +n 2-P1 ( .~ TZ = (2-3.l132: 2 :I 7.3 _ 201 .IOS 6.1.01) .t!2 = ~ at the ct = .26 reject HO:~l ..5 7.6 = 3.4 .1 .01 1 evel . 99%simul taneous confi-dence intervals: 1121 .05)= 6 .5 1122 . .45 1 +n2-p 1)..88 ~ 45 do not reject HO=l2 -!!3 = ~ at the ci = .-88 r (:~J ("1 +n2-2)p _ (5)2 _ -(" p 1 ( .l1:n: (2-3) :! I4 Æ~+l)l.2 Trea tment 3: Sampl e mean vector samplè covariance matrix (:) .) 6. = 1'6 1 53661.= -1 :16.Fp'1n +n 2-- Since TZ = 3.5 ( 1 1 (10963.o level.6) (45 + 55) 21505. Since r2 = 16.4 (18) .4 (1.'6 .4 ( 1.'6 a) Tr"eatment 2: Sampl e mean vector -3~2 J (:l sampl e covariance matrix (-3.2 6.4 201.4.6 b) .. f.. 84 18 35) Tota 1 (corr~ct~) 35 Hl2 (54 '5+3+4-3-9 n .8 a) For first variable: trea t:nt observation (: 5 8 1 2 5 3 + mean.-). SSobs = 246 .001 7 J 6... &êrS _ poo ~ .0026 (. -1ldxi-it= (. effect SStr = 36 1 (0-1-12 20. = : 7) = (: 4 4 4 4 4 4 r.-13 rlB 3-1=2...esidual + 4) (2 2 2 2 4 + 1 -1 a + -2 -2 -2 4 -1 -1 -1 -1 SS · 1 92 mean . 106 .-1 -2 J SSres :: 18 For second variable: 5 55) (333 3 3 6 3 = 5 5 5 + -1 -1 -1 311355 (79 6 9 9) (5 5 SSobs = 402 5 5 -3 -3 -3 -3 SStr =84 SSmean = 300 3)(-1 1 ~2 1 'J + -1 2-1 1 -1 -1 1 SS res· 18 Cross product contri butions: 275 48 240 -13 b) MANOVA tabl e: Source of Vari ation SSP Treatment B = (36 Residual W - 48 d. 48 J -13J . 9 for!! matrix C _ n.U362 Using Table 6...J .. + (ig -x)ngJ = i(nix1 + .~ = 17 .107 * ~ 155 c) li = TäT = 4283 = .. using Bartl ett' s procedure. . .-x)(x.01) = 4..i09 Since x.01) = 13.n .28 we again conclude treatment differences exist at e = .10 d.-a)' = C(.. 6..-a)(d.. n.) = C X and so 6. . Since F4. + "g)i-x(ni +.16(. a = 1 I: d.J" . . ( (p+g..x) . .1 .. Sd =. .0362 = 28.3 with p = Z and 9 = 3 (1 .-J .2 ) ln A* = .. + (xg-x)u ) = x((x1-x)ni + .1 ..a = C(x..n-.-x)')C' = t:SC' ...9 .. 9 .'2 1n .12 . +" )) = 0 . \IK) 9-1) .-J .1 r(d.02 ... + ngxg-x(ni + .77 we:~onc1ude that treatmnt differences exist at e = ..1 r(x. = x(("l + ..e . Al ternat1vely.+ ng)).01 level...) (5) ( ) .. = C~ 1 1: x.g ei 1)'((xi-x)u1 + .J" -J" .IÃ \ (En 1 .. . .J n"J ..01 l~vel. OJ SStot = 220 SSmean = 12 SSfac 1 = 104 SSfaC Z = 90 SSres=14 For second variable: 8233 = 3333 + ~ 6 -5T -3 -6 (3 3 33 33 3 (8 Z OJ 3J SSt~t = 44() 1 -ZJ (-3 0 3 OJ 1 1 1 1 + 3-2 1 -2 + 1 ° -2 1 -6 (3-2 3 -2 1 -2 2 ° -1 -1 (-65 -6 5 -6 5 5) SSmean = 1\) SSfac 1 = 248 SSfac Z:i 54 SSres · 3Q .. +"21p (tr t-' ((". :l (nZ ..13 .!:1)1 ". see Result 4. + (nZ .1 2 -1) j 51. ~ ~ ~J + (-~ -~ -~ .) z~ ". + ".108 6. a) and b) For firs.)+nzlexp 2 lt ~. -2 4 -3J + 2 -1 0-1 -3 -4 3 -4.t variab1 e: .:Z 4 -3'1 (0 1 -1 . _ A_ using (4-16) and (4-17). respect to * at 1 Qi+n2-~ 12.11 l(~1'!:2.J 1 1 1 1 .12 i = n +n ((n1 -1)S.t)l(~2'.+( "2-1152) =(( (".t) = L(~l .~~J + (1 . and ~z at ~l = ~1 and ~2 = ~ respectively and with . The likelihood is maximized with respect to ~.2)SZ) = n +n S poo 1 l!d (for the maximization with respect to .10 with n. ( : -: : ~l = (~. _~. residual effect . factor 1 Observation = mean + effect + factor 2 .l + "2(~2-~21' t-l(~2 .l' t-'(~-~.(~.2)52) 6. -3 -3 -3 -3 1 -2 4 -3 -2 0 1 ..+nZ b = 2 and B = (n1 -1)S. IÙ ë: -5.1 = 11 L191 332J d) We reject HO:!l =!2 = !3 = ~ at a = .05 level since .05) = 9.f.(Q-1 = -(61_ 3-2)ln.8 c) MANOVA table: Source of Variation SSP d..05 l~vel since ((g-l )(b-l) _ (p+l2.13204' and concl ude there are factor 1 effects.. ss i.3 0 Total (Corrected) r 208 1911 . 9 .8) -8 .109 Sum of ~ross products: SCP tot = SCP m~an + SCP fac 1 + SCP fac 2 + SCP r~s 227 = 36 + 148 + 51 . We al so reject HO:~l = ~2 = ~3 = ~4 = ~ at the ~ = . gb .87 ~ X:( ..49 .5 1n ( 356 ) = 19.1"=3 -1 =2 Factor 1 148 1481 248J l04 b-l=4-1-3 Factor 2 51 51) 54 ( 90' (g-l )( b-l) = 1) Residual (14 . r~s 2 \ ))J1nA* SSP fac +( SSP res . 5R. SSP Variation 1841 2 Factor 1 184 (496 208J 24) '3 Factor 2 .n( . I SSP I -(gb(n-l) .05 level.03 we ~.nti* -(6 _ 3-3)R.p SSP r.59' 6887 . -13.05) = 12.808) = 2. G . 0: -6 R. . =lSSP1. -13 5tn tn+ res SSP res .110 .24 36 .14 b) MANOVA Tabl e: Source of d.(g-l Hb-l n/2)R. 6."f.88 0( xi: (.! reject HO:!l1 = !12 = . .05) = 21. K ~ res _ ((g-l )(b-l) _ (p+l .nA*.0 (32 4:) (36 Interaction ..n~ res I \)" 2" i==SSPf .n ( 356 ~ = 17. = !34 = ~ (no i!lteraction effects) a~ t~ a = .. and concl ude there are factor 2 effe~ts...6 ~S41 12 Residual Total c) Since -84 (312 400J (Corrected) 23 " 1 24 124J 688 (876 . +Iss.2 (b-l))R.77 ~ X~ (.(p+l . 05) = 12.49 we rejet:t -2 -3 - HO:"t_1 = "C = "t = 0 (no factor 1 effects) at the a."59 we do not reJect HO:~l = ~2 ? ~3 = ~4 = ~ (no factor 2 effects) at.n(.the a = . OS 1 eve 1 . ( lssp 1 ) = -. Since .1fi 0( XU.111 Si nc~ -(gb(n-l)-(p+i-Cg-1))/2)R-nA*=-11. ..05) = 9.1.05 1 eve 1 . = . ~ ISSPresl ) -(gb(n-1)-(p+l-(b-l ))/2)Wi* :I -lZl lssp + SSP r fac 2 r.s :: -12R.7949) =2.SR-n(.24.19 ~ XH.SLnfac lssp 1 ~e~sp res r .47) =16. 05 level.54 Since T2 = 254.81 .0 1 05"625 . A* D .4) is..05) = 9.5230) :0 9~40.(247.n-q+l a I rc =n = 421.7 .3 + 1. '.(1. -1 a a-- 6. I~ :t (Il~q+ 1) fHq-1 .27(..3819 Since '* -(gb(n-l)-(p+l-(g-l))/2)tn A =-14.1l.n_q+1(a) = (3~~it11F3. ~(~:~ii)l) Fq_l.5 1509.112 6. Hi: C!: 'I Q where. Suniary stati stics: 1 906. 95i simultaneous confidence interval for -dynamic.11 + ll2) . S = 94759 87249 94268 1 01 761 761 6ô 81193 91809 90333 1043Z9 .) 9.54 we reject "a at C1 = . 14*'= . .96 :) X: (.- r2 = n ( Cx p ( CSC i ) -1 (Cx) = 254.5~n (.5 -. g.OS level.~. c=U 1 -1 o 1 -1~J . with :' = (1 1 -1 -1). means h. n · '5.5 :: 174. 59ó) .05 level.16 H : Cll = 0. . These results are consistent with the exact F tests.1 x = 1749. For HO: ~1 = ~2 = ~. Again we reject "0 at a.3819):2 · 13.51n(. b · 2. .I ( n-1 q-1 ) () .1 1725.05) = 7. we reject HO at a = .2 .static.7 .15 Example "6. versus . a) For "0:!1 .'5230 and :~4. 40~9.5 -7(-411.(#¡ + #3) Format main: (¡. Party effect-"different" resonses slower than "same" responses.8l8.: -(2.(¡¡ + #2) Interaction: (#2 + #3) .3.6 -7~-280.40 42.113 Arabic G) 6. reject H 0 : C.4:t. -125. (d) The multivarate normal model is a reasonable model for the scoresconesponding to the party contrast.40.1 J S.40 20.1) 32 Format main effect: -307:t.J9.17 (a) Q) Format ø Words ø Different Same Party Effects Contrast Party main: (¡.4:t.939.J9. T2 31(3) .3 + #4) .4) .9) 32 Interaction effect: 22. Format effect-"words" slower than "Arabic".598. (c) The M model of numerical cognition is a reasonable population model for the scores.3.0) 32 No interaction effect. 29 (b) 95% simultaneous T intervals for the contrasts: Party main effect: -206.5 -7 (-32.J9.4. -186. 75. ince = 135.(#1 + #4) Contrast matrx: c= -1 -1 (-1 ~1 . . the format contrast and the interaction contrast.9.93) = 9.2 + ¡.u = 0 (no treatment-effects) at the 5% leveL. . . .. . . . 2 6 4 qchisq . . . . . .. ...18 Female turtle Male turtle A chi-square plot of the ordered distances (0 .. ..... . C" Q) E 0 .. . . M . . 0 -1 -2 1 Quantiles of Standard Normal . c Q) . -1 10 CJ co . 'C . .. . -... . .. .. . co . :5 10 . ... ... Q) .. ..-10 .. .. .. 0 -1 .114 6. .. C' ~ (0 . . .. . . -i: ~..~ (0 -g . . . .. . co . ëi i: . 2 1 -2 Quantiles of Standard Normal . ......~ ~~ . . . .. ..... . . . . .. . . . -. . .. . 0 C\ 0 0 .. . ::~ (0 .. 0 . . . Q) ~ Q) "E .. . :: co (0 -2 . (0 1 2 ~ ~ .... -1 t) 1 Quantile ófStandard Norm::l 2 . . 8 qchisq 0 -1 2 1 Quantiles of Standard Normal .. .... 4 2 6 8 10 ... .s... . C' CD A chi-square plot of the ordered distances .. ... . -2 Quantiles of Standard Normal .. -'C .. . 0 . . .. . -2 ....s .s . C\ . ...... . . .. . .. . .. ..s ta (' 10 ia (' . ... :2 .. ~ ... .. -2 .... . 10 -ê ~ "C . .... .. -. ~ .. . .. 10 -§ C" ëü .. . .. .CJ S (' ... e. ... ... ..-C) Q) .. .. -1 1 0 2 Quantile of Stanar Noral 10 co M -. 0 . . - . . ~ . 0 :2 ui ë. . 2735714 0.0541167 0.0689451 0.0127148 0.833461 27.546415 95% simultaneous CI for the difference in female and male means Bonferroni CI LOWER UPPER 0.8164658 4.ent vector: COEFFVEC .0140655 0.4775738 3.0165386 0. 7254436 4.7031858 0.118029 2.1290622 0.3275751 .115 mean vector for f~males: mean vector for males: X1BAR SPOOLED X2BAR 4.0158563 TSQ CVTSQ F CVF PVALUE 85.43.2217252 0.1466248 0.052001 8.9006593 4.3451377 LOWER UPPeR o .0140655 0.0187388 0. 9402858 4. 72677 -8.0127148 0. 6229089 3.0165386 0.07ß8599 0. 355E-1 0 linear combination most responsible for rej ection of HO has coeffici.710687 67.0577676 0.2365537 0.2926638 0.0113036 0. a.48 .0134 12.3'621 J.-) c: pool.3664 2.590 _ (12. -4.6959 J' (('1 + pool l)S ed )-1 = n1.ae . we reject HO at the a = .01) = 13.SS(.7599 46.0203) .7458 5. (( )-1 (.8112 7.855026.8960 '( 15. Thereis a diff~rence in the (mean) cost vectors betw.4084 -.) '" _s-l (.ni+n2-p-.55 F3. L--i'0939 -. n2 .( -2 3.el trucks. 5640 .8 J .HO: lh .6857 6543 ( 4.92 Since T2 =(-:1-)-:2 'n 1+ 1 n2) Spooled :) (n1+n£-2)p . 9.5750 2.8745 -.Ol) . Hi8 223.7731 13.t.113 i :z = (~~:~:J i. Spooled = 20.19 a) ~1 = 8. 1525 .116 6. _ (57)(3) (ni+nZ-p-l) Fp.9066 51 = 17.3623 . wax x = -1.~l -) -:2 = St'.01 1 evel. .219J 18.!!Z' = 2 .8512 7.ed -1 .5. 2.een gaso1 ine trucks and dies.9'633 Sz = 25.5441 4. 4. .790 ~12 .~2 .~21: 2.e to pool.xp 5inca (.~22: -2.~23: -8. Thisis consistent with the result in part (a).~2 'Lñ1 51 +ñ2 52 ~1 .)) r: 1 l' )-1 (.-) 2 ( ) .113 t 3.~ence interval sara: ~ll .117 c) 99% simul taneous conf.. by Result 6.34 we reject HO: ~l .01 = 11.15 ~)(3 .(~1 .~2 = 43. Since 51 and 52 are quite differant. it may not be reasonab1.-) I ( 1 1 )-1 (. .~2 . However.913 dl Assumpti on ti = t~.!:l -!:2 .341 ~13 .01 level.:2 ñ1 5. ~1 .650:! 4. using "large sample" theory (n1 = 36~ n2 = 23) wa have. + "2 S2 ~l .( )) 1 (~l -.~2 = ~ at the a = .578:! 4.. .118 6. 2. (c) Results below. . . 260 260 L a 240 I 1 2201 m 200i 160 .490238. . 3392202 RESULT Coefficient Vector: COEF -3. There is mean difference '951. . . . Comparing Mean Vectors from Tvo Populations rObS. .... simultaneous confidence intervals.76436 -1. . 31 Delete~ T2 C 25. . 260 . . 1: Mean Diff. . The most critical linear combination leading to the rejection of Ho has coeffcient vecor (-3.. . .. 985685 8.9914645 Reject HO. ... Both tests reject the null hypothesis of equal mean difference. 2: LICIMD LSCIMD -11.20 (I) 31. .07955 (Tail difference) (Wing difference) ..161905 -5 . .005014 5. .. .490238 2.: LAELCI Mean Diff. ..I.07955)' and the the linear combination most responsible for the rejection of Ho is the Tail difference... . .. . 300 280 '" i ngm (b) The output below shows that the analysis does not differ when we delete the observtion 31 or when we consider it equals 184. . 78669 -1. T2 C 25.119 ~omparing Mean Vectors from Two Populations 1.003431 8.9914645 Reject HO.27998 -6. . 2: LSCIMD -11.e. indicating no significance difference..-. ~. 1220203 . 95% Cofidence Ellips Ab th Me Veda -:61 .. .1812088 REULT COEF Coefficient Vector: -3. .. . 5 ..i lObs. 31 = 184.s.574268 2."' (d) Female birds ai~ g~flerally larger. since the confidence intervl bounds for difference in Tails (Male . There is mean difference 957'simultaneous confidence intervals: LICIMD LABELCI Mean Diff..Female) are negative and the confidence intervl for difference in Wings includes zero..~ .662531 5. 1: Mean Diif.~\ . :: -eo "f iL\" \ t. 830 :) (3~~(4) F4.21 (a) The (4.120 6.241 .OS level.01 (-.con- siderably.74 = .4) entri~s in 51 and 52 differ .OS) = 11. S-l a a:ed(X-1 _ -2 _ pool .16 . ( c) x ) -3.47 so we reject HO at the ~. ..~2 = ~ and H1: ~1 . "1 = n2 so the large sample approximation amounts to pooling.~2 t ~ T2 = 15. ( b) H 0 : ~1 .3S(. . Howev~r.2) and (4. pooled.2 = O.344 Since T2 = 15.6876 .1. The coeffcient of the linear combination of most responsible for rejection is (-95. T2 = 15.3972 J _ jS8' XM _ 5.3152 = 3. Have p = 4 and v = = 37.4 + 1 4. 0.3404 The Hotellng's T2 = 96.05) 37.344-4+1 34. Notice the critical value here is only slightly larger than the critical value in (b).87"60572.0~5.i .513. we see that X2: long term interest rate has the largest coefficient and therefore might be useful in -classifying a bond as 'lhigh" or "mediumlt quality. . -0.344 so. 11.6. the . Therefore.37. for female mean -male mean: ¡ -0.05)=149.600. 6. -0. reject H 0: I! -!J2 = 0.4650835.032834.830 ::11.0025.2336 J -1.3383659 (c) \Ve cannot extend the obtained result to the population of persons in their midtwenties.v-p+1 37. Further. 4+16 (e) From (b).344 .3296 XF'. 1. graduate students are -probably more sedentary than the typical persons of their age.376(2. we reject Ho : J.00 where 11. -5.p + 1 p.762)'.1697. at the 5% level.00 is a critical point corresponding to cr = 0.487 ).53556 critical value is vp F (.121 (d) Looking at the coeffic1.737. the same conclusion reached in (b).830. Firstly this was a self selected sample of volunteers (rrienàs) and is not even a random sample of graduate students.16348346 -1. (b) The 95% simultaneous C.ents â1'Sii.5.J. 1.8687428 -17.647)=11.1548 49.234.145.344(4) F (.1 = 2.22 (a) The sample means for female and male are: ¡ 0.3136 J ( 0. 38.513 v . whieh apply to the standardi zed variables. 820 J B = (11. .3 ~ 2.07563 NAllOVA Table: Source Trea tment d. 149 .0471i.F4 .344 75. P ~ IA 153.292( . S =.OS) "Ie rej.64~ A =* ~ -l = 2235.30~ 1 .37 .122 6.974 .00474 J x = -2 · 0418 J .428 J. ( 1U70 ~3 = 2.. 232.104 Since (rni-p-2\ (1 .352 2 4.125 J R.026 J (2. (.294 .1 0368 .950 14.IÃ) ~ !.18576 ~l = (3.p = 2. W = (16. (28. .09860 . SSP -21 .64 -17 · .326 J . 9 = 3 (~epal width and petal Width) responses only! .ect Ho: !l =!2 =!3 at th~ ci.4 J S = 3 (.23 n1 = n2 = n3 = 50.147 695J 90.143£4 -.081 .05 level.03920 . S2 = (.f.esidua1 .729 Total B+W = . Sinceg= 3.0.0.94 1785.o3:t 2.30:t 2.025 NasHght: O.+ 2-) -.0.2.44 £12 -£22 : 0.57 'l13 .r3 = Q at the 5% level implyig there is a time period effect.78 £23 .+ 2-) -.£34: .629 3.'l31 : -3. £11 -£21 : -1:t 2.£32 : .(90-4-2'(1~) = 2. Evidence for changes in skull The usual MA~OV A assumptions appear to be satisfied for thse data. we would just reject the null hypothis Ho :..36 size over time is marginal.£33: 3.8301 value with 8 and I~8 degrees of freedom.030 0.-1:t 3.57 £12 -£32 : .44 £21 -£31 : .78 NasH T" .24 Wilks' lambda: A* = .33:t 2.044.03:t 3.66 .14:t 3. Fstatistics andp-values for ANOVA's: F p-value MaxBrth: BasHght: BasLgt: 3.0.36 £24 .1.8301. All the simultaneous intervals include O. If-changes exist. then these changes might be in maximum breath and basialveolar lengthofskull frm time periods 1 to 3..94 2153(2-+2-) -.123 6.049 is anF 4 A .2 (2.3(2.05/24) = 2.0.1:t 3.901 Any differences over time periods are probably due to changes in maximum breath of skull (Maxrth) and basialveolar lengt of skull (BasLgt).44 £11 .47 .94 /840.57 £22 .£23 : 0.£34: .4(2.1"1 =!2 =. 95% Bonferroni simultaneous intervals: t87(.9:t 3.36 V 87 30 30 'l14 . 2.78 87 .T.1O:t3.1:t 3.: 0.30 :t 2.049) = . 30 30 87 30 30 87 30 30 'l13 .2:t 3.£33: 3.94 1924.94 BasBrth BasH.Et BasLgth m = pg(g-I)/2 =12. Since p-value = P(F:.LU . .9:t 2.1:t 3.84 .+ 2-) -.0.lO:t2. C (~1 . IB+WI Afer transformation.rl. Plots of the husband and wife profiles look similar but seem di sparate for the 1 evel of acompanionat~ lnve' tha t you feel for your partneru.~ level... A * =IWI =. .26 To test for paralle11 sm.i 159 and F = 18.0. The excess electri~al usage of the test group was much low~r than that of the control group for the 11 A. and 3 P. b) Parall el ism hypothesis HO: C~l = ~2 with C given by (ó~l).674 .) cZ = 8. c 2 = 8.25 Without transftlrming the data.98 (. Since T1 = 19.58 . .7 ( see (6-62)).M. 17 i (.616J 1 . .124 6.05 level.FO. 05.870 CSpool~dCI = (. C(~l .1198 and F = 18.028 .413J 2. we reject HO at the 11 = . ~ . 6.·.M~ usage for the two groups contradi cts the parallelism hypothesis.029 J -.167 .52.~2) = -.733 . 144 L.M.685 .) c~ = 8.341 11 = 9.58 .036 (-.~2) = .7 we reject Ha at the a. consider H01: C~l = C~2 with C giv~n by (.6-61). 1 P. 6. hours. A * :: .095 fo r a = .98. .05) = 1. The s imi 1 ar 9 A..27 a) .014 (CS c1r1 = poo 1 es .33 13 J .947 2..93 There is a clear need for transforming the data to make the hypothesis tenable. 383.407 o . S imul taneous C..J12 = 0 at 5% significance leveL.649 1. 63 2.657 9..000 9. 692 1.99 1.84 -7. for L.343 Pooled Sample Covariance Matrix: 36.143 -4. There is a significant difference in the two species.078 16.80 -6.008 2. The X2 plots look fairy straight.565 o .990 14.914 0. carteri mean = 0 is (0. .14 -1.371 Difference -2. torrens mean .675 9. 274 0.55 -0 . torrens mean .039 2.971)' 951.886 -0 .992 6.L.211 0.59.437 o .639 2.96 3.086 o .714 L.47 1. 76 0.carteri mean: UPPER LOWER -8 . torrens and L .L.573 2. 637 1.268.671 3.854.595 6.16 -0 . Sample Mean for L. torrens 96.41 -1.934 6.371 -0 .767 0.i .457 42. 343 43 .657 30. -2.229 13.101 Linear Combination of most responsible for rejection of Ho: L. -0.151. 213 0.943 -0.13 ~ 16.98 -1.187.28 T2 = 106. -~.514 25 . 0. I.914 35. 743 39. \Ve reject Ho :¡. 73 -4. carteri 99 .carteri : L.053 0.571 9.829 -3. .006.31 The third and fifth components are most responsible for rejecting Ho. 2.426 2.0.371 14.629 9. 615 0.125 "' .764 3.t 6.314 14. 0637 -0 .1385 6.00304801 0. '0 10 ~ ~ 0 51 .0876 .0535 -.03074 0.00482862 0.01056 0.0221 .0286 0.: Simultaneous C.00482862 O.0455 .0283 .02548 o .0079 -0.05784 Summary Statistics: 0.0195 .02751 0.0228 o .:i 0 5 5 '5 '0 5 25 20 15 10 5 o 20 25 o-SOARE O-SOUARE 6.0434 . 00017 .~285 -. 0566 0.0625 0.516."0278 2.Jl2 = 0 at 5% significance leveL.01628931 0.0130 -.0679 .29 (a).0128 -0.0123 0.0218 T2 INTERVAL xl x2 x3 x4 xS xli The N 24 24 24 24 24 24 MEAN 0.01'30 .0072 -0.0443 -.CHI-SQUARE CHI-SQUARE PLOT FOR Lcarteri fOR L. (b). 0~154159 0.1030 0.0'030 .00325 -0.00602526 IIotellng's T2 = 5.04689 TO .. i.00366259 0. The critical point is 9.Bonfer~oni STDEV intervals use the T in"tevals use -.1020 -.0493 .0057 -0.01513 o .00304801 0.979 and we fail to reject Ho :/£1 .00012 -0.. BONFERRONI ) - TO -.0445 .0275 .0436 .89 and 126 . i.00154159 0.3616 9. S XBAR o .04817 0.0247 -.946. 0596 0.1235 o . w a: '" :.0333 -.(H)417 4.0701 -. LOWER -0.0385 o .30 HOTELLING T SQUARE - P-VALUE 0 .0505 -0..: UPPER o .torr~ns PLOT '5 15 ä! '" :. (e).0294 Bonferroni C.0430 t the constan-t ( . 1843 11. 995 XL X2 XL 0.655 284..6275 414.10651620 0.0055 9 .~675 760.7924 H = Type III SS&CP Matrix for FACTOR1*FACTOR2 XL 12 X3 205.115 365.6191 21.OQ05 XL X2 X3 H = Type III SSkCP Matrix for FACTOR2.22 ~5 . (Vat'~e..1843 8 . 1843 8.6675 3153 .48 49.+~ot') X3 7..38824348 11.105 121.01244417 1.5 Statistic N=l Value 0. 95166"667 8 0.520833333 Manova Test Criteria and Exact F Statistics for the Hypothesis of no Overall FACTORl Effect H = Type III SSkCP Matrix for FACTORl E = Error SSkCP Matrix S=l M=0. 22 X3 1'07.6275 X2 365.1291666667 -108.7008333333 -10.012 .365 X3 76.0011 6 0.6575 162.t~) XL 196.205 I3 76.995 94.1825 42.6878 30.3127 XL X2 F Num DF Den DF fir ~ F 6 6 6 3 10 0.187"61127 10.48 121.0205 o .375"67504 18.0205 o .70910921 10.1825 1089.101666"67 Manova Test Criteria and F Approximations for the Hypothesis of no Overall FACTOR2 Effect H = Type III SS&CP Matrix for FACTOR2 E = Error SSkCP Matrix S=2 M=O N=l Statistic Wilks' Lambda Pillai's Trace Hoteiiing-La~iey Trace Roy's Greatest Root Value 0.12916666"67 -108.835 352. 7~583333 254 .127 6.6575 H = Type III SS&CP Matrix for FACTORl X2 -10.10166£67 363.4125 72.695 1~7 .4125 ( Loco.0019 5 0. 38824348 11.655 X3 42.015 414.1843 Den DF Pr ~ F 4 4 4 0.3"65 XL 104.0205 4 o .31 (8) Two-factor MANOVA of peanuts data E = Error SSkCP Matrix XL X2 X2 49.78583333 254.89348380 Wilks' Lambda Pillai ' s Trace Hoteiiing-La~iey Trace Roy's Greatest Root F Num DF 3 3 3 3 11.0675 X3 7. 55 163..40 7.- --/ .15 200.3735 6 6 6 3 Den DF Pr) F 8 0.15 190.5066.30 52.15 57./ ~y/-/ /- .30 58.75 170.15 2.25 3.20 -0.30 9.95 -2.0113 seem large in absolute value.55 c 1 6 199. //"~// .80 2.15 57.10 -1.80 -0.25 -3.29086073 Hotelling-La~iey Trace 7 . 1 PRE2 RES2 PRE3 RES3 CODE FACTOR1 FACTOR2 PRED1 RES a 1 5 194.95 5.50 160.00 c 1 6 199.65 130.80 -2.75 -0.50 66.60 47.65 130.40 190.5582 3.10 200.0655 5 0.30 52./~ .30 3. 95 2.00 d d e e f f 2 2 1 1 2 2 6 6 8 8 8 8 200.05 -4.80 -0.25 3.75 170.10 1.25 -3. /' -- -.10 Figu i: Q-Q P1ø .15 b 2 5 185.20 49.50 160. but (b) The residuals for X2 at location 2 for variety 5 Q-Q plots of residuals indicate that univariate normality -cannot be rejected for all three variables. --'I -i.55 1Q3.20 49.$0 47..40 200..Red-lfot Yied .0587 6 0.55 161.55 -1.55 161.82409388 Wilks J Lambda F Num OF 3.30 -9.07429984 1.0339 3.40 -7.15 -2.55 1.128 Manova Test ~riteria and F Approximations for the Hypothesis of no Overall FACTOR1*FACTOR2 Effect H = Type III SS&CP Matrix for FACTOR1*FACTOR2 E = Error SS&CP Matrix S=2 M=O N=1 Statistic Pillai l s Trace Value 0.25 1$4.40 4.80 0.75 0. 54429038 Roy l s Createst Root 6._ """/~ ./ .55 b 2 5 185.7721 11.30 58.15 a 1 5 194.45 3./ *//.20 0.45 -3.25 1"64.80 0.05 4.95 -5.40 -4.30 -3.0508 10 0. ./ y'/' / ...~~. a location Dependent Variable : yield..63 0....' .. 7008333 196 ./ / / /' . 2050000 17.. and X2 = sound mature kernel./ //' // /' .Reidual for Seed Size . Sum of OF Squares Mean Square Model 5 401 .. . .65 5. //.. 1016667 102..".. 794111 2.136324 4....4/._ (c) Univariate two factor ANOV As follow.//' /~/"..1_ //.5"508333 F Value Pr ~ F 4.~..3675000 11 506 ...../ //.".04 5..0750 OF 1 2 2 Type III 55 Mean Square o . .'" .'/' ..//.7008333 98. .//- . 038~ .. ~.A ..'" -iI a-./ .3835000 Error 6 1 04 ../+ /~/ ..Residual for Sound Mature Kernels /' ..0575000 205..90 0.. 2: Q-Q Plot ./' '// . 1225000 Source Corrected Total Source location variety location*variety F R-Square Coeff Var Root MSE yield Mean O.....167433 195..../' ."'/...:.0418 O.9175000 80.129 ...0446 Value Pr ~ F 0. . . .. _10. 11 50000 0." Figure 3: Q_Q Plot ..// '/' .8474 0. for Xl = yield variety interaction.' .. Figure ./ /. /.-.. *Evidence of variety effect and. 0300.0508333 42.59 8. 852298 4 . The GLM Procedure Dependent Variable: seedsize Sum of OF Squares Mean Square Model 5 442.5208333 142.507500 390.067500 1089.777500 406 . 8350000 15 .5250 OF Source location variety location*variety 1 2 2 Type II I SS Mean Square 162.0177 Error 6 352.1443 .60 o . 30833 Source location variety location*variety OF Type II I SS Mean Square 1 72 . 105000 58.9516667 F 0.684167 11 2383 .188166 3. 823533 7.130 Dependent Variable: sdmatker Sum of OF ~quares Mean Square F Value Pr :. 355500 6.92 0.1016667 85. 0759 0.5741667 88.65 O.015000 780.76 9. 882500 Source Corrected Total A-Square Coeff Var Aoot MSE sdmatker Mean o . 660559 158. F 4. 832398 7 . F Model 5 ~031 . 0292 Value Pr :.0157 0.695000 162.347500 F Value Pr :.975655 55 .99 2. F 2. 5208333 72. 8058333 11 537 .0146 0.72 o .067500 544.28 6.5148333 Error 6 94 . 1476 Value Pr :.4091667 Source Corrected Total F A-Square Coeff Var Aoot MSE seedsize Mean o .9758333 2 2 284. F 5. 875 20 .325 Bonferroni (Dun) T tests for variable: X3 Alpha= 0.700 -4.685 -23.275 11.05 level are indicated by '***'.01576 Minimum Significant Difference= 10.95 df= 8 MSE= 38.375 Comparisons significant at the 0.05 level are indicated by Simultaneous Simutaneous Difference Upper Confidence Limit Between Confidence Means Limi t -8 .050 -22.131 (d) Bonferroni ~simultaneous comparisoRs of va-ri. 175 22 .900 -9.875 17 .137 -Comparis.confidence Means Limi t .8 -19.560 -4 .26 '***' . Q5 Confiden~e= 0. F ACTOR2 Difference Upper Confidence Between Confidence Limit Comparison 8-6 8-5 6-8 6-5 5-8 5-6 Simultaneous Simul taneous Lower Limi t Means -20.6 -0.575 -42.175 8.'037 .6 Cri tical Value of T= 3.66333 ~ritica1 Value of T= 3. 763 .01576 Minimum Significant Difference= 13. Only varieties 5 and 8 Bonferroni (Dun) T tests for variable: XL Alpha= O.575 Lower F ACTOR2 Compari son 6-8 6-5 8-6 8-5 5-6 5-8 23.700 3.385 7. 412 -21.875 -5 .862 1.512 9.300 9. 300 -7. 200 47.960 18. differ.7'62 . Comparisons si~ificant at the 0.8 .835 5.5 o . Simultaneous Lower Comparison Confidence Limit FACTOR2 8 8 6 6 5 i: Simultaneous Difference Upper Between .ety. .037 0.ons significant at the 0.685 Bonferroni \Dun) T tests for variable: X2 Alpha= 0.050 -47. 763 .59833 Critical Value of T= 3. '500 4.512 .250 -8. and they differ only on X3. 325 42. 250 -3. 960 -3 .95 df= 8 MSE= 141.S -8.575 -9 .05 Confidence= 0.135 8.135 -18.01576 Minimum Significant Differen~e= 25.HL 900 _1 "7i: -0 .625 19.0"5 level are indicated by '***'.95 dr= 8 MSE= 22. 560 4.200 -17.11') R Qi:" *** *** .05 Confidence= 0.875 30 .835 3."625 21.575 -30. 500 17 .~ -11 .7"62 1'0. 385 -17 . 034 0. Wilks' lambda and Pillai's trace statistic give the sam F value.32 (a) MADV A f-or Species: Wilks' lambda A~ = .082. the value of (For Nutrent. For MANOV A. However.l (.203 and p-value = . Wilks' lambda statistic does not indicate Species effects. 260 23. p-value = P( F-.202 720cM.425 The ANDV A results are mostly consistent with the MANDV A results.082) = . 09\l 3. .738 8.260 2.458 F P 10 .05) = 199.099 275.25 'Species effects Do not reject Ho: No MADV A for Nutrient: Wilk'S' lambda A~ = .173 F4.011. suggests there may be Species effects.239 4.827 MS 131. The exception is for 720CM where there appears to be Species effects.give essentially the same result for all factors.722 60.50 0.05) = 19. 1. Analysis of Variance for 720CM Source Spec Nutrient Error Total DF 2 1 2 5 SS 2£2.) For larger sample sizes.562 F2.99 P 0.82 0. A look at the data suggests the spectral reflectance of Japanse larch (JL) at 720 nanometers is somewhat larger than the reflectance species (SS and LP) regardless of the other two nutrent leveL. 06 0 . 5.6776 with F = 5.07.00823 F= 5.3£1 47.2 (.5 Do not reject Ho: No nutrent effects (b) Minitab output for the two-way ANOV A's: 560cM Analysis of Variance for 56QCM Source Spec Nutrient Error Total DF 2 1 2 5 SS MS 8.132 6.476 4.550 F 28. Wilks' lambda and Pilai's trace stati'Stic would .31599 F = 1.011) = .119 4.489 4. Pilai's trace statistic. 1.489 9. This difference is not as of apparent at 560 nanometers. p-value = P( F -. See the Mintab ANDV A output below.39 65.528)=. (a) MAGV A for Species: Wilk' lambda A~ = .74 '0.90 48.000 F4.'52 (.571.000 42.25 4 7 9S . (Joa Analysis of Variance for 720nm Source Species Time Species Error Total * Time DF SS 2 2026.133 6.000 F4.85 MS 1013.40 I! u.12 Reject Ho: No interaction effects (b) A few outliers but.000 Fa.05) = 2.81 4 193. in general.. p-value =P( F~ 45.90 MS 482. 81 27 35 76. 8~ 2 5573. 30 20 20 10 10 o .84 F 169..000 O. p-value=P(F~ 15.06877 F = 36.95 2.52 -0.000 0.52 (.43 2766.t54 95:3. Observations are likely to be positively correlated over time.33.55 Reject H(J: No species effects MANDV A for Time: Wilks' lambda A'2 = . Analysis of Variance for Source Species Time Species*Time Error Total DF 2 2 SS 965.55 Reject Ho: No time effects MANOV A for Species*Time: Wilks' lambda A~2 = .59 637.10 Residual Residual (c) Interaction shows up .s -6 -2 0 -4 4 .04917 F= 45.66 3112. Histogram of the Residuals Histogram of the Residuals (nipo Is 560nm) (.629. Observations are not independent.46 0.05) = 2.528. p-value = P( F ~ 36.62 198.97 224. 000 0.05) = 2.for the 560nm wavelengt but not for the 720nm ~ wavelengt.08707 F= 15. 30 u.52 (.571) = .54 F P 15.po Is 720rv) 90 90 90 eo 70 . \)7 P O.18 1275.629) = . 70.20 o 10 .58 7(J. 60 g 50 tì 60 ~ 60 :i ~ 40 Gl 6. residuals approximately normally distrbuted (see histograms bèiow). 5S 27 35 1769.574 20 30 . 2948 72.4824 88.5328 Estimated covariance matrix 7.7728 97.l.1111 86 .2393 -5.0134 100.7003 80.514 2301.1313 -5.1816 -0.2095 80.714 2098.8108 79.308 2343.8273 73.5918 74.2511 91.839 2610.243 2464.1745 97.3694 72. 0998 0 .912 2660.28.482486.2745 81.4707 70.33-. õ960 88.0000 -0 .3788 2271 .0000 0.1848 65. 0000 6.3496 80.11~6 89.313 25S6.1106 73.7867 64.5312 (B'Sp~ (-l)B) . .0348 80.72 .IN .1425 88.749 2756. 5S06 86.6562 72.2095 73.4040 0 . 313 0..9818 64.0936 2443.7500 95.2$67 Sl S2 92.2393 1 .120 2196.4040 0. 0000 0 . (Continued) 134 t d) The data might be analyzed using the growth cure methodology discussed in Section 6.8907 63. we reject the null hypothesis of a Iinear fit at a = u. with a = 0.7728 63.5918 75.544 2277.1111 88.1425 86.01.013489.961 2098.5506 81.2745 72.2875 MLEof beta 71.7003 80.9555 71.9035 -0.7328 -0.818 Lambàa = 1~ I / IWll = 0.5890 98.5890 71.514 2327.1939 73.749 2369. .0764 72.961 2369.201 Since.34 Fitting a linear gr.549 2196.3800 .5487 80.452 WL 2803.065 o . 625 1~45.1812 71 .~OO -0. fu ths case.065 1845.438 2271.6.3623 74.9555 71.308 2335.q + g)) 10g(A) = 45.0l) = 13.8108 80.544 2335. 120 2531..0000 ~.549 2610.~tp . an interesting question is: Is "Spetral reflctane the same for all species for each date? 6.4441 89.Ol.3623 72.: X~4-i-l)2(O.438 2821. 0000 0 .7500 81.1189 86.3636 78.5487 91.912 2327.(-1) 93.920 2464.282 2660.69. The data might also be analyzed asuming species are "nested" within date.3788 ~.5049 -1.owth curve to calcium measurements on the dominant ulna XBAR Grand mean 72.4156 Spooled W = (N-g) *Spooled 95.920 2443.009 2343. ~OOO 2762.2933 70.4733 71.3636 80.

099 2342.027 2333.9783 0 .410 23ß9. In I S2 1= 18.0366 2836.1003 0.0799 71.2789 -5 .5441 90.83. .(.674 3. with a = V.on the dominant ulna.9033 1.362 2314.7725 70.7839 71.6616 78.93.90948) -54(18. 894 2333.522 2387 .243 2342.0676 90.0081 78.9965 77. n¡ = 45.886 2809..7653 Since.7962 80.9965 77.1546 70.0760 64.522 2889.1 J(2(4)+3(2)-lJ = .0799 -2.27712 so u =(~+~.7962 93.957 Estimated covariance matrix W2 2857.886 2400. In 1 S pool~d 1= 19. we reject the null hypothesis of a quadratic fit at a = 0.135 6.01) = 6:635. (.0081 78.522 2316.'"25 Lambda = I w I / I W21 = 0.027 78.27712) -44(19.848 23ß9.522 2394. 175 2400.~70 2394.9404 -5.063 2358.05) = 7.q + 1)) 10g(A) = 7. . In 1 S¡ 1= 19.05) = 7.2066 0.7725 80.9319 2723.93 The chi-square degrees of freedom v =.0366 75.099 2277.3ß2 2369.! 2(3)(1) = 3 and z.36 Here p = 2. 9323 71.~39 2101.410 2358.OI.243 2369.167 2764.(n .0028 -0.01.1894 -0.6548 S W = (n-l) *8 94.02242 44 54 44+54 6(2+1)(2-1) and C = (1. reject Ho : ~¡ = ~2 = ~ at the 5% leveL.40324.894 2314.: Z.3215 -0. XBAR MLE of beta (B 'Sp~ (-l)B) .9033 0.0028 -0. 6.9783 9.1003 0. 8673 -1.322 2723. n2 = 55.674 2387.3020 -2. Since 2 C = 18.271 2093.35 Fitting a quadratic growth curve to calcium measurements.02242)(98(19.80:65 3 .81.Hp .0676 77.6039 92.90948.0372 2764.2066 0.070 2093.175 2101.893 ~ XL2_i(0. treating all 31 subjects as a single group.(-1) 70.40324)) = 18.

. inee (28 1 F 1 28 J-'M-r.J14. we have p = 5.an i. In 1 S2 1= 6.o (. 1= -17.59.11438) -6(-17.+~.97). reject H 0 : 1:.94 The chi-square degrees of freedom v = . In 1 S...8 +J.0(. the large sample procedure is essentially the same as the procedure based on the pooled covariance matrix.55.lnISpoled 1=-7.07065 23 23 23+23 L 6(3+1)(2-1) and C = (1-.23(6. (J. In 1 S.Jiron. n3 = 38.99.05) = 43.24429)(53(-7.587 ~(21.94)0 x. X2= . Since 2 C = 23. 1= 9.62718 so u =(~+~. -76 97 S' r = 76.05) = 12.sJ =(186.09274. = 24. reject H 0 : 1:1 =I:2 = 1:3 = 1: at the 5% leveL.705J.136 6. Xl = vanadium.48091) .24.40 The chi-square degrees of freedom v = .77. .05) = 12.37 Here p = 3.108533 .423508 -.48091.Jberyllum.52) Female Anacondas are considerably longer and heavier than males. we reject H 0 : PF . 6..!3(4)(1) = 6 and . (b) With equal sample sizes. 38. 6.38 Working with the transformed data..24900.11438 so u=r.67870. = 1:2 = 1: at the 5% leveL. n2 = 11. In I Spooled 1= 8.97:f2.05) = 5.24429 and C = (1-.1 Jr2(25)+3(5)-I) L6 10 37 6+10+37 L 6(5+1)(3-1) =.55:f 2. n. In I S2 1= -7. InIS31=-7.11) PF2 -PM2: 29.PM': 119.05) = 43.97 )0 xi (.587 PF' .PM = 0 at the 5% leveL.81620. 29.24. so 28 F 28 M 47.1 Jr 2(9)+3(3)-1) =.77.!+.xM)' = (119.62718) .09274)) = 48. Since 2 C = 48.24.108533) d.41. Xs = aromatic hydrocarbons. we have (iF . (e) Here p=2.186.148 47.%. = 7.705 14.5.148 ~ (88.40 )0 xi (.033186 . n.23(9..67870)) = 23. 154(.%. -8 +-8 .39 (a) Following Example 6.81620) -10(-7.(.59.07065)(46(8. X3 =.05/2(2)):: z(0125) = 2.24900) -37(-7. 150. X4 = 1 ¡f saturated hydrocarbons J .!5(6)(2) = 30 and .99. n2 = 24.

222 2 = -1. -1. Because of the interaction effect.01 in all cases.20.80 -1 0.41 Three factors: (Problem) Severity.= -4.217J Error sum of squares and crossproducts matrix = 1.22) 11 8 Implementation time: xG = 6. . MANOV A results for significant (at the 5% level) effects.3.217 2.0 130.01852 .68.49 -7 (-2. Effect Severity Complexity Experience Severity*complexity Wilks' F lambda .004 the two responses. Implementation time. Two responses: Assessment time.06398 .1 .03694 . 1.00 265.80. we do not calculate simultaneous confidence intervals for the magnitudes of the mean differences in times across the two levels of each of these main effects.201.000 .39 :!2. There is no interaction term associated with Individual ANOV A's for each of experience however.9 . Relevant summary statistics and calculations are given below. of freedom: 11 Assessment time: xG = 3.39 95% confidence interval for mean difference in experience: 3.J2. the main effects severity and complexity are not additive and do not have a clear interpretation. each at two levels.96 95% confidence interval for mean difference in experience: ~2.222 Error deg. Since there are only two levels of experience.000 .70. xN = 10.33521 P-value 73.667 (2.68-5. We see that both assessment time and implementation time is affected by problem severity.4 9.96 :! 2. problem complexity and engineer experience as well as the interaction between severity and complexity. Wroblem) Complexity and (Engineer) Experience.71:!. Assessment time and Implementation time. xN = 5.62) 11 8 The decrease in mean assessment time for gurus relative to novices is estimated to . we can calculate ordinary t intervals for the mean difference in assessment time and the mean difference in implementation time for gurus (G) and novices (N).54 -7 (-4. For this reason.667 2 6.201 -. show only the same three main effects and two factor interaction as significant with p-values for the appropriate F statistics less than .16:! .137 6.

20 hours.70 hours.22 and 2.62 and 4. .138 be between 1. Similarly the decrease in mean implementation time for gurus relative to novices is estimated to be between 3.

7 9zZ Al so..667 3...117 .200 .y* = v-1/2y_v-1/2.o\'1 hint and note that s* = y* .667 8.1 -1~1 (8::)= 11 (-::) = (~:::J 1 l (120 . 3£3 1. (Z2 .7Bz1 .000 5.4'67 25 - l2 .467 fitted equation: y = -..19z2 7.2£7 9 3 ..79 2.-w Foll.23. pri or to standardi zi ng the variables.. .11 . 1 08 1 .283 .088 .43 + 1.267 13 9 . . 166 ..6£7) 7.292 -1 . yy The fi tted equation for the origi na 1 variabl~s is = 1 33 - y _ 1 2 (Zl .726 .l'y =-- 180 85 123 .. .2 Standardized variables zl z2 Y ..533 .æ ami (n-r-l )02 = Ê*'.716. .000 and y = 12.130 fi tted equa ti on: ...2.2t7 3.~0 -o. Is = 5.tlOO.726 -1 . Y = 1 .3J3 8 .. 7.667.ooõl 1.lOO -'5.2£7 zl 7..l39 Chapter 7 7.2. 5.451 .757. 391 . Residual sum of squares: :1: = 101.400 13. 000 15 5. 1 Y=LS=_ _ 15 351 199 142 = 12 .5\ .716. z1 z12 :z ž2 = 5.695 -..667 + 1.33z1 . y = . £: = Y-Y . '¡sz z = .2£ ..467 3 .. zl = 11 . 81 7 1 .120 -10 ß = (Z'Z).174 .200 23... £5.7'57 and IS = 7.2 1 .~ .391 -1 .- = 9 .6'ó7.6'67 .400 = 9 13. .3 .Ê* is distributed as X1 n-r-1..

ia) pro je.. so M .4 ii ) v=I b) V -1 .j=l J J j=l J is diagonal with jth diagonal -element 1/'1. .1) .e at A.ct... so J n n ""W .Y. is the 'Projection t c) .--d1ag .. 1-1-1 1= ri+1 (Z'Z).. 7.))/n 7.. ..= ¿ ).. + th .) cj y-l is diagonal with jth diagonal element l/z~ ~o J n ~W .:' e. .. ..0. ze- .~ (Z'Z)(Z'Z)-(Z'Z) = PAp1(PA-P')PAP' = PM.e. ~ ..1_ 1 .5 So.Àr. .¡zJ.0) F. ution follows from Hi nt. we check that th~ defining relation holds p -... .w = (zlz). J=1 ß = (z'y-1z)-lzIV-ly = (..r (YJ.140 7.... with PP' = P'P = I .A = :J .)/( ¿z~)..or .1+1 . 1 n n so ß.= r 1. -1 ri + is a generalized inverse of il since .)/( r z....e! = PAP' ...Api = PAP' = l Z ti ) 8y the hint.).z'y = t L z. r Ai -1.= 1 1 .e~ = PA-P' 1. j=l J j=l J â = (zIV-lz)-l :iv-l~ = (L y. that In we show . lZ8 = Z'y. if ze-.6 a) irs t nO.. o À1 AA.0 = - is the Z' (y- .. = A +1 a . 1.o n of y.0 Si nce Z'l = ! ).

.~.two independent X2 random variables divided by their degrees of freedom. = ~(2) ~ (_~U1J . _1_1 The (S11 are r1+l mutually perpendicular unit l~ngth vectors that span the space of all linear combinations of the columns of Z.À.hii)' That is.q~)y = Z(Z'Z).~ 1À. 141 _ -1/2 c) = l. From the Hint.. hji ~ 1. = ¿ q. e. nition 2A...t) H2 = Z(Z'Z)-l Z'Z(Z'Z)-i Z' = Z(Z'Z)-i Z' = H.(2) - r . it is positive semidefinite. for _1 1 _1 ri +1 -1 .=..e. Ühe latter follows because the full random vect-or ê is distributed independently of SZ).. Thé projection of iis then (see R.4 that ß =(~ii) = (Z'Z)-lZ'Y is distributed as N +1(ß. (h) Since i . . .esul t 2A. = -1-1 i =1 1 I.H is an idempotent matrix.7 and Z = (Zl h J . (Z'Z)-l is posiiÏe definite.a2(Z'Z)-1) indepen4ently of nâz = (n-r-l)sZ which is distributed as a2 X~-r-l.8 (.2 and Def. (~(2'-~(2))'(Cl~~(2'-~(2)) iscl2 and this is distribut~d independently of S2. ¿'i~:hij = tr(Z(Z'Z)-iZ') = tr((Z'Z)-iZ'Z) =tr(Ir+1) =r+l. On the other hand.Z'y 1= .-1 i=l -1 -1d) See Hint.H)a = (1.. Then Consider q. Ze. The result follows from the definition of a F random variable as the ratio of .q.(q~y) = ( L q. 7.- I (q!y)q. . 7. Hence hij = bj(Z'Z)-lbi ~ 0 where hi is the i th row of z.12) ri+l ri+l ri+l '1-1-1. Then 0 ~ a'(l .=1 q. Let a be an n x i unit vector with j th element 1. r 1 +1 . )ZI ri +1 Z(Z'Z)-Z' = Z(I. Write .=1 -1.:r-q A Recall from Result 7.

8 -3.i"i:z'ZiJ n' J.5 .5 L ..~9 1 ~5 J Hence 4..9 Z' = (' .ní:~ (z' _ z)2 L:z. ¿Zi + nzj) 1 (Zj .0 2.i .142 (c) Usill (Z'Z)-In£J1=1 ="':1(z' ¡ ¿~I zl -l:~1 . SJ 24 =.2 0 ".j ) .1.1 1.5 0 1.0 = . y = Z~ = 3..2 3. -1 4 2 1 2 3 4.SJ + r 1.5 3.z)2 .9 -1.. 1=1 i=i hjj .0 a 2..(1 Zj)(Z'ZJ-I ( .5 " .z)2 7. ~(2) = (Z'Z)-lZ'~(2) = ri ~5 J t = (~(l) :1 ~~2)J .. e =y-y= 5 -3 3 -1.1 1.0 ..A A A Y'Y = y'y + tit r 55 J-15 l- 0 . .2 -3.8 3..0 -1.0 3. -13.5 .9 3. -2 -1 1 1 :l a (ZIZ)-l '=('0/5 1.9 .1 .10 J ~(1) = (Z'Zi-1Z'l(1) = L~91.5 22 ..2 .5 1.5 1.0 .2z.( .0 . .1 SJ ( 53 . + í:i'i(Zj . 1 -13.5 1.-z)2 £Ji=1 we obtain 1 ( ßn ) i_I .9 -1.

0 J.5 Y02 . 3.225) ~2)P~ (19) = 69.5) (.18 -. . the 95% confidence interval for the mean reponse is given by (1.5 7.35) .5J (Y01. -2..8.18 .5) l'"3.75) 7. Y02 .0) :t 3.9 (1.¿) a 95% prediction ellipse for the actual V's is given by (YOl -2.55. 5. os) (0: . -.75 (7.:! 3.9 )11 + (1. 25.5' 9.825 .35. the 95% prediction interval for the actual Y is given by (1.SS) s (1 + .75).143 7.1 (\9) or . b) Usi ng Resu1 t 7.~H~KI j9)'or (~ . 5 J (3 . c) Using (7-l.: .10 a) Using Result 7..7..~)I.

ZB ) i (T. -8 z . )( Y ..10 with rl replaced by A.Z' B) = I (V.. Tl)us ~ is the best 'Choice f-or any positive d'efi ni te A.- ~/here ~ is any non-zero row of ~(~-B). ) . 1 2 ( c ) tr(A-lt~-B)'Z'Z\P-B) = tr(~p-B)'Z'Z(s-8)AJ = tr(Z i Z (S-B )A(~-B) i) . n (Y. = tr(Z'(f3-B )A(S-ß)' Zi) ~ C i Ac ) 0 . -B z . . j=l -J -J -J -J and i:~=1 dj(B) = tr(A-1rY'-ZB)'(Y-IS)) Next. .zP+ZS-ZB) = ê'€ + '~~-B) i Z i Z(~-B)J so i:~=1 dj(B) = tr(A-l£'tJ + tr(A-l(j-B)'Z'I~i-B)) The fi rst tenn does not depend on the choice of B. Unless B = i.11 The proof follows the proof of Result 7.ZS) i (Y-ZB) = (Y-Z~+Zp-ZB) i (y. Z(S-B) will have a non-zero row.144 7. (¥. Usi n9 Resul t 2A.

34 .13 .Z2 +-1 mean square error = cr ..745 (d) Following equation (7-5b).707 1 s -z(2)zl çl s z~ 2)z(2)-z(2 )zl s zizl .'3 . 57 R = zl (Z2Z3) =/3452.13.12 (a) (1)) best linear pr~di~tor = -4 + 2Z1 . Therefore =IiT= 2 Py Z i · Z 2 = 7. ( : : J . we partition t as t = iL ~ -i ~J 1 1 ii 1 and detenni ne cava r.a i + az = 4 yy _ zy zz .¡ If (a) By Result 7.33 = VS691 . ß_ =zz s-l-s _zy (b) Let !(2) = (Z2.( : J (1 ) .145 7.78 (c) Partition ~ = t l~11 so .Z3J = r 3..73) L 5.y a a i t-1 (c) PY(x) = -zyayy zz -zy _ IS _ . ance of ( 1 given z2 to be :. 1) = l: ~). (1.

s' s-l s 380.04 380.consideration.31 Removifl9 lIy.) = r3649.14 = .z3 7.ZZ =ß') 2 = /s · Is i S syz yy-z2 z. 82 r z.82 ' 02. z2 11 - r~-YZ2I' -zlr~-z2 = zl Z2 s 'S z2z2 = .42 Thus 380 .z2 _ --.37 i l23.(.d riskll and "achieved . z2.146 S691.zl.42 (a) The large positive correlation between a manager's experience and achieved rate of return on portfolio indicat~s an apparent advantage for managers with experience. · z2 s z z Zi yz'.£2 /3649.51 126. we no\'1 have a positive c. (b) from (7-S1) s s syz.YZ2 r r2.11 L _z3z(.25 23.orr-elation between "attitude towar.04 1''02.821 z(l )z(l) -z3z(1) z3z3 -z3z(.34 r S I s' s i i S f---------- = z(l )Z(l): -Z3Z. _ YZ2s 2.05 i _________l-___ 217. ryz.) i S = '600. The negative correla- tion between attitude to\'iard risk 'and achieved rate of return indica tes an apparent advantage for conservative managers.ear'S of eXl'eriencell from . S2 yy z2z2 s zl zl ryz. .) i '3z3 and s .

Now for example.147 returnll.15 (a) MINlTAB computer output gives: y = 11 .o 1. tion of ßO is /1.0067)-1 (45.s an apparent advantage . It appears as if Zl is not needed in the model provided £1 is include~ in the model.OS) = 4. (c) The 95% predi~tion interval is ($51 . .24 3 3 Zl 'Zz .2z. the ~stimated standard devia- .16 Predictors P=r+1 C. 7.025 12058533 . Since fi.2) = .17(..$60. After adjusti ng for years of experience.z. are no apparent (b) An analysis of the residuals indicate there model inadequacies. residual sum of squares = 2tl499S012 with 17 degrees of freedom. F = (45.2)( . Thus s = 3473. 7.025 Zl 2 Z2 2 12..to mana~ers who take ri sks. .23~) (d) Using (7-".996152 = 4906.Q). there .228..870 + 2634'1 + 45. Similar calculations give the estimated standard deviations of ß1 and ß2.45 we cannot reject HO:ß2 =~.

17 with a p value of . (d) The t-value for testing H 0 : ß2 = 0 is t = 1.'. 7..004946 0. the residual plots below are consistent with the usual regression assumptions.5 o .0'6806 0.9.0% Analysis of Variance Source DF SS 2 131.02785 0. ..282.5 "5.0 ~2. .00 2.".'. We cannot reject H 0 at any reasonable significance leveL.7% 3. .es Assets S = SE Coef (. 5. consider the simple linear regression model relating profits to'sales.045 0.999 0. Resîdual Plots for Prots ." ii 2.:¡':-.013 0.17 7.55. The model should be refit after dropping assets as a predictor varable.63 14.oef 0.Reua' (c) With sales = 100 and assets = 500.00577 Assets Predic-tor Constant Sal.oftt~ Residuals 4: 5.95). -"'..-".'.5 20. D' 3 .0 1 .148 sales and assets follows.92 4.0 17.5 ! 0.0 90 :.0 125 15.0 ii.0 10 Fi Value . That is. All leverages are less than 3p/n=3(3)110=.. The leverages do not indicate any unusual observations.4() P 0.005768 p 0. .:. a 95% prediction interval for profits is: (-1.86282 T 0.Reual Residuals Versus the Order of the Data llistni.:'Nónf~tProbabtltypìól: of the ResdualsÝetthe fi Value Residiials 99 . Ii I " 2 f 1 -2.0 "10 10.058 (b) Given the small sample size. 26 Regression Residual Error Total 7 9 104.0'6~1 Sales + 0.'641 = 43..71 MS F 65.. 20.:-.:--.282 R-Sq(adj) R-Sq = 55..17 (a) Minitab output for the regression of profits on Profits = 0.44 1.~1 + 0.45 235.

114z4 with s = 1. A Q-Q plot of the residuals is a bit wavy but the sample size is not large..0 (b) The AIC values are shown below.' .60.63 3 (sales.46 7.322z2 +.19 (a) The "best" regression equation involving In(y) and Z¡. assets) 29. Note that p is the number of model parameters including the intercpt..Zs is In(y) = 2. (b) A plot of the residuals versus the predicted values indicates no apparent problems.18 (a) The calculations for the Cp plot are given below.4 2 (assets) 7.149 7.058 and R2 = . .0 3 (sales. p (predictor) AIC 2 (sales) 29. Perhaps a transformation other than the logarthmic transformation would produce a better modeL.24 2 (assets) 33. assets) 3. It may be possible to find a better model using first and second order predictor variable terms.756-. 2 (sales) 2. Z2.

* * * * * Xl = .60647...0045zS A regression of Ln(y) on the first principle component gives " .05 and R% = . written in terms of standardized predictor variables. for example.1435.. In this case 1n(y) = 1. the first principal component.1...iso 7.0038z2 z with s = 3. (a) (i) One reader. starting with a full quadratic model in t~e predictors z1 and z2' suggested the fitted regressi'On equation: " Yl = -7.2755Z4 .7371 .701 and R% = . z2. 1n(y) = 1. .6615. A regression of 1n(y) on the fourth principle ~ompon~nt produ~~s the best of the one pri ncipl e component pr.li with s = .. .3901Z2 .5281 z2 ..1 .3604x4 and s = .070.e proportion of the variation in the r~sponses. (Can you do bett.6357Z 3 .. ..ession relationship whkh explains a larg..4465.. is .22. Al so a Q-Q plot of the residuals has the fOllO\'ling gen.er than this?) of the residuals versus the fitted values SU99~sts (ii) A plot the response may not have constant variance. .3808 + ..21' This data set doesn1t appear to yield a regr.7371 + .015.z5 in the ing eigenvectors give the coefficients principle component..8940.edictor variable regress ions.. The correspoml- of '1' z2..618 and R1 = .20 Ei genva 1 ues 'Of the carrel atÍ\Jn matrix of the predi ctor vari able'S 2:1.eraT ap?ear- ance: . 7.235.z5 are 1.8545.

...16... . (0 .~. Perhaps a better regr.ession can be obtained after the responses have been transformed or re-expressed in a di fferent metri c... .. ..151 Normal probabilty plot . .. a 95~ prediction interval of zl = 10 (not needed) and z2 = 80 is 10.. .... co .....0217 or (5. ~...¡... C/ ei :: 'C ïñ -...84 :! 2.' -2 -1 o 1 2 Quantiles of Standard Normal Therefore the normality assumption may a 1so b~ suspect...... ..37)..- .. .. I .. (iii) Using the results in (a)(i). . ..32... . .. .~ ... C\ Q) 0: 0 -. ...

16 ll.08 0.1 0._~~.1566 2.552 DomUlna predictor Constant DomHumerus DomUlna Coef 0.1% R-Sq = 71.08 .00473 Residual910ts for DomRadius.The regression equation is DomRadius 0.2 '0.22 (a) The full regression model relating the dominant radius bone to the four predictor variables is shown below along with the "best" model after eliminating non- significant predictors.246 0.0.1064 0..Dom Hi:metus and 'Dm Ulna PtedictlS Normal probabilty Plot " 'O the Reduàls Reiduals Verus the Fitt Value .40 0.~ 50 .31204 2 22 24 F P MS 0.012 0.1985 0..276 DomHumerus . 4 ! II . :D. -I .2174 -0.152 7.357 DomUlna + 0. 98 0.076 0..8% S = 0..7 OJ! 0.346 2.05940 0.088 0. .20797 0.407 Ulna Coef 0..0663502 . 1l 90 .0 0. ~ '0. Q) The regression equation is DomRadius = 0. 87 0.165 Humerus + 0.1637 0.2756 Predictor Constant DomHumerus Humerus 0.02"6 0.1027 0. 20 1. 80 1. A residual analysis for the best model indicates there is no reason to doubt the standard regression assumptions although observations 19 and 23 have large standardized residuals.10406 0..162 DomHumerus + 0..3566 0.103 + 0.164 + 0..000 0.5519 S = 0.002 1.1652 DomUlna T SE Coef R-sq(adj) = 66.16249 0.6% R-Sq = 66.1035 0.7% Analysis of variance Source DF Regression Residual Error Total (ii) SS 0. 58 0.10399 21.53 R-sq(adj) = 63.6 0.1147 0.~ -.74 3.1381 0.0687763 P T SE Coef 0.1 ~ò16 "8.9 1.0 Resual Fi Value Histogl"in of the Residuàls ResidualsVelsus tleOrder ofthè Ðlt 8 f& :! i . 10 1 1l J.4068 Ulna P -1.O¡O 0.97 0.------~~-------------"'-"~_.1 0.128 0.1 ~Or 2 4 6 8 10 12 14 16 18 20 22 24 . ..

153 (b) The full regression model relating the radius bone to the four predictor varables is 'Shown below. 5645 Constant Ulna 0.127175 . It appears as if a multivariate regression model with only one or two predictors wil represent the data well.52 0.31 1.152 Humerus + 0.312038 Total Coef 0.198 DomUlna + 0.127175 24 0.193195 Residual Error 22 0.18 2.223 + 0.11.2202 T 2.11'65 0.92 3. The results for these predictors follow.564 DomUlna + 0.322 DomUlna + 0.217 0.088047 P 0.910 0.01103 0.5% Regression .321 Ulna Predictor Coef 0.68 1.08971 0.274029 Regression Total Error sum of squares and cross products matnx: MS F I 0.3220 0.159 DomUlna Ulna SE Coef 0.5% R-Sq(adj) = '67.064903J .17846 0.99 P VIF 0.050120 .00 1. 1 0.003'674 .2% R-sq(adj) = 55.29 O.8% S = 0.1120 0.39 Analysis of Variance Analysis of Variance Source DF 5S 2 0.09'6597 26.178 + 0.OOC 0.1680 0.0110 DomHumerus + 0.462 Ulna Coef Predictor Constant o . The regression equation is Radius = 0.5953 MS 0.27 0.005781 F 15.080835 (.1 S = 0.020 S = 0.1976 0.08931 0.080835 24 0.46 Predictor p Constant 0.6% Error sum of squares and cross products matrix: The regression equation is Radius = 0.252 0.092431 0.207 0. 11423 DomHumrus Humerus DomUlna Ulna -0.184863 Res idual Error 22 0.2% R-Sq(adj) = 72.000 Source DF 5S 2 0.1833 P 0.1674 0. a multivarate regression model with predictors dominant ulna and ulna may be reasonable.114 _ 0.2108 0.3209 SE Coef 0.064903 .0'606160 R~5q = 70.00 2.0559501 R-Sq 77.07'60309 R-Sq = 59.2235 DomUlna o .11 1. 0'68 2 .1755 T 2.003 2.4625 SE Coef T 1.062608 (.595 Ulna The regression equation is DomRadius 0.059 0.058 o .014 0.09676 -0.050120J .1520 0. This fitted model along with the fitted model for the dominant radius bone using four predictors shown in part (a) (i) and the error sum of squares and cross products matrix constitute the multivanate multiple regression modeL. Using Result 7.

0001 0.4341 Sum of Mean DF Squares Square Model F o . (a) Regression analysis using the response Yi = SalePr. 1000 2000 Predicted /:..... 66g7 o ..01'50 o .4161 SalePr Dependent Variable: X2 Analysis of Variance Source C(p) 7 ..495 2.. .) FtFrBody J Frame ~ BkFat) SaleHt) (a) Residual plot oo II ~ o ll .- .....-...0795 F Value Prob) F 18..6697 o .5782 X5 56 0. 0049 o .154 7. 0029 o ..905 -3..... c: ~ '" o ii:: i: 'Uj .671 Error Prob)F 0.854 o . 00 3000 /' .332721 o .......633612 22.0009 o ....~. ....06)2(1 + z~(Z/Z)-lZO)' SalePr:~reed ._. ~ .025) /(425..5826 X3 0.. 0 o .839 180673.0127 6. Sumary of Backward Elimination Procedure for Dependent Variable X2 Variable Number Partial Model Step 1 2 3 Removed In R**2 R**2 K9 7 0.29880197 -2.823664 1929.366 2..5655 0.3735 6..'W.._...364490 1749. ..482 -3......:~.05739 Parameter Estimates Root MSE Variable DF INTERCEP 1 XL 1 1 X4 X6 X7 1 1 1 18 o . 00 ll o o '9·.. .66673277 T for HO: Parameter=O Prob ) ITI -2...9663 70 12647164.0057 The 95% prediction interv~l for SalePr for %0 is z~ß:f t70(0.o '" 1ã i::: 1i Gl a: ~b) Normal probability plot 00 ll ~ .....23...:.... .0043 0... Gl a: .. 7073 C Total R-square 425....224 0.....-~.. .. 75490590 389. u.832 3292571. .. 0c: oo .0001 5 16462859.3g86440 -77.. .78342 75 29110024. -2 -1 o 2 Quanties of Standard Normal .090 4...420733 89.~..17300145 701.-. . e.0041 0.4033 o ... .21819165 133..~..1538 2. 177'529 46 ._.. 5655 Parameter -Standard Estimate Error -5605....

The few outliers among these latter residuals are not so pronounced.1..:". .235773 -0. ..0001 o ...¡¡ '" a: y/ "" .0057 0. 00827438 -5 . .'. c: .025) J~O....854 0.. .183611 o . 9 7...6189 X4 4 0........841 -3 .... .6108 1 2 3 4 C(p) F Prob~F 7......19018 R~ot MSE R-square Parameter Estimates 0..: ~ ..6108 Parameter Estimate 'Standard Error Parameter=O Prob ~ ITI 0.03992448 0........058996 4...0122 0.. 56794 0... c: . 2902 o . .... Sumary ofBa~kward Elimination Procedure f~rDependent Variable LOGX2 Variable Number Partial Medel Removed In R**2 R**2 Step X3 76 0. ..8 8... ~ . -2 -1 o 2 Quantiles of Stadard Nonnal ... 000 1 1 5. ....:. 0001 1 0........155 (b) Regression analysis using the r.02968 1...6 7.736 0.. . .0033 0...... 1348 6.060 o ..4 7.... .4890 o . . '" ~ .....2 7.. 0594 6 .. 2265 Dependent Variable: LOGX2 Analysis of Variance 'Sum of Mean Source DF 'Squares Square F Value Prob~F Mode 1 4 71 75 4....... C\ II c: II 0 iü :: :2 ..3070 o ...19......6655 0..91286786 5. In(alePrFfl3r.\ .0001 2 . 00742 27.0031 The 95% prediction interval for In(SalePr) for Zo is z~ß:f t7o\O.0081 0....599 3. . . .049418 -'0..6121 '0 .03'617 Error C Total 6...4368 1...Q2)2(1 + z~(ziz)-izo).337 o ..6368 X7 0..6121 6.. : C\ c? "" ." j.....597-'2 0.. 9 '.~~.027613 0. c: C\ II iü :: c: "0 ..esponse Yi = In(SalePr)..0 7. . 00846029 o .0 Predicted a: J C\ 9 ..ed S PrctPF8 j Frame i SaIeHt) (b) Normal probabilty plot (a) Residual plot ....".0013 0.4537 1 .01927655 Variable DF INTERCEP 1 XL 1 X5 X6 X8 1 T for HO: 0.6311 i9 5 0........ 9445 2 .

..94798 Root MSE Parameter Estimates o ."... ........ ........... e.34737 o . ll .96:f t73(0.. . ._. v-...0003 The 95% prediction interval for SaleHt for z~ = (1.06.2 -1 o 2 Quantiles of Standard Normal . 89866 Source C Total R-square 0. ... SaleHt Dependent Variable: X8 Analysis of Variance Sum of Mean OF Squares Square F Value Model 2 235.36221288 X3 o . N N .. .....8987(1.918 3..156 7.. ii:i 'l 0 a: .. 000 1 o .. Ul ii:i 'l c a: .970) is 53.. ëii Gl Ul .t' .00151072 T for HO: Prob ) Parameter=O 2....773 0.334 9.. 60.. .'...:.... SaleHt:r~rHgt) FtFrBody) (b) Normal probabilty plot (a) Residual plot ....5.wr....' .. (a) Regression analysis using the response Yi = SaleHt and the predictors Zi = YrHgt and Z2 = FtFr Body./ ëii Gl 58 ... C) 54 Predicted 56 ../. /~:... 0224 o ...... .... :....' .74'533 117 .87267 131. 7823 Standard Error Parameter Estimate Prob)F 0.165 Error 73 75 65.. C) 52 .50..60204 301. .......86). 08088562 X4 1 o ..0148)..... = (52..846281 3.55..: . . .821 ITI 'Û ....:.24./ ..0001 Variable OF INTECEP 1 1 7.025) \/0.pa ... 802235 o ... 005.... e.

........ o ... . ..... ...0859 o . SaleW~rHgt) FtFrBody) (a) Residual plot (b) Normal probabilit plot oo C' o o C' o o N II ñi =' '0 '¡¡ Gl IX o o - . .1755..-_..9:: t73(0.719286 0.0148) = (1316. 1500 1600 1700 1800 ...60268 75 1263799.. ci . .970) is 1535.6(1.._.63614 195228.. .771'6 o .' ooC\ . ci Predicted r /" / ¡... . oo - -.....'..278 0.) ITI 0.291 4... . ~ ' .99544 11963..5.33265244 2...50.. ..'.17430765 0.......0001 The 95% prediction interval for SaleWt for z~ = (1.745610 1 F Value T for HO: Parameter=O 1.rI. .025)V1l963....157 (b) Regression analysis using the response 1í = SaleWt and the predictors Zi = YrHgt and Z2 = FtFrBody. ....)F 16._.31807 73 873342. .. ......... o . . .. .._. . -.~..¡¡ Gl IX o 8 ... .. SaleWt Dependent Variable: X9 Analysis of Varian~e Sum of Mean Source DF Squares Square Model 2 390456. ..0001 o ...316794 387 . 3090 Standard Error 675.319 0.. .6316 Error C Total 109.......or o oo .. ....- .' ... II ñi =' '0 .741 Prob ..2 -1 o 2 Quantiles of Standard Normal . ....3. ..... ._.5)...37826 Parameter Estimates Root MSE Variable DF INTERCEP X3 X4 1 Parameter Estimate R-square 1 Prob... 93499836 9.... ..' ..... o ..

The '95% prediction ellpse for both SaleHt and SaleVvt for z~ = (1.31902811 Wilks i Lambda Pillai's Trace Hotelling-Lawley Trace Roy's Greatest Root F 11.0001 72 0.96)(ìó2 .O.4850 a .61475433 57 .970) is 1.96)2 + O.4850 11.9)2 ..9) 2(73) . Multivariate Test: HO: YrHgt = 0 Multivariate Statisti~s and Exact F Statistics S=1 M=O N=35 Statistic Value F Wilks i Lambda o .0001 The theory requires using Xa (YrHgt) to predict both SaleHt and'SaleWt.cQOO1 72 0.24186604 0.5.0098(Yói . 4850 11 . The 95% predicion ellpse for both SaleHt andSaleWt Chi-square plot of residuals o - ~ Ò 'l. 4850 11 .4469 Num DF 2 2 2 2 Den DF Num DF 2 2 2 2 Den DF Pr ~ F 72 72 72 72 0.4469 Pillai's Trace Hotelling-Lawley Trace 0.3498(1'i .. 4469 Roy i s Greatest Root Multivariate Test: HO: Multivariate Statistics S=l 1 .0001 FtFrBody = a and Exact F Statistics N=35 M=O Statistic Value 0.0001 Pr ~ F 72 0.000l(Yó2 .50.59574625 57 .4469 1.0001 0.05) = 6.1535..i. 38524567 57.0001 72 a .1535.31902811 0.0001 0.:/' oo ~ co l: II ~l! '" N o ~ ai o 2 4 qchisq oo io - o . o- CO 6 B 10 51 S2 53 S4 5S 55 57 Y01 .158 Multivariate regression analysis using the responses Yi = SaleHt and Y2 = SaleWt and the predictors Zi = YrHgt and Z2 = FtFrBody.53.OI4872F2.4282.53..59574625 57. even though this term could be dropped in the prediction equation for Sale\Vt.75813396 0.72(O.

.025)y'128596...962 o .....1200) is 958.617 6......~.... .. 60408 Root MSE o ... 328962 0..... (It is possible to drop the intercpt but We retain it.ñ .073084 Eror o .88303 C Total 16 7705940. 3625 128596.. " oo ~ 500 1000 2000 Predicted 3000 .0001 Error 14 1800356....0001 The 95% prediction interval for YÍ = TOT for z~ = (1. ..... ...0941) = (154... .1. (a) R-egression analysis using the first response Yi.. 0203 1 206.. CD ir 0 0 y r' ......5:l ti4(O..05 leveL....... 7878 o ... . .. CD ir ..9364 22. 7664 Parameter Estimates Parameter Estimate Standard Parameter=O Frob ~ ITI 1 56..159 7... 00 y ..' .......1).70336862 193. The backward elimination proce- dure gives Yi = ßoi + ßl1Zi + ß2iZ2. 00 0N0 In ãi :: Q "C ..) Dependent Variable: Y1 Analysis of Variance TOT Sum of Mean Source Model DF Square s Square F Value Prob~F 2 5905583.. AU variables left in the model are significant at the D..' N In ãi :: "C üi 0 ... .' . g .....' -' ..8728 2952791....609 o . AMT) (a) Residual plot 0co0 (b) Normal probabilty plot 00 co .' . TOT~eN .. ~ -2 -1 o Quantiles of Standard Normal 2 . 720053 507.. ~ . ..........04977501 Variable DF INTERCEP 1 Zl Z2 T for HO: 0.2353 R-square 358 .0.. ......274 2.9(1....1763. .25. . .....79082471 0..' ....

..0941) = (-9......Gl c: C) C) oo oo ...2 .2692 14 1620657.11640164 T for HO: Parameter=O Prob ....05 leveL.866 o ...0001 Error C Total R-square 340..) ITI -1. G) oo ïii c: i.......2(1..1 :l t14(0... ...... Dependent Variable: Y2 AMI Analysis of Variance of Mean Sum Source DF Squares Square F Value Prob..871 0....0053 o . 2387 183.1517.....231 3.......' .A..86521452 606 .....1.. 7870 Parameter Standard Estimate Error -241......... ... AMi=t(eN J AMT) (b) Normal probabilty plot (a) Residual plot oo CD oo CD (/ ¡¡ :: ooC\ ....234. ¡¡ i::i oo C\ .1 o Quantiles of Standard Normal 2 .160 (b) Regression analysis using the second response Y2' The backward elimination procedure gives 11 = ß02 + ßi2Zi + ß22Z2...025) Jii5761..5384 2994860.347910 196... ......23886 16 7610377....... .)F Model 2 5989720.o¡ ........ 04722563 0. ...' .23703 Root MSE Parameter Estimates Variable DF INTERCEP 1 Zl 1 Z2 1 o . 0001 The 95% prediction interval for 1'2 = AMI for z~ = (1.1200) is 754..~....8824 25....... !t0P'" o oo i: o . en .298 6.4)..' '9 '9 500 1000 2000 Predicted ....' ... . 324255 o . . .. 309666 o . All variables left in the model are significant at the 0.344 115761...

101 .5)(1'Ó2 -754.782 x 10-5(l'2 .1200) is 4. .1)2 s( ? 2(14) ( ) . .44050214 Pillai 's Trace o .6890 1 .1s.16942861 1.754.0941-iF2.958. lI 'C 'C N 1: !! CO ~ CI o'E N 0 00 0- 00 10 .1657 11 0. 0 2 3 4 qchisq 5 1l 7 o 500 1000 1500 2000 Y01 .0391 Based on Wilks' Lambda. .8. QRS=O Multivariate Statistics and F Approximations 8=2 M=O N=4 Statistic Wilks' Lambda Value 0. the three variables Zs.5859 Num DF 1.. 1983 18 0.214 x 10.5)2 + 4..161 (c) Multivariate regresion analysis using Yi and 1"2.8. 1755 22 .305 x 10-5(Yå1 ..D5 .\O. 0 . The 95% prediction ellpse for both TOT and AMI Chi-square plot of residuals U) gII - II i: .9447 Pr ~ F 20 .1) = i. DIAP=O.( .. Multivariate Test: HO: PR=O.( . ..958.968. . 60385990 Hotelling-Lavley Trace Roy's Greatest Root 1. The 95% prediction ellpse for both TOT and AMI for z~ = (1.7541 6 3 Den DF 6 Û 3 . Z4 and Zs are not significant.1.07581808 F 1 ..

77z4 (ii) Observations with large standardized residuals (outliers) include #51.511 .5192 76.0+.0288z2 +.076 28.01 17z3 + 44.008 .727 -.3504 R2 Fitted model Yi = -70.1 + .047 .6595 75. Apar from the outliers.009 .210 .0282z3 + Y4 = -17.210 . #58.511 .239) (b) (i) Using all four predictor variables.550 -1. Otherwise the residual analysis suggests the usual regression assumptions are reasonable. 73.1.798 16.0555z3 + 82.049 .7% . (ii) The simultaneous prediction interval for Y 3 wil be wider than the individual interval in (a) (iii).755 45.6% s 1.5% 75.0120z3+ 15.162 7.029 .3616 80. Observations with high leverage include #57.015 -45.26 (a) (i) The table below summarizes the results of the "best" individual regressions.118 .914 .398 .089 .53z4 27. #60 and #61.763 -17. the estimated coefficient matrix and estimated error covariance matrix are -74.232 -24.120 B= i I= . (ii) 95% prediction interval for Y 3 is: (1.122 A multivariate regression model using only the three predictors Zi.398 .011 85.3530 . #52 and #56.9640z1 + 26.419 .92+.220 2. Z3 and Z4 wil adequately represent the data.0593z2 + . the residuals plots look good.4% .193 .025 .59z4 Y3 =-43.8+.3.04z4 Y2 =-21.7% .244 . .6-. 4.077. Each predictor variable is significant at the 5% leveL.098 .185 .12z4 Y2 = -20.914 . (ii) The same outliers and leverage points indicated in (a) (ii) are present.0224z2 +.089 .486 .193 .

High=2.1364 RZ For the multivariate regression with the two predictor variables Severity and Complexity.270 3.163 7. Exper are coded: Novice=l.919J 5. Guru=2.834 + 1.003Complexity Implementation = -4. .477.9162J 1.) There are no the levels of signifcant interaction terms in either modeL.834 î: = ( .477 Severity + 5. Complex=2.827 (-1. Ea~h predictor variable is significant at the 5% leveL.9162 4.270Severity + 3.827 Complexity 74.9% 2. the estimated coeffcient matrix and estimated error covariance matrix are B = 1. the levels of Complexity are coded: Simple= 1.9853 71.919 + 3.003 -4.27 The table below summarizes the results of the "best" individual regressions. Fitted model Assessment = -1.9707 1. Experienced=3. 3. (The levels of Severity are coded: Low= 1.1% s .5643 A residual analysis suggests there is no reason to doubt the standard regression assumptions.

.

.

.

.

~i = (l/ff. E.. 1/12 .' 4.p)2 = 0 Thus ~0'2-À)H.4 figenvalues of * are selutions..of 1. 'For À2 =O'2(l+i'Ii. 8.o.a2_À)2_2cr4p2J = 0 S'O À = 0'2 'Or À ='"2~lt~hl. . PYl Zz = . The principal campenents are Yl = . ~ . = . = 1.7n7Z. 8.6325/(1+1. = . Y2 = . proporti()n of tatal population variance explained by ~l is 6/(15+1) = .894XZ Vareyi) = À1 '= 6.. ~ fj/Z.a2.-À11 = (a2-Àp-2t~2_ÀH. 4 are not unique.447X2 . fer À~'=a2'tl-pI2).3 Ei~envalues of tare 2.903.707L. For À1 = (12. The two variables contribute unequally to the principal components in 8.164 Chapter 8 8. 4. Therefore.2 ..894X1 + .)1NEiZ).2(a).6325 1 £ = (1 . The two (standardized) variables contribute ~qually to the principal components in 8.1 because 'Of their unequal varian~es.. 8.816 (b) No.429 2. + .6325 Proportien of total population '2 = . With these assignments . -1/12. (c) Py L = . ~~ = 0/".). Py Z = .genvect~rs assaciate with the ei~en- values 4.86. One choi~~ is =i = iO 1 O)çnd :~ =(0 0 1).the 'principal components are y 1 = Xl' Y 2 = X2 and Y 3 = X3 .7!J7L¿ Var(Yi) = À..6325J (a) Y.1 Eigenvalues of * are À1 = 6.707Z2 varianceexpl ained by Yl is 1.-l/I2J..903.~47X1 . À2 = 1. 1 1 .

/ž X2 + 2" X3 1 (l-pl2) 8.p .1.ø ( c 1) = (1 + (P-1)9 H c 1 ) thus varidying the fir~t eig~nvalu~-eigenv~ct~~ pair.p3 .3(1:'À)p1 = 0 or (l +29-À)(1-p-À)2 = O. a2(l+pm 1 (1+1'12) Y 1 = a Xl .i y. 1 1 (b) By direct multiplication . Hence À1 = 1 + 2. .p12) y 3 = '2 Xl .p ..l ues of 2 satisfy IE-ul = (l-À)3 + 2. = 2.y~ - .p.. further ~ :i= (l-p)~i' .3.65 Propert ion of Total Pri nci'fa 1 Component Vari ance 1 1 02 1/3 ..? and results are consistent with (8-16) for p = 3..5 (a) Eigenva. À2 = À3 = 1 .12 X3 1 1 1 1 1 1 Variance Explained y 2 = "2 Xl + 12 X2 + 2" X3 0'2 (1-.

60.999x2 Sample of Y2 =t =13. 14. This is confirmed by the correlation coefficient ry'¡. to put the all the variables on similar numerical scales.999xl + .000. Half length of major axis is 102. with the larger variance.041xl +. ry" x = .3139 (b) Proportion of total sample variance explained by Yi is -l /(-l + t) = . .6 (a) Yi = . wil dominate the first principal component obtained from the sample covariance matrix.707z2 Y2 = .707z1 -. sales.687 The first component is almost completely deterined by Xi = sales since its variance is approximately 285 times that of X2 = profits.70).4 in direction ofyi' Half length of perpndicular minor axis is 4.6861 Sample variance of Y2 =t =.166 8. (d) The sales numbers are much larger than the profits numbers and consequently.8 variance Y2 =-.9982 (c) Center of constant density ellpse is (155.707z1 +. 8. rÝ1'l2 = .xi = 1.000.7 (a) Yi = .8 (b) Proportion of total sample variance explained by Yi is -l/(-l + t) = . Obtaining the principal components from the sample correlation matrix (the covariance matrix of the standardized variables) typically produces components where the importance of the varables.4 in direction of Y2' 19 1 .041x2 Sample variance of Yi = -l = 7488.707z2 Sample varance of Yi = -l = 1.8431 (c) rýi.' 2 (d) r)~ x = 1.i¡ = . as measured by correlation coefficients.918 The standardized "sales" and "profits" contribute equally to the first sample principal component. is more nearly equal. It is usually best to use the correlation matrix or equivalently.918.

8 (a) rý¡.(.167 8. (b) Using (8-34) and (8-35) we have k rk 1 2 3 4 5 .354 . .485 would reject Ho at the 1% leveL.2.719 The correlations seem to reinforce the interpretations given in Example 8..326 .280 4 .435 .604 .5..437 2 .732 -..67 so r=2.831 -.564 .%.2 k = 1.353 T= 103.1.374 5 . .353 . This test assumes a large random sample and a multivariate normal parent population.01)=21.299 r = .J i == 1.5 Correlations: i'\k 1 2 1 .694 3 .726 -.: .Zl == êik.

i ~o11 ~ 1 11 .rO) = . .O"~I) = ii 1 -2aLttr(~n-l)$)) ..tO) A = 'max Ui = .) 2 .- :~.r - ( 21) 2 \ n-n1 ) 2 I S 12 The same resul t appl . ~O' . =1 11 n PT n 5. 2 £! E!!l e max LÜi~t) -' ~. - .~a. il:fo L(~. p Under HO ~ max L(ii.¡e get (b) When t = 0'2 I . n 11 11. .+0 1=1 and the 1 i kel i hood ratio stat..9 (a) By (S-lt) _ .lti8 8... stic becomes . ves n max e-i n n n L(l1.) = 1 11 (2ir)'2 (n-l)2 s~.II L(\1.ed to each variable independently 9. using (4-l6) 1 max L(\l . n and (4-17) \.) 11.e !Y !æ (2n) 2 (cr2... 098 0.64ti -0.(ll . .271 0.00.00022398 0. WellsFargo. Exxon Principal Component Analysis: JPMorgan.094 0.0013677 0.s YO = p.00008897 CitiBank WellsFargo RoyDutShe1l ExxonMobi1 0.000142ti 0.04ti 1.529 0. Under HO there' are P lJ. .tShel1 ExxonMi 1 0. ExxonMobii JPMorgan JPMorgan CitiBank WellsFargo RoyDutShell ExxonMobil 0.001189 .118 -0.000 .149 O.899 PC4 0.309 -0.651 0.1"69 8.345 0. RoyDutShell.414 0.9~ Continue) 50 np e ( )np/2 -np/2 max l.0002538 0. RoyDutShell.00015903 0.570 0. The unrestricted case has dimension p + p(p+l)lZ so the X2 has p(p+l )/2 .00043872 0.00017999 0..589 -0.322 -0.00072251 0. f.00006055 0.529 PC1 0.0 . CitiBank.0'2 (lr)np/2(n_l )np/2(tr(s))np/2 e -np/2 = n p 1 np/2 (21T)np/Z (.780 0.10 (a) Covariances: JPMorgan.1 = (p+2)~p-l )/2 d. 's and.00076568 Fargo.038 -0.00012325 o . 8.663 -0.326 0.a2 I) = -).223 0. '01Ì v~riance so the dimension of the parameter space.642 0.801 PC2 PC3 0. CitBank.!) (1 tr (5) )np/2 and the result follows.248 0.07012 0.00027566 0.00018144 0. Wells Eigenana1ysis of the Covariance Matrix 103 cases used Eigenvalue Proportion Cumulative Variable JPMOrgan CitiBank WellsFargo RoyDu. + 1.00007341 0.216 0.00006410 0.0.250 -.954 PC5 -0.639 0.0.497 -0.307 0.00050828 0.055 0.00043327 0.625 -0.155 0. 118931 -0.397 .270 -1.35 ê3 ê4 ês ê2 .393975 0.759246 0.170 (b) From par (a).28.2.514314 -0. 102 9.769147 -0.00012.128991 0.60 is = 2. ~ are íl: (.513 10.953 12.00195) Â.068822 -0.00025 14 = .507549 0.900 (Symetric) (b) ~ = 108.00137 t = .00014 is = .030 55.062264 0.202000 .626 . ~ = .t = .440 89.828018 0.078 .1.00070 .315978 0.29 14 = 4.27 A ei t =43. Bonferroni 90% simultaneous confidence intervals for Âi Â.767815 -0. .081081 -0. (c) Using (8-33).479670 0.431404 -0.00100) ~: (.15 .308887 -0.037630 0.040076 -0. 8.: (.259861 -0. .554515 -0.067 9 :570 31.0036) (d) Stock returns are probably best summarized in two dimensions with 80% of the total variation accounted for by a "market" component and an "industry" component.t = 31.00019.306 . .249442 -0.00106.-027909 o .858905 0.673 s= 4.00054. -0.937 -.049884 -0.11 (a) 3. 759x3 -.15)/(108.222 -. Variable Xs now has much more influence in the second principal component. The change in scale for Xs did not appear to have much affect on the first sample principal component (see Example 8. percent employment (X3) and median home value (xs).038xl +. We might call this component an employment contrast.Y2.80.316x4 -.91 =-.527 -.15+31/29+4. We might call this an achievement component.y¡.35)=.480x3 +.171 .947 .669 .27+43.220 -. .3) but did change the nature of the second component. -.X¡ -.129xs 5'2 =-.398 -.119x2 -.8S9x4 +.508xS (c) Correlations between variables and components: Xl X2 X3 X4 Xs r.062xl -. The first component appears to be a weighted difference between percent total employment and percent employed by government.60+2.590 The proportion of total sample variance explained by the first two principal Components is (108. The second component appears to be influenced most by roughly equal contributions from percent with professional degree (X2).249x2 -.669 -.238 r.x.212 .27+43. 822 .4'64 .364 3 .448 1.811 . 3.1 34 .586 -2.052 .4~.OO2x7 accounts f-or 87% of the total sampl-e variance.479 (Symmetrk) 1.142 1. ~~ = 1.522 .112xii +. 1 94 .:.28..411 .28.319 .0 -.. Tliefirst . 1 33 1 .978 .1 .167 .166 . Y1 = -. 5:7 = .316 2. ~4 = 2. ~Nete t~ large sample varianc~ f"()r x2 in S).c'Ompont is essentially IIso1ar r-adiation".779 30. ~6 = .024xS +. 500 8.089 ..21 The first sampl-e princi-pal component ..0 ( Symetri c ) . 51G) ..395 6..378 -. ~3 = ll.04'5 3Q. .2£.154 1.297 .0 R = 1.557 ..593 S = .171 .27-0 .673 2.0 Using$: ~1 = 304.52.Oinxi +..074 .254 .. 768 2 .101 1.993x2 +.110 .502 .235 .779 .23'5 . ~2 = 28.177 11 .182 1 .183 .53.624 1 .. 15'6 .0 1.OO5x4 +.12 (300.0 1.014x3 -.172 -2.0 -.914 .11£ .

The first camponent contrasts "\'/ind" with the.527z2 +.20.541z7 These components ~cceunt fer 70% of the total sample vari ance. effect. remaining variables.~37Z1 -.-.308z7.~05z2 -. It might be some general measur.. À5 = :65.199z5 +.319z7 . It might represent a wi~ transport .1~9z6 +.ó44z1 +. .407z4 +.435z4 -.39. "NO" and "He"). A "better" interpretation of the components \'iould depend on more .on"..551z3 .+. It might represent the effects of solar radiåtion since solar radiation is involved in the production of NO and D3 fro!l the other pollutants.5S7zti .225z2 -. The third 'c-omponent is -eampos-d largely of ii. ~2 '=: 1.16 The first thre. .324z6.g.. . Y3 = . A 1.tind" and certain pollu- tants (e. and the pollutants "NO" and Iln3"..113z3 -. The data can be eff€ctive1y summarized in three or few~r dimensions.498zS -.34...278z. .e of the pol1uti()n level ~ The second component is largely cemposed of "solar radiati.173 Usingji: .3 = 1.197z5 +.e sample principle components are A Yl = .54. À 7 = .378z4 -..-.extensive subject matter knowledge. The choice of S' or R makes a difference.007z3 -.. -.73. '"4 = .. Yi = -. À£ = . A ~1 = 2 .

Eigenvalues of the Correlation Matrix PRIN1 PRIN2 PRIN3 PRIN4 PRINS PRIN6 Proportion 'Cumula t i va 0.388886434 X6 0. 65'679 o .1-08386 0. 217405"649 .-07"645 0 . X4 " .0000 0.931345370 0.087004959 l1C933412 0.110933412 o .4958 o .14478 o . 0342 o ..89479 0. 53"66 0.58g699088 '0.3863 o .110131391 X4 X5 X6 0. 5514 0. 3464 0. 15815.1570 0. 4554 o .4958 0.008817694 0.3616 o .0.021814433 o .3464 .65031 0.074885"659 o . 0000 .00000 .861455923 Correlati~n Matrix XL X2 X3 14 X5 X-ô XL 1 .021814433 -0. 5350 o .0707 (b) We wil work with R since the sample variance of xl is approximately 40 times lai.93134537C 0.5514 1.3863 o .110409072 0.347989910 0. 1184"69052 o .074885659 o .6'54750889 0.040543 0. '064672 o .536' 0.024851988 o .0790 .3616 o .024851988 0.1875 o .276915309 0.2"6228 0.. 157" 1.15815'0852 -0.388886434 o . 000 )(4 X5 X6 o .477385 0.'ÛOOO o .862172372 -0.0000 o .571428861 o .21740"5649 o .704'6 0.5350 o .179408 0.12733 0.589699088 O.Q52 -0.ger than that of x4.129607 0.0102 1 . 11'(131391 0. 2432"6 o .86431 1.77764 0.38803 0.008817'694 X3 XL X2 X3 X4 X5 X6 0. 0342 -0.276915309 X5 1 .4554 o .78640 1 .7'Û46 1 .9594"6 Eigenvalue Difference 2.29881 o .612821160 0.0102 0.78786 1 .0707 -.47738 0.087004959 0.347989910 0.174 8.0790 X2 X3 0.0000 o ..13 (a) Covariane Matrix XL X2 XL X2 X3 4.118469052 1 . 1875 1.

. 402854 -.421060 0.116262 o .. 053090 -. 29923 o .998. -..026600 0.17931 -0.X6 are PRINl PRIN2 PRIN3 PRIN4 -0.124585 . (d) Correlations between principal components and Xl ..200526 -.291738 0. 020959 . .429880 0. ~2 = 4. Å3 = 1..175 Eigenvectors PRIN2 PRINl PRIN3 PRIN4 PRIN5 -. . 073090 o .178715 o .14 S is given in Example 5~Z_ ~l = 200.051.339"55 o .211635 o .55393 -0.665604 -. 146492 o .88222 o .029x3 .5.04949 X3 o .628157 .521276 XL X2 X3 X6 .39440 -0.72056 o . -09457 0.07646 0.16171 0..380135 XS 0.97 of the total sample variance.3 The first sample principal component explains a proporticn AI J 200.60720 o .600851 PRING O.203339 o .055877 -.429300 o .75289 o . 78335 X5 o .103175 o .5 + 4. 873960 o .14412 8.029 .339330 o .. We nee the fit four components to summarize the data.358773 o .532689 .302ti8 X4 o .53'67"6 0.5 + 1.551149 -.37909 XL X2 X6 -0.076408 0.061367 -.998x2 +.43969 -0 .4458 .331839 0. 90675 o .. 687297 o .02766 -0 .051x1 -.794127 (c) It is not possible to summarize the radiotherapy data with a single component.5.498607 X4 o .02175 -0. =1 = (-. Hence Yl = -.207413 -. Also.10986 -0.3) = .5/(200.44446 -0.

. 2.0 -1. . w .¡ 'I' ** * .elativ. * . "s"dium in S). Yl (1) * -'15.. .\) q(i ) .. .176 The first principal cQmponent is essentially Xz = sQdium content. Q-Q plot for Yl.. ' ** ** w '.. 1 -2..0 .0 i.. A (NQte the (r.i~ 3.0 1 I 0. -75... Theseàata appear to be approximately normal with no suspect observations.ely) large sample vtlriance for Q_Q plot of the Yl values is shown bel-ow. li '¡w ". .... w .. * oW 'f' -45.0 1. -30. -60.. o.

45xi + . in Example 8. À3 = 452.32 92'6. "" :1 = (.01/~948.15 904.3779.6.l1 = .45. A 1 so . .13.72 Consequent1y~ the first sample principal component aCt:ounts for a proportion .28 1128.23 784. Y.15 831 .01.41 S = 7'63.177 1088.40 8. . . obtained from R.53) Co nsequent 1 y ~ . À4 = 24~.09 850.49.49x2 + .53x4 The interpretation of the first component is the same as the interpretation of the first component.51.. À2 = 4'68.25. = .1"5' ~ A A A À1 = 3779. (Note the sample variances in S are nearly equal). .76 of the total sample variance.73 1336.5lx3 + .53 (Symmetri c) 1395.. .

4676 0.7875 0.6513 Prop.0114 0.0353 0.7074 0.£385 0.x4 Eigenvalues -of R 2. The thkd principal component is nearly a contrast betV'æ. '0.4021 -0. 1886 -0.4702 0.6644 0.2302 -0.392'0.x6 Eigenvalues of R 2.3216 0.0100 0.5346 1. (c) Principal component analysis using xl .6668 -0.n Northern pike and BluegilL . 3925 0.2530 0. Dev. 6284 -0.8874 0.'0471 0.178 8.1810 ~O.9293 1.6722 0.5972 0.7249 -0.0000 The first principal component is essentially a total of all four.5385 0.1731 pe 1 pe2 pe3 pc4 peS pe6 St.6983 -0. (). 4295 O.4429 Eig~nvectors of R O.2016 0.0917 -0.1863 0.2787 -0.8911 ~O.1786 0.0355 0. The second contrasts the Bluegil and Crappie with the two bass. 1.66£5 Prop.1107 0.liey~ is eontrasted with aU the others in the first principal eompoont . (b) Principal component analysis using xl .4242 Eigenvectors of R -0. 0.1640 0.5284 -0.3765 -0. 1.81£1 0.153g 0. 0000 The \Va.0834 0.4938 0.0143 -0.5914 -0..0403 0.3549 1.4652 ~.1107 Cumulative Prop.84£9 0.3871 -0.34'07 '0.2927 -0. of Vax.1969 0.0872 -'0. Principal component analysis of Wisconsin fish data .16.1539 0.9842 0.5074 0.7846 0.6157 0.e largemouth bas.2911 0 . 7~32 0 .7354 0.7'071 0.7'013 -'0.6716 0.5004 0.5172 -0.7126 -0.(') An are positively correlated. of Var.6231 0.4203 0.2250 0. 0. Dev.7352 0.0719 0.8893 1.5555 -'0.5318 pel pc2 pe3 pe4 St.9921 0.0707 Cumulative Pr~p.3'081 -0. The second principal component is essentially the 'Walleye and somewhat th.7293 -0.3621 -0.5711 0.4111 0.look at theLOvariance pattern).

179 8.001 and the first two principal components are .0085298 .0114179 .0089085 .Q13001'6 .0128470 . .181 .337 ...0185352 .432 .0103784 .218 .0079578 .633 . .673.500 .Q0 . -.0168369 .0200857 .0177355 . .0115684 .430 .024 .003 0. .02 0.0912071 .lS4 0.01'05991 The eigenvalues are o ..204.0210995 .0.159 . ..0167936 .514 -x x . .008 o .0080712 ...17 COVARIANCE MATRIX ----------------- xl x2 x3 x4 xS x6 .0803572 .018 0.Q223S. .0694845 . . . .06677"62 .

247 0.990 0.741 PC4 -0.335 .919.422 0.309z5 +A23z6 +.854 0.856 .240 -0..295 0. (c) All track events contribute about equally to the first component.017 -0.395z4 + .973 0. 200m(s).459 0.321 -0. Somoa .378 0.327 0.167 -0.148 0.406 -0.540 )71 = .090 1.237 -0.080 Pe6 PC7 0.389 0.376z6 + .572 0.919 0.383 0.Y2'Z¡ -.018 0.677 0. Czech Republic 8.998 0.266 0.909 0. 54.195 0.923 . USA 2.941 200m(s) 400m(s) 800m 1500m 3000m Mara thon 200m(s) 400m(s) 800m 1500m 3000m 0.378z1 + .127 -0.905 0.161 0.082 0. This component might be called a track index or track excellence component. China 5. Russia 4.101 -0.352 -0.906 .Yi.0910 0.141 0.414 -0.309 0. 400m(s). Marathon 100m(s) ().355z7 )72 =-A07z1 -A14z2 -AS9z3 +.026 -0.245 .389z7 Zi Zz Z3 Z4 r. 200m 400m) with the times for the longer distances (800m.820 0.720 0.383z2 + .389 PC3 0.368z3 + .887 .806 0.977 cumulative 0.680 0. 3000m.180 8.0143 0.867 0.194 PC5 0. .067 0.952 r.l.745 -0.645 0.040 0.18 (a) & (b) Principal component analysis of the correlation matrix follows.669 Eigenanalysis of the Correlation Matrix 0.937 .323 -. 3000m. 800m. 1.389z5 + . Poland 9.240 0. 1500m.674 0. Germany 3.161z4 +.308 Z5 Cumulative proportion of total sample varance explained by the first two components is . These rankings appear to be consistent with intuitive notions of athletic excellence.000 proportion 0.328 -.2793 0.368 0.395 0.407 -0.008 0.587 -0.089 0..728 0.8076 0.732 0.048 -0.911 .731 0. The second component contrasts the times for the shorter distanes (100m. France 6. marathon) and might be called a distance component.. Romania 10.094 -0.782 0.959 Variable 100m(s) 200m(s) 400m(s) 800m 1500m 3000m Mara thon PC1 0.830 0.128 Z6 'l7 .809 0.423 0.871 0.189 -0.1246 0.376 0.364 .830 0. (d) The "track excellence" rankings for the first 10 and very last countries follow.791 0.013 0. Correlations: 100m(s). Australia .6287 0.0545 0. 1500m.799 0.801 0.355 PC2 -0.819 -0. Great Britain 7.002 Eigenvalue 5.

1500mls. marathon) and might be called a distance component. l500m.095 5'1 =.1667141 l500m/s 3000m/s o .981 0.1465604 0.0954430 0.2'65 0.379x3 + .1377889 0.132 0.000 -0.089 -0. 800m/s.435 0.434 0.874 .391 0.1138699 0. Covariances: 100m/s.299x4 + .X¡ -.097 0.19 Principal component analysis of the covariance matrix follows.0997547 0. Marmls 3000m/s 1500ml s 800m/s 40 Oml s 200m/s 100ml s 0.829 0.12384.0956u63 200m/s 400m/s 800m/s Marml s 0.0943056 0.0966724 0.882 .357 -0. 200m/s.323 -0.18.274 0.055 0.081 -0 .08607 0.944 .017 0.519 -0.Yi.1018807 0.236 -0.937 .237 -0.367 -.053x4 +. This component might be called a track index or track excellence component.379 0.667 -0.391xs + .0810999 Marml s 0.184 -0. The "track excellence" rankings for the countries are very similar to the rankings for the countries obtained in Exercise 8.0933103 0.057 .299 0.038 0. 3000m/s.0822198 0.1184578 Marml s Eigenanalysis of the Covariance Eigenval ue Proportion Cumulati ve Variable lOOmIs 20 Oml s 400m/s 800m/s 1500m/s 3000m/s Marml s 0.276 .00885 0. 114'6714 0.YllXi .423X7 5'2 =-.1765843 0.211 0.187 -0.396 0.624 0. The interpretation of the sample component is similar to the interpretation in Exercise 8.926 PC2 Matrix 0.829 PC1 0.046 -0.0905383 lOOmIs o .376 0.998 PC5 PC6 PC7 0.357 0.01498 0.310 0.053 0. 3000m.00617 0.951 .423 0.0650640 0.053 -0.585 -0.519x3 +.926.460 0.21 IXs +.136 0. All track events contribute about equally to the first component.007 0.376 -.nn 8. 200m 400m) with the times for the longer distances (800m. The second component contrasts times in mls for the shorter distances (100m.311 0.730 PC4 0.03338 0.08'64542 0.886 r.18. 499 0.002 1.73215 0.1054364 0.010 0.1437148 0.320 Cumulative proportion of total sample variance explained by the first two components is .3 lOx¡ + .427 0.030 0.098 -0.0749249 o .10831'64 0.199 0. -0809409 0.0%01139 0.445 -0.138 -0.0921422 0.128 0.396x6 +.0735228 0.Q5 0. 400m/s.689 -0.894 -0.991 0. .460x6 + .127 0.176 .902 .734 0.434x2 -.124 0.136 -0.964 PC3 -0.376x¡ -.445x7 Xl I X2 X3 X4 Xs X6 X7 r.00207 0.357 x2 + .410 .

328 0.087 -0.2275 0.529z1 +.351 0.000m Marathon 0. marathon) and might be called a distance component.958 .003 -0.YI.376 Cumulative proportion of total sample variance explained by the first two components is .918 0.244 0.295 -0.273 -0.345 -0.236 -. Portugal 10. lu.338 0. (c) All track events contribute aboutequally to the first component.366 10.20 (a) & (b) Principal component analysis of the correlation matrix follows.339 ('3 + .0097 0.345z3 -.061 -0.154z5 -.348 -0.309 ('1 Z2 ('3 r.04'69 0.860 .366z7 + .069 Yi = . These rankings appear to be consistent with intuitive notions of athletic excellence. Kenya 4.7033 proportion 0. The second component contrasts the times for the shorter distances (100m. USA 2. Eigenanalysis of the Correlation Matrix Eigenvalue 6.077 0.440 o .594 0.344 -0.295z6 -.183 0.217 -0.370('6 + .133 0.332z1 + . Germany 9. Cook Islands The principal component analysis of the women.375 0.353z4 + .154 -0.039 -0.896 r.783 -0.697 0.334z7 -. 1500m.332 0.530 0.54.259 -0.026 0.Y2'Z¡ .470'2 +.089 -0.009 0.353 0.182 8. 233 -0.Z¡ .354 100m 200m 400m 800m 1500m 5000m 0.366 0.972 PC3 PC4 o .346 0.878 l4 .984 PC5 0. Great Britain 3.267 -..089z4 -.147 0. 500m.948 .341 -0.134 -0.914 .529 0.072 -0. This component might be called a track index or track excellence component.838 0.067 0.300 0.339 0.040 0.018 -0.244 0.'6384 Variable PC2 PC1 0.387('8 Zs ('6 ('7 ('8 . Australia 6.917 .OOOm.706 -0. 114 0. (d) The male "track excellence" rankings for the first 10 and very lasti:ountris follow.918.423 .349 -0.362 o .948 . 200m 400m) with the times for the longer distances (800m. the men's track data is consistent with that for .946 0.028 0. Canada . Brazil 8.359 -0.000 PC7 PC8 0.066 0.346('2 + . France 5.001 1.387 0.366z5 + .334 -0.071 -.123 -.0976 0.381 -0.276 -.227 -0.470 0.0707 0. 1.993 PC6 -0.354z8 Y2 =..061 -0.055 0.335 -0.370 0.999 0..012 0.004 -0.652 -0 .838 cumulative 0.851 -0 .080 0. Italy 7.2058 0.OQ6 0.541 0.

0Q.0537207 0.0425034 0.0434979 0. 400m/s.0434632 0. 2oom/s.177 pC7 0.023 0.215 0.971 .128 pc4 pc5 PC6 -0.024 0. 1500mls.096 -0.311 0.006 0.303 -0.387 -0.0572512 0.332 0.970 0.00752 0.6. .20.002 1.074 0.247 0.0959398 0.008 -0.902 .543 0.217 .428 0.523 -0. . 8oom/s.0745719 0.0428221 0.317 0.278x4 + .387xs Xl X2 X3 X4 Xs X6 X7 Xs r.339 0:584 0.49405 0.050 .00112 proportion cuulative Variable 10Om/s 200m/s 400m/s BOOm/s 1500m/s 5000m/s 10. 1o.20.93 0.436 0.006 0. The "track excellence" rankings for the countries are very similar to the rankings for the countres obtained in Exercise 8.923 0.111 -0.318 0.0942894 0.OOOm/s Marathonm/S 5000.442 -.0314951 0.334 -0.0431256 100m/s 200m/s 400m/s 800m/s lS00m/s 5000m/s 10.0571560 0.416 pc3 PC2 -0.384 -.390 -0.21 Principal Covariances: 1oom/s.374 -0.416xs 5'2 =-. 5000mls.0553945 0.0979276 Covariance Matrix Eigenanlysis of the 0.469 0.948 .432 -0.0523058 0.0432334 0.420 0.445 -.033x4 +. The second component contrasts times in rns for the shorter distances (100.039 pc8 -0.YI.119 -0.261 0.0448325 0.244 0.0558678 0.00575 0. .063 0.033 .432 0.858 .964 . 800m) with the times for the longer distances (1500m.947 Eigenvalue 0.0648452 0.998 Eigenvalue 0.OOOm/s Marathonø/S 0. 200 400m.X¡ .0587731 0.341 0.983 0.0541911 0. All track events contribute about equally to the first component.317 -0.822 .0736518 0.0468840 0.044 -0..0617664 0.183 component analysis of the covariance matrx follows.046 0. 10.215 -0.187 -0.016 -0.0482772 0.327 -0.364 0. 000 PCL 0.317 X3 + .oom/s.173 0.364xs+ .00322 0.0688217 0. 8.010 0.368 0.0909952 0. marathon) and might be called a distance component.OOOm/s Marathonm/s 0./s 0.311x2 +. 'SooOm.119 0.439 -0.0562945 0. .04622 proportion 0.608 -0.OOOm/s Marathonm/s 5000.'S23x2 -.244xl + .100 -0.0729140 10.235 -0.684 0.266 Cumulative proportion of total sample varance explained by the first two components is . The interpretation of the sample component is similar tt) the interpretatìon in Exercise 8.0567342 0.310 0.3lOx7 +.844 0.421 0.934 r.844 0.013 0.428x6 + .993 0.261x6 +./5 10.0469252 0.070 -0.421x7 + .0905819 cumlative 0.180 -0.450 -0 .849 . 033 0.584 -0.352 -0.'696 -0.0937357 0.01391 0.lfiJQ~~ 10Om/s 200m/s 40Om/s 800m/s 1500m/s 0.0599354 0.181 .063xS +.469x3 -.Y2'X¡ -.079 0.0535265 0. This component might be called a track index or track excellence component..278 0.923.432xl -.535 0.391 j\ = .0761i388 0.01332 0.

000213 0.008577 _ .2 2. 000444 o . (NOTE: 10 obs hidden.904389 0. 008388 -. 0.284215 0.608787 _ .7 5.184 8. 002665 -.22 Using S Eigenvalues of the CovarianeeMatrix Cumulative Eigenvalue Oi ffe renee proportion 20579.037984 0.1 3.133267 _ .011906 _.998778 0..509727 -.014293 _.9 2.Y2. 855204 0.487193 .003112 0.000253 ~ o .005597 o .1 .004847 0.005887 0.1 o .425175 0.000130 0. Symbol is value of X1.009330 (Õ. 000003 PRIN1 PRIN2 PRIN3 PRIN4 PRINS PRIN6 PRIN7 ~!!S!!. 009680 0.018864 0.487047 o .082331 _.000493 0. e72697 .005278 O.043786 0.3 0.4 15704.010389 0.029196 0.000457 0.004886 _ .0 Eigenveetors X3 X4 X5 X6 X7 X8 X9 PRIN3 PR IN4 PRINS PRIN7 PRIN2 PRIN6 PRIN1 0.6 4874./ 0.024592 _. 000256 yrhgt ftfrbody prctffb fraiie bkfBt saleht salewt .8 0.000341 Plot of Y1.535569 o .003227 _.286337 0.034277 0.5 0.000000 0. I 2500 8 Y1 8 1 8115 2000 5 8 1 5 1 8 81 8 8 18 5 885 5111 11 551 51 15 111 1 1 8 155 55 18 1 1 8 8 8 1 8 8 8 8 8 1 5 5 1500 -100 a 100 Y2 200 300 0.013820 _ .191437 0..593037 0.000069 0.4 0. 008526 0.390573 0.748598 -.808198 0.311194 o .000018 0.

355562 0.433957 . 59575 0.. .~3562 o .434144 0..314787 0.253312 _ . 236723 0.177061 0.04706 PRINI PRIN2 PRIN3 PRIN4 PRINS PRIN6 PRIN7 o ....719343 0. .799288 -.113356 0..2 i i -1 a Q2 1 2 3 .) 38 obs hidden.*.215769 -.618117 _ .127800 _.017768 PRINl PRIN2 PRIN3 PRIN4 0. 600 1500 1 -3 ....101315 0.186705 0..006722 1 ..33713 0.09945 0. 88560 0.. 800 ...208017 0..14650 0.02.449931 0.774926 0.. -.600515 Plot Syiibol of Vl.99328 o .242818 0.) 1200 2500 VL SymbOl used is hi( ~ ***** 1000 VL 2000 .185 8.038732 0...97235 0...060204 0..22 (C"Ontinued) Using R ~igenvalues of the ~orrelation Matrix Ugenvalue Difference Proportion 'Cuaiulative 2...072234 _. (NOTE: Syiibol used is Plot of Yl.02 2 3 .. .290547 0.002397 ..415709 .129837 -.276561 0.247479 0. 582337 is value of Xl.74138 0.78357 0..31996 o .. 582433 0. (NOTE: FOA S 36 obs hidden.191018 0.007728 0..00000 Eigenvectors X3 X4 X5 X6 X7 X8 X9 PRINS PRIN6 PRIN7 0.450292 0.105912 0.568273 _.12070 1.. 02644 0.. . 176650 -.714719 0......020929 0.03930 0.02.269947 . ..065871 _.42143 0.412326 0. 042442 ftfrbody pr(:tffb freiie bkfat .452854 0.94580 0.142995 0.579367 0.160238 yrhgt o ..3 .109535 0.Y2.. ***.) 1200 8 8 8 8 88181 1000 8 118851 8151 Vl 1 800 8 88811111 1 8 15 1 1111155 55 1 1 5 5 600 i i 800 900 1000 1200 1100 1300 V2 Plot of VL.047036 sa leht salewt .581171 0.18581 0.315508 0. . 042790 0.452345 . .2 -1 0 .77969 4.58867 0. (NOTE: 27 obs hidden.

409938 -0.1758 0.848576 0.598371 0. although the sample correlation between the second and Neck is small (-.389366 -0.0492 0.9270 0.580499 0.9200 0.9559 0.9013 1.314€78 -0. Body length.0000 0. The second component essentially contrasts Weight with the remaining body size varables.9437 0.532348 -0. Body lengt and Gir.368552 -0.9025 0.41'0333 0.9439 O.286817 0.87/4673.073950 -0.849339 -0.303143 -0.887138 -0. The first two components explain 99.052267 0. those varables with the largest sample varances.000202 -0.435168 .319513 -0.47 32.0.043918 -0.092026 0.9635 0.0266 Eigenvectors of R (in colums) -0.033880 0.9045 0. b) Using R R 1.318718 -0.0-04276 0.9271 1.012754 -0.291938 0.470832 -0.9045 0. 9025 0.52 0.581252 -0.032666 -'0.228969 0.54 Eigenvectorsof S (in colums) -0.128024 -0.0000 0.928388 -0.8% of the total sample varance.9270 0.008692 -0.9177 0.261937 -0.32 8.9437 0.84 = .4091 £2 0.222694 -0.035396 0.0473 0.012289 -0.216748 0.443264 The first component might be identified as a "size" component.12 1.8752 0. component and Head width.558334 0. These body measurement data can be effectively sumarze in one dienion.9177 0. Neck.846078 -0.186741 0.001815 -0. 9200 0. Gir.231095 0.9635 1.958 or 95.368132 0.070597 -0.4'04313 0.9544 0. 403'672 -'0.019105 -Q. The first component explains 4478.9544 1.1 % of the total sample varance.892805 -0.355060 -0.064458 0.074260 -0.313431 -0.243840 -0.9461 0.23 a) Using S Eigenvalues of S 4478.9503 O.519785 -0.9439 0.251473 0.9013 0.6447 0.060354 0.-0000 0.9461 0.599053 -0.110784 -0. Head lengt.411999 -0.1~6 8.440119 0.8752 1.0565 0.9559 0.0000 0.012490 0.05). It is domiated by Weight.0000 0.719785 .194132 -0.22€606 0.87 152.082353 0.0000 Eigenvalues of R 5.561034 -0.695916 -0.9271 0.458838 -0.060162 0.058127 0.9503 0.

6447/6 = . the first principal component is a "size" component. . but these latter components explain very little of the total sample varance.1 % of the total sample variance. The second principal component contrasts Weight.23 (Continue) Again. The first two components explain 97% of the total sample variance. Head lengt and Head width. c) The results are similar for both the covarance matrx S and the correlation matrx R.l87 8. The fist component in each analysis is a "size" component and aInost all of the varation in the data. These data can be effectively sumarzed in one dimension. All varables contribute equally to the first component.941 or 94. This component explains 5. Neck and Girth with Body length. The analyses differ a bit with respect to the second and remaining components.

3 -4'6261S .9 -114.S -140.9 15 3076.9 502.1 2 -1096.5 -972. 4 ~244 785.8 -293.4821 0. 3 110517.1 444.3 -53.9 560.0642 -0.3 1~1312.8 1251.2662 -0.6 728.1 1458543.9 761592.9 -60.2 101.0567 -0.6 Eigenvectors of S -0.3 286.3 618.3 181437.1 1819.6 490.4 -1479.9 -942.5 -149.0 85. 2667 0.8 359.6 -586.3270 0 .3269 0.8 11 -3931.1 -105.5 8 -1151.6 .3 1079573. Wisconsin.4 2676.3404 -0.4 -893.5 -69.5615 0.4 -379.2 88.5 123.8 279.1 43399 .1 288919.188 8.5 -70.3575 -0.1543 -0'.2 152.1 523.8 -122.6 Eigenvalues of S 4045921.2 -242.711)3 -0.0 209.1 372.4308 0.2 -1113809.8 650. Police Department data XBAR 3557.S -462615.8 1478.2 178.3092 -0.6 2011.3862 -0.8 1698324.2 13 326.4 16 2261.0008 -0.8 809.6 800 7141 S -72093.0810 -0. 9 111)1018.8 9 -497.4932 -0.6 -9.S -199..8 1079573.9 12 -1392.8 330923.9 2265078.24 An ellipse format chart based on the first two principal.5 123.9 139692.7 222.4 85714.5 -15.6415 -0.7 924.4 -1688.8 -598.3675 -0.1 101312.1 7 1118.8 330923.5 5 -1255.0359 -0.4311 -0.0 14 3371.3 -244785.8 367884 .0316 -0.6 -109.8 -899.8 652.9 319.6122 0.1 1448.8 -83.0 -2285.1 -499.9 13563.9 -44908 .3071 -0.5 24138728.~ 4277'67 .7 -104.6 42771)7 .2 10 -2397.3 43399 .1 35.9 224718 .4126 -0.1544 0.7 -72093.1796 -0.6 365. 1 420.6 284.6 -324.8 222491.~ -1113809.2 110517 .2 4 -1360.1173 Principal components yl y2 y3 y4 y5 y6 1 1745.5 -147.5541 0.7 419.8 85714.1 700.9 -1029.cmponents of the Madison.3 -593.1 -422.9 -3715.4062 -0.5 636.1 -274.4094 0.9 -273.0 -1322.0 94302.6 1752.7 572.2 -158.4 -44908 .8 222491.9 207.7 6 971.8453 -0.5157 0.4898 -0.2 3 210.4 1399053.9 139692.3 3.7 -316.9 11'61018.6 7.5696 0 .

. 12 14 1'6 . .189 2. . 1-0 . It ~ 0 ~ .99 The 95% 'Control ellipse base on the first two principal.. . 4 6 .5 X 10-7 yl + 4. Period 12 looks unusuaL. 8 Period ..4 x lL-7 yi = 5. . .25 A control chart based on the sum of squares dij. Sum of squares of unexplained t:omponent of jth deviation . M ~ en en iq 0 d . 2 .cmponents of overtime hours ooo ~ ooo "' -400 o 2000 4000 y1 8..

0 1 2 3 Component Number 4 5 .351 -0.361 0. Conform.5 0. Conform.439 -0. t.451 0.298 -0.118 0.018 0.187 Supp Benev Conform Leader Supp Benev Conform 0.713 0. Benev.o 0.401 -0. The first component contrasts independence and leadership with benevolence and conformity.173 -0.439 l'ortion Cumulative Variable Indep Supp Benev Conform Leader 0. These components explain almost all (98%) of the variabilty.295 -0. Leader Eigenanalysis of the Correlation Matrix 1.008 0.187 0. Correlations: Indep.087 -0.561 -0.253 -0.007 -0.491 -0.151 0.548 -0.480 Using the scree plot and the proportion of variance explained. Benev.018 1. The second component contrasts -support with conformty and leadership and so on.5888 0.525 -0.386 0.788 0. it appears as if 4 components should be retained.667 -0.0905 0. Leader Indep -0.333 Cell Contents: Pearson ~orrelation Principal Component Analysis: Indep. All four of the components represent contrasts of some form. Supp. SG-llot of Indap.190 8.864 PCL PC2 PC3 Eigenvalue 2.1966 0. Supp.521 0.454 0.469 -0.460 0.471 0.492 -0.121 0.982 PC4 PC5 -0. 3682 0.648 0.274 0.733 -0.000 0.327 -0.26 (a)-(c) Principal component analysis ofthe correlation matrix R.115 -0.439 0.7559 0. It is difficult to provide an interpretation of the components without knowing more about the subject matter.

... .. . Observation #111 is a bit removed from the rest and might be called an outlier. yll . . . . .. .. . ..-.. ." .. .. " . .. .. .. . ... . . . .....' . ....__..191 . fi . ... ... . . -4 . -2 .":-: SCàlterplot òfy2h:al vsyiihat .. .." ... -4 . .. .:: . . . " . . " . o 2 1 3 ..~ . ._-.. .:::_.....- .. .':. .'::'/-.. 2 . .. .:---. -3 . .. ... .. .. . . i 3 The two dimensional plot of the scores on the first two components suggests that the two socioeonomic levels cannot be distinguished from one another nor can the two genders be distinguished. . . e.. ." ". . -'.. ..:--:'-.. .:--d"::. . ". .. 1 U' .. .."". .::'--'. ... '.' . . /till o . . ..' -: . .-: ~\ '" . ..3 .-::-":___. . . Scatterplot of y2hatvs 11l1lt . .~ .___... . .. . .. . -: . .: . . .2 yII1 -I ~3 . .. ...... ..... .. . . . .I. . .. . ... .' ..

The second component contrasts support with conformity and leadership and so on.1 LeaCler--Cv Mamx Scre Plot 50 11 i "1 ~ .484 Eigenvalue Proportion Cumulative Variable PC1 Indep -0. Leader Eigenanalysis of the Covariance Matrix 68. Supp. Leader Indep 34.9729 5.4198 -7. The first component contrasts independence and leadership with benevolence and conformity.017 1.422 0.115 0.304 0.219 -0.612 0.7 Inde Supp Benev Conform -18.983 PC4 0.752 0.494 2. In this case.9580 Principal Component Analysis: Indep. These components explain almost all (98%) 'Of the variabilty.=i 30 ¡¡ 20 10 o 1 2 3 CompolientNumbe 4 5 .8447 9.515 0.309 -0.493 Supp Benev Conform 31.163 0. of ItidèP# -.579 Leader -0. it appears as if 4 components should be retained.9419 Leader 26.8682 -8.868 pc3 PC2 -0.478 Using the scree plot and the proportion of variance explained.042 0.352 0. All four of the components represent contrasts of some form.3488 -13.386 0.734 -0 .0426 Supp 17.101 0. Conform. it makes little difference whether the components are obtained from the sample 'Correlation matrix or the sample covariance matrix.192 covarance matrix S. The components are very similar to those obtained from the correlation matrix R.7233 -9. Conform.7165 Leader Benev Conform 29.643 0.5134 0.9422 33.524 0.079 0. Seney.572 16. Seney.119 -0.140 0.000 pc5 0. (a)-(d) Principal component analysis of the Coyariances: Indep. 509 0.583 0.222 0.484 0. Supp.398 0.380 0.706 23.354 0.090 0.271.612 -0.0718 -15.7502 -4.392 0.

. .90. .15 ø -J 10.i: (l-1. Observations #111 and #104 are a bit removed from the rest and might be labeled outliers.752 . .10 io o y1hav 20 ~ .. . . ø "" . . ..20 . .. -. . . . e.. . 2 .... u. ... Large sample 95% confidence interval for Â.752 )=(55. . . . i. . ... . . .... .. . . .... So'. . .. .21130 68. ... . .-. . 1# . .. .. . "....31." .. . . . i.. . .! . fill . . _.. -....96.Sctterplot of y2hatatv VB y1hàtcv Lj 15 to . _'.. .. . . .193 Scatterplbt of y2hatcov vs ylhatcov 15 ... .. .83) . 1# ~ l lö4 -15 -20 . .. . 1 .. ... . . ... . e. . "" #111 ø .96-21130 ((l+1.. . . .. ... -iO .l. ... . .. .. .. .. ... "... . ... . . .. ... .Uil o iO 20 yilhatcv The two dimensional plot of the scores on the first two components suggests that the two socioeconomic levels cannot be distinguished from one another nor can the two genders be distinguished... . 68.. ~ .

998 PC3 0. BS BL EM SF EM 0. 1403 Variable PC2 BL EM SF BS PC1 0.984 0.000 PC4 -0.27 (a)-(d) Principal component analysis of the correlation matrix R. SF.800 0.194 -0.318 -0.510 -0.002 1.035 0.597 0.565 0.960 0.506 0. 975 Cell Contents: Pearson correlation Principal Component Analysis: BL.914 SF 0. EM. 988 0 . Correlations: BL.053 -0.0126 O.942 BS 0 . SF.500 0.485 0.020 0. EM. 875 0 .194 8.960 Cumulative o .819 -0. All the variables load about equally on this component so it might be labeled an index of paper strength.8395 Proportion 0. 003 0.237 -0.0076 0.995 0.261 -0. BS Eigenanalysis of the Correlation Matrix Eigenvalue 3.698 The proportion of variance explained and the scree plot below suggest that one principal component effectively summarzes the paper properties data. Component ftJlmbe .508 0.

.3Hl 0. the cOl'elations of the variables with .513359 0. .332 0.006 0..00 0. BS EM SF BS 0. Sætterplot ofylhat vs y2hat .. ~3 -4 -050 "0. 88£636 4. . .003 0.25 0. ~ .032 0.147318 i.198 0.497 -0.997 PC1 PC2 0. . Covariances: BL.988 0. EM.259 0.. EM.733 0. However..155 0. but there are some differences in magnitudes. . SF. . 000 PC4 -0.. . 'O e..988 0.:. e.. o.325 -0.987966 0.201 0.786 -0. BS Eigenanalysis of the Covariance Matrix Eigenvalue proportion cumulative Variable BL EM SF BS 11..364 The proportion of variance explained and the scree plot that follows suggest that one principal component effectively summarzes the paper properties data.434307 2.25 0.104 0. The loadings of the variables on the first component are all positive.480272 BL BL EM SF BS 8.009 0.001 1. . .295 0.431 0.: .856 0.901 -0.75 1.5 (a)-(d) Principal component analysis of the covariance matrix S.~.458 -0.140046 0.972056 Principal Component Analysis: BL.999 PC3 0..50 y2hiit U)O 0. o.195 The plot below of the scores on the first two sample principal components does not indicate any obvious outliers. . . SF.302871 1.204 0.987585 0.

EM. . 15. .8 y2hatcov / 1. . . Component NurilJ The plot below of the scores on the first two sample principal components does not indicate any obvious outliers. 27.5 . ..0 0....196 the first component are . . SF and BS respectively. this component might be labeled an index of paper strength.6 .. ..0 .2 1. . . ..989 for BL.. . 0.plot of ylhatcov vs y2håtc .990 and . 17. 'Stàtb~r. . a. "... Again.4 0.. .998.. .5 . .928.' . ..a.

40 Catt 60 8Ø 100 (b) Principal component analysis of R follows." .~~ . ... ... . . I . 69 and 72 are outliers.197 8. Observations 25. .. 21) :. Five components explain about 90% of given the the total variabilty in the data set and seems a reasonable number scree plot. . 34. #" r...28 (a) See scatter plots below. ' tt 1 I) . . " ".. . . . ¡ ¡OO 100 0 . . r.. .. 0 20 li 3~..'1 "1 i . "1 i'. 300 . .. il ?i 114'1: 200 100 0 Dist 300 40 50 5cttl'1CJtCJfPistRlI"SiCatte 500 . ... . Removing the outliers has some but relatively little effect on the analysis. . Scttei¡løt øf'family YS Distad 160 . . l=... .

160 0.126 0.300 -0 ..242 PC8 PC9 0.089 0.486 0. 002 -0 .499 -0.240 0.088 0. 2364 1.411 -0.759 -0.118 0.137 0.818 0.249 PC6 -0.833 0.360 0.014 -0.392 0_407 PC7 -0.357 0.077 -0.2720 0.121 0.016 0.255 PC2 PC3 0. Cotton.197 0.048 -0. 0581 0.273 0.311 0.068 -0. 053 -0. 675 -0. Dlsd.100 -0.900 1.496 -0.1470 0.452 -0.065 -0. 131 Princlp81 Component An8lysls: F8mlly.458 -0.241 0.246 0.038 0.554 0.218 0.082 -0.011 -0.68.342 -0.606 0.34.372 -0 .021 0. 079 -0.200 -0.0845 0.625 0.029 0. .378 -0.215 -0.229 0.170 0.579 -0.402 PC7 PC8 PC9 -0.190 -0.6058 0.373 0.249 0.444 -0 .715 PC2 PC3 PC4 PC5 -0.043 -0.284 0.204 0. 1718 0.352 0.361 -0.149 -0 .527 -0 .198 .388 -0.415 -0.352 0.194 -0.041 0.574 -0.027 0.132 0.164 0.770 o .182 O.111 PC4 0.1443 0.461 0. 4381 Eigenvalue 4.278 0.030 -0.2400 0.353 0.385 -0.6043 0.098 -0.379 -0.169 0.045 0.434 0. AdjCotton.751 0. 068 0.110 0.941 0.187 -'0.024 0.065 0.100 -0.1114 0. 043 0.968 0.616 0.337 -0.269 0.355 0.181 -0.000 variable AdjFam AdjOistRd AdjCotton AdjMaize Adj Sorg AdjMillet AdjBull AdjCattle AdjGoats pe 0.051 -0.225 -0.021 -0.604 0.687 -0.988 1.134 0. 000 PCL 0.123 -0. MIII8 BulL.041 0.056 0. 028 0.831 0.072 -0.030 0.987 Eigenvalue 0.361 0.171 0.077 0.497 -0.454 -0.351 PC5 0.1182 proprtion 0.1851 O. AdjMalz AdjSorU.129 0.049 -0. · Eigenanlysis of the Correlation Matrix proportion 4.242 0.363 0. Møz Sor9.440 -0.9205 0.173 0.116 -0.309 0.598 0.971 0.008 0.216 0.100 PC6 -0.3 Coirpone 45 Numbe 6 Prlnclp81 Compon8nt An8lysls: AdjF8m.040 0.067 0.645 0.059 0.621 -0.102 0.069 -0 .329 -0 .012 Eigenvalue cuulative proportion C\lative variable Family OistRd cotton Maze sorg Millet Bull Cattle Gots 1.509 -0.460 Eigenvalue 0.Outle 25.745 0.055 0.465 0.458 0. 089 -0 .048 0.011 0.139 0.240 -0.030 0.097 0.095 .033 -0.7918 1.72 remove) Eigenalysis of the correlation Matrix proportion culative 0.009 -0.013 C\lative 1.134 O.263 -0.019 0.067 0. AdjDlstRd. 504 0.219 -0 .247 0.569 0.396 -0.502 0.122 0.027 0..446 0. 594 -0.797 -0.885 0_5044 0.460 0.293 0.465 0.445 0.941 0.146 -0. 127 0.632 -0..274 0.3661 0.012 -0.

0946 v = 2 (. Conservatively.4137. Â. This component might represent subsistence farms.354.05) = . Without additional subject matter knowledge. family) except for distance to road (DistRd) load about equally on the first component.0946(11. we set the chi- 2(. (. t = .5 -1 o 1 2 ylhat-ylbar (b) To construct the alternative control char based upon unexplained components of the observations we note that di = . .0. For observation 18. this component is difficult to interpret. all livestock.4137) . y. component appears to Again.99 Â. (. The third component is essentially a distance to the road and goats component.186. The alternative control char is plotted on the next page and it appears as if multivariate observation 18 is out of control. Observations 3 and 11 Scalterplot of y2hat-y2bar vs ylhat-yl_ .0782 so e . This component might be called a far size component. makes the largest contribution to d~18 and . this component is diffcult to interpret.05 or approximately 1. 8.07) = 1. The fifth contrast sorghum with milet.1111 -T -1.199 (c) All the variables (all crops. S~2 = . The fourth component appears to be a contrast between distance to road and milet versus cattle and goats.05) = 5. The ellpse consists of all YI':h such that Yl + ~2 S X.4.4137)2 = 4. Milet and sorghum load positively and distance to road and maize load negatively on the second component.29 (a) The 95% ellpse format chart using the first two principal components from the covariance matrix S (for the first 30 cases of the car body"2 assembly data) is "2 shown below. lie outside the ellpse.0782 = .U782 squared degrees of freedom to 1) = 5 and the VCL becomes ex. where -l = .

Car body #18 could be examined at locations 1 and 2 to determine the cause of the unusual deviations in thickness from the nominal levels." 1. l. the variables t.0 . =ä 5(.05) .200 getting the most weight in Y 4 are the thickness measurements Xl and X2.

) = . LL' = .3 a) L = r'':1 = 1i . . ~93.49 .ii = .31 = .2.lil.3.40Hi .e = f . Slightly different (.'5 (.7 .7. .45 .45 . .91 .2 å) For m-' h1 = 9.9 .9.81 .4 i (.35 .5). Corr(~. 2 .) = Cov(Zi'Fi)' i = J. Zl' has the largest correlation with the factor and therefore will probably' carry the most weight in naming the factor(.F1) = 9.25 so 2 = LL' + 'l 9. ' .35 . = . Thus Corr(Zp'F1) = 111 = .45 L' = (. .corr(Z3.Fi) = 9.' By (9-5) cov(Zi. b) corr(Zi'F. .65 .F. .831 . b) Proportion of total variance explained = ~ = .1 .63 . (. hi = 9..1.63 .III = 49 .7~29J .201 Chapter 9 9.8' .21 =.451 9.35 .lY21' .49 .25 .5..711 " . The first .63 .variab1e.'¥ = LL = .i6 = .5ti23 =.6251 9.7 .507.81 h1 .63 .Ìi = . ' L = h1 ~1 = Ii .35 .87'61 from result in Exercise 9.25 The communalities are those parts of the variances of the variables explained by the single factor.

. . S . \'ie post multiply by (Ll' +'1) to get I = ('1-1 -'1-1L(I +L.'¥-l L( I ~'(I + L' '1-1 L) -1) l' .. + l! Therefore. 9.. It shoul.) s ~~+l + À~+2 + .LL' -.. + Àp 9. It A (sum of squared entri es of : S .LL . A.V) S (sum of squared . - be since m = 1 common factor completely determin.. A ¡Ai.-1 (Ll + '1) ..LL).. has zeros on the diagonal.. S .P(2)A(2)P(3) \'1hich has sum of squared er1tri es A .e~ e = 2 .1 .202 Result is consistent with results in Exercise 9... c) Multinlyin~ the result in (t) by L we get . A " I . hint. '11 = ~2 1 + ~2 2 + . .LL i ..6 a) Follows directly from hint...LL =.. b) Using the hint. (sum of squared entri es of S .. Ai .'l . Ai Ai tr(P (2)A(2)P (2) (p (2)A(2)P (2)) J = tr(P (2)A(2f (2)P (2)) .5 Since V is diagonal and S .'(-1 LÙ + L ''1-1 L) -1 L' = . .'¥-1L)-1L''1-1)(LL' +'1) = '1-1 (lL' +'1) -'1-1L(I +L''l-lL)-1L'v-1(LL' +'i) '-(use part (a)) = .i = tr(A(2)A(2)P(2)Pc2)) = tr(A(2)A (2)) ...entries of .1. "'i Ai A A. A m+ m+.c .._..-1Ll +1 -'1-1LL' +'1-1L(I +L'V-1L)-lL' _ '1-1i(i +LI'1~1L)-lL' = I Note all these multiplication steps are reversibl~. By the.

for any choice Ipl/a22 s 121 :S /aZ2 t set . finally.11121 = (111 +~1 'so aii = 9." =:! .717 = :t 1. = 111131 ) . there are an infinite number solutions to the factorization. 9.119.255 · .1 ~ 0'22 -0'22 = o... Since i21 \.. ¥ = 9.11 +... a12 = 9. and .: 9.9/.21 + *2 ( 1 ..55&.)(.21 1 .4) and 9..¡as arbitrary of within a suitable interval. ~12 a2~ 121 +"'d 9. (use part (a)) = '¥-lL-V-$$1 _ (I i-L1'1-1L)-1) = iy-'L(I +L''1-1L)-1 Result follows by taking the transpose of both sides of the final equal ity.203 (Ll' +V)-1L = 1f-\_iy-1LlI +L''i-lLrll''¥-~ .31 + "'3 No\'1 i~ ~ = :..21 = . a12 .i = 121131 1 = 9. 1 + il1 . m = 1. We 2 on 't1 = aii lYll = (111 .l.7 Fran the equation ~ = Ll . 9.4 . Thus .l21 =':! . 1 = (:.-a11 -~V11 --0 and tPZ = a22 . from . 1: = Ll + 'i for m = 1 imp 1 i es = . we have .l119.1 9.l~1 ~ (111 pZau obtai .31 ~ -w have .9 = 111R.lll = alZ/121 an~ check.wl' a2i = ~21 +"'2 and a12 = 111121 let p = a12/lan la22 .12.' '¥.8 .z~ rii ai ~ . Then..t31 =:! .21.9.717. so 9.Ii 112 al2 . --....:~:_'._.. . -----..3111 .--....C .. ... -... Factor 2 .... - _.... Wh... . ._.. ..:-.5 Li quors'.....-: .-.-..... '.._--: _. ...........-.. -----_...9 (a) Stoetzel's interpretation seems reasonable...---------"'------. "...e~ ' We ~o 717 J . .. .__. which is inadmissible as a varianc~. ... -.. . -_._---..._~- ........ " . ~~:="~. ..._-...'_'''_._- _._~.._-=::~?n~ ::~:=::£--= -----:~--~_._----- '...~5 '-:". ..._...5' ..----_._. ..... .....: .--:~-~ .. -_.... ....._. 'A a oc ' .. . ...._~_____:.= .. . -.. "': ...::__ _. .:. .... ... .__." .. ._.._.-. .. ~-':'=-_-:... c variances and communal ities based on the unrotated factors.0 -.' ...--. .1....... -.'--7._--..' .. . ..-~_. _: ..'.-...._.. .: k _...."M._....._._ ..7..._.... ~.-.. .... -_.------.... ...1 .. .. -:. '-. ._-_.~~. .-.~ . .--=~:_~::::__ ~ =~~ d:b.._. ...._..::.. .. ..~~c._. .. _.. _.------_.: _..__.. ._._. .._"--....- . ...._.... ~._:...~:-._.-..0 ~--... ... . . _._..-...204 Note all the loadings must be of the same sign because all the have covari ances åre positiv... _... '0 ..0.. ._ ..-.. ...~:_.4 LL'. . -----:-~7-..~ ~::~: .." _...________..-.. .. ... .... . ...~ -..._. . ._..__.. i:\.... .... -..._--_._------...... . _.. __...__.--'-.. ._._ -_....'-..'-.....'~=:... ---_... -- ..-._.--... ~.. -_.=:-... .......'. 9. The first factor seems to contrast sweet witb strong 1 iquors..--... . ~...:. ..---.. ..._..... -_. __ 0' -_... ._.. .-. .'---. _... .-.. ." .._~_ ... . _...------~__---._-- o Rum .--_.. .. ._.-_.... .' .._-.:. n. . ..-..._....1 .... .' ' .--:..._._.1 ~~7S-= ~.57~...... ... ...~-----_. n .-_. . . -----~......':'-- .:~~: ~::::::......._--~.575 so' ~3 = 1 . ...._. .~ -....~~~!..__'-___.. ..~:"~. ._.. . ...... . ..-_.....-:...._- .._--_.:_h.o~....__. ... .. .. .... Mirabelle --'" .. _.... ....... .. -.. _._--'r-...--:.. ..._._ 0. O. -. '. ..__.. :...~. _.._. _.. .---~ . ..7 ...._... -. ... .- It doesn't appear as if rotation of the factor axes is necessary. .. .--.--... .. .._-----_.. ....~ . . Kirsch" . .255 J.. are given in the f~llcwing table: ..""'" ~ -':': .. .--l a .... ---~-~........-:~-_...... .._.. ..:_~... -:. .__.... ...._..' is e~ . .. . .717 ... Ca 1 vados L..-....._-.. ... : .._-_..255 = (:¡14 ..558 1.-. ...... ..7 1 ..5 --_.-_......_.. ..558 (... -'--'-'-' --.. .9 1. ---.......::.C:-:~-~. .' :. (a) & (b) The speci f." .------_.... . .::i :...__.._.-_. ~...... ~-~._.. .... . -'_...." _ '" __....__. -'" .._".- -_...(b) ---"-"'-_..... -~_. ..... ".... ........ ~___~. .....'-' -'... ... . ......9 J ....-..... ...:'--:. .. . ._..._-~... ' " Marc.. ._.' .... N ._...:. ..._..." . --_......... ':'. ... :~:::~~~ . ~ --~_. .. --factorl --.. '.._.. .._._.n ...~.::... ...._--..' ....... . ... . :. 1221 .032 0 .8779 Ti bi a 1 ength .e struct-ur..001 . 0938 .000 . 9.000 0 . 001 -.11 0 0 Substituting the factor loadings given in the table (Exerci'Se 9.7582 .000 0 .193 -.0000 1 . R-Lz l-'i= z 0 .0000 Humerus 1 ength .000 .9905 Ulna length .10) into equation (9-45) gives.0095 ..000. interpretation -of the fa(:tor-s .7"h or 6. Y (unrotated) = .2418 Femur 1 ength .04692 Al though the rotated 1 cadi ngs are to be preferred by the vari-max ("sim.=1 r 9. A ...01087 y' (rotated) = .4177 6 or 66. = Factor 2 : ! r 12i 6 0 1 = 1= (c) 4.7% ..205 Speci fi c Vari ance Vari abl e Communa 1 ity .pl.5976 .000 .ell) cri terion.0001 6 . 000 .017 -.003 .000 ..9062 Skull 1 ength (e) The proportion of variance explained by each factor is: Factor 1 : ~ .4D24 Skull breadth .018 . 0191419 6. the residual matrix is: 8n .0052707 8.4167255 6. A 2 .685 6. . 0. _ hi = 0.0232507 (.0052707 6. A 2 .6828 5..4474 J ( 1.J2 = 0.0765267 0.1596480 6..7549 7. ".) The proportion explained by the factor is .6107 7.\Î = 10-6 X 2.1021632 J it' = 10-3 X 7.8197 J Sn = -8 = -2 S = 10-3 X 7.12 The covariance matrix for the logarithms of turtle meaurements is: S = 10-3 x 8.7551 6.0752017 . 2 . ( 0.4906 Thus we get .4474 0 2.0058563 (a) To fid specific variances .. 9. n .1673 . In(width) 0.0191419 8.0006342 (c.0104373. not S because likelihood estimation method is used.0001 734. i = 0.7549 5.1673 0 00..112497.1021632 3.0720040 8. .0219489 = .1 23 (10.J3 = 0..:) From (a)-(c).206 seems clearer with the unrotated loadings..Ji = 0. the communalities are '" 2 . h2 = 0.8182 5. 2 .685 7. 2 hi + h-i + h3 = 0.1494 5.i'S.6828 7. A 2 .8563 J ( 10.8182 (b) Since li~ = Îti for an m=l model.1596480 The maximum likelihood estimates of the factor loadings for an m=1 model are Estimated factor loadings Variable 1..i = 8¡¡ .hi the maximum Note that in this case.1124971. h3 = 0. we should use 8n to get 8¡i.4373 7.7727585 J ( 11.0004941.it' .075201 7 Therefore.9440 811 + 822 + 833 0.7551 n 4 7. In(height) 0..6553 5. In(length) 2. we use the equation ..0056053.0765267 Fi 0.8197 5. l.02449 0 The m=3 factor model appears appropriate.093691 -0.L = /i"tl1f~ V. 9.062289 -0.00696 0.811034 0.014563 0.866045 0.070351 0.207 9.079682 0.093691 -0.133955 0.15 (a) variable HRA HRE HRS RRA RRE RRS Q REV variance communality 0.015523 0.021205 0.005616 0.Et~~:: /i~fEA"l = /i"'/i"S = A. "'At. A 1f 1f '1 = I. Ther-efore. .I ...00066 -0.960366 (b) Residual Matrix o 0..058312 -0.188966 0. ALIl .063146 -0.036263 0. The third factor is historical related to accounting historical measures on sales (HRS.02449 -0.033867 0.014563 -0. 004011 -0.13 Equation (9-40) requires m ~ ¥2P+l .002626 0 -0.065101 -0."'.02145 -0.021205 0 0.920318 0.013953 -0.032645 0 0.100611 0.¡g).068312 -0.015523 0. REV).022111 -0.065101 0 0. /i ~/i ~ = /i and E f E = I .006854 0.00696 -0..899389 0.002626 -Q.97322 0. 9. The second factor is related to accounting historical measures on equity (HRE.903478 0.052289 -0.006454 0...035712 0.14 Since "'~ Ä_l A~ Al ""1.009639 -0.058416 0.013953 -0. Accounting meaures on assets (HRA.022111 -0.107308 -0.RRA) are weakly related to all factors.036263 0 0.068971 0. However.096522 0.078402 -0.. RRE).02678 0. .063146 0 -0. market- value meaures provide evidence of profitabilty distinct from that provided by the accounting measures.107308 -0.032646 0.00066 -0.036712 0. (c) The first factor is related to market-value measures -(Q.039634 0.078402 -0. Hêre we have m = 1. RRS). L''l.931029 0.. we cannot separate accounting historical measures of profitabilty from accounting replacement measures. '.009639 0.010351 0.068416 0.02145 -0. '" '" 1.033857 -0.008864 0.005464 0.~ "!!S~-1 "'!.004011 -0.005516 0. P = 3 and the sti"ict inequality docs not hold. 6 o t. c u. 0.4 FACTOR 1 Roia FaCr Panem 0.6 0.9 HRS 0.3 Q RRE HRE 02 0.4 RE 0.6 FACTOR 1 Rotatl FaClr Pattem 0.3 Q .8 RRS 0..7 I' a: 0.9 HRS 0.4 REV 0. HRA 0.8 RS 0.6 0.6 0.5 R~RA 0.4 HRS 02 Q REV 02 0. ' it 0. RRE 0.208 PROBLEM 9.15 HRE .4 FACTOR 2 ROlated Factr Panem 0.8 R :¡ NO.5 RRA 0.6 gc u. T 0.8 RRE HRE .4 .6 0.02 0.6 0.7 '" a: 0 t. HRA a: "". we have A I A i A i (. ..L''Y.U-1 Us. apar from rounding error. . "" = n ti. equation (9-57) suggests the regression and generalized least squares methods for computing factor scores could give somewhat different results.. 9.16 '" '" 1A " 1 fJ. Consequently.x . Lz). Since the number in the (1.1) position.L' '1. _ x UI I A J.(x.x LU.&)''1. f I... '" 1.209 9. the factor scores have sampl e mean vector Q and zero sampl e covarfances.L"..(Cj . ng (9A-l).LI'1..fJ~ = n ti. n "'.\" _fJ' = A .17 Using the information in Example 9. -! = _.x x . is a -.l -J-J' . . = ti.L ''l. is appreciably different from 0. a diagonal matrix. n "'1'" "'l.-J .. and the observations have been standardized. J=l ' n.-x) and : ~J ~ n A 1'" "1 n L U l.ti(I+6)Â. ""'1 r fJ.= n(I+ti..&HìSj . = Â.1" L' '1V.L6.0283 . .).1n.."-1 '" 'f. "1 "r ( -)(' -)1"'-1.~-~(I +ti)ti- j=l Al" "'. . j=l !.0283J (Lz 'l.J. . J=l From (9-50) .= which.\" ( ) 0 "'A A 1"''' Since 1 ~1"'Al fjfj = Â.0137 diagonal matrix.12. "1SAlA" = n ti".2220 -. -J .2220. 37743 0. Bluegil and Crappie load heavily on the first factor.65863 LBASS 0.1969 0.7354 0 .28186 0. 36324 SBASS 0.1728 Proportion 0.08767 0. 50404 0. 48544 0.X4 Ini tial Factor Method: Maximum Likelihood Factor Pattern (m = 1) FACTOR1 BLUEGILL BCRAPPIE SBASS LBASS 0.96841 0.80526 0.3663 0.77273 -0. 76738 BLUEGILL BLUEGILL BCRAPPIE SBASS LBASS 0.4429 Difference 1.18. Principal Components Varimax Rotated Factor Pattern BLUEGILL BCRAPPIE SBASS LBASS FACTOR1 0.7876 0.73867 SBASS LBASS o .13066 O.93147 0.85703 0.8893 1.62774 Maximum Likelihood Varimax Rotated Factor Pattern F ACTOR1 F ACTOR2 BLUEGILL 0.48072 FACTOR2 0.98748 -0. .77273 0.70812 o .40581 o .19047 (b) lvlaximum likelihood solution using Xl .6157 0.1539 0. 73867 -0. while large- mouth and smallmouth bass load heavily on the second factor. 64983 o .1719 0. 63002 o .17543 0. 70439 LBASS 0.5385 0.16518 0.02251 BCRAPPIE o .48073 0.1107 Cumulati ve 0.0000 Factor Pattern (m = 1) Factor Pattern (m = 2) FACTORl FACTOR2 F ACTORl BCRAPP IE 0. Note that rotation is not possible with 1 factor.X4 1 2 3 4 Ini tial Factor Method: Principal Components Eigenvalue 2.41799 (c) Varimax rotation.51319 For both solutions.67309 0.4350i 0 .210 9.1539 0.76738 0.25907 SBASS 0.19445 BCRAPPIE o .65312 Factor Pattern (m = 2) F ACTOR1 F ACTOR2 BLUEGILL 0.5385 0. 64983 ~ . Factor analysis of Wisconsin fish data (a) Principal component solution using Xl .36549 o . 29435 O.21480 0. 00000 0.76170 0.47779 The first principal component factor influences the Bluegil.3925 0.5711 0.14613 Initial Factor Method: Maximum Likelihood Factor Pattern FACTORl FACTOR2 F ACTOR3 o .2830 0.21097 0.20739 o .11256 -0.51192 0.3199 0.02285 BLUEGILL 0. 0707 o . 5004 6 o . 96466 SBASS o .211 (d) Factor analysis using Xl .6644 Difference 1. 26350 o . 46665 Varima Rotated Factor Pattern F ACTOR1 F ACTOR2 F ACTOR3 BLUEGILL BCRAPPIE SBASS LBASS WALLEYE NPIKE o .9843 0.1107 Cumulative 0.3549 1.0876 0.49190 0. 0 .28458 0.12720 -0.44657 -0.1786 0.0719 0.49746 WALLEYE 0.06957 BCRAPPIE 0.46222 0.24459 -0. 13392 -0.05282 0. . and the Walleye and smallmouth bass on the third factor.01286 0.X6 1 2 3 4 Initial Factor Method: Principal Components Eigenvalue 2.23481 o .8459 Factor Pattern (m = 3) F ACTORl F ACTOR2 o .7352 0.29875 LBASS o .46485 0.1640 Proport ion 0. 22600 0 .54231 SBASS LBASS WALLEYE NPIKE 0. 46530 o . 02359 0 .18156 o .58051 SBASS 0.03199 -0.50492 o .20017 0.18979 BCRAPPIE 0. The Northern Pike alone loads heavily on the second factor.0762 o .72422 0. 60333. 13806 BLUEGILL 0.07998 LBASS WALLEYE NPIKE 5 o . 26232 -0 . 83342 -0.01989 BCRAPPIE o .42801 0.24062 NPIKE o .47611 -0.26931 0.20771 O. 85090 -0.05767 0. 99637 0 .74189 0.97853 0.00311 -0.33099 -0.04905 0.86227 0. The MLE solution is different.71176 0.31567 0. 39334 0 . 80285 Varimax Rotated Factor Pattern F ACTORl F ACTOR2 F ACTOR3 o . 0834 o .1640 0.92348 -0.3925 0. 72944 -0. 00000 o .9293 1 .4242 0. Crappie and the Bas. 06520 o . 00000 BLUEGILL 1 . 06257 0 .22770 -0. 12927 -0.0000 F ACTOR3 -0. 00000 0. 316 0. Abstract Math 1 2 3 4 5 6 7 VP Factor Factor Factor 1 2 3 0.729 ROTATED FACTOR LOADINGS (PATTERN) Growth Prof.570 '0.'0369 '0. (b) and (c) l1aximum Likelihood (m = 3) lJNROTATED FACTOR LOADINGS (PATTRN) FOR l1AXIMU~' LIKELIHOOD CANONICAL FACTORS Factor '1 Growth 1 Profits 2 3 4 5 6 7 Newaccts Creati ve r~echani c Abstract Math VP 0.544 0.-0.0352 . 720 1 .527 0.389 0.355 0.181 O~OOO 1 .953 0. O~ 919 0.772 0. 0.347 0.295 3~ 180 1 .0000 .437 0.334 0.300 Communa 1.921 o . ve Mechani c .9615 0.000 0.541 0.968. ti es 1 _ Growth 2 Profi ts 3 Newaccts 4 Creative 5 Mechani c 6 Abstract 7 Math O.566 0..019 0.255 0.208 0. Factor Factor 2 3 0..0000 0.632 26'Z 3. 464 Specifi~ Variances 1 .9648 O.653 .9124 1 .0385 .0876 .S519 . ts Newaccts Creat.912 0.774 0.520 1 .19 (a).0000 0. 1 84 0..509 0.374 0.794 0. 967 O. 4"54 0.433 0.295 0.9631 ..250 0.437 0. nOOO .4481 .426 .179 0.721 0.054 0.212 9. 637 .0 . '(4) Using (9-39) \. .single variables IIcreativell and "abstra'Ct" r.213 1. .591 .(LL i +'1) that an m = 3 factor sol ution repr.0 1.0 ""A .843 1.147 . A R .0 It is clear from an examination of the r.646 .708 . .848 .944 .0 (Symetri c) 1.572 .674 .674 . If we consióer the.0 Ll . The first factor links the salespeople performance variables wi th ma th a bi 1 i ty.386 1. '575 .esents the observed carrel ations quite well.826 .0 + 'l = .641 1.641 .386 .572 1. However.0 1. rotated loadings.0 ..0 1.465 .746 . provide intei:-retations for the factors.413 .923 1.853 .espectively.694 .679 .925 .0 (Symmetri c) .700 1.912 . p.926 1.147 .566 1.o1)= 11.e dominated by the. ..948 .542 . we see that the last two factors ar.) x32(.413 .56£ 1.0 R = .696 . .884 .542 .455 .927 .3 .0 . m = 3 we have 43 833 1 n (.700 .. it is dlfficul t to . = 7.iith n = 50.esidual matrix .591 . 00007593l\ = 62 1 .0 .000018427). . .673. . m.20 Xs ~ -.of the m = 2.751) . Neither. .36 3.344. (e.78 3u.957. ('1 has zeros on the main diagonal for the maximum lik~lihood solutions).679..497) Using the regression method for computing factor scores.70Z. .522. A_1 have. AA " note that the matrices R.13 31. Using (9-50).98 L(symetric) 2.129.686.. -.852. He .214 so \'le reject HO:r = LL' + 'l for m = 3.395) f' = (-. 1.52 ) .criteri-on.~7 S = 300.1. t~e residual matrix above indicates a good fit for m = 3. .) ~' = (1. . .805) 9. -.23 6.2~3.271.59 -2. LL' + V have small determinants and rounding error could affect the calculation of the test statistic. we .= 3 factor models appear to fit by the' x'.465.78 11.computed Factor scores using weighted least squares can 'only be A_l for the principal component sol utions si nce '1 cannot be com" puted for the maximum likelihood solutions. wi th f = LzR~ : - Principal components (m = 3) Maximum 1 ikel i hood (m = 3) f' = (. 1.. Principal components (m = 3) l' = (. -. Again. correlation matrix R. X2 = solar radi~tion and the pair X5 = NOZ and ~6 = OJ . (c) Haximum likelihood estimation (\.37 i -.96 I 5.215 (a) Princi pa 1 components (m = 2) Factor 1 Factor 2 1 oadi ngs 1 oadi ngs i Xl el. .radiation and Xs = °3.19 I Cb) Maximum 1 ikol ihood estimates of the loadings are obtained from L = ~z where Lz a~e the l. .23). estimates of the communålities.oadings obtåined from the sample A Z '.17 17. the first factor is dominated by Xl = solar ..between the paJr Xl = wind. For thë unrotated m = L solution. On the other hand.42 .61 I . The second factor seems t~ be a contrast .74 1. of accounti ng for the covari ances inS than the m = 2 principal component sol uti on.) Xs (N02) X6 (03) -. (For t see problem 9. the pri ncipal component sol ution generally produces uniformly small~r ~stimates of the specific variances. Note: Maximum 1 ikol ihood estimates of the loadings for m = 2 may be di ff1cul t to obtain for some computer packages without good . One choice for initial esti- mates of the comnunallties are the communalities from the m = 2 principal components solution.32 -.lind) X2 ~solar rad.ith m = 2) does a better job. seeproblem9..' " 9.'332 5 6 0. Case 2 3 . This will affect the estimated loadings obtained by the principal component method. The second' factor" might ba interpretad as a contrast bebieen wind and the pair of pollutants N02 and 03. Recall solar radiation and ozone have the larg~st sample variances.370 -0.2 -0.544 -0. for maximum 1 ikel ihood estimat.01. maximum likelihood estimates (m =2.384 -::0.724 -0.070 0. L =i D~Lz and S = O'lRO\ the factor scores gener~ted by the equations for tj in (9-58) will be identical. to som~ extent.es.io: .. is dominated by solar radiation and.252 0.316 0.23~) are giv€n -l below for the first 10 case~. the fact~r scores generated by the we.456 0.515 0. ghted 1 eas~ squares formul as in (9-SQ) wi 11 be identical. ozone.509 -0.22 (a) Since.~gain the ff.rst factoi.546 -0.. l"e factor scores generated by the regression method wi th . f1 - 0._.530 7 8 9 10 :¡ .790 -0.129 4 0.179 " f2 -0... Similarly. .492 0. tl23 p. 9.74 .950 ~O.48 .04 (Æ . 083 1 .31 C! -.203 -0.1 68 0. Principal components (m = 2) Factor 2 Rotated load..24 -.717 0.049 0.23 .77 ~'..53 I .072 (c) The sets of factor scores are quite different.937 -0. ngs 1 oadi ngs loadin~s Facto r 1 -.) .56 Factor 1 .492 10 -0.368 'f .20 Xl Xs (wind) (°3) -.410 ~0. 646 ~ 1 . f1 '" 1 . Factor scores depend heavily on the method used to estimate loadings and specific variances as well as the method us~d to g~nerate them.05 Q£ 2 Factor I -.30 .518 0.856 4 0.st 10 cases are given below: Case 1 2 3 . 029 1. X2 (solar rad..394 5 6 7 8 9 f2 0.447 0. 811 0.795 o .259 0.52 Xs ( NOZ) .65 -.217 (b) Factor scores using principal component estimates (m = 2) and (9-51) for the fit. .) . However.the ca~e.32 -. If this is .19 ~6 (°3) .04 .factr.09 -X ' 2 (solar rad.econd factor-.03 C& I -. It mi ght be called a "ozone pollution factor'l. Al so the two sol ution methods give somewhat different results indicating the solution is not ve~ stabl e.21) for analysis. there is nothing to be gained by proposing a fa-ctor model.25 -.65 -. the second factor appears to compare one of the pollutants with wind.27 C: em IX5 (N02)' .218 l1aximum likeli'hood (m = 2) Factor 1 Factor 2 1 oadi ngs loadings Factor 1 Factor -. There are some differences for the s. we see that both solution methods yield similar estimated loadings for the first .43 J ~ 1 (wi nd) Rota teet loadi ngs 2 -.17 . We may need about as many factors as vari~blas.38 . It might be called a "pollutant transportU factor.10 Examining the rotated loadings.50 . \4e note that the intèrpretations of the factors might differ depending upon the choice of R or S (see problems 9.20 and 9. Some of the observed carrel ations between the variables are vary small implying that a m = 1 or m = 2 factor model for these 'four variables will not be a completely satiSfactory description of the under~ 'lying structure. . 381 PerCen tProDeg 0.541 -0.398 1.219 9.295 0.2236 0.. SçreêPlqlofPopulation.371 -0.729 0.119 .0 lI ai iü 0. The scree plot.180 .173 4.685. the correlation between Percent Professional Degree and Median Home Value.373 -.026 .9919 0.180 1.512 0.192 . .065 . However.411 1.870 0.010 1.24 -.192 1. a factor analysis with fewer than 4 or 5 factors may be problematic.696 MedianHorne Variable Population Variance % Var 1.373 .0 2 4 3 Factr 5 Numbe Principal Component Factor Analysis (m = 3) Unrotated Factor Loadings and Communalities Factor1 Factor2 Factor3 Communality 0..5 lI :i ii ~ 1.756 0. reinforces this conjecture.9'62 -0.. MedianHøme 2.708 PerCentEmp::16 0.153 0.676 PerCen tGovEmp 0.0 -.065 1.274 0.313 -.807 -0.064 -0. there is no sharp elbow.685 -.313 -.830 0.5 0.209 -0.845 .0 1. we present a factor analysis with m = 3 factors for both the principal components and maximum likelihood solutions. Consequently.0 The correlations are relatively small with the possible exception of .411 -. shown below. The scree plot falls off almost linearly.0 -.119 .026 -.584 0.8642 0.460 -0.3675 0.685 R = .0 .837 -0.010 ..0 . 7772 0. 3 2 . .109 -0. .~ .321 1.6043 0.348 1.577 0. .0419 0. . .. .801ì -0.870 MedianHome ~ 0.0803 0.216 Factor Score Coefficients Factor1 Factor2 Factor3 0.' .135 -0.059 -0.321 11.147 PerCen tProDeg PerCentEmp~16 0.102 C-d.208 1.940 -0. .. I 0 -1 . -2 .015 1.226 1. .362 PerCen tGovEmp Variable Population MedianHome Variance % Var Factor3 -Coiiunality -0.~. . .103 0. ..281 4. .7382 0. .962 0.989 PerCen tProDeg -0. .701 -0. .522 0.082 1.138 -0. .830 Variance 1. .4050 0.020 -0.146 0.941 0.0011 -0.2236 0.Q_85Q'/ -0.220 Rotated Factor Loadings and Communalities Varimax Rotation Variable Population Factor1 Factor2 Factor3 Coiiunal i ty 0.1310 0.49'6 3.807 0. MedianHome (PC:) 4 .068 0.099 0.999 0. -3 -2 -1 o Firs i Faêtr 2 4 3 Maximum Likelihood Factor Analysis (m = 3) * NOTE * Heywood case Unrotated Factor Loadings and Coiiunalities Factor1 Factor2 -0. 'a .009 -0. ..277 _.000 1. .028 -0.019 Variable Population -0.169 0.000 0.845 PerCen tGovEmp % Var 0. i . . . .047 -0. .070 Score Plot of Population.313 PerCentEmp~16 0.278 -0. .755 . .544 PerCentProDeg PerCentEmp~16 PerCen tGovEmp MedianHome 0. . ~ 1 æ .160 0.658 -0...395 -0. . . . .298 0.000 0.059 -0. .756 -0.118 ~ ~ 0. .. .984 0.052 0.. . . We call this factor an "employment" factor. 1 . . . .046 -0.046 0..070 PerCentEmp::16 -0. . . . It is difficult to label this factor but since income is probably somewhere in this mix. -2 .. 000 -0. .001 -0.984 0. . . The second factor is a bipolar factor with large loadings (in absolute value) on Percent Employed over 16 and Percent Government Employment.496 1. .145 1. 000 1. .001 MedianHome Factor3 -1.155 -0. iX -1 . .1740 0. The third factor is clearly a "population" factor. 025 0. . Factor scores for the first two factors from the two solutions methods are similaro . it might be labeled an "affluence" or "white collar" factor.. 4 1. MedianHoníé (MU£') of ..177 0.000 Rotated Factor Loadings and Communalities Varimax Rotation Factor1 Factor2 0. ~ : 0 . the first factor in both methods has large loadings on Percent Professional Degree and Median Home Value.235 1.053 1. . -3 -2 -1 o 1 2 3 4 fil'Fllctr A m = 3 factor solution explains from 75% to 85% of the variance depending on the solution method.755 0.041 -0.002 -0. 2 .221 ~ c¡ ~ 0.000 -0. ~ . .. . .047 PerCentEmp::16 -0.. . .7772 Factor1 Factor2 -0. . The second and third factors from the two solutions are similar as well..090 PerCentProDeg 0. . -ø i: 8 . . .206 Factor Score Coefficients Variable population Plot Score Population. . 017 PerCen tProDeg 1.159 -0. . .010 PerCen tGovEmp -0. .137 -0.061 0.298 0. .165 0.430 0. .. .333 PerCen tGovEmp Variable population ~ MedianHome -. ....5750 0.. 0282 3 . . .. .315 Variance % Var Factor3 Conuunality 0.036 -0. . Using the rotated loadings. .1% 86. 1 .186 81 .e v. 90. loadings Maximum Likelihood Fa ctor 1 loadings Shocki. spet:imens 9 and 16.e 4.530 1.734 87.804 . Stati c test 1 . Vibration 293. .204 91 .ery similar.222 9.802 The factor scores produced from the two sol ution methods ar.033. Variance Expl ained Factor scores (m = 1) using the ~gression method for the first 'few cases are: Principal Components Maximum L i kel ihood -.. 320..ers.524 . The correlation between the two sets of sc~~es is . T1i'Outli. 287. 275.343 H)4.809 90. 291. Pri nci pa 1 Components Factor 1.242 94.9% Proportion .625 S -' 94./ave 317.992.761 76.Z80 101. were i'Óentifi.25 105. 297.009 -.719 .808 .329 (Symmetric) A m = 1 factor model appears to represent these data qui te well .15. Static test 2 307.d in 'Exæipl. 7 Li tter 4.4 471. 32.Factor 1 11 oadi nQS 1 'P .9 182.30.26 a) Principal Compûn~nts L m = i( Factor 1 loadi nos lm=2( '1 .0 27.5 18.223 9. . 0 Percentage Vari ance ' .0 32.7 30.4i 76.9 309.8 ' '~ v.4 205.2 271 .5 344.3 31.8 Percentage Variance Explained 76.0 245.9 -6. The maximum likelihood.4% l! b) " 9.4 529.6 Litter 4 '. 68.4i Maximum Likelihood Litter 1 .2 Litter 2 30. Litter 1 21.2 Li tter 2 " 30. .8i Explained.2 Litter 3 31.2 Litter 3 28.5 1.4 -4. Factor 10adinas 26. estimates of the factor loadings for ii = Z we're not o.5 1 98.btained due to convergence difficul ti es in the computer program.9 310.9 -8. c) It is only necessary to r~tate the m = 2 solution.i 370. Factor 2 '1 oadi naS I. 224 Principal Components (m = 2) Rata ted 1 oadi ngs FactOr 1 'Factor 2 Litter 1 26.71 Li tter 3 .20 ..44 45.44 .78 .4% Explained 9.33 .21 .86 .87 .32 Litter 4 .36 .4 Litter 4 31.5~ 32..87 -°.53.59 .8 Litter 3 14.4 12. Factor 1 Factor.4% 40.4 li tter 2 27. J Factor 2 .12 .5~ .Z 11..7 33.06 .91 Li tter 2 .8 Percentage Var. (m = 2) Rotated loadin9s 'l . Percentage Variance Expl ained' 76.85 -.14 .Principal Components.27 .15 .91 .6% i Factor 1 .5 13. 2 10a~ings . ance .5% 9. loadings Litter 1 . Factor 1 '1 .34' Litter 2 .81 Expl ai ned '" "-1 f = L R z z. . loadings 1 Li tter 1 .34 .91 .81 . .. _ = .225 Maximum Likelihood (m = 1) .297 .39 litter 4 .17 litter 3 .ßl Percentage Variance 68.78 . 124 -0. 400m(s).061 0.308 0.02770 0.582 0.270 274.984 4.230 0.869 % Var -0.124 -0.02 0. 030 0.999 Rotated Factor Loadings and Communalities Varimax Rotation Variable Factor1 Factor2 Communality 0.06138 0.38 0.517' Variance 242.785 400m(s) 0. The second factor.20276 0.238 270.00 0.075 800m 0. the first factor explains about 98% of the variance and the largest factor loading is associated with the marathon.725 0.640 200m( s) 6. 1500m.014 278.749 6.34456 0.42682 28.15532 0. 200m(s).881 1.051 -0.006 -0.18181 0.725 -1.90373 0. The first factor might be labeled a "running endurance" factor but this factor provides us with little insight into the nature of the running events.02141 0. Covariances: 100m(s).55435 10. 3000m.38499 6.06617 0.23388 4.86309 2. explains relatively little of the remaining variance and can be ignored.767 -2.431 0.66476 10.172 100m(s) 200m(s) 400m(s) 800m 150 Om 3000m 0.36 0.749 -0.401 1.027 0. with either unrotated or rotated loadings. 21965 Principal Component Factor Analysis of S (m = 2) Unrotated Factor Loadings and Communalities Variable Factorl Factor2 Communality 0. however.226 9.052 -0.006 0.158 3000m Mara thon Variance % Var ~ 16.178 o ~~ Marathon (i5. Marathon 100m(s) 200m(s) 400m(s) 800m 1500m 3000m 100m(s) 200m(s) 400m(s) 800m o .217 1500m 0.70609 . 800m.74546 0. Using the rotated loadings.08389 0. It is better to factor analyze the correlation matrix R in this case. Using the unrotated loadings.00755 0. the principal component solution with m = 2 is given below.270 36.28 The covariance matrix S (see below) is dominated by the marathon since the marathon times are given in minutes.50918 1.53984 0.21616 3.073 0.453 270.052 0.999 1500m 3000m 0. the first factor explains about 87% of the varance and again the largest loading is associated with the marathon.267 100m (s) 0.27015 1.38 0.07418 0.380 -0.38 0.438-' 0.453 -0.19284 0.89130 Marathon 0.33418 Marathon Marathon 270.129 278. It is unlikely that a factor analysis wil be useful.143 -0.373 -5. 1500m.923 0.921 0.669 200m(s) 400m (s) 800m 150 Om 3000m Mara thon 200m (s) 400m (s) 800m 1500m 3000m 0.~ ¡c 3 .ctrNumbet Principal Component Factor Analysis of R (m =2) Unrotated Factor Loadings and Communalities Communality 0.933 Variable Factor1 100m(s) 200m(s) 400m(s) 800m 150 Om 3000m Marathon Variance % Var 0.677 0.791 0.Maràthon(èortlilàtion lbtnx) II .801 0. Marathon 100m(s) 0.782 0.906 0.941 0. _.680 0.867 0.720 0.909 0. 3000m. 400m(s).ii ØI ¡¡ 2 1 o 1 3 4 5 i 6 F.856 5.905 0.871 0.820 0.887 0. Scree Plot of iOOm(l.806 0.934 0.8076 0.090 6.809 0.919 7 .940 0.732 0.854 0. 200m(s).728 0.799 The scree plot below suggests at most a m = 2 factor solution.938 0.973 0.951 0.674 0. 800m.919 0. Correlations: 100m(s).910 0..828 0.960 0.227 The correlation matrix ~ for the women's track records follows.6287 0.830 0.4363 0. . j 0 :. 90'6 0.933 0. Score 2 !1 .934 0.919 0. . . . ...960 0.4363 0. .525 400m(s) 0.481 3000m 0.259 0. .228 Rotated Factor Loadings and Communalities Varimax Rotation Variable Communali ty 0.. . ~-1 tI . .172 0. #11 -3 -2 -1 o 1 Factr 2 3 Firs Maximum Likelihood Factor Analysis of R (m = 2) Unrotated Factor Loadings and Communalities Variable Communality 100m(s) 200m(s) 400m(s) o . . . . . .856 0. . .440 Factor Score Coefficients Variable Factor1 Factor2 -0.. 6'62 5 . .801 0.940 0.886 4 5 .386 1500m 0.984 0.848 0.488 200m(s) -0. . .288 -0. . 6 i 04 o . . . .Marâthôll(PC:.0833 0. rn=2.085 6..240 -0.828 100m(s) 200m(s) 400m(s) 800m 1500m 3000m Marathon 3. .480 100m(s) -0.479 Variance % Var 6. .c .244 -0....(.976 0.. .3530 0. .919 3.972 800m 1500m 3000m Marathon Variance % Var o .#3/ .~i. .255 Marathon 0. .921 0. . -2 .2032 0.... .035 800m 0.280 0.445 Plot of 10011(5). '#1/0 .592? 0. 024 BOOm 1500m 3000m Marathon 0. . .. ..449 0.481 1. 400m) with the longer events (800m. *":u. #11 (Cook Islands) and #31 (North Korea) are outliers.972 0. 1500m. 3000m. .369 -0..003 ScorePJol of 100m(s). -1 -2 -2 -1 o 1firs Fad 2 3 4 The results from the two solution methods are very similar.432 Factor Score Coefficients 6.728 0. .229 Rotated Factor Loadings and Communalities Varimax Rotation Variable Factorl Marathon 0.237 200m(s) -0.::. The two factor solution accounts for 89%-92% (depending on solution method) of the total variance.107 0..077 0..595 0. for both factors. Note.. ~ . All the running events load highly on this factor.984 0.84B 0. 1° . 200m.I. Using the unrotated loadings.454 Communality 0. After rotation the overall excellence factor disappears and the first factor appears to represent "running endurance"-since the running events 800m through the marathon load highly on this factor.. . .690 Variance 3.1B06 100m(s) 200m(s) 400m(s) 800m 150 Om 3000m % Var 0. .lS7 0. the remaining running events in each case have moderately large loadings on the factor. .395 .879 0.. Maràthon(JiL~.#31 2 . the first factor might be identified as a "running excellence" factor.019 400m(s) -0. This bipolar factor might be called a "running speed-running endurance" factor.317 -0.455 0.976 0. #'1 .0225 0.886 Variable Factor1 Fat:tor2 100m(s) -0.025 -0. The second factor appears to contrast the shorter running events (100m.. marathon).662 3.036 0.. m=2) 3 .2032 0. The second factor might be classified as a "running speed" factor.906 0. The plots of the factor scores indicate that observations #46 (Samoa).B56 0. l 1 = l .915 0.772 0. 08'5 o .110 0.110 0.81822 0.36399 % Var 0.800m/s.148 variable lOOmIs 200m/s 400m/s 800m/s 1500m/s 3000m/s Marml s Variance 0. 400m/s.1184S78 0. Covariances: 100m/s.471 200m/s -0.1765843 0.116 0.025 0. The results for a m = 2 factor analysis of S using the principal component method are shown below. sse -0.81822 0.363 lOOmIs -0.1377889 0.1083164 0.0735228 0.Q .829 0.287 0.45423 0. Since all the running event variables are now on a commensurate measurement scale.148 variable 10 Oml s 200m/s 400m/s 800m/s 1500m/s 3000m/s Marrl s Variance 0.0822198 0.412 Factor Score Coefficients Factor2 variable Factor1 -0.926 0. 3000m/s.0749249 0. 33S 0.066 0.128 0. A factor analysis of R follows.0810999 Marml s 0.306 -0.0933103 0.066 0.168 0.0864542 0.0960189 0.603 400m/s 800ml s 1500ml s 3000m/s Marmls 0. 200m/s.116 0.0956063 0.542 . 28'0 -0 .1018807 0.0954430 0.29 The covariance matrix S for the running events measured in meters/second is given below.1465-604 .514 0.171 -0. it is likely a factor analysis of S wil produce nearly the same results as a factor analysis of the correlation matrix R.104 0.08607 % Var 0.73215 0.1437148 0.222 -0.0921422 0.1138699 0.0943056 Marml s Principal Component Factor Analysis of S (m = 2) unrotated Factor Loadings and communalities communality 0.128 0.1667141 lOOmIs 200m/s 400m/s 800m/s 1500m/s 3000m/s 0.097 0.168 0. Marmls 3000m/s 1500ml s 800m/s 400ml s 200m/s 100ml s Marml s 0.0809409 0.0997547 0.0966724 0.926 Rotated Factor Loadings and Communalities Varimax Rotation communality 0.1054364 0.083 0.1146714 0.0650640 0. 1500m/s.230 9.083 0.0905383 0.1238405 0. 675 0.776 0.816 0. 3000m/s. All the running events have roughly the factor.906 0.5 ¡. The second factor appears to contrast the shorter running events (100m. 400m/s. The correlation matrix R is shown below along with the scree plot.1 0. 200m. 800m/s.729 0. The two factor this solution accounts for 93% of the varance. the remaining running events in each case have moderate and roughly equal loadings on the factor. marathon).. The second factor might be classified as a "running speed" factor.:lI 0.0 1 2 3 4 5 component Number 6 7 .866 0.854 Scree Plot of lOOmIs. 3000m.660 200m/s 400m/s 800m/s 1500ml s 3000m/s Is Marr 2 OOml s 400m/s 800m/s 1500ml s 3000ml s 0. ..797 0. the first factor might be identified as a "running same size loadings on excellence" factor. II ! D. Marm/s lOOmIs 0. Correlations: 100m/s.731 0.i: 0.852 0.875 0.804 0.8 fl. for both factors.806 0.2 0.231 Using the unrotated loadings.824 0.7 D.672 0. 400m) with the longer events (800m. Marm/s (Correlation Matñx) 0.938 0. A two factor solution seems waranted. Note. After rotation the overall excellence factor disappears and the first factor appears to represent "running endurance" since the running events 800m through the marathon have higher loadings on this factor. This bipolar factor might be called a "running speed-running endurance" factor.906 0. 1500m. 1500m/s.694 0. 200m/s.3 0.741 0.6 .972 0. 4799 0. -1 .358 0.4799 0.232 Principal Component Factor Analysis of R (m = 2) Unrotated Factor Loadings and Communalities Communal i ty Vari. 0'. .484 0.025 0. -2 -4 -3 -2 -1 Firs Factr .248 0..455 0.914 0.489 lOOmIs -0.960 0. .445 Factor Score Coefficients Variable Factor1 Factor2 -0.484 20 Oml s -0. 'l .252 -0. . .911 0.926 0.418 0. .886 0. :t.3675 0. .941 0.8323 0.093 Rotated Factor Loadings and Communalities Varima Rotation Variable Factor1 10 Oml s 20 Oml s 400m/s 80 Oml s l500m/s 3000m/s Marml s Variance % Var Communality 0.941 0.960 0.914 0. .947 0.ance % Var 5...771 0. .Marm/s (PC. I) .947 0.932 0.436 0. . . o ~q!" l 1 11 .tíifl'ØOm/s. . . i 2 .#31 ..(j.able 0..243 -0. . . .911 0. .. '8 0 . .839 0.249 0.i \ 2 . ai o.833 6. .142 0.926 3.293 ScoreP1.6477 0.m=2) 3 .932 0.481 6. .265 -0. 0 0 .499 400m/s 800m/s 15 OOml s 3000m/s Marr/ s 0. . .:: 'ii .. 0 .1125 0. .875 3.871 0..875 lOOmIs 20 Oml s 40 Oml s 800m/s 1500m/s 3000m/s Is Marr Vari. . ".400 0. 850 Variable Factor1 0. ". 2 " " ... . .836 0.073 10 Oml s -0..122 2 OOml s -0.441 0. .041 80 Oml s 1500m/s 3000m/s MannI s -0.5716 0. . " .233 Maximum Likelihood Factor Analysis of R (m = 2) Unrotated Factor Loadings and communalities Communa1i ty Variable 0. o lI -l .. Marmls (MLE. " " .765 1500ml s 3 OOOml s MannI s 3.. 1 ~ l. . " .836 0. .:i ..914 0. .. ~! '.521 -1. .~ . A .2560 0.106 -0.971 0.048 40 Oml s 0. . . .:c: .6844 0..~ ..850 0.379 0.0165 0.412 rO..039 0. m=2) . .ii 3 .2560 0.2395 0.. .518 0. . .896 0.896 0.431 Factor Score Coefficients Variable Factor1 Factor2 -0.859 0.124 0.167 -0.. 100ml s 200m/s 400m/s 800m/s 26 0. -2 -3 -4 -3 -2 -1 Firs Fac:or o 1 2 .. .984 0. .984 0. .983 0. .949 0. .737 lOOmIs 20 Oml s 400ml s 80 Oml s 1500m/s 3000m/s MannI s % Var 6.812 Variance Rotated Factor Loadings and communalities Varimax Rotation Communality 0.971 0.983 0.435 0. . . .082 5. . . . .017 scOl'ePlotøf100mls. . . ..894 0.737 0..014 0. *q.463 Variance % Var 6.894 3. Using the unrotated loadings.28. This bipolar factor might be called a "running speed-running endurance" factor. If the correlation matrix R is factor analyzed. the first factor might be identified as a "running excellence" factor. 1500m. women's track records when time is measured in meters per second are very much the same as the results for the m = 2 factor analysis of R presented in Exercise 9. #11 (Cook Islands) and #31 (North Korea) are outliers. for both factors. The two factor solution accounts for 89%-93% (depending on solution method) of the total variance.28 are quite different from the meters/second scale. After rotation the overall excellence factor disappears and the first factor appears to represent "running endurance" since the running events 800m through the marathon load highly on this factor. Note. All the running events load highly on this factor.28 or in meters per second. 200m. It does make a difference if the covariance matrix S is factor analyzed. . 3000m. it makes little difference whether running event time is measured in The results of the m = 2 factor analysis of seconds (or minutes) as in Exercise 9.234 The results from the two solution methods are very similar and very similar to the principal component factor analysis of the covariance matrix S. the remaining running events in each case have moderately large loadings on the factor. since the measurement scales in Exercise 9. The second factor appears to contrast the shorter running events (100m. The second factor might be classified as a "running speed" factor. The plots of the factor scores indicate that observations #46 (Samoa). marathon). 400m) with the longer events (800m. Using the unrotated loadings.983 1.406 -1.002 0.110 0.025720 0. It is unlikely that a factor analysis wil be useful.013 85.849941 9.819569 14.615 1.158 -0.135356 2.841 0.097 0.023034 0.192564 0.130 71. 200m. 5000m.049 -0.002751 0.057938 0.312 -0.034 -0.537 -0.066193 0.996 Rotated Factor Loadings and Communalities Varimax Rotation Variable 100m 200m 400m 800m 1500m 5000m 10 i OOOm Mara thon Variance % Var Factor1 Factor2 Communality -0.015 0. 10.378905 Principal Component Factor Analysis of S (m = 2) Unrotated Factor Loadings and Communalities Variable Factor1 Factor2 Communality 0.996 0. Using the rotated loadings.265613 1.688936 3.178857 0.044 400m 0.643 80.401 -0. the first factor explains about 98% of the variance and the largest factor loading is associated with the marathon.578875 1.722 5000m 10.392 ~~J 0.168473 0. Covariances: 100m.074257 0.529 14.537 2.342538 80.141 0. -0.649 0.573 0.256022 0.111044 0.124575 0.30 The covariance matrix S (see below) is dominated by the marathon since the marathon times are given in minutes.134 -0.019 0.430489 .019 0. 400m.lb4 85.083 -0.022929 0.049 0.234 0. explains relatively little of the remaining variance and can be ignored.979 1.832 0.130 84. The first factor might be labeled a "running endurance" factor but this factor provides us with little insight into the nature of the running events.000m Mara thon Variance % Var ~ ~.179 2.002 -0.270 200m 2.031 -0. the principal component solution with m = 2 is given below.152 100m 0. 340139 0.034348 0.666818 0.119 '0.643 80.008264 0. It is better to factor analyze the correlation matrx R in this case.229701 1.033 0.043 800m 0.262 0. however.853486 1. the first factor explains about 83% of the varance and again the largest loading is associated with the marathon.223 0.234 2. The second factor.069956 0.300903 0.105833 0.007131 0.317734 0.107 0.649 0.507 0.033 1500m 0.048973 0.235 9.541038 10 i OOOm Mara thon 2.399 -0.125 0. 800m. Marathon 5000m 1500m 800m 400m 200m 100m 100m 200m 400m 800m 1500m 500 Om 10 i OOOm Marathon 10 i OOOm Marathon 0. 262533 6. 1500m.OOOm. with either unrotated or rotated loadings.034 0. 913 0.878 0.712 0.861 0. .896 0.376 0.766 0.7033 0.768 0.918 1 .761 0. 400m.896 0.847 0. 0. 800m.713 0. 200m. 1S00m.780 0.000m 0.804 0. Marathon 200m 400m 800m lS00m SOOOm 10..914 0.954 100m 0.861 0.276 Communality -0.236 -0.rNumbe Principal Component Factor Analysis of R (m =2) Unrotated Factor Loadings and Communalities Variable Fa to 100m 200m 400m 800m lS00m SOOOm 10.795 0.080 7.838 Fa tor2 0.OOOm.843 0. 10.123 -0.267 -0.3417 0.748 0. Correlations: 100m.309.878 0.766 0.772 0.988 0.915 0.715 0.236 The correlation matrix Rfor the men's track records follows.948 0..917 0.972 0.807 0.937 0.6384 0.000m Marathon The scree plot below suggests at most a m = 2 factor solution.917 6. SOOOm.920 0.676 200m 400m 800m lS00m SOOOm 10.423 0.797 0.000m Mara thon Variance % Var 0.944 0.944 0.947 0.969 0.901 0.721 0.840 0. S Fà.957 0.740 0.845 0. 866 0.780 0.237 Rotated Factor Loadings and Communalities Varimax Rotation Communality Variable 0.586 -0.972 0.283 200m 0.183 400m 800m 150 Om 5000m 10.944 0.349 0.814 0.n.988 0.937 100m 200m 400m 800m 1500m 5000m 10.991 0. .000m Marathon 0.788 0.380 0. .Scareeløt~f'100ml . .2249 0.533 -0.. .413 -0. .963 0.4134 0.ltn=2) .053 -0.989 0. Marathoia:lPÇ.875 0.335 100m 0.277 0.920 0.403 4. .. o ~ -3 o -1 -2 3 1 Firs Fadr Maximum Likelihood Factor Analysis of R (m = 2) Unrotated Factor Loadings and Communalities Variable Fac 100m 200m 400m 800m 1S00m SOOOm 10.810 0.1432 0.233 0.3417 0. -. ..949 6..913 0.772 0.000m Marathon Variance % Var r Communali ty 0..176 0.840 0.0..000m Mara thon Variance % Var 7. .7299 0. .224 -0.969 0..912 0. .004 -0..1168 0.927 0. #-" .893 .515 Factor Score Coefficients Variable Factor1 Factor2 0.847 0.866 0.420 .802 0. it tj 10 .989 0.091 7.186 -0. ..918 3. . . -2 .nì::l) 3 .256 -0.. 400m. iF II .493 0.Seore9Jpt..558 0. fi 'I1l 'i ..OOOm and marathon are in the other group (negative loadings).761 0. ..994 -0. . 5000m. the 100m. 800m and 1 SOOm events are in one group (positive loadings) and the 5000. L . .209 -0.000m Marathon 7. The second factor appears to contrast the shorter running events with the longer events although the nature of the contrast is a bit different for the two methods.238 Rotated Factor Loadings and Communalities Varimax Rotation Variable Communal i ty 0. . the 100m.'ll..051 .989 0... .125 100m 0.490 200m '0. For the maximum likelihood method.of1..1986 % Var 0.893 Variance 3..772 0.423 -0.089 0.400 Factor Score Coefficients Variable Factorl Factor2 0. 1.All the running events load highly on this factor. For the principal component method. . this bipolar factor might be called a .1432 0.000m and marathon events have negative loadings. 1O. IS00m. 1 .988 0. the first factor might be identified as a "running excellence" factor.788 0.011 800m 1500m 5000m 10.866 0. Using the unrotated loadings.9446 3.104 -0.000m Marathon 0. Nevertheless.'ØOJDI"" Marathon (ML£.. 200m and 400m events have positive loadings and the 800m. 200m. .. '" e. 1O.056 -0.963 0. .054 -0.912 100m 200m 400m 800m 150 Om 5000m 10.003 0. .866 0. . -3 -2 ~1 o 1 2 3 Firs Factr The results from the two solution methods are very similar.1 .044 400m 0.. 28.296 0. After rotation the overall excellence factor disappears and the first factor appears to represent "running endurance" since the running events 800m through the marathon load highly on this factor.060 0.0688217 0.0942894 0.844 0.0729140 0.0425034 0. lOOmIs 2 OOml s 400m/s 800m/s 1500m/s 5000m/s 10.0434979 0.0432334 0.239 "running speed-running endurance" factor.49405 0.54027 0. 800m/s.0572512 0. OOOml s Mara thonr/ s 5000m/s 10. Note.301 0.094 0. The factor analysis of the men's track records is very much the same as that for the women's track records in Exercise 9.171 0. it is likely a factor analysis of S wil produce nearly the same results as a factor analysis of the correlation matrix R.293 0.0587731 0.923 0.0482772 0.0905819 Principal Component Factor Analysis of S (m = 2) Unrotated Factor Loadings and Communalities Variable lOOmIs 2 OOml s 400m/s 800m/s 1500ml s 5000m/s 10.0428221 0.0937357 0.0736518 . 10.0617664 0. The plots of the factor scores indicate that observations #46 (Samoa) and #11 (Cook Islands) are outliers.219 0. SOOOm/s.0448325 0.195 ' 0. The two factor solution accounts for 89%-92% (depending on solution method) of the total varance. OOOml s Marathonr/s 0.223 0..OOOm/s.0314951 0.0469252 0.093 0.0979276 0..0541911 0. 200m/s. for both factors.061 0.076'6388 5000ml s 10.0431256 200m/s 400m/s 800m/s 1500m/s 0. Since all the running event variables are now on a commensurate measurement scale.0535265 0.0562945 0.0648452 0.092 0..0558678 0.0599354 0.0523058 0.0537207 0.038 0.0745719 0.0468840 0. 9. Th~ second factor might be classified as a "running speed" factor.0909952 0.0959398 0.04622 0. OOOml s Marathonr/s Variance % Var Fact 0.0434632 0.066 0.31 The covariance matrix S for the running events measured in meters/second is given below. 1S00m/s. the remaining running events in each case have moderately large loadings on the factor.038 0. The results for a m = 2 factor analysis of S using the principal component method are shown below.079 Communali ty 0. OOOml s Mara thonrl s lOOmIs 0.256 0. A factor analysis of R follows. Covariances: 100m/s.05715£0 0.0553945 0.0567342 0. 400m/s. the 800m run has about equal (in absolute value) loadings on both factors and the remaining running events in each 'Case have moderate and roughly equal loadings on the factor.094 0.415 0.526 -0. .377 -0. 10.061 0.254 -0.379 0.489 0.078 0.116 0. This bipolar factor might be called a "running speed-running endurance" factor. 200m. After rotation the overall excellence factor disappears and the first factor appears to represent "running endurance" since the running events 1500m through the marathon have higher loadings on this factor. OOOm/ s Mara thonr/ s Factor1 Factor2 -0.275 0.060 0.334 Using the unrotated loadings. the first factor might be identified as a "running excellence" factor. A two factor solution seems warranted.000.022 0.32860 0.561 -0.066 0.197 -0.048 -0. All the running events have roughly the same size loadings on this factor.283 0.038 0. The second factor appears to contrast the shorter running events (100m. marathon).000m/s Marathonr/s Variance % Var Communality 0. 800m) with the longer events (1500m.273 0. The correlation matrix R is shown next along with the scree plot.287 -0.105 0. Note.562 0.21168 0. The two factor solution accounts for 92% of the variance.080 0. 400m.159 0.923 Factor Score Coefficients Variable 100m/s 2 OOm/ s 400m/s 800m/s lS00m/s 5000m/ s 10.038 0.184 0.362 o . 5000m.092 0.151 . The second factor might be classified as a "running speed" factor.093 0. 12 0.54027 0.240 0.240 Rotated Factor Loadings and Communalities varimax Rotation Factor1 Variable 100m/s 200m/s 400m/s 800m/ s 1S00m/ s 5 OOOm/ s 10. 947 ScteePlot of lOOmIs.755 0.3023 0. Maråthonl11 (C()rrêlatiol1f1atr~) 7 5 l 4 'ii- i: 3.800 0.784 0.913 0.000m/s Marathonm/ s Variance % Var 6.754 0.691 0.661 0.700 0.726 0.929 10 Om/ s 20 Om/ s 40 Om/ s 80 Om/ s 1500m/ s 5000m/ s 10.6765 0.828 0. 800mls. 400m/s.794 0.6258 0.697 0..968 0.852 0.241 Correlations: 100m/s.OOOm/s Marathonm/s 0.965 0.895 0.913 7 8 .758 0. 3 ¡¡ 2 1 o 1 2 3 4 5 6 Factr Number Principal Component Factor Analysis of R (m = 2) Unrotated Factor Loadings and Communalities Variable Communali ty 0.914 0. 10.732 0.778 0.085 7. 200m/s.841 0.834 0.939 0.836 0. 1S00m/s.744 0.916 0. on 20 Om/ s 400m/s 800m/s 1500m/ s 0.986 0.909 0. OOOm/ s Marathonm/s 0.935 10.833 0.899 0.872 5000m/s 10 i OOOm/s 100m/s 200m/s 400m/s 800m/ s 1500m/ s 5000m/s 10. SOOOm/s.745 0. .706 0..760 0.OOOm/s. .399 7.805 0. "" .841 0..178 0.270 -0.895 0. ..913 Factor Score Coefficients Factorl Factor2 Variable -0..056 0.186 10 Om/ s 20 Om/ s 40 Om/ s 800m/s 1500m/s 5000m/s 10.968 0.3023 0..834 0.882 0.522 -0.896 Variance 4.004 0.405 -0.369 0... .423 0.929 0.371 0.514 10.913 0.566 -0.-. . . .#'1(.939 0. . .236 0. ..000m/s Marathonm/s 0.418 -0...m=2J ' ..341 0.315 -0.000m/s Marathonm/s % Var 3.466 100m/ s 200m/s 400m/ s 0. 1 2 .1907 0.178 0.242 Rotated Factor Loadings and Communalities Varima Rotation Variable Factorl Communality 0.215 0. .914 0.- -2 ~1 0 Fiis Fact .965 0.1116 0.74i 800m/s l500m/s 5000m/s 0.261 ~PC. Marathonm/s (MLE.989 0. . OOOm/ s Marathonm/s 0.. I .. . .089 0.. . . . .986 0.957 0. . .. .986 0.865 0. #t(L -2 .886 3. .012 0. . .111 -0..268 -0.986 0.859 0. .865 0.859 0.758 0. .711 0.957 0.985 0.777 0.797 0. .899 100m/ s 200m/s 400m/ s 800m/ s 1500m/s 5000m/ s 10. l 1 .951 -0.570 0.758 0. 2 . .886 0.000m/s Mara thonm/ s 0:773 0.08ti5 0.. .000m/s Marathonm/s % Var 7.985 0.094 6. . .055 -0. . .899 Factor1 Variable 100m/s 200m/s 400m/s 800m/s 1500m/s 5000m/ s 10. . .388 -0..492 Variance Factor Score Coefficients Factor1 Factor2 0.3380 0.0865 0..942 % Var 7. . . -.806 0.457 0.870 0....219 -0.1540 0. .7485 0.394 3. .. 00 .928 0.777 0.055 -0.243 Maximum Likelihood Factor Analysis of R (m = 2) Unrotated Factor Loadings and Communalities Communality 0. m=2) 3 . -2 -1 0 Firs factor 1 i .9325 0. *11 -3 -3 . . .047 Score P..f " D J .lotof 100m's.792 Variance Rotated Factor Loadings and Communalities Varima Rotation Variable Communality 0. .128 0.008 Variable lOOmIs 200m/s 400m/ s 800m/ s 1500m/ s 5000m/s 10.046 0. The second factor appears to contrast the shorter running events with the longer events although there is some difference in the groupings depending on the solution method. The second factor might be classified as a "running speed" factor.30 are quite different from the meters/second scale. After rotation the overall excellence factor disappears and the first factor appears to represent "running endurance" since the running events 800m through the marathon load highly on this factor. If the correlation matrix R is factor analyzed.244 The results from the two solution methods are very similar and very similar to the principal component factor analysis of the covariance matrix S. The 800m and 1500m runs are in the longer group for the principal component method and in the shorter group for the maximum likelihood method. for both factors. Note. Using the unrotated loadings. the first factor might be identified as a "running excellence" factor. The results of the m = 2 factor analysis of men's track records when time is measured in meters per second are very much the same as the results for the m = 2 factor analysis of R presented in Exercise 9. the remaining running events in each case have moderately large loadings on the factor. this bipolar factor might be called a "running speed-running endurance" factor. it makes little difference whether running event time is measured in seconds (or minutes) as in Exercise 9. The two factor solution accounts for 89%-91 % (depending on solution method) of the total variance. It does make a difference if the covariance matrix S is factor analyzed.30. The plots of the factor scores indicate that observations #46 (Samoa) and #11 (Cook Islands) are outliers. Nevertheless. .30 or in meters per second. All the running events load highly on this factor. since the measurement scales in Exercise 9. 15478 BkFat 0. 28442 X4 0.0000 o . 39838 o . 0695 a . 75367 X5 0.9378 o .0000 1 .33514 o .13508 0.32.60974 -0.11083 X8 0. 0741 o . 9996 5 6 7 3. 65725 a . 08393 o .83816 0. !J V"ü . 8082 a . 00009 Varimax Rotated Factor Pattern F ACTORl F ACTOR2 FACTOR3 o . 32637 X3 0.27102 o . 29875 -0 .3163 a . 36809 -0.0000 a .0000 1.42460 X4 0.~045 2 . 62380 o . 33505 o . 40890 a . 4292 4869. 96506 F ACTOR2 F ACTOR3 a .37408 X6 0. 0002 o .88812 0. 00000 1 .44716 0.31943 0.36162 -0.0000 1 . 94363 Frame SaleHt SaleWt Initial Factor Method: Maximum Likelihood Factor Pattern X3 X4 X5 X6 X7 X8 X9 F ACTORl F ACTOR2 FACTOR3 o .4688 O. 00086 a .48170 X7 0.06913 X8 0 .50195 0.2456 2.0000 1 . Factor analysis of data on bulls Factor analysis using sample covariance matrix S Initial Factor Method: Principal Components Difference Proportion 20579. 23003 a .05273 F ACTOR3 0.16509 0.38394 -0. 43720 o . 26204 o .45576 X6 0. 8082 Cumulati ve 4874. 3948 a . 8475 O.01180 o . 00598 a .25711 -0.25026 X7 -0. 28992 SaleHt a . 83599 'SaleWt Varimax Rotated Factor Pattern FACTOR1 FACTOR2 X3 0 .50159 X5 0 . 33038 o .03120 a .66769 X9 a .245 9.55648 0. 0001 a .41219 0. 48777 X4 0 . 64446 a . YrHgt a . 00000 o . 94025 -0. 26667 19 ().85244 a . 18354 FtFrBody PrctFFB 0.21635 YrHgt FtF.1914 o . 39033 a .6126 15704. 87'634 BAS scaws the loadings obtained PrctFFB SaleHt SaleWt The scaling is Î.38532 -0 . 18382 Frame BkFat 0.42819 0. 00000 o . 39308 YrHgt FtFrBody PrctFFB Frame BkFat o .30219 YrHgt FtFrBody PrctFFB Frame BkFat SaleHt SaleWt from a covarance matrx and then rotates the scaled loadings.6748 5 ..34428 0. 9998 o . 1129 0.46689 X9 -0. ()OO 4 3 2 1 Eigenvalue Factor Pattern FACTORl X3 a .85951 o .49074 o . 62342 a .51405 X~ 0 . 00000 0. 52282 -0.42166 X7 -0. 75340 0 . 25282 -0.25853 0.rBody O. 94883 X6 0.90600 X5 0. 50894 o . 36843 0./ ç: .94438 a . 23003 0 .25711 -0.5887 0.72177 X6 0. 0393 o .38772 o .15014 -0.3200 0.1059 0. 08393 O. 39838 o .27085 -0.246 Factor analysis using sample correlation matrix R 1 2 3 4 Eigenvalue 4. 25282 -0.1465 o .01180 -0.7836 0.94188 0.23541 -0.41219 0 . 39308 o .42819 o .38949 -0. 00000 a . 34932 0 .74194 SaleWt o .03120 Frame BkFat -0. 83599 SaleHt SaleWt Varimax Rotated Factor Pattern FACTORl FACTOR2 FACTOR3 X3 0. 00000 F ACTOR2 F ACTOR3 1 .27102 X8 0.11715 -0.87071 0.06532 o .15210 0. 85244 o .91334 X4 a .8856 0.25513 -0.21799 -0. 00000 FtFrBody Pn:tFFB 0.36843 o . 0265 o .etation of factiJl' from S.9458 o .18382 X7 -0. .4214 Initial Factor Method: Principal Components 6 7 0.5957 0. 52282 o .74140.93812 0.9723 a . 50159 0. 39692 YrHgt FtFrBody PrctFFB SaleHt Ini tial Factor Method: Maximum Likelihood Factor Pattern X3 X4 X5 X6 X7 X8 X9 FACTORl o .1910 0.82646 YrHgt FtFrBody PrctFFB Frame BkFat SaleHt SaleWt Varimax Rotated Factor Pattern X3 X4 X5 X6 X7 X8 X9 F ACTOR1 FACTOR2 FACTOR3 0.37900 X8 0.33710.0602 5 O.51405 0.26505 0.41206 0. 00000 YrHgt 0.62380 o .88091 X7 -0 . 34428 o .44792 0 . Factor Pattern F ACTOR1 X3 a . 00598 0. 1858 o . 94025 o .43720 X9 O. 83700 X5 0.16509 X4 0. 28992 o . 0209 Cumulati ve 0.88812 0.12071.75340 0. 9933 o .0000 .21811 o .36162 -0 .83365 0.26667 0.03335 0.01382 -0.55648 X5 0.69440 o .94438 0.48930 -0.13094 0. 87634 YrHgt FtFrBody PrctFFB Frame BkFat SaleHt SaleWt The interpretation of factors from R is different of the interpr.04948 0. 0471 Difference 2.25026 0. 36484 a . 0994 o .37460 o .0067 1 . 00894 o . 21635 X6 0.78354 o .54798 F ACTOR2 FACTOR3 -0 .28442 0.94883 0 .2356 Proportion 0.7797 0.85951 o .35794 -0 . 24262 0.5887 0.79502 Frame BkFat 0.91927 X9 0.05273 0 . .0 ....__... r ci-2 .............. ......~.......a...... ..~-_. ........... .......-......... '1 ...:.. - ..............: . - .._...... . ........ - ..... ......._. ~ ...:.......... .. ......... ci I I -2 -1 I I i 1 2 Factor scores for the first two factors using R and varimax rotated PC estimates of factor loadings "1 - 51 (O- N . .i...... ........ ...... o ..... ".... ........... . .._. ¡ "..". 0 :.... . o ............ . ...... ... ii i I i i ... o . ......__......n.....1 o 1 2 ............... ......... ...................... .... ........ .._.o . .247 Factor scores for the first two factors using S and varimax rotated PC estimates of factor loadings 51 (O50 N- ... we give am =3 factor solution. ~ lU Unrotated Factor Loadings and Communalities Variable indep supp benev conform leader Variance 2.163 0.s 0. 0. Correlations: indep. Consequently.670 0.439 Fàctor2 Fact~r3 l:9 .7559 0. An initial factor analysis with m = 2 confirms this conjecture..33 The correlation matrix R and the scree plot follow.248 9..561 -0.492 -0.864 .401 0.471 0.173 -0.943 0. 5 8 Q) -0.979 4.909 0. .0.333 Scte Plot ofindep.298 -0. These correlations and the scree plot suggest m = 2 factors is probably too few.819 0.018 -0.leader (¡. 1 3 2 Factr Number Principal Component Factor Analysis of R (m = 3) .151 Communality 0.327 -0.274 0.422 cr5~m.3207 0.. supp.256 1.187 supp benev conform 0. leader supp benev conform leader indep -0..009 -0. benev. The correlations are relatively modest.100 -0. 3682 0. conform.1966 % Var 0. 1211 0..262 Factor Score Coefficients Variable Factorl Factor2 Factor3 indep -0.000 1. -.909 0. . .3207 0.155 ~ -0. .. ...330 conform % Var Communali ty 0.979 0. .073 -0. .Ieader (PC. .. .. 3587 1.. ...532 CD indep supp benev conform leader Variance % Var and Communalities 1.670 0.000 -0.690 benev 0.008 . .362 -0.194 0. ... .379 (-OJ077 1.. ..41-a -0..111 Variance 1.545 leader 0.086 0. 5486 0..249 Rotated Factor Loadings and Communalities Varimax Rotation Variable Fact~ Factor2 Factor3 indep supp benev (=0.. .136 -0.312 1.240 0.147 supp 0. . 5591 0.000 0.943 0.272 4.U9" -0.000 -0.864 0.3 . I" .: . .589 1. .. . .824 0.indep... .372 -0. Score Plot of.119 -0.. .6506 0.752 -0. .i12 CO:890J ~1~ (~0. .. 0133 4. .. .1 0 Fii'l'actr -2 Maximum Likelihood Factor Analysis of R (m = 3) * NOTE * Heywood case Unrotated Factor Loadings Variable Factor3 Communality 1. " -. . .832 0.277 -0. .010 conform 0. ...129 0..203 . ..790 1. . m=3l 4 .127 -0..003 leader -0.310 0.081 O.819 0. .. .018 -0.971 0. . '\.. . . 3114 0. 211 Using the unrotated loadings and including moderate loadings of magnitudes .824 Factor Score Coefficients Variable Factor1 Factor2 Factor3 indep supp benev conform leader -0 .129 0. 024 -1. may be easier to interpret.5. 000 -1. The rotated loadings are consistent with one another for the two solution methods and.123 -0 . 069 -0.034 0. 000 O. 000 O.532 0.1211 0. although all the factors ar bipolar.000 -0.130 0. The second factor is essentially a "leadership" factor although if moderate loadings are included. this factor is a . 000 -0. 000 1.000 0.980) 0. 5842 % Var 0. The first factor is a contrast between Independence and the pair Benevolence and Conformity.243 1. 2170 0.~ ~ Gb. the arangement of relatively large loadings on each factor is quite different for the two solution methods.250 Rotated Factor Loadings and Communalities Varimax Rotation Variable indep supp benev conform ~ Factor2 Factor3 Communality IT 0. 016 0. the factors are all bipolar and appear to be difficult to interpret.992 0.219 -0.264 1.3199 0.317 1.9681 cg.0 8 -O. Moreover.589 1. 000 -0.213 Variance 1.515 ~ -0. 000 4. 081 0.4-.011 1.432) .098 leader -0. Perhaps this factor could be classifed as a "conforming-not conforming" factor.454) -0. BS BL EM SF EM 0. SF. 886636 0. EM.480272 .875 0. The factor scores for the first two factors are similar for the two solutions methods.34 A factor analysis of the paper property variables with either S or R suggests a m = 1 factor solution is reasonable.147318 0.984 0.975 BS o . while those with above average scores on Benevolence tend to be below average on this factor.987585 2.942 BS 0.987966 BS 1. Teenagers with above average scores on Leadership tend to be above average on this factor. if moderate loadings are used. Perhaps we could label this factor a "lead-follow" factor. The third factor is essentially a "support" factor although. 60 and 61 may be outliers. SF. the latter is difficult to interpret.434307 0.513359 SF 4.302871 EM SF EM 1.251 contrast between Leadership and Benevolence.140046 0. this factor is a contrast between Support and Conformity. the results for a m = 2 factor solution using both solution methods is also given. No outliers are immediately evident.914 SF 0.988 0. Covariances: BL. Plots of factor scores from the two factor model suggest that observations 58. All variables load highly on a single factor. BS BL BL 8. To our minds however. EM. again. 59.972056 Correlations: BL. The covariance matrix S and correlation matrix R follow along with a scree plot using R. 9. For completeness. 960 Factor Score Coefficients Variable Factor1 BL EM SF BS 0.258 0. This factor might be called a "paper properties index.664 1.684 8.878 0.285 0.960 BL EM SF BS Variance 3.468 11.042 0.248 0.905 0.988 0. load highly on this factor.101 0.8395 % Var 0. Note: There is no factor rotation with one factor.188 0.252 Principal Component Factor Analysis of S (m = 1) Unrotated Factor Loadings and Communalities Variable Factorl Communality BL EM SF BS Variance % Var 2.988 Factor Score Coefficients Variable Factor1 BL EM SF BS 0.991 0. All varables load highly and about equally on this factor.449 0. All varables.259 0.042 The first factor explains 99% of the total varance." .255 The first factor explains 96% of the variance.960 3.8395 0. 295 0. Principal Component Factor Analysis of R (m = 1) Unrotated Factor Loadings and Communalities Variable Communality 0.984 0. given their measurement scales. 295 11.734 0.441 2. 852 0.980 BS Variance % Var 3.7784 0.035 3.642 0.960 0.307 0.191 0.988.493 0.008 0.991 -0. 821 -1.993 0.868 0.996 -0. 013 1 BS .999 0. The first factor explains about 95% of the varance and all varables load highly and about equally on this factor.8£8 1 .000 0.427 3.761 0.7082 0.996 SF 0.975 0.817 0.9798 0.984 Communality 1.9 2 BL 0.000 0.235 EM 0.522 0.945 3.835 0.361 0.271 7 1.568 Factor Score Coefficients Variable Factor1 Factor2 -0.995 Rotated Factor Loadings and Communalities Varimax Rotation Variable Factor1 Factor2 Communality BL EM SF BS Variance % Var 0.991 0.232 SF -0.000 The results are similar to the results for the principal component method.968 0.993 -0.945 Factor Score Coefficients Variable Factor1 BL EM SF BS 1. Again.999 0.000 0.8395 0. the factor might be called a "paper properties index.951 EM 0." Principal Component Factor Analysis of R (m = 2) Unrotated Factor Loadings and Communalities Variable Factor1 Factor2 Communality 0.571 0.1403 0.650 BL 1.7784 0.128 0.098 0.9798 0.995 0.000 0.996 2. 3.253 Maximum Likelihood Factor Analysis ofR (m = 1) * NOTE * Heywood case Unrotated Factor Loadings and Communalities Variable Fac BL EM SF BS Variance % Var 1.000 0.914 0. .9700 0..428 Factor Score Coefficients Variable Factor1 BL EM SF BS -0. Since the unrotated loadings provide a clear interpretation of the first factor there is no need to consider the rotated loadings. . 078 1.986 0.7128 Ù ..2800 0. . .922 and Communalities Factor2 Communality 0. e.809 0. .254 . ".00 1. #"-0 . 000 -1. lr61 .070 3.986 0. 000 0.875 0..000 -1..I . .000 1.2572 0..870 2.576 0.641 0. . 795 -0.. . the second factor explains very little of the variance beyond that of the first factor and is not needed. .975 1.984 Rotated Factor Loadings and Communalities Varimax Rotation Variable Factor1 BL EM SF BS Variance % Var 0.. 000 0. . 000 0.. 016 -0.759 Factor2 -0. Maximum Likelihood Factor Analysis of R (m = 2) * NOTE * Heywood case Unrotated Factor Loadings Variable Factor BL EM SF BS Variance % Var 0.485 0.853 0.992 .#tõfJ .757 0. . ..000 1.103 0.185 0..523 0. #S9 .984 1.988 0.000 3.564 Factor2 Communality 0.. The potential outlers are evident in the plot of factor scores.492 1.9700 0.6900 0. -1 0 Firs FaCtor Using the unrotated loadings. 000 3.0.992 1. . FFF. ZST AFL LFF FFF LFF 0.39989 -0.784 0.35980 Correlations: AFL.785 . A m = 1 factor solution using R appears to be the best choice. ZST AFL LFF FFF ZST AFL LFF FFF ZST -3.63707 0.06227 3.21404 0. LFF. 9.34760 308. The same potential outlers are evident in the plot of factor scores. S8 and 59 may be outliers.25S The results are similar to the results for the principal component method. Covariances: AFL.906 FFF -0.711 ZST 0.00577 221.05161 -185. The covarance matrix S and correlation matrix R follow along with a scree plot using R. Since the unrotated loadings provide a clear interpretation of the first factor (paper properties index) there is no need to consider the rotated loadings.733 -0. FFF.40633 0. Plots of factor scores from the two factor model suggest that observations 60 and 61 and possibly observations 57.35 A factor analysis of the pulp fiber characteristic varables with Sand R for m = 1 and m = 2 factors is summarized below.00087 0. the first factor explains 92% of the total variance and the second factor explains very little of the remaining variance.793 -0. LFF. Using the unrotated loadings. Perhaps we could label this factor a "strength--uality" factor.047 175.860 455. all with positive loadings.860 Factor Score Coefficients Variable Factor1 AFL LFF FFF ZST 0.433 -0. LFF and ZST group.48 % Var 0. LFF (long fiber fraction) and ZST (zero span tensile strength) may all have to do with paper strengt while FF (fine fiber fraction) may have something to do with paper quality.001 AFL LFF FFF ZST variance 455. AFL (average fiber length).000 0.000 The first factor explains 86% of the total varance and represents a contrast between FF (with a negative loading) and the AFL.256 Principal Component Factor Analysis of S (m = 1) unrotated Factor Loadings and Communalities variable Communali ty 0.645 0.48 0. .858 0.573 279. 717 AFL LFF FFF ZST Variance' 3. we might label this bi polar factor a "strength-quality" factor. Again.894 0. Again. 090 0.841 AFL LFF FFF ZST Variance % Var 3.422 0.3577 0.614 0.273 The first factor explains 84% of the variance and the pattern of loadings is consistent with that of the m = 1 factor analysis of the covarance matrix S. we might label this bi polar factor a "strength-quality" factor.900 0.1245 % Var 0.278 -0. Maximum Likelihood Factor Analysis ofR (m = 1) Unrotated Factor Loadings and Communalities Variable Communali ty 0.394 -0.132 The first factor explains 78% of the variance and the pattern of loadings is consistent with that of the m = 1 factor analysis of the covariance matrix R using the principal component method. .839 Factor Score Coefficients Variable Factor1 AFL LFF FFF ZST 0.770 0.1245 0.839 3.261 0.877 0.870 0.781 3.781 Factor Score Coefficients Variable Factor1 AFL LFF FFF ZST 0.257 Principal Component Factor Analysis of R (m = 1) Unrotated Factor Loadings and çommunalities Variable Communality 0.3577 0.279 0. 7070 0. ~: ~ .422 Factor Score Coefficients Variable Factor1 Factor2 0. .087 3.. .927 1. .501 .. 50 0.504 Variance % Var 3.288 LFF FFF ZST 3.t1~'! .863 ZST 2. .429 1. .0176 0.258 Because the different measurement scales make the factor loadings obtained from the covariance matrix difficult to interpret..#'"1 .927 0. we continue with a factor analysis of the correlation matrix R with m = 2.082 ZST 0. . . *' sf ..3493 0. l.863 0. .3577 0.942 0.757 0. 6893 0..949 0.839 Variance % Var Rotated Factor Loadings and Communalities Varimax Rotation Communality 0.942 0.953 Variable AFL LFF FFF 0...359 0....*"'0 -4 -3 ~2 -1 Factor FirS 1 2 .. . Principal Component Factor Analysis of R (m = 2) Unrotated Factor Loadings and communalities Factor2 communality 0.949 0..613 AFL LFF FFF -0.256 variable ~ AFL .7070 0. 075 -0.-.. .953 0.696 0. . .876 0. .2 -1 o Firs Factr Examining the unrotated loadings for both solution methods...534 0. ... 01 ZST Variance % Var 2.944 0.~~' Communal i ty 0. ....879 1.336 AFL LFF FFF ZST 0.5146 0.259 Maximum Likelihood Factor Analysis of R (m = 2) Unrotated Factor Loadings and Communalities Factor2 -0.943 0.. .¡". " .879 Rotated Factor Loadings and Communalities Varimax Rotation Variable F5a~~. -3 -2 .2796 0.070 3.944 0.. we see that the second the remaining variane... 197 0.943 0.2351 0..+7 1l~"i.049 -0. .. ~ . .076 m=2) "'''c.503 3.. ..292 Variable AFL LFF FFF ZST Variance % Var 3. 5023 0..922 0...876 0. this factor has factor explains little (about 7%-8%) of moderate to very small loadings on all the variables with the possible exception of ..752 0.-:-.376 Factor Score Coefficients Variable Factor1 Factor2 -0.752 AFL LFF FFF .033 0.":..: . .0124 0.5146 0.423 -1.205 -0.101 0.8 .809 Communality (-0: 38ij 0. . Also. 305 0.724 Cot ton 0. Maze.217 0.679 -0.031 0. However.. LFF and ZST. this factor appears to be a contrast between variable FF and the group of variables AF.081 0. Bull..063 0.389 0.357 0. 58 and 59 may be outliers. After m = 2 there is no shar elbow in the scree plot and the plot falls off almost linearly.054 Maze 0.727 -0.136 0. perhaps. Goats Bull Cattle Sorg Millet Maze Family DistRd Cotton DistRd -0. 61 and.730 0.197 0.520 0. 3. Potential choices for mare 2.260 variable FF.353 0.084 0. If retained. DistRd.623 0. the second factor looks much like the first factor for both solution methods. A one factor model appears be sufficient in this case.36 The correlation matrix R and the scree plot is shown below. We give the results for m = 4 but.028 0.560 0. 4 and 5.382 0. here is a case where a factor model is not paricularly well defined. this factor might be called a "fine fiber" of "quality" factor.336 -0.088 Bull Cattle 0. Cattle. Milet.484 Goats ScreeP. To summarize.443 0.506 0. Sorg. That is. to our minds. there seems to be no gain in understanding from adding a second factor to the modeL.175 0. observations 57. plots of the factor scores for m = 2 suggest observations 60. Correlations: Family.109 0.424 0. Cotton.765 0.383 0. 9.399 0. Using the rotated loadings.568 -0.Goats i 2 3 4 5 6 Factor Number 7 8 9 .404 0.lotofFamily" .071 Sorg Millet 0.022 0. 6476 0.211 1.706 0.320 -0.178 Goa ts % Var Rotated 1. · .494 Maze O.818 0.074 0.114 Goats SCOff: PlolõfFamily" mjGoats (PC.175 Maze -0.026 -0.301 o 564 -0.798 -0.614 0.879 0.078 -0.706 0.148 0.260 -0.963 0..171 -0.951í 8J11 0. . .112 0. 0.022 0.460 Variance 0.006 -0. .714' -0. .180 0.818 1.338 -0.3593 0. m=4.118 0.396 sorg Millet Bull Cattle 0.798 -0.614 -0.286 -0.811 0..199 Sorg -0.024 0. 724J ~~ 7.114 Factor Score Coefficients Factor4 Variable Factor1 Factor2 Factor3 o .8985 0.t.974 61. t:i.030 0.005 0.156 0.903 Family DistRd Cotton tt~~ -0.080 0. .165 0. 0 -0.029 0.108 Factor4 communality 0.3593 0.7840 0.164 -0.019 0.150 ~ ~~~ 0. 7 .092 0.907 0.013 0.329 0.842 -0.811 0:466 0.HlO -0.856 0.063 Cattle -0.110 0.~":.¡S' 1 Firs 'i .224 Bull 0.226 Factor2 -0.Oti3 -0.047 0.633 -0.204 0.125 0.042 0.197 Family -0. 0581 4.. 'l3'! .090 -0. ..~' Factor 2 3 4 .070 -0.001 -0.374 0.842 0. -lll.535 0.7 -#.309 1.247 -0.907 0.068 0.043 0.r .9205 0.ft :.076 0. '. 0291 0.851 -0. .210 0.863 i -0.582 0.102 and Communalities Factor Loadings Varimax Rotation Variable ~ Factor4 Communality 0. e.014 DistRd -0..1443 0.~6) 2.344 Cotton 0.986ì 0.158 0.851 .. .183 (0'.026 \I .246 -0. -1 o .115 -0.145 7.023 0. 28 0.261 Principal Component Factor Analysis of R (m = 4) F~ unrotated Factor Loadings and Communalities Variable Factor3 0.856 0.974 0.118 Family DistRd Cotton Maze Sorg Millet Bull Cattle Goa ts Variance % Var F~ ~ 0..697 Millet 0.629 EO. 151 64~ì Variable F~~~~ FamilY .246 1.9322 0.#t.120 0.075 0.404 -0. 426 0. .215 -0..649 -0. ( " .044 -0.071 r=i0~5 Q112 21) 0.782 -0.5841 0.141 Cattle -0.089 0.324 -0.229 0.962 0.052 -0.044 -0.096 0.869 -0.033 cotton 0.062 -0.247 -0. -.303 0.017 DistRd Cotton -0.162 -0.7340 0.034 Maze Sorg -0.995 Maze 0.896 0.558 fj~' rff Millet Bull Cattle -0. "7(" .013 FamilY -0. '" -.9824 0.010 -0.301 0.074 0.028 -0.980 0.161 -0. .. .9322 0.109 0.065 communalities i ty Factor3 Factor4 Communal 0.2098 0.361 -0.681 0.109 -rr.374 0.369 -0.4ti5 1.465 -0.073 5.362 -0.023 -0.:~ . 2850 0.649 -0.440 0.990 -0.4il 0.307 0. .009 0. 7047 0.249 Goa ts 2.162 -0.026 Bull 0. .131 -0.369 -0.148 0.113 0.002 -0.869 0.091 -0.082 5.262 Maximum Likelihood Factor Analysis ofR (m = 4) unrotated Factor Loadings and Communalities Factor3 Factor4 Communality 0.211 Maze Sorg Millet Bull Cattle 0.003 0.009 -0.002 0.025 0.081 0.782 -0...078 -0..185 0.15tì C=Ô.189 ~ Factor Loadings and Varirnax Rotation Variable .290 0.466 Goa ts Variance % Var 2.023 Sorg 0.746 0.001 DistRd -0.ti59 0.837 -0.268 E§:m ~: 0.009 Goa ts .915 fO.009 -0..064 DistRd Cotton a1 ~ 0.385 0.093 -0.605 FamilY 0.7 . -1 o 1 Firs Factr .7035 0.1.143 Factor Score Coefficients Factor3 Factor4 Variable Factor1 Factor2 0.331 Variance % Var Rotated 1.017 -0. l.003 Millet -1. ":Js .025 -0.606 0.6610 0.lf52.103 0.016 0.206 0.837 -0..189 0.990 0. '.166 0.962 0.659 0. ..370 0. The factor loading patterns are more alike for the two solution methods using the rotated loadings.) Variables Family. Milet and sorghum are grasses and may provide feed for goats. The rotated loadings on factor 4 for the two methods are quite different. (From R we see that DistRd is not correlated with any of the other varables. The interpretation is not clear in the maximum likelihood solution. Plots of factor scores for the first two factors indicated there are several potential outlers. If these observations are removed. DistRd does not load on any factor in the maximum likelihood solution. The fact that the two solution methods produce somewhat different results and explain quite different proportions of the total variation (82% for principal components. although factors 2 & 3 in the principal component solution appear to be reversed in the maximum likelihood solution. Consequently. and Bullocks load highly on the first factor. Cotton. it appears to define factor 4 in the principal component solution. Growing cotton and maze is labor intensive and bullocks are helpfuL. Notice that DistRd does not load on any factor in the maximum likelihood solution. the second (or third in the case of the principal component solution) factor might be called a "family farm-grass crop" factor. Milet and Goats load highly on the second factor (maximum likelihood solution) or the third factor (principal component solution). The first factor might be called a "family far-row crop" factor. The rotated loadings are considerably different on the fourth factor. the results could change. 66% for maximum likelihood) reinforces the notion that a linear factor model is not paricularly well defined for this problem. Bullocks and Cattle load highly on this factor. The patterns for unrotated loadings on the first two factors are similar but not identicaL. . Perhaps this factor could be called a "livestock" factor. Maze.263 The two solution methods for m = 4 factors produce somewhat different results. This factor is clearly a "distance to the road" factor in the principal component solution. The variables Family. Again. The third factor in the maximum likelihood solution (second factor in the principal component solution) may have different interpretations depending on the solution method but in both cases. The patterns of loadings for the two solution methods on the third and fourth factors are quite different. Sorghum. -1/2 11 ~i2.VI = xfl) and Pi = . t-l/2lo ..23XF) + .l.32XP) .1xO)(X1J (l )= x(1) 2 2 Since !i t2~/2 = (1 OJ.49 b) Ui = . 'ui= el .20XP) + .Chapter 10 lO.Z * * a) Pi = .95)2 and p. iO.55.:S)2J which has eigenvalues ~2 = (.(0 a 2() ( . P2 = .-It_ t22 ~l tii _.iox~2) U2 = .2 = o. Thus (1) The normlized eigenvector. Thus Ui = x~l) .36X~1) Vi = ..30X~1) V2 = ..36Xfl) .1t 1/2x(1) 11 -= (0 a IJ1(.95.30X~1) .. VI = xf2). are ~1 · (:1 and ': · (~l. 0 Corr(UZ' V 2) = (-.0S5)(.S a) -1 -1 Q-1D 0-10 (.863Zi ( 2) + .70ti)(I.106ZZ Var(U2) = (_0..4)= .6)+(.14633 .05S)(.OSSzll) ( 2) Vz = -.OSS)2_2(.0 Var(V2) = 1.4) = 1.28919) .5)+( -.706)(-.2_.863) (1.OO09) equation is the same as that of The characteristic ii/2 12 2~ 21 ii/2 (see Example 10.17361. .).4S189 t11t12t22t21 =~ 11~~22~1 = .671ZP) +l.-.45189-).14633 .1) and consequently the eigenvalues are the same.03 = p~ . b) U2 = -.5457) P..677) (.u5S) (..677)2+(1.863) (.28919 .iO.5461 ).17361 = ).0005 .677)(. = ( À-.677i(I.3) + (. reject Ho but do not reject H~l).~6zll)-1.!.5 Value of Null hypothesi s Ho: t12 = E'12 = a test sta ti stic -136.79Zl1i V2 = -.70x~2) e = 45.03 cos 61 + VI = .74 H~l): pi *0.0IZl2) VI = 1.45Z~2) .5 R. .49 cos A* 10.9 a) Pl= . q=2.46 Sin a1 Sin a2 e2 + .65 A A Therefore.20xi2)+.30zl1)+.10.39) is not parti~ularly strong. p=2.7 a) 0(p(1 * =.03Zl1). A Ui = 1.9953) 1 3. 8444 i ( .07 Û1 = i.48 = 23.39 .78 P2 = .57.b) n = 140.(.02zl2)+1.= 4 radians A* d) PI = .8 c) 1 r2(1+p) 266 (X(1)+X(I) 1 2 ) i 2 (X (2) + X( 2) ) A* Pi = .10zl2)-.84 .9953) Upper 51 -Degrees of point of f Freedom distribution 4 9.(. Reading ability (summarized by Ui) does correlate with arithmetic ability (summrize~ by Vi) but the correlation (represented by PI = .p lp Pi 1 Ui = f2(l + p) VI = 10.5R.72 A 'i VI = . n-l.¥p+q+1l = 136. U2 = . pî=O =-136. p. P2 = . is significant.348.1 60z:2) .z(l) = (. Focusing attention on the first pair of canonical variables.1. Vi is dominated by Royal Dutch Shell.003Z~i) V i = -.s are the banks.11 Using the correlation matrix R and standardized variables.26 lz~2) V2 = -.*' 0. . R ZCI) = (. although relatively small.345z~2) Additional correlations: vi.Z(2) = (. p. The Z(l).982 .17 b) Û1 = i.348) between Û i and ~ suggests there is not much co-movement between the rates of return for the banks on one hand and the oil companies on the other..318 .10 a) 267 A* A* Pi = . = 0 is not rejected at the 5% leveL.142z:1) -. but not much.498). h' es 1standardized) Vi : i zl2) +Z~2) = a "pun.532) RVi.JO.174) Here H 0 : 1:12 c¡12) = 0 is rejected at the 5 % level and H cil) : Pi. Moreover.342 . Û i is dominated by Citibank. The first canonical correlation.977Z~2) U1 .266 . =.602Z12) -. The first canonical varables highly correlated with any of differentiate stocks in different industries with some.539z:I)+ i. 10.S (oil companies) and ~ is not the Zl)'s (banks). =.S are the oil companies.33. p. The second canonical correlation is not significant.209z~l) + . R" Z(2) = (. . Rvi. Û i is not highly correlated with any of the Z2).728z:2) + 1.142z~1) Vi = 1.1973 d( omic.185).002Z~l)-.079z~1) Û2 = i.913 . overlap. the canonical correlations and canonical variables follow. the Z2).093 . shment index" Punishment appears to be correlated with nonprimary homicides but not primary homi ci des.130 Ûi = -. The canonical correlation (.410z~1) +.nonprimary Zi( i) -. 29 . Variables Ui A Vi .. 77zI i ) +. family entertainment outside the home is positively associated with family income. P4=O 0 .oszI 2) +. annual fami ly income .eject Ho. "' P2 = .909.13 . Ho: PI *0. Ui Vi A X(2) . P3=O.19 P2 = a a t the 5: level.S1 of head of household d) - U1 is a measure of family entertainment outside the home.42 househol d 2. Va 1 ue of test statistic Null hypothesi s 1. 90Z~ 2) + .61 3.1 i.12- Pl = . annua 1 frequency . P2=.68 . Essentially." = P4=0 3.99 . 27Z~ 1 ) A VI = .256. a) 10.69..98 educa ti ona 1 1 eve 1 .68 1.094 Degrees Conclusion of freedom at a level 309.98 20 Reject H 78.89 . 636. ~4 -- .A* a) ID. . b) 268 P2 :I . of restaurant dining 2. P2 *0. annua 1 frequency of a ttendi n9 movi es age of head of .81 6 0 00 not r. ?3 = .35 . Reject Ho: A* P12 = 0 a t the 5: level but do Ui = not reject * H~I) = pi 4: 0. .63 12 Reject H 16. VI may be considered a measure of family MstatuS" which is domin- ated by family income. Ho: L12=P12=0 2 Ho: Pi *0.P1 = .19Z~ 2) c) Sample Correla tions Vari ables xU) Variables - Be tween Original Variables and Canonical A . 30J i~~J' r:~ -::: -::: -:.03) G~J' G::: -::: Z(2) 2 . s harder to interpret.98 -.46 . Vi . z~2) and z~2).269 Z(1) i Z(1 ) 2 A . zll) and z¡i)) and "bad" aspects (Z3 (l! Z4 (1) ). It appears to measure the quality of the flour as represented by z12).55 z(l ) 3 Z(1) 4 Z(1) 5 Z(2) 1 .18 Z(2) 3 Z(2) 4 A b) U1 appears to measure qual i ty of wheat as a "contrast" between "good" aspects (Zl1). .: . 2851 0.794353012 HR 0. However.5395.5655 0. 4225 0.6190103052 V1 V2 Raw Canonical Coetticients tor the Market measures ot protitability Q 1.5299 Market measures can be well explained by its canonical variate 'C. In fact.0000 0. Correlations Between the Accounting measures ot protitabili ty and Their Canonical Variables U1 U2 BRA 0.. 6946 Canonical Variables Canonical R-Squared 0.5041 2 O.500804367 REV -0.0968 BR 0.7749354333 -4.538822808 RRA 2. The second pair does not correlate well with their respective components.2711 HR 0 .0263 0..6471230054 RR -1.5665 0.5626 RRA 0 .383659039 1.14 a) pi = 0.8702 0. b) Standardized Variance ot the Accounting measures ot protitability Explained by Their elm The epposi te Canonical Variables Cumulative Proportion Proportion 1 0.7228316516 -0.2851 0.494697741 1. .3114 Standardized Variance ot the Market measures ot protitability Explained by Their eim The epposi te Canonical Variables Canonical Variables Cumulative Canonical Cumulative Proportion Proportion R-Squared Proportion Proportion 1 0.0378 0.6041 0. And the sample canonical variates are U1 U2 Raw Canonical Coetticients tor the Accounting measures ot protitability BRA -0.1298 1.7520.8110 0.2910 0. accounting measures on equity have weak correlation with Ûi.270 10. accounting meaures cannot be well explained.4921 0. pi = 0.032933813 2. 0906 o .4921 2 0.2910 CWlulati ve Proportion Proportion 0.2133051339 -0. from the correlation between measures and canonical variates.9655018549 RRE 0. Ví is highly correlated with both of its components.431692979 2.38346956 RR -1 .8702 0.OOag .7184 0.3930601511 -2.S60S .8298904995 U1 is most highly correlated with RRA and HRA and also HRS and RRS. 0036016621 -0.006663216 12 -0.007828962 Standardized Variance ot the dynamic measure Explained by Their Olm Canonical Variables Proportion Cumulative Canonical Proportion R-Squared Proportion 0. 0046 Cumulative Propor'tion Proportion o .7367 0. 8002 o .9886 0.8736 0.0013448038 0.6741 -0. dynamic meaures can be well explained by its canonical variate Vi.4866 10.008471036 14 0. .1508 REV 0. 8003 Static meaures can be well explained by its canonical variate ill' Also.0399 0.0681.8840 0.0046 o .09S9 RR 0.3814 Correlations Betveen the Market measures ot pro~itability and Their Canonical Variables V1 V2 Q 0.271 RRE 0.9601 0. 0002 o .1160 Standardized Variance ot Their Olm Cumative 2 Proportion the static measures Explained by The Opposite ~anonical Variables Canonical Variables 1 The Opposite Canoni~al Variables Cumulative Proportion Proportion 0.7367 1 .8334 0.8334 o . 0006 o .0077029513 Rav Canonical Coetticients tor the static measures 13 0.8840 2 0.15 pi = 0.0000 0 . And the sample canonical variates are U1 U2 V1 V2 Rav Canonical Codticients tor the dynamc measure X1 0. p. = 0.0018933921 -0.7373 1 0.0000 Canonical R-Squared 0.9129.9601 1.7761 0. 8002 o .000696736 0. 272 10. The large sample tests -en .05) = 5.16 From the computer output below.50 ~ X~(. the first two canonical correlations are ßi = 0.015752 Canonical Correlation Analysis Raw Canonical Coefficients for the ~lucose and Insulin GLUCOSE 0.p*D) ~ X.125508.009317525 I NSULRES 0 .12~675138 .1 .p*D) ~ XlP-lXq-il05) or 1 -(46 ..0"08667216 Raw Canonical Coefficients for the Weight and Fasting WEIGHT 8.125158 o .2"(3 + 2 .517345)2)(1-(.020z~2) .125508)2) J = 0:ô67 ~ X~(.125508)2) J .. Ûi .7047zl1) + i.1609z~2) The two insulin responses dominate Ûi while Vi consists primarily of the relative weight variable.0131006541 0.0655750801 -0.0815z~i) Vi = i. we prefer to interpret the canonical variables obtained from S in terms of coeffcients of standardized variables.99 suggest that only the first pair of canonical variables are important.375167814 FASTING -0.59 and -en ..13.007324 0.~(p + q .4357zPJ .014438254 -0.05) or 1 -(46-1-2"(3+2-1) )In((1-(.517345 2 O. Canonical Correlation Analysis Adjusted Appr~x Canonical Correlation 1 0.517145 o .0247524811 INSULIN -0.1) ) In((1 .517345 and P'2 = 0. 023399723 -0. 125508 Canonical Standard Correlation Error 0.1) ) In((1 .p*~)(1 .q(.019159052 0 .~(p + q . Even if the variables means were given.1 .1 .05) = 12. "009843 Squared Canonical Correlation 0.(.267646 0.1) ) In¡(1 . 273 Standardized Canonical Coefficients for the Glucose GLUCOSE 0.4357 0.8232 INSULIN -0.7047 -0.4547 1.0815 -0.4006 I NSULRES Standardized Canonical Coefficients WEIGHT FASTING Correlations Between the Glucose SECONDA2 1 .0202 -0.0475 -0.1£09 1.0086 and Insulin PRIMARY 1 Variables PRIMARY2 o . 3397 o . 6838 INSULIN -0 . 0502 -0 . 4565 -0 . 5729 0.7551 and Fasting and Their Canonical GLUCOSE I NSULRES Correlations for the Weight SECONDAl and Insulin Between the Weight and Fasting and Their Canonical SECONDAl SECONDA2 WEI GHT o . 9875 O. 1576 FASTING o . 0465 O. 9989 Variables 10.17 The computer output below suggests maybe two .canonical pairs of variables. the canonical correlations are 0.521594, 0.375256, 0.242181 and 0.136568. Ûi ignores the first smoking question and Û2 ignores the third. Vi depends heavily on the difference of annoyance and tenseness. Even the second pairs do not explain their own variances very well. R~(1)IU2 = .1249 and R~(1)iV2 = 0.0879 Canonical Correlation Canonical Adjusted Canonical Correlation Correlation 2 3 0.521594 0.375256 0.242181 0.52t771 0.374364 0.241172 4 O. 136568 o . 135586 1 Analysis Approx Standard Error o . 007280 o .008592 0.009414 0.009814 Squared Canonical Correlation o .272060 D .140817 D .058652 0.018651 Standardized Canonical Coefficients for the Smoking SMOKING 1 SMOKING2 SMOKING3 SMOKING4 -0.0430 1.0898 1.1161 -1.0092 SMOK2 1.1622 0 .6988 -1.4170 -1.3753 0.2081 0.015£ o . 1732 SMOK3 SMOK4 o .8909 -1 .6506 Q. 8325 SMOKl 1.6899 -0.2630 Standar4ize Canonical CoeffÜ:ients f-or the Psych and Physical State 274 STATE CONCEN 1 0.4733 ST A TE2 ST A TE3 -0.8141 -0.4510 o . 4946 o . 5909 ANNOY -0 .7,806 SLEEP TENSE o .2567 -0 . ~052 o . 3800 ALERT 0.6919 -0.1451 -0 . 1840 0.6981 -0.4190 -1.5191 IRRIT AB -0 . 0704 O. 6255 ~O . 3343 TIRED 0.3127 o .5898 CONTENT o .3364 o . 4869 o . 2276 o . 8334 STA TE4 -0 . 1 t5'04 -0.7193 0.624'6 o . 4376 -0.7253 0.87£0 U .1861 -0.6557 Canonical Structure Correlations Between the Smoking and Their Canonical Variables SMOKINGl SMOKING2 SMOKING3 ~MOKING4 SMOK3 0.4458 0.5278 0.6615 0.2917 0.7305 0.3822 0.1487 0.5461 0.2910 0.2664 0.4668 0.7915 SMOK4 o . 6403 -0.0620 0 . 5586 0 .5236 SMOKl SMOK2 Correlations Between the Psychological and Physical State and Their Canonical Variables STATE 1 STATE2 STATE3 5TA TE4 CONCEN 0.7199 -0.3579 0.0125 ANNOY o .3035 0 . 1365 0 . 3906 -0.3137 -0.4058 SLEEP TENSE 0.5995 -0.3490 0 .3709 o .2586 0.7015 0.3305 0.0053 ..0.18'61 ALERT o .7290 -0. 1539 -0. 1459 -0.3681 IRRIT AB 0.4585 0.3342 0.1211 -0 .0805 0.0749 TIRED CONTENT o . 6905 -0.0267 0 . 2544 o . 5323 0 . 4350 0 . 3207 -0.5601 275 Canonical Redundancy Analysis Raw Variance of the Smoking Explained by Their Own The Opposite Canonical Variables Canonical Variabl€s Cumulati ve Proportion Proportion o . 3068 2 3 o . 3068 O. 1249 o . 2474 4 0.3210 1. 0000 1 0.4316 o . 6790 Canonical R-Squared Proportion 0.2721 o . 0835 0.1408 0.0176 o . 0587 0.0145 0.0187 o . 0060 Cumulati ve Proport ion '0 .0835 0.1'011 0.1155 0.1215 Raw Variance of the Psychological and Physical State Explained by The ir Own The Opposite Canonical Variables Cumulati ve 1 2 3 4 Proportion Proportion o . 3705 O. 0879 0.3705 o .4583 0.0617 0.1032 Canonical Variable s Cumulative Canonical R-Squared Proportion o . 2721 0.1008 0.1408 0.0124 Proportion o . 1008 O. 1132 0.5201 o . 0587 o . 0036 0.11'68 o . 6233 0.0187 0.0019 O. 1187 10.18 The canonical correlation analysis expressed in terms of standardized variables follows. The Z(1).s are the paper characteristic varables, the :t2).S are the pulp fiber characteristic variables. Canonical correlations: p; = .917, p; = .817, p; = .265, p; = .092 First three canonical variate pairs: Ûi =-1.505z:1) -.212z~l) +1.998z~1) +.676z~l) Vi =-.159z:2) +.633z~2) +.325z~2) +.818z~2) Û 2 = -3.496z:l) -1.543z~1) + 1.076z~l) + 3.768z~$$
V2 = .689z:2) + i.oo3z~2) + .OO5Z~2) -1.562z~2)

Û3 = -5.702z:1) +3.525z~1) -4.714z~1) +7.153z~1)
V3 = -.513z:2) +.077Z~2) -i.663z~2) -.779z~2)

276

Ru,.zo) =(.935 .887 .977 .952), Ry,.Z(2) =(.817 .906 -.650 .940)
RU1.Z(2l =(.749 .831 -.596 .862), Ryi.zo) =(.858 .814 .896 .873)

Here H 0 : L12 (P12) = 0 is rejected at the 5 % level and H ¿I) : pt '# 0, P; = 0 is

rejected at the 5% leveL. H¿2): Pi. '# O,p; '# O,p; = P; = 0 is not rejected at the

5% leveL. The first two canonical correlations are significantly different from O.
The last two canonical correlations are not significant.

The first canonical variable Û 1 explains 88% of the total standardized varance of

it's set, the Z(1),s. The first canonical variable Vi explains 70% of the total
standardized variance of it's set, the z,2).S. The first canonical varates are good

summar measures of their respective sets of varables. Moreover, the first
canonical variates, which might be labeled a "paper characteristic index" and "a
pulp fiber strength-quality index", are highly correlated. There is a strong

association between an index of pulp fiber characteristics and an index of the
characteristics of paper made from them.
The second canonical variable Û 2 appears to be a contrast between the first two
variables, breaking length and elastic modulus, and the last two variables, stress
burst strength. However, the only moderately large (in absolute
at failure and
value) correlation between the canonical variate and it's component varables is
the correlation (-.428) between Û 2 and Z~I) , elastic modulus. The remaining

correlations are small. This canonical variable might be a "paper stretch" measure.
The canonical variable "2 appears to be determined by all variables except Z~2) ,

fine fiber fraction. This canonical variable might be a "fiber length/strength"
measure. The second pair of canonical variates is also highly correlated.

10.19 The correlation matrix R and the canonical analysis for the standardized varables

follows. The z,1), s are the running speed events (100m, 400m, long jump), the
z,2).S are the arm strength events (discus, javelin, shot put).

1.0

.7926J
.4682

.4682

1.0

.4179
Rl1 = .5520 1.0 .4706

R22 = .4179

.4706.6386J
1.0
(.6386
1.0 .5520
.4752J

RI2 = R;i = .2100 .2116 .2539
.3998 .1821
.3102
(.3509

.4953

( .7926
1.0

277

Canonical correlations:
p; = .540, p; = .212, p; = .014

Canonical variables:
Ûi = .540z~1) -.120z~1) +.633z~l)

Û2 = i.277z~l) -.768z~1) -.773z~1)

Vi = -.057z~2) + .043z~2) + 1.024z~2)

V2 = -.422z~2) -1.0685z~2) + .859z~2)

Û 3 = .399z~1) + .940z~1) - .866z~1)

V3 = 1.590z~2) - .384z~2) -1.038z~2)

RUI'Z(1) = (.662 .160 .732), Rv"Z(2J = (.772 .498 .999)

Here H 0 : L12 (P12) = 0 is rejected at the 5 % level and H cill : p; "* 0, p; = p; = 0 is

rejected at the 5 % leveL. H ci2) : p; "* 0, p; "* 0, p; = 0 is not rejected at the 5 %
leveL. The first and second canonical correlations are significant. The third
canonical correlation is not significant.

We might identify Ûi as a "running speed" measure since the 100m run and the
long jump receive the greatest weight in this canonical variate and also are each
highly correlated with Ûi' We might call Vi a "strength" or "ar strength"

measure since the shot put has a large coeffcient in this canonical variate and the
discuss, javelin and shot put are each highly correlated with Vi'

278

Chapter 11
given in (11-19) is

11.1 (a) The linear discriminant function

A (_ - )'8-1X =AI
a X

Y = Xl - X2 pooled

where
S~moo = ( _: -: i

so the the linear discriminant function is

((: i - (: iH -: -: 1 z=¡-2 ~=-2Xi

(b)

2 2

A =l(A
m
- Yl +A)
Y2 =l(AI
- a Xl +AI)'
a X2 =-8
Assign x~ to '11 if
Yo = (2 7)xo ~ rñ = -8

and assign Xo to '12 otherwise.

Since (-2 O)xo = -4 is greater than rñ = -8, assign x~ to population '11-

279

11.2 (a) '11 = Riding-mower owners; 1T2 = Nonowners

Here are some summary statistics for the data in Example 11.1:

Z¡ - (I:,,::: 1 '

Z2 - 1:::: 1

5, - ( ~:::::: -I::::: 1 '

82 = ( 200.705 -2.589 1
-2;589 4.464

8 pooled - ,

8-1
pooled =
.00637 .24475
( .00378 AJ06371

-7.204 4.273
_(.276.675
-7.204 i

The linear classification 'function for the data in Example 11.1 using (11-19)
is

.006371
r J'
x = L .100 .785 :.
20.267 17.633 .00637

( (109.475 i -( 87.400 i) i ( ,00378

where

1 1

.24475

ri = 2"(Yl + Y2) = 2"(â'xi + â'X2) = 24.719

280

(b) Assign an observation x to '11 if

0.100x¡ +0.785xi ~ 24.72

Otherwise, assign x to '12

Here are the observations and their classifications:

Owners
Observation a'xo
Classification
1
nonowner
23.44
2
owner
24.738
26.436
3
owner
25.478
4
owner
30.2261
5
owner
29.082
6
owner
27.616
7
owner
28.864
8
owner
9
25.600
owner
28.628
10
owner
25.370
11
owner
26.800
12
owner

Nonowners
Observation a/xo
Classification
1
owner
25.886
2
nonowner
24.608
3
nonowner
22.982
4
nonowner
23.334
owner
25.216
5
6
21. 736
nonowner
21.500
7
nonowner
24.044
8
nonowner
9
nonowner
20.614
10
nonowner
21.058
11
nonowner
19.090
20.918
12
nonowner

From this, we can construct the confusion matrix:

Predicted
Membership
'11 '12

Actual
membership

:~ j

11 1
2 10

Total
12
12

(c) The apparent error rate is 1~~i2 = 0.125
(d) The assumptions are that the observations from 7íi and 7í2 are from multi-

variate normal distributions;with equal covariance matrices, Li = L2 = .L.
11.3 l,Ne ned.t-o 'Shuw that the regiuns Ri and R2 that minimize the ECM are defid

281

by the values x for which the following inequalities hold:

Ri : fi(x) ;: (C(lj2)) (P2)
h(x) - c(211) Pi

R2 : fiex) ~ (cC112)) (P2)
h(x) c(211) Pi
Substituting the expressions for P(211) and p(ij2) into (11-5) gives

J R2 J Ri

ECM = c(211)Pi r fi(~)dx + c(li2)p2 r h(x)dx

And since n = Ri U R2,

1 =Jr h(x)dx
r h(x)dx '
Ri J +R2
and thus,

ECM = c(211)Pi (1 - k.i fi(x)dx) + c(112)p2 ~i h(x)rix

Since both of the integrals above are over the same. region, we have

ECM = r (c(112)p2h(x)dx - c(21

JRi

l)pifi

(x)ldx + c(2~1)Pi

The minimum is obtained when Ri is chosen to be the regon where the term in
brackets is less than or equal to O. So choose Ri so that

c(211)pifi( x) ;: c(112)pd2(:i )'Ur

(C(112) c(2j1)) Pi (P2) 11.5 h(x) .8 and assigning x to '12 if fi(x) ~ (C(112))(!!) = .2(~lr :-2:~r ~+~~r.2) = .4 (8) The minimum ECM rule is given by assigning an observation :i to '11 if fi(æ) )0 (C(112)) .: 5 hex) . -' and assign x to '11' 11.1 1.1 l +-1 1 . :i-~'t :+2:2+ :-~2+ ~2 1 i .5. fi(x) = 6.1 l l.1 1+.282 h(æ) )0 h(x) .5 f2(x) c(211) Pi 50. ..2(-2(:1-:2) ~ ~+~l~ :1-:2~ :2 i -1 i ( ) i l.1 J = .c(211) Pi 50..8 (b) Since fi(x) = .3 and f2(x) = .1 ( ) = (:1-:2) t : -2 :'-:2 If :1+~2.(pi) = (100) (~) = .(100) (.~ (~-~1)'t-1(~-~1) + ~ (~-~2)lt~1(:-~2) = 1 1 1 1 .5 .

1 ~ lir 2) .!!2) = ~ (!:i . b) E ( ~.~ ~l(~i + !!2J = 1 ~I (~i .7 (a.2 R_1 -1/3 R_2 -1.2 0.6 ~ 0.0 1.283 11.0 -0.0 0.m = ~l!:i .-1 ( .) Here are the densities: --x 1.6 0.5 x -0.~1) _ 1 ( )..5 1.0 0.6 a) E(~'I~I7ii) -aa = ..0 -0.5 0.~2) ~ 0 .~2'" ~l .0 0.5 x .0 1.II = ~ 1!:2 .2 R_1 1/4 R_2 -1.:!:l . 11. nee r1 is positive definite.'2 ~l .!:2) i r i (~i -!!) ~ 0 s ..m = l ~l (~2 .0 1.2 -0.5 0.5 1.

. 11.) When Pi = . ..8 (al Here are the densities: i.25 and R2 : .Pi These regions are given by Ri : -1 ~ x ~ -1/3 and R2 : -1/3 ~ x ~ 1.4 Ri : fi(x) .: P2 = .hex) hex)~.1 R2 : !i(x) hex) . ci . the classification regions are R1. cii R_2 -1 -1/2 R_1 1/6 R_2 o 1 x (b) When Pi = P2 and c(112) = c(2Il). the clasification regions are R2 : fiex) h (x) ~ . P2 = . the classification regions are !i(x) Ri .h(x) h(x).8.284 (b) 'When Pi = P2 and c(112) = c(211).5..1 R2 : h (x) ~ 1 These regions are given by Ri : -1 ~ x ~ .4 hex) . (c.2.25 ~ x ~ 1. ci -~ C' ci .5.( 1 2 . and c(112) = c(211).:.

ere + ) Thus ~ = 2' "_1~1 .I . 1/6 ~ x ~2.285 These regions are given by Ri : -1/2 =: x ~ 1/6 and R2 = -1.U_2) and 11_2 . = l(2.~l ) so a'B .5 ~ x ~ -1/2.u-_ ~2.J~' a/La a1ta - whI.~ = tt ~2 . . ~ ll_l .ua = !'((~1-~)(~1-~)' + (~2-~)(~2-~).5 11.9 a'B .ua = a/La ! ~I (~1-~2)(~1-~2) I ~ ala .

25 ~ O.nl+n2-p-l Since T2 = 14. T2". Assign x~ to '1i if -A9xi .m = -.52 .49Xi .2j ((I~ + 112) l-::: -::: If L ~: I = 14..~~ F2..28~ 11.10 (a) Hotellng's two-sample T2-statistic is T2 . ni n2 .P - + 1 p. (ni + n2.2 at the Q' = 0. we reject the null hypothesis Ho : J.2)p F.25.(-3 ..X2) . (b) Fisher's linear discriminant function is Yo = â'xo = -.53(1) = -. Under Ho : l. .53 and Yo . .X2)' f (~i + n~) Spooled J -i (Xi .. Yo = -.i = 1J2. assign Xo to '12.53x2 + . .i (c) Here.i = J. = -. For x~ = (0 1). Otherwise assign Xo to '12.53x2 .2o(.52 ~ ~i~i~~-. .28 ~ o.1) = 5. m. Thus.44.1 level of significance.(:Vi .

c 9 10 11 12 13 14 P(BlIA2) P(B2/A1) P(A2 and Bl) 0.023 0.346 .287 11. expected cost = $5P(A1 and B2) +$15P(A2 and B1).067 0.75).81 .159 0.500 .159 0.59 2. and misclassificationcosts c(2Il) = $5 and c(112) =$10.250 P( error) Expected cost .011 peAl and B2) .27.59 1.309 0.27 1.42 1. and equal misclasification costs c(2Il) = c(112) ~ $10: c 9 10 11 12 !13 14 P(BlIA2) P(B2IAl) P(A2 and Bl) .309 0.250 0.067 .011 0.~2 Assuming equal prior probabiltiesPi = P2 = l.78 1. and misoassIÍca- tion costs c(2Il) =$5 and c(lj2) = $10.261 3.49 2.154 0.5) .2'5) +$15P(BIIA2)(.079 .500 0.033 .006 .5) ) the expected cost is cost is $1. .90 and the minimum expected cost is$1.011 .023 P(AI and B2) P(error) 0.023 .011 Expected cost 1.691 0.88 1.188 .61 1.61 minimized for c = 12 and the minimum expected Using (11.D03 . 11.349 .188 0.067 .033 0.48 3.715.159 .188 .006 0.261 .261 0.309 .309 . Using (11.033 0.033 .188 0. expecte cost = $5P~B2jAl)(.067 0.349 0.11 Assuming equal prior probabiliti€s Pi = P2 = l.88 2.079 .261 0.250 .159 .59.003 0.13 Assuming prior probabilties peAl) = 0.023 .159 0.079 0.500 .25 and P(A2) = 0.079 0.154 .500 0. the expected cost is minimized for c = 10.691 .346 0.154 0.154 . 1i.159 .250 0. 14 Using (11-21).309 0. = -0.040 0. classify Xo as '12.3.173 0.88 1.500 0.017 0.3.80 and the minimum expected cost is$0.248 0.691 0.023 0.v'â'â -.006 0.231 0.14 ~ rñi = -0.1.050 0.159 0.288 c 9 10 11 12 13 14 P(Bl/A2) P(B2/A1) P(A2 and Bl) P(A1 and B2) P(error) 0.159 0.56 5.142 0.14 1.067 0.381 0.77 aA 2* -.017 0.077 0.159 0.005 0.067 0.a~ --( 1.18 ~ m.00 i Since â.178 0.65 Using (11. classify Xo as 7i2' Using (11-22).119 0. 11.309 0.88.xo = -0.-and m*i = -0. These two clasification r. = -0.375 Expected cost 0.500 0.127 0.12 -.5) .125 0.98 3. These 'results are consistent with the classification obtained for the case of equal prior probabilties in Example 11.006 0.10 Since â~xo = -0. the expected cost is minimized for c = 9.12. . ~1 and m. 79 A* .023 0.61 1 ai -â -(.93 0.eults should be identical to those of Example 11.

.+-1) = .21(x-ii.k where 1 k='21n (iii/) 1 i ~1 -1.~2"'2 ~ .. For a multivariate 1n fi(~) -In f2(~) l rc(1IZ) 1n Le-pi normal distribution .2!:2'12~ 1 1(+-1 +-1) (. +-1 .'12 ~ + ~1+1 ..)'r'(x-ii...(x) = _12 ln It.289 f1 (xl (C(lIZl P2J 11.('21n U iW i/ ) . 1 ( I t i I) + 2' ~-!:2 +2 (~-~Z) . .-1!:2+2 .2 so 1 n f1 (~) . ) 1 ( ) .2~rl'1 : + ~1 "'1 ~1 1 +.1 .i : "'1 : .'2 1n M _ 1 ( .1!:2)1.~ .22 ln 2rr . -1 ~2) ..15 fZ(~) l eT Pi defines the same region as PzJ.-1 '+ -1 ..'2 t~ -+ . --1 1n f.+-1 . i=1...~ (:-~1)' ~i 1 (:-~.).2 ~ ~1 . iW + I'!!i+1 ~2i2 .+.. n f 2 (:) = .1 _.

16 Q = In .k where When k =12'(1n (I t ii ) 1...1-2~2 1-2 .1 ) i +-.i lnl+il .17 Assuming equal prior probabilties and misclassification costs c~2Ii) = $W and c(1/2) =$73.i(:-~l) 'ti1 (~-~1) (f ¡(X)J 1 l' -1 + '2 In!t21 + 2'(~-~2) t.e2) l (~1 +!:Z) 11. 1'1 ~i . .i .1 = . (.i ~i . (~-~2) 1 . ti = h = t.2' ~Z It-'()'( 1+-1 ' = ~ l ~1 .-' 't-1 ' J ii + ~. . Q= i~ -i1-:1+-1 1 (~i iT t-~1 1. In the table below .~2 Xo + J._X 1'2 ll_Z .1.!:21' 1+-1) l' ~1 +~2 .112:E2 :to 1 -~l (IEil) _ ~( i~-l i -1 ) 2 n 1~21 2 1-1 i 1-1 ..LZ . 1+...-2 x +i -+2 X + X t II . Q-__ i ("(-i "(-i) (i "(-i i -i) 2 Xo LJi ..-1 t.89.2' l:i .290 11.~2T2 ~2 . fi(x) = .

00000 331' 1.78053 0.00009 0. P('1ilx) P ('12 I Q x) 15)' 1.87 1l2 5. (22. (24.00000 0.27 '1i 0. 00000 31)' 1.30483 0.46 24.21947 0. 00000 0 17)' 19)' 21)' 23)' 25)' 27)' 0. (b). 00000 35)' 1. (12.54 -1.99678 291' 1.38 53.54 9.00000 0.69517 0.00000 0. (c) and (d). For (a).74 13.01 37.36731 0. (20.00322 0.0 L c(211) Pi 10 Otherwise. classify x as '12.56 '1i '1i '11 ii2 '12 '1i '1i '11 '1i The quadratic discriminator was used to classify the observations in the above table.89) = 2. (30. (14.00000 Clasification 18. see the following plot.04745 0.36 3.00 -0. 50 40 30 0 0 0 0 0 C\ x' 20 10 0 o 20 10 )L1 30 .95254 0.291 x (10. An observation x is classified as '11 íf Q ~ In r(C(112)) (P2)J = In (73.00000 0. (28. (16. (18.99991 0. (26.63269 0.

m -1.= 4. a"'i Xo .50 Classification 112 7(1 7í2 ..292 11.50 0. rñ = .83 0. (b) Fisher's linear discriminant function is A AI 1 2 3 3 Yo = a Xo = --Xl + -X2 where 17 10 27 3 3 6 Yl = -.18 The vector: is an (unsealed) eigen'l.50 -2.7. -m '1i Observation '12 Classification Observation '11 1 2 2.5 Assign x~ to '1i if -lxi + ~X2 .17 '12 2 3 1 a-I Xo -. Y2 = -. 11.-1B since t-l t-l 1 B: = t c(~1-~2)(~1-~2)IC+.5 ~ 0 Otherwise assign x~ to '12.19 (a) The calculated values agree with those in Example 11..83 '1i 3 -0.4.ector of .(~1-~2) = c2t-l (~1-~2) (~1-~2) i t-1 (~1-~2) = A t-1 (~1-~2) = A : where A = e2 (~1-~2) 't-l (!:1-~2) .

7.Xi) '11 Obs.7.XS .Xa. ÎJ~ (x) ÎJH x ) Clasification 3 3 '1i 1 313 i3 7f2 2 i J! '1i 2 l3 i3 7fi 3 4 3 3 3 '12 3 19 3 4 1 3 21 3 3 7f2 The classification results are identical to those obtained in (b) 11.(x) = (x . where D.20 The result obtained from this matrix identity is identical to the result of Example 11.X'2. . 11.23 (a) Here ar the normal probabiHty plot'S for each of the vaables Xi.293 The results from this table verify the confusion matrix given in Example 11.oied(X .ÎJI(x) i ÎJ~ (x) '12 Classification Obs.xd8~. (c) This is the table of squared distances ÎJt( x) for the observations. X4.

294 -2 .1 o 2 .

(b) Using the original data.ooO 0 0 . and Xs appear to be nonnormaL.08X4 . .2 -1 0 1 2 Standard Normal Quantiles Variables Xi.295 -2 -1 0 2 .III. xa. the linear discriminant function is: y = â' x = 0.ID.i. The transformations In\xi) and In(xs + 1) appear to slightly improve normality..2 -1 0 2 0 ..O..~a_~ 300 ~x 0 ~ 00/ 0 ocP 250 200 0 .25xs where ri = -23.2lx3 . In.2 -1 0 2 0 80 60 II x 40 20 .023xi .034x2 + O.0.(x3 + 1).23 .0.

we allocate Xo to Í1i (NMS group) if âxo .0.034x2 + 0.'1i '12 Actual membership .08X4 .: 0 Otherwise. X3).tXi.2lx3 .23 . X2).296 Thus.24 (a) Here are the scatterplots for the pairs of observations (xi.0. ( c) Confusion matrix: Predicted Membership '1i '12 Actual membership . and ~Xl' X4): .~ j 66 3 7 22 Total t ~~ APER= 6~~~9 = .023xi .~ j 64 5 8 21 Total t ~: Ê(AER) = 6~~~9 = .25xs + 23.0. allocate Xo to '12 (MS group).102 This is the holdout confusion matrix: Predicted Membership .133 11.rñ = 0.

6 x1 The data in the above plot appear to form fairly ellptical shapes.0 C\ + Q.0.4 0. ++++ +it+ + + ce 0 )( -0. ++ + )( + + ++ ++ 2 0+ 0 0 -0. .6 0.4 0.1 + 0 bankrupt nonbankrupt 0 0 0 0 -0.2 óJ 0 0 0 0.1 + ++* +: 0.0 0.4 -0.0 + 0.6 + + 0 o Cò + 0 0 0 0 -0.6 0.4 0.2 0. + 0++ 0.2 -0.2 0.6 -0.2 + + ++ + + 0 0 0 + + + Ll \ + 0.6 -0.6 -0.8 -a + )( 0 0.2 + q.297 0 0.2 -0. so bivaate norma1ìty -does not seem like an unreasonable asumption.3 +lt o 0 + .0 + 5 + C" + + 4 3 ++0+.4 + 000 oOi 8(3 ~ 0 1 0 + + -0.4 -0.2 0.4 0.4 0.

(Nonbankrupt group).Oæ37 J (c).67xi .0442 0.5.298 (b) '11 = bankrupt firms.01751 J 8 pooled = 0. (d). . For (Xi.04594 Fisher's linear discriminant function is y = â'x = -4. '12 = nonbankrupt firms.02847 0.2354 i' 0.rñ = -4.01077 ( 0.00837 0.01751 0.00231 lO'M735 0.196. (e) See the tables of part (g) (f) 0.02092 ( 0.0688 X2 - 0.0551 ( 0.12x2 + .l2x2 where rñ = -.67xi . allocate Xo to '12 APER= :6 = .X2): Xi - 8i - -0.32 Thus. we allocate Xo to '1i (Bankrupt group) if âxo .5.02847 J 82 - 0.0819 i ' ( -0.32 ~ 0 Otherwise.

05. This is the table of quadratic functions for the variable pairs . Fisher's linear discriminant function For the various classification rules and error rates for these variable pairs. see the following tables. both with Pi = 0. The classification rule for any of thee functions is to classify a new observation into 1ii (bankrupt firms) if the quadratic function is ~ 0. and (Xb xs).299 Since 8i and 82 look quite different.5 and Pi = 0.(Xb X2).~Xb X3). and to classify the new observation into .

05 Pi = 0.13 0.11 0.X2) Pi = 0.46xf.17 0. Xa) Pi = 0. APER Variables (Xi.69xi + 7.30. Xa) vaiable pair is the best clasifer.05 0.05 -i.71 Here is a table of the APER and Ê(AER) for the various variable pairs and prior probabilties.95.11x4 Pi = 0.08x3 .5 Pi = 0. X4) Ê(APR) Pi = 0.5 (Xi.05 and P2 = 0. as it has the lowest APER.75xiX4 + 8. X4) Pi = 0.14 6.55x~ + 3.22 0.26 0.17 3.60X2 Pi = 0. X2) (Xi.77xi + 35.05xi .08 2.43x¡ .64xi .11 3.20 0.5 Pi = 0.22 0. + 7.300 '12 (nonbankrupt firms) otherwise.3. Notice in the table below that only the constant term changes when the prior probabilties ~hange. Pi = 0.4t) For equal priors. Variables Prior Quadratic function -61.84xiX2 + 407.20x~ + . the variable pair (xi.05 -0.26 0.s.9ûxa Pi = 0.5 (xi. X2) has the lowet APER.10.23 0.10. it appears that the (Xl.5 (Xl.8. For unequal priors. xa) (Xi.39 o .39 0.05 + - 0.S9xiXa .37 0. .

6 = . Ê(AER) = .00837 0.00431 -0.03300 0.20.80 11.0003l 2.04424 0.00873 D. 82 - Assign a new observation Xo to '1i if its quadratic function .00362 0.412 x'0 xo+ Xo Pi = 0.623 Pi = 0.03428 0.given below is less than 0: Prior Quadratic function -49.493 1.2354 - 0.00662 0.6 = .03177 0.5939 0.0819 - .02580 0.04441 0.657 -2.434 11.02847 0.4337 8.65 14.07543 0.91 14. Ê(AER) = ¡~= .748 1. - 8i X2 iJ. 0.623 11.974 -11.00873 1 :04596 0.232 -20.07.64 .050 -52.00431 0.1'6455 0.69 - 5.0330u 1.07543 -u.00662 0.00031 0.02847 0.03428 0.00362 0.657 526.493 -28.0551 .u023l 0. X4).3675 0.42 -20.04735 0.05 For Pi = 0.4264 -0.02092 0.412 -3.301 (h) When using all four variables (Xb X2l X3.0258D () .00837 0.5 4.336 -2.11 For Pi = D.02618 0.0688 Xi -0.24 - 2.n5 : APER = :6 = .4368 0.03177 0.5 : APER = .050 -52.

2 0. The APER is 2:l4 = .4 0.25 (a) Fisher's linear discriminant function is Yo = a' Xo .6 x1 (b) With data point 16 for the bankrupt firms delete.48xg + 3. 5 4 C' 3 x 2 1 -0. Fisher's linear discrimit . .6 -0.0 0.: 0 Otherwise classify Xo to '12 (nonbankrupt firms).4 -0.302 11. This is the scatterplot of the data in the (xi.rñ .rñ = -4.80xi .13.33 Classify Xo to '1i (bankrupt firms) if a' Xo .2 0. Xg) coordinate system. along with the discriminant line.1.

Als laheUed are observation 16 for bankrupt firms and obrvtion ..303 function is given by Yo = a'a. The APER is 1. 2: 0 Otherwise classify Xo to '12 (nonbankrupt firms).e.31 Classify Xo to'1i (bankrupt firms) if a'xo .m = -4. coordinate system with the discriminant lines for the three linear discriminant functions given abov.o .93xi .089.: 0 Otherwise classify Xo to '12 (nonbankrupt firms)..36 Classify Xo to '1i (bankrupt firms) if a/:.1.m .3 = .4 = .46x3 + 3. X3).i. Fisher's linear discriminant function is given by Yo = a'xo .35xi .O .97x3 + 4.11.m = -5. This is the scatterplot of the observations in the (Xl. With data point 13 for the nonbankrupt firms deleted. The APER is 1.m.

304 13 for nonbankrupt firms.26 (a) The least squares regression results for the X.604 5.) ITI -0.5492 T for HO: 0.0001 Here are the dot diagrams of the fitted values for the bankrupt fims and for the nonbankrupt firms: . Z data are: Parameter Estimates Variable DF INTERCEP 1 X3 1 Parameter Estimate Standard Error Paramet-er=O Prob . 13488497 o .05956685 -0. 11.081412 0.307221 o . It appears that deleting these observations has changed the line signficantly.158 o .

. .. .N onbanrupt o .. 20 1. 06025 o . 60 0 .. .---. .. (b) The least squares regression results using all four variables Xi.--Banupt . X4 are: ...50 fitted values: This table summarizes the classification results using the OBS GROUP 13 16 31 banrupt banrupt nonbankr 34 38 41 nonbanr nonbanr nonbanr FITTED CLASSIFICATION --------------------------------------------o . 30 O.90 1. 57896 0..53122 0. X3. 30089 misclassify misclassify misclassify misclassify misclassify misclassify The confusion matrix is: Predicted Membership '11 '12 Actual membership '11 =1 19 2 '12 J 4 21 Total t .--+---------+---------+----. +---------+---------+--.... X2. .48329 o .13. 00 0 . +~--------+---------+---------+---------+---------+----.. the APER is 2:t4 = ... ..47076 O.. Thus. . .305 .

3508 Here are the dot diagrams of the fitted values for the bankrupt firms a:nd for the nonbankrupt firms: _+_________+_________+_________+_________+_________+__---Banrupt _+_________+_________+_________+_________+_________+__---N onbankrupt -0.268 3. 2ô83 o .55719 0.46653100 X2 X3 X4 1 1.32336357 T fo.944 O.40 This table summarizes the classification results using the fitted values: OBS GROUP FITTD CLASSIFICATION o .21845 The confusion matrix is: .ob . 00 0 . 156317 0.) ITI 1.35 0 .335 1.208915 Xl 1 o .306 Parameter Estimates Standard Error Parameter=O Pr.05 1. 90606395 1 o . 62997 o .r HO: 0. 0026 O.35 0 . 70 1 .214 -0 . 225972 1 -0.305175 0.18615284 0. 72676 misclassify misclassify misclassify misclassify ---------------------------------------------15 16 20 banrupt banrupt banrupt 34 nonbanr 0. 149093 o .2119 o .07030479 0.122 0.7393 Variable DF Parameter Estimate INTERCEP 1 0.

087. It appears that point 16 of the bankrupt firms is an outlier.5 0. Here is a scatterplot of the residuals against the fitted values.27 ~a) Plot'Üfthe4ata in the (Xi.5 +. + .X4) variabte space: 1.0 1.5 . with points 16 of the bankrupt firms and 13 of the nonbankrupt firms labelled.~ :~ j Thus. + °eo + ° ° 13 0 0.0 bankrupt nonbankrupt "'~ ° Cò + 0 c: ~ -0.~ 0 In -e :i "C 'ëi (I 0. ".307 Predicted Membership Total '1i '12 Actual membership 18 3 1 24 F . the APER is 3::1 = .0 . + 0 16 + 0.5 Fitted Values 11.

:2 :.¡'s is different from the of significance: Statistic WiJ. + + + + . :. :.0 + :.0 X2 (sepal width) 'Shape. :.5 4. :.145 Num DF Den DF Pr)J F 8 288 Q. (b) Here are the results of a test of the null hypothesis Ho : Pi = 1L2 = ¡.5 :.0 :.5 Ii - + Jo + Q) 'V 1. :. :. :. The points from all three groups appear to form an ellptical it appears that the points of '11 (Iris setosa) form an ellpæ with a different orientation than those of '12 (Iris versicolor) and 113 (Iri virginica). :.x'S' Lambda Value F o . However. + + :. :.0001 . :.0 3.++++++ + + + + + + + + + +++ ++ o X o Setosa Versiclor Virginic o 000 o 00 0 0 00 0000000000 0.308 2. :. ~ . :. :.'§ -i: 1.~2343B63 199.3 vel'US others at the a = 0.5 3. :. :. :. :.05 level Hi : at least one of the ¡. :. :. This indicates that the observations from '1i may have a different covariance matrix from the observations from '12 and '13.0 + ~ 00 0 0 2. :. :. :. :.++++ :. :. .5 o o 2. 2. :. :. :.

the null hypothesis Ho : J11 = J12 = J13 is æjected at the Q = 0.Xi)' Sii(x .28x4 -59.38x4 .g.6 '13 15A9X2 + 3'6.37.87x~ + 24.28.44.23 So classify Xo to '12 (Iris versicolor).75).58x4 .54x2X4 .37.30X4 .92 .et.68X2 + 6.309 Thus.26x4 .00 .Xi) '11 -3.36.043 cf(xo) = -1. AQ di (xo) = -103. the plots give us reason to doubt the assumption of equal covariance matrices for the three groups.92x2 + 12.. 76x~ + 8.05 level of significance.5 1.57x2x4 .9.. 36.77 AQ d2 (xo) = 0.78 57..16x2x4 .53 '13 -6.60x4 + 23.32x~ + 22.l(x .3lx2 + 1£.O) 1li .71x2 + 2.ß3x4 .22. '12 = Iris versicolor '13 = Iris virginica The quadratic discriminant scores d~(x) given by (11-47) with Pi = P2 = P3 = l are: population ~(x) = _1 In ISil.04 To classify the observation x~ .3. (d) The linear discriminant scores di(x) are: population I di(x) = ~SpooledX .47.94x2 + 7. and classify Xo to the population for which ~(xo) is the la¡.59.67 '12 -9. (c) '11= Iris setosa.12 '12 i9. As discussed earlier.13.l~Spooledæi J dïÍ2.2.09x~ + 19.22.02x2 .73 '58. compute Jftxo) for i = 1.

The results are the same for (c) and (d).5 3.i++++ .0 :.0 0 . :.5 :.. Here is a table of dn(%o) for i.0 0 3. :. ....2.0 :. . :.5 0 0 000 00 0 0000000000 00 0 0 2.i- U .5 X2 ~sepal width) 0 0 0 4. . and Rg delineated. :. :. we classify the new observation x~ = i3.74 29. :. :. with the classification regions Ri. .++++++ + + + + + + + + + + + + :: 0. . . :..: 0 for all i =l 2..3: i 1 2 3 0 -30. .94 -0..~ 2. (e) To use rule (11-56). ....3. using (11-52) Here is the scatterplot of the data in the (X2' X4) variable space. .0 2. 1.310 Since d¡(xo) is the largest for í = 2. .2.i . construct dki(X) = dk(x) ... .. :. . :.5 1. :.. :. 2. k = 1.75) to'1i according to (11-52).di(æ) for all i "It. :.. . we allocate Xo to '12.80 0 -29.5 eØ - + + + ã5 Co '" X 1.. :. . .: 0 for all i = 1. Then classify x to'1k if dki(X) . :. R2. . ..94 0 1 J 2 i Since dki(XO) .74 30. -.80 0.

t + + :P + J: +. + .8 1..90 log 1' .5 0 1. 2.311 CHAPTER 11.TIOH 36 (f) The APER = ii~ = ... V + + t .ces and ivariate normal populations.. + + t +. 26.6 .. ..0 0 0 0 0 0 00 0 0 0 o 0 0 0 * +++..4.SpooledX . labelled with a "*".83 7r3 79.0 0 0 !b 1.')l 0..31.. Ê(AER) = itg = . In '1i. For both variables log Yi..0 log(Y1 ) The points of all three groups appear to follow roughly an eliipse-like pattern...97 log 1' ..04 11.~ .t.5 -~ - Cl . +.'l~~ . 3. DISCRIMINATION A.36..28 (a) This is the plot of the data in the (lOgYi. ++:t++~ +.30 .10 log Yí + 13. 0.. (b)...94 log Yi + 10. there also appears to be an outlier. from the observations from '12 and '13.. these are the linear discriminant -scores dit x) for i = 1. the orientation of the ellpse appears to be different for the observations from '11 (Iris setósa).80 IQg Y2 .lä... However... and log 1': population J df(X) = ä.SpooledZi '11 . (c) Assuming equal covariance matri.82 log Yi + 28. 10gY2) variable space: 0 2.81 7r2 75.Q o Setosa + Versiclor ~ Virginic 0 0 Oll 00 0 0 OCDai 0 0 0 0 00 0 0 o 0 2. .37.033.ND CLASSIFIC. ..4 0..

12.30 85. log Y2 log Yl log Y2.7.93 log Y2 . The preceeding misclassification rates are not nearly as good as those in Ex:- ample 11. 150 .23 i50 ..84 log Yi . It is not as good at discriminating 7í2 from 1i3.18 150 .20 log Yi .. log Y2 + 8.33 150 .312 For variable log Yi only: population ¿¡(x) = ~SpooledX . 49 ...52 log Y2 . we do not expect the error rates in pars (b) and. i50 . 34 .28. (d) Given the bivarate normal-like scatter and the relatively large samples.93 '13 For variable 10gY2 only: population ¿¡(x) = ~SpooiedX .90 log Yi . shape is not an effective discriminator of all three species of iris..82 '12 81.23 34 .(c) to differ. because of the overlap of '11 and '12 in both shape variables.44 '13 16.87 Variables log Yl.17 27 .73 '12 19.11. much.33. . 49 .l~Spooledæi '11 30.54 APER E(AER) 26 .31.læ~Spooleåæi '1i 40. Therefore. Using "shape" is effective in discriminating'1i (iris versicolor) from '12 and '13.33 iš -.

29 (a) The calculated values of Xl.000193 0.191.014 ( 0'2071 To classify x~ = (3.i - 5.43 Thus.000003 ( 0.63 2 16.21 497).11.Xi))2 for alli i= Ie For :. k L~_l(â'.Xk))2 ::E.74 J .646.(X .313 11.12 .009 J ).B- w-i _ 1518. A' ai 0.Xk)J2 1 2.99 3 2. and Spooled agree with the results for these quantities given in Example 11. Any time there are three populations with only two discrim- .11 (b) 1518.2.2 - 0. use (11-67) and compute EJ=i(âj(x .=i (âj(æ .o.'50 The eigenvalues and scaled eigenvectors of W-l Bare ). X3.3 Allocate x~ to '1k if EJ=i(âj(x .Xi))2 i = 1.74 258471.348899 0. Xi.009 ( 5. A i a2 -0. X. classify Xo to '13 This result agrees with thedasifiation given in Example 11.000193) _ ( 12.

125 . 11.33x4 . the minimum TPM rule is given by: Allocate xto '1k if the linear discriminant score dk (x) = the largest of di (:.70xi + 0.44xs .314 inants.l3.44.). '12.20X2 .32x2 .93x4 + 1.0.35.20 '13 2.30 (a) Assuming normality and equal covariance matrices for the three populations '1i.l6x3 + 5. classification results using Fisher's Discriminants wil be identical to those using the sample distance method of Example 11.14xs . d3\~ where di(x) is given in the following table for i = 1.lX~SpooledXi '11 0.78x3 + 8.12. d2 \ æ ).3. and '13.2.64xi + 0.58x2 .39x4 .23.78 '12 1.0.071 The holdout confusion matrix is: Predicted Membership '1i '12 '13 Total me~~~~hiP :: J ~ I ~ 1 :5 ( ~ E(AER)= 2+5~+3 = .2.52x3 + 6.61 (b) Confusion matrix is: Predicted Membership '1i '12 '13 Actual '11 membership '12 7 7í3 0 1 0 10 3 0 0 35 Total 7 11 38 And the APER O+5~+3 = .85xi + 0.11.08xs . population di(x) = ~SpooledX .

14 are also slightly higher than those for the original data. Ê(AER)) for the linear discriminants in Example 11. Thus the assumption of bivariate normal distributions with . log X4.31 (a) The data look fairly normaL. 11.315 (c) One choice of transformations. log X2. y'.ela- tions are smalL. appears to improve the normality of the data but the classification rule from these data has slightly higher error rates than the rule derived from the original data.. Xl. the corr. . The error rates (APER. 00 500 0 0 :: cø 0 Q) 0 0 450 Q) i: "t 0 00 0 400 0 0 0 0 c6 0 0 0 0 0 00 õJ° 0 0 0000 000 °õJ + C\ + 0+ 0 + Gl X +0 0 + + + + 0+ + ~it + + a Alaskan + Canadian 60 80 t 120 + + + +++ + + + + 100 + ++ .equal -covariance matrioes does not seem unreasnable..¡+ + + + 't + 1+ 350 300 + + + + + 140 160 180 X1 (Freshwater) Although the covariances have different signs for the two groups.

as APER= ~t~ = ...5. . ".. .54 Classify an observation Xo to'1i (Alaskan salmon) if â'xo-m .0 It does appear that growth ring diameters separate the two groups reasonably well.. . I. .Q 12.. -------+---------+---------+---------+---------+---------Canadian -8.0 -4.07 and E(AER)= ~t~ = .... .. ...: 0 and clasify Xo to '12 (Canadian salmon) otherwise. . ..052x2 .Alaskan ....07 ( c) Here are the bivariate plots of the data for male and female salmon separately.. . . ". . . . . -------+---------+---------+---------+---------+--------... .316 (b) The linear discriminant function is â'x .0 0.0 4..... . Dot diagrams of the discriminant scores: . .13xi + 0. .0 8.. ..rñ = -0..

1667 ( 100.erwIse.71015 1702.12 Classify an observation Xo to 1ii (Alaskan salmon) if â'xo-m.97101 -ì97.317 eo 100 12.m= -0.+ + + + + + + + + 300 140 160 180 X1 (Freshwater) For the male salmon.~"E~" % 500 0 0 - 45 0 0 CI i: .12xi + 0.31884 ( 181.D56x2 .8.64312 760.3333 i.0 80 160 180 140 i mae ¡~:~:%~~. Si -197.71015 J ( ::::::: l' S2 141.: 0 and clasify :c to '12 (Canadian -salmon) oth.65036 ( 370. these are some summary statistics .¡: 0 Cb 0 ct e o 0 0 0 0 o 0 400 C\ o 0 X 350 o il 0 0 o 0 o + o 000+ o + + + ++ +o + + ++++ + 0+ o 00 o + óJ 000 + o o ++ + ++ o+0 + + o + ++ + :.17210 141:643121 X2 The linear discriminant function for the male 'Salmon only Is â'x . xi 436.. + + + .

2. For the female salmon. .91539 ( 336. It is unlikely that gender is a useful discriminatory varable..w.13xi + O.23231 1097. as splitting the data into female and male salmon did not improve the classification results greatly. -210.05X2 . Using this classification rule.0 = .318 Using this classification rule.64000 1038.06.72ûOO ( 289.0 = .64000 J The linear discriminant function for the female salmon only is â' X .rñ = -O.(4::::::: J' s.(:::::::: J' S2 120.66 Classify an observation xo to'1i (Alaskan salmon) if â'xo-m ~ 0 and classify xo to '12 (Canadian salmon) otherwise.21846 120.06 and E(AER)= 3.23231 i Z2 .33385 -210. APER= 3tal = . these are some summary statistics Z¡ .08 and E(AER)= 3:ä2 = . APER= 3i.

0 and classify Xo to '12 (Obligatory carriers) otherwise. Normal -score plot-s fDr each group confirm this.ri = i9.2 + + 0 + + + 0 0 0 0 0 0 0 0 e + +0 + -0.6 -0. (b) Assuming equal prior probabilties.l7.32xi .319 the data for the two groups: 11.0 X1 Because the points for both groups form fairly ellptical shapes.: . The holdout confusion matrix is . airrier o -0.32 (a) Here is the bivarate plot of + + 0.rñ .4 o Noncrrer + Ob.2 0.0 + + C\ + X + + ++ + + + + ++ +0 0 +0 + ã' + %+ooo' + + ll 0 + ++õ + -0.l2x2 + 3.2 + + + ++ ++ + + 0.4 -0.56 Classify an observation Xo to '1i (Noncarriers) if â'xo . the sample linear discriminant function is â'x . the bivariate normal assumption appears to be a reasonable one.

068 0. 0 and classify ::o to '12 (Obligatory carriers) otherwise.019 4 5 6 7 8 9 10 0.63 lii 3.17 '1i 3.59 111 3.090 -0.279 -0.113 -0.123 -0.45 '11 (d) Assuming that the prior probabilty of obligatory carriers is ~ and that of .012 -0.210 -0.052 -0.rñ :.rñ Classification 6.04 7íi 1.58 '1i 4.17.0'59 -0. .12x2 + 4.68 '1i 3.126 â' x .094 -0.112 2 3 -0.320 Predicted Membership '1i '12 Actual membership '11 j '12 26 4 8 37 Total t ~~ Ê(AER)= 4is8 = .098 -0. The hold.011 -0.98 '11 1.32xi .143 -0.037 -0.o .62 lii 4.27 '11 3.043 -0.ut confusion matrix is .050 -0.064 -0. noncarriers is i.16 (c) The classification results for the 10 new cases using the discriminant function in part (b): Case Xl X2 1 -0.66 Classify an observation Xo to lii (Noncarriers) if â':. the sample linear discriminant function is â':.rñ = 19.

210 -0.205. and Xg = SaleWt.65x5 .0.206.206.47X4 + l8.37 4.059 0.55 Classification 7ri '1i '11 '11 7ri '1i '11 '1i '11 '11 11.27 4.0:03xg .08 2. X6 = Frame.59xs .064 -0.094 -0.78 4.043 -0.0.69 4.73 5.050 -0.l5xa .0.052 -0.03xg -3686 + l27. Xa = SaleHt.22x6 +275.068 0.037 -0.279 -0.0.123 -0. (a) For '11 = Angus.33x7 + 26.126 -0. X4 = FtFrBody.47xa .143 -0.03xg -3881 + l28.24 The classification results for the 10 new cases using the discriminant function in part (b): Case 1 2 3 4 5 6 7 8 9 10 Xi X2 â'x .80xa . and '13 = Simental.84x7 + 28.112 -0.098 -0.321 Predicted Membership '11 '12 Actual membership :: j ~~ I 2°7 Total t ~~ Ê(AER)= l~tO = 0.012 -0.019 7.08x5 .88X3 .48x4 + 19. X7 = BkFat.50X7 + 29.48X4 + 19.18x6 +265.08x3 .113 -0.0.72 5.090 -0.70x3 .ri -0.14 2. here are Fisher's linear discriminants di d2 cÎi - -3737 + l26. '12 = Hereford.33 Let X3 = YrHgt.36x6 +245.011 -0.68 5.

73. .54. 00 0 + + 0 0+ . ?:.:.36 . XS.2 :.eaual costs.Xa X7.XS 11. X7. and '13 = Quaker and assuming multivariate flmai data with a 'Cmmon covariance matdx.: 0 8000 ~ :.28 . and d3 = 3594.46 .31. :.43 . :. This is the plot of the discriminant scores in the two-dimensional discriminant space: 2 0 0 0 0 0 ~ 0 0 . Xs.7. '12 = Kellogg. Xa X4. X4. :.X7 Xs.21 . Xai Xg X4.36 .22 .thes .39 .X7 X4. Hereford.13 .~ i.17.25 . :.322 When x~ = (50. "b + + ~ ~ 0 + + :.13.20 .24 .XS XS. d2 = 3593.34 For .32. and equal pri. :.29 . so assign the new observation to '12.XS X4.32 '11 = General Mils. X6.14 .22 .rct i C\ .32 . 1525J we obtain di = 3596. X7. ~ ~ 0 :.p + -1 0 + 0 + 00 :. . :. + -2 0 :. + + % 0 eO + + ~ + 0 2 Angus Hereford Simental 4 y1-hat (b) Here is the APER and Ê(AER) for different subsets of the variable: Subset I APER Ê(AER) X3. X7. Xa XS.25 .1000.

29x3 + 2.: auar Kellog 2 . and low fat.22xB + .1. However.l5x4 . Mils + .c 0 C\ ..07 .33.: 0 0 0 0 0 + 0 00+ + + + ~ + + o~++ -1 .50xg .Olx65.3.20X7 2.ü.: .23x3 + 3.O.65xg .20xio .323 are Fisher's linear discriminant functions: di d2 d3 ..1.t + + .14 .02x65.0.12xio .07xB + 1. Here is a plot of the cereal data in two-dimension discriminant space: 2 0 o ar 0 1 .79x4 .62xs .64x4 .: -2 0 y1-hat 0 Gen.36xg . and carbohydrates..90XB + 1. fiber. they also have high sugar.53x7 - 1.32x3 + 4. The Quaker cereals appear to have low sugar.: -2 + + -3 -4 + .20xs .: .: + lU i 0 .02X69. but also have low protein and carbohydrates.: .43.69xs .43x7 1.13xio The Kellogg cereals appear to have high protein.

039 SntoVnLength 0. ...6% Proportion Correct 0. ~ . .931 Proportion 0.076 ~ 7...163 -0.. ..-.. il .-.. ..-----...vs Ta. ... .. ... .924 . ..d... . . ." .. .:':___::.. 140160 180 liàjlLength OD) Linear Discriminant Function for Groups Female -36.324 11.-..919 N = 66 N Correct = 61 E(AER) = 1 .--.a .----. It appears as if these variables wil effectively discriminate gender but wil be less successful in discriminating the age of the snakes."--" . Sêatterplotof SntoVnLength. .310 TailLength Constant Male -41.. . . . .501 0.... . ..-' _ _.. .429 0.... .d. ~ . . .046 sumary of Classification with Cross-validation Put into Group Female Male Total N N correct True Group Male Female 34 3 37 34 i 27 29 27 0. ..924 = . . . _ _. ~ ~ ~ ~ ~ .35 (a) Scatter plot of tail length and snout to vent length follows... ~ ~ ....d:-'.. .. .

36 -102.60 0.33 SntoVnLength 0.65 0.14 0.2% Using only snout to vent length to discriminate the ages of the snakes is about as effective as using both tail length and snout to vent length.41 Constant -79.44 -145.325 (e) Linear Discriminant Function for Groups 4 3 2 -112.76 0.76 -193.824 0.182 ~ 18.167 ~ 16.38 0.45 0. a18 E(AER) = 1-.48 sumry of Classification with Cross-validation Put into Group 2 3 4 Total N N correct Proportion N = 66 True Group 2 3 4 14 1 3 21 0 4 0 4 17 14 26 21 0.826 N Correct = 54 Proportion Correct o. Although in both cases.765 0. 94 SntoVnLength 0. .913 Proportion Correct N Correct = 55 0.808 0.833 E(AER)= 1-.808 19 23 19 0.818 = .7% (d) Linear Discriminant Function for Groups 2 3 4 -141. there is a reasonably high proportion of misclassifications.11 0.833= .53 Tai lLength Constant sumary of Classification with Cross-validation True Group Put into Group 2 3 4 Total N N correct proportion N = 66 2 3 13 4 2 4 0 2 21 0 3 17 13 26 21 21 23 21 0.

8.93 1. otherwise assign 1.95 1.000 The regression is significant (p-value = 0.0485441 6.= .000) and retaining the constant term the fitted function is In( p~z) ) = 3. OF = 2.000 0.36 Logistic Regression Table 95% CI Odds Predictor Constant Freshwater Marine Coef SE Coef Z P 3.31500 0.841.98 Log-Likelihood = -19. The confusion matrix follows.p(z) Consequently: Assign z to population 2 (Canadian) if in( p~z) ).394 Test that all slopes are zero: G = 99.92484 0. Predicted 1 2 Total 1 46 4 50 Actual 2 3 47 50 7 APER = .62 3. ~ 0 .p(z) z to population 1 (Alaskan).34 Ratio Lower Upper 1.534 0.126(freshwater growth)-. 06 0.049(marinegrowth) 1.001 -3. .22 0. P-Value = 0.07 ~ 7% This is the same APER produced by the linear 100 classification function in Example 11.52 0.925+.0358536 0.126051 -0.0145240 0.13 0.326 11.

es No .6 R-F R-N R-J R-K o . a+d = 3/5 = 60 Pair Coefficient (a+d)lp R-C .1 Democrat Y~s 1 -+ South a) Codes: Yes Repub 1 icanNo o -+ non-South No e.2 .pter 12 12.4 Y.4 .8 .6 .6 .6 F-N F-J F-K J-K .g.4 .6 C-J C-K N-J .327 Cha.4 .Cart~r: i 0 1 1 o o 2 2 t P.6 C-F o C-N .4 N-K . Reagan .

5 .4 .667 .667 .143 .571 3 .2 0 0 4.4 .6 .429 .5 4:5 1 1 1 4.667 7 3 3 1 1 .5 0 .5 14.429 4.6 .5 4.5 4.667 .429 .6 R-J 0 0 0 .5 5 6 7 5 6 .111 14 12 .25 .5 .5 14.75 4.5 14.2 9 14 9 9 14 14 0 .667 .5 .5 14.429 .4 .5 4.5 4.5 '1 .333 6 3 .5 0 .328 12.333 0 .75 .5 10 10 10 Rank Order Coeffi c i ent Pair 4.4 .5 10 10 10 10 10 10 4.4 .8 .4 .333 .75 .5 13 10 4.6 .571 .333 .25 .333 .1 b) RankOr¿~r Coeffi ci ent 1 2 3 1 2 3 .25 .4 .25 3 11 6 3 11 11 1i 6 3 3 3 6 '6 6 .5 4.5 4.571 R-C R-F R-N R-J R-K C-F C-N C-J C-K F-N F-J F-K N-J N-K J-K 13 10 10 4.2 .2 9 9 9 14 14 12 6 12 6 0 0 0 .5 .333 0 .5 4.75 .5 14.5 14.8 .4 12.25 .2 .'5 4.6 .75 .75 .571 .333 .25 .889 .571 .5 4.2 9 9 9 0 14 14 14 . .111 13 .4 .571 0 .571 .429 Pair R-C R-F R-K C-F C-N C-J C-K F-N F-J F-K N-J H-K J-K .5 .25 .25 10 .6 .5 R-N .25 .429 .333 .571 .5 10 10 .

y.i-x.-xP = (a+b)(1-(a+b)/p)2 + (c+d)(O-(a+b)/pF = (e+d)(a+b) r(y. P r(x.-x)(y.-y.i = (a+b)/p¡ Y = (a+c)/p 1 p ._y)2 = (a+c)(1-(a+cl/p)2 + (b+d)(O-(a+el/p)2 = (a+C)(b+d) l' 1 1 1 1 r(x.-y) = r(x.y+xy) p p p1 = a _ (a+c)(a+b) _ (a+b)(a+c) + p (a+b)(a+c) p P = a(a+b+c+d)-~a+eHa+b) = ad-be Therefore (ad-bc) lp r = ((c+dHa+b~~a+C)(b+dl )'. = ad-be ((a+b He+d) (a+c)(b+d))~ .

4 Finally.4 Let c. -+ -+ 3 (D 0 Z (j 0 ~ ri.5 a) Single linkage 2 1 3 4 (12) 3 1 0 (123) (12) 0 3 11 z o 4 3 4 5 4 3 o Dendogram ~ J :i 1. i ncrea s. 1_1) increases as c. 2. =-. '3 'l (123) 4 4 4 o 4 (: oJ .330 12. 1 + .es increases = c. 2 cz so A 1 so t c2 increases as c. a+d a+d c3 =(a+d)+2( b+c) c = 2 p 1 e3 so then c3 = 1 +2( c. Cz = c-1+3 so Cz increases as c3 increases 3 12.

l 12.7 Treating correlations as similarity coefficients.68 S(45)1 = max (S4!. we have: i Single linkage '" A 3 s l.w~ I'. and so forth.6 "3 4 Dendograms Complete Linkage 10 Average Li nkage .18. i 1 I I i i . 1.3 ~t j .r 5. S'(45)3 = .-' ~ .-:3 . Item 3 is somewhat different from the other items.Single Linkage e (¿ 4- .g.) Average Linkage Compl ete Li nkage Dendoaram Dendoaram 10 S e 4 ~ "3 2- Ll 1- :i :: :3 4 . 4 :i 5'3.2 1.32.5 b) c. . 12.1 ~ . 1. S51) = . i-i -.. 4 .S1 Jrr- .(45)2 = .331 12. I i S45 =. All three methods produee the same hierarchical arrangements..16 .2 S '5. 4 :z !.

.5 a 4 6 CD 8.1 Both methods arrive at nearly the same clust. S5i) = .es r~sults si~. 0 .12 S(45)2 = .g_. c.Sl--S: S(45)1 = min (S41.21..332 i2~4S Complete linkage S45 = ..15.:: tt Average linkage pr~uc. e3 . and so forth. ~-.1 ..5 a 1. :3 s.l¡r to single linkage.68 . 12.'3 . S(45)3 = . 4 .8 1 2 3 4 1 0 2 9 0 3 3 7 0 4 6 5 9 a 5 11 10 CD 8 5 1 -+ 2 (35) 1 0 2 9 0 (35 ) 7 8.eri ng.

.2 '4 S" .) :: 1 1. · :i _i- 5' LL . .S I .3 :L i i. (Note different vertical 2 scales.: . nkage a A1 though the vertical s~a les are differ~ntt all three linkage methods produce the same groupings.i :3 cf s- Average L." ~ n .0 t .::3 4 S' '1 . · 4.9 Dendograms Singl e Linkage COl'pl ete Linkage 1.333 12.

8)2 = o.5 -.11 K = 2 initial clusters (AB) and (CD) Xl xi (AB) 3 1 (CD) 1 1 Final clusters (AD) and (Be) Xi (AD) 4 (BC) 0 xi 2..0 (4). ESS3 = -(5 .5 t 4).0 (23)(24)(2)- (3)-24.10 (a) ESSi = (2 .5)2 = 0.25 45.8.18.25 3.5 Squa red di stance ~ntr.334 12.4)2 = 30 12.25 to C 9 ro up 0 27. (b) At step 2 Increase in ESS Clusters t 13)- (3)(2)- ( 14)- (2) t 12)- ( 1) ( 1) ( 1)- (4).2)2 = 0.25 3.7 Finally all four together have ESS = (2 .4)2 +(5 .oids C1 us ter (AD) (BC) A 8 3.4.25 29.5 (3).5 (c) At step 3 Increas Clusters in ESS t 12)- (34)- ( 123)- (4) 5.25 .25 3. and ESS4 =(8 . ESS2 = (1 .5 ~34).4)2 + (1 .4)2 + (8 .4.1)2 = 0.0 8.25 11.

11.5 (BD) -2 -. this result is the same as the result in Example 12. A graph of the items supports the (A) and (BCD) groupings.335 12.13 K = 2 initial clusters (AB) and (CD) Xi x2 (AB) 2 2 (CD) -1 -2 Final clusters (A) and (BCD) (A) (BCD) Xi x2 5 3 -1 -1 Squared distan~e to group cen troi ds Cl uster A (BCD) B C 0 01 40 41 89 52 41 51 A 51 The final clusters (A) and (BCD) are the same as they are in Example 12.5 Squared di stance. 12.12 K = 2 initial clusters (AC) and (BD) Xl x2 (Ae) 3 .11. . and only. In this case we start with the same initial groups and the first. reassignment is the same. to group centroi ds Final clusters (A) and (BCD) C1 uster I Xi x2 (A) 5 3 (A) (BCD) A B 0 52 C 0 40 41 89 4 5 5 (BCD) -1 -1 As expected. It makes no difference if you star at the top or bottom of the list of items.

8 151.5 51.3 82.178.336 12.2 290.3139.0 0.6 227.6100.7 59.6 288.5 36.8 183.0 C18 265.2 186.3 50.0 99.9 82.3 28.5 50.1 113.6 101.7 O.7 321.5 70.1 152.1 38.4 42.6 103.4 C41 197.0 120.4 209.6 141.9 112.5 82.2 C31 78.8 C17 23. 6 54 .2 163.6 100.3 19.6 180.0 111.0 94.7 115.'6 25.1 0.8 C13 C14 C15 C16 C17 C18 C19 C20 C21 ~22 C23 C24 C13 0 .0 C6 72.9 60.8 51.9 C42 191.4 70.3 38.9 51.8 187.1 122.2 181.7 69.1 51.1 1'02.1 251.1129.1 85.6191.9 207.4 177.1 46.3 190.2 44.0 105.0 C12 42.2 105.0 C4 6 .7 174.6 160.1 60.7 130.8 108.6 90.~ 209.8 55.7 C22 98.7 99.0 102.1 0.5 81.8 63.3 154.9 1'01.2 14.9 C33 143.3 67.7 131.8 92.0 84.14.6 96.6 76.5 92.0270.4267.1 C30 33.5 103.0 C15 213.9 101.5 43.7 76.0 36.3 13.8 120.6 90.474.8 49.7 96.0 C13 163.7 17.7 90.2 112.7 C43 185.9 138.0 C17 173.8 177.9 148.8 127.4 68.4 37.2 148.7 132.4 51.0 43. 8 32 .0 C16127.4 51.0 C16 46.3 69.0 61.9 210.5 44.7130.8 100.9 148.9 C3S 116.9 62.8 59.2 30.6 62.0 C36 114.9 168.3 114.0 C14 46.8 65.9 C15 60.2 173.2 51.3 93.4 0.6 89.4 65.9 54.1 177.4 52.1 194.6 78.8 174.0 ~5.4 270.3 130.3 39.0 103.8 215.3 52.7 0.5 50.0 123.9 123.2 21.1 90.0 113.2 103.9 32.5 1.2 155.2 322.6 288.7 218.5 113.6278.5 59.7 160.844.2 31.b 294.0 68.5 194.9 60.8 122.2 90.8 145.5 121.9 60.5 42.8 83.6 203.7 120.2 54 .2 C37 53.3 14.9 61.4 136.9 283.0 235.1 268.2 79.3 200.5 C25 49.2 31.7 26.5 61.5134.0 C19 212.5 84. 0 C8 15.8 60.5 139.6 106.9 90.7 221.3 0.9 121.9 59.9.8 82. 4 117.7 83.9 27. 0 0 .5 127.8 60.9 61.3 170. 1 65 .1 91.3 44.3 67.2 51.6 102.2 133.6 75.4 112.9 220.5 44.8 205.3 28.4 1.1 75.1 181.1 230.1 21. 0 C14 127.0 56.4 173.6 43.4 C29 107.1 301.8169.454.4 113.2 1-04.6148.4 110.2 99.0 .8 138.0 21.7 C27134.2 110.2 186.3 190.6 21.6 194.0 155.7 125.9 81.6 71.1 101.6 21.3 0.2 185.4 58.1 22.8 270.5 90.4 10.0 120.6 113.8 170.7 78.1124.0 157.2 100.1 64.2 189.6 144.8 290.9 52.8 55.4 71 .6 143.4 181.3 142.0 C3 15.6 C38 54.1 208.0 141.6 250.7 214.1 78.5 75.3 48.8 121.4 0.5 11'0.7 114.1 0.2 68.9 80.1 105.2 63.6 121.1 170.4 92.5 45.0 268.2 C23 58.7 94.0 CL0 54.6 0.8 129. 20.4205.2 60.7 C32 32.5 155.5 111.1 112.3 150.8 60.7 136.8 90.2 14.8 16.1 54.8 51.1128.8 309.2 109. 5 0 .5 C19 68.5 65.1 280.4 121.0 54.0 C18 134.7 111.1 41.2 139.1113.7 85.9 0.1 43.9 69.2229.0 C9 46 .5 45. 9 10 .3 190.6 260.6206.3 C40 40.5 170.8 56.3 111.2 105.2 155.0 217.5 159.1 89. 2 72 .0 59.9 233.0 183.1150.6 C21103.4 0 .3 51.4220.5 47.9 190.8 C34 173.6 81.2115.2 61.3 34.8 122.9 60.7 C24 68.7 136.8 286.3 116.5 86.4 112.3 186.9 22.3 237.7 C26 182.3 196.6 C20116.2 70.5 68.8 71.3 42.6140.8 285.4 45.1 141.3 157.1 158.0 C2116. 9 75 .3 121.9 16.1 112.8 111.3 280.2111.7 198.7171.6 41.5 120.8 36.4 267.7 189.7123.2 220.0 65.7 111.1 68.5 90.7 51.9 148.0 61.2 221. (a) The Euclidean distances between pairs of cereal brands CL C2 C3 C4 C5 C6 C7 C8 C9 CL0 ~11 C12 CL 0.9 C28 16.2 29.8 101.5 63.7 187.6 103.7 210.0 C5 103.3 60.0 0.1 62.7 92. 7 48 .6 94.1 100.7 87.6 44.9 82.2 102.~ CL1 81.0 C7 86.5 49.5 36.1 100.8 C39 48.5 183.0 0.

0 195.1 257.5 71.0 C24 212.278.5 175.7 231.2 0.3 80.2 73.8 C42 210.9 132.0 152.9 148.7 C32 150.2 295.7 122.2 66.6 27.2 128.8 204.7 74.4 95.3 148.5 140.0 C41 241.3 81.6214.7 138.7 58.7 78.2 286.9230.7 C30 191.663.4 324.0 C23 204.8 C27 67.0 28.9 341.8 0.0 C34 30.8 150.8 244.4 51.9 83.3 190.7 97.8 201.0 C22 91.3 148.8 0.7 C33 230.1 88.2.4 93.8 200. 3 C35 221.0 170.1 135.1 132.7 173.2 108.1 229.8 53.2 81.8 96.3 66.1132.0 236.0 C35 91.6 62.5 283.1112.5201.4220.8 110.1 232.9 120.1 60.1 214~1 131.9 297.0 56.7 66~9 91.1 71.5 C40 86.4 225.1278.4153.7 C39188.6 34.6 44.6 227.7231.6 88.1 335.6 158.5234.2 0 .9 188.9 0.4 282.1131.0 110.7 149.5 228.0 123.4158.0 197.5 110.7 87.4 80.1 0.2104.8 148.7 0 .9 150.4 C28174.7 70.3204.5 140.6 96.2 186.2 94.1 195.4141.6 196 .2 143.0 105.4 160.8 216.4 315.6 165.9 112.0 96.4 80.9 130.4 63.7 250.3 375.5 324.2 36.8 49.0 C43 237.8 107.2 116.132.4 57.3 37.4 178.3 170.7 59.2 91.5 53.3297.4 C38 24.7 64.0 C39 20.0200.5 102.2225.8 66.6 31.0 86.0 98.3 180.7 161.1 0.6 30.5 121.0 C40 90.7 0.3 158.0150.1227.8 79.7 197.2 226.0 0 .6 95.0141.6 21.5 47.1 197.8 180.9207.036.8 295.2 75.0 113.0 59.1 103.9 210.0 136.6 1.5 163.8117.6 62.1 93.3 72.2 164.6 52.0 C43218.2 0.5 172.7 93.7 16.2 205.3 78.9 231.5 81.3 72.8 C37182.1177.5 83.8 294.3 200.4 132.2347.3 151.8 C38198.2 264. 0 C38 27 .8156.4 164.2 151.6 33.1 C41 209.1 C41 301.2233.0 C37 43.4 108.2 117.0 93.5 75.1 227.6136.4 24.3 165.8 200.4 91.7 231.5141.2 179.2 0.9214.8 81.3288.9 209.4300.5 79.8 ~4.7 104.5 0.5 130.6 181.7 178.6 41.9 152.2 0.1 97.0 C32 75.6 C25 C26 C27 C28 C29' C30 C31 C32 C33 C34C35 C36 C25 0 .0 221.5 141.1252.4 156.2 30.6 203.1 270.6 140.6 195.1 92.8 102.3144.8 226.8 401.4 135.6 142.9 C37 C38 C39 C40 C41 C42 C43 C37 0 .0 C36131.7 81.2 325.9 117.9 305.6166.7 230.3 157.9 71.8 148.9 96.4 36.7 96.6 71.1 114.1 201.4 186.0 170.1 193.0 C21 234.6 95.1 248.5 0.2 70.5 37.5' 63.3122.1 89.9 137.9 75.2 342.1 131.5 C39 30.5 196.8 0.5 35.9 C29 83.1 162.8 107.3 121.7 108 .5 172.1 200.1 C42 277.2 0.2 227.3220.7 146.1 96.0 143.7 49.4 144.3142.0 121.3 46.2303.0 251. 0 C26 C27 C28 C29 213.7 C43 229.3 98.0 148.5 86.3 226.9 30.6 132.7 140 .2 140.0 46.7 72.3 159.3 204.2 79.7 C40146.7 131.4 0.3 92.8 136.7 194.1 213.4 101.0 152.6 96.3 30.3 250.1 292.0 C33 122.8 167.1 83.2 C36 226.7254.0 60.6 150.7 71.6 70.9 88.8 74.8 24.5 103.9 153.337 C20 223.2 214.8 0.7 300.4 248.2 79.9 165.8 52.0 296.6 27.3 91.6 101.3327.1 102.3 184.0 O.7 107.1 0.8 140.2 151.7 152.5 100.6 71.8 149.5 225.2 58.0 C25207.0 C31 111.1 78.1 35.1101.5 54.7289.7207.9 17.3 141.2 17.3 C26 233.~ C34 215.1 227.4 70.1 233.3 55.9 74.2 10.3 73.1292.0 C30 20.3 '204.8 C31104.8 297.7 124.9 133.9 34.2 117.5 253.3 31.3 94.1 153.0 210.1 147.1 108.8 154.6 117.5 102.8 221.2 136.1 322.2 36.8 10.9 23.4166.1 177.8 139.0 108.2 149.8 151.1 0.5 141.6 293.8 81.8 50.4235.0' C42 237.7 56.9 214.5 131.7 126.

N.. 00 o "'."" õú ~" ::~ 0l 00 . _N 00 .. Oõ MI ot.. Single linkage ~ "" ì3 g g ~ o N CO...338 (b) Complete linkage produces results similar to single linkage. Complete linkage 8.. 8 '" "" ì3 ~ o . 8..

34 4 Single C1 C2 C3 1 1 1 0. Final cluster centers 1 2 3 4 1 110.5 10.9 215.5 3.1 0.3 7. 7 2.9 1.15.8 55.c13 C22 C27 C29 C34 C15 .4 4 clusters K = 3 C1 C2 C3 Distances between centers 1 2 3 4 2 2 2 3 3 3 1 1 1 1 1 1 1 C10 1 C11 1 C12 1 C13 1 C14 1 C15 1 C16 1 C17 1 C19 1 C20 1 C21 1 C22 1 C23 1 C24 1 C25 1 C27 1 C28 1 C29 1 C30 1 C31 1 C32 1 C33 1 C34 1 C3S 1 C36 1 C37 1 C38 1 C39 1 C40 1 C18 18 C26 26 C43 26 .2 0.0 2.8 245.7 15. C20 C23 C24 C25 C28 C30 C31 C32 C33 C35 C37 C38 C39 C40 C21 C26 C36 C41 C42 C43 C11 C13 C18 C22 C27 C29 C34 K = 4 1 CL 1 1 C2 C3 C4 C5 C6 1 1 C4 C5 C6 C7 C8 C9 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 C7 C8 C9 C12 C15 C17 C19 C20 C23 C24 C25 C28 C30 C33 C35 C37 1 1 1 C4 C5 1 1 C6 C7 C8 C9 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 C3B 1 C39 CL0 C11 1 2 C14 C16 C22 C29 2 2 2 2 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 3 2 C31 C32 C40 C21 C2q C36 C41 C42 C43 C13 C18 C27 3 3 3 4 4 4 (.1 0.1 3 86 .7 275.1 1.0 1 2 86.0 K-means K = 2 1 CL 1 2 C2 C3 C4 C5 C6 C7 C8 C9 1 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 1 1 1 1 1 1 1 C10 C12 C14 C15 C16 C17 C19 C20 C21 C23 C24 C25 C26 C28 C30 C32 C33.8 12. 5 26.8 5.7 171. C35 C36 C37 1 1 1 1 C3B 1 C39 C40 C41 C42 C43 C11 C13 C18 C22 C27 C29 C31 ~34 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 C10 C12 C14 C15 C16 C17 C19 .0 for K = 4 5 6 7 8 0.0 5.0 6.0 4 195.c19 C21 C24 C26 C36 C41 C42 .4 3.4 10.0 162.2 0.0 3 190. In K-means method.4 132.7 4 112. 3 o.0 2.0 .6 123.9 50.0 2 114.c41 41 C42 41 Complete C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C12 C14 C16 C17 C20 C23 C25 C28 C30 C31 C32 C33 C35 C37 C38 C39 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ~40 1 CL1 11 11 11 11 11 11 15 15 15 15 15 15 .339 12.c43 Ci8 1S 15 15 18 0. we use the means of the clusters identified by average linkae as the initial cluster centers.8 225.8 15.

37. countries 11. and countries 4. .16 (a). depending on the distance level. 43. (b) Dendrograms for single linkage and complete linkage follow.340 12. 40 and 46 form a group at a relatively high level of distance. in both procedures. The dendrograms are similar. 27. The clusters are more apparent in the complete linkage dendrogram and. might have as few as 3 or 4 clusters or as many as 6 or 7 clusters. as examples. 25 and 44 form a group at a relatively small distance.

341

(c) The results for K = 4 and K = 6 clusters are displayed below. The results seem
reasonable and are consistent with the results for the linkage procedures.

Depending on use, K = 4 may be an adequate number of clusters.

Data Display
Countr ClustMemK=6

ClustMemK=4

1
2

6

2

2

4

3

4

1
4

4

5

3

1

6
7

6
6

2
2
2

8

1

9

4
1

10
11
12
13
14
15
16
17
18
19
20

21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43

44
45
46
47
48
49
50
51
52
53

54

5
3
2
6

3
6
6
2

2

2
3

1
4
2
1
2
2

1

2
1

1

4
1

4
4
4
6
6
6
3
3
2
1

Numer of

Cluster1
Cluster2
VCluster3
Cluster4

Within

Average

Maximum

sum of

from

from

298.660
318.294
490.251
182.870

4.494
3.613
11.895
2.681

9.049
6.800
16.915
7.024

Wi thin

Average

Maximum

sum of

from

from

490.251
128.783

11.895
2.669

5.521

cluster distance distance

observations squares centroid "Centroid
11
20
3

20

4

4

1

4

4

4
3
6

Numer of clusters:

2

2
1

Numer of clusters:

4
2
4
4

4
2
2
2
1
1

4
1

4

4

6
4
5
3

2
4

2
4
2
2
5
1

4

3

1
4

4
4
3
2

6
6

4

6
1

4

3
6
2

1

2
1
2

4

6

Numer of

cluster distance distance

observations squares centroid centroid
4.008
2.884
90.154
10
Cluster1
2.428
1.
ti3
22.813
8
Cluster2
6."651
3.346
116. S18
8
Cluster3
5.977
2.513
78.508
10
Cluster4
16.915

¡.lusterS

Cluster6

vi IdcMl.c.a,\

3

15

342
12.17 (a), (b) Dendrograms for single linkage and complete linkage follow. The
dendrograms are similar; as examples, in both procedures, countries 11 and 46
form a group at a relatively high level of distance, and countries 2, 19,35,4,48
and 27 form a group at a relatively small distance. The clusters are more apparent

in the complete linkage dendrogram and, depending on the distance level, might
have as few as 3 or 4 clusters or as many as 6 or 7 clusters.

cOoø . ... , .. ,'~.. .,., , . ' , " . '..' ,

, . ,'~~~~~~~"~"1'\~~~'~'

, Countries

343
(c) The results for K = 4 and K = 6 clusters are displayed below. The results seem

reasonable and are consistent with the results for the linkage procedures.
Depending on use, K = 4 may be an adequate number of clusters. The results
for the men are similar to the results for the women.
Data Display

Country Cl us tMern=4
1
2
3

2

4
5
6

7

2

ClustMem=6

2

2

4
4

4
1
4

1

3

4

6
2
1

8

2

9

4

2

10
11

2
3

2
5

12
13

2

1

2

14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29

2

1
2
3

1
2

2

4
4
4

6
4

1

2

1
1

2
1

3

2
1
4

2

4
4

6

4
4

6
4
2

30
31
32

2
2

1

33

1

3

34
35
36
37
38
39
40
41
42
43
44
45
46
47
48

1
4

4

1

3

4

4

4

2
2
3

49
50
51
52
53
54

2

4
3

1

4

Wi thin

Average

Maximum

cluster distance distance

Numer of sum of
from
from
observations squares centroid centroid
Clusterl
10 169 .042
3.910
5.950
Cluster2
21
73 .281
1. 684
3.041
VCluster3
2
49 .174
4.959
4.959
Cluster4
21
56 .295
1. 481
3.249

Numer of clusters:

4
1
2
5
3

2

4
2

4

2

2
1
2

1
3

2

4

6

Average

Maximum

cluster distance distance

Numer of sum of
from
from
observations squares centroid centroid
Clusterl
12
26.806
1.418
2.413
Cluster2
15
18 _ 764
1.048
1. 844
Cluster3
10 169 _ 042
3.910
5.950
Cluster4
10
10.137
0.935
1.559
vkluster5
2
49.174
4.959
4.959
Cluster6
5
6 _ 451
1.092
1.£06

3

3
2

6

wi thin

1

4

"4
4

4

4

2
2

2
1
1

Numer of clusters:

/.U-t~\'CA

12.18.

344

St~s

.1

(.

'4 ì

io

.1
(.0''7)

i

h

~

.2

. Superior

North

r
.

The multidimensional scaling
configuration is consistent
with the locations of these cities
on a map.

St. Paul t MN

Marshfield

.

,

Wausau

· Appleton

Dubuque, IA

'.

.

Monroe

,.

· Ft. Atki nson

.'

.

Be 1 0; t

· -ch

i1

Mi lwaukee

ago , It

345

12.19.

The stress of final configuration for q=5 is 0.000. The sites in 5 dimensions and the
plot of the sites in two dimensions are
COORDINATES IN 5 DIMNSIONS

v~ABLE

--------

PLO

DIMENSION

--------1

P1980918

A

.51

B
C
D
E

-1. 32

P1SS0960
P1S30987
P1361024

Pl3S100S
P1340945

G

P193ll31

Pl3ll137

P1301062
DIMENSION

F
H

I

.47
.39
.23
.47
.58

-1.12
-.22

2

-.28

3

4

5

-.68
.12
-.05 -.02
.69
.30
.06
-.07
.34
.09
.10
.05
.30 -.32
.12
.14 -.22 -.14 -.28
-,35 .46 .18 -.10
-.31 .~S - .01
-1. 12
,61 -.70 -.06
.01
.24
.62
.19
.05

2I
+
+
I
I
1I
+
+
I
B
I
E
DF
I
oI
+
C
+
I
AG
I
I
I
-1
+
H
+
I
I
I
I
-2
+
+
-2 -1 0 1 2
2

-+------------ --+--------------+--------------+--------------+-

-+--------------+ --------------+ --------------+--------------+-

DIMNSION 1

The results show a definite time pattern (where time of site is frequently determined

by C-14 and tree ring (lumber in great houses) dating).

346
12.20~ A correspondence analysis of the mental health-socioeconomic .data

A correspondence analysis plot of the mental health-socioeconomic-data

Ex

;\12 = 0.026

It
C\

Ò

a Impaired

..It
ò

Ox

It
0

ò
U

...~..~~.~.~.~~t:........................................... L..............:...... ......._.Ç..~...

1.2-0.0014 8- I · MUd

~
9
It
..
9
Ax

It
C\

9

a Well

-a.07

-a.05

-a.03

-a.01

0.01

0.03

c2

u

v

-0.6922 0.1539 0.5588 0.4300
-0.1100 0.3665 -0.7007 0.6022

-0.6266 -0.2313 0.0843 -0.3341
-0.1521 -0.2516 -0.5109 -0.6407

0.0411 -0.8809 -0.0659 0.4670

o . 0265 0 . 5490 0 . 5869 -0. ~756

0.7121 0.2570 0.4388 0.4841

0.4097 0.4668 -0.5519 -0.2297
o . 6448 -0. ~032 0 . 2879 -~. 3062

lambda
0.1613 0.0371 0.0082 0.0000

Cumulative inertia
0.0260 0.0274 0.0275

Cumulative proportion

o . 9475 0.9976 1.0000

The lowest economic class is located between moderate and impaired. The next lowest

das is closetto impairro.

347

12.21. .A correspondence analysis of the income and job satisfaction data
A correspondence analysis plot of the income and jOb satisfaction data

~ $50,000 c ..II ò ..0 ò VS x II$25.000 . $50,000 c 0q q0 ......j..i.;;.ö:öööï..................................r......................................... u II0 9 so l( MS x c:$25.000 c

~

9

II
N

9 vp
-o.Q5

-0.025

-0.005

0.005

0.015

c2

u

V

-0.6272 -0.2392 0.7412

0.2956 0.8073 0.5107

-0.6503 -0.6661 -0.3561
-0.1944 0.5933 -0.7758

o . 7206 -0.5394 0.4356

-0.3400 0.3159 0.2253
0.6510 -0.3233 -0.4696

lambda
0.1069 0.0106 0.0000

Cumati ve inertia
0.0114 0.0116

Cumulative proportion
0.9902 1.0000

Very satisfied is closest to the highest income group, and v€ry dissatisfid is b€low the
lowest income group. Satisfaction appears to in'Cl'ease with income.

......... n .C' I -0.Uï...........6 -0.....537 C! Ironwood 59 D Sugai.1'..Maple x S10 D ¡àasswood x CD d co d SSe "# d S7e C\ d ........4 -0...Ö.......12 = 0..u 0d RedOak x '.... CD 9 S2~eS1 'I BlackOak x .3 0.......est data ¡ ì.. A correspondence analysis of the Wisconsin forest data A correspondence analysis plot of the Wisconsin for...Ö96 AmericànElm x....n .0 0.4 0.348 12...2 0....... .5 0.. WhiteOak x "# SSe 9 S4e BurOak....2 0.1 0... . .....22.6 c2 ..

2172 -0.3369 0.1685 0.0291 0.9950 1.4005 0.3943 0 .1607 0.9155 0.4907 V -0.2464 -0.0582 0.1478 0.3772 -0.0644 0.349 U -0.4071 -0.5090 -0.0394 0.3181 -0.4689 -0.0831 -0.3622 0.5382 -0.7506 0.0674 0.3835 -0.8219 0.4089 -0.3889 0.7700 Cumulati ve proportion 0.3856 -0.4080 0.3606 0.3366 0.3495 -0.4781 0.4079 -0.5895 0.0164 -0.4562 -0.4271 -0.4391 0. 5400 0 .1567 0.3217 0.5154 0.0625 -0.7662 0.5142 -0.4850 -0.0616 0.0518 o .0623 0.2897 -0.2685 0.1108 0.2428 -0.1167 0.4345 -0. 2687 -0.2134 0.0000 Cumulati ve inertia 0.1520 -0.3412 0.3877 -0. 2668 -0.1866 -0.0978 -0.1726 0.3634 -0.9891 0.2745 0.0738 -0.1999 0.326S 0.1596 -0.1355 0.3192 -0.2333 0.2343 -0.0763 -0.1598 -0.1808 -0.0000 .6026 -0.2476 0.6298 0.5994 0 .1821 0.0756 -0.5327 -0.1165 -0.5718 0.0698 0.0772 -0.0042 -0.1272 -0.1381 o .3394 0.2267 0.4771 0.0937 0.3511 -0.0540 -0. 20020 .3904 -0.7086 -0.6644 -0.0820 -0.2507 -0.1968 -0.7616 0. 1262 -0.3310 -0.6122 -0.0307 -0.0377 0.0826 -0.3393 -0.4247 -0.3294 -0.3232 -0.0345 -0.9747 0.4200 -0.0831 0.5817 -0.7050 0.3140 0.1955 0.0151 -0.4160 -0.2646 0 .6573 -0.1052 0.5587 -0.3006 0.0106 0.3150 0.1456 0.0544 -0.4138 -0.6329 0.2108 -0.2635 -0.3484 -0. 4626 0 .1950 -0.3258 0.4260 0. 2022 0 .6970 0.3101 0.1852 -0.0726 -0.4029 -0.3412 lambda 0. 3549 -0.3420 -0.4985 0 .7326 0.1590 -0.4856 -0.0925 -0.0006 -0.5367 0.

0025 -0.0000 1 As in the ~or~esondence analysis.0051 0.0091 0.0059 -0. Dev.0084 -0.3128 0. 0. of Vax.5 -0.350 12.23' We construct biplot of the pottery type-site data.7694 0.0390 -0.0025 -0.6932 0 .0390 -0.6769 0. 0. 5 Eigenvalues of S 0.9373 1.1374 -0.2604 0.0223 0.0187 -0.0061 -0.0952 0 Prop.0376 0.5 O.0187 0.6233 0. Eigenvectors of S S 0.0051 0.0511 -0.1951 -0.5 0.0059 0.5853 0.2385 -0.3464 0.0978 0.0627 0 Cumulati ve Prop. 1396 -0. .0061 0. 5000 -~.6769 0.0064 -0. with row proportions as variables.8325 -0. 0.0628 -0.1940 0.0000 pel pc2 pe3 pe4 St.

8320 0.9969 1.004021 -0.000284 0.000284 -0.000024 0.3628 0.003089 0.007314 0.003485 -0.05 0. A i"eflction about the 45 degi-ee line would make them appear more alike.24.05 0. c:i Moderate E oo 0 c:. A bipJot of the mental health-socioeconomic data -0.000000 pc1 pc2 pe3 pc4 St.351 12. 0. Dev. .000379 0.0614 0.10 -0. 000329 -0.15 0. 0.000853 -0.000480 0.000318 -0.003485 o .0 0.000809 -0.7033 0. 000809 0 .0 0. vVe construct biplot of the mental health-socioeconomic data.000318 0.5676 0.9355 0.5 Eigenvalues of S 0.9355 0.10 0co oo 0 c: c: oo 0 c: D C\ 0 C c: 0C\ Mild C\ ci E 0 () c: mpaired 0 0 Well c: c: A C\ 0 C\ 0 c:.1685 0.000853 0.7379 0. co 0 c:i -0.0794 -0.5 0.ô487 0.0031 0 Cumulati ve Prop.10 -0.0837 -0.2719 -0.000413 -0.4764 0.05 0.5 '~0.0219 0.000413 -0.0000 1 The biplot gives similar locations for health and socioeconomic status. of Vax.5 0.0049 0 Prop.05 0.15 -0. 0. with column proportions as variables.0855 0.10 Compo 1 S Eigenvectors of S 0.2270 0.

0 i i 0.352 12.0 1.- P1931131 P1301062 0 ia _ c0 iic ai E Pl361024 P~7 C! _ 0 PI55096 Õ pP198il18 1340 5 "C c0 u ai en ~C! . A Procrustes analysis of archaeological data A two-dimensional representation of archaeological sites produced by metric multidimensional scaling C! _ .- P130106 P1931131 o ia _ oc iic Q) P1351005 P1361024 o c) P1530987 P155090 E Õ "C C ou ai Il Pl340945 c)I - en P19B018 o P1311137 -7 - ~I I -1.5 2.0 i I i i i -0.0 .5 1.0 0.0 First Dimension A two-dimensional representation of archaeological sites produced by nonmetric multidimensional scaling C! _ .0 -0.25..5 1.5 2.5 0.-f P1311137 ia .0 First Dimension -T 1.5 0. -1.

096 0. 0679 -0 . 9977 -0. 262 -0.9969 4.469 0.071 -0.803.470 -0. 137 0 .387 0.409 1.715 To better align the metric and nonmetric solutions.122 1.703 -0.276 -0.889 -0. we multiply the nonmetrk scaling solution by the orthogonal matrix Q.318 0. 379 -0.963 -0.0784 0. 989 0.512 -0.692 1.756. is reduced to the Procrustes measure of fit P R2 = 0.9893 -0.048 -0.156 -0.9969 0. .829 1.1459 0 . 0679 O. 0.137 -0. 118 -1.000 0. After rotation.387 -0.278 Nonmetric MDS -0.642 0.338 -0.608 0.0000 2. 9977 Q Lambda u -0.545 -0.9893 v -0 .216 0. 1459 0.7819 0.469 0.the sum of squared distanc.e.088 -0.5 degrees.581 -0. .296 -0.234 0.353 Site P1980918 P1931131 P1550960 P1530987 P1361024 P1351005 P1340945 P1311137 P1301062 Metric MDS -0.349 -0. This corresponds to clockwise rotation of the nonmetric solution by 4.0784 -0.

. c.....'..:.......354 12. .. .37 428.c...-'. as 7 or 8 clusters." " -."...':....:. The dendrograms are similar but a moderate number of distinct clusters is more apparent in the Ward's method dengrogram than the average linkage dendrogram....:....... ............:. .. Malilf=aI1i1Yi....' ....:........Fatms 79.'...... . ..OO"f~W~"~~~~~'\~ Fars .-.:- . ':'. .. .._':. _ _ ... Reading the "right" number of clusters from ....:. . Both dendrograms suggest there may be as few as 4 clusters (indicated by the checkmarks in the figures) or perhaps as many either dendrogram would depend on the use and require some subject matter knowledge..: .. __ .' 643....::..:.:.... .. .26 The dendrograms for clustering Mali Family Fars are given below for average linkage and Ward's method.'.'.....:. .'..... ... Euclidean Distance.".."...91 Fars ...." _:' .....kage.Waíf ..... .. .: "p"-. .. Average Linkage. .. -:.. .... ..-..M.... EUdit:eanOisteince..-'" ......' '".'..-. '...:' -..".... .......43 O..

Mali:fiamil.":-::.Eu(¡ndean.'.'._.f~rm$h('5tandat4jlCc-: 8..i..--...--._:-.'"'..'..-.. The distinct clusters we focus are more clearly delineated in the Ward's method dendrogram and if attention on the 4 marked clusters.:-::-_..:-..".355 12.'.' ".:-:::.-:.68 'e ..-.lIsll.'. _.-i'..'--..-.figtjifi)J~lidean Di$t..'. we see the two procedures produce quite different results.51 ..-_.03 uCD i: 5.27 If average linkage and Ward's method clustering is used with the standardized Mali Family Farm observations..".-...'~oall~~ ."... .'..-'.-...':-.'.--.._::--X_. The dendrograms follow.--.'.'--.-.'.-...-.'..68 ."..d.'::-'.--. .." .---:.36' Û Q 2..__....--. the results are somewhat different from those using the original observations and different from one another.-':.-.----_._.'..-'-.'.jnk~lIe..._...~~'\~-:.:._-..:..8 M ¡ .-:---.--...- ~ërilge t.-.-_--:.-'-.'.-.'.. I 29. There could be as few as 4 clusters (indicated by the checkmarks in the figures) or there could be as many as 8 or 9 clusters or more._.'-'.-~1~¥?TFars W~td..Llnk. '-...'.--..--__:. MållFamilyFafm!i 44...

418 .474 24.è S ö~ K 21.uster5 Cluster6 observations 4 11 35 12 8 2 cluster distance distance from from sum of squares centroid centroid 3 2 3 2 3 3 3 2 2 3 3 3 2 2 3 3 1 2431.647 Numer of clusters: 6 Numer of Cluster1 v-luster2 i.619 Wi thin Average Maximum 696.511 8.26.030 22.024 19.878 9.luster3 vCuster4 L./ rd~lGL1\ for t-wo CttOl(.609 4440. there is no change in the other clusters. Note as the number of clusters increases from 5 to 6. 053 16.28 The results for K = 5 and K = 6 clusters follow.053 16. The results seem reasonable and are similar to the results for Ward's method considered in Exercise 12.330 3298.125 13. 1 and 6.076 24. Although not shown.072 15.539 1129.083 1943. K = 4 is a reasonable solution as welL.156 3 3 4 1 1 8 Maximum 3 3 4 4 2 veuster4 \/luster5 6 11 35 12 Average 5 4 4 4 4 Clusterl vèluster2 veluster3 2 3 5 5 Numer of observa tions Wi thin cluster distance distance from from sum of squares centroid centroid 5 3 3 Numer of clusters: 5 4 4 4 2 1 1 1 s- .619 22.094 4440.005 19.878 9.072 15.156 1005.647 21.539 1129.083 1943.330 3298. in the K = 6 solution.356 12.024 19.418 15.498 19.ata Display Farm ClustMem=5 ClustMem=6 1 1 2 2 1 2 3 3 3 3 3 4 ~ 5 5 1 7 4 8 9 4 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 cE 2 4 3 3 3 3 :: 5 4 4 2 4 3 3 3 3 3 2 3 4 4 3 3 5 5 3 3 3 3 5 2 3 3 5 5 3 3 3 3 3 2 3 3 3 3 3 3 3 3 4 4 5 3 4 4 5 3 3 3 3 4 5 4 3 3 3 3 2 3 2 3 3 3 2 2 3 3 3 2 2 3 5 3 5 ! 18. D.511 8. cluster 1 in the K = 5 solution is paritioned into two clusters.030 33.

703 4.357 follow.071 1. Data Display Farm SdC1usMem~5 SdC1usMe=6 1 2 1 5 1 3 3 3 3 5 6 5 5 1 7 3 8 9 3 1 3 3 4 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 5 3 Numer of clusters: 5 3 3 3 3 3 5 4 4 4 4 3 4 3 3 2 3 3 2 4 4 4 4 4 3 3 3 6 3 3 3 4 2 3 4 4 3 3 3 3 3 3 3 vfuster1 4 3 3 6 3 3 3 42 43 44 45 46 4 4 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 3 3 3 3 3 '68 '69 70 71 72 3 2 5 4 4 4 4 5 3 3 7 Numer of clusters: 2 3 3 4 35 20 vcuster5 4 3 2 3 Average Maximum 14. These results using standardized observations are somewhat different from the corresponding results using the original data.970 1.954 2.568 3.482 1. 228 65. there is no change in the other clusters.960 1.727 55.050 56. Note as the number of clusters increases from 5 to 6.318 84. 183 1.259 1. 568 2. It makes a difference whether standardized or un-standardized observations are used.727 1.S o( K 1.29 The results for K = 5 and K = 6 clusters are similar to the results for Ward's method considered in Exercise 12.288 2. 951 3.050 56.259 3. 806 2. clusters 3 and 4 in the K = 5 solution lose 1 and 2 farms respectively to form cluster 6 in the K = 6 solution.uc c. The results seem reasonable and 12.ÛDtc.099 63. us ter2 6 cluster distance distance from from sum of Numer 0 f observations squares centroid centroid 5 5 Cluster3 Cluster4 34 18 Cluster6 7 3 ~uster5 51.1 us ter2 3 5 6 3 l/lusterl i.195 3. 993 3.071 7.501 63.482 Within Average Maximum 14.970 1.172 3. 954 .288 3 3 4 3 4 5 3 3 39 40 41 5 C1uster4 4 3 5 Cluster3 3 4 3 within cluster distance distance from from sum of Numer 0 f observations squares centroid centroid L.703 4.604 4 5 3 3 2 3 4 3 2 5 4 4 4 4 5 3 3 4 4 4 3 4 3 3 4 2 4 1 1 1 1 1 1 5 5 3 2 / IdetthcOv\ for f£.27. 211 1.

Cumulative Gains Chart 100 Ul CD Ul c: &.000. The y-axis shows the per-centage of positive responses.000 total customers. the y-values at x = 10% shown in the char are 10% for baseline (no model) and 30% for the gain (lift) provided by the modeL. Consequently.30 The cumulative lift (gains) chart is shown below. This is the percentage of the total possible positive responses (20.000 or 30% of the positive responses if we contact the top to.358 12. With no model. we expect to get 80% of the positive responses. or 2. Ul 90 BO 70 60 -+ Lift Curve CD -. 20 ~ 10 ~ o o 10 20 30 40 SO 60 70 80 90 100 % Customers Contacted .1 x20.Baseline ix 50 CD E~Ul 40 30 o 0. which is a fraction of the 100. The x-axis shows the percentage of customers contacted.000). Continuing this argument for other choices of x (% customers contacted) and cumulating the results produces the lift (gains) chart shown. if we -contact 10% of the customers we would expect 10%. of the positive responses.OOO customers. We see. if we contact the top 40% of the customers determined by the model. for example.000 = . Our response model predicts 6.

The four estimated centers are: .0053. We further restrict the covariance matrices to satisfy ~k = 1JkD.0.8099 0.5343 4. (b) The model chosen above has 4 multivariate normal components.5485 0.1806 5.4986.Pi = 5.5369 íÆ2 .359 12.5910 0. P3 = 3.4988 0.0295) and the estimated scale factors are 17i = 0.0.7044 0.P2 = 8. P3 = 0.2465 . is given in the next figure.3395 5. The matrix scatter plot of the true classification. This is clear from comparing the two scatter plots for (Xb X2).9742 5.18.2871 0. -P4 = 8. 172 = 0.. Theestimatedproportionsarepi = 0.1073.. The K = 3 groups selected by this function have estimated centers .2762 0.6993 . These four components are shown in the matrix scatter plot where the observations have been classified into one of the four populations. P2 = 0.9780 .2633.1408.2467 0.Pi = 3.0319. This minimum BIC model has BIG = -547.2454 4.31 (a) The Mclust function.3732. 174 = 0. which selects the best overall model according to the BIC criterion.5158 0.3355.3228 6.2598. The estimated D = diag(l1.2453 6.1794 5. We also repeat the analysis using the me function to select mixture distribution with K = 3 components.3188 6.1418 11. 173 = 0.7093 0. selects a mixture with four multivariate normal components.1322.7647. .1862 5.3511 0. we see how the oil samples from the Upper sandstone are essentially split into two groups. 2. ¡¿3 = .0909. Comparing the matrix scatter plot of the four group classification with the matrix scatter plot of the true classification.4846 and the estimated covariance matrices turn out to be restricted to be of the form 1JkD where D is a diagonal matrix..1418 11.1730 0.2431 3.6893 4.2445 7.2834 .P4 = 0.3290 0..3526 0.1059. = 7.

0 .0 0 a 8 0 A A i.6 0. +*:J o o oir 0 'D + !X Aa + c 0 0 + a o a J U D 8 a0 e :l co l' 'l: 11+ + e H~ +.i' CD a.." .l~CD a a ii'i' a g + aq. ~ 0 .. .& D ë ages rì ~ ofr's'co'' a 0. + + '" t. "O. o :t +0 +L¡i + ..10 0.. o a e o dc ê 0 . lA ~ A A.t " a A a a " 0 N D 8.. f t (. Qj.+ a 'l + 00 10 .. Ò 0 " n Ò i1c+ ++ l0.§ . +(¡ D0Da a Ai c.... a a l_ A I CD 8! 0 d"ii ~+ 0-i " o 00 +l DO .....s x4 0 0 ~ A~ S ++ i! - ò A + A 0 A. 0 0 a D a .). D a C°itOD ++ ai c~OD D ODD ~~A 68 +-0 D D Do0 0Q x1 + t +t+ aa +:t .3 ...: aD 0 ec .Îa+ + a + 0 lf+ ... D o a 8a~c 0" + o+¡'b. N 0 00 a ai"t'+ a00 '" 0++~ ++ aa DC IJ D 0 tm D-+O+ D+ + Ò °0 0 " a 00 . a x3 + + (.4 0. c ..+'" D C A A ... ~d' 0. 80 ¡ 00 0 a ""A o~ 00 c D D~a-b~~ * 0 Oil +. D ri ¥" 0 0 c a a + + a + +-i\+ + D diD (.2 0. c.l D ++ AA"lDD B + B ++ + d' a a D a " I) Dl "..a o + D iIa'l CaD... .pa+ooa o CT...~ 8 a ~""q¡ a a 0 .360 0.0 + ++0. .:e t+ * - a + + B c ~c. + æ D DaD 0 a cO + oii llo'¥ +" ¡ + " D i A 00 ft" /:i (. + .4 o _+ + a " CD 00"* ++ ++ + c aA A 0 +0+ G1) D 0 A -i 'I . G~ao :.15 0.l.0 02 0.0 To +0 gtP ~ . ++ (t + 0 0 o D .8 . ~ e DO D DO D "' il + . D QCoO D-i ..0 " 4i 0lPo AAA 0 ii~ o A Aa 'l e o + *t * 0 if c bO OJ' fl+ + 00 . ~ . ò o ll ~:i++ o + ~cOD~+ CD c' . x2 o _ ~+. DC o -i+ + 0 D a A AA aaaao +! t a ~ 0 rm a D 0 o ò + .+¡ t a a C DOl! &D Â "". -T T 10 12 N .0 12 Figure 1: Classification into four groups using Mclust . x5 . 0.. .

a CD 0 c °coog 0 o 1& ODD 0 " å 0 . c c 0 0 c -i cOc~°Qi°o c of~aoOl" 00 c 0 B . . ..l. . .. ¡Pll 0 Øl0là 0 0'0 cP 0 .. 0 8. "! " 0 D.0 0 0 0 0 0 00 DC 0 ~D. . 0. o r 0 o 'I 0 ~o .a DC ..0 ~ o 0lf 0 .1 00 0 0 O..0 0 C 0 0 0 o.0 00 0 . i..2 0. ~ N 9.361 i. !. C "'0 O. lIO II ~ x4 Bg E c 0 B 8 C ø g q¡ iao 0 E 0 0 OdjOoo 0 C . 0 0 0 D ÚD§DOcjJ o...o 00 08 o~ . ~. flo 6 .¡ 'F . o Il 1' n " u " 00 DD D 0 0 i: fk coooe DO D o D . 0 0 0 0 .0 :"0 0 ..:t C 00 DJoOCOO 0 o .8 § 0 ~ . c ni . i"o. o 0 . 0 . 0.0 o"C 'l ° B "''i8'ci 0 ..0 li 0 DO DO D .' 0 .. 0 Dc 0 0 0 i 0 ~OO 0 0 . e ~ o 0 ~ ~gåo 0 c 0 .2 classification into sandstone strata II Cl Upper 0 Wilhelm ~ T i 10 12 N .0 0 D..¡¡eo c °OCOo~i¡ ! .DC CJ DO i. m '" .t ior.i - oS o. ceo 00 c . 0 c n B o 0 iiJ . ~ ~ 0 0 000 8'0 0 · ¡¡o .0 ~ o.4 0. ~ å 00°0 .¡¡Ec 0.~OCG x1 AD (OJO. 6 !ì ~c 0 ..l SubMuli II B CC 0 0 0 e "- .6 0. Do 000 . I/ O. l 0°il°1l08 Il =0 .. n . 0 0 0 x5 . 0 . 0 ~ i0 °00°1a& 110 II li x2 'i'" 0 0 . 'lQ) 0 0 c ~ 0 8°~o .. 0 0"0 . 0 _'I 0 8 " ° .DC 0 .25 0. .. t. I 0 -t00 .. B .0 0. 0 0 go o N d An . . ufO . ..4: C . o 0 c '" o 'l ~. 0 0 Oll iP 0 )0 'ì 0 .00.. DO 00 dt .15 020 0.8 1.0 1.3 6 7 4 i I Do . 8 ~ ° .. 0 "" 0 00 o DO 000 0 0 oO"lu B 0 !l 00 x3 00 0 i: D 0 B c c 0 0 '.0 0 0 10 Figure 2: True 0.0c" llo 0 m 0 0 . c II å 00 0 4 . HCe ~O.&0 0000 o ci. "'. :U~ .. . o 0 a II '" 0 0. .0 . .10 0.. 0 lD 0 . oB f 0 o o. 00 o 0i .0 0 00 om ¡.. ii0 ~ . 00 l.0 · rJ ~ å 0 å c rlo S . CC .f Jl'òrt'ó 0... .

0. .1535.6295.0949.1052. 2.24. the following samples are misclassified: 7 19 22 25 26 27 28 29 30 31 32 33 34 35 39 44 45 46 49 and the misclassification error rate is 33. If we use this method to classify the oil samples. The estimated proportions are Pi = 0.93%. rì2 = 0.0. P2 = 0. rì3 = 0.1315.3296.0314. with resulting BIG = -534. P3 = 0.0955) with estimated scale parameters rii = 0.2969.3702.5651.0052.362 the estimated diagonal matrix D = diag(1O.