Applied Multivariate Statistical Analysis, 6th Edition: Solutions to Exercises

Preface

This solution manual was prepared as an aid for instructors who will benefit by having solutions available. In addition to providing detailed answers to most of the problems in the book, this manual can help the instructor determine which of the problems are most appropriate for the class.
The vast majority of the problems have been solved with the help of available computer software (SAS, S-Plus, Minitab). A few of the problems have been solved with hand calculators. The reader should keep in mind that round-off errors can occur, particularly in those problems involving long chains of arithmetic calculations.

We would like to take this opportunity to acknowledge the contribution of many
students, whose homework formed the basis for many of the solutions. In particular, we
would like to thank Jorge Achcar, Sebastiao Amorim, W. K. Cheang, S. S. Cho, S. G.
Chow, Charles Fleming, Stu Janis, Richard Jones, Tim Kramer, Dennis Murphy, Rich
Raubertas, David Steinberg, T. J. Tien, Steve Verril, Paul Whitney and Mike Wincek.
Dianne Hall compiled most of the material needed to make this current solutions manual
consistent with the sixth edition of the book.
The solutions are numbered in the same manner as the exercises in the book.
Thus, for example, 9.6 refers to the 6th exercise of chapter 9.
We hope this manual is a useful aid for adopters of our Applied Multivariate
Statistical Analysis, 6th edition, text. The authors have taken a little more active role in
the preparation of the current solutions manual. However, it is inevitable that an error or
two has slipped through, so please bring remaining errors to our attention. Also,
comments and suggestions are always welcome.
Richard A. Johnson
Dean W. Wichern

Chapter 1
1.1

$\bar{x}_1 = 4.29$, $\bar{x}_2 = 15.29$, $s_{11} = 4.20$, $s_{22} = 3.56$, $s_{12} = 3.70$

1.2 a)
[Figure: Scatter Plot and Marginal Dot Plots, $x_2$ versus $x_1$]

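b) $s_{12}$ is negative.

c) $\bar{x}_1 = 5.20$, $\bar{x}_2 = 12.48$, $s_{11} = 3.09$, $s_{22} = 5.27$, $s_{12} = -15.94$, $r_{12} = -.98$
Large $x_1$ occurs with small $x_2$ and vice versa.

d)
\[
\bar{\mathbf{x}} = \begin{bmatrix} 5.20 \\ 12.48 \end{bmatrix}, \qquad
S_n = \begin{bmatrix} 3.09 & -15.94 \\ -15.94 & 5.27 \end{bmatrix}, \qquad
R = \begin{bmatrix} 1 & -.98 \\ -.98 & 1 \end{bmatrix}
\]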

1.3

$\bar{\mathbf{x}}$ and $S_n$: [entries illegible in source]

\[
R = \begin{bmatrix} 1 & .577 & -.40 \\ & 1 & .300 \\ \text{(symmetric)} & & 1 \end{bmatrix}
\]

1.4 a) There is a positive correlation between $x_1$ and $x_2$. Since the sample size is small, it is hard to be definitive about the nature of the marginal distributions. However, the marginal distribution of $x_1$ appears to be skewed to the right. The marginal distribution of $x_2$ seems reasonably symmetric.
[Figure: Scatter Plot and Marginal Dot Plots, $x_2$ versus $x_1$]

b) $\bar{x}_1 = 155.60$, $\bar{x}_2 = 14.70$, $s_{11} = 82.03$, $s_{22} = 4.85$, $s_{12} = 273.26$, $r_{12} = .69$
Large profits ($x_2$) tend to be associated with large sales ($x_1$); small profits with small sales.

1.5 a) There is negative correlation between $x_2$ and $x_3$ and negative correlation between $x_1$ and $x_3$. The marginal distribution of $x_1$ appears to be skewed to the right. The marginal distribution of $x_2$ seems reasonably symmetric. The marginal distribution of $x_3$ also appears to be skewed to the right.

[Figure: Scatter Plot and Marginal Dot Plots, $x_3$ versus $x_2$]

[Figure: Scatter Plot and Marginal Dot Plots, $x_3$ versus $x_1$]

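1.5 b)
\[
\bar{\mathbf{x}} = \begin{bmatrix} 155.60 \\ 14.70 \\ 710.91 \end{bmatrix}, \qquad
S_n = \begin{bmatrix} 82.03 & 273.26 & -32018.36 \\ 273.26 & 4.85 & -948.45 \\ -32018.36 & -948.45 & 461.90 \end{bmatrix}, \qquad
R = \begin{bmatrix} 1 & .69 & -.85 \\ .69 & 1 & -.42 \\ -.85 & -.42 & 1 \end{bmatrix}
\]

1.6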

a) Histograms
[Figures: histograms for each variable, $x_1$ through $x_7$, tabulating the middle of each interval against the number of observations]


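b)
\[
\bar{\mathbf{x}} = \begin{bmatrix} 7.5 \\ 73.857 \\ 4.548 \\ 2.191 \\ 10.048 \\ 9.405 \\ 3.095 \end{bmatrix}, \qquad
S_n = \begin{bmatrix}
2.440 & -2.714 & -.369 & -.452 & -.571 & -2.179 & .167 \\
 & 293.360 & 3.816 & -1.354 & 6.602 & 30.058 & .609 \\
 & & 1.486 & .658 & 2.260 & 2.755 & .138 \\
 & & & 1.154 & 1.062 & -.791 & .172 \\
 & & & & 11.093 & 3.052 & 1.019 \\
 & & & & & 30.241 & .580 \\
\text{(symmetric)} & & & & & & .467
\end{bmatrix}
\]

The pair $x_3$, $x_4$ exhibits a small to moderate positive correlation and so does the pair $x_3$, $x_5$. Most of the entries are small.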

1.7
a) [Figure: Scatter plot (variable space), $x_2$ versus $x_1$]

b) [Figure: Plot of the data in item space]


1.8 Using (1-12), $d(P,Q) = \sqrt{(-1-1)^2 + (-1-0)^2} = \sqrt{5} = 2.236$

Using (1-20), $d(P,Q) = \sqrt{\tfrac{1}{3}(-1-1)^2 + 2\left(\tfrac{1}{9}\right)(-1-1)(-1-0) + \tfrac{4}{27}(-1-0)^2} = \sqrt{\tfrac{52}{27}} = 1.388$

Using (1-20), the locus of points a constant squared distance 1 from $Q = (1,0)$ is given by the expression $\tfrac{1}{3}(x_1-1)^2 + \tfrac{2}{9}(x_1-1)x_2 + \tfrac{4}{27}x_2^2 = 1$. To sketch the locus of points defined by this equation, we first obtain the coordinates of some points satisfying the equation:
(-1,1.5), (0,-1.5), (0,3), (1,-2.6), (1,2.6), (2,-3), (2,1.5), (3,-1.5)
The resulting ellipse is:

[Figure: ellipse through the points listed above, plotted in the $(x_1, x_2)$ plane]
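The arithmetic in 1.8 can be checked with a short script. This is a minimal sketch, not part of the original solution: it assumes $P = (-1,-1)$, $Q = (1,0)$ and the coefficients $a_{11} = 1/3$, $a_{12} = 1/9$, $a_{22} = 4/27$ as read from the expressions above, and the helper names `euclid` and `stat_dist_sq` are made up for illustration.

```python
import math

def euclid(p, q):
    """Straight-line (Euclidean) distance, as in (1-12)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def stat_dist_sq(p, q, a11, a12, a22):
    """Squared statistical distance a11*d1^2 + 2*a12*d1*d2 + a22*d2^2, as in (1-20)."""
    d1, d2 = p[0] - q[0], p[1] - q[1]
    return a11 * d1 ** 2 + 2 * a12 * d1 * d2 + a22 * d2 ** 2

# Assumed inputs, read off the worked solution above.
P, Q = (-1.0, -1.0), (1.0, 0.0)
A11, A12, A22 = 1 / 3, 1 / 9, 4 / 27

d_euclid = euclid(P, Q)
d_stat = math.sqrt(stat_dist_sq(P, Q, A11, A12, A22))

# Points quoted in the solution as lying on the locus of squared distance 1 from Q.
locus_points = [(-1, 1.5), (0, -1.5), (0, 3), (1, -2.6),
                (1, 2.6), (2, -3), (2, 1.5), (3, -1.5)]
locus_values = [stat_dist_sq(pt, Q, A11, A12, A22) for pt in locus_points]
```

Each entry of `locus_values` should be close to 1; the points with $x_2 = \pm 2.6$ are rounded, so they match only approximately.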

1.9

a) $s_{11} = 20.48$, $s_{12} = 9.09$, $s_{22} = 6.19$

[Figure: scatter plot of $x_2$ versus $x_1$]


1.10 a) This equation is of the form (1-19) with $a_{11} = 1$, $a_{12} = \tfrac{1}{2}$, and $a_{22} = 4$. Therefore this is a distance for correlated variables if it is non-negative for all values of $x_1$, $x_2$. But this follows easily if we write
\[
x_1^2 + 4x_2^2 + x_1x_2 = \left(x_1 + \tfrac{1}{2}x_2\right)^2 + \tfrac{15}{4}x_2^2 \ge 0.
\]

b) In order for this expression to be a distance it has to be non-negative for all values $x_1$, $x_2$. Since, for $(x_1, x_2) = (0,1)$ we have $x_1^2 - 2x_2^2 = -2$, we conclude that this is not a valid distance function.

1.11
\[
d(P,Q) = \sqrt{4(x_1-y_1)^2 + 2(-1)(x_1-y_1)(x_2-y_2) + (x_2-y_2)^2}
= \sqrt{4(y_1-x_1)^2 + 2(-1)(y_1-x_1)(y_2-x_2) + (y_2-x_2)^2} = d(Q,P)
\]
Next,
\[
4(x_1-y_1)^2 - 2(x_1-y_1)(x_2-y_2) + (x_2-y_2)^2 = (x_1 - y_1 - x_2 + y_2)^2 + 3(x_1-y_1)^2 \ge 0
\]
so $d(P,Q) \ge 0$. The second term is zero in this last expression only if $x_1 = y_1$ and then the first is zero only if $x_2 = y_2$.
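The argument in 1.10 amounts to asking whether the matrix of each quadratic form is positive definite. A minimal numerical sketch follows, assuming the forms $x_1^2 + 4x_2^2 + x_1x_2$ for part a) and $x_1^2 - 2x_2^2$ for part b); the function name `is_positive_definite` is illustrative, and the eigenvalue check is a swapped-in technique rather than the completing-the-square argument used in the solution.

```python
import numpy as np

# Symmetric matrices of the two quadratic forms (assumed from the solution text):
# a) x1^2 + 4*x2^2 + x1*x2  ->  a11 = 1, a12 = 1/2, a22 = 4
# b) x1^2 - 2*x2^2          ->  a11 = 1, a12 = 0,   a22 = -2
A_a = np.array([[1.0, 0.5], [0.5, 4.0]])
A_b = np.array([[1.0, 0.0], [0.0, -2.0]])

def is_positive_definite(m):
    """A symmetric matrix is positive definite iff all its eigenvalues are > 0."""
    return bool(np.all(np.linalg.eigvalsh(m) > 0))

pd_a = is_positive_definite(A_a)   # form a) defines a valid distance
pd_b = is_positive_definite(A_b)   # form b) does not

# The counterexample used in part b): (x1, x2) = (0, 1) gives a negative value.
x = np.array([0.0, 1.0])
value_b = float(x @ A_b @ x)
```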

1.12 a) If $P = (-3, 4)$ then $d(O,P) = \max(|-3|, |4|) = 4$.

b) The locus of points whose squared distance from $(0,0)$ is 1 is

[Figure: square with vertices at $(\pm 1, \pm 1)$ in the $(x_1, x_2)$ plane]

c) The generalization to $p$ dimensions is given by $d(O,P) = \max(|x_1|, |x_2|, \ldots, |x_p|)$.

1.13 Place the facility at C-3.

1.14 a) [Figure: scatter plot of $x_2$ versus $x_4$]
Strong positive correlation. No obvious "unusual" observations.

b) Multiple-sclerosis group.
$\bar{\mathbf{x}}$, $S_n$, and $R$ (symmetric): [entries illegible in source]

Non multiple-sclerosis group.
$\bar{\mathbf{x}}$, $S_n$, and $R$ (symmetric): [entries illegible in source]

1.15 a) [Figure: scatterplot of $x_2$ (activity) and $x_3$]

b) $\bar{\mathbf{x}}$, $S_n$, and $R$ (symmetric): [entries illegible in source]

The largest correlation is between appetite and amount of food eaten. Both activity and appetite have moderate positive correlations with symptoms. Also, appetite and activity have a moderate positive correlation.

1.16 There are significant positive correlations among all variables. The lowest correlation is 0.4420 between Dominant humerus and Ulna, and the highest correlation is 0.89365 between Dominant humerus and Humerus.

$\bar{\mathbf{x}}$, $S_n$, and $R$: [entries illegible in source]

1.17 There are large positive correlations among all variables. Particularly large correlations occur between running events that are "similar", for example, the 100m and 200m dashes, and the 1500m and 3000m runs.

$\bar{\mathbf{x}}$, $S_n$, and $R$: [entries illegible in source]

1.18 There are positive correlations among all variables. Notice the correlations decrease as the distances between pairs of running events increase (see the first column of the correlation matrix $R$). The correlation matrix for running events measured in meters per second is very similar to the correlation matrix for the running event times given in Exercise 1.17.

$\bar{\mathbf{x}}$, $S_n$, and $R$: [entries illegible in source]

1.19 (a) [Figure: scatterplot matrix of the variables D_RADIUS, RADIUS, D_HUMERUS, HUMERUS, D_ULNA, ULNA]

1.19 (b) [Figure]

1.20
[Figures (a) and (b): plots in the variables $x_1$, $x_2$, $x_3$]

(a) The plot looks like a cigar shape, but bent. Some observations in the lower left hand part could be outliers. From the highlighted plot in (b) (actually non-bankrupt group not highlighted), there is one outlier in the nonbankrupt group, which is apparently located in the bankrupt group, besides the strung out pattern to the right.

(b) The dotted line in the plot would be an orientation for the classification.

1.21
[Figures (a) and (b): plots in the variables $x_1$, $x_2$, $x_3$ with outliers marked]

(a) There are two outliers in the upper right and lower right corners of the plot.

(b) Only the points in the gasoline group are highlighted. The observation in the upper right is the outlier. As indicated in the plot, there is an orientation to classify into two groups.

1.22 Possible outliers are indicated.

[Figures: plots in the variables $x_1$, $x_2$, $x_3$ with outliers marked]


1.24
[Figure: Chernoff faces for the utilities, grouped into Cluster 1 through Cluster 7]

We have clustered these faces in the same manner as those in Example 1.12. Note, however, other groupings are equally plausible; for instance, utilities 9 and 18 might be switched from Cluster 2 to Cluster 3 and so forth.

1.25 We illustrate one cluster of "stars." The remaining stars (not shown) can be grouped in 3 or 4 additional clusters.

[Figure: stars for utilities 4, 10, 13, and 20]

1.26 Bull data

(a) XBAR

Breed      4.3816
SalePr     1742.4342
YrHgt      50.5224
FtFrBody   995.9474
PrctFFB    70.8816
Frame      6.3158
BkFat      0.1967
SaleHt     54.1263
SaleWt     1555.2895

$S_n$ and $R$ (variables in the same order): [entries illegible in source]

[Figure: scatter plots involving Breed, Frame, BkFat, FtFrBody, and SaleHt]

1.27 (a) Correlation $r = .173$

[Figure: Scatterplot of Size vs Visitors]

(b) Great Smoky is an unusual park. The correlation with this park removed is $r = .391$. This single point has a reasonably large effect on the correlation, reducing the positive correlation by more than half when added to the national park data set.

(c) The correlation coefficient is a dimensionless measure of association. The correlation in (b) would not change if size were measured in square miles instead of acres.
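The point made in 1.27 (c) can be illustrated numerically. The sketch below uses made-up numbers, not the national park data, and only shows that rescaling one variable by a positive constant (here 640 acres per square mile) leaves the correlation unchanged.

```python
import numpy as np

# Hypothetical values for illustration only; these are NOT the national park data.
size_acres = np.array([47.4, 135.2, 520.3, 1217.4, 2219.8, 761.3])
visitors = np.array([2.05, 1.02, 9.19, 4.94, 2.84, 3.14])

ACRES_PER_SQ_MILE = 640.0
size_sq_miles = size_acres / ACRES_PER_SQ_MILE

# Pearson correlation in each unit system.
r_acres = np.corrcoef(size_acres, visitors)[0, 1]
r_sq_miles = np.corrcoef(size_sq_miles, visitors)[0, 1]
```

The two coefficients agree up to floating-point error because both the covariance and the standard deviation of size scale by the same factor.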

Chapter 2

2.1 a) [Figure: the vectors $\mathbf{x}$ and $\mathbf{y}$]

b) i) $L_x = \sqrt{\mathbf{x}'\mathbf{x}} = \sqrt{35} = 5.916$

ii) $\cos(\theta) = \dfrac{\mathbf{x}'\mathbf{y}}{L_x L_y} = \dfrac{1}{19.621} = .051$, so $\theta = \arccos(.051) = 87^\circ$

iii) The projection of $\mathbf{y}$ on $\mathbf{x}$ is $\dfrac{\mathbf{x}'\mathbf{y}}{\mathbf{x}'\mathbf{x}}\,\mathbf{x} = \dfrac{1}{35}\,\mathbf{x} = \left[\tfrac{1}{7}, \tfrac{1}{35}, \tfrac{3}{35}\right]'$

c) [Figure]
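The quantities in 2.1 b) are easy to reproduce. This sketch assumes $\mathbf{x}' = [5, 1, 3]$ and $\mathbf{y}' = [-1, 3, 1]$; the vectors themselves are not legible in the solution, but this choice reproduces $\mathbf{x}'\mathbf{x} = 35$, $L_xL_y = 19.621$ and the projection quoted above.

```python
import numpy as np

# Assumed vectors (consistent with x'x = 35, x'y = 1, and Lx*Ly = 19.621).
x = np.array([5.0, 1.0, 3.0])
y = np.array([-1.0, 3.0, 1.0])

L_x = np.linalg.norm(x)                      # length of x
L_y = np.linalg.norm(y)                      # length of y
cos_theta = (x @ y) / (L_x * L_y)            # cosine of the angle between x and y
theta_deg = np.degrees(np.arccos(cos_theta))

# Projection of y on x: (x'y / x'x) x
proj_y_on_x = (x @ y) / (x @ x) * x
```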

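2.2 a) $5A = \begin{bmatrix} -5 & 15 \\ 20 & 10 \end{bmatrix}$

b) $BA = \begin{bmatrix} -16 & 6 \\ -9 & -1 \\ 2 & -6 \end{bmatrix}$

c) $A'B' = \begin{bmatrix} -16 & -9 & 2 \\ 6 & -1 & -6 \end{bmatrix}$

d) $C'B = [12, \ -7]$

e) No.

2.3 a) $A' = A$ so $(A')' = A' = A$.

b) $C' = \begin{bmatrix} 1 & 3 \\ 4 & 2 \end{bmatrix}$, $(C')^{-1} = \begin{bmatrix} -\tfrac{2}{10} & \tfrac{3}{10} \\ \tfrac{4}{10} & -\tfrac{1}{10} \end{bmatrix}$; $C^{-1} = \begin{bmatrix} -\tfrac{2}{10} & \tfrac{4}{10} \\ \tfrac{3}{10} & -\tfrac{1}{10} \end{bmatrix}$, so $(C^{-1})' = (C')^{-1}$.

c) $AB = \begin{bmatrix} 7 & 8 & 7 \\ 16 & 4 & 11 \end{bmatrix}$, so $(AB)' = \begin{bmatrix} 7 & 16 \\ 8 & 4 \\ 7 & 11 \end{bmatrix} = B'A'$.

d) $AB$ has $(i,j)$th entry
\[
c_{ij} = \sum_{\ell=1}^{k} a_{i\ell} b_{\ell j} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ik}b_{kj}.
\]
Consequently, $(AB)'$ has $(i,j)$th entry
\[
c_{ji} = \sum_{\ell=1}^{k} a_{j\ell} b_{\ell i}.
\]
Next, $B'$ has $i$th row $[b_{1i}, b_{2i}, \ldots, b_{ki}]$ and $A'$ has $j$th column $[a_{j1}, a_{j2}, \ldots, a_{jk}]'$ so $B'A'$ has $(i,j)$th entry
\[
b_{1i}a_{j1} + b_{2i}a_{j2} + \cdots + b_{ki}a_{jk} = \sum_{\ell=1}^{k} a_{j\ell} b_{\ell i} = c_{ji}.
\]
Since $i$ and $j$ were arbitrary choices, $(AB)' = B'A'$.

2.4 a) $I = I'$ and $AA^{-1} = I = A^{-1}A$. Thus $I' = I = (AA^{-1})' = (A^{-1})'A'$ and $I = (A^{-1}A)' = A'(A^{-1})'$. Consequently, $(A^{-1})'$ is the inverse of $A'$, or $(A')^{-1} = (A^{-1})'$.

b) $(B^{-1}A^{-1})AB = B^{-1}(A^{-1}A)B = B^{-1}B = I$ so $AB$ has inverse $(AB)^{-1} = B^{-1}A^{-1}$. It was sufficient to check for a left inverse but we may also verify $AB(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AA^{-1} = I$.

2.5
\[
QQ' = \begin{bmatrix} \tfrac{5}{13} & \tfrac{12}{13} \\ -\tfrac{12}{13} & \tfrac{5}{13} \end{bmatrix}
\begin{bmatrix} \tfrac{5}{13} & -\tfrac{12}{13} \\ \tfrac{12}{13} & \tfrac{5}{13} \end{bmatrix}
= \begin{bmatrix} \tfrac{169}{169} & 0 \\ 0 & \tfrac{169}{169} \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = Q'Q
\]

2.6 a) Since $A = A'$, $A$ is symmetric.

b) Since the quadratic form
\[
\mathbf{x}'A\mathbf{x} = [x_1, x_2]\begin{bmatrix} 9 & -2 \\ -2 & 6 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= 9x_1^2 - 4x_1x_2 + 6x_2^2 = (2x_1 - x_2)^2 + 5(x_1^2 + x_2^2) > 0
\]
for $[x_1, x_2] \ne [0, 0]$, we conclude that $A$ is positive definite.

2.7 a) Eigenvalues: $\lambda_1 = 10$, $\lambda_2 = 5$.
Normalized eigenvectors: $\mathbf{e}_1' = [2/\sqrt{5}, \ -1/\sqrt{5}] = [.894, \ -.447]$, $\mathbf{e}_2' = [1/\sqrt{5}, \ 2/\sqrt{5}] = [.447, \ .894]$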

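b)
\[
A = \begin{bmatrix} 9 & -2 \\ -2 & 6 \end{bmatrix}
= 10\begin{bmatrix} 2/\sqrt{5} \\ -1/\sqrt{5} \end{bmatrix}[2/\sqrt{5}, \ -1/\sqrt{5}]
+ 5\begin{bmatrix} 1/\sqrt{5} \\ 2/\sqrt{5} \end{bmatrix}[1/\sqrt{5}, \ 2/\sqrt{5}]
\]

c)
\[
A^{-1} = \frac{1}{9(6) - (-2)(-2)}\begin{bmatrix} 6 & 2 \\ 2 & 9 \end{bmatrix}
= \begin{bmatrix} .12 & .04 \\ .04 & .18 \end{bmatrix}
\]

d) Eigenvalues: $\lambda_1 = .2$, $\lambda_2 = .1$
Normalized eigenvectors: $\mathbf{e}_1' = [1/\sqrt{5}, \ 2/\sqrt{5}]$, $\mathbf{e}_2' = [2/\sqrt{5}, \ -1/\sqrt{5}]$

2.8 Eigenvalues: $\lambda_1 = 2$, $\lambda_2 = -3$
Normalized eigenvectors: $\mathbf{e}_1' = [2/\sqrt{5}, \ 1/\sqrt{5}]$, $\mathbf{e}_2' = [1/\sqrt{5}, \ -2/\sqrt{5}]$
\[
A = \begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix}
= 2\begin{bmatrix} 2/\sqrt{5} \\ 1/\sqrt{5} \end{bmatrix}[2/\sqrt{5}, \ 1/\sqrt{5}]
- 3\begin{bmatrix} 1/\sqrt{5} \\ -2/\sqrt{5} \end{bmatrix}[1/\sqrt{5}, \ -2/\sqrt{5}]
\]

2.9 a)
\[
A^{-1} = \frac{1}{1(-2) - 2(2)}\begin{bmatrix} -2 & -2 \\ -2 & 1 \end{bmatrix}
= \begin{bmatrix} \tfrac{1}{3} & \tfrac{1}{3} \\ \tfrac{1}{3} & -\tfrac{1}{6} \end{bmatrix}
\]

b) Eigenvalues: $\lambda_1 = 1/2$, $\lambda_2 = -1/3$
Normalized eigenvectors: $\mathbf{e}_1' = [2/\sqrt{5}, \ 1/\sqrt{5}]$, $\mathbf{e}_2' = [1/\sqrt{5}, \ -2/\sqrt{5}]$

c)
\[
A^{-1} = \begin{bmatrix} \tfrac{1}{3} & \tfrac{1}{3} \\ \tfrac{1}{3} & -\tfrac{1}{6} \end{bmatrix}
= \frac{1}{2}\begin{bmatrix} 2/\sqrt{5} \\ 1/\sqrt{5} \end{bmatrix}[2/\sqrt{5}, \ 1/\sqrt{5}]
- \frac{1}{3}\begin{bmatrix} 1/\sqrt{5} \\ -2/\sqrt{5} \end{bmatrix}[1/\sqrt{5}, \ -2/\sqrt{5}]
\]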

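2.10
\[
B^{-1} = \frac{1}{4(4.002001) - (4.001)^2}\begin{bmatrix} 4.002001 & -4.001 \\ -4.001 & 4 \end{bmatrix}
= 333{,}333\begin{bmatrix} 4.002001 & -4.001 \\ -4.001 & 4 \end{bmatrix}
\]
\[
A^{-1} = \frac{1}{4(4.002) - (4.001)^2}\begin{bmatrix} 4.002 & -4.001 \\ -4.001 & 4 \end{bmatrix}
= -1{,}000{,}000\begin{bmatrix} 4.002 & -4.001 \\ -4.001 & 4 \end{bmatrix}
\]
Thus $A^{-1} \approx (-3)B^{-1}$.

2.11 With $p = 1$, $|a_{11}| = a_{11}$, and with $p = 2$,
\[
\begin{vmatrix} a_{11} & 0 \\ 0 & a_{22} \end{vmatrix} = a_{11}a_{22} - 0(0) = a_{11}a_{22}.
\]
Proceeding by induction, we assume the result holds for any $(p-1)\times(p-1)$ diagonal matrix $A_{11}$. Then writing
\[
\underset{(p \times p)}{A} = \begin{bmatrix} a_{11} & \mathbf{0}' \\ \mathbf{0} & A_{11} \end{bmatrix}
\]
we expand $|A|$ according to Definition 2A.24 to find $|A| = a_{11}|A_{11}| + 0 + \cdots + 0$. Since $|A_{11}| = a_{22}a_{33}\cdots a_{pp}$ by the induction hypothesis, $|A| = a_{11}(a_{22}a_{33}\cdots a_{pp}) = a_{11}a_{22}a_{33}\cdots a_{pp}$.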
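The sensitivity shown in 2.10 is easy to reproduce numerically. The following sketch takes $A$ with $(2,2)$ entry 4.002 and $B$ with $(2,2)$ entry 4.002001, as implied by the determinants in the solution, and compares the two inverses entry by entry.

```python
import numpy as np

# The two nearly singular matrices; they differ only in the (2, 2) position.
A = np.array([[4.0, 4.001], [4.001, 4.002]])
B = np.array([[4.0, 4.001], [4.001, 4.002001]])

det_A = np.linalg.det(A)   # about -1e-6
det_B = np.linalg.det(B)   # about  3e-6

A_inv = np.linalg.inv(A)
B_inv = np.linalg.inv(B)

# Elementwise ratio of the inverses; every entry should be close to -3.
ratio = A_inv / B_inv
```

A change of 0.000001 in a single entry flips the sign of the inverse and changes its scale by a factor of three, which is what near linear dependence of the columns does.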

2.12 By (2-20), $A = P\Lambda P'$ with $PP' = P'P = I$. From Result 2A.11(e), $|A| = |P|\,|\Lambda|\,|P'| = |\Lambda|$. Since $\Lambda$ is a diagonal matrix with diagonal elements $\lambda_1, \lambda_2, \ldots, \lambda_p$, we can apply Exercise 2.11 to get
\[
|A| = |\Lambda| = \prod_{i=1}^{p} \lambda_i .
\]

2.14 Let $\lambda$ be an eigenvalue of $A$. Thus $0 = |A - \lambda I|$. If $Q$ is orthogonal, $QQ' = I$ and $|Q|\,|Q'| = 1$ by Exercise 2.13. Using Result 2A.11(e) we can then write
\[
0 = |Q|\,|A - \lambda I|\,|Q'| = |QAQ' - \lambda I|
\]
and it follows that $\lambda$ is also an eigenvalue of $QAQ'$ if $Q$ is orthogonal.

2.16 $(A'A)' = A'(A')' = A'A$, showing $A'A$ is symmetric. Let
\[
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{bmatrix} = A\mathbf{x}.
\]
Then $0 \le y_1^2 + y_2^2 + \cdots + y_p^2 = \mathbf{y}'\mathbf{y} = \mathbf{x}'A'A\mathbf{x}$ and $A'A$ is non-negative definite by definition.
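2.18 Write $c^2 = \mathbf{x}'A\mathbf{x}$ with $A = \begin{bmatrix} 4 & -\sqrt{2} \\ -\sqrt{2} & 3 \end{bmatrix}$. The eigenvalue-normalized eigenvector pairs for $A$ are:

$\lambda_1 = 2$, $\mathbf{e}_1 = [.577, \ .816]'$
$\lambda_2 = 5$, $\mathbf{e}_2 = [.816, \ -.577]'$

For $c^2 = 1$, the half lengths of the major and minor axes of the ellipse of constant distance are
\[
\frac{c}{\sqrt{\lambda_1}} = \frac{1}{\sqrt{2}} = .707 \quad\text{and}\quad \frac{c}{\sqrt{\lambda_2}} = \frac{1}{\sqrt{5}} = .447
\]
respectively. These axes lie in the directions of the vectors $\mathbf{e}_1$ and $\mathbf{e}_2$ respectively.

For $c^2 = 4$, the half lengths of the major and minor axes are
\[
\frac{c}{\sqrt{\lambda_1}} = \frac{2}{\sqrt{2}} = 1.414 \quad\text{and}\quad \frac{c}{\sqrt{\lambda_2}} = \frac{2}{\sqrt{5}} = .894 .
\]
As $c^2$ increases the lengths of the major and minor axes increase.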
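The half lengths in 2.18 follow directly from the eigenvalues of $A$. The sketch below recomputes them with NumPy; the function name `half_lengths` is made up for this illustration.

```python
import numpy as np

A = np.array([[4.0, -np.sqrt(2.0)], [-np.sqrt(2.0), 3.0]])

# eigh returns eigenvalues in ascending order for a symmetric matrix,
# with the matching normalized eigenvectors in the columns of eigvecs.
eigvals, eigvecs = np.linalg.eigh(A)

def half_lengths(c_squared):
    """Half lengths c / sqrt(lambda_i) of the ellipse x'Ax = c^2."""
    return np.sqrt(c_squared) / np.sqrt(eigvals)

hl_1 = half_lengths(1.0)   # major, minor half lengths for c^2 = 1
hl_4 = half_lengths(4.0)   # major, minor half lengths for c^2 = 4
```

Eigenvectors are determined only up to sign, so a computed eigenvector may be the negative of the one printed in the solution.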
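2.20 Using matrix $A$ in Exercise 2.3, we determine

$\lambda_1 = 1.382$, $\mathbf{e}_1 = [.8507, \ -.5257]'$
$\lambda_2 = 3.618$, $\mathbf{e}_2 = [.5257, \ .8507]'$

We know
\[
A^{1/2} = \sqrt{\lambda_1}\,\mathbf{e}_1\mathbf{e}_1' + \sqrt{\lambda_2}\,\mathbf{e}_2\mathbf{e}_2'
= \begin{bmatrix} 1.376 & .325 \\ .325 & 1.701 \end{bmatrix}
\]
\[
A^{-1/2} = \frac{1}{\sqrt{\lambda_1}}\,\mathbf{e}_1\mathbf{e}_1' + \frac{1}{\sqrt{\lambda_2}}\,\mathbf{e}_2\mathbf{e}_2'
= \begin{bmatrix} .7608 & -.1453 \\ -.1453 & .6155 \end{bmatrix}
\]
We check
\[
A^{1/2}A^{-1/2} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = A^{-1/2}A^{1/2}
\]
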
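The square-root matrices in 2.20 can be rebuilt from the spectral decomposition. This is a sketch under one assumption: the matrix from Exercise 2.3 is taken to be $A = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}$, which reproduces the eigenvalues 1.382 and 3.618 quoted above.

```python
import numpy as np

# Assumed matrix A from Exercise 2.3 (its eigenvalues are 1.382 and 3.618).
A = np.array([[2.0, 1.0], [1.0, 3.0]])

lam, E = np.linalg.eigh(A)   # eigenvalues ascending, eigenvectors in columns

# Spectral decomposition: A^(1/2) = sum sqrt(lam_i) e_i e_i',
# and A^(-1/2) = sum (1/sqrt(lam_i)) e_i e_i'.
A_half = E @ np.diag(np.sqrt(lam)) @ E.T
A_neg_half = E @ np.diag(1.0 / np.sqrt(lam)) @ E.T

product = A_half @ A_neg_half   # should be the identity
```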

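2.21 (a)
\[
A'A = \begin{bmatrix} 1 & 2 & 2 \\ 1 & -2 & 2 \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 2 & -2 \\ 2 & 2 \end{bmatrix}
= \begin{bmatrix} 9 & 1 \\ 1 & 9 \end{bmatrix}
\]
$0 = |A'A - \lambda I| = (9-\lambda)^2 - 1 = (10-\lambda)(8-\lambda)$, so $\lambda_1 = 10$ and $\lambda_2 = 8$. Next,
\[
\begin{bmatrix} 9 & 1 \\ 1 & 9 \end{bmatrix}\begin{bmatrix} e_1 \\ e_2 \end{bmatrix} = 10\begin{bmatrix} e_1 \\ e_2 \end{bmatrix}
\ \text{gives}\ \mathbf{e}_1 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix},
\qquad
\begin{bmatrix} 9 & 1 \\ 1 & 9 \end{bmatrix}\begin{bmatrix} e_1 \\ e_2 \end{bmatrix} = 8\begin{bmatrix} e_1 \\ e_2 \end{bmatrix}
\ \text{gives}\ \mathbf{e}_2 = \begin{bmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix}
\]

(b)
\[
AA' = \begin{bmatrix} 1 & 1 \\ 2 & -2 \\ 2 & 2 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 2 \\ 1 & -2 & 2 \end{bmatrix}
= \begin{bmatrix} 2 & 0 & 4 \\ 0 & 8 & 0 \\ 4 & 0 & 8 \end{bmatrix}
\]
\[
0 = |AA' - \lambda I| = \begin{vmatrix} 2-\lambda & 0 & 4 \\ 0 & 8-\lambda & 0 \\ 4 & 0 & 8-\lambda \end{vmatrix}
= (2-\lambda)(8-\lambda)^2 - 4^2(8-\lambda) = (8-\lambda)(\lambda-10)\lambda
\]
so $\lambda_1 = 10$, $\lambda_2 = 8$, and $\lambda_3 = 0$.
\[
\begin{bmatrix} 2 & 0 & 4 \\ 0 & 8 & 0 \\ 4 & 0 & 8 \end{bmatrix}\begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix} = 10\begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix}
\ \text{gives}\ \begin{aligned} 4e_3 - 8e_1 &= 0 \\ 8e_2 - 10e_2 &= 0 \end{aligned}
\ \text{so}\ \mathbf{e}_1 = \frac{1}{\sqrt{5}}\begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}
\]
\[
\begin{bmatrix} 2 & 0 & 4 \\ 0 & 8 & 0 \\ 4 & 0 & 8 \end{bmatrix}\begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix} = 8\begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix}
\ \text{gives}\ \begin{aligned} 4e_3 - 6e_1 &= 0 \\ 4e_1 &= 0 \end{aligned}
\ \text{so}\ \mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}
\]
Also, $\mathbf{e}_3 = [-2/\sqrt{5}, \ 0, \ 1/\sqrt{5}]'$.

(c)
\[
\begin{bmatrix} 1 & 1 \\ 2 & -2 \\ 2 & 2 \end{bmatrix}
= \sqrt{10}\begin{bmatrix} 1/\sqrt{5} \\ 0 \\ 2/\sqrt{5} \end{bmatrix}\left[\tfrac{1}{\sqrt{2}}, \ \tfrac{1}{\sqrt{2}}\right]
+ \sqrt{8}\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\left[\tfrac{1}{\sqrt{2}}, \ -\tfrac{1}{\sqrt{2}}\right]
\]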
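2.22 (a)
\[
AA' = \begin{bmatrix} 4 & 8 & 8 \\ 3 & 6 & -9 \end{bmatrix}
\begin{bmatrix} 4 & 3 \\ 8 & 6 \\ 8 & -9 \end{bmatrix}
= \begin{bmatrix} 144 & -12 \\ -12 & 126 \end{bmatrix}
\]
$0 = |AA' - \lambda I| = (144-\lambda)(126-\lambda) - (12)^2 = (150-\lambda)(120-\lambda)$, so $\lambda_1 = 150$ and $\lambda_2 = 120$. Next,
\[
\begin{bmatrix} 144 & -12 \\ -12 & 126 \end{bmatrix}\begin{bmatrix} e_1 \\ e_2 \end{bmatrix} = 150\begin{bmatrix} e_1 \\ e_2 \end{bmatrix}
\ \text{gives}\ \mathbf{e}_1 = \begin{bmatrix} 2/\sqrt{5} \\ -1/\sqrt{5} \end{bmatrix}
\]
and $\lambda_2 = 120$ gives $\mathbf{e}_2 = [1/\sqrt{5}, \ 2/\sqrt{5}]'$.

(b)
\[
A'A = \begin{bmatrix} 4 & 3 \\ 8 & 6 \\ 8 & -9 \end{bmatrix}
\begin{bmatrix} 4 & 8 & 8 \\ 3 & 6 & -9 \end{bmatrix}
= \begin{bmatrix} 25 & 50 & 5 \\ 50 & 100 & 10 \\ 5 & 10 & 145 \end{bmatrix}
\]
\[
0 = |A'A - \lambda I| = \begin{vmatrix} 25-\lambda & 50 & 5 \\ 50 & 100-\lambda & 10 \\ 5 & 10 & 145-\lambda \end{vmatrix}
= (150-\lambda)(\lambda-120)\lambda
\]
so $\lambda_1 = 150$, $\lambda_2 = 120$, and $\lambda_3 = 0$. Next,
\[
\begin{bmatrix} 25 & 50 & 5 \\ 50 & 100 & 10 \\ 5 & 10 & 145 \end{bmatrix}\begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix} = 150\begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix}
\ \text{gives}\ \begin{aligned} -120e_1 + 60e_2 &= 0 \\ -25e_1 + 5e_3 &= 0 \end{aligned}
\ \text{or}\ \mathbf{e}_1 = \frac{1}{\sqrt{30}}\begin{bmatrix} 1 \\ 2 \\ 5 \end{bmatrix}
\]
\[
\begin{bmatrix} 25 & 50 & 5 \\ 50 & 100 & 10 \\ 5 & 10 & 145 \end{bmatrix}\begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix} = 120\begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix}
\ \text{gives}\ \mathbf{e}_2 = \frac{1}{\sqrt{6}}\begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}
\]
Also, $\mathbf{e}_3 = [2/\sqrt{5}, \ -1/\sqrt{5}, \ 0]'$.

(c)
\[
\begin{bmatrix} 4 & 8 & 8 \\ 3 & 6 & -9 \end{bmatrix}
= \sqrt{150}\begin{bmatrix} 2/\sqrt{5} \\ -1/\sqrt{5} \end{bmatrix}\left[\tfrac{1}{\sqrt{30}}, \ \tfrac{2}{\sqrt{30}}, \ \tfrac{5}{\sqrt{30}}\right]
+ \sqrt{120}\begin{bmatrix} 1/\sqrt{5} \\ 2/\sqrt{5} \end{bmatrix}\left[\tfrac{1}{\sqrt{6}}, \ \tfrac{2}{\sqrt{6}}, \ -\tfrac{1}{\sqrt{6}}\right]
\]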
2.24

a)

;-1 = ~

'9

( 1

c)

For ~-l
+:

À1 = 4,

a
1

a

b)

n

À1 = 1/4,

À2 = 9 ~

À3 = 1,

':1 = (1 ,O,~) i

À2 = 1 /.9, ':2 = (0 ~ 1 ,0) ,

À3 = 1,

el
-3

= (OlO~l)1

=l=('~O,OJ'
=2 = (0,1,0)'
=3 = (0,0,1)'

36

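2.25

a)
\[
V^{1/2} = \begin{bmatrix} 5 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}, \qquad
\boldsymbol{\rho} = \begin{bmatrix} 1 & -1/5 & 4/15 \\ -1/5 & 1 & 1/6 \\ 4/15 & 1/6 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & -.2 & .267 \\ -.2 & 1 & .167 \\ .267 & .167 & 1 \end{bmatrix}
\]

b)
\[
V^{1/2}\boldsymbol{\rho}V^{1/2}
= \begin{bmatrix} 5 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}
\begin{bmatrix} 1 & -1/5 & 4/15 \\ -1/5 & 1 & 1/6 \\ 4/15 & 1/6 & 1 \end{bmatrix}
\begin{bmatrix} 5 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}
= \begin{bmatrix} 5 & -1 & 4/3 \\ -2/5 & 2 & 1/3 \\ 4/5 & 1/2 & 3 \end{bmatrix}
\begin{bmatrix} 5 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}
= \begin{bmatrix} 25 & -2 & 4 \\ -2 & 4 & 1 \\ 4 & 1 & 9 \end{bmatrix} = \Sigma
\]

2.26

a) $\rho_{13} = \sigma_{13}/(\sigma_{11}^{1/2}\sigma_{33}^{1/2}) = 4/(\sqrt{25}\sqrt{9}) = 4/15 = .267$

b) Write $X_1 = 1\cdot X_1 + 0\cdot X_2 + 0\cdot X_3 = \mathbf{c}_1'\mathbf{X}$ with $\mathbf{c}_1' = [1, 0, 0]$ and $\tfrac{1}{2}X_2 + \tfrac{1}{2}X_3 = \mathbf{c}_2'\mathbf{X}$ with $\mathbf{c}_2' = [0, \tfrac{1}{2}, \tfrac{1}{2}]$.

Then $\mathrm{Var}(X_1) = \sigma_{11} = 25$. By (2-43),
\[
\mathrm{Var}\left(\tfrac{1}{2}X_2 + \tfrac{1}{2}X_3\right) = \mathbf{c}_2'\Sigma\mathbf{c}_2 = \tfrac{1}{4}\sigma_{22} + \tfrac{2}{4}\sigma_{23} + \tfrac{1}{4}\sigma_{33} = 1 + \tfrac{1}{2} + \tfrac{9}{4} = \tfrac{15}{4} = 3.75
\]
By (2-45) (see also the hint to Exercise 2.28),
\[
\mathrm{Cov}\left(X_1, \tfrac{1}{2}X_2 + \tfrac{1}{2}X_3\right) = \mathbf{c}_1'\Sigma\mathbf{c}_2 = \tfrac{1}{2}\sigma_{12} + \tfrac{1}{2}\sigma_{13} = -1 + 2 = 1
\]
so
\[
\mathrm{Corr}\left(X_1, \tfrac{1}{2}X_2 + \tfrac{1}{2}X_3\right) = \frac{\mathrm{Cov}\left(X_1, \tfrac{1}{2}X_2 + \tfrac{1}{2}X_3\right)}{\sqrt{\mathrm{Var}(X_1)}\sqrt{\mathrm{Var}\left(\tfrac{1}{2}X_2 + \tfrac{1}{2}X_3\right)}} = \frac{1}{5\sqrt{3.75}} = .103
\]

2.27

a) $\mu_1 - 2\mu_2$, $\ \sigma_{11} + 4\sigma_{22} - 4\sigma_{12}$

b) $-\mu_1 + 3\mu_2$, $\ \sigma_{11} + 9\sigma_{22} - 6\sigma_{12}$

c) $\mu_1 + \mu_2 + \mu_3$, $\ \sigma_{11} + \sigma_{22} + \sigma_{33} + 2\sigma_{12} + 2\sigma_{13} + 2\sigma_{23}$

d) $\mu_1 + 2\mu_2 - \mu_3$, $\ \sigma_{11} + 4\sigma_{22} + \sigma_{33} + 4\sigma_{12} - 2\sigma_{13} - 4\sigma_{23}$

e) $3\mu_1 - 4\mu_2$, $\ 9\sigma_{11} + 16\sigma_{22}$ since $\sigma_{12} = 0$.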
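The variance and covariance rules used in Exercises 2.25 through 2.27 all reduce to quadratic and bilinear forms in $\Sigma$. As a hedged illustration, the sketch below recomputes the numbers from 2.25 and 2.26 with NumPy, taking $\Sigma$ from the last display of 2.25 b); the variable names are chosen here and are not from the text.

```python
import numpy as np

Sigma = np.array([[25.0, -2.0, 4.0],
                  [-2.0, 4.0, 1.0],
                  [4.0, 1.0, 9.0]])

# Standard deviation matrix V^(1/2) and correlation matrix rho.
V_half = np.diag(np.sqrt(np.diag(Sigma)))
V_half_inv = np.linalg.inv(V_half)
rho = V_half_inv @ Sigma @ V_half_inv

# Linear combinations c1'X = X1 and c2'X = (1/2)X2 + (1/2)X3.
c1 = np.array([1.0, 0.0, 0.0])
c2 = np.array([0.0, 0.5, 0.5])

var_1 = c1 @ Sigma @ c1          # Var(c1'X) = c1' Sigma c1
var_2 = c2 @ Sigma @ c2          # Var(c2'X) = c2' Sigma c2
cov_12 = c1 @ Sigma @ c2         # Cov(c1'X, c2'X) = c1' Sigma c2
corr_12 = cov_12 / np.sqrt(var_1 * var_2)
```

The same three lines with different coefficient vectors give the variances listed in 2.27.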

2.31
(a) $E[\mathbf{X}^{(1)}] = \boldsymbol{\mu}^{(1)}$
(b) $A\boldsymbol{\mu}^{(1)} = [1 \ \ {-1}]\,\boldsymbol{\mu}^{(1)} = 1$
(c) $\mathrm{Cov}(\mathbf{X}^{(1)}) = \Sigma_{11}$
(d) $\mathrm{Cov}(A\mathbf{X}^{(1)}) = A\Sigma_{11}A' = 4$
(e) $E[\mathbf{X}^{(2)}] = \boldsymbol{\mu}^{(2)}$
(f) $B\boldsymbol{\mu}^{(2)}$
(g) $\mathrm{Cov}(\mathbf{X}^{(2)}) = \Sigma_{22}$
(h) $\mathrm{Cov}(B\mathbf{X}^{(2)}) = B\Sigma_{22}B'$
(i) $\mathrm{Cov}(\mathbf{X}^{(1)}, \mathbf{X}^{(2)}) = \Sigma_{12}$
(j) $\mathrm{Cov}(A\mathbf{X}^{(1)}, B\mathbf{X}^{(2)}) = A\Sigma_{12}B'$
[The remaining numerical vectors and matrices are illegible in the source.]

2.32
Parts (a) through (j) follow the same pattern as in 2.31: $E[\mathbf{X}^{(1)}] = \boldsymbol{\mu}^{(1)}$, $A\boldsymbol{\mu}^{(1)}$, $\mathrm{Cov}(\mathbf{X}^{(1)}) = \Sigma_{11}$, $\mathrm{Cov}(A\mathbf{X}^{(1)}) = A\Sigma_{11}A'$, $E[\mathbf{X}^{(2)}] = \boldsymbol{\mu}^{(2)}$, $B\boldsymbol{\mu}^{(2)}$, $\mathrm{Cov}(\mathbf{X}^{(2)}) = \Sigma_{22}$, $\mathrm{Cov}(B\mathbf{X}^{(2)}) = B\Sigma_{22}B'$, $\mathrm{Cov}(\mathbf{X}^{(1)}, \mathbf{X}^{(2)}) = \Sigma_{12}$, and $\mathrm{Cov}(A\mathbf{X}^{(1)}, B\mathbf{X}^{(2)}) = A\Sigma_{12}B'$.
[Numerical vectors and matrices illegible in the source.]

2.33
Parts (a) through (j) again follow the same pattern as in 2.31.
[Numerical vectors and matrices illegible in the source.]

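2.34 $\mathbf{b}'\mathbf{b} = 4 + 1 + 16 + 0 = 21$, $\mathbf{d}'\mathbf{d} = 15$ and $\mathbf{b}'\mathbf{d} = -2 - 3 - 8 + 0 = -13$
\[
(\mathbf{b}'\mathbf{d})^2 = 169 \le 21(15) = 315
\]

2.35 $\mathbf{b}'\mathbf{d} = -4 + 3 = -1$
\[
\mathbf{b}'B\mathbf{b} = [-4, \ 3]\begin{bmatrix} 2 & -2 \\ -2 & 5 \end{bmatrix}\begin{bmatrix} -4 \\ 3 \end{bmatrix}
= [-14, \ 23]\begin{bmatrix} -4 \\ 3 \end{bmatrix} = 125
\]
\[
\mathbf{d}'B^{-1}\mathbf{d} = [1, \ 1]\begin{bmatrix} 5/6 & 2/6 \\ 2/6 & 2/6 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = 11/6
\]
so $1 = (\mathbf{b}'\mathbf{d})^2 \le 125(11/6) = 229.17$

2.36 $4x_1^2 + 4x_2^2 + 6x_1x_2 = \mathbf{x}'A\mathbf{x}$ where $A = \begin{bmatrix} 4 & 3 \\ 3 & 4 \end{bmatrix}$.
$(4-\lambda)^2 - 3^2 = 0$ gives $\lambda_1 = 7$, $\lambda_2 = 1$. Hence the maximum is 7 and the minimum is 1.

2.37 From (2-51),
\[
\max_{\mathbf{x}'\mathbf{x} = 1} \mathbf{x}'A\mathbf{x} = \max_{\mathbf{x} \ne \mathbf{0}} \frac{\mathbf{x}'A\mathbf{x}}{\mathbf{x}'\mathbf{x}} = \lambda_1
\]
where $\lambda_1$ is the largest eigenvalue of $A$. For $A$ given in Exercise 2.6, we have from Exercise 2.7, $\lambda_1 = 10$ and $\mathbf{e}_1' = [.894, \ -.447]$. Therefore $\max_{\mathbf{x}'\mathbf{x}=1} \mathbf{x}'A\mathbf{x} = 10$ and this maximum is attained for $\mathbf{x} = \mathbf{e}_1$.

2.38 Using computer, $\lambda_1 = 18$, $\lambda_2 = 9$, $\lambda_3 = 9$. Hence the maximum is 18 and the minimum is 9.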
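The extended Cauchy-Schwarz inequality in 2.35 and the eigenvalue bounds in 2.36 can both be checked in a few lines. The sketch below assumes $\mathbf{b}' = [-4, 3]$, $\mathbf{d}' = [1, 1]$ and the matrices shown in the solution; variable names are illustrative.

```python
import numpy as np

# Exercise 2.35: (b'd)^2 <= (b'Bb)(d'B^{-1}d) for positive definite B.
b = np.array([-4.0, 3.0])
d = np.array([1.0, 1.0])
B = np.array([[2.0, -2.0], [-2.0, 5.0]])

lhs = (b @ d) ** 2
rhs = (b @ B @ b) * (d @ np.linalg.inv(B) @ d)

# Exercise 2.36: on the unit sphere x'x = 1, x'Ax ranges between the
# smallest and largest eigenvalue of A.
A = np.array([[4.0, 3.0], [3.0, 4.0]])
lam = np.linalg.eigvalsh(A)       # ascending order
minimum, maximum = lam[0], lam[-1]
```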

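2.41 (a) $E(A\mathbf{X}) = AE(\mathbf{X}) = A\boldsymbol{\mu}_X$ [entries illegible in source]

(b) $\mathrm{Cov}(A\mathbf{X}) = A\,\mathrm{Cov}(\mathbf{X})A' = A\Sigma_X A' = \begin{bmatrix} 6 & 0 & 0 \\ 0 & 18 & 0 \\ 0 & 0 & 36 \end{bmatrix}$

(c) All pairs of linear combinations have zero covariances.

2.42 (a) $E(A\mathbf{X}) = AE(\mathbf{X}) = A\boldsymbol{\mu}_X$ [entries illegible in source]

(b) $\mathrm{Cov}(A\mathbf{X}) = A\,\mathrm{Cov}(\mathbf{X})A' = A\Sigma_X A' = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 12 & 0 \\ 0 & 0 & 24 \end{bmatrix}$

(c) All pairs of linear combinations have zero covariances.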

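Chapter 3

3.1 a) $\bar{\mathbf{x}} = [5, \ 2]'$

b) $\mathbf{e}_1 = \mathbf{y}_1 - \bar{x}_1\mathbf{1} = [4, \ 0, \ -4]'$
$\mathbf{e}_2 = \mathbf{y}_2 - \bar{x}_2\mathbf{1} = [-1, \ 1, \ 0]'$

c) $L_{\mathbf{e}_1} = \sqrt{32}$, $L_{\mathbf{e}_2} = \sqrt{2}$
Let $\theta$ be the angle between $\mathbf{e}_1$ and $\mathbf{e}_2$, then $\cos(\theta) = -4/\sqrt{32 \times 2} = -.5$
Therefore $n s_{11} = L_{\mathbf{e}_1}^2$ or $s_{11} = 32/3$; $n s_{22} = L_{\mathbf{e}_2}^2$ or $s_{22} = 2/3$; $n s_{12} = \mathbf{e}_1'\mathbf{e}_2$ or $s_{12} = -4/3$. Also, $r_{12} = \cos(\theta) = -.5$. Consequently
\[
S_n = \begin{bmatrix} 32/3 & -4/3 \\ -4/3 & 2/3 \end{bmatrix} \quad\text{and}\quad R = \begin{bmatrix} 1 & -.5 \\ -.5 & 1 \end{bmatrix}
\]

3.2 a) $\bar{\mathbf{x}}$: [entries illegible in source]

b) $\mathbf{e}_1 = \mathbf{y}_1 - \bar{x}_1\mathbf{1} = [-1, \ 2, \ -1]'$
$\mathbf{e}_2 = \mathbf{y}_2 - \bar{x}_2\mathbf{1} = [3, \ -3, \ 0]'$

c) $L_{\mathbf{e}_1} = \sqrt{6}$, $L_{\mathbf{e}_2} = \sqrt{18}$
Let $\theta$ be the angle between $\mathbf{e}_1$ and $\mathbf{e}_2$, then $\cos(\theta) = -9/\sqrt{6 \times 18} = -.866$
Therefore $n s_{11} = L_{\mathbf{e}_1}^2$ or $s_{11} = 6/3 = 2$; $n s_{22} = L_{\mathbf{e}_2}^2$ or $s_{22} = 18/3 = 6$; $n s_{12} = \mathbf{e}_1'\mathbf{e}_2$ or $s_{12} = -9/3 = -3$. Also, $r_{12} = \cos(\theta) = -.866$. Consequently
\[
S_n = \begin{bmatrix} 2 & -3 \\ -3 & 6 \end{bmatrix} \quad\text{and}\quad R = \begin{bmatrix} 1 & -.866 \\ -.866 & 1 \end{bmatrix}
\]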
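The geometric relationships used in 3.1 and 3.2 (squared lengths of deviation vectors give $n s_{ii}$, and the cosine of the angle between them gives $r_{12}$) can be reproduced directly from the deviation vectors. A minimal sketch for the vectors of 3.1, with variable names chosen here:

```python
import numpy as np

# Deviation vectors from 3.1 b).
e1 = np.array([4.0, 0.0, -4.0])
e2 = np.array([-1.0, 1.0, 0.0])
n = len(e1)

D = np.column_stack([e1, e2])      # matrix whose columns are the deviation vectors
cross_products = D.T @ D           # entries e_i'e_k = n * s_ik

S_n = cross_products / n           # sample covariance matrix with divisor n

lengths = np.sqrt(np.diag(cross_products))
R = cross_products / np.outer(lengths, lengths)   # cosines of the angles = correlations
```

Replacing `e1` and `e2` by the deviation vectors of 3.2 gives the second set of results in the same way.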

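3.3 $\mathbf{y}_1 = [1, \ 4, \ 4]'$, $\bar{x}_1\mathbf{1} = [3, \ 3, \ 3]'$, $\mathbf{y}_1 - \bar{x}_1\mathbf{1} = [-2, \ 1, \ 1]'$. Thus
\[
\mathbf{y}_1 = \begin{bmatrix} 1 \\ 4 \\ 4 \end{bmatrix} = \begin{bmatrix} 3 \\ 3 \\ 3 \end{bmatrix} + \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix} = \bar{x}_1\mathbf{1} + (\mathbf{y}_1 - \bar{x}_1\mathbf{1})
\]

3.5 a)
\[
\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}' = \begin{bmatrix} 4 & -1 \\ 0 & 1 \\ -4 & 0 \end{bmatrix}, \qquad
2S = (\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')'(\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}') = \begin{bmatrix} 32 & -4 \\ -4 & 2 \end{bmatrix}
\]
so $S = \begin{bmatrix} 16 & -2 \\ -2 & 1 \end{bmatrix}$ and $|S| = 12$.

b)
\[
\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}' = \begin{bmatrix} -1 & 3 \\ 2 & -3 \\ -1 & 0 \end{bmatrix}, \qquad
2S = (\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')'(\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}') = \begin{bmatrix} 6 & -9 \\ -9 & 18 \end{bmatrix}
\]
so $S = \begin{bmatrix} 3 & -9/2 \\ -9/2 & 9 \end{bmatrix}$ and $|S| = 27/4$.

3.6 a)
\[
\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}' = \begin{bmatrix} -3 & 0 & -3 \\ 0 & 1 & 1 \\ 3 & -1 & 2 \end{bmatrix}
\]
Thus $\mathbf{d}_1' = [-3, \ 0, \ 3]$, $\mathbf{d}_2' = [0, \ 1, \ -1]$ and $\mathbf{d}_3' = [-3, \ 1, \ 2]$. Since $\mathbf{d}_3 = \mathbf{d}_1 + \mathbf{d}_2$, the matrix of deviations is not of full rank.

b)
\[
2S = (\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')'(\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}') = \begin{bmatrix} 18 & -3 & 15 \\ -3 & 2 & -1 \\ 15 & -1 & 14 \end{bmatrix}
\quad\text{so}\quad
S = \begin{bmatrix} 9 & -3/2 & 15/2 \\ -3/2 & 1 & -1/2 \\ 15/2 & -1/2 & 7 \end{bmatrix}
\]
$|S| = 0$ (Verify). The 3 deviation vectors lie in a 2-dimensional subspace. The 3-dimensional volume enclosed by the deviation vectors is zero.

c) Total sample variance $= 9 + 1 + 7 = 17$.

3.7 All ellipses are centered at $\bar{\mathbf{x}}$.

i) For $S = \begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix}$, $S^{-1} = \begin{bmatrix} 5/9 & -4/9 \\ -4/9 & 5/9 \end{bmatrix}$
Eigenvalue-normalized eigenvector pairs for $S^{-1}$ are:
$\lambda_1 = 1$, $\mathbf{e}_1' = [.707, \ -.707]$
$\lambda_2 = 1/9$, $\mathbf{e}_2' = [.707, \ .707]$
Half lengths of axes of ellipse $(\mathbf{x} - \bar{\mathbf{x}})'S^{-1}(\mathbf{x} - \bar{\mathbf{x}}) \le 1$ are $1/\sqrt{\lambda_1} = 1$ and $1/\sqrt{\lambda_2} = 3$ respectively. The major axis of the ellipse lies in the direction of $\mathbf{e}_2$; the minor axis lies in the direction of $\mathbf{e}_1$.

ii) For $S = \begin{bmatrix} 5 & -4 \\ -4 & 5 \end{bmatrix}$, $S^{-1} = \begin{bmatrix} 5/9 & 4/9 \\ 4/9 & 5/9 \end{bmatrix}$
Eigenvalue-normalized eigenvector pairs for $S^{-1}$ are:
$\lambda_1 = 1$, $\mathbf{e}_1' = [.707, \ .707]$
$\lambda_2 = 1/9$, $\mathbf{e}_2' = [.707, \ -.707]$
Half lengths of axes of ellipse $(\mathbf{x} - \bar{\mathbf{x}})'S^{-1}(\mathbf{x} - \bar{\mathbf{x}}) \le 1$ are, again, $1/\sqrt{\lambda_1} = 1$ and $1/\sqrt{\lambda_2} = 3$. The major axis of the ellipse lies in the direction of $\mathbf{e}_2$; the minor axis lies in the direction of $\mathbf{e}_1$. Note that $\mathbf{e}_2$ here is $\mathbf{e}_1$ in part (i) above and $\mathbf{e}_1$ here is $\mathbf{e}_2$ in part (i) above.

iii) For $S = \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}$, $S^{-1} = \begin{bmatrix} 1/3 & 0 \\ 0 & 1/3 \end{bmatrix}$
Eigenvalue-normalized eigenvector pairs for $S^{-1}$ are:
$\lambda_1 = 1/3$, $\mathbf{e}_1' = [1, \ 0]$
$\lambda_2 = 1/3$, $\mathbf{e}_2' = [0, \ 1]$
Half lengths of axes of ellipse $(\mathbf{x} - \bar{\mathbf{x}})'S^{-1}(\mathbf{x} - \bar{\mathbf{x}}) \le 1$ are equal and given by $1/\sqrt{\lambda_1} = 1/\sqrt{\lambda_2} = \sqrt{3}$. Major and minor axes of the ellipse can be taken to lie in the directions of the coordinate axes. Here, the solid ellipse is, in fact, a solid sphere.

Notice for all three cases $|S| = 9$.

3.8 a) Total sample variance in both cases is 3.

b) For $S = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$, $|S| = 1$.

For $S = \begin{bmatrix} 1 & -1/2 & -1/2 \\ -1/2 & 1 & -1/2 \\ -1/2 & -1/2 & 1 \end{bmatrix}$, $|S| = 0$.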

5) + 39(15.9 (8) Vve calculate æ = (16.5 . 1.48 3.53 -= 5~~ J .2. 0 2. -1 J' giv.5 2. 3. (b) S = 1~ 2. 1. .0 2.5 soI-S-(2.5) + 39(15.5)3 I .5)2 0 + 0 = i) ( 2.10 (a) VVe calculate æ = (5.5 .5 ( c) Check.es Xca = O.5 " ( J I I 10(2.5)(18.5) -55(5.34 l and -4 -1 -5 2 2 4 Xc= -2 -2 -4 4 0 0 4 1 1 and we notice coh( Xc)+ coh( Xc) = cOli( Xc) so a = fl.3 J' and -2 -1 -3 1 Xc= -1 2 0 2 0 -2 1 3 -1 and we notice coh( Xc)+ ~012( Xc) = cOli( Xc) 0 1 so a = iI.¡o~ J 13 + 5.18.5)3 _ 0 -+(2.52. ~b) S =. -1 J' gives Xca = O.5 18.5(2.55 J Using the save coeffient vector a as in Part a) Sa = O.18.5 2.5) = 0 13 5.5) _ 9(18.~ 5~~ so S = -(13)2(2.5) As above in a) Sa = ( 3 ~ .

(c) Setting $Xa = 0$,

$3a_1 + a_2 = 0$
$7a_1 + 3a_3 = 0$
$5a_1 + 3a_2 + 4a_3 = 0$

so $a_1 = -\frac{3}{7}a_3$ and $5a_1 - 3(3a_1) + 4a_3 = 0$, so we must have $a_1 = a_3 = 0$ but then, by the first equation in the first set, $a_2 = 0$. The columns of the data matrix are linearly independent.

3.11 $S = \begin{pmatrix} 14808 & 14213 \\ 14213 & 15538 \end{pmatrix}$

Consequently

$R = \begin{pmatrix} 1 & .9370 \\ .9370 & 1 \end{pmatrix}$, $D^{1/2} = \begin{pmatrix} 121.6881 & 0 \\ 0 & 124.6515 \end{pmatrix}$ and $D^{-1/2} = \begin{pmatrix} .0082 & 0 \\ 0 & .0080 \end{pmatrix}$

The relationships $R = D^{-1/2} S D^{-1/2}$ and $S = D^{1/2} R D^{1/2}$ can now be verified by direct matrix multiplication.

3.14 a) From first principles we have

$b'x_1 = [2 \;\; 3]\begin{pmatrix} 9 \\ 1 \end{pmatrix} = 21$

Similarly $b'x_2 = 19$ and $b'x_3 = 8$ so

sample mean $= \frac{21 + 19 + 8}{3} = 16$
sample variance $= \frac{(21-16)^2 + (19-16)^2 + (8-16)^2}{2} = 49$

Also $c'x_1 = [-1 \;\; 2]\begin{pmatrix} 9 \\ 1 \end{pmatrix} = -7$, $c'x_2 = 1$ and $c'x_3 = 3$ so

sample mean $= -1$
sample variance $= 28$

Finally

sample covariance $= \frac{(21-16)(-7+1) + (19-16)(1+1) + (8-16)(3+1)}{2} = -28$

b) $\bar{x}' = [5 \;\; 2]$ and $S = \begin{pmatrix} 16 & -2 \\ -2 & 1 \end{pmatrix}$. Using (3-36):

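sample mean of $b'X = b'\bar{x} = [2 \;\; 3]\begin{pmatrix} 5 \\ 2 \end{pmatrix} = 16$
sample mean of $c'X = c'\bar{x} = [-1 \;\; 2]\begin{pmatrix} 5 \\ 2 \end{pmatrix} = -1$
sample variance of $b'X = b'Sb = [2 \;\; 3]\begin{pmatrix} 16 & -2 \\ -2 & 1 \end{pmatrix}\begin{pmatrix} 2 \\ 3 \end{pmatrix} = 49$
sample variance of $c'X = c'Sc = [-1 \;\; 2]\begin{pmatrix} 16 & -2 \\ -2 & 1 \end{pmatrix}\begin{pmatrix} -1 \\ 2 \end{pmatrix} = 28$
sample covariance of $b'X$ and $c'X = b'Sc = [2 \;\; 3]\begin{pmatrix} 16 & -2 \\ -2 & 1 \end{pmatrix}\begin{pmatrix} -1 \\ 2 \end{pmatrix} = -28$

Results same as those in part (a).

3.15 $\bar{x} = \begin{pmatrix} 5 \\ 3 \\ 4 \end{pmatrix}$, $S = \begin{pmatrix} 13 & -2.5 & 1.5 \\ -2.5 & 1 & -1.5 \\ 1.5 & -1.5 & 3 \end{pmatrix}$

sample mean of $b'X$ = 12
sample mean of $c'X$ = -1
sample variance of $b'X$ = 12
sample variance of $c'X$ = 43
sample covariance of $b'X$ and $c'X$ = -3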
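The matrix route in 3.14 b) is short enough to check directly. A minimal sketch, assuming $\bar{x}' = [5, 2]$, $S$ with entries 16, -2, -2, 1, $b' = [2, 3]$ and $c' = [-1, 2]$ as in the solution; the helper names `dot` and `bilinear` are hypothetical.

```python
def dot(u, v):
    """Inner product u'v."""
    return sum(a * b for a, b in zip(u, v))

def bilinear(u, S, v):
    """Bilinear form u'Sv."""
    return sum(u[i] * S[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

xbar = [5, 2]
S = [[16, -2], [-2, 1]]
b, c = [2, 3], [-1, 2]

print("mean of b'X:", dot(b, xbar))        # b' xbar
print("mean of c'X:", dot(c, xbar))        # c' xbar
print("var of b'X:", bilinear(b, S, b))    # b' S b
print("var of c'X:", bilinear(c, S, c))    # c' S c
print("cov(b'X, c'X):", bilinear(b, S, c)) # b' S c
```

The five values agree with the first-principles calculation in part a).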

3.16 Since

$\Sigma_V = E(V - \mu_V)(V - \mu_V)' = E(VV') - E(V)\mu_V' - \mu_V E(V') + \mu_V\mu_V' = E(VV') - \mu_V\mu_V' - \mu_V\mu_V' + \mu_V\mu_V' = E(VV') - \mu_V\mu_V'$,

we have $E(VV') = \Sigma_V + \mu_V\mu_V'$.

3.18 (a) Let $y = x_1 + x_2 + x_3 + x_4$ be the total energy consumption. Then

$\bar{y} = [1 \; 1 \; 1 \; 1]\bar{x} = 1.873$, $s_y^2 = [1 \; 1 \; 1 \; 1]\,S\,[1 \; 1 \; 1 \; 1]' = 3.913$

(b) Let $y = x_1 - x_2$ be the excess of petroleum consumption over natural gas consumption. Then

$\bar{y} = [1 \; -1 \; 0 \; 0]\bar{x} = .258$, $s_y^2 = [1 \; -1 \; 0 \; 0]\,S\,[1 \; -1 \; 0 \; 0]' = .154$

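Chapter 4

4.1 (a) We are given $p = 2$, $\mu = \begin{pmatrix} 1 \\ 3 \end{pmatrix}$, $\Sigma = \begin{pmatrix} 2 & -.8\sqrt{2} \\ -.8\sqrt{2} & 1 \end{pmatrix}$ so $|\Sigma| = .72$ and

$\Sigma^{-1} = \frac{1}{.72}\begin{pmatrix} 1 & .8\sqrt{2} \\ .8\sqrt{2} & 2 \end{pmatrix}$

$f(x) = \frac{1}{(2\pi)\sqrt{.72}} \exp\left(-\frac{1}{2}\left[\frac{1}{.72}(x_1 - 1)^2 + \frac{1.6\sqrt{2}}{.72}(x_1 - 1)(x_2 - 3) + \frac{2}{.72}(x_2 - 3)^2\right]\right)$

(b) $\frac{1}{.72}(x_1 - 1)^2 + \frac{1.6\sqrt{2}}{.72}(x_1 - 1)(x_2 - 3) + \frac{2}{.72}(x_2 - 3)^2$

4.2 (a) We are given $p = 2$, $\mu = \begin{pmatrix} 0 \\ 2 \end{pmatrix}$, $\Sigma = \begin{pmatrix} 2 & \sqrt{2}/2 \\ \sqrt{2}/2 & 1 \end{pmatrix}$ so $|\Sigma| = 3/2$ and

$\Sigma^{-1} = \begin{pmatrix} 2/3 & -\sqrt{2}/3 \\ -\sqrt{2}/3 & 4/3 \end{pmatrix}$

$f(x) = \frac{1}{(2\pi)\sqrt{3/2}} \exp\left(-\frac{1}{2}\left[\frac{2}{3}x_1^2 - \frac{2\sqrt{2}}{3}x_1(x_2 - 2) + \frac{4}{3}(x_2 - 2)^2\right]\right)$

(b) $\frac{2}{3}x_1^2 - \frac{2\sqrt{2}}{3}x_1(x_2 - 2) + \frac{4}{3}(x_2 - 2)^2$

(c) $c^2 = \chi^2_2(.5) = 1.39$. Ellipse centered at $[0, 2]'$ with the major axis having half-length $\sqrt{\lambda_1}\,c = \sqrt{2.366}\sqrt{1.39} = 1.81$. The major axis lies in the direction $e = [.888, .460]'$. The minor axis lies in the direction $e = [-.460, .888]'$ and has half-length $\sqrt{\lambda_2}\,c = \sqrt{.634}\sqrt{1.39} = .94$.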
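The axes in 4.2 (c) follow from the eigenvalues of $\Sigma$ and the chi-square quantile. A small sketch, not from the original solution: it assumes $\sigma_{11} = 2$, $\sigma_{22} = 1$, $\rho_{12} = .5$, and uses the closed form $\chi^2_2$ quantile $c^2 = -2\ln(1 - \text{prob})$, which holds for 2 degrees of freedom only. The function name `contour_axes` is hypothetical.

```python
import math

def contour_axes(s11, s22, rho, prob):
    """Half lengths and major-axis direction of the constant density contour
    containing probability `prob` for a bivariate normal distribution."""
    s12 = rho * math.sqrt(s11 * s22)
    tr, det = s11 + s22, s11 * s22 - s12 ** 2
    disc = math.sqrt(tr * tr - 4.0 * det)
    lam1, lam2 = (tr + disc) / 2.0, (tr - disc) / 2.0
    c2 = -2.0 * math.log(1.0 - prob)      # chi-square quantile, 2 d.f. only
    # Eigenvector for lam1 solves (s11 - lam1) e1 + s12 e2 = 0.
    e = (s12, lam1 - s11)
    norm = math.hypot(*e)
    e = (e[0] / norm, e[1] / norm)
    return math.sqrt(lam1 * c2), math.sqrt(lam2 * c2), e

major, minor, e1 = contour_axes(2.0, 1.0, 0.5, 0.5)
print(round(major, 2), round(minor, 2), tuple(round(v, 3) for v in e1))
```

The exact quantile is 1.386 rather than the tabled 1.39, so the half lengths agree with 1.81 and .94 to two decimals.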

[Figure: constant density contour that contains 50% of the probability; axes $x_1$ and $x_2$]

4.3 We apply Result 4.5 that relates zero covariance to statistical independence.

a) No, $\sigma_{12} \ne 0$
b) Yes, $\sigma_{23} = 0$
c) Yes, $\sigma_{13} = \sigma_{23} = 0$
d) Yes, by Result 4.3, $(X_1 + X_2)/2$ and $X_3$ are jointly normal and their covariance is $\frac{1}{2}\sigma_{13} + \frac{1}{2}\sigma_{23} = 0$.
e) No, by Result 4.3 with $A = \begin{pmatrix} 0 & 1 & 0 \\ -5/2 & 1 & -1 \end{pmatrix}$, form $A \Sigma A'$ to see that the covariance is 10 and not 0.

4.4 a) $3X_1 - 2X_2 + X_3$ is $N(13, 9)$.

b) Require $\mathrm{Cov}(X_2, X_2 - a_1X_1 - a_3X_3) = 3 - a_1 - 2a_3 = 0$. Thus any $a' = [a_1, a_3]$ of the form $a' = [3 - 2a_3, a_3]$ will meet the requirement. As an example, $a' = [1, 1]$.

4.5 a) $X_1 | x_2$ is $N\left(\frac{\sqrt{2}}{2}(x_2 - 2), \frac{3}{2}\right)$
b) $X_2 | x_1, x_3$ is $N(-2x_1 - 5, 1)$
c) $X_3 | x_1, x_2$ is $N\left(\frac{1}{2}(x_1 + x_2 + 3), \frac{1}{2}\right)$

4.6 (a) $X_1$ and $X_2$ are independent since they have a bivariate normal distribution with covariance $\sigma_{12} = 0$.
(b) $X_1$ and $X_3$ are dependent since they have nonzero covariance $\sigma_{13} = -1$.
(c) $X_2$ and $X_3$ are independent since they have a bivariate normal distribution with covariance $\sigma_{23} = 0$.
(d) $(X_1, X_3)$ and $X_2$ are independent since they have a trivariate normal distribution where $\sigma_{12} = 0$ and $\sigma_{32} = 0$.
(e) $X_1$ and $X_1 + 2X_2 - 3X_3$ are dependent since they have nonzero covariance $\sigma_{11} + 2\sigma_{12} - 3\sigma_{13} = 4 + 2(0) - 3(-1) = 7$.

4.7 (a) $X_1 | x_3$ is $N(1 - .5(x_3 - 2), 3.5)$.
(b) $X_1 | x_2, x_3$ is $N(1 - .5(x_3 - 2), 3.5)$. Since $X_2$ is independent of $X_1$, conditioning further on $X_2$ does not change the answer from Part a).
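The conditional distribution in 4.7 (a) comes from the usual bivariate formulas $\mu_1 + \sigma_{13}\sigma_{33}^{-1}(x_3 - \mu_3)$ and $\sigma_{11} - \sigma_{13}^2/\sigma_{33}$. The sketch below is illustrative only; it assumes the parameter values $\mu_1 = 1$, $\mu_3 = 2$, $\sigma_{11} = 4$, $\sigma_{13} = -1$, $\sigma_{33} = 2$ taken from Exercise 4.6, and the function name `conditional_normal` is hypothetical.

```python
def conditional_normal(mu1, mu2, s11, s12, s22):
    """Conditional distribution of X1 given X2 = x2 for a bivariate normal.
    Returns (mean function of x2, conditional variance)."""
    slope = s12 / s22
    var = s11 - s12 ** 2 / s22
    return (lambda x2: mu1 + slope * (x2 - mu2)), var

# Assumed parameters of (X1, X3) from Exercise 4.6.
mean_fn, var = conditional_normal(mu1=1.0, mu2=2.0, s11=4.0, s12=-1.0, s22=2.0)
print("conditional variance:", var)
print("conditional mean at x3 = 2:", mean_fn(2.0))
print("conditional mean at x3 = 4:", mean_fn(4.0))
```

The conditional variance does not depend on the conditioning value, only the mean does.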

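4.16 (a) By Result 4.8, with $c_1 = c_3 = 1/4$, $c_2 = c_4 = -1/4$ and $\mu_j = \mu$ for $j = 1, ..., 4$ we have $\sum_{j=1}^{4} c_j\mu_j = 0$ and $\left(\sum_{j=1}^{4} c_j^2\right)\Sigma = \frac{1}{4}\Sigma$. Consequently, $V_1$ is $N_p(0, \frac{1}{4}\Sigma)$. Similarly, setting $b_1 = b_2 = 1/4$ and $b_3 = b_4 = -1/4$, we find that $V_2$ is $N_p(0, \frac{1}{4}\Sigma)$.

(b) Again by Result 4.8, we know that $V_1$ and $V_2$ are jointly multivariate normal with covariance

$\left(\sum_{j=1}^{4} b_jc_j\right)\Sigma = \left(\frac{1}{4}\left(\frac{1}{4}\right) + \frac{1}{4}\left(-\frac{1}{4}\right) - \frac{1}{4}\left(\frac{1}{4}\right) - \frac{1}{4}\left(-\frac{1}{4}\right)\right)\Sigma = 0$

That is, $\begin{pmatrix} V_1 \\ V_2 \end{pmatrix}$ is distributed $N_{2p}\left(0, \begin{pmatrix} \frac{1}{4}\Sigma & 0 \\ 0 & \frac{1}{4}\Sigma \end{pmatrix}\right)$ so the joint density of the $2p$ variables is

$f(v_1, v_2) = \frac{1}{(2\pi)^p\,|\frac{1}{4}\Sigma|} \exp\left(-\frac{1}{2}\left[v_1'\left(\tfrac{1}{4}\Sigma\right)^{-1}v_1 + v_2'\left(\tfrac{1}{4}\Sigma\right)^{-1}v_2\right]\right)$

4.17 By Result 4.8, with $c_1 = c_2 = c_3 = c_4 = c_5 = 1/5$ and $\mu_j = \mu$ for $j = 1, ..., 5$ we find that $V_1$ has mean $\sum_{j=1}^{5} c_j\mu_j = \mu$ and covariance matrix $\left(\sum_{j=1}^{5} c_j^2\right)\Sigma = \frac{1}{5}\Sigma$.

Similarly, setting $b_1 = b_3 = b_5 = 1/5$ and $b_2 = b_4 = -1/5$ we find that $V_2$ has mean $\sum_{j=1}^{5} b_j\mu_j = \frac{1}{5}\mu$ and covariance matrix $\left(\sum_{j=1}^{5} b_j^2\right)\Sigma = \frac{1}{5}\Sigma$.

Again by Result 4.8, we know that $V_1$ and $V_2$ have covariance

$\left(\sum_{j=1}^{5} b_jc_j\right)\Sigma = \left(\frac{1}{5}\left(\frac{1}{5}\right) - \frac{1}{5}\left(\frac{1}{5}\right) + \frac{1}{5}\left(\frac{1}{5}\right) - \frac{1}{5}\left(\frac{1}{5}\right) + \frac{1}{5}\left(\frac{1}{5}\right)\right)\Sigma = \frac{1}{25}\Sigma$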
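In 4.16 and 4.17 everything reduces to three scalars that multiply $\Sigma$: $\sum b_j^2$, $\sum c_j^2$ and $\sum b_jc_j$. A minimal sketch with exact fractions, using the coefficient vectors stated in the solutions; the helper name `scale_factors` is hypothetical.

```python
from fractions import Fraction as F

def scale_factors(b, c):
    """Scalars multiplying Sigma in Result 4.8:
    (sum b_j^2, sum c_j^2, sum b_j c_j)."""
    return (sum(x * x for x in b),
            sum(x * x for x in c),
            sum(x * y for x, y in zip(b, c)))

# Exercise 4.16: V1 uses c, V2 uses b.
c16 = [F(1, 4), F(-1, 4), F(1, 4), F(-1, 4)]
b16 = [F(1, 4), F(1, 4), F(-1, 4), F(-1, 4)]
print(scale_factors(b16, c16))

# Exercise 4.17.
c17 = [F(1, 5)] * 5
b17 = [F(1, 5), F(-1, 5), F(1, 5), F(-1, 5), F(1, 5)]
print(scale_factors(b17, c17))
```

A zero third entry, as in 4.16, means the two linear combinations are independent.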

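4.18 By Result 4.11 we know that the maximum likelihood estimates of $\mu$ and $\Sigma$ are $\bar{x} = [4, 6]'$ and

$\frac{1}{n}\sum_{j=1}^{n}(x_j - \bar{x})(x_j - \bar{x})' = \begin{pmatrix} 1/2 & 1/4 \\ 1/4 & 3/2 \end{pmatrix}$

4.19 a) $(X_1 - \mu)'\Sigma^{-1}(X_1 - \mu)$ is distributed as $\chi^2_6$.

b) From (4-23), $\bar{X}$ is $N_6(\mu, \frac{1}{20}\Sigma)$. Then $\bar{X} - \mu$ is $N_6(0, \frac{1}{20}\Sigma)$ and finally $\sqrt{20}(\bar{X} - \mu)$ is $N_6(0, \Sigma)$.

c) From (4-23), $19S$ has a Wishart distribution with 19 d.f.

4.20 $B(19S)B'$ is a 2x2 matrix distributed as $W_{19}(\cdot \,|\, B\Sigma B')$ with 19 d.f. where

a) $B\Sigma B'$ has
(1,1) entry $= \sigma_{11} + \frac{1}{4}\sigma_{22} + \frac{1}{4}\sigma_{33} - \sigma_{12} - \sigma_{13} + \frac{1}{2}\sigma_{23}$
(1,2) entry $= -\frac{1}{2}\sigma_{14} - \frac{1}{2}\sigma_{15} + \sigma_{16} + \frac{1}{4}\sigma_{24} + \frac{1}{4}\sigma_{25} - \frac{1}{2}\sigma_{26} + \frac{1}{4}\sigma_{34} + \frac{1}{4}\sigma_{35} - \frac{1}{2}\sigma_{36}$
(2,2) entry $= \sigma_{66} + \frac{1}{4}\sigma_{55} + \frac{1}{4}\sigma_{44} - \sigma_{46} - \sigma_{56} + \frac{1}{2}\sigma_{45}$

b) $B\Sigma B' = \begin{pmatrix} \sigma_{11} & \sigma_{13} \\ \sigma_{31} & \sigma_{33} \end{pmatrix}$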

4.21 (a) $\bar{X}$ is distributed $N_4(\mu, n^{-1}\Sigma)$.

(b) $X_1 - \mu$ is distributed $N_4(0, \Sigma)$ so $(X_1 - \mu)'\Sigma^{-1}(X_1 - \mu)$ is distributed as chi-square with $p$ degrees of freedom.

(c) Using Part a), $(\bar{X} - \mu)'(n^{-1}\Sigma)^{-1}(\bar{X} - \mu) = n(\bar{X} - \mu)'\Sigma^{-1}(\bar{X} - \mu)$ is distributed as chi-square with $p$ degrees of freedom.

(d) Approximately distributed as chi-square with $p$ degrees of freedom. Since the sample size is large, $\Sigma$ can be replaced by $S$.

4.22 a) We see that n = 75 is a sufficiently large sample (compared with p) and apply Result 4.13 to get $\sqrt{n}(\bar{X} - \mu)$ is approximately $N_p(0, \Sigma)$ and that $\bar{X}$ is approximately $N_p(\mu, \frac{1}{n}\Sigma)$.

b) By (4-28) we conclude that $n(\bar{X} - \mu)'S^{-1}(\bar{X} - \mu)$ is approximately $\chi^2_p$.

4.23 (a) The Q-Q plot shown below is not particularly straight, but the sample size n = 10 is small. Difficult to determine if data are normally distributed from the plot.

[Figure: Q-Q Plot for Dow Jones Data; horizontal axis q(i)]

(b) $r_Q$ = .95 and n = 10. Since $r_Q$ = .95 > .9351 (see Table 4.2), cannot reject hypothesis of normality at the 10% level.

4.24 (a) Q-Q plots for sales and profits are given below. Plots not particularly straight, although Q-Q plot for profits appears to be "straighter" than plot for sales. Difficult to assess normality from plots with such a small sample size (n = 10).

[Figure: Q-Q Plot for Sales; horizontal axis q(i)]

[Figure: Q-Q Plot for Profits; horizontal axis q(i)]

(b) The critical point for n = 10 when $\alpha$ = .10 is .9351. For sales, $r_Q$ = .940 and for profits, $r_Q$ = .968. Since the values for both of these correlations are greater than .9351, we cannot reject normality in either case.

4.25 The chi-square plot for the world's largest companies data is shown below. The plot is reasonably straight and it would be difficult to reject multivariate normality given the small sample size of n = 10. Information leading to the construction of this plot is also displayed.

[Figure: chi-square plot; horizontal axis ChiSqQuantile]

$\bar{x} = \begin{pmatrix} 155.6 \\ 14.7 \\ 710.9 \end{pmatrix}$, $S = \begin{pmatrix} 7476.5 & 303.6 & -35576 \\ 303.6 & 26.2 & -1053.8 \\ -35576 & -1053.8 & 237054 \end{pmatrix}$

 j   Ordered SqDist   Chi-square Quantiles
 1       .3142              .3518
 2      1.2894              .7978
 3      1.4073             1.2125
 4      1.6418             1.6416
 5      2.0195             2.1095
 6      3.0411             2.6430
 7      3.1891             3.2831
 8      4.3520             4.1083
 9      4.8365             5.3170
10      4.9091             7.8147

4.26 (a) $\bar{x} = \begin{pmatrix} 5.20 \\ 12.48 \end{pmatrix}$, $S = \begin{pmatrix} 10.6222 & -17.7102 \\ -17.7102 & 30.8544 \end{pmatrix}$, $S^{-1} = \begin{pmatrix} 2.1898 & 1.2569 \\ 1.2569 & .7539 \end{pmatrix}$

Thus $d_j^2$ = 1.8753, 2.0203, 2.9009, .7353, .3105, .0176, 3.7329, .8165, 1.3753, 4.2153

(b) Since $\chi^2_2(.5)$ = 1.39, 5 observations (50%) are within the 50% contour.

(c) The chi-square plot is shown below.

[Figure: chi-square plot]

(d) Given the results in parts (b) and (c) and the small number of observations (n = 10), it is difficult to reject bivariate normality.
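The squared generalized distances in 4.26 can be recomputed from the raw observations. This sketch is not part of the original solution and rests on an assumption: the ten (age, price) pairs below are taken to be the used-car data of Exercise 1.2, which the solution itself does not list. All variable names are illustrative.

```python
# Assumed data (Exercise 1.2): x1 = age in years, x2 = selling price.
age = [1, 2, 3, 3, 4, 5, 6, 8, 9, 11]
price = [18.95, 19.00, 17.95, 15.54, 14.00, 12.95, 8.94, 7.49, 6.00, 3.99]

n = len(age)
m1, m2 = sum(age) / n, sum(price) / n
s11 = sum((a - m1) ** 2 for a in age) / (n - 1)
s22 = sum((p - m2) ** 2 for p in price) / (n - 1)
s12 = sum((a - m1) * (p - m2) for a, p in zip(age, price)) / (n - 1)
det = s11 * s22 - s12 ** 2

# d_j^2 = (x_j - xbar)' S^{-1} (x_j - xbar), written out for the 2x2 case.
d2 = [((a - m1) ** 2 * s22 - 2 * (a - m1) * (p - m2) * s12 + (p - m2) ** 2 * s11) / det
      for a, p in zip(age, price)]

inside = sum(d <= 1.39 for d in d2)   # 1.39 = chi-square(2 d.f.) 50% point
print([round(d, 4) for d in d2])
print("within 50% contour:", inside, "of", n)
```

A useful sanity check is that the $d_j^2$ always sum to $(n - 1)p$, here 18.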

4.27 Q-Q plot is shown below.

[Figure: Q-Q plot; horizontal axis q(i)]

Since $r_Q$ = .970 < .973 (See Table 4.2 for n = 40 and $\alpha$ = .05), we would reject the hypothesis of normality at the 5% level.

4.29 (a) $\bar{x} = \begin{pmatrix} 10.0476 \\ 9.4048 \end{pmatrix}$, $S = \begin{pmatrix} 11.3635 & 3.1266 \\ 3.1266 & 30.9785 \end{pmatrix}$

Generalized distances are as follows.

[Table: generalized distances for the 42 observations]

(b) The number of observations whose generalized distances are less than $\chi^2_2(0.5)$ = 1.39 is 26. So the proportion is 26/42 = 0.6190.

(c) [Figure: CHI-SQUARE PLOT FOR (X1 X2)]

4.30 (a) $\hat{\lambda}$ = 0.5 but $\lambda$ = 1 (i.e. no transformation) not ruled out by data. For $\lambda$ = 1, $r_Q$ = .981 > .9351, the critical point for testing normality with n = 10 and $\alpha$ = .10. We cannot reject the hypothesis of normality at the 10% level (and, consequently, not at the 5% level).

(b) $\hat{\lambda}$ = 1 (i.e. no transformation). For $\lambda$ = 1, $r_Q$ = .971 > .9351, the critical point for testing normality with n = 10 and $\alpha$ = .10. We cannot reject the hypothesis of normality at the 10% level (and, consequently, not at the 5% level).

(c) The likelihood function $\ell(\lambda_1, \lambda_2)$ is fairly flat in the region of $\lambda_1$ = 1, $\lambda_2$ = 1 so these values are not ruled out by the data. These results are consistent with those in parts (a) and (b).

Q-Q plots follow.

9826 for n=69).'(05)°.32 Transformation *: significant at 5 % level (the critical point = 0.84135- Xi 0.2S 4 X6 Xs 0.USïO for n=98).92779(Xs + 0.9652 for n=30).97209 0.978-69 Xs 0.96341 X4 0.32 Xl X2 X3 X4 rQ 0.95986* 0.33 Marginal Normality: rQ Xl X2 0.98098- Transformation (Xl + 0. 4.98079 *: significant at 5 % level (the ci"itical point = 'Ü.31 The non-multiple-scle"rosis group: X2 X3 X4 Xs 0. .s 1 rQ (Xs + 0.96133Xi3. Transformations of X3 and X4 do not improve the approximaii-on to normality V~l"y much because there are too many zeros.94446- Xl X2 X3 0. The multiple-sclerosis group: rQ - - - (X5 + 0.005)-0.4 0.9970 0.98464 - 0.4 0.9640 for n=29).79523- X4 0.21 (X3 + 0.59 x. Those for (Xh X2).Sl (the critical point = O.ûå5)0.005)°.¡0.91137 0. Bivariate Normality: the X2 plots are (X31 X4) appear reasonably straight.005)°.65 4.26 Transformation *: significant at 5 % level (the critical point = 0.94482X-o.49 *: significant at 5 % level - XO. (Xh X3).94526- 0.99057 - 0.005)°.S 0. given in the next page. 4.95585(X3 + 0.91574X¡3.95039- X3 0.

[Figure: chi-square plots for (X1,X2), (X1,X3), (X1,X4), (X2,X3), (X2,X4) and (X3,X4)]

98124 X6 0. it is obvious that observation #25 is a multivariate outlier.34 Mar. it is virtually unchanged (.99011 Xs 0.95162- X2 X3 X4 0. 4.924* rQ I .897* * significant at the 5% level.98421 0.974 for n = 41 From the chi-square plot (see below). Moreover.991 . critical point = .35 Marginal normality: & (MachDir) X. Chi-square Plot 3S :l 25 :! 15 Chi-square Plot without observation 25 10 6 10 12 2 4 6 B 10 12 . the chi-square plot is considerably more "straight line like" and it is difficult to reject a hypothesis of multivariate normality.ginal Normality: Xl rQ.992) for machine direction and cross direction (.9591 for n==25).979 for density.97209 0. rQ increases to .99404 *: significant at 5 % level (the critical point == 0. If this observation is removed.926).(CrossDir) Xl (Density) .67- 4. Bivariate Normalitv: Omitted.i . 0.

37 Marginal normality: 100m 200m 400m 800m rQ I . Notice the values of rQ decrease with increasing distance. times measured in meters/second for the various distances are more nearly marginally normal than times measured in seconds or minutes (see Exercise 4.859* * significant at the 5% level.978 for n = S4 As measured by rQ.947* . There are several multivariate outliers. As the distance increases.968* 1500m 3000m Marathon . the distribution of times becomes increasingly skewed to the right.989 .983 .909* .985 . .866* .978 for n = 54 Notice how the values of rQ decrease with increasing distance. critical point = .976* .36 Marginal normality: 100m 200m 400m 800m rQ I . The chi-square plot is not consistent with multivariate normality. There are several multivariate outliers.36). 4.984 . critical point = .68 4. The chi-square plot is not consistent with multivariate normality.952* 1500m 3000m Marathon . In this case.969* .921* * significant at the 5% level. as the distance increases the distribution of times becomes increasingly skewed to the left.929* .

4.38. Marginal and multivariate normality of bull data

[Figure: Normality of Bull Data. Normal Q-Q plots (horizontal axis: quantiles of standard normal) annotated r = 0.9916 (normal), r = 0.9847 (normal), r = 0.9934 (normal), r = 0.9631 (not normal), r = 0.9376 (not normal) and r = 0.9956 (normal), together with a chi-square plot of the ordered distances against qchisq.]

5665 72 11.9254 -0.0080 Ordered dsq qchisq 51 52 53 54 55 56 57 58 59 6 . 0936 71 10.3793 2.2896 83.2524 6. 3669 17 3.Q plot correlation coeffcient test for normality is 0. 1276 7 . 6693 6 . 7748 8 .2891 5 .9605 5. 8587 5.6937 67 9 . 5279 2 .5743 2.0560 2.9273 11.4744 13. 8806 35 4.321$ 12.6524 9. 7983 3.6491 6. 9980 100 .2.4411 3 . 8439 6.5224 995.6254 10.3870 1.6485 3.4049 20 3. 7554 7.1011 3 .4599 4.6333 6.6060 6 .9917 68 10.57'60 6.7832 2.7062 63 8 .8108 6680. 7751 1.0534 -1.5044 -1.4197 66 9 .6829 70 10. 3982 2.70 XBAR S FtFrBody 100.2975 60 61 8 .6287 8.3973 9.9831 82.9839.5044 -0 . 2244 4.7225 75 17.4130 147.6681 3 .4130 4.9078 4. 1896 5 . 3048 3. 0873 8 . 1763 7 .0677 76 21.3830 30 4.7940 9.2828 4 .4142 83 .8449 5 .2851 29 4.8108 129. 7603 4.1986 5 .6917 -0. 2679 13 3.430 1 7.2895 1 2 3 4 5 6 7 8 2 .5906 2.5587 9 10 11 12 3 .1875 28 3 .6725 2.3721 18 3. 3088 YrHgt 5-0.6149 15.7470 1. 1305 Ordered dsq qchisq 1 .3189 3.3215 6. 1876 5. 8667 4.9971 3. 3989 9.1213 4.1430 Ordered dsq qchisq 26 3. 2949 5 .or the Q . because some marginals are not normaL.1286 1 .2896 16850.5801 32 4 .7395 3.9401 82.6795 33 4.5099 5 . 1129 5 .9254 2. 7007 3 . 2763 6 .8160 74 12. 8957 3.0506 SaleHt SaleWt 2. 7797 34 4.3518 5 .7551 2. 0855 5. 1657 65 9.05. with a = 0. 3088 3. 6097 3 .4142 -0.6027 22 23 24 25 3. 7236 3.5808 2.4472 5 .5512 1 . 0495 2.05 and n = 76.9286 64 8.1305 8594.6197 5.7754 6.9281 7. 5041 21 3 .5938 6.4017 5 .4812 31 4.5453 3 . 3004 5.9401 6680 .7081 o .6751 6. 9600 209. We reject the hypothesis of multivariate normality at a = 0.1085 7 . 1405 7 .8912 2.6748 6 .881-6 0.8168 7 . 9863 7 . 3396 0.3982 10.1263 73 11. 1263 1555. 7313 5 .0189 2.8649 From Table 4.0783 5. 1445 4.0506 2.4024 5 .1911 2.5751 17.2522 4 .0534 209. .6430 8 .3006 12.2021 1 . 9118 2.9474 70. the critical point f.5896 7 .9836 6 .6958 10.4577 7 .6618 PrctFFB BkFat 2 .0413 4 . 3264 6.2766 14 3.0180 147.8618 4 .1967 54.0902 27 3 . 9600 -0.1430 -0.3439 2 .9831 129.8037 11.5816 8. 
7604 2.9826 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 4.3191 69 10. 4963 62 8 .3115 15 3.9929 3. 7762 1.4141 19 3 . 2036 3 .3470 16 3 .

Plot is straight with the exception of observation #60. rQ = . normality is rejected at the 5% level for leadership.e..991 * significant at the 5% level. ".998 and we cannot reject normality at the 5% leveL. leader 15 .. Chi-square plot for indep. benev.990 for n = 130 (b) The chi-square plot is shown below.71 4. supp. du)A2 5 .984* .997 . critical point = .. . 1 = 0... . conform.993 .. o o 2 4 6 8 10 12 14 16 18 q(u-. 10 .997 rQ I . . .5). ~& . If leadership is transformed by taking the square root (i. Certainly if this observation is deleted would be hard to argue against multivariate normality.39 (a) Marginal normality: independence support benevolence conformity leadership .5)/130) (c) Using the rQ statistic.

The Q-Q plot for the transformd observations is given below.9389. square root) makes the size observations more nearly normaL.u.960 after transformation.01/.. logarithm) makes the visitor observations more nearly normaL. 10 -1 1 (c) The power transformation ~ = 0 (i.5 (i. The 5% critical point with n = 1"5 for the hypothesis of normality is ..9389.975 after transformation.. .. rQ = . . Great Smoky park is an outlier...' . cO "Visitors 5 6 8 7 9 (b) The power transformation -l = 0.837 before transformation and rQ == .e.t .0': 500 o -.72 4. rQ = .904 before transformation and rQ = . The Q-Q plot for the transformed observations is . The 5% critical point with n = 15 for the hypothesis of normality is . o G.40 (a) Scatterplot is shown below.e.given next.

(d) A chi-square plot for the transformed observations is shown below. Given the small sample size (n = 15), the plot is reasonably straight and it would be hard to reject bivariate normality.

[Figure: chi-square plot for the transformed observations; horizontal axis: chi-square quantiles]

4.41 (a) Scatterplot is shown below. There do not appear to be any outliers with the possible exception of observation #21.

[Figure: scatterplot]

(b) The power transformation $\hat{\lambda}$ = 0 (i.e. logarithm) makes the duration observations more nearly normal. $r_Q$ = .958 before transformation and $r_Q$ = .989 after transformation. The 5% critical point with n = 25 for the hypothesis of normality is .9591. The Q-Q plot for the transformed observations is given below.

[Figure: Q-Q Plot for Natural Log of Duration; horizontal axis q(i)]

(c) The power transformation $\hat{\lambda}$ = -0.5 (i.e. reciprocal of square root) makes the man/machine time observations more nearly normal. $r_Q$ = .939 before transformation and $r_Q$ = .991 after transformation. The 5% critical point with n = 25 for the hypothesis of normality is .9591. The Q-Q plot for the transformed observations is given next.

[Figure: Q-Q plot for Reciprocal of Square Root of Man/Machine Time; horizontal axis q(i)]

(d) A chi-square plot for the transformed observations is shown below. The plot is straight and it would be difficult to reject bivariate normality.

[Figure: chi-square plot for transformed snow removal data; horizontal axis: chi-square quantiles]

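Chapter 5

5.1 a) $\bar{x} = \begin{pmatrix} 6 \\ 10 \end{pmatrix}$, $S = \begin{pmatrix} 8 & -10/3 \\ -10/3 & 2 \end{pmatrix}$, $T^2 = 150/11 = 13.64$

b) $T^2$ is $3F_{2,2}$ (see (5-5))

c) $H_0: \mu' = [7, 11]$; $\alpha$ = .05 so $F_{2,2}(.05)$ = 19.00. Since $T^2 = 13.64 < 3F_{2,2}(.05) = 3(19) = 57$, we do not reject $H_0$ at the $\alpha$ = .05 level.

5.3 a) $T^2 = \frac{(n-1)\left|\sum_{j=1}^{n}(x_j - \mu_0)(x_j - \mu_0)'\right|}{\left|\sum_{j=1}^{n}(x_j - \bar{x})(x_j - \bar{x})'\right|} - (n-1) = \frac{3(244)}{44} - 3 = 13.64$

b) $\Lambda = \left(\frac{\left|\sum_{j=1}^{n}(x_j - \bar{x})(x_j - \bar{x})'\right|}{\left|\sum_{j=1}^{n}(x_j - \mu_0)(x_j - \mu_0)'\right|}\right)^{n/2} = \left(\frac{44}{244}\right)^2 = .0325$

Wilks' lambda $= \Lambda^{2/n} = \Lambda^{1/2} = \sqrt{.0325} = .1803$

5.5 $H_0: \mu' = [.55, .60]$; $T^2$ = 1.17; $\alpha$ = .05; $F_{2,40}(.05)$ = 3.23. Since $T^2 = 1.17 < \frac{2(41)}{40}F_{2,40}(.05) = 2.05(3.23) = 6.62$, we do not reject $H_0$ at the $\alpha$ = .05 level. The result is consistent with the 95% confidence ellipse for $\mu$ pictured in Figure 5.1 since $\mu' = [.55, .60]$ is inside the ellipse.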
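The value $T^2 = 150/11$ in 5.1 can be reproduced exactly from $\bar{x}$, $S$ and $\mu_0$. A minimal sketch for the bivariate case, assuming n = 4 and the quantities stated in the solution; the function name `hotelling_t2` is hypothetical.

```python
from fractions import Fraction as F

def hotelling_t2(n, xbar, S, mu0):
    """T^2 = n (xbar - mu0)' S^{-1} (xbar - mu0) for p = 2, using the
    explicit inverse of a symmetric 2x2 matrix."""
    d1, d2 = xbar[0] - mu0[0], xbar[1] - mu0[1]
    (a, b), (_, d) = S
    det = a * d - b * b
    return n * (d * d1 * d1 - 2 * b * d1 * d2 + a * d2 * d2) / det

S = ((F(8), F(-10, 3)), (F(-10, 3), F(2)))
t2 = hotelling_t2(4, (F(6), F(10)), S, (F(7), F(11)))
critical = 3 * 19.00            # 3 F_{2,2}(.05), with F_{2,2}(.05) = 19.00
print(t2, round(float(t2), 2), "reject H0:", float(t2) > critical)
```

Working in fractions avoids any rounding in the inverse of $S$.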

5.8 $a = S^{-1}(\bar{x} - \mu_0) = \begin{pmatrix} 227.273 & -181.818 \\ -181.818 & 212.121 \end{pmatrix}\left(\begin{pmatrix} .564 \\ .603 \end{pmatrix} - \begin{pmatrix} .55 \\ .60 \end{pmatrix}\right) = \begin{pmatrix} 2.636 \\ -1.909 \end{pmatrix}$

$t^2 = \frac{n(a'(\bar{x} - \mu_0))^2}{a'Sa} = 1.31 = T^2$

5.9 a) Large sample 95% $T^2$ simultaneous confidence intervals:

Weight: (69.55, 121.48)          Girth: (83.49, 103.29)
Body length: (152.17, 176.59)    Head length: (16.55, 19.41)
Neck: (49.61, 61.77)             Head width: (29.04, 33.22)

b) 95% confidence region determined by all $\mu_1, \mu_4$ such that

$(95.52 - \mu_1, \; 93.39 - \mu_4)\begin{pmatrix} .002799 & -.006927 \\ -.006927 & .019248 \end{pmatrix}\begin{pmatrix} 95.52 - \mu_1 \\ 93.39 - \mu_4 \end{pmatrix} \le 12.59/61 = .2064$

Beginning at the center $\bar{x}' = (95.52, 93.39)$, the axes of the 95% confidence ellipsoid are:

major axis $\pm\sqrt{3695.52}\sqrt{.2064}\begin{pmatrix} .939 \\ .343 \end{pmatrix}$
minor axis $\pm\sqrt{45.92}\sqrt{.2064}\begin{pmatrix} -.343 \\ .939 \end{pmatrix}$

(See confidence ellipsoid in part d.)

c) Bonferroni 95% simultaneous confidence intervals (m = 6): $t_{60}(.025/6)$ = 2.728 (Alternative multiplier is $z(.025/6)$ = 2.638)

Weight: (75.56, 115.48)          Girth: (86.27, 100.51)
Body length: (155.00, 173.76)    Head length: (16.88, 19.08)
Neck: (51.01, 60.37)             Head width: (29.52, 32.74)

d) Because of the high positive correlation between weight ($X_1$) and girth ($X_4$), the 95% confidence ellipse is smaller, more informative, than the 95% Bonferroni rectangle.

5.9 (Continued)

[Figure: large sample 95% confidence regions (confidence ellipse with simultaneous $T^2$ and Bonferroni rectangles); horizontal axis x1, vertical axis x4]

e) Bonferroni 95% simultaneous confidence interval for difference between mean head width and mean head length ($\mu_6 - \mu_5$) follows. (m = 7 to allow for new statement and statements about individual means): $t_{60}(.025/7)$ = 2.783 (Alternative multiplier is $z(.025/7)$ = 2.690)

$\bar{x}_6 - \bar{x}_5 \pm t_{60}(.0036)\sqrt{\frac{s_{66} - 2s_{56} + s_{55}}{n}} = (31.13 - 17.98) \pm 2.783\sqrt{\frac{21.26 - 2(13.88) + 9.95}{61}}$

or $12.49 \le \mu_6 - \mu_5 \le 13.81$

37.69.) d) Bonferroni 95% simultaneous confdence intervals (m = 7): Lngt: (137.91) b) 95% T.24) ~Lngt-4: (-22.00.55) i1Lngt3-4: (-3.95) Lngt3: (127.93) Lngt4: (160.~~.10 a) 95% T simultanous confidence intervals: Lngt: (13D.14) Lngth3: (144.. 179.895) :tv157. 174.70.37. 149.96 .33) .u4-S where .55.65. the axes of ellpsoid are: maior axis .4).447 mior axis :t .25.14. the 95% confidence Beginnng at the center x' = (16. ( .58) Lngt5: (155.Oll024 . 191.53.42) ~Lngth4-5: (-20. 187. 33. .8 72.009386 .95.96/7=10.. 53.69) c) 95% confidenceregon determined by all tl2-3.24. 15.~72.J72.025135 4 .43..895 447) (See confidence ellpsoid in par e.33.18. and tl4-S is the mean increase in length from year 4 to 5. 30. 16-tl2_3.tl4-S such that . 198.43) i1Lngth4-5: (-7. 28.simultaneous intervals for change in lengt (ALngt): ~Lngth2-3: (-21.4-tl4_s ( i.u2-3 is the mean increase in length from year 2 to 3.009386J(16.u2-3) .97) .21) Lngt4: (167.J33.42 .79 5.6Lngth2-3: (-1..96.40) Lngth5: (166. 155. 185. 50.

.. .. ......... . ... J.. ....'80 -5. . I 0C\ I I I I I I I I I I I . . .... . . ..0 :: 0 0.... .. .10 (Continued) e) The Bonferroni 95% confidence rectangle is much smaller and more informative than the 95% confidence ellpse... ... . : I i : I I I --~--------.. . . I . . . . . 95% confidence regions. .. ... 0C\ .----------------~ ...... . vI .. ....... ..... .. ... . ... I . . . .. . ..~. .. ..2-3 40 ..... -20 o 20 J.. . .. I : I .. . ....... .... . o "l simultaneous T"2 o C" Bonferroni ... ... . .

5.11 a) $\bar{x} = \begin{pmatrix} 5.1856 \\ 16.0700 \end{pmatrix}$, $S = \begin{pmatrix} 176.0042 & 287.2412 \\ 287.2412 & 527.8493 \end{pmatrix}$, $S^{-1} = \begin{pmatrix} .0508 & -.0276 \\ -.0276 & .0169 \end{pmatrix}$

Eigenvalues and eigenvectors of S:

$\lambda_1 = 688.759$, $e_1' = (.49, .87)$
$\lambda_2 = 15.094$, $e_2' = (.87, -.49)$

$\frac{(n-1)p}{(n-p)}F_{p,n-p}(.10) = \frac{16}{7}F_{2,7}(.10) = \frac{16}{7}(3.26) = 7.45$

[Figure: Confidence Region; horizontal axis x1 (Cr), vertical axis x2]

b) 90% $T^2$ intervals for the full data set:

Cr: (-6.88, 17.25)
Sr: (-4.83, 36.97)

$[.30, 10]'$ is a plausible value for $\mu$.

5.11 (Continued)

c) Q-Q plots for the marginal distributions of both variables

[Figure: Q-Q plot for Cr] Since r = 0.627 we reject the hypothesis of normality for this variable at $\alpha$ = 0.01.

[Figure: Q-Q plot for Sr] Since r = 0.818 we reject the hypothesis of normality for this variable at $\alpha$ = 0.01.

d) With data point (40.53, 73.68) removed,

$\bar{x} = \begin{pmatrix} .7675 \\ 8.8688 \end{pmatrix}$, $S = \begin{pmatrix} .3786 & 1.0303 \\ 1.0303 & 69.8598 \end{pmatrix}$, $S^{-1} = \begin{pmatrix} 2.7518 & -.0406 \\ -.0406 & .0149 \end{pmatrix}$

$\frac{(n-1)p}{(n-p)}F_{p,n-p}(.10) = \frac{7(2)}{6}F_{2,6}(.10) = \frac{14}{6}(3.46) = 8.07$

90% $T^2$ intervals:

Cr: (.15, 1.39)
Sr: (.47, 17.27)

5.12 Initial estimates are

$\tilde{\mu} = \begin{pmatrix} 4 \\ 6 \\ 2 \end{pmatrix}$, $\tilde{\Sigma} = \begin{pmatrix} 0.5 & 0.0 & 0.5 \\ 0.0 & 2.0 & 0.0 \\ 0.5 & 0.0 & 1.5 \end{pmatrix}$

The first revised estimates are

$\tilde{\mu} = \begin{pmatrix} 4.0833 \\ 6.0000 \\ 2.2500 \end{pmatrix}$, $\tilde{\Sigma} = \begin{pmatrix} 0.6042 & 0.1667 & 0.8125 \\ 0.1667 & 2.500 & 0.0 \\ 0.8125 & 0.0 & 1.9375 \end{pmatrix}$

5.13 The $\chi^2$ distribution with 3 degrees of freedom.

5.14 Length of one-at-a-time t-interval / Length of Bonferroni interval $= t_{n-1}(\alpha/2)/t_{n-1}(\alpha/2m)$.

               m
   n       2        4        10
  15    0.8546   0.7489   0.6449
  25    0.8632   0.7644   0.6678
  50    0.8691   0.7749   0.6836
 100    0.8718   0.7799   0.6911
  ∞     0.8745   0.7847   0.6983

5.15 (a) $E(X_{ij}) = (1)p_i + (0)(1 - p_i) = p_i$.
$\mathrm{Var}(X_{ij}) = (1 - p_i)^2p_i + (0 - p_i)^2(1 - p_i) = p_i(1 - p_i)$

(b) $\mathrm{Cov}(X_{ij}, X_{kj}) = E(X_{ij}X_{kj}) - E(X_{ij})E(X_{kj}) = 0 - p_ip_k = -p_ip_k$.

5.16 (a) Using $\hat{p}_j \pm \sqrt{\chi^2_4(0.05)}\sqrt{\hat{p}_j(1 - \hat{p}_j)/n}$, the 95% confidence intervals for $p_1, p_2, p_3, p_4, p_5$ are (0.221, 0.370), (0.258, 0.412), (0.098, 0.217), (0.029, 0.112), (0.084, 0.198) respectively.

(b) Using $\hat{p}_1 - \hat{p}_2 \pm \sqrt{\chi^2_4(0.05)}\sqrt{(\hat{p}_1(1 - \hat{p}_1) + \hat{p}_2(1 - \hat{p}_2) + 2\hat{p}_1\hat{p}_2)/n}$, the 95% confidence interval for $p_1 - p_2$ is (-0.118, 0.0394). There is no significant difference in two proportions.

5.17 $\hat{p}_1$ = 0.585, $\hat{p}_2$ = 0.310, $\hat{p}_3$ = 0.105. Using $\hat{p}_j \pm \sqrt{\chi^2_3(0.05)}\sqrt{\hat{p}_j(1 - \hat{p}_j)/n}$, the 95% confidence intervals for $p_1, p_2, p_3$ are (0.488, 0.682), (0.219, 0.401), (0.044, 0.166), respectively.
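The intervals in 5.17 follow mechanically from the formula $\hat{p}_j \pm \sqrt{\chi^2_3(0.05)}\sqrt{\hat{p}_j(1 - \hat{p}_j)/n}$. The sketch below is illustrative and makes two assumptions not stated in the solution: the sample size is taken to be n = 200, and the critical value 7.815 is the tabled upper 5% point of the chi-square distribution with 3 d.f. The function name `simultaneous_intervals` is hypothetical.

```python
import math

CHI2_3_05 = 7.815   # tabled upper 5% point, chi-square with 3 d.f. (assumed)
N = 200             # assumed sample size for Exercise 5.17

def simultaneous_intervals(p_hat, n, chi2_crit):
    """Large-sample simultaneous intervals for multinomial proportions."""
    k = math.sqrt(chi2_crit)
    out = []
    for p in p_hat:
        half = k * math.sqrt(p * (1.0 - p) / n)
        out.append((round(p - half, 3), round(p + half, 3)))
    return out

intervals = simultaneous_intervals([0.585, 0.310, 0.105], N, CHI2_3_05)
for j, (lo, hi) in enumerate(intervals, start=1):
    print(f"p{j}: ({lo:.3f}, {hi:.3f})")
```

Swapping in a different critical value (for example a Bonferroni normal multiplier squared) changes only the constant `chi2_crit`.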

5.18 (a) Hotelling's $T^2$ = 223.31. The critical point for the statistic ($\alpha$ = 0.05) is 8.33. We reject $H_0: \mu = (500, 50, 30)'$. That is, the group of students represented by scores are significantly different from average college students.

(b) The lengths of three axes are 23.730, 2.473, 1.183. And directions of corresponding axes are

$\begin{pmatrix} 0.994 \\ 0.103 \\ 0.038 \end{pmatrix}$, $\begin{pmatrix} -0.104 \\ 0.995 \\ 0.006 \end{pmatrix}$, $\begin{pmatrix} -0.037 \\ -0.010 \\ 0.999 \end{pmatrix}$.

(c) Data look fairly normal.

[Figure: normal score (Q-Q) plots for X1, X2 and X3, and pairwise scatter plots of X1, X2 and X3]

5.19 a) The summary statistics are:

n = 30, $\bar{x} = \begin{pmatrix} 1860.50 \\ 8354.13 \end{pmatrix}$ and $S = \begin{pmatrix} 124055.17 & 361621.03 \\ 361621.03 & 3486330.90 \end{pmatrix}$

where S has eigenvalues and eigenvectors

$\lambda_1 = 3407292$, $e_1' = (.105740, .994394)$
$\lambda_2 = 82748$, $e_2' = (.994394, -.105740)$

Then, since $\frac{1}{n}\frac{p(n-1)}{(n-p)}F_{p,n-p}(\alpha) = \frac{1}{30}\frac{2(29)}{28}F_{2,28}(.05) = .2306$, a 95% confidence region for $\mu$ is given by the set of $\mu$:

$(1860.50 - \mu_1, \; 8354.13 - \mu_2)\begin{pmatrix} 124055.17 & 361621.03 \\ 361621.03 & 3486330.90 \end{pmatrix}^{-1}\begin{pmatrix} 1860.50 - \mu_1 \\ 8354.13 - \mu_2 \end{pmatrix} \le .2306$

The half lengths of the axes of this ellipse are $\sqrt{.2306}\sqrt{\lambda_1} = 886.4$ and $\sqrt{.2306}\sqrt{\lambda_2} = 138.1$. Therefore the ellipse has the form

[Figure: 95% confidence ellipse for $\mu$; horizontal axis $\mu_1$, vertical axis $\mu_2$]

b) Since $\mu_0 = (2000, 10000)'$ does not fall within the 95% confidence ellipse, we would reject the hypothesis $H_0: \mu = \mu_0$ at the 5% level. Thus, the data analyzed are not consistent with these values.

c) The Q-Q plots for both stiffness and bending strength (see below) show that the marginal normality is not seriously violated. Also, the correlation coefficients for the test of normality are .989 and .990 respectively so that we fail to reject even at the 1% significance level. Finally, the scatter diagram (see below) does not indicate departure from bivariate normality. So, the bivariate normal distribution is a plausible probability model for these data.

[Figure: Q-Q Plot - Bending Strength X2]

[Figure: Q-Q Plot - Stiffness X1]

[Figure: Scatter Diagram of X1 and X2]

5.20 (a) Yes, they are plausible since the hypothesized vector $\mu_0$ (denoted as * in the plot) is inside the 95% confidence region.

[Figure: 95% Simultaneous Confidence Region for Mean Vector]

(b)
                       LOWER     UPPER
Simultaneous C.I.:    189.422   197.823
                      274.256   285.299
Bonferroni C.I.:      189.822   197.423
                      274.782   284.774

Simultaneous confidence intervals are larger than Bonferroni's confidence intervals. Simultaneous confidence intervals will touch the simultaneous confidence region from outside.

(c) Q-Q plots suggest non-normality of (X1, X2). Could try transforming X1.

[Figure: Q-Q PLOT FOR X1; Q-Q PLOT FOR X2; scatter plot of X1 and X2]

5.21 HOTELLING T SQUARE = 9.0218    P-VALUE 0.3616

        N     MEAN      STDEV      T2 INTERVAL       BONFERRONI
x1     25   0.84380   0.11402    .742 TO  .946     .778 TO  .909
x2     25   0.81832   0.10685    .723 TO  .914     .757 TO  .880
x3     25   1.79268   0.28347   1.540 TO 2.046    1.629 TO 1.956
x4     25   1.73484   0.26360   1.499 TO 1.970    1.583 TO 1.887
x5     25   0.70440   0.10756    .608 TO  .800     .642 TO  .766
x6     25   0.69384   0.10295    .602 TO  .786     .635 TO  .753

The Bonferroni intervals use t(.00417) = 2.88 and the T2 intervals use the constant 4.465.
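Both sets of intervals in 5.21 have the form mean $\pm$ multiplier $\times$ STDEV$/\sqrt{N}$; only the multiplier differs. A minimal sketch using the summary statistics and the two constants quoted in the solution (2.88 and 4.465); the function name `interval` is hypothetical.

```python
import math

def interval(mean, sd, n, multiplier):
    """mean +/- multiplier * sd / sqrt(n), rounded to three decimals."""
    half = multiplier * sd / math.sqrt(n)
    return round(mean - half, 3), round(mean + half, 3)

T2_CONST, BONF_CONST = 4.465, 2.88   # constants quoted in the solution
stats = {                            # variable: (mean, stdev), N = 25 each
    "x1": (0.84380, 0.11402), "x2": (0.81832, 0.10685),
    "x3": (1.79268, 0.28347), "x4": (1.73484, 0.26360),
    "x5": (0.70440, 0.10756), "x6": (0.69384, 0.10295),
}
for name, (mean, sd) in stats.items():
    print(name, interval(mean, sd, 25, T2_CONST), interval(mean, sd, 25, BONF_CONST))
```

The ratio of the two multipliers, 2.88/4.465, gives the relative length of the Bonferroni intervals for every variable.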

5.22 (a) After eliminating outliers, the approximation to normality is improved.

[Figure: Q-Q plots for X1, X2 and X3 and pairwise scatter plots, for the full data set]

[Figure: Q-Q plots for X1, X2 and X3 and pairwise scatter plots, with outliers removed]

i.76 Simultaneous confidence intervals are larger than ßonferroni's confidence intervals.44 9. (b) Full data set: Bonferroni C.34 Simul taneous C.24 8.23 8.24 10.63 5.33 5.: 9.25 4.82 12.19 12.55 8.67 12.16 5.09 12.edi~ LOWER UPPER Bonferroni c.: Simultaneous C.: 9. i. I.65 12.87 9.41 13.72 8.87 . I.: Lower Upper 9.92 l.78 10.96 11.21 15.79 15. Outliers remov.

5.23 a) The data appear to be multivariate normal as shown by the "straightness" of the Q-Q plots and chi-square plot below.

[Figure: normal-score Q-Q plots for the four skull measurements (panels NScMB, NScBH, NScBL, NScNH) and chi-square plot]

5.23 (Continued)

b) Bonferroni 95% simultaneous confidence intervals (m = p = 4): $t_{29}(.05/8) = 2.663$

MaxBrth:  (128.87, 133.87)
BasHgth:  (131.42, 135.78)
BasLngth: (96.32, 102.02)
NasHgth:  (49.17, 51.89)

95% $T^2$ simultaneous confidence intervals: $\sqrt{\frac{4(29)}{26}F_{4,26}(.05)} = 3.496$

MaxBrth:  (128.08, 134.66)
BasHgth:  (130.73, 136.47)
BasLngth: (95.43, 102.91)
NasHgth:  (48.75, 52.31)

The Bonferroni intervals are slightly shorter than the $T^2$ intervals.

5.24 Individual X charts for the Madison, Wisconsin, Police Department data

             xbar        s        LCL      UCL
LegalOT      3557.8     606.5     1738     5377
ExtraOT      1478.4    1182.8    -2070     5026    use LCL = 0
Holdover     2676.9    1207.7     -946     6300    use LCL = 0
COA         13563.6    1303.2     9654    17473
MeetOT        800.0     474.0     -622     2222    use LCL = 0

[Figure: The XBAR chart for x3 = holdover hours; horizontal axis: Observation Number]

[Figure: The XBAR chart for x4 = COA hours; horizontal axis: Observation Number]

Both holdover and COA hours are stable and in control.
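The limits in the 5.24 table are consistent with the usual three-sigma rule for individual observations. The sketch below assumes LCL = xbar - 3s and UCL = xbar + 3s (the solution does not print the formula) and applies the "use LCL = 0" convention when the lower limit would be negative. The function name `control_limits` is hypothetical.

```python
# (xbar, s) pairs copied from the 5.24 table
STATS = {
    "LegalOT": (3557.8, 606.5),
    "ExtraOT": (1478.4, 1182.8),
    "Holdover": (2676.9, 1207.7),
    "COA": (13563.6, 1303.2),
    "MeetOT": (800.0, 474.0),
}


def control_limits(xbar, s, floor_at_zero=True):
    """Three-sigma limits (assumed rule); negative LCL is replaced by 0."""
    lcl = xbar - 3.0 * s
    ucl = xbar + 3.0 * s
    if floor_at_zero and lcl < 0:
        lcl = 0.0
    return lcl, ucl


for name, (xbar, s) in STATS.items():
    lcl, ucl = control_limits(xbar, s)
    print(f"{name:8s}  LCL = {lcl:8.1f}  UCL = {ucl:8.1f}")
```

Overtime hours cannot be negative, so flooring the lower limit at zero loses nothing: only the upper limit can signal for ExtraOT, Holdover and MeetOT.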

5.25 Quality ellipse and T2 chart for the holdover and COA overtime hours.

The quality control 95% ellipse is

$1.37 \times 10^{-6}(x_3 - 2677)^2 + 1.18 \times 10^{-6}(x_4 - 13564)^2 + 1.80 \times 10^{-6}(x_3 - 2677)(x_4 - 13564) = 5.99$

[Figure: The quality control 95% ellipse for holdover hours and COA hours; axes: Holdover Hours, COA Hours]

All points are in control.

[Figure: The 95% Tsq chart for holdover hours and COA hours; UCL = 5.991]

5.26 T2 chart using the data on x1 = legal appearances overtime hours, x2 = extraordinary event overtime hours, and x3 = holdover overtime hours.

[Figure: The 99% Tsq chart based on x1, x2 and x3]

All points are in control.

5.27 The 95% prediction ellipse for x3 = holdover hours and x4 = COA hours is

$1.37 \times 10^{-6}(x_3 - 2677)^2 + 1.18 \times 10^{-6}(x_4 - 13564)^2 + 1.80 \times 10^{-6}(x_3 - 2677)(x_4 - 13564) = 8.51$

[Figure: The 95% control ellipse for future holdover hours and COA hours; axes: Holdover Hours, COA Hours]
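The ellipses in 5.25 and 5.27 share the same quadratic form and differ only in the right-hand constant. The sketch below evaluates that form and reproduces the prediction constant 8.51 from the standard multiplier (n^2 - 1)p / (n(n - p)) times an F critical value. Two inputs are assumptions that are not printed in the solution: n = 16 (read from the observation-number axis of the charts) and the table value F_{2,14}(.05) = 3.74. The helper names are illustrative.

```python
CENTER_X3 = 2677.0   # holdover hours center used in the ellipse equation
CENTER_X4 = 13564.0  # COA hours center used in the ellipse equation


def ellipse_value(x3, x4):
    """Left-hand side of the ellipse equation quoted in 5.25 and 5.27."""
    d3 = x3 - CENTER_X3
    d4 = x4 - CENTER_X4
    return 1.37e-6 * d3 ** 2 + 1.18e-6 * d4 ** 2 + 1.80e-6 * d3 * d4


def prediction_constant(n, p, f_crit):
    """Multiplier for a future-observation ellipse: (n^2-1)p/(n(n-p)) * F."""
    return (n * n - 1) * p / (n * (n - p)) * f_crit


N_OBS = 16        # assumption: number of observations shown on the charts
P_VARS = 2
F_2_14_05 = 3.74  # assumption: tabled F_{2,14}(.05)

QUALITY_LIMIT = 5.99  # constant quoted in 5.25
PREDICTION_LIMIT = prediction_constant(N_OBS, P_VARS, F_2_14_05)


def in_control(x3, x4, limit=QUALITY_LIMIT):
    """True when the point falls inside the ellipse with the given constant."""
    return ellipse_value(x3, x4) <= limit


print(round(PREDICTION_LIMIT, 2))
print(in_control(3000.0, 14000.0))
```

The prediction ellipse is larger than the quality control ellipse because it also accounts for the sampling error in the estimated mean and covariance matrix.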

065 . 0000 0.0924 .0197 .0268 .0000 X2 X3 X4 O.0343 0.0078 .0031 -.0210 0.0146 .28 (a) x= -.0049 .3428 .0066 .39 and 40 exceed the upper control The individual variables that contribute significantly to the out of control data points are indicated in the table below.QO 0.0198 0.36. 0000 o .0221 .0032 .98 5.01 0.0197 .0083 . Point Variable P-Value Grea ter Than UCL 20 33 Xl X2 X3 X4 X5 X6 X4 X6 O. 000'0 0.0105 0. 0088 \). 0000 \) .0146 -.0049 .032 s= The fl char follows.00.0000 0.0078 .0211 . (b) Multivariate observations 20.0155 -.0114 0.0268 -.0008 .0066 .0088 O.1446 . OO.0-013 .0228 .0221 .0474 .0.0008 . limit.0616 .062 .0228 .0001 0.1086 .0000 36 Xl 39 X2 X3 X4 X2 X4 X5 X6 40 XL 0.0366 -.0054 o .698 .506 .0616 .0211 -.0083 -.0155 .0474 . 33.207" -.0626 .0031 .

5.29 $T^2 = 12.472$. Since $T^2 = 12.472 < \frac{29(6)}{24}F_{6,24}(.05) = 7.25(2.51) = 18.2$, we do not reject $H_0: \mu = 0$ at the 5% level.

5.30 (a) Large sample 95% Bonferroni intervals for the indicated means follow. Multiplier is $t_{49}(.05/2(6)) \approx z(.0042) = 2.635$.

Petroleum: $.766 \pm 2.635(.9251/\sqrt{50}) = .766 \pm .345 \rightarrow (.421, 1.111)$
Natural Gas: $.508 \pm 2.635(.753/\sqrt{50}) = .508 \pm .282 \rightarrow (.226, .790)$
Coal: $.438 \pm 2.635(.414/\sqrt{50}) = .438 \pm .155 \rightarrow (.283, .593)$
Nuclear: $.161 \pm 2.635(.207/\sqrt{50}) = .161 \pm .076 \rightarrow (.085, .237)$
Total: $1.873 \pm 2.635(1.978/\sqrt{50}) = 1.873 \pm .738 \rightarrow (1.135, 2.611)$
Petroleum - Natural Gas: $.258 \pm 2.635(.392/\sqrt{50}) = .258 \pm .146 \rightarrow (.112, .404)$

(b) Large sample 95% simultaneous $T^2$ intervals for the indicated means follow. Multiplier is $\sqrt{\chi^2_4(.05)} = \sqrt{9.49} = 3.081$.

Petroleum: $.766 \pm 3.081(.9251/\sqrt{50}) = .766 \pm .404 \rightarrow (.362, 1.170)$
Natural Gas: $.508 \pm 3.081(.753/\sqrt{50}) = .508 \pm .330 \rightarrow (.178, .838)$
Coal: $.438 \pm 3.081(.414/\sqrt{50}) = .438 \pm .182 \rightarrow (.256, .620)$
Nuclear: $.161 \pm 3.081(.207/\sqrt{50}) = .161 \pm .089 \rightarrow (.072, .250)$
Total: $1.873 \pm 3.081(1.978/\sqrt{50}) = 1.873 \pm .863 \rightarrow (1.010, 2.736)$
Petroleum - Natural Gas: $.258 \pm 3.081(.392/\sqrt{50}) = .258 \pm .171 \rightarrow (.087, .429)$

Since the multiplier, 3.081, for the 95% simultaneous $T^2$ intervals is larger than the multiplier, 2.635, for the Bonferroni intervals and everything else for a given interval is the same, the $T^2$ intervals will be wider than the Bonferroni intervals.

5.31 (a) The power transformation $\lambda = 0$ (i.e. logarithm) makes the duration observations more nearly normal. The power transformation $\lambda = -0.5$ (i.e. reciprocal of square root) makes the man/machine time observations more nearly normal. (See Exercise 4.41.) For the transformed observations, say $y_1 = \ln x_1$, $y_2 = 1/\sqrt{x_2}$ where $x_1$ is duration and $x_2$ is man/machine time,

$\bar{y} = \begin{bmatrix} 2.171 \\ .240 \end{bmatrix}$, $S = \begin{bmatrix} .1513 & -.0058 \\ -.0058 & .0018 \end{bmatrix}$, $S^{-1} = \begin{bmatrix} 7.524 & 23.905 \\ 23.905 & 624.527 \end{bmatrix}$

The eigenvalues for $S$ are $\lambda_1 = .15153$, $\lambda_2 = .00160$ with corresponding eigenvectors $e_1 = (.99925, -.03866)'$, $e_2 = (.03866, .99925)'$. Beginning at center $\bar{y}$, the axes of the 95% confidence ellipsoid are

major axis: $\pm\sqrt{\lambda_1}\sqrt{\frac{2(24)}{25(23)}F_{2,23}(.05)}\;e_1 = \pm .208\,e_1$

minor axis: $\pm\sqrt{\lambda_2}\sqrt{\frac{2(24)}{25(23)}F_{2,23}(.05)}\;e_2 = \pm .021\,e_2$

The ratio of the lengths of the major and minor axes, $.416/.042 = 9.9$, indicates the confidence ellipse is elongated in the $e_1$ direction.

(b) $t_{24}(.05/2(2)) = t_{24}(.0125) = 2.391$, so the 95% confidence intervals for the two component means (of the transformed observations) are:

$\bar{y}_1 \pm t_{24}(.0125)\sqrt{s_{11}/n} = 2.171 \pm 2.391\sqrt{.1513/25} = 2.171 \pm .186 \rightarrow (1.985, 2.357)$

$\bar{y}_2 \pm t_{24}(.0125)\sqrt{s_{22}/n} = .240 \pm 2.391\sqrt{.0018/25} = .240 \pm .020 \rightarrow (.220, .260)$
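The eigenvalues and axis half-lengths in 5.31 (a) can be checked with the closed-form eigenvalues of a 2 x 2 symmetric matrix. This is a minimal sketch using the printed S with n = 25 and p = 2; the value 3.42 for F_{2,23}(.05) is an assumed table value and is not printed in the solution. The function name `eigenvalues_2x2` is illustrative.

```python
import math

S = [[0.1513, -0.0058],
     [-0.0058, 0.0018]]  # sample covariance matrix printed in 5.31
N_OBS = 25
P_VARS = 2
F_2_23_05 = 3.42  # assumption: tabled F_{2,23}(.05)


def eigenvalues_2x2(m):
    """Eigenvalues (largest first) of a symmetric 2x2 matrix."""
    half_trace = (m[0][0] + m[1][1]) / 2.0
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(half_trace ** 2 - det)
    return half_trace + disc, half_trace - disc


lam1, lam2 = eigenvalues_2x2(S)

# Common scale factor sqrt( p(n-1) / (n(n-p)) * F )
scale = math.sqrt(P_VARS * (N_OBS - 1) / (N_OBS * (N_OBS - P_VARS)) * F_2_23_05)

half_major = math.sqrt(lam1) * scale
half_minor = math.sqrt(lam2) * scale

print(f"eigenvalues: {lam1:.5f}, {lam2:.5f}")
print(f"half-lengths: {half_major:.3f}, {half_minor:.3f}")
```

The ratio of the two half-lengths equals sqrt(lam1 / lam2), so the elongation of the ellipse depends only on the eigenvalues of S and not on the chosen confidence level.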

13.Chapter 6 ii. : UPPER -21.27). Yes..3 The 95% Bonferroni intervals are LOWER Bonferroni 'C.333) Ellipse cent~rl!d at r = (-9. After the elimination of the outlier. Bonferroni C.52 3.36. .1 WI Ei9~nvalues andei9~nvectnrs of Sd are: "1 = 449.943) "2 = 168.I. we reject Ho : '6 = O.778. the t.est answers the question: Is ô = 0 ins1tfe the 95i confi- dence e 11 ipse 1 6.50 22.97 Simul taneous 'C. .70 -~ .: -22 .58 units. I.56 -23. 6. -.6338.85 29. 57 -2.: -20 . i.082. 30 -5 .57 units. the difference between pairs became significant. are consistent with this result.egion.!.: Simultaneous C.92 -2. Major and minor axes lie in :1 and !2 d.31) 20. Half length of minor axis is 12.70 UPPER 1.943. 45 -5.25 Simultaneous confidence intervals are larger than Bonferroni's confidence intervals. respetively.r~ctions.73 32. !1 = (.333. UlWER Bonferroni ~.70 Since the hypothesize vector '6 = 0 (denoted as * in the plot) is outside the joint confidenæ r.0l25) = 2.2 Using a critical value tn_i(cr/2p) = tio(O. I. Half length of major axis is 20.08 -3. ~~ = (.

3 0 M 20 U 1 2 10 M U 2 0 2 . ld .1 . 1.: T Simultaneous C.7 _0.95% Simultaneous Conidence Region (or Della Vector 102 ...2 -e..10 -20 -10 -..04 -0.18 -0.& -0. Bonferroni C.' -0.o.' ... I.S .2 -0.69 . -0.4 -1." -1. o..07 0..0 Lower Uoner -1.3 0 MU11-MU21 o 6... -0. 0.4'59... ó =.09 -0..... 0..S -0. Since the critical point with cr Ho : .0 0.05 is 9...: . 1. we reject 0. HoteHing's T2 _ 10..0.4 -0.a .4 (a).' (o~O) .4 -0.3 -I."'1 -it Figure 1: 95% Confidence Ellpse and Diffence 'Simultaneous T2 Interv for the Mea .. .0 -o.8 -0. (b).64 95% Cofidence Slips Ab the Me Vecor ..' 0.215. -0...3 Problem 6.5 -1.10 0.. '.02 -1.1i -1..R -1. ..

(c) The Q-Q plots for ln(DiffBOD) and ln(DiffSS) are shown below. Marginal normality cannot be rejected for either variable. The $\chi^2$ plot is not straight (with at least one apparent bivariate outlier) and, although the sample size (n = 11) is small, it is difficult to argue for bivariate normality.

[Figure: Q-Q Plots for ln(DiffBOD) and ln(DiffSS)]

[Figure: Chi-square Plot of the Ordered Distances]

6.5 a) $H_0: C\mu = 0$ where $C = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \end{bmatrix}$, $\mu' = (\mu_1, \mu_2, \mu_3)$.

$C\bar{x} = \begin{bmatrix} -11.2 \\ 6.9 \end{bmatrix}$, $CSC' = \begin{bmatrix} 55.5 & -32.6 \\ -32.6 & 66.4 \end{bmatrix}$

$T^2 = n(C\bar{x})'(CSC')^{-1}(C\bar{x}) = 90.4$; $n = 40$, $q = 3$

$\frac{(n-1)(q-1)}{n-q+1}F_{q-1,\,n-q+1}(.05) = \frac{39(2)}{38}(3.25) = 6.67$

Since $T^2 = 90.4 > 6.67$ reject $H_0: C\mu = 0$.

b) 95% simultaneous confidence intervals:

$\mu_1 - \mu_2$: $(46.1 - 57.3) \pm \sqrt{6.67}\sqrt{55.5/40} = -11.2 \pm 3.0$
$\mu_2 - \mu_3$: $6.9 \pm 3.3$
$\mu_1 - \mu_3$: $-4.3 \pm 3.1$

The means are all different from one another.
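The repeated-measures statistic in 6.5 a) can be recomputed from the two printed summaries, the contrast means and CSC'. The sketch below uses only those values with n = 40 and q = 3; the helper `quad_form_inverse_2x2` is a hypothetical name. Because the inputs are rounded to one decimal, the result agrees with the printed 90.4 only to about one decimal place.

```python
N_OBS = 40
Q_TREATMENTS = 3
C_XBAR = [-11.2, 6.9]                 # C x-bar from 6.5 a)
CSC = [[55.5, -32.6], [-32.6, 66.4]]  # C S C' from 6.5 a)
F_2_38_05 = 3.25                      # F critical value quoted in 6.5 a)


def quad_form_inverse_2x2(m, v):
    """Compute v' m^{-1} v for a 2x2 matrix m via the explicit inverse."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    inv = [[m[1][1] / det, -m[0][1] / det],
           [-m[1][0] / det, m[0][0] / det]]
    return sum(v[i] * inv[i][j] * v[j] for i in range(2) for j in range(2))


t2 = N_OBS * quad_form_inverse_2x2(CSC, C_XBAR)
critical = ((N_OBS - 1) * (Q_TREATMENTS - 1)
            / (N_OBS - Q_TREATMENTS + 1) * F_2_38_05)
reject = t2 > critical

print(f"T2 = {t2:.2f}, critical value = {critical:.2f}, reject H0: {reject}")
```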

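6.6 a) Treatment 2: Sample mean vector $\begin{bmatrix} 2 \\ 4 \end{bmatrix}$; sample covariance matrix $\begin{bmatrix} 1 & -3/2 \\ -3/2 & 3 \end{bmatrix}$

Treatment 3: Sample mean vector $\begin{bmatrix} 3 \\ 2 \end{bmatrix}$; sample covariance matrix $\begin{bmatrix} 2 & -4/3 \\ -4/3 & 4/3 \end{bmatrix}$

$S_{pooled} = \begin{bmatrix} 1.6 & -1.4 \\ -1.4 & 2 \end{bmatrix}$

b) $T^2 = [2-3,\; 4-2]\left[\left(\frac{1}{3} + \frac{1}{4}\right)S_{pooled}\right]^{-1}\begin{bmatrix} 2-3 \\ 4-2 \end{bmatrix} = 3.88$

$\frac{(n_1+n_2-2)p}{n_1+n_2-p-1}F_{p,\,n_1+n_2-p-1}(.01) = \frac{(5)2}{4}(18) = 45$

Since $T^2 = 3.88 < 45$ do not reject $H_0: \mu_2 - \mu_3 = 0$ at the $\alpha = .01$ level.

c) 99% simultaneous confidence intervals:

$\mu_{21} - \mu_{31}$: $(2-3) \pm \sqrt{45}\sqrt{\left(\frac{1}{3} + \frac{1}{4}\right)1.6} = -1 \pm 6.5$
$\mu_{22} - \mu_{32}$: $2 \pm 7.2$

6.7 $T^2 = [74.4,\; 201.6]\left[\left(\frac{1}{45} + \frac{1}{55}\right)\begin{bmatrix} 10963.7 & 21505.5 \\ 21505.5 & 63661.3 \end{bmatrix}\right]^{-1}\begin{bmatrix} 74.4 \\ 201.6 \end{bmatrix} = 16.1$

$\frac{(n_1+n_2-2)p}{n_1+n_2-p-1}F_{p,\,n_1+n_2-p-1}(.05) = 6.26$

Since $T^2 = 16.1 > 6.26$ reject $H_0: \mu_1 - \mu_2 = 0$ at the $\alpha = .05$ level.

$\hat{a} \propto S_{pooled}^{-1}(\bar{x}_1 - \bar{x}_2) = \begin{bmatrix} .0017 \\ .0026 \end{bmatrix}$

6.8 a) For first variable:

observation      =   mean          +   treatment effect   +   residual
6 5 8 4 7            4 4 4 4 4          2  2  2  2  2          0 -1  2 -2  1
3 1 2                4 4 4             -2 -2 -2                1 -1  0
2 5 3 2              4 4 4 4           -1 -1 -1 -1            -1  2  0 -1

SSobs = 246     SSmean = 192     SStr = 36     SSres = 18

For second variable:

observation      =   mean          +   treatment effect   +   residual
7 9 6 9 9            5 5 5 5 5          3  3  3  3  3         -1  1 -2  1  1
3 6 3                5 5 5             -1 -1 -1               -1  2 -1
3 1 1 3              5 5 5 5           -3 -3 -3 -3             1 -1 -1  1

SSobs = 402     SSmean = 300     SStr = 84     SSres = 18

Cross product contributions: 275 = 240 + 48 - 13

b) MANOVA table:

Source of Variation     SSP                                                              d.f.
Treatment               $B = \begin{bmatrix} 36 & 48 \\ 48 & 84 \end{bmatrix}$           3 - 1 = 2
Residual                $W = \begin{bmatrix} 18 & -13 \\ -13 & 18 \end{bmatrix}$         5 + 3 + 4 - 3 = 9
Total (corrected)       $\begin{bmatrix} 54 & 35 \\ 35 & 102 \end{bmatrix}$              11

c) $\Lambda^* = \frac{|W|}{|B+W|} = \frac{155}{4283} = .0362$

Using Table 6.3 with p = 2 and g = 3,

$\left(\frac{1-\sqrt{\Lambda^*}}{\sqrt{\Lambda^*}}\right)\left(\frac{\sum n_\ell - g - 1}{g - 1}\right) = \left(\frac{1-\sqrt{.0362}}{\sqrt{.0362}}\right)\left(\frac{12-3-1}{3-1}\right) = 17.02$

Since $17.02 > F_{4,16}(.01) = 4.77$ we conclude that treatment differences exist at the $\alpha = .01$ level.

Alternatively, using Bartlett's procedure,

$-\left(n - 1 - \frac{p+g}{2}\right)\ln\Lambda^* = -\left(12 - 1 - \frac{5}{2}\right)\ln(.0362) = 28.209$

Since $28.209 > \chi^2_4(.01) = 13.28$ we again conclude treatment differences exist at the $\alpha = .01$ level.

6.9 For any matrix $C$,

$\bar{d} = \frac{1}{n}\sum_j d_j = C\left(\frac{1}{n}\sum_j x_j\right) = C\bar{x}$ and $d_j - \bar{d} = C(x_j - \bar{x})$,

so

$S_d = \frac{1}{n-1}\sum_j (d_j - \bar{d})(d_j - \bar{d})' = C\left(\frac{1}{n-1}\sum_j (x_j - \bar{x})(x_j - \bar{x})'\right)C' = CSC'$

6.10 $(\bar{x}\mathbf{1})'[(\bar{x}_1 - \bar{x})u_1 + \cdots + (\bar{x}_g - \bar{x})u_g] = \bar{x}[(\bar{x}_1 - \bar{x})n_1 + \cdots + (\bar{x}_g - \bar{x})n_g] = \bar{x}[n_1\bar{x}_1 + \cdots + n_g\bar{x}_g - \bar{x}(n_1 + \cdots + n_g)] = \bar{x}[(n_1 + \cdots + n_g)\bar{x} - \bar{x}(n_1 + \cdots + n_g)] = 0$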
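The one-way MANOVA quantities in 6.8 (B, W, Wilks' lambda and the two test statistics) can be recomputed directly from the observations. The sketch below assumes the twelve bivariate observations are those shown in the decomposition of 6.8 a), paired by position; the helper names are illustrative. It uses the unrounded lambda = 155/4283, so the Bartlett statistic differs slightly in the third decimal from the printed 28.209, which was computed from the rounded value .0362.

```python
import math

# (x1, x2) pairs per treatment, as read from the decomposition in 6.8 a)
GROUPS = [
    [(6, 7), (5, 9), (8, 6), (4, 9), (7, 9)],
    [(3, 3), (1, 6), (2, 3)],
    [(2, 3), (5, 1), (3, 1), (2, 3)],
]
P_VARS = 2


def mean_vec(rows):
    """Componentwise mean of a list of observation tuples."""
    return [sum(r[k] for r in rows) / len(rows) for k in range(P_VARS)]


def add_outer(m, d, weight=1.0):
    """Accumulate weight * d d' into the matrix m (in place)."""
    for i in range(P_VARS):
        for j in range(P_VARS):
            m[i][j] += weight * d[i] * d[j]


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


all_rows = [r for g in GROUPS for r in g]
grand = mean_vec(all_rows)

B = [[0.0, 0.0], [0.0, 0.0]]  # treatment (between) SSP matrix
W = [[0.0, 0.0], [0.0, 0.0]]  # residual (within) SSP matrix
for g in GROUPS:
    gm = mean_vec(g)
    add_outer(B, [gm[k] - grand[k] for k in range(P_VARS)], weight=len(g))
    for r in g:
        add_outer(W, [r[k] - gm[k] for k in range(P_VARS)])

T = [[B[i][j] + W[i][j] for j in range(P_VARS)] for i in range(P_VARS)]
wilks = det2(W) / det2(T)

n_total, n_groups = len(all_rows), len(GROUPS)
root = math.sqrt(wilks)
# Exact F form for p = 2 (Table 6.3) and Bartlett's chi-square approximation
f_stat = (1 - root) / root * (n_total - n_groups - 1) / (n_groups - 1)
bartlett = -(n_total - 1 - (P_VARS + n_groups) / 2) * math.log(wilks)

print(B, W)
print(f"Wilks = {wilks:.4f}, F = {f_stat:.2f}, Bartlett = {bartlett:.2f}")
```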

6.11 $L(\mu_1, \mu_2, \Sigma) = L(\mu_1, \Sigma)L(\mu_2, \Sigma)$

$= \frac{1}{(2\pi)^{(n_1+n_2)p/2}|\Sigma|^{(n_1+n_2)/2}}\exp\left\{-\frac{1}{2}\left[\mathrm{tr}\,\Sigma^{-1}\big((n_1-1)S_1 + (n_2-1)S_2\big) + n_1(\bar{x}_1-\mu_1)'\Sigma^{-1}(\bar{x}_1-\mu_1) + n_2(\bar{x}_2-\mu_2)'\Sigma^{-1}(\bar{x}_2-\mu_2)\right]\right\}$

using (4-16) and (4-17). The likelihood is maximized with respect to $\mu_1$ and $\mu_2$ at $\hat{\mu}_1 = \bar{x}_1$ and $\hat{\mu}_2 = \bar{x}_2$ respectively and with respect to $\Sigma$ at

$\hat{\Sigma} = \frac{1}{n_1+n_2}\big[(n_1-1)S_1 + (n_2-1)S_2\big] = \frac{n_1+n_2-2}{n_1+n_2}S_{pooled}$

(For the maximization with respect to $\Sigma$ see Result 4.10 with $b = \frac{n_1+n_2}{2}$ and $B = (n_1-1)S_1 + (n_2-1)S_2$.)

6.13 a) and b) For first variable:

Observation    =   mean        +   factor 1 effect   +   factor 2 effect   +   residual
 6  4  8  2        1 1 1 1          4  4  4  4            1 -2  4 -3            0  1 -1  0
 3 -3  4 -4        1 1 1 1         -1 -1 -1 -1            1 -2  4 -3            2 -1  0 -1
-3 -4  3 -4        1 1 1 1         -3 -3 -3 -3            1 -2  4 -3           -2  0  1  1

SStot = 220     SSmean = 12     SSfac 1 = 104     SSfac 2 = 90     SSres = 14

For second variable:

Observation    =   mean        +   factor 1 effect   +   factor 2 effect   +   residual
 8  6 12  6        3 3 3 3          5  5  5  5            3 -2  1 -2           -3  0  3  0
 8  2  3  3        3 3 3 3          1  1  1  1            3 -2  1 -2            1  0 -2  1
 2 -5 -3 -6        3 3 3 3         -6 -6 -6 -6            3 -2  1 -2            2  0 -1 -1

SStot = 440     SSmean = 108     SSfac 1 = 248     SSfac 2 = 54     SSres = 30

Sum of cross products:

SCPtot = SCPmean + SCPfac 1 + SCPfac 2 + SCPres
227 = 36 + 148 + 51 - 8

c) MANOVA table:

Source of Variation     SSP                                                              d.f.
Factor 1                $\begin{bmatrix} 104 & 148 \\ 148 & 248 \end{bmatrix}$           g - 1 = 3 - 1 = 2
Factor 2                $\begin{bmatrix} 90 & 51 \\ 51 & 54 \end{bmatrix}$               b - 1 = 4 - 1 = 3
Residual                $\begin{bmatrix} 14 & -8 \\ -8 & 30 \end{bmatrix}$               (g - 1)(b - 1) = 6
Total (Corrected)       $\begin{bmatrix} 208 & 191 \\ 191 & 332 \end{bmatrix}$           gb - 1 = 11

d) We reject $H_0: \tau_1 = \tau_2 = \tau_3 = 0$ at the $\alpha = .05$ level since

$-\left[(g-1)(b-1) - \frac{p+1-(g-1)}{2}\right]\ln\Lambda^* = -\left[6 - \frac{3-2}{2}\right]\ln\left(\frac{|SSP_{res}|}{|SSP_{fac\,1} + SSP_{res}|}\right) = -5.5\ln\left(\frac{356}{13204}\right) = 19.87 > \chi^2_4(.05) = 9.49$

and conclude there are factor 1 effects.

We also reject $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ at the $\alpha = .05$ level since

$-\left[(g-1)(b-1) - \frac{p+1-(b-1)}{2}\right]\ln\Lambda^* = -\left[6 - \frac{3-3}{2}\right]\ln\left(\frac{|SSP_{res}|}{|SSP_{fac\,2} + SSP_{res}|}\right) = -6\ln\left(\frac{356}{6887}\right) = 17.77 > \chi^2_6(.05) = 12.59$

and conclude there are factor 2 effects.

6.14 b) MANOVA Table:

Source of Variation     SSP                                                              d.f.
Factor 1                $\begin{bmatrix} 496 & 184 \\ 184 & 208 \end{bmatrix}$           2
Factor 2                $\begin{bmatrix} 36 & 24 \\ 24 & 36 \end{bmatrix}$               3
Interaction             $\begin{bmatrix} 32 & 0 \\ 0 & 44 \end{bmatrix}$                 6
Residual                $\begin{bmatrix} 312 & -84 \\ -84 & 400 \end{bmatrix}$           12
Total (Corrected)       $\begin{bmatrix} 876 & 124 \\ 124 & 688 \end{bmatrix}$           23

c) Since

$-\left[gb(n-1) - \frac{p+1-(g-1)(b-1)}{2}\right]\ln\Lambda^* = -13.5\ln\left(\frac{|SSP_{res}|}{|SSP_{int} + SSP_{res}|}\right) = -13.5\ln(.808) = 2.88 < \chi^2_{12}(.05) = 21.03$

we do not reject $H_0: \gamma_{11} = \gamma_{12} = \cdots = \gamma_{34} = 0$ (no interaction effects) at the $\alpha = .05$ level.

Since

$-\left[gb(n-1) - \frac{p+1-(g-1)}{2}\right]\ln\Lambda^* = -11.5\ln\left(\frac{|SSP_{res}|}{|SSP_{fac\,1} + SSP_{res}|}\right) = -11.5\ln(.2447) = 16.19 > \chi^2_4(.05) = 9.49$

we reject $H_0: \tau_1 = \tau_2 = \tau_3 = 0$ (no factor 1 effects) at the $\alpha = .05$ level.

Since

$-\left[gb(n-1) - \frac{p+1-(b-1)}{2}\right]\ln\Lambda^* = -12\ln\left(\frac{|SSP_{res}|}{|SSP_{fac\,2} + SSP_{res}|}\right) = -12\ln(.7949) = 2.76 < \chi^2_6(.05) = 12.59$

we do not reject $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ (no factor 2 effects) at the $\alpha = .05$ level.
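The chi-square statistics in 6.13 d) and 6.14 c) all follow one pattern: a multiplier built from the residual degrees of freedom, times the log of a ratio of determinants. The sketch below applies that pattern to the SSP matrices in the two MANOVA tables. The function name `bartlett_statistic` and its argument names are hypothetical, and the residual degrees of freedom are (g-1)(b-1) = 6 for 6.13 and gb(n-1) = 12 for 6.14, as in the tables.

```python
import math


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def add2(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def bartlett_statistic(ssp_effect, ssp_res, resid_df, p, effect_df):
    """-[resid_df - (p + 1 - effect_df)/2] * ln(|SSPres| / |SSPeffect + SSPres|)."""
    wilks = det2(ssp_res) / det2(add2(ssp_effect, ssp_res))
    multiplier = resid_df - (p + 1 - effect_df) / 2
    return -multiplier * math.log(wilks)


P, G, B_LEVELS = 2, 3, 4

# Exercise 6.13 (one observation per cell)
RES_13 = [[14, -8], [-8, 30]]
FAC1_13 = [[104, 148], [148, 248]]
FAC2_13 = [[90, 51], [51, 54]]
resid_df_13 = (G - 1) * (B_LEVELS - 1)
stat_fac1_13 = bartlett_statistic(FAC1_13, RES_13, resid_df_13, P, G - 1)
stat_fac2_13 = bartlett_statistic(FAC2_13, RES_13, resid_df_13, P, B_LEVELS - 1)

# Exercise 6.14 (n = 2 observations per cell)
RES_14 = [[312, -84], [-84, 400]]
INT_14 = [[32, 0], [0, 44]]
resid_df_14 = G * B_LEVELS * (2 - 1)
stat_int_14 = bartlett_statistic(INT_14, RES_14, resid_df_14, P,
                                 (G - 1) * (B_LEVELS - 1))

print(f"6.13 factor 1: {stat_fac1_13:.2f}  (compare with chi-square 9.49)")
print(f"6.13 factor 2: {stat_fac2_13:.2f}  (compare with chi-square 12.59)")
print(f"6.14 interaction: {stat_int_14:.2f}  (compare with chi-square 21.03)")
```

Each statistic is compared with a chi-square critical value on p times the effect degrees of freedom, which gives the 4, 6 and 12 degrees of freedom used in the solutions.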

11 + ll2) .I ( n-1 q-1 ) () .) 9.0 1 05"625 . .54 Since T2 = 254.- r2 = n ( Cx p ( CSC i ) -1 (Cx) = 254. n · '5.3 + 1.51n(. '. with :' = (1 1 -1 -1).05) = 7.5 :: 174.5~n (.(1.n-q+l a I rc =n = 421.1l. .(247. b · 2.81 .7 . Hi: C!: 'I Q where.27(. we reject HO at a = . -1 a a-- 6.5230) :0 9~40. Again we reject "0 at a.1 x = 1749. g.96 :) X: (.5 1509.n_q+1(a) = (3~~it11F3. I~ :t (Il~q+ 1) fHq-1 . a) For "0:!1 . For HO: ~1 = ~2 = ~. Suniary stati stics: 1 906.05 level.static.. S = 94759 87249 94268 1 01 761 761 6ô 81193 91809 90333 1043Z9 . 14*'= . .5 -.1 1725. 59ó) .15 Example "6. A* D . means h.16 H : Cll = 0..3819):2 · 13.54 we reject "a at C1 = .OS level.05 level.05) = 9. versus .4) is. 95i simultaneous confidence interval for -dynamic. c=U 1 -1 o 1 -1~J .'5230 and :~4.~.112 6. ~(~:~ii)l) Fq_l.2 .3819 Since '* -(gb(n-l)-(p+l-(g-l))/2)tn A =-14.7 . These results are consistent with the exact F tests.

6 -7~-280.17 (a) Q) Format ø Words ø Different Same Party Effects Contrast Party main: (¡. reject H 0 : C. -125.J9. Format effect-"words" slower than "Arabic".4) . 75.2 + ¡. T2 31(3) .5 -7(-411.1) 32 Format main effect: -307:t. the format contrast and the interaction contrast.113 Arabic G) 6.3.(#¡ + #3) Format main: (¡.40.40 42.5 -7 (-32.598.4. -186.4:t.3 + #4) .3.(¡¡ + #2) Interaction: (#2 + #3) .40~9.939.8l8.u = 0 (no treatment-effects) at the 5% leveL.1 J S. (c) The M model of numerical cognition is a reasonable population model for the scores. 29 (b) 95% simultaneous T intervals for the contrasts: Party main effect: -206.40 20.93) = 9.9) 32 Interaction effect: 22. (d) The multivarate normal model is a reasonable model for the scoresconesponding to the party contrast.4:t.: -(2.0) 32 No interaction effect.J9.J9. .9. Party effect-"different" resonses slower than "same" responses.(#1 + #4) Contrast matrx: c= -1 -1 (-1 ~1 . ince = 135.

6.18

[Figure: Female turtle and Male turtle panels; Q-Q plots against Quantiles of Standard Normal and, for each sex, a chi-square plot of the ordered distances against qchisq]

9006593 4.2926638 0.7031858 0.1290622 0. 355E-1 0 linear combination most responsible for rej ection of HO has coeffici.0113036 0.07ß8599 0.0541167 0.2217252 0.546415 95% simultaneous CI for the difference in female and male means Bonferroni CI LOWER UPPER 0.0127148 0. 72677 -8.0140655 0.3275751 . 6229089 3.0187388 0.0140655 0.0689451 0.0127148 0.2735714 0.2365537 0.115 mean vector for f~males: mean vector for males: X1BAR SPOOLED X2BAR 4.0165386 0.0577676 0.710687 67.833461 27.0165386 0.8164658 4. 7254436 4.4775738 3.1466248 0.3451377 LOWER UPPeR o .0158563 TSQ CVTSQ F CVF PVALUE 85.ent vector: COEFFVEC . 9402858 4.43.052001 8.118029 2.

5640 .( -2 3.6959 J' (('1 + pool l)S ed )-1 = n1.92 Since T2 =(-:1-)-:2 'n 1+ 1 n2) Spooled :) (n1+n£-2)p . -4.!!Z' = 2 .6857 6543 ( 4.4084 -.0134 12.7731 13.Ol) .590 _ (12.5441 4.8 J . Thereis a diff~rence in the (mean) cost vectors betw.219J 18.) '" _s-l (.01 1 evel.116 6.SS(.01) = 13.8960 '( 15.5750 2.3664 2.55 F3. .8745 -.el trucks.ed -1 . _ (57)(3) (ni+nZ-p-l) Fp.een gaso1 ine trucks and dies.a. 9.19 a) ~1 = 8.t. L--i'0939 -. wax x = -1.ni+n2-p-.ae .5. 2. we reject HO at the a = .0203) . Hi8 223.8112 7.8512 7. 1525 .855026.9066 51 = 17. (( )-1 (. Spooled = 20.113 i :z = (~~:~:J i.3623 .3'621 J.7599 46.HO: lh .-) c: pool.~l -) -:2 = St'. n2 .7458 5.48 .9'633 Sz = 25.

c) 99% simultaneous confidence intervals are:

$\mu_{11} - \mu_{21}$: $2.113 \pm 3.790$
$\mu_{12} - \mu_{22}$: $-2.650 \pm 4.341$
$\mu_{13} - \mu_{23}$: $-8.578 \pm 4.913$

d) Assumption $\Sigma_1 = \Sigma_2$. Since $S_1$ and $S_2$ are quite different, it may not be reasonable to pool. However, using "large sample" theory ($n_1 = 36$, $n_2 = 23$) we have, by Result 6.4,

$(\bar{x}_1 - \bar{x}_2)'\left[\frac{1}{n_1}S_1 + \frac{1}{n_2}S_2\right]^{-1}(\bar{x}_1 - \bar{x}_2) = 43.15$

Since $43.15 > \chi^2_3(.01) = 11.34$ we reject $H_0: \mu_1 - \mu_2 = 0$ at the $\alpha = .01$ level. This is consistent with the result in part (a).

490238 2. .. . 985685 8. 1: Mean Diff. .20 (I) 31. 2. .. ... The most critical linear combination leading to the rejection of Ho has coeffcient vecor (-3. 260 260 L a 240 I 1 2201 m 200i 160 . . .76436 -1.. .: LAELCI Mean Diff. 3392202 RESULT Coefficient Vector: COEF -3.. . 300 280 '" i ngm (b) The output below shows that the analysis does not differ when we delete the observtion 31 or when we consider it equals 184. . . 2: LICIMD LSCIMD -11..161905 -5 .118 6.07955 (Tail difference) (Wing difference) .. Both tests reject the null hypothesis of equal mean difference. . 31 Delete~ T2 C 25... . . . ..07955)' and the the linear combination most responsible for the rejection of Ho is the Tail difference.. Comparing Mean Vectors from Tvo Populations rObS. .490238. .9914645 Reject HO..005014 5.. . . There is mean difference '951. 260 . (c) Results below. simultaneous confidence intervals. . .I. .

.s. . .."' (d) Female birds ai~ g~flerally larger.9914645 Reject HO. 31 = 184. 95% Cofidence Ellips Ab th Me Veda -:61 . There is mean difference 957'simultaneous confidence intervals: LICIMD LABELCI Mean Diff. indicating no significance difference.003431 8..574268 2. ~.1812088 REULT COEF Coefficient Vector: -3.~\ .. 1220203 . 5 .119 ~omparing Mean Vectors from Two Populations 1. 2: LSCIMD -11. since the confidence intervl bounds for difference in Tails (Male .27998 -6. . . :: -eo "f iL\" \ t.e.662531 5.78669 -1. 1: Mean Diif.Female) are negative and the confidence intervl for difference in Wings includes zero.~ .i lObs. T2 C 25...-.

6.21 (a) The (2,2) and (4,4) entries in $S_1$ and $S_2$ differ considerably. However, $n_1 = n_2$ so the large sample approximation amounts to pooling.

(b) $H_0: \mu_1 - \mu_2 = 0$ and $H_1: \mu_1 - \mu_2 \neq 0$

$T^2 = 15.830 > \frac{38(4)}{35}F_{4,35}(.05) = 11.47$

so we reject $H_0$ at the $\alpha = .05$ level.

(c) $\hat{a} \propto S_{pooled}^{-1}(\bar{x}_1 - \bar{x}_2) = \begin{bmatrix} -.24 \\ .16 \\ -3.74 \\ .01 \end{bmatrix}$

00 where 11.3296 XF'.1 = 2.53556 critical value is vp F (.05) 37.pooled.4 + 1 4. whieh apply to the standardi zed variables. Notice the critical value here is only slightly larger than the critical value in (b).5.00 is a critical point corresponding to cr = 0. Firstly this was a self selected sample of volunteers (rrienàs) and is not even a random sample of graduate students.737.600.121 (d) Looking at the coeffic1.37.3152 = 3. we see that X2: long term interest rate has the largest coefficient and therefore might be useful in -classifying a bond as 'lhigh" or "mediumlt quality.16348346 -1.2336 J -1.234.4650835. at the 5% level.1697.647)=11. 1. 38. Further. the .344(4) F (. 6. The coeffcient of the linear combination of most responsible for rejection is (-95.032834. the same conclusion reached in (b). 1.376(2.830 ::11.2 = O.0025.344-4+1 34.8687428 -17.762)'.145. 11.344 so. reject H 0: I! -!J2 = 0.6876 .513 v . Have p = 4 and v = = 37. Therefore.1.i . -0.3383659 (c) \Ve cannot extend the obtained result to the population of persons in their midtwenties. for female mean -male mean: ¡ -0.1548 49.487 ).344 Since T2 = 15.344 . T2 = 15.05)=149.0~5. -5.513.6.ents â1'Sii.v-p+1 37. 4+16 (e) From (b). (b) The 95% simultaneous C.830.3972 J _ jS8' XM _ 5.J.87"60572.3404 The Hotellng's T2 = 96. we reject Ho : J.3136 J ( 0. 0. graduate students are -probably more sedentary than the typical persons of their age. -0.22 (a) The sample means for female and male are: ¡ 0.p + 1 p. .

729 Total B+W = .122 6.294 .07563 NAllOVA Table: Source Trea tment d. (28.820 J B = (11. 9 = 3 (~epal width and petal Width) responses only! . (.104 Since (rni-p-2\ (1 .0471i.37 .4 J S = 3 (. ( 1U70 ~3 = 2.147 695J 90. S =.026 J (2.352 2 4.344 75. SSP -21 .18576 ~l = (3.326 J .1 0368 .64 -17 · .esidua1 .f.F4 .05 level.p = 2.974 . . 232.00474 J x = -2 · 0418 J .125 J R.3 ~ 2.30~ 1 .143£4 -.950 14.09860 . .428 J. S2 = (.ect Ho: !l =!2 =!3 at th~ ci.OS) "Ie rej.292( .64~ A =* ~ -l = 2235. P ~ IA 153.03920 ..IÃ) ~ !.23 n1 = n2 = n3 = 50.081 . W = (16. 149 .

1:t 3.0..8301.03:t 3.£33: 3.LU .9:t 3. Sinceg= 3. Since p-value = P(F:.: 0.r3 = Q at the 5% level implyig there is a time period effect.4(2.901 Any differences over time periods are probably due to changes in maximum breath of skull (Maxrth) and basialveolar lengt of skull (BasLgt). we would just reject the null hypothis Ho :.1O:t3.044.14:t 3. £11 -£21 : -1:t 2.94 1785.84 .£34: .T. 2.94 2153(2-+2-) -..66 .123 6.-1:t 3.+ 2-) -.030 0.36 £24 .24 Wilks' lambda: A* = . .78 87 .78 £23 .0.1:t 3.47 .9:t 2. then these changes might be in maximum breath and basialveolar lengthofskull frm time periods 1 to 3.44 £21 -£31 : . All the simultaneous intervals include O.0.3(2.33:t 2.0.049) = .44 £12 -£22 : 0.36 size over time is marginal.57 'l13 .+ 2-) -.94 1924.57 £22 .94 /840.05/24) = 2.£32 : .2:t 3. If-changes exist.Et BasLgth m = pg(g-I)/2 =12.lO:t2.(90-4-2'(1~) = 2. Fstatistics andp-values for ANOVA's: F p-value MaxBrth: BasHght: BasLgt: 3.629 3.o3:t 2.44 £11 .94 BasBrth BasH.8301 value with 8 and I~8 degrees of freedom.'l31 : -3. Evidence for changes in skull The usual MA~OV A assumptions appear to be satisfied for thse data.30 :t 2.57 £12 -£32 : .78 NasH T" .2 (2.+ 2-) -.1"1 =!2 =.2.£34: .0.0.049 is anF 4 A .36 V 87 30 30 'l14 .1.30:t 2.025 NasHght: O.£23 : 0. 30 30 87 30 30 87 30 30 'l13 .£33: 3.1:t 3. 95% Bonferroni simultaneous intervals: t87(.

58 . b) Parall el ism hypothesis HO: C~l = ~2 with C given by (ó~l).7 we reject Ha at the a.1198 and F = 18. 1 P.6-61). Plots of the husband and wife profiles look similar but seem di sparate for the 1 evel of acompanionat~ lnve' tha t you feel for your partneru.25 Without transftlrming the data.413J 2.014 (CS c1r1 = poo 1 es .685 . c 2 = 8.124 6..FO. 6.33 13 J .98 (. A * =IWI =. .029 J -. 17 i (. C(~l .7 ( see (6-62)).~ level.98..93 There is a clear need for transforming the data to make the hypothesis tenable.) cZ = 8.095 fo r a = .M.167 .) c~ = 8. IB+WI Afer transformation.~2) = -. 144 L.341 11 = 9. A * :: . ~ .05) = 1.58 .26 To test for paralle11 sm.733 .05 level.·. The excess electri~al usage of the test group was much low~r than that of the control group for the 11 A. 6. The s imi 1 ar 9 A.M~ usage for the two groups contradi cts the parallelism hypothesis.0. Since T1 = 19. and 3 P. 05.i 159 and F = 18.947 2.rl.616J 1 . consider H01: C~l = C~2 with C giv~n by (.. . we reject HO at the 11 = .~2) = .674 . .870 CSpool~dCI = (.52. .036 (-. hours.028 .. C (~1 .27 a) .M.

006. 73 -4.764 3.14 -1.565 o . 63 2. torrens mean . torrens mean . 343 43 .039 2.714 L.371 Difference -2. 743 39.767 0.0.101 Linear Combination of most responsible for rejection of Ho: L.55 -0 . S imul taneous C.086 o . I. 0.854.457 42.829 -3.437 o .99 1. torrens 96..59.914 0.80 -6.934 6.t 6. 692 1.639 2.41 -1.96 3. Sample Mean for L. The X2 plots look fairy straight.671 3.J12 = 0 at 5% significance leveL.L.992 6. torrens and L . carteri 99 . 274 0. There is a significant difference in the two species.carteri mean: UPPER LOWER -8 .31 The third and fifth components are most responsible for rejecting Ho. \Ve reject Ho :¡. -~.211 0.008 2.649 1..268.629 9.078 16. carteri mean = 0 is (0.84 -7. .47 1. 637 1. 615 0. .343 Pooled Sample Covariance Matrix: 36.990 14.13 ~ 16.371 -0 .98 -1.573 2.28 T2 = 106.886 -0 . -2.143 -4.151.371 14. for L.314 14. 213 0.657 30.657 9.407 o .914 35.426 2.943 -0.i .971)' 951.229 13.383.514 25 .carteri : L.595 6.16 -0 .000 9. 76 0.571 9.L.125 "' .675 9. 2.187.053 0. -0.

: Simultaneous C.. LOWER -0.01628931 0. BONFERRONI ) - TO -.01056 0. 00017 .0247 -.979 and we fail to reject Ho :/£1 . 0637 -0 .00366259 0.0455 .CHI-SQUARE CHI-SQUARE PLOT FOR Lcarteri fOR L.01'30 .01513 o .0195 .02751 0.3616 9. 0~154159 0.0221 . i.00304801 0.Bonfer~oni STDEV intervals use the T in"tevals use -.0625 0..02548 o . The critical point is 9.0445 . (b). S XBAR o .0436 .0072 -0. i.~285 -.00602526 IIotellng's T2 = 5. '0 10 ~ ~ 0 51 .0294 Bonferroni C.: UPPER o .04817 0.0228 o .torr~ns PLOT '5 15 ä! '" :.00304801 0. 0596 0. w a: '" :.0493 .0130 -.0286 0. (e).1235 o .0128 -0.00325 -0..30 HOTELLING T SQUARE - P-VALUE 0 .00482862 O.0505 -0.29 (a).0679 .0123 0.0079 -0.0'030 .516.0385 o .0057 -0.0535 -.1385 6.00154159 0.Jl2 = 0 at 5% significance leveL.0218 T2 INTERVAL xl x2 x3 x4 xS xli The N 24 24 24 24 24 24 MEAN 0.89 and 126 .0701 -.0443 -.1020 -.00012 -0."0278 2.0275 .03074 0.04689 TO .0434 .946.(H)417 4. 0566 0.0430 t the constan-t ( .0283 .1030 0.0876 .:i 0 5 5 '5 '0 5 25 20 15 10 5 o 20 25 o-SOARE O-SOUARE 6.0333 -.05784 Summary Statistics: 0.00482862 0.

0011 6 0.01244417 1.375"67504 18. 38824348 11.6878 30. 995 XL X2 XL 0.012 .205 I3 76.4125 ( Loco.3"65 XL 104.38824348 11.835 352.101666"67 Manova Test Criteria and F Approximations for the Hypothesis of no Overall FACTOR2 Effect H = Type III SS&CP Matrix for FACTOR2 E = Error SSkCP Matrix S=2 M=O N=l Statistic Wilks' Lambda Pillai's Trace Hoteiiing-La~iey Trace Roy's Greatest Root Value 0.. 1843 11.655 284.t~) XL 196. (Vat'~e.6575 162..7008333333 -10.105 121.~675 760.0019 5 0.0675 X3 7.89348380 Wilks' Lambda Pillai ' s Trace Hoteiiing-La~iey Trace Roy's Greatest Root F Num DF 3 3 3 3 11. 22 X3 1'07.1825 1089.365 X3 76.6275 414.6675 3153 .78583333 254.6275 X2 365.10166£67 363.695 1~7 .+~ot') X3 7. 7~583333 254 .0205 o .127 6. 95166"667 8 0.1291666667 -108.015 414.48 49.70910921 10.10651620 0..0055 9 .655 X3 42.4125 72.187"61127 10.31 (8) Two-factor MANOVA of peanuts data E = Error SSkCP Matrix XL X2 X2 49.115 365.1843 Den DF Pr ~ F 4 4 4 0.5 Statistic N=l Value 0. 1843 8.6575 H = Type III SS&CP Matrix for FACTORl X2 -10.OQ05 XL X2 X3 H = Type III SSkCP Matrix for FACTOR2.1843 8 .3127 XL X2 F Num DF Den DF fir ~ F 6 6 6 3 10 0.7924 H = Type III SS&CP Matrix for FACTOR1*FACTOR2 XL 12 X3 205.995 94.520833333 Manova Test Criteria and Exact F Statistics for the Hypothesis of no Overall FACTORl Effect H = Type III SSkCP Matrix for FACTORl E = Error SSkCP Matrix S=l M=0.22 ~5 .6191 21.1825 42.48 121.0205 o .12916666"67 -108.0205 4 o .

00 c 1 6 199.30 -9.40 -4.20 0.95 -5.55 b 2 5 185.25 3.30 58./~ .0113 seem large in absolute value.30 52.15 190.10 -1.00 d d e e f f 2 2 1 1 2 2 6 6 8 8 8 8 200.75 170.40 200.0508 10 0..50 160.65 130.Red-lfot Yied .0339 3.40 -7.10 1.50 160.15 -2.- --/ .5582 3.15 a 1 5 194.80 -2.20 49.15 57.55 1. 95 2.45 3.30 -3..25 3. but (b) The residuals for X2 at location 2 for variety 5 Q-Q plots of residuals indicate that univariate normality -cannot be rejected for all three variables.$0 47.55 -1..80 0.15 57. 54429038 Roy l s Createst Root 6.55 1Q3.80 -0.25 1$4.55 c 1 6 199.95 -2.50 66.30 58.30 52.80 -0.25 1"64.05 4.07429984 1.10 Figu i: Q-Q P1ø ./ ~y/-/ /- . 1 PRE2 RES2 PRE3 RES3 CODE FACTOR1 FACTOR2 PRED1 RES a 1 5 194.82409388 Wilks J Lambda F Num OF 3./ .7721 11.75 -0.20 -0.15 b 2 5 185.25 -3.55 161.25 -3.40 7.45 -3.60 47._ """/~ .55 163. /' -- -.0587 6 0.10 200.0655 5 0.95 5.05 -4. --'I -i.128 Manova Test ~riteria and F Approximations for the Hypothesis of no Overall FACTOR1*FACTOR2 Effect H = Type III SS&CP Matrix for FACTOR1*FACTOR2 E = Error SS&CP Matrix S=2 M=O N=1 Statistic Pillai l s Trace Value 0.15 200.15 2.55 161.80 0.75 0.20 49. //"~// .30 3.80 2.29086073 Hotelling-La~iey Trace 7 .3735 6 6 6 3 Den DF Pr) F 8 0.5066.75 170.65 130.40 4.40 190./ *//.30 9.

167433 195..0418 O.0575000 205.' . *Evidence of variety effect and.'" .../' '// .. 794111 2. .7008333 98. //. ..3835000 Error 6 1 04 .Residual for Sound Mature Kernels /' . 2: Q-Q Plot . and X2 = sound mature kernel.A .." Figure 3: Q_Q Plot .:..9175000 80.65 5.~.. 7008333 196 ...0750 OF 1 2 2 Type III 55 Mean Square o .129 .1_ //.4/../ /... a location Dependent Variable : yield..04 5.63 0. .. for Xl = yield variety interaction.136324 4. ~.... Sum of OF Squares Mean Square Model 5 401 .".... 1225000 Source Corrected Total Source location variety location*variety F R-Square Coeff Var Root MSE yield Mean O./ //' // /' .'" -iI a-.8474 0._ (c) Univariate two factor ANOV As follow./+ /~/ ./ y'/' / ../ / / /' .. 038~ .3675000 11 506 . . 11 50000 0./' ./ //..'/' ...// '/' ..//- . Figure ..."'/. 2050000 17...//.0446 Value Pr ~ F 0... /.. 1016667 102. .Reidual for Seed Size .90 0...../ ..~~. _10...5"508333 F Value Pr ~ F 4. . .//' /~/"....-.".' ...

1016667 85.695000 162. 1476 Value Pr :. 8058333 11 537 .130 Dependent Variable: sdmatker Sum of OF ~quares Mean Square F Value Pr :.92 0. 0292 Value Pr :.1443 . F Model 5 ~031 .0177 Error 6 352.0157 0.507500 390. 8350000 15 .0146 0.5741667 88.76 9.5148333 Error 6 94 . 823533 7. 105000 58. 30833 Source location variety location*variety OF Type II I SS Mean Square 1 72 .60 o .59 8.0508333 42.015000 780.684167 11 2383 .0300.9516667 F 0. 660559 158.067500 1089.72 o .777500 406 .5250 OF Source location variety location*variety 1 2 2 Type II I SS Mean Square 162. The GLM Procedure Dependent Variable: seedsize Sum of OF Squares Mean Square Model 5 442. F 5.347500 F Value Pr :. 882500 Source Corrected Total A-Square Coeff Var Aoot MSE sdmatker Mean o .975655 55 .4091667 Source Corrected Total F A-Square Coeff Var Aoot MSE seedsize Mean o . 832398 7 . 852298 4 . F 4.067500 544.65 O. 5208333 72. 355500 6. F 2.9758333 2 2 284.188166 3.5208333 142. 0759 0.99 2.28 6.

.95 df= 8 MSE= 38. Q5 Confiden~e= 0.862 1.325 Bonferroni (Dun) T tests for variable: X3 Alpha= 0.300 9.560 -4 .05 level are indicated by '***'.685 -23. 300 -7.ety. 385 -17 . Comparisons si~ificant at the 0. 500 17 .01576 Minimum Significant Difference= 10.8 -19.66333 ~ritica1 Value of T= 3.135 -18. 960 -3 .135 8.S -8.050 -22. 763 .625 19.875 -5 .685 Bonferroni \Dun) T tests for variable: X2 Alpha= 0.95 df= 8 MSE= 141.59833 Critical Value of T= 3.confidence Means Limi t .26 '***' .137 -Comparis.05 level are indicated by Simultaneous Simutaneous Difference Upper Confidence Limit Between Confidence Means Limi t -8 .275 11.01576 Minimum Significant Differen~e= 25.6 Cri tical Value of T= 3.05 Confidence= 0.375 Comparisons significant at the 0.95 dr= 8 MSE= 22. 763 .~ -11 .835 5.8 . 560 4. '500 4.7"62 1'0.131 (d) Bonferroni ~simultaneous comparisoRs of va-ri. 200 47.ons significant at the 0. Only varieties 5 and 8 Bonferroni (Dun) T tests for variable: XL Alpha= O.700 3. and they differ only on X3.960 18.875 17 .875 30 . 875 20 .5 o .512 .11') R Qi:" *** *** .575 Lower F ACTOR2 Compari son 6-8 6-5 8-6 8-5 5-6 5-8 23.037 0. F ACTOR2 Difference Upper Confidence Between Confidence Limit Comparison 8-6 8-5 6-8 6-5 5-8 5-6 Simultaneous Simul taneous Lower Limi t Means -20.7'62 . Simultaneous Lower Comparison Confidence Limit FACTOR2 8 8 6 6 5 i: Simultaneous Difference Upper Between .05 Confidence= 0. 250 -3.01576 Minimum Significant Difference= 13.050 -47. 175 22 .HL 900 _1 "7i: -0 . differ.575 -42.835 3.512 9.575 -30.575 -9 . 412 -21. 325 42.6 -0.175 8.700 -4.900 -9.'037 ."625 21.250 -8.200 -17.0"5 level are indicated by '***'.385 7.
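The "Minimum Significant Difference" values in the Bonferroni output of part (d) follow from the printed critical value and mean squared errors. The sketch below is a check under one assumption that the output does not state explicitly: each variety mean is based on 4 observations (2 locations with 2 replicates each). The function name is hypothetical.

```python
import math


def min_significant_difference(t_crit, mse, n_per_group):
    """Bonferroni half-width for a difference of two group means."""
    return t_crit * math.sqrt(2.0 * mse / n_per_group)


T_CRIT = 3.01576        # "Critical Value of T" from the output (df = 8)
N_PER_VARIETY = 4       # assumption: 2 locations x 2 replicates per variety

# MSE values as printed for X1 (yield), X2 (sdmatker), X3 (seedsize)
mse_by_variable = {"X1": 38.66333, "X2": 141.6, "X3": 22.59833}
msd = {name: min_significant_difference(T_CRIT, mse, N_PER_VARIETY)
       for name, mse in mse_by_variable.items()}
for name, value in msd.items():
    print(f"{name}: minimum significant difference = {value:.3f}")
```

Two variety means differ significantly at the simultaneous 5% level only when their difference exceeds this half-width in absolute value.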

6.32 (a) MANOVA for Species: Wilks' lambda $\Lambda_1^* = .00823$
F = 5.011, p-value = P(F > 5.011) = .173
$F_{4,2}(.05) = 19.25$
Do not reject $H_0$: No species effects

MANOVA for Nutrient: Wilks' lambda $\Lambda_2^* = .31599$
F = 1.082, p-value = P(F > 1.082) = .562
$F_{2,1}(.05) = 199.5$
Do not reject $H_0$: No nutrient effects

(b) Minitab output for the two-way ANOVAs:

Analysis of Variance for 560CM

Source      DF    SS        MS        F        P
Spec         2    47.476    23.738    10.06    0.090
Nutrient     1     8.260     8.260     3.50    0.202
Error        2     4.722     2.361
Total        5    60.458

Analysis of Variance for 720CM

Source      DF    SS         MS         F        P
Spec         2    262.239    131.119    28.82    0.034
Nutrient     1      4.489      4.489     0.99    0.425
Error        2      9.099      4.550
Total        5    275.827

The ANOVA results are mostly consistent with the MANOVA results. The exception is for 720CM where there appears to be Species effects. A look at the data suggests the spectral reflectance of Japanese larch (JL) at 720 nanometers is somewhat larger than the reflectance of the other two species (SS and LP) regardless of nutrient level. This difference is not as apparent at 560 nanometers.

For MANOVA, the value of Wilks' lambda statistic does not indicate Species effects. However, Pillai's trace statistic, 1.6776 with F = 5.203 and p-value = .07, suggests there may be Species effects. (For Nutrient, Wilks' lambda and Pillai's trace statistic give the same F value.) For larger sample sizes, Wilks' lambda and Pillai's trace statistic would give essentially the same result for all factors.
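The F value quoted for Pillai's trace in 6.32 can be reproduced with the usual F approximation for that statistic. The sketch below is an illustration, not the Minitab or SAS code; the function name and argument names are hypothetical, and the inputs are the trace values printed in the solutions (1.6776 for Species in 6.32, 1.29086073 for the FACTOR1*FACTOR2 test in 6.31).

```python
def pillai_f(trace, p, df_hyp, df_err):
    """F approximation for Pillai's trace.

    p      : number of response variables
    df_hyp : hypothesis degrees of freedom
    df_err : error degrees of freedom
    Returns (F, numerator df, denominator df).
    """
    s = min(p, df_hyp)
    m = (abs(p - df_hyp) - 1) / 2.0
    n = (df_err - p - 1) / 2.0
    f = (2 * n + s + 1) / (2 * m + s + 1) * trace / (s - trace)
    return f, s * (2 * m + s + 1), s * (2 * n + s + 1)


# Exercise 6.32, Species: 2 responses, 2 hypothesis df, 2 error df
f_species, df1_species, df2_species = pillai_f(1.6776, p=2, df_hyp=2, df_err=2)
# Exercise 6.31, interaction: 3 responses, 2 hypothesis df, 6 error df
f_inter, df1_inter, df2_inter = pillai_f(1.29086073, p=3, df_hyp=2, df_err=6)
print(f"6.32 Species:     F = {f_species:.3f} on ({df1_species:.0f}, {df2_species:.0f}) df")
print(f"6.31 interaction: F = {f_inter:.4f} on ({df1_inter:.0f}, {df2_inter:.0f}) df")
```

The second call recovers the S=2, M=0, N=1 setting shown in the SAS output for 6.31, which gives some reassurance that the parameterization matches.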

6.33 (a) MANOVA for Species: Wilks' lambda $\Lambda_1^* = .06877$
F = 36.571, p-value = P(F > 36.571) = .000
$F_{4,52}(.05) = 2.55$
Reject $H_0$: No species effects

MANOVA for Time: Wilks' lambda $\Lambda_2^* = .04917$
F = 45.629, p-value = P(F > 45.629) = .000
$F_{4,52}(.05) = 2.55$
Reject $H_0$: No time effects

MANOVA for Species*Time: Wilks' lambda $\Lambda_{12}^* = .08707$
F = 15.528, p-value = P(F > 15.528) = .000
$F_{8,52}(.05) = 2.12$
Reject $H_0$: No interaction effects

(b) A few outliers but, in general, residuals approximately normally distributed (see histograms below). Observations are likely to be positively correlated over time. Observations are not independent.

[Histogram of the Residuals (response is 560nm); Histogram of the Residuals (response is 720nm). Horizontal axis: Residual; vertical axis: Frequency.]

(c) Interaction shows up for the 560nm wavelength but not for the 720nm wavelength. See the Minitab ANOVA output below.

Analysis of Variance for 560nm

Source          DF    SS         MS        F         P
Species          2     965.18    482.59    169.97    0.000
Time             2    1275.25    637.62    224.58    0.000
Species*Time     4     795.81    198.95     70.07    0.000
Error           27      76.66      2.84
Total           35    3112.90

Analysis of Variance for 720nm

Source          DF    SS         MS         F        P
Species          2    2026.85    1013.43    15.46    0.000
Time             2    5573.81    2786.90    42.52    0.000
Species*Time     4     193.55      48.39     0.74    0.574
Error           27    1769.64      65.54
Total           35    9563.85
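With two response variables, Wilks' lambda converts to an exact F statistic, which is how the F values in 6.33(a) relate to the printed lambdas. The following sketch assumes that standard two-response transformation and takes the error degrees of freedom (27) from the ANOVA tables; the function name is hypothetical.

```python
import math


def wilks_f_two_responses(wilks_lambda, df_hyp, df_err):
    """Exact F transformation of Wilks' lambda when there are p = 2 responses.

    Returns (F, numerator df, denominator df).
    """
    root = math.sqrt(wilks_lambda)
    f = (1.0 - root) / root * (df_err - 1) / df_hyp
    return f, 2 * df_hyp, 2 * (df_err - 1)


DF_ERR = 27  # error degrees of freedom in Exercise 6.33
f_species, df1_species, df2_species = wilks_f_two_responses(0.06877, 2, DF_ERR)
f_time, df1_time, df2_time = wilks_f_two_responses(0.04917, 2, DF_ERR)
f_inter, df1_inter, df2_inter = wilks_f_two_responses(0.08707, 4, DF_ERR)
print(f"Species:      F = {f_species:.3f} on ({df1_species}, {df2_species}) df")
print(f"Time:         F = {f_time:.3f} on ({df1_time}, {df2_time}) df")
print(f"Species*Time: F = {f_inter:.3f} on ({df1_inter}, {df2_inter}) df")
```

Small differences in the third decimal place relative to the printed F values come from the lambdas being rounded to five places.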

~OOO 2762. 0000 6. with a = 0.~tp .3788 2271 .9555 71.839 2610.5049 -1.920 2443. (Continued) 134 t d) The data might be analyzed using the growth cure methodology discussed in Section 6.4040 0.7003 80.0936 2443.9818 64.33-.4824 88.009 2343.7500 81. 0000 0 .065 o .544 2277. 120 2531.243 2464.514 2327.01. õ960 88.7867 64.0134 100.2393 1 .013489.912 2660.912 2327.514 2301.9555 71.2875 MLEof beta 71.0l) = 13.5918 74.7328 -0.5890 71.9035 -0.544 2335.749 2369.2393 -5.8907 63.1111 88.5918 75. .7500 95.549 2610.0000 0.3623 74. we reject the null hypothesis of a Iinear fit at a = u.4040 0 .1111 86 .1816 -0.065 1845.11~6 89.1189 86.961 2098.201 Since. fu ths case.8108 80. 5S06 86. 0998 0 .Ol.2511 91.3788 ~.7728 97.0000 ~. The data might also be analyzed asuming species are "nested" within date.920 2464.2745 81.q + g)) 10g(A) = 45.(-1) 93.308 2343.1313 -5.7728 63.34 Fitting a linear gr.28.3623 72.5487 91.1812 71 .3636 78.69.4156 Spooled W = (N-g) *Spooled 95.1106 73.6562 72.7003 80.8108 79.3636 80.72 .: X~4-i-l)2(O..owth curve to calcium measurements on the dominant ulna XBAR Grand mean 72. . 0000 0 .313 25S6.3694 72.0764 72.308 2335.961 2369.438 2821.482486.5487 80.1848 65.5328 Estimated covariance matrix 7.4707 70.8273 73.4733 71.2933 70.818 Lambàa = 1~ I / IWll = 0.1425 88. an interesting question is: Is "Spetral reflctane the same for all species for each date? 6.3496 80.549 2196.3800 .IN .1939 73.749 2756.6. 313 0.5312 (B'Sp~ (-l)B) .5890 98.2745 72. 625 1~45.1425 86..120 2196.1745 97.2948 72.2095 80.438 2271.4441 89.l.0348 80.282 2660.2095 73.5506 81.714 2098.~OO -0.0000 -0 .2$67 Sl S2 92.452 WL 2803.

9965 77. In 1 S pool~d 1= 19. In 1 S¡ 1= 19.0372 2764.2066 0.886 2400.135 6.01) = 6:635.02242 44 54 44+54 6(2+1)(2-1) and C = (1.099 2342.7962 93.(.81.7839 71.(-1) 70.674 3.848 23ß9.175 2101.0760 64.9404 -5.0676 90.93 The chi-square degrees of freedom v =.Hp .1 J(2(4)+3(2)-lJ = .2789 -5 .322 2723. 9323 71.3ß2 2369.90948.0081 78.9033 0. . (.243 2369. In I S2 1= 18.1546 70.099 2277. n¡ = 45.063 2358.3020 -2. XBAR MLE of beta (B 'Sp~ (-l)B) .7962 80.q + 1)) 10g(A) = 7.271 2093.05) = 7. reject Ho : ~¡ = ~2 = ~ at the 5% leveL.36 Here p = 2. with a = V.957 Estimated covariance matrix W2 2857.93.7725 70.9965 77.9783 0 .243 2342.40324)) = 18. Since 2 C = 18.27712 so u =(~+~.410 23ß9. 6.7653 Since.3215 -0.6039 92.0081 78.01.893 ~ XL2_i(0.167 2764. n2 = 55.9783 9.! 2(3)(1) = 3 and z. 8673 -1.522 2394. .9319 2723.80:65 3 .0676 77.0366 2836.1003 0. 894 2333.02242)(98(19. we reject the null hypothesis of a quadratic fit at a = 0.2066 0.894 2314.~39 2101.0799 71.027 2333.0366 75.: Z.522 2889..9033 1.'"25 Lambda = I w I / I W21 = 0. treating all 31 subjects as a single group.522 2316.362 2314.on the dominant ulna.7725 80.070 2093.522 2387 .83.0028 -0.410 2358.90948) -54(18.6548 S W = (n-l) *8 94.OI.6616 78.0799 -2.1894 -0.(n .27712) -44(19.0028 -0.35 Fitting a quadratic growth curve to calcium measurements.674 2387.886 2809.40324.5441 90.~70 2394. 175 2400.05) = 7.027 78.1003 0.
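The arithmetic of the test for equal covariance matrices in 6.36 (Box's M test with the chi-square correction factor) can be sketched as follows. This is only an illustrative check of the numbers quoted in the solution (p = 2, n1 = 45, n2 = 55 and the three log determinants); the function and argument names are hypothetical.

```python
def box_m_test(log_dets, sizes, log_det_pooled, p):
    """Box's M statistic with the large-sample chi-square correction.

    log_dets       : ln|S_l| for each group
    sizes          : group sample sizes n_l
    log_det_pooled : ln|S_pooled|
    p              : number of variables
    Returns (u, C, chi-square degrees of freedom).
    """
    g = len(sizes)
    dfs = [n - 1 for n in sizes]
    total_df = sum(dfs)
    u = ((sum(1.0 / d for d in dfs) - 1.0 / total_df)
         * (2 * p * p + 3 * p - 1) / (6.0 * (p + 1) * (g - 1)))
    m = total_df * log_det_pooled - sum(d * ld for d, ld in zip(dfs, log_dets))
    c = (1.0 - u) * m
    nu = p * (p + 1) * (g - 1) / 2.0
    return u, c, nu


# Exercise 6.36 inputs as quoted in the solution
u, c, nu = box_m_test(log_dets=[19.90948, 18.40324], sizes=[45, 55],
                      log_det_pooled=19.27712, p=2)
print(f"u = {u:.5f}, C = {c:.2f}, chi-square df = {nu:.0f}")
```

The resulting C is compared with the upper 5% point of a chi-square distribution on 3 degrees of freedom, 7.81, as in the solution.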

6.37 Here p = 3, $n_1 = 24$, $n_2 = 24$, $\ln|S_1| = 9.48091$, $\ln|S_2| = 6.67870$, $\ln|S_{pooled}| = 8.62718$, so

$u = \left[\frac{1}{23} + \frac{1}{23} - \frac{1}{23+23}\right]\left[\frac{2(9)+3(3)-1}{6(3+1)(2-1)}\right] = .07065$

and

$C = (1 - .07065)\left(46(8.62718) - 23(9.48091) - 23(6.67870)\right) = 23.40$

The chi-square degrees of freedom $\nu = \frac{1}{2}3(4)(1) = 6$ and $\chi^2_6(.05) = 12.59$. Since $C = 23.40 > \chi^2_6(.05) = 12.59$, reject $H_0: \Sigma_1 = \Sigma_2 = \Sigma$ at the 5% level.

6.38 Working with the transformed data, $X_1$ = vanadium, $X_2 = \sqrt{\text{iron}}$, $X_3 = \sqrt{\text{beryllium}}$, $X_4$ = 1/[saturated hydrocarbons], $X_5$ = aromatic hydrocarbons, we have p = 5, $n_1 = 7$, $n_2 = 11$, $n_3 = 38$, $\ln|S_1| = -17.81620$, $\ln|S_2| = -7.24900$, $\ln|S_3| = -7.09274$, $\ln|S_{pooled}| = -7.11438$, so

$u = \left[\frac{1}{6} + \frac{1}{10} + \frac{1}{37} - \frac{1}{6+10+37}\right]\left[\frac{2(25)+3(5)-1}{6(5+1)(3-1)}\right] = .24429$

and

$C = (1 - .24429)\left(53(-7.11438) - 6(-17.81620) - 10(-7.24900) - 37(-7.09274)\right) = 48.94$

The chi-square degrees of freedom $\nu = \frac{1}{2}5(6)(2) = 30$ and $\chi^2_{30}(.05) = 43.77$. Since $C = 48.94 > \chi^2_{30}(.05) = 43.77$, reject $H_0: \Sigma_1 = \Sigma_2 = \Sigma_3 = \Sigma$ at the 5% level.

6.39 (a) Following Example 6.5, we have $(\bar{x}_F - \bar{x}_M)' = (119.55, 29.97)$,

$\frac{1}{28}S_F + \frac{1}{28}S_M = \begin{bmatrix} 186.148 & 47.705 \\ 47.705 & 14.587 \end{bmatrix}$, $\quad \left[\frac{1}{28}S_F + \frac{1}{28}S_M\right]^{-1} = \begin{bmatrix} .033186 & -.108533 \\ -.108533 & .423508 \end{bmatrix}$

and $T^2 = 76.97$. Since $T^2 = 76.97 > \chi^2_2(.05) = 5.99$, we reject $H_0: \mu_F - \mu_M = 0$ at the 5% level.

(b) With equal sample sizes, the large sample procedure is essentially the same as the procedure based on the pooled covariance matrix.

(c) Here p = 2, $z(.05/2(2)) = z(.0125) = 2.24$, so

$\mu_{F1} - \mu_{M1}$: $119.55 \pm 2.24\sqrt{186.148} \rightarrow (88.99, 150.11)$
$\mu_{F2} - \mu_{M2}$: $29.97 \pm 2.24\sqrt{14.587} \rightarrow (21.41, 38.52)$

Female Anacondas are considerably longer and heavier than males.
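The large-sample statistic and the Bonferroni intervals in 6.39 can be recomputed from the mean difference vector and the 2 x 2 matrix quoted in the solution. The sketch below is an illustrative check only; it assumes the matrix with entries 186.148, 47.705 and 14.587 is the estimated covariance matrix of the difference in sample means, and the function name is hypothetical.

```python
import math


def t2_and_bonferroni(diff, cov, z_crit):
    """T^2 = d' V^{-1} d for a 2 x 2 covariance V, plus Bonferroni intervals."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    inv = ((d / det, -b / det), (-b / det, a / det))
    t2 = sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))
    intervals = []
    for i in range(2):
        half_width = z_crit * math.sqrt(cov[i][i])
        intervals.append((diff[i] - half_width, diff[i] + half_width))
    return t2, intervals


diff = (119.55, 29.97)                          # female minus male means
cov = ((186.148, 47.705), (47.705, 14.587))     # assumed covariance of diff
t2, intervals = t2_and_bonferroni(diff, cov, z_crit=2.24)
print(f"T^2 = {t2:.2f}")
for label, (low, high) in zip(("component 1", "component 2"), intervals):
    print(f"{label}: ({low:.2f}, {high:.2f})")
```

The upper limit for the second component may differ from the printed value in the last decimal place because of rounding in the inputs.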

6.41 Three factors: (Problem) Severity, (Problem) Complexity and (Engineer) Experience, each at two levels. Two responses: Assessment time, Implementation time. MANOVA results for significant (at the 5% level) effects.

Effect                 Wilks' lambda    F        P-value
Severity               .06398            73.1    .000
Complexity             .01852           265.0    .000
Experience             .03694           130.4    .000
Severity*complexity    .33521             9.9    .004

Individual ANOVAs for each of the two responses, Assessment time and Implementation time, show only the same three main effects and two factor interaction as significant with p-values for the appropriate F statistics less than .01 in all cases. We see that both assessment time and implementation time are affected by problem severity, problem complexity and engineer experience as well as the interaction between severity and complexity. Because of the interaction effect, the main effects severity and complexity are not additive and do not have a clear interpretation. For this reason, we do not calculate simultaneous confidence intervals for the magnitudes of the mean differences in times across the two levels of each of these main effects. There is no interaction term associated with experience however. Since there are only two levels of experience, we can calculate ordinary t intervals for the mean difference in assessment time and the mean difference in implementation time for gurus (G) and novices (N). Relevant summary statistics and calculations are given below.

Error sum of squares and crossproducts matrix = $\begin{bmatrix} 2.222 & 1.217 \\ 1.217 & 2.667 \end{bmatrix}$

Error deg. of freedom: 11

Assessment time: $\bar{x}_G = 3.68$, $\bar{x}_N = 5.39$
95% confidence interval for mean difference in experience:
$3.68 - 5.39 \pm 2.201\sqrt{\frac{2.222}{11}\cdot\frac{2}{8}} = -1.71 \pm .49 \rightarrow (-2.20, -1.22)$

Implementation time: $\bar{x}_G = 6.80$, $\bar{x}_N = 10.96$
95% confidence interval for mean difference in experience:
$6.80 - 10.96 \pm 2.201\sqrt{\frac{2.667}{11}\cdot\frac{2}{8}} = -4.16 \pm .54 \rightarrow (-4.70, -3.62)$

The decrease in mean assessment time for gurus relative to novices is estimated to

be between 1.22 and 2.20 hours. Similarly, the decrease in mean implementation time for gurus relative to novices is estimated to be between 3.62 and 4.70 hours.
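The two t intervals for the experience effect in 6.41 can be recomputed directly from the summary statistics. The sketch below is a check, not the original calculation; it assumes 8 observations at each level of experience (consistent with the factor 2/8 in the solution) and uses the tabled value 2.201 for the upper 2.5% point of a t distribution on 11 degrees of freedom. The function name is hypothetical.

```python
import math


def t_interval_for_difference(mean_1, mean_2, error_ss, error_df,
                              n_1, n_2, t_crit):
    """Ordinary t interval for mean_1 - mean_2 using an ANOVA error SS."""
    diff = mean_1 - mean_2
    mse = error_ss / error_df
    half_width = t_crit * math.sqrt(mse * (1.0 / n_1 + 1.0 / n_2))
    return diff - half_width, diff + half_width


T_CRIT = 2.201  # t_11(.025), as used in the solution
assessment = t_interval_for_difference(3.68, 5.39, 2.222, 11, 8, 8, T_CRIT)
implementation = t_interval_for_difference(6.80, 10.96, 2.667, 11, 8, 8, T_CRIT)
print("assessment time, guru - novice:     (%.2f, %.2f)" % assessment)
print("implementation time, guru - novice: (%.2f, %.2f)" % implementation)
```

Both intervals lie entirely below zero, which is why the solution reports a decrease in mean time for gurus relative to novices.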

533 ..726 . 7...ooõl 1.~0 -o... Is = 5. z1 z12 :z ž2 = 5.lOO -'5.. 391 .174 .467 fitted equation: y = -.757.695 -.667 8..200 23.4'67 25 - l2 .. 166 .~ .. y = .716.400 13.6£7) 7.y* = v-1/2y_v-1/2.2£ .23. 000 15 5. 5. 1 Y=LS=_ _ 15 351 199 142 = 12 .716. .1 -1~1 (8::)= 11 (-::) = (~:::J 1 l (120 .000 and y = 12..200 .088 . . Residual sum of squares: :1: = 101.l'y =-- 180 85 123 .292 -1 .000 5.. 81 7 1 . '¡sz z = .2£7 9 3 .400 = 9 13.3J3 8 ..391 -1 .æ ami (n-r-l )02 = Ê*'.l39 Chapter 7 7.Ê* is distributed as X1 n-r-1. 3£3 1.11 .6'67 .2 Standardized variables zl z2 Y . £5.726 -1 ..tlOO.43 + 1.7'57 and IS = 7.33z1 . yy The fi tted equation for the origi na 1 variabl~s is = 1 33 - y _ 1 2 (Zl .283 .7 9zZ Al so.267 13 9 .667 + 1...79 2.667 3.2t7 3.2.o\'1 hint and note that s* = y* .130 fi tted equa ti on: .467 3 .. £: = Y-Y .- = 9 .3 .. .19z2 7. Y = 1 . (Z2 .7Bz1 .2£7 zl 7. . pri or to standardi zi ng the variables.120 -10 ß = (Z'Z). .2.6'ó7.. 1 08 1 .451 .-w Foll..117 . zl = 11 .667.2 1 .5\ .

) cj y-l is diagonal with jth diagonal element l/z~ ~o J n ~W .e.. ~ . . .. if ze-.Api = PAP' = l Z ti ) 8y the hint. .e! = PAP' .--d1ag .. .or . ..140 7.. ution follows from Hi nt.Àr.. + th ...4 ii ) v=I b) V -1 .)/( r z.e~ = PA-P' 1. r Ai -1.= 1 1 ... so J n n ""W . so M . 1 n n so ß. j=l J j=l J â = (zIV-lz)-l :iv-l~ = (L y....ia) pro je. o À1 AA. lZ8 = Z'y.= ¿ ).1) . is the 'Projection t c) .0.ct.j=l J J j=l J is diagonal with jth diagonal -element 1/'1.)/( ¿z~).¡zJ.))/n 7....w = (zlz).Y. J=1 ß = (z'y-1z)-lzIV-ly = (..).= r 1. ..r (YJ.5 So.1_ 1 .A = :J ..z'y = t L z....6 a) irs t nO.0 = - is the Z' (y- . 1.0) F. that In we show . with PP' = P'P = I ..:' e.~ (Z'Z)(Z'Z)-(Z'Z) = PAp1(PA-P')PAP' = PM.o n of y.1+1 . -1 ri + is a generalized inverse of il since .e at A. we check that th~ defining relation holds p -.0 Si nce Z'l = ! ).. 1-1-1 1= ri+1 (Z'Z). = A +1 a . 7. ze- ..

_1_1 The (S11 are r1+l mutually perpendicular unit l~ngth vectors that span the space of all linear combinations of the columns of Z. Hence hij = bj(Z'Z)-lbi ~ 0 where hi is the i th row of z.=1 -1. Let a be an n x i unit vector with j th element 1. (h) Since i . .(q~y) = ( L q. hji ~ 1..two independent X2 random variables divided by their degrees of freedom. Ühe latter follows because the full random vect-or ê is distributed independently of SZ).. = -1-1 i =1 1 I.:r-q A Recall from Result 7.=.H is an idempotent matrix.~. Thé projection of iis then (see R. e.4 that ß =(~ii) = (Z'Z)-lZ'Y is distributed as N +1(ß.(2) - r . Write .q~)y = Z(Z'Z). for _1 1 _1 ri +1 -1 . 7.12) ri+l ri+l ri+l '1-1-1.t) H2 = Z(Z'Z)-l Z'Z(Z'Z)-i Z' = Z(Z'Z)-i Z' = H.. Then 0 ~ a'(l .À. The result follows from the definition of a F random variable as the ratio of . From the Hint. = ¿ q. (~(2'-~(2))'(Cl~~(2'-~(2)) iscl2 and this is distribut~d independently of S2.H)a = (1. . . nition 2A. r 1 +1 .esul t 2A. Then Consider q. ¿'i~:hij = tr(Z(Z'Z)-iZ') = tr((Z'Z)-iZ'Z) =tr(Ir+1) =r+l..hii)' That is.q. )ZI ri +1 Z(Z'Z)-Z' = Z(I.2 and Def.~ 1À.7 and Z = (Zl h J .e. = ~(2) ~ (_~U1J .a2(Z'Z)-1) indepen4ently of nâz = (n-r-l)sZ which is distributed as a2 X~-r-l.. 141 _ -1/2 c) = l. Ze. (Z'Z)-l is posiiÏe definite. On the other hand.Z'y 1= .- I (q!y)q. . it is positive semidefinite.. 7.8 (.=1 q.-1 i=l -1 -1d) See Hint.

.2 0 ". .5 .5 3. .1 SJ ( 53 .9 Z' = (' .5 .( . -13.A A A Y'Y = y'y + tit r 55 J-15 l- 0 ..0 .1 1.5 22 ...2 .1. SJ 24 =. -1 4 2 1 2 3 4. ¿Zi + nzj) 1 (Zj . 1=1 i=i hjj .5 L .-z)2 £Ji=1 we obtain 1 ( ßn ) i_I .(1 Zj)(Z'ZJ-I ( . -2 -1 1 1 :l a (ZIZ)-l '=('0/5 1.9 3.142 (c) Usill (Z'Z)-In£J1=1 ="':1(z' ¡ ¿~I zl -l:~1 . ~(2) = (Z'Z)-lZ'~(2) = ri ~5 J t = (~(l) :1 ~~2)J .2z.SJ + r 1.9 -1.5 0 1.8 3..5 " .0 -1..~9 1 ~5 J Hence 4.1 .5 1.1 1.i"i:z'ZiJ n' J.0 .z)2 7. 1 -13.9 . y = Z~ = 3.0 3. e =y-y= 5 -3 3 -1.ní:~ (z' _ z)2 L:z.10 J ~(1) = (Z'Zi-1Z'l(1) = L~91.8 -3.5 1..0 a 2.0 = .i . + í:i'i(Zj .j ) .2 3.0 2.2 -3.5 1.9 -1...0 .z)2 .

5) l'"3. Y02 .35.7.10 a) Using Result 7.SS) s (1 + .5 Y02 .9 )11 + (1. 5 J (3 .5J (Y01.75).143 7. os) (0: .¿) a 95% prediction ellipse for the actual V's is given by (YOl -2.55.825 . -2.5' 9.1 (\9) or .18 .18 -.9 (1.8.. c) Using (7-l.~)I. . 5. 3. the 95% confidence interval for the mean reponse is given by (1.0) :t 3. -.:! 3.35) .5) (.5 7. the 95% prediction interval for the actual Y is given by (1..75) 7.0 J..~H~KI j9)'or (~ .225) ~2)P~ (19) = 69.75 (7. 25. b) Usi ng Resu1 t 7.: .

) .ZS) i (Y-ZB) = (Y-Z~+Zp-ZB) i (y. = tr(Z'(f3-B )A(S-ß)' Zi) ~ C i Ac ) 0 .11 The proof follows the proof of Result 7.ZB ) i (T. -8 z . j=l -J -J -J -J and i:~=1 dj(B) = tr(A-1rY'-ZB)'(Y-IS)) Next. n (Y. -B z .. )( Y . Unless B = i.Z' B) = I (V. 1 2 ( c ) tr(A-lt~-B)'Z'Z\P-B) = tr(~p-B)'Z'Z(s-8)AJ = tr(Z i Z (S-B )A(~-B) i) .- ~/here ~ is any non-zero row of ~(~-B). Tl)us ~ is the best 'Choice f-or any positive d'efi ni te A.10 with rl replaced by A. (¥. . Usi n9 Resul t 2A..144 7. Z(S-B) will have a non-zero row.zP+ZS-ZB) = ê'€ + '~~-B) i Z i Z(~-B)J so i:~=1 dj(B) = tr(A-l£'tJ + tr(A-l(j-B)'Z'I~i-B)) The fi rst tenn does not depend on the choice of B. .

13.33 = VS691 .34 .y a a i t-1 (c) PY(x) = -zyayy zz -zy _ IS _ . we partition t as t = iL ~ -i ~J 1 1 ii 1 and detenni ne cava r.¡ If (a) By Result 7..'3 . ance of ( 1 given z2 to be :.12 (a) (1)) best linear pr~di~tor = -4 + 2Z1 .78 (c) Partition ~ = t l~11 so .73) L 5.a i + az = 4 yy _ zy zz . 1) = l: ~).Z2 +-1 mean square error = cr .( : J (1 ) .745 (d) Following equation (7-5b).707 1 s -z(2)zl çl s z~ 2)z(2)-z(2 )zl s zizl .13 . 57 R = zl (Z2Z3) =/3452.Z3J = r 3.145 7. ß_ =zz s-l-s _zy (b) Let !(2) = (Z2. (1. Therefore =IiT= 2 Py Z i · Z 2 = 7.. ( : : J .

£2 /3649.42 Thus 380 .) = r3649. .31 Removifl9 lIy.34 r S I s' s i i S f---------- = z(l )Z(l): -Z3Z.(. · z2 s z z Zi yz'.11 L _z3z(.14 = .51 126. The negative correla- tion between attitude to\'iard risk 'and achieved rate of return indica tes an apparent advantage for conservative managers.82 ' 02. we no\'1 have a positive c.d riskll and "achieved . z2 11 - r~-YZ2I' -zlr~-z2 = zl Z2 s 'S z2z2 = . 82 r z.z2 _ --.821 z(l )z(l) -z3z(1) z3z3 -z3z(.37 i l23. ryz. _ YZ2s 2.s' s-l s 380.146 S691. (b) from (7-S1) s s syz. z2.) i '3z3 and s .ear'S of eXl'eriencell from .consideration.) i S = '600.04 380.YZ2 r r2.z3 7.ZZ =ß') 2 = /s · Is i S syz yy-z2 z.04 1''02.orr-elation between "attitude towar.zl.42 (a) The large positive correlation between a manager's experience and achieved rate of return on portfolio indicat~s an apparent advantage for managers with experience.05 i _________l-___ 217. S2 yy z2z2 s zl zl ryz.25 23.

870 + 2634'1 + 45.24 3 3 Zl 'Zz .2) = . Now for example. are no apparent (b) An analysis of the residuals indicate there model inadequacies.OS) = 4. (c) The 95% predi~tion interval is ($51 .228.o 1.17(. . It appears as if Zl is not needed in the model provided £1 is include~ in the model.15 (a) MINlTAB computer output gives: y = 11 .025 Zl 2 Z2 2 12. residual sum of squares = 2tl499S012 with 17 degrees of freedom.0067)-1 (45.23~) (d) Using (7-". 7.2z. .z.147 returnll.45 we cannot reject HO:ß2 =~.16 Predictors P=r+1 C.. After adjusti ng for years of experience.. $60.. tion of ßO is /1.s an apparent advantage . Since fi.996152 = 4906. there . F = (45.2)( .Q). 7.025 12058533 .to mana~ers who take ri sks. the ~stimated standard devia- . Thus s = 3473. Similar calculations give the estimated standard deviations of ß1 and ß2.

005768 p 0.999 0.:--.0 10 Fi Value .00 2. .. 26 Regression Residual Error Total 7 9 104. .0 "10 10.00577 Assets Predic-tor Constant Sal...0 125 15.:'Nónf~tProbabtltypìól: of the ResdualsÝetthe fi Value Residiials 99 .0'6~1 Sales + 0.045 0.'. the residual plots below are consistent with the usual regression assumptions.17 with a p value of . (d) The t-value for testing H 0 : ß2 = 0 is t = 1.86282 T 0.'.0 ~2.5 ! 0.0 1 .013 0.148 sales and assets follows.9.4() P 0.63 14.oef 0. 20.45 235.-".~1 + 0..'641 = 43. Ii I " 2 f 1 -2. .5 20. All leverages are less than 3p/n=3(3)110=.'.71 MS F 65.. We cannot reject H 0 at any reasonable significance leveL.Reua' (c) With sales = 100 and assets = 500.17 (a) Minitab output for the regression of profits on Profits = 0.282. D' 3 .5 "5.0'6806 0.55." ii 2.02785 0. a 95% prediction interval for profits is: (-1. . 7.004946 0.:-..:.es Assets S = SE Coef (.058 (b) Given the small sample size. That is.0 ii.5 o .0% Analysis of Variance Source DF SS 2 131. The leverages do not indicate any unusual observations.44 1. -"'.".95).0 90 :.. Resîdual Plots for Prots .7% 3.17 7. The model should be refit after dropping assets as a predictor varable.92 4. consider the simple linear regression model relating profits to'sales.0 17. 5.:¡':-.282 R-Sq(adj) R-Sq = 55.Reual Residuals Versus the Order of the Data llistni.oftt~ Residuals 4: 5.

7.18 (a) The calculations for the Cp plot are given below. Note that p is the number of model parameters including the intercept.

p (predictor)         Cp
2 (sales)             2.4
2 (assets)            7.0
3 (sales, assets)     3.0

(b) The AIC values are shown below.

p (predictor)         AIC
2 (sales)             29.24
2 (assets)            33.63
3 (sales, assets)     29.46

7.19 (a) The "best" regression equation involving ln(y) and $z_1, z_2, \ldots, z_5$ is

$\widehat{\ln(y)} = 2.756 - .322 z_2 + .114 z_4$

with s = 1.058 and $R^2 = .60$. It may be possible to find a better model using first and second order predictor variable terms.

(b) A plot of the residuals versus the predicted values indicates no apparent problems. A Q-Q plot of the residuals is a bit wavy but the sample size is not large. Perhaps a transformation other than the logarithmic transformation would produce a better model.
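The Cp and AIC entries for the full model in 7.18 can be checked against the ANOVA table of 7.17, which gives a residual sum of squares of 104.45 on 7 degrees of freedom with n = 10 firms. The sketch below assumes the definitions AIC = n ln(SSE/n) + 2p and Cp = SSE_p/s^2 - (n - 2p), where s^2 is the full-model mean squared error; the function names are hypothetical, and only the full model is recomputed because the subset-model residual sums of squares are not printed.

```python
import math


def aic(sse, n, p):
    """AIC = n * ln(SSE / n) + 2p, with p counting the intercept."""
    return n * math.log(sse / n) + 2 * p


def mallows_cp(sse_subset, mse_full, n, p):
    """Cp = SSE_p / s^2 - (n - 2p), with p counting the intercept."""
    return sse_subset / mse_full - (n - 2 * p)


N_FIRMS = 10
SSE_FULL = 104.45   # residual SS for the sales + assets model (Exercise 7.17)
MSE_FULL = 14.92    # residual mean square for the same model

aic_full = aic(SSE_FULL, N_FIRMS, p=3)
cp_full = mallows_cp(SSE_FULL, MSE_FULL, N_FIRMS, p=3)
print(f"full model: AIC = {aic_full:.2f}, Cp = {cp_full:.1f}")
```

For the full model Cp always equals p up to rounding, so the informative comparisons are the subset-model rows of the tables.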

.6615. 7.eraT ap?ear- ance: ..3901Z2 ..iso 7.7371 + .z5 in the ing eigenvectors give the coefficients principle component.20 Ei genva 1 ues 'Of the carrel atÍ\Jn matrix of the predi ctor vari able'S 2:1. z2.. 1n(y) = 1..8545. .1 . In this case 1n(y) = 1. starting with a full quadratic model in t~e predictors z1 and z2' suggested the fitted regressi'On equation: " Yl = -7.21' This data set doesn1t appear to yield a regr. for example. written in terms of standardized predictor variables.8940..e proportion of the variation in the r~sponses.4465.7371 .070.015.701 and R% = .22..05 and R% = .60647.. (a) (i) One reader. The correspoml- of '1' z2.. the first principal component.li with s = .edictor variable regress ions...0045zS A regression of Ln(y) on the first principle component gives " .1.0038z2 z with s = 3.5281 z2 . ..3604x4 and s = .3808 + .235. .. .ession relationship whkh explains a larg.z5 are 1.2755Z4 . A regression of 1n(y) on the fourth principle ~ompon~nt produ~~s the best of the one pri ncipl e component pr. * * * * * Xl = . is .1435. Al so a Q-Q plot of the residuals has the fOllO\'ling gen...618 and R1 = . (Can you do bett.6357Z 3 .er than this?) of the residuals versus the fitted values SU99~sts (ii) A plot the response may not have constant variance.

' -2 -1 o 1 2 Quantiles of Standard Normal Therefore the normality assumption may a 1so b~ suspect...16... C/ ei :: 'C ïñ -.151 Normal probabilty plot .... .......32. . .. . .... a 95~ prediction interval of zl = 10 (not needed) and z2 = 80 is 10. ~. ...... co ... (iii) Using the results in (a)(i). ... I . .ession can be obtained after the responses have been transformed or re-expressed in a di fferent metri c.. . .~ .84 :! 2.... C\ Q) 0: 0 -...¡. Perhaps a better regr... ...... .37)...- . .0217 or (5.~.. (0 .

1 0. 20 1.------~~-------------"'-"~_.8% S = 0.0687763 P T SE Coef 0.357 DomUlna + 0.~ 50 .4068 Ulna P -1.1027 0. -I .00473 Residual910ts for DomRadius.165 Humerus + 0.1381 0.0 0..31204 2 22 24 F P MS 0.7% Analysis of variance Source DF Regression Residual Error Total (ii) SS 0. 98 0. 58 0.5519 S = 0.3566 0. 1l 90 .076 0... .6 0. .16249 0.97 0..05940 0.1035 0.1 ~Or 2 4 6 8 10 12 14 16 18 20 22 24 .1% R-Sq = 71.The regression equation is DomRadius 0.Dom Hi:metus and 'Dm Ulna PtedictlS Normal probabilty Plot " 'O the Reduàls Reiduals Verus the Fitt Value .74 3.152 7..407 Ulna Coef 0.103 + 0.1652 DomUlna T SE Coef R-sq(adj) = 66. A residual analysis for the best model indicates there is no reason to doubt the standard regression assumptions although observations 19 and 23 have large standardized residuals.7 OJ! 0.08 0.22 (a) The full regression model relating the dominant radius bone to the four predictor variables is shown below along with the "best" model after eliminating non- significant predictors..128 0. :D.0.2756 Predictor Constant DomHumerus Humerus 0.162 DomHumerus + 0.2 '0..1 0. 10 1 1l J.2174 -0.1566 2.40 0.000 0.1064 0.1985 0.10406 0.1637 0.O¡O 0. 80 1.0663502 .088 0.10399 21..02"6 0.6% R-Sq = 66.012 0.276 DomHumerus .164 + 0.0 Resual Fi Value Histogl"in of the Residuàls ResidualsVelsus tleOrder ofthè Ðlt 8 f& :! i . 87 0.002 1.552 DomUlna predictor Constant DomHumerus DomUlna Coef 0. ~ '0.~ -.1147 0..16 ll.1 ~ò16 "8.53 R-sq(adj) = 63.246 0._~~. 4 ! II . Q) The regression equation is DomRadius = 0.08 .9 1.20797 0.346 2.

5% Regression .059 0.322 DomUlna + 0.31 1.2% R-sq(adj) = 55.5953 MS 0.00 1.OOC 0.1120 0. This fitted model along with the fitted model for the dominant radius bone using four predictors shown in part (a) (i) and the error sum of squares and cross products matrix constitute the multivanate multiple regression modeL.3209 SE Coef 0.198 DomUlna + 0.1520 0.2202 T 2.003'674 . 0'68 2 .064903 .09'6597 26.09676 -0. It appears as if a multivariate regression model with only one or two predictors wil represent the data well.114 _ 0.462 Ulna Coef Predictor Constant o .159 DomUlna Ulna SE Coef 0.1674 0.5% R-Sq(adj) = '67.6% Error sum of squares and cross products matrix: The regression equation is Radius = 0.92 3.014 0. The regression equation is Radius = 0.193195 Residual Error 22 0.1833 P 0.312038 Total Coef 0.000 Source DF 5S 2 0.088047 P 0.223 + 0.46 Predictor p Constant 0.564 DomUlna + 0.217 0.07'60309 R-Sq = 59. 11423 DomHumrus Humerus DomUlna Ulna -0.020 S = 0. a multivarate regression model with predictors dominant ulna and ulna may be reasonable.1680 0.005781 F 15.127175 .127175 24 0.39 Analysis of Variance Analysis of Variance Source DF 5S 2 0.050120J .178 + 0.050120 . 5645 Constant Ulna 0.0110 DomHumerus + 0.8% S = 0.18 2.1976 0.252 0.2235 DomUlna o .1 S = 0.1755 T 2.321 Ulna Predictor Coef 0.4625 SE Coef T 1.0559501 R-Sq 77.003 2.080835 24 0.064903J .00 2.3220 0. The results for these predictors follow.11'65 0.062608 (.080835 (. 1 0.27 0.2108 0.52 0. Using Result 7.910 0.595 Ulna The regression equation is DomRadius 0.08971 0.274029 Regression Total Error sum of squares and cross products matnx: MS F I 0.0'606160 R~5q = 70.058 o .29 O.17846 0.99 P VIF 0.11 1.08931 0.2% R-Sq(adj) = 72.153 (b) The full regression model relating the radius bone to the four predictor varables is 'Shown below.207 0.11.01103 0.152 Humerus + 0.092431 0.184863 Res idual Error 22 0.68 1.

.332721 o ..0795 F Value Prob) F 18... 177'529 46 ..) FtFrBody J Frame ~ BkFat) SaleHt) (a) Residual plot oo II ~ o ll .0043 0. . 0029 o .-..633612 22....78342 75 29110024..4161 SalePr Dependent Variable: X2 Analysis of Variance Source C(p) 7 .. 0049 o . 5655 Parameter -Standard Estimate Error -5605.366 2. ..-~.364490 1749. Gl a: .. .. 0c: oo .9663 70 12647164.~. 0001 0.....:~...839 180673.495 2....~...154 7...420733 89..._.29880197 -2.5655 0.....0041 0... (a) Regression analysis using the response Yi = SalePr.0009 o .5782 X5 56 0. 66g7 o ..3735 6.01'50 o .. ......905 -3.21819165 133..06)2(1 + z~(Z/Z)-lZO)' SalePr:~reed ..0057 The 95% prediction interv~l for SalePr for %0 is z~ß:f t70(0. Sumary of Backward Elimination Procedure for Dependent Variable X2 Variable Number Partial Model Step 1 2 3 Removed In R**2 R**2 K9 7 0.482 -3. .....-. 7073 C Total R-square 425...823664 1929.6697 o . u...3g86440 -77.... .. c: ~ '" o ii:: i: 'Uj ..832 3292571.. .'W..671 Error Prob)F 0. 0 o .4341 Sum of Mean DF Squares Square Model F o .090 4.0001 5 16462859. .... ~ . 75490590 389.. 00 ll o o '9·._.0127 6... 00 3000 /' ...23.224 0.854 o .._.:..025) /(425.~.17300145 701...5826 X3 0.4033 o . ..- .. e.1538 2..66673277 T for HO: Parameter=O Prob ) ITI -2..05739 Parameter Estimates Root MSE Variable DF INTERCEP 1 XL 1 1 X4 X6 X7 1 1 1 18 o .o '" 1ã i::: 1i Gl a: ~b) Normal probability plot 00 ll ~ ... -2 -1 o 2 Quanties of Standard Normal .. 1000 2000 Predicted /:.....

.0122 0. : C\ c? "" ...597-'2 0.ed S PrctPF8 j Frame i SaIeHt) (b) Normal probabilty plot (a) Residual plot .. .~~. C\ II c: II 0 iü :: :2 ..01927655 Variable DF INTERCEP 1 XL 1 X5 X6 X8 1 T for HO: 0. '" ~ .. ......19018 R~ot MSE R-square Parameter Estimates 0..2 7... 00846029 o ..19.¡¡ '" a: y/ "" . .. Sumary ofBa~kward Elimination Procedure f~rDependent Variable LOGX2 Variable Number Partial Medel Removed In R**2 R**2 Step X3 76 0. c: ..6121 '0 ..0001 o . 9445 2 .03'617 Error C Total 6.6368 X7 0... In(alePrFfl3r...8 8.060 o .esponse Yi = In(SalePr).... 00742 27.6311 i9 5 0.....183611 o . ....736 0. 9 7.6189 X4 4 0.....02968 1.. ......0033 0... ... . .841 -3 ..0 7.599 3....0057 0.4537 1 .: ~ ... 0594 6 .. . The few outliers among these latter residuals are not so pronounced.. 9 '.. -2 -1 o 2 Quantiles of Stadard Nonnal . 2265 Dependent Variable: LOGX2 Analysis of Variance 'Sum of Mean Source DF 'Squares Square F Value Prob~F Mode 1 4 71 75 4...854 0.6 7.0013 0. 000 1 1 5...027613 0..049418 -'0..".3070 o ..03992448 0..91286786 5. .... c: C\ II iü :: c: "0 ..025) J~O..Q2)2(1 + z~(ziz)-izo).. 00827438 -5 ...1...:". c: .235773 -0...0001 2 ..0081 0.....155 (b) Regression analysis using the r.... ............ 2902 o . ...4368 1.6655 0. ~ . ..0 Predicted a: J C\ 9 .. ..6108 Parameter Estimate 'Standard Error Parameter=O Prob ~ ITI 0...'.. 1348 6.:.058996 4...\ ....337 o .. 56794 0." j. .. 0001 1 0...6108 1 2 3 4 C(p) F Prob~F 7...4890 o ...4 7..0031 The 95% prediction interval for In(SalePr) for Zo is z~ß:f t7o\O.. .6121 6...

.. ....74'533 117 .... . .36221288 X3 o .... 000 1 o ... e... e..846281 3.. ..94798 Root MSE Parameter Estimates o ........ SaleHt Dependent Variable: X8 Analysis of Variance Sum of Mean OF Squares Square F Value Model 2 235.. 08088562 X4 1 o .....:...773 0.' .60204 301. 0224 o ...:.. .821 ITI 'Û . 005... 60..0001 Variable OF INTECEP 1 1 7.. ëii Gl Ul .55.....025) \/0...34737 o . SaleHt:r~rHgt) FtFrBody) (b) Normal probabilty plot (a) Residual plot ....96:f t73(0.86). .0148).' ..wr./ ëii Gl 58 .". . = (52...: ....../ . v-..24.8987(1...2 -1 o 2 Quantiles of Standard Normal . C) 54 Predicted 56 .... Ul ii:i 'l c a: ..... ll .156 7... 802235 o .5. ii:i 'l 0 a: ... :.t' ... C) 52 .06._.00151072 T for HO: Prob ) Parameter=O 2. .. 7823 Standard Error Parameter Estimate Prob)F 0. . 89866 Source C Total R-square 0.pa .. . ..0003 The 95% prediction interval for SaleHt for z~ = (1.. /~:....'../.....87267 131... N N .334 9... (a) Regression analysis using the response Yi = SaleHt and the predictors Zi = YrHgt and Z2 = FtFr Body..970) is 53..918 3..50..165 Error 73 75 65....

.278 0..157 (b) Regression analysis using the response 1í = SaleWt and the predictors Zi = YrHgt and Z2 = FtFrBody..970) is 1535. ci ... ci Predicted r /" / ¡.17430765 0..... ....50. .37826 Parameter Estimates Root MSE Variable DF INTERCEP X3 X4 1 Parameter Estimate R-square 1 Prob..319 0.'.. ..3.rI.) ITI 0. .. . ..5. ...33265244 2.0859 o ..... ...0001 The 95% prediction interval for SaleWt for z~ = (1..... ._...- .5)..... ......_..291 4. SaleW~rHgt) FtFrBody) (a) Residual plot (b) Normal probabilit plot oo C' o o C' o o N II ñi =' '0 '¡¡ Gl IX o o - ..-_. ... ..316794 387 ..¡¡ Gl IX o 8 .741 Prob ... . o .63614 195228. . ... 93499836 9.' .. 1500 1600 1700 1800 .)F 16..'.. -..1755..' ._........025)V1l963. . SaleWt Dependent Variable: X9 Analysis of Varian~e Sum of Mean Source DF Squares Square Model 2 390456...99544 11963.... oo - -...... . .. .. o ... II ñi =' '0 ... ~ ' . o .9:: t73(0. 3090 Standard Error 675..... .60268 75 1263799....' ooC\ .~.0148) = (1316.or o oo ... ..745610 1 F Value T for HO: Parameter=O 1.....31807 73 873342...6316 Error C Total 109. . .0001 o .771'6 o ..6(1.719286 0...2 -1 o 2 Quantiles of Standard Normal . ....._..
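The two prediction intervals in 7.24, for SaleHt and SaleWt at z0' = (1, 50.5, 970), share the same form: predicted value plus or minus t73(0.025) times the square root of MSE times 1.0148, where 1.0148 is the value of 1 + z0'(Z'Z)^{-1}z0 used in the solution. The sketch below recomputes both under one assumption that the output does not print: the critical value t73(0.025) is taken as 1.993. The function name is hypothetical.

```python
import math


def prediction_interval(y_hat, t_crit, mse, variance_factor):
    """y_hat +/- t * sqrt(MSE * (1 + z0'(Z'Z)^{-1} z0)).

    variance_factor is the whole term 1 + z0'(Z'Z)^{-1} z0.
    """
    half_width = t_crit * math.sqrt(mse * variance_factor)
    return y_hat - half_width, y_hat + half_width


T_CRIT = 1.993            # assumption: t_73(0.025) from a t table
VARIANCE_FACTOR = 1.0148  # 1 + z0'(Z'Z)^{-1} z0, as quoted in the solution

sale_ht = prediction_interval(53.96, T_CRIT, 0.8987, VARIANCE_FACTOR)
sale_wt = prediction_interval(1535.9, T_CRIT, 11963.6, VARIANCE_FACTOR)
print("SaleHt: (%.2f, %.2f)" % sale_ht)
print("SaleWt: (%.1f, %.1f)" % sale_wt)
```

Both responses use the same variance factor because they share the predictors YrHgt and FtFrBody and therefore the same design matrix.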

i.75813396 0. even though this term could be dropped in the prediction equation for Sale\Vt.000l(Yó2 . o- CO 6 B 10 51 S2 53 S4 5S 55 57 Y01 . The '95% prediction ellpse for both SaleHt and SaleVvt for z~ = (1.0098(Yói .96)2 + O.0001 72 0.970) is 1..72(O.5.158 Multivariate regression analysis using the responses Yi = SaleHt and Y2 = SaleWt and the predictors Zi = YrHgt and Z2 = FtFrBody.53.9)2 .0001 0.1535.59574625 57.4850 a .53.05) = 6. The 95% predicion ellpse for both SaleHt andSaleWt Chi-square plot of residuals o - ~ Ò 'l. 4850 11 .4469 Num DF 2 2 2 2 Den DF Num DF 2 2 2 2 Den DF Pr ~ F 72 72 72 72 0.31902811 0..4469 1.96)(ìó2 .59574625 57 . Multivariate Test: HO: YrHgt = 0 Multivariate Statisti~s and Exact F Statistics S=1 M=O N=35 Statistic Value F Wilks i Lambda o .4850 11. 4850 11 .4469 Pillai's Trace Hotelling-Lawley Trace 0. 4469 Roy i s Greatest Root Multivariate Test: HO: Multivariate Statistics S=l 1 .0001 Pr ~ F 72 0.0001 The theory requires using Xa (YrHgt) to predict both SaleHt and'SaleWt. 38524567 57..3498(1'i .0001 72 a .cQOO1 72 0.OI4872F2.31902811 Wilks i Lambda Pillai's Trace Hotelling-Lawley Trace Roy's Greatest Root F 11..0001 0.61475433 57 .4282.O.0001 FtFrBody = a and Exact F Statistics N=35 M=O Statistic Value 0.24186604 0.:/' oo ~ co l: II ~l! '" N o ~ ai o 2 4 qchisq oo io - o .50.1535.9) 2(73) .

........5:l ti4(O.962 o . " oo ~ 500 1000 2000 Predicted 3000 ....025)y'128596..88303 C Total 16 7705940.. CD ir 0 0 y r' .0001 Error 14 1800356.. ..9(1. AU variables left in the model are significant at the D. ~ ..1.... ... ....0... ...........159 7.609 o . 0203 1 206... 60408 Root MSE o .... TOT~eN ....' -' .ñ ..05 leveL.04977501 Variable DF INTERCEP 1 Zl Z2 T for HO: 0.. . .617 6....' .....) Dependent Variable: Y1 Analysis of Variance TOT Sum of Mean Source Model DF Square s Square F Value Prob~F 2 5905583. g .. 720053 507..0001 The 95% prediction interval for YÍ = TOT for z~ = (1.... ...2353 R-square 358 ..... 3625 128596.1200) is 958..... CD ir ..25. ..... AMT) (a) Residual plot 0co0 (b) Normal probabilty plot 00 co . 00 0N0 In ãi :: Q "C . .... .' .....' N In ãi :: "C üi 0 .. 00 y ...... .' . The backward elimination proce- dure gives Yi = ßoi + ßl1Zi + ß2iZ2.. 7664 Parameter Estimates Parameter Estimate Standard Parameter=O Frob ~ ITI 1 56..1763. (a) R-egression analysis using the first response Yi... . 7878 o .70336862 193. 328962 0...274 2.79082471 0... (It is possible to drop the intercpt but We retain it.~.' ..1)..0941) = (154. ~ -2 -1 o Quantiles of Standard Normal 2 .......9364 22.8728 2952791.073084 Eror o ..

... .. ...2692 14 1620657...... en ...1 o Quantiles of Standard Normal 2 .23703 Root MSE Parameter Estimates Variable DF INTERCEP 1 Zl 1 Z2 1 o .....347910 196.025) Jii5761... 7870 Parameter Standard Estimate Error -241..0001 Error C Total R-square 340.. AMi=t(eN J AMT) (b) Normal probabilty plot (a) Residual plot oo CD oo CD (/ ¡¡ :: ooC\ .1.' '9 '9 500 1000 2000 Predicted ..344 115761......8824 25. 0001 The 95% prediction interval for 1'2 = AMI for z~ = (1..' .866 o .. G) oo ïii c: i.298 6.. 324255 o .~....... All variables left in the model are significant at the 0.Gl c: C) C) oo oo ... 2387 183.....2(1...05 leveL..4)...234.' ...11640164 T for HO: Parameter=O Prob .... !t0P'" o oo i: o .23886 16 7610377. 04722563 0. Dependent Variable: Y2 AMI Analysis of Variance of Mean Sum Source DF Squares Square F Value Prob..2 ..........' .... ...871 0.1517.0053 o ..86521452 606 .5384 2994860......A.....) ITI -1.o¡ ... .......0941) = (-9...... ¡¡ i::i oo C\ .. . ...... 309666 o ..231 3.160 (b) Regression analysis using the second response Y2' The backward elimination procedure gives 11 = ß02 + ßi2Zi + ß22Z2....1 :l t14(0..)F Model 2 5989720..... ..1200) is 754..

07581808 F 1 .214 x 10.305 x 10-5(Yå1 . .1s. DIAP=O.( . .6890 1 .5)2 + 4..1657 11 0. QRS=O Multivariate Statistics and F Approximations 8=2 M=O N=4 Statistic Wilks' Lambda Value 0.161 (c) Multivariate regresion analysis using Yi and 1"2. . 60385990 Hotelling-Lavley Trace Roy's Greatest Root 1.. the three variables Zs. . . 1755 22 .( .. 0 .44050214 Pillai 's Trace o .\O.0941-iF2. Multivariate Test: HO: PR=O.782 x 10-5(l'2 .1200) is 4.5)(1'Ó2 -754..D5 .754. The 95% prediction ellpse for both TOT and AMI for z~ = (1.7541 6 3 Den DF 6 Û 3 . 0 2 3 4 qchisq 5 1l 7 o 500 1000 1500 2000 Y01 .958.1) = i.5859 Num DF 1.8.958.968.0391 Based on Wilks' Lambda.9447 Pr ~ F 20 .16942861 1..1.. The 95% prediction ellpse for both TOT and AMI Chi-square plot of residuals U) gII - II i: .101 . Z4 and Zs are not significant.1)2 s( ? 2(14) ( ) . 1983 18 0.8. lI 'C 'C N 1: !! CO ~ CI o'E N 0 00 0- 00 10 .

6-.3504 R2 Fitted model Yi = -70.798 16.239) (b) (i) Using all four predictor variables.049 .3616 80.3530 .232 -24.047 .077.9640z1 + 26.419 .511 .53z4 27.120 B= i I= .12z4 Y2 = -20.089 .1 + . #60 and #61.220 2.3.01 17z3 + 44.763 -17. (ii) The simultaneous prediction interval for Y 3 wil be wider than the individual interval in (a) (iii).098 . (ii) The same outliers and leverage points indicated in (a) (ii) are present. 4. (ii) 95% prediction interval for Y 3 is: (1. the estimated coefficient matrix and estimated error covariance matrix are -74.398 .008 . Each predictor variable is significant at the 5% leveL.486 .398 . Apar from the outliers.0288z2 +.26 (a) (i) The table below summarizes the results of the "best" individual regressions.162 7.0120z3+ 15.7% .0+.122 A multivariate regression model using only the three predictors Zi.77z4 (ii) Observations with large standardized residuals (outliers) include #51.193 . #52 and #56. Otherwise the residual analysis suggests the usual regression assumptions are reasonable.914 .0593z2 + .076 28.009 .0282z3 + Y4 = -17.550 -1.0555z3 + 82.025 .185 .727 -. .1.029 .210 .244 .59z4 Y3 =-43.914 .6% s 1.089 .015 -45.511 .8+.193 .5192 76.210 .755 45.04z4 Y2 =-21. Observations with high leverage include #57.4% . #58.118 .011 85. 73.0224z2 +.6595 75.5% 75. Z3 and Z4 wil adequately represent the data.7% .92+. the residuals plots look good.

7.27 The table below summarizes the results of the "best" individual regressions. Each predictor variable is significant at the 5% level. (The levels of Severity are coded: Low=1, High=2; the levels of Complexity are coded: Simple=1, Complex=2; the levels of Exper are coded: Novice=1, Guru=2, Experienced=3.) There are no significant interaction terms in either model.

Fitted model                                                    R^2      s
Assessment     = -1.834 + 1.270 Severity + 3.003 Complexity     74.1%    .9853
Implementation = -4.919 + 3.477 Severity + 5.827 Complexity     71.9%    2.1364

A residual analysis suggests there is no reason to doubt the standard regression assumptions.

For the multivariate regression with the two predictor variables Severity and Complexity, the estimated coefficient matrix and estimated error covariance matrix are

$$\hat{B} = \begin{bmatrix} -1.834 & -4.919 \\ 1.270 & 3.477 \\ 3.003 & 5.827 \end{bmatrix}, \qquad \hat{\Sigma} = \begin{bmatrix} .9707 & 1.9162 \\ 1.9162 & 4.5643 \end{bmatrix}$$
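The fitted multivariate model in 7.27 can be applied directly to a new case. The following is a minimal sketch, assuming the estimated coefficient matrix reported in 7.27 has rows (intercept, Severity, Complexity) and columns (Assessment, Implementation). The function name `predict` and the example case are illustrative and not part of the original solution.

```python
import math

# Estimated coefficient matrix from 7.27: rows are (intercept, Severity,
# Complexity); columns are (Assessment, Implementation).
B_HAT = [
    [-1.834, -4.919],
    [1.270, 3.477],
    [3.003, 5.827],
]

# Estimated error covariance matrix from 7.27.
SIGMA_HAT = [
    [0.9707, 1.9162],
    [1.9162, 4.5643],
]


def predict(severity, complexity):
    """Return the fitted (Assessment, Implementation) values for one case."""
    z = [1.0, float(severity), float(complexity)]  # leading 1 for the intercept
    return tuple(sum(z[i] * B_HAT[i][j] for i in range(3)) for j in range(2))


# Hypothetical case: Severity = High (2), Complexity = Complex (2).
assessment, implementation = predict(2, 2)

# Square roots of the diagonal of SIGMA_HAT, for comparison with the
# residual standard deviations s listed for the individual regressions.
s_values = [math.sqrt(SIGMA_HAT[j][j]) for j in range(2)]
print(round(assessment, 3), round(implementation, 3), s_values)
```

The square roots of the diagonal entries of the error covariance matrix agree, to rounding, with the values of s listed for the two individual regressions, which is a quick consistency check on the reported matrices.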


Chapter 8

8.1 Eigenvalues of $\Sigma$ are $\lambda_1 = 6$, $\lambda_2 = 1$. The principal components are

$Y_1 = .894X_1 + .447X_2$
$Y_2 = .447X_1 - .894X_2$

$\text{Var}(Y_1) = \lambda_1 = 6$, $\text{Var}(Y_2) = \lambda_2 = 1$. Therefore, the proportion of total population variance explained by $Y_1$ is $6/(6+1) = .86$.

8.2 $\boldsymbol{\rho} = \begin{bmatrix} 1 & .6325 \\ .6325 & 1 \end{bmatrix}$

(a) $Y_1 = .707Z_1 + .707Z_2$, $\text{Var}(Y_1) = \lambda_1 = 1.6325$
$Y_2 = .707Z_1 - .707Z_2$, $\text{Var}(Y_2) = \lambda_2 = .3675$
Proportion of total population variance explained by $Y_1$ is $1.6325/(1+1) = .816$.

(b) No. The two (standardized) variables contribute equally to the principal components in 8.2(a). The two variables contribute unequally to the principal components in 8.1 because of their unequal variances.

(c) $\rho_{Y_1,Z_1} = .903$, $\rho_{Y_1,Z_2} = .903$, $\rho_{Y_2,Z_1} = .429$

8.3 Eigenvalues of $\Sigma$ are 2, 4, 4. Eigenvectors associated with the eigenvalues 4, 4 are not unique. One choice is the pair $[0, 1, 0]'$ and $[0, 0, 1]'$. With these assignments the principal components are $Y_1 = X_1$, $Y_2 = X_2$ and $Y_3 = X_3$.

8.4 Eigenvalues of $\Sigma$ are solutions of

$$|\Sigma - \lambda I| = (\sigma^2 - \lambda)^3 - 2(\sigma^2 - \lambda)\sigma^4\rho^2 = 0$$

Thus $(\sigma^2 - \lambda)\left[(\sigma^2 - \lambda)^2 - 2\sigma^4\rho^2\right] = 0$, so $\lambda = \sigma^2$ or $\lambda = \sigma^2(1 \pm \sqrt{2}\,\rho)$.

For $\lambda_1 = \sigma^2$, $e_1' = [1/\sqrt{2},\ 0,\ -1/\sqrt{2}]$.
For $\lambda_2 = \sigma^2(1 + \sqrt{2}\,\rho)$, $e_2' = [1/2,\ 1/\sqrt{2},\ 1/2]$.
For $\lambda_3 = \sigma^2(1 - \sqrt{2}\,\rho)$, $e_3' = [1/2,\ -1/\sqrt{2},\ 1/2]$.
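The numbers in 8.1 and 8.2 can be checked with the closed-form eigen-decomposition of a symmetric 2 x 2 matrix. This is a minimal sketch, not the method used to prepare the manual. The covariance matrix for 8.1 is not reproduced in this manual, so the code assumes $\Sigma$ has rows (5, 2) and (2, 2), which is consistent with the eigenvalues 6 and 1 and the coefficients .894 and .447 quoted above. The helper name `pca_2x2` is illustrative.

```python
import math


def pca_2x2(s11, s12, s22):
    """Eigenvalue/eigenvector pairs of [[s11, s12], [s12, s22]], largest first.

    Assumes s12 != 0, so that (s12, lam - s11) is a nonzero eigenvector.
    """
    mean = (s11 + s22) / 2.0
    half_gap = math.hypot((s11 - s22) / 2.0, s12)
    pairs = []
    for lam in (mean + half_gap, mean - half_gap):
        v = (s12, lam - s11)  # solves (S - lam * I) v = 0
        norm = math.hypot(v[0], v[1])
        pairs.append((lam, (v[0] / norm, v[1] / norm)))
    return pairs


# 8.1: assumed covariance matrix with rows (5, 2) and (2, 2).
pairs_81 = pca_2x2(5.0, 2.0, 2.0)
proportion_81 = pairs_81[0][0] / (pairs_81[0][0] + pairs_81[1][0])

# 8.2: correlation matrix with off-diagonal element .6325.
pairs_82 = pca_2x2(1.0, 0.6325, 1.0)
# For standardized variables, corr(Y_i, Z_k) = e_ik * sqrt(lambda_i).
corr_y1_z1 = pairs_82[0][1][0] * math.sqrt(pairs_82[0][0])
corr_y2_z1 = pairs_82[1][1][0] * math.sqrt(pairs_82[1][0])

print(pairs_81, proportion_81)
print(pairs_82, corr_y1_z1, corr_y2_z1)
```

The sign of each eigenvector is arbitrary, so a numerical routine may return the negative of a vector printed in the solution.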

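The principal components, their variances, and the proportion of total variance each explains are:

Principal Component                                            Variance                        Proportion of Total Variance Explained
$Y_1 = \frac{1}{\sqrt{2}}X_1 - \frac{1}{\sqrt{2}}X_3$                       $\sigma^2$                      $\frac{1}{3}$
$Y_2 = \frac{1}{2}X_1 + \frac{1}{\sqrt{2}}X_2 + \frac{1}{2}X_3$          $\sigma^2(1 + \sqrt{2}\,\rho)$     $\frac{1}{3}(1 + \sqrt{2}\,\rho)$
$Y_3 = \frac{1}{2}X_1 - \frac{1}{\sqrt{2}}X_2 + \frac{1}{2}X_3$          $\sigma^2(1 - \sqrt{2}\,\rho)$     $\frac{1}{3}(1 - \sqrt{2}\,\rho)$

8.5 (a) Eigenvalues of $\boldsymbol{\rho}$ satisfy

$$|\boldsymbol{\rho} - \lambda I| = (1-\lambda)^3 + 2\rho^3 - 3(1-\lambda)\rho^2 = 0$$

or $(1 + 2\rho - \lambda)(1 - \rho - \lambda)^2 = 0$. Hence $\lambda_1 = 1 + 2\rho$, $\lambda_2 = \lambda_3 = 1 - \rho$, and results are consistent with (8-16) for $p = 3$.

(b) By direct multiplication,

$$\boldsymbol{\rho}\left(\tfrac{1}{\sqrt{p}}\,\mathbf{1}\right) = (1 + (p-1)\rho)\left(\tfrac{1}{\sqrt{p}}\,\mathbf{1}\right)$$

thus verifying the first eigenvalue-eigenvector pair. Further, $\boldsymbol{\rho}\,e_i = (1-\rho)\,e_i$, $i = 2, 3, \ldots, p$.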
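The eigenvalues stated in 8.4 and 8.5 can be verified numerically by substituting them into the characteristic polynomial. The sketch below is an illustrative check only. The values `rho = 0.3` and `sigma2 = 2.0` are arbitrary assumptions chosen for the test, and the helper names `det3` and `char_poly` are not from the original.

```python
import math


def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def char_poly(m, lam):
    """Evaluate det(m - lam * I) for a 3 x 3 matrix m."""
    shifted = [[m[r][c] - (lam if r == c else 0.0) for c in range(3)]
               for r in range(3)]
    return det3(shifted)


rho = 0.3      # arbitrary illustrative correlation
sigma2 = 2.0   # arbitrary illustrative variance

# 8.4: band covariance matrix sigma^2 * [[1, rho, 0], [rho, 1, rho], [0, rho, 1]].
band = [[sigma2, sigma2 * rho, 0.0],
        [sigma2 * rho, sigma2, sigma2 * rho],
        [0.0, sigma2 * rho, sigma2]]
roots_band = [sigma2,
              sigma2 * (1 + math.sqrt(2) * rho),
              sigma2 * (1 - math.sqrt(2) * rho)]

# 8.5: equicorrelation matrix with p = 3.
equi = [[1.0, rho, rho], [rho, 1.0, rho], [rho, rho, 1.0]]
roots_equi = [1 + 2 * rho, 1 - rho]

residuals_band = [char_poly(band, lam) for lam in roots_band]
residuals_equi = [char_poly(equi, lam) for lam in roots_equi]

# Proportions of total variance in 8.4: each eigenvalue over trace = 3 * sigma^2.
proportions_band = [lam / (3 * sigma2) for lam in roots_band]
print(residuals_band, residuals_equi, proportions_band)
```

Every residual should be zero up to floating-point error, and the three proportions should sum to 1.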

8.6 (a) $\hat{y}_1 = .999x_1 + .041x_2$    Sample variance of $\hat{y}_1 = \hat{\lambda}_1 = 7488.8$
$\hat{y}_2 = -.041x_1 + .999x_2$    Sample variance of $\hat{y}_2 = \hat{\lambda}_2 = 13.8$

(b) Proportion of total sample variance explained by $\hat{y}_1$ is $\hat{\lambda}_1/(\hat{\lambda}_1 + \hat{\lambda}_2) = .9982$

(c) Center of constant density ellipse is (155.60, 14.70). Half length of major axis is 102.4 in direction of $\hat{y}_1$. Half length of perpendicular minor axis is 4.4 in direction of $\hat{y}_2$.

(d) $r_{\hat{y}_1,x_1} = 1.000$, $r_{\hat{y}_1,x_2} = .687$
The first component is almost completely determined by $x_1$ = sales since its variance is approximately 285 times that of $x_2$ = profits. This is confirmed by the correlation coefficient $r_{\hat{y}_1,x_1} = 1.000$.

8.7 (a) $\hat{y}_1 = .707z_1 + .707z_2$    Sample variance of $\hat{y}_1 = \hat{\lambda}_1 = 1.6861$
$\hat{y}_2 = .707z_1 - .707z_2$    Sample variance of $\hat{y}_2 = \hat{\lambda}_2 = .3139$

(b) Proportion of total sample variance explained by $\hat{y}_1$ is $\hat{\lambda}_1/(\hat{\lambda}_1 + \hat{\lambda}_2) = .8431$

(c) $r_{\hat{y}_1,z_1} = .918$, $r_{\hat{y}_1,z_2} = .918$
The standardized "sales" and "profits" contribute equally to the first sample principal component.

(d) The sales numbers are much larger than the profits numbers and consequently, sales, with the larger variance, will dominate the first principal component obtained from the sample covariance matrix. Obtaining the principal components from the sample correlation matrix (the covariance matrix of the standardized variables) typically produces components where the importance of the variables, as measured by correlation coefficients, is more nearly equal. It is usually best to use the correlation matrix or, equivalently, to put all the variables on similar numerical scales.
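The proportions in 8.6(b) and 8.7(b) and the correlations in 8.7(c) follow directly from the reported eigenvalues and coefficients. The sketch below recomputes them, using the standard relation $r_{\hat{y}_i,x_k} = \hat{e}_{ik}\sqrt{\hat{\lambda}_i}/\sqrt{s_{kk}}$ with $s_{kk} = 1$ for standardized variables. The function names are illustrative assumptions, not part of the original solution.

```python
import math


def proportion_explained(eigenvalues, index=0):
    """Share of total sample variance carried by one principal component."""
    return eigenvalues[index] / sum(eigenvalues)


def component_correlation(e_ik, lam_i, s_kk=1.0):
    """Correlation between component i and variable k.

    e_ik is the coefficient of variable k in component i, lam_i is the
    component's variance and s_kk is the sample variance of variable k
    (1.0 when the variables are standardized).
    """
    return e_ik * math.sqrt(lam_i) / math.sqrt(s_kk)


# 8.6: components from the sample covariance matrix S.
prop_cov = proportion_explained([7488.8, 13.8])

# 8.7: components from the sample correlation matrix R.
prop_corr = proportion_explained([1.6861, 0.3139])
r_y1_z1 = component_correlation(0.707, 1.6861)

print(prop_cov, prop_corr, r_y1_z1)
```

Comparing `prop_cov` with `prop_corr` makes the point of 8.7(d) concrete: the first component looks far more dominant when the analysis is based on S, because sales has a much larger variance than profits.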

.354 .726 -.353 .%.5.280 4 .5 Correlations: i'\k 1 2 1 .2 k = 1.326 . This test assumes a large random sample and a multivariate normal parent population.604 . .Zl == êik..831 -.694 3 . .167 8.719 The correlations seem to reinforce the interpretations given in Example 8.564 .1.67 so r=2.435 .2.353 T= 103.(.437 2 .8 (a) rý¡.374 5 .J i == 1.01)=21..732 -.299 r = .: .485 would reject Ho at the 1% leveL. (b) Using (8-34) and (8-35) we have k rk 1 2 3 4 5 .

. - .e !Y !æ (2n) 2 (cr2... .lti8 8.~a.- :~. ~O' . n and (4-17) \.) = 1 11 (2ir)'2 (n-l)2 s~. n 11 11.II L(\1.rO) = .ed to each variable independently 9. .O"~I) = ii 1 -2aLttr(~n-l)$)) .¡e get (b) When t = 0'2 I .r - ( 21) 2 \ n-n1 ) 2 I S 12 The same resul t appl . =1 11 n PT n 5. ves n max e-i n n n L(l1.. using (4-l6) 1 max L(\l .tO) A = 'max Ui = . i ~o11 ~ 1 11 . p Under HO ~ max L(ii.) 2 .. il:fo L(~.) 11.. 2 £! E!!l e max LÜi~t) -' ~.+0 1=1 and the 1 i kel i hood ratio stat. stic becomes .9 (a) By (S-lt) _ .

497 -0.0013677 0.1 = (p+2)~p-l )/2 d. WellsFargo.322 -0.!) (1 tr (5) )np/2 and the result follows.00022398 0.9~ Continue) 50 np e ( )np/2 -np/2 max l. CitBank.651 0.801 PC2 PC3 0. Under HO there' are P lJ.00006055 0.00072251 0.10 (a) Covariances: JPMorgan.00027566 0.00015903 0.223 0. RoyDutShell.00043872 0.250 -.001189 .326 0.639 0.589 -0.00018144 0.094 0.000142ti 0.055 0.00043327 0.1"69 8. 's and.00017999 0.529 PC1 0.0.000 .248 0.271 0.0 . '01Ì v~riance so the dimension of the parameter space.s YO = p.149 O. The unrestricted case has dimension p + p(p+l)lZ so the X2 has p(p+l )/2 . Exxon Principal Component Analysis: JPMorgan. Wells Eigenana1ysis of the Covariance Matrix 103 cases used Eigenvalue Proportion Cumulative Variable JPMOrgan CitiBank WellsFargo RoyDu.00076568 Fargo.00012325 o .155 0.(ll .625 -0.0002538 0.0'2 (lr)np/2(n_l )np/2(tr(s))np/2 e -np/2 = n p 1 np/2 (21T)np/Z (.00. CitiBank.307 0.309 -0. RoyDutShell.00007341 0.64ti -0. f..663 -0.570 0.098 0.04ti 1.07012 0.00006410 0.118 -0.0. ExxonMobii JPMorgan JPMorgan CitiBank WellsFargo RoyDutShell ExxonMobil 0.216 0.tShel1 ExxonMi 1 0.954 PC5 -0. + 1.345 0.038 -0.00050828 0.780 0.414 0.899 PC4 0. .00008897 CitiBank WellsFargo RoyDutShe1l ExxonMobi1 0.529 0.a2 I) = -). 8.642 0. .

900 (Symetric) (b) ~ = 108.397 . 102 9.049884 -0.306 .-027909 o .393975 0.514314 -0.35 ê3 ê4 ês ê2 .315978 0.128991 0. ~ are íl: (.828018 0. .858905 0.270 -1.00070 . (c) Using (8-33).626 .513 10.118931 -0.1.431404 -0.767815 -0.068822 -0. .040076 -0.507549 0.00106.00012.00054.479670 0.259861 -0.062264 0. 8. ~ = .0036) (d) Stock returns are probably best summarized in two dimensions with 80% of the total variation accounted for by a "market" component and an "industry" component.15 .00025 14 = .030 55.29 14 = 4.554515 -0. Bonferroni 90% simultaneous confidence intervals for Âi Â.673 s= 4.27 A ei t =43.00014 is = .: (. -0.202000 .037630 0.953 12.170 (b) From par (a).00019.t = 31.2.11 (a) 3.60 is = 2.440 89.00195) Â.00137 t = .759246 0.769147 -0.249442 -0.067 9 :570 31.078 .937 -.081081 -0.308887 -0. .t = .28.00100) ~: (.

038xl +.60+2.91 =-.220 -.129xs 5'2 =-.y¡.669 .508xS (c) Correlations between variables and components: Xl X2 X3 X4 Xs r.480x3 +.27+43. Variable Xs now has much more influence in the second principal component.15+31/29+4.249x2 -.222 -.3) but did change the nature of the second component. The second component appears to be influenced most by roughly equal contributions from percent with professional degree (X2).590 The proportion of total sample variance explained by the first two principal Components is (108.759x3 -.119x2 -.x. We might call this component an employment contrast.316x4 -.8S9x4 +.35)=. -. We might call this an achievement component.398 -. The first component appears to be a weighted difference between percent total employment and percent employed by government.062xl -.171 . percent employment (X3) and median home value (xs). The change in scale for Xs did not appear to have much affect on the first sample principal component (see Example 8.238 r.212 .80.27+43.Y2.527 -.X¡ -.947 .15)/(108.669 -. .

c'Ompont is essentially IIso1ar r-adiation".110 .378 -.4'64 . Tliefirst .0 1..024xS +. 1 33 1 .254 .593 S = . 1 94 . Y1 = -.319 . ~6 = . 3.2£. 5:7 = ..0 -. 15'6 .074 .12 (300.914 .167 ..673 2.OO5x4 +.0 -.154 1.Oinxi +.395 6... ~3 = ll.:.586 -2.OO2x7 accounts f-or 87% of the total sampl-e variance.235 .364 3 .316 2. 500 8..479 (Symmetrk) 1.142 1.171 .779 30. ~2 = 28.28.0 Using $: ~1 = 304.101 1.978 .0 ( Symetri c ) .822 ..1 . ~4 = 2.172 -2.993x2 +.28. 51G) .448 1.166 ..779 .182 1 .014x3 -.53.177 11 .502 .21 The first sampl-e princi-pal component .4~.811 . ~Nete t~ large sample varianc~ f"()r x2 in S).411 ..0 R = 1.04'5 3Q.27-0 .052 . ~~ = 1.624 1 .52.1 34 .183 .089 .0 1.23'5 .11£ .. 768 2 .557 . .112xii +.522 .297 .

54.20..e sample principle components are A Yl = .378z4 -. A ~1 = 2 .~05z2 -.527z2 +..ó44z1 +.e of the pol1uti()n level ~ The second component is largely cemposed of "solar radiati.1~9z6 +.3 = 1.407z4 +.+.73. .on". It might represent the effects of solar radiåtion since solar radiation is involved in the production of NO and D3 fro!l the other pollutants. It might represent a wi~ transport .34. The data can be eff€ctive1y summarized in three or few~r dimensions. effect. .. It might be some general measur..541z7 These components ~cceunt fer 70% of the total sample vari ance.tind" and certain pollu- tants (e. À£ = .435z4 -.-. -.007z3 -.-.113z3 -.~37Z1 -.324z6.319z7 .551z3 .308z7.278z.. A "better" interpretation of the components \'iould depend on more ...extensive subject matter knowledge. remaining variables. and the pollutants "NO" and Iln3". The choice of S' or R makes a difference.5S7zti . Yi = -.g. The first camponent contrasts "\'/ind" with the.199z5 +. ~2 '=: 1.. .173 Usingji: .225z2 -. A 1.498zS -. Y3 = . À5 = :65. . À 7 = .197z5 +. "NO" and "He").39.. The third 'c-omponent is -eampos-d largely of ii.16 The first thre. '"4 = .

0.129607 0.347989910 0.074885659 o .5514 1.477385 0. 0342 o .3863 o .174 8.347989910 0.Q52 -0.86431 1.58g699088 '0. 15815. 11'(131391 0.78640 1 .861455923 Correlati~n Matrix XL X2 X3 14 X5 X-ô XL 1 . 157" 1.47738 0.4958 0.7'Û46 1 .0102 0. 65'679 o .087004959 l1C933412 0.276915309 X5 1 .931345370 0.040543 0.12733 0.21740"5649 o .110933412 o ..0790 . 53"66 0.0000 o .65031 0. 5514 0.008817'694 X3 XL X2 X3 X4 X5 X6 0. 000 )(4 X5 X6 o .118469052 1 .024851988 0. 4554 o .021814433 -0.3616 o .021814433 o .179408 0.1875 o .3863 o .0707 -. '064672 o .110131391 X4 X5 X6 0.536' 0.93134537C 0. 1184"69052 o .1-08386 0.15815'0852 -0.'ÛOOO o .0000 o .ger than that of x4.4554 o .0790 X2 X3 0.024851988 o .14478 o .38803 0. 2432"6 o . 217405"649 .276915309 0.1570 0.89479 0.29881 o .3616 o .862172372 -0. 5350 o .0102 1 .087004959 0.704'6 0.-07"645 0 ..110409072 0.3464 .2"6228 0.388886434 X6 0.9594"6 Eigenvalue Difference 2.008817694 0.5350 o .4958 o ..388886434 o .589699088 O. 3464 0.13 (a) Covariane Matrix XL X2 XL X2 X3 4.77764 0. Eigenvalues of the Correlation Matrix PRIN1 PRIN2 PRIN3 PRIN4 PRINS PRIN6 Proportion 'Cumula t i va 0.78786 1 .612821160 0. 0342 -0. 0000 .0000 0.0707 (b) We wil work with R since the sample variance of xl is approximately 40 times lai. X4 " . 1875 1.00000 .6'54750889 0.571428861 o .074885"659 o .

.44446 -0. Also. -.5.60720 o . .14412 8.3) = ..3 The first sample principal component explains a proporticn AI J 200. 687297 o .53'67"6 0.14 S is given in Example 5~Z_ ~l = 200.998x2 +. Å3 = 1.429880 0. ~2 = 4. We nee the fit four components to summarize the data.029x3 .02175 -0.061367 -.665604 -. 073090 o .055877 -.339"55 o . 873960 o .628157 .5 + 1..02766 -0 .4458 .498607 X4 o . 29923 o .029 .75289 o .178715 o .5..124585 . Hence Yl = -.076408 0.051.302ti8 X4 o .55393 -0.5/(200.339330 o .16171 0.600851 PRING O. 146492 o .421060 0. .175 Eigenvectors PRIN2 PRINl PRIN3 PRIN4 PRIN5 -.103175 o .211635 o . 90675 o .291738 0.026600 0.97 of the total sample variance.203339 o .521276 XL X2 X3 X6 .051x1 -. 78335 X5 o . =1 = (-.998. 020959 .17931 -0..39440 -0. -09457 0.07646 0.72056 o .5 + 4.380135 XS 0.532689 .200526 -..10986 -0.88222 o .331839 0.429300 o .551149 -.207413 -. 402854 -.116262 o .794127 (c) It is not possible to summarize the radiotherapy data with a single component.358773 o ..04949 X3 o ..43969 -0 . (d) Correlations between principal components and Xl . 053090 -.X6 are PRINl PRIN2 PRIN3 PRIN4 -0.37909 XL X2 X6 -0.

The first principal component is essentially $x_2$ = sodium content. (Note the (relatively) large sample variance for "sodium" in S.) A Q-Q plot of the $\hat{y}_1$ values is shown below. These data appear to be approximately normal with no suspect observations.

[Figure: Q-Q plot for $\hat{y}_1$; ordered values $\hat{y}_{1(j)}$ plotted against standard normal quantiles $q_{(j)}$.]
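A Q-Q plot of this kind pairs the ordered component scores with standard normal quantiles. The sketch below shows one common construction, assuming the probability levels $(j - \tfrac{1}{2})/n$. The scores listed are hypothetical placeholders, since the individual $\hat{y}_1$ values are not reproduced here, and `qq_pairs` is an illustrative name.

```python
from statistics import NormalDist


def qq_pairs(values):
    """Return (standard normal quantile, ordered value) pairs for a Q-Q plot."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    quantiles = [std_normal.inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]
    return list(zip(quantiles, ordered))


# Hypothetical principal component scores, for illustration only.
scores = [-61.2, -48.9, -55.4, -39.7, -70.3]
pairs = qq_pairs(scores)
for q, y in pairs:
    print(round(q, 3), y)
```

If the scores are approximately normal, the plotted pairs fall close to a straight line; points that stray far from the line flag suspect observations.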

49. .09 850.01.5lx3 + .13.01/~948.. = .177 1088.23 784. À2 = 4'68.15 831 .28 1128. Y.51. . À3 = 452..76 of the total sample variance. obtained from R. .41 S = 7'63.3779.49x2 + .1"5' ~ A A A À1 = 3779. . .6.45.15 904.72 Consequent1y~ the first sample principal component aCt:ounts for a proportion .53 (Symmetri c) 1395.32 92'6.40 8.45xi + . (Note the sample variances in S are nearly equal). À4 = 24~.53x4 The interpretation of the first component is the same as the interpretation of the first component. in Example 8. A 1 so . "" :1 = (.l1 = .73 1336.25.53) Co nsequent 1 y ~ .

153g 0. Dev.34'07 '0.5346 1.9293 1.4676 0.4203 0.7126 -0.5004 0. 3925 0. of Vax.0143 -0. 6284 -0.8874 0.6668 -0.2530 0.liey~ is eontrasted with aU the others in the first principal eompoont . 1886 -0.81£1 0.0355 0.5318 pel pc2 pe3 pe4 St.1107 0.5172 -0.4938 0.3'081 -0.16.e largemouth bas.4429 Eig~nvectors of R O. 7~32 0 .5074 0. Principal component analysis of Wisconsin fish data ..1731 pe 1 pe2 pe3 pc4 peS pe6 St. 1.1969 0.6716 0. 0.7354 0.5385 0.7249 -0.(') An are positively correlated.1539 0.1810 ~O.2302 -0.0872 -'0. ().3549 1. The second principal component is essentially the 'Walleye and somewhat th.392'0.1863 0.4242 Eigenvectors of R -0.5711 0.5914 -0.7'013 -'0.0719 0.n Northern pike and BluegilL . The thkd principal component is nearly a contrast betV'æ. 0.5284 -0.x4 Eigenvalues -of R 2.7074 0.4111 0.0403 0. (b) Principal component analysis using xl .0100 0.3621 -0.5972 0. The second contrasts the Bluegil and Crappie with the two bass.2911 0 .0917 -0.5555 -'0.x6 Eigenvalues of R 2.2787 -0. of Var.7352 0.1786 0.6983 -0.6231 0.3765 -0.4652 ~. 1.0834 0.6644 0.£385 0.3216 0.3871 -0.7846 0.4021 -0.0707 Cumulative Pr~p.84£9 0. 4295 O.2927 -0.'0471 0.0353 0.178 8.1107 Cumulative Prop.9842 0.7293 -0.6157 0.9921 0.0000 The first principal component is essentially a total of all four. 0000 The \Va.66£5 Prop. Dev.6513 Prop.6722 0.7875 0.look at theLOvariance pattern).8893 1.0114 0. (c) Principal component analysis using xl .2016 0. '0.1640 0.2250 0.8911 ~O.7'071 0.4702 0.

.Q223S.008 o .500 .0115684 . .Q13001'6 .179 8.430 .633 .0103784 .0200857 .0177355 .0089085 .337 . . ..0803572 .0210995 . -.0168369 .. .01'05991 The eigenvalues are o . .218 .514 -x x .432 .204.0167936 .0185352 .0114179 . .673.0085298 .lS4 0.003 0..0080712 .06677"62 .018 0...159 .Q0 . .181 . .0912071 . ..0128470 .0079578 .0.001 and the first two principal components are .024 .0694845 .17 COVARIANCE MATRIX ----------------- xl x2 x3 x4 xS x6 .02 0.

389z5 + .406 -0.040 0. Marathon 100m(s) ().414 -0.295 0.923 .674 0. These rankings appear to be consistent with intuitive notions of athletic excellence.741 PC4 -0.195 0. Poland 9.. marathon) and might be called a distance component.002 Eigenvalue 5.364 .645 0.395z4 + .6287 0. Czech Republic 8.572 0.141 0. Russia 4.959 Variable 100m(s) 200m(s) 400m(s) 800m 1500m 3000m Mara thon PC1 0.799 0.911 .887 .378 0.407 -0. USA 2.378z1 + .000 proportion 0.194 PC5 0.830 0.867 0.806 0.237 -0.383 0. 3000m.368z3 + .308 Z5 Cumulative proportion of total sample varance explained by the first two components is .990 0.335 .830 0.127 -0.919 0.782 0.669 Eigenanalysis of the Correlation Matrix 0.148 0.856 .389 PC3 0.240 0.732 0.368 0.355z7 )72 =-A07z1 -A14z2 -AS9z3 +.094 -0.018 0.245 .309z5 +A23z6 +.973 0.013 0.l.266 0.328 -. Correlations: 100m(s).909 0.323 -.0910 0.791 0.422 0.082 0.854 0. Great Britain 7.820 0.423 0. (d) The "track excellence" rankings for the first 10 and very last countries follow.Yi.080 Pe6 PC7 0.327 0. 200m(s). 1.0545 0. 200m 400m) with the times for the longer distances (800m.459 0.937 .677 0.161z4 +.180 8. China 5.067 0.919.0143 0.389z7 Zi Zz Z3 Z4 r. France 6.819 -0.048 -0.026 -0.321 -0.905 0.309 0. 1500m.090 1.189 -0.167 -0. .540 )71 = .376 0.089 0.161 0. Australia .128 Z6 'l7 .008 0. The second component contrasts the times for the shorter distanes (100m.680 0.809 0.355 PC2 -0. This component might be called a track index or track excellence component. 800m.952 r.2793 0.18 (a) & (b) Principal component analysis of the correlation matrix follows.376z6 + .. Somoa .1246 0.389 0.587 -0.998 0.352 -0.247 0.8076 0.871 0.731 0.240 -0.728 0.383z2 + .. 1500m.977 cumulative 0. Germany 3. 400m(s).Y2'Z¡ -.801 0.720 0.101 -0.941 200m(s) 400m(s) 800m 1500m 3000m Mara thon 200m(s) 400m(s) 800m 1500m 3000m 0. (c) All track events contribute about equally to the first component.395 0.017 -0. Romania 10. 3000m.906 .745 -0. 54.

055 0.427 0.964 PC3 -0.991 0.1054364 0.03338 0.376 -.3 lOx¡ + .0997547 0.320 Cumulative proportion of total sample variance explained by the first two components is . The second component contrasts times in mls for the shorter distances (100m. Marmls 3000m/s 1500ml s 800m/s 40 Oml s 200m/s 100ml s 0.357 x2 + . All track events contribute about equally to the first component.0966724 0.0810999 Marml s 0.095 5'1 =.829 PC1 0.874 .730 PC4 0.410 .21 IXs +.1184578 Marml s Eigenanalysis of the Covariance Eigenval ue Proportion Cumulati ve Variable lOOmIs 20 Oml s 400m/s 800m/s 1500m/s 3000m/s Marml s 0. .944 .376 0.002 1.1018807 0.0956u63 200m/s 400m/s 800m/s Marml s 0.445 -0.519 -0.937 .097 0.519x3 +.73215 0.311 0.057 .YllXi .391 0.460 0.689 -0.08607 0.299x4 + .902 .829 0.00885 0.0905383 lOOmIs o .081 -0 .199 0.187 -0.423X7 5'2 =-.434 0. 3000m/s.124 0.0822198 0.1138699 0.007 0.299 0.434x2 -.926.379x3 + .423 0.0954430 0.357 0.0735228 0.998 PC5 PC6 PC7 0. This component might be called a track index or track excellence component. -0809409 0.10831'64 0.18.136 -0.046 -0. 499 0.882 .886 r.0749249 o .376x¡ -. 3000m. 800m/s.585 -0.379 0.667 -0.19 Principal component analysis of the covariance matrix follows.176 . marathon) and might be called a distance component.053x4 +.X¡ -.098 -0.030 0.367 -.089 -0.951 .00207 0.396 0. Covariances: 100m/s. 400m/s.000 -0. 114'6714 0.396x6 +.18.01498 0.460x6 + .276 .0650640 0.1377889 0.734 0. 200m/s.010 0.Yi.127 0.357 -0.310 0.236 -0.1667141 l500m/s 3000m/s o .136 0.391xs + .053 0.138 -0.1465604 0.1765843 0.0%01139 0.894 -0.12384.237 -0.624 0.445x7 Xl I X2 X3 X4 Xs X6 X7 r.017 0.184 -0.2'65 0.00617 0. The interpretation of the sample component is similar to the interpretation in Exercise 8. The "track excellence" rankings for the countries are very similar to the rankings for the countries obtained in Exercise 8.053 -0.1437148 0.0921422 0.128 0.0943056 0.323 -0.08'64542 0.132 0.435 0.211 0.038 0.Q5 0.981 0.0933103 0. 1500mls.926 PC2 Matrix 0. 
l500m. 200m 400m) with the times for the longer distances (800m.274 0.nn 8.

the men's track data is consistent with that for .061 -0.529 0.Z¡ .972 PC3 PC4 o .914 . France 5.217 -0.295z6 -. 500m.381 -0.370('6 + . Eigenanalysis of the Correlation Matrix Eigenvalue 6. These rankings appear to be consistent with intuitive notions of athletic excellence.061 -0.541 0.133 0.334 -0.470'2 +.948 .918 0.387 0.946 0.YI. Germany 9.067 0.295 -0.026 0.353 0.089z4 -. Canada .'6384 Variable PC2 PC1 0.276 -.351 0.423 .244 0.339 ('3 + .183 0.OOOm. Great Britain 3.055 0.338 0.339 0. (d) The male "track excellence" rankings for the first 10 and very lasti:ountris follow.028 0.0707 0.366 0.OQ6 0. Kenya 4. Australia 6.345z3 -.335 -0.0097 0.366z7 + .896 r.346 0. 200m 400m) with the times for the longer distances (800m.366 10.0976 0.004 -0. Cook Islands The principal component analysis of the women.000 PC7 PC8 0.123 -.267 -. 1.259 -0.077 0.087 -0.001 1.018 -0.353z4 + .594 0.009 0.039 -0.071 -. 1500m.345 -0.300 0.878 l4 .370 0.652 -0 .2058 0.334z7 -. 114 0.529z1 +.54.359 -0. Italy 7.851 -0 .080 0.012 0.349 -0.069 Yi = .387('8 Zs ('6 ('7 ('8 .993 PC6 -0.366z5 + .984 PC5 0.375 0.227 -0.344 -0.999 0.860 .706 -0. lu. Portugal 10. This component might be called a track index or track excellence component.Y2'Z¡ . 233 -0.838 cumulative 0.154z5 -.04'69 0.838 0..182 8.134 -0.346('2 + . The second component contrasts the times for the shorter distances (100m.341 -0.273 -0.040 0.000m Marathon 0.354z8 Y2 =.147 0.376 Cumulative proportion of total sample variance explained by the first two components is .440 o .328 0.354 100m 200m 400m 800m 1500m 5000m 0.. Brazil 8.7033 proportion 0.918.470 0.948 .332z1 + .958 . USA 2.530 0.309 ('1 Z2 ('3 r. (c) All track events contribute aboutequally to the first component.003 -0..072 -0.2275 0.066 0.089 -0.244 0.348 -0.917 .362 o . marathon) and might be called a distance component.697 0.236 -.154 -0.20 (a) & (b) Principal component analysis of the correlation matrix follows.783 -0.332 0.

0572512 0.3lOx7 +.317 X3 + .046 0.327 -0.390 -0. 8.20. 200 400m.0745719 0.00752 0.844 0.428x6 + . 5000mls.998 Eigenvalue 0. 1500mls.364xs+ .0937357 0.436 0.261 0.445 -.420 0.180 -0.0468840 0. 8oom/s.93 0. 000 PCL 0. The "track excellence" rankings for the countries are very similar to the rankings for the countres obtained in Exercise 8.469x3 -.416 pc3 PC2 -0.450 -0 .033 . 033 0.079 0.177 pC7 0. All track events contribute about equally to the first component.266 Cumulative proportion of total sample varance explained by the first two components is .339 0:584 0.010 0.261x6 +.858 .0469252 0.902 .0587731 0./s 0.844 0.063xS +.384 -.310 0.416xs 5'2 =-.111 -0.X¡ .039 pc8 -0.0432334 0.391 j\ = .0562945 0.215 0.317 -0.119 -0.Y2'X¡ -.608 -0.311x2 +.187 -0.'S23x2 -.244 0.181 .006 0.247 0.387xs Xl X2 X3 X4 Xs X6 X7 Xs r.432 0.0736518 0.0425034 0.923.0905819 cumlative 0.822 .421x7 + .20.0571560 0.0979276 Covariance Matrix Eigenanlysis of the 0. 'SooOm.374 -0.00322 0.317 0.303 -0.849 . .421 0.543 0.469 0.0431256 100m/s 200m/s 400m/s 800m/s lS00m/s 5000m/s 10.013 0.YI.0688217 0. .128 pc4 pc5 PC6 -0.0434632 0.0729140 10.983 0.0617664 0.0599354 0.0909952 0.044 -0.6. 1o.217 .332 0..278x4 + .OOOm/s Marathonm/s 5000.0553945 0.096 -0.lfiJQ~~ 10Om/s 200m/s 40Om/s 800m/s 1500m/s 0. The second component contrasts times in rns for the shorter distances (100.334 -0.183 component analysis of the covariance matrx follows. This component might be called a track index or track excellence component.oom/s.050 .100 -0.947 Eigenvalue 0.0482772 0.0535265 0.311 0.0558678 0.023 0. marathon) and might be called a distance component.OOOm/s Marathonm/S 5000.215 -0.684 0.0523058 0.535 0.523 -0. 10.387 -0.OOOm/s Marathonø/S 0.584 -0.119 0.428 0.49405 0.'696 -0.04622 proportion 0. 400m/s.934 r.01391 0.235 -0. .439 -0.006 0.00575 0.0314951 0.016 -0. 
The interpretation of the sample component is similar tt) the interpretatìon in Exercise 8.244xl + .364 0.070 -0..432xl -.0648452 0.033x4 +.0541911 0.OOOm/s Marathonm/s 0.318 0.964 .0761i388 0.432 -0.008 -0.278 0.00112 proportion cuulative Variable 10Om/s 200m/s 400m/s BOOm/s 1500m/s 5000m/s 10.0942894 0.0448325 0. 2oom/s.0537207 0./5 10. .948 .074 0.002 1.024 0.01332 0.0567342 0.442 -. 800m) with the times for the longer distances (1500m.352 -0.0428221 0.063 0.341 0.21 Principal Covariances: 1oom/s.0434979 0.368 0.970 0.173 0.0Q.923 0.971 .993 0.0959398 0.

/ 0.1 o .487193 ..509727 -.037984 0. 000003 PRIN1 PRIN2 PRIN3 PRIN4 PRINS PRIN6 PRIN7 ~!!S!!. I 2500 8 Y1 8 1 8115 2000 5 8 1 5 1 8 81 8 8 18 5 885 5111 11 551 51 15 111 1 1 8 155 55 18 1 1 8 8 8 1 8 8 8 8 8 1 5 5 1500 -100 a 100 Y2 200 300 0. Symbol is value of X1.535569 o .311194 o .000018 0.184 8.593037 0.998778 0.011906 _. 008526 0..22 Using S Eigenvalues of the CovarianeeMatrix Cumulative Eigenvalue Oi ffe renee proportion 20579. 009680 0. e72697 .7 5.000130 0.608787 _ .191437 0.003227 _.008577 _ .005887 0.5 0.014293 _.4 0.284215 0.010389 0. 008388 -.808198 0.000069 0.3 0.003112 0.004847 0. 000444 o .Y2.034277 0.487047 o .029196 0.005597 o .425175 0.009330 (Õ.904389 0.005278 O.000341 Plot of Y1.000000 0.1 3.018864 0.082331 _.1 .4 15704.013820 _ .8 0.390573 0.000493 0.004886 _ . 000256 yrhgt ftfrbody prctffb fraiie bkfBt saleht salewt . 0.2 2.286337 0.0 Eigenveetors X3 X4 X5 X6 X7 X8 X9 PRIN3 PR IN4 PRINS PRIN7 PRIN2 PRIN6 PRIN1 0. 855204 0.024592 _.000253 ~ o .000457 0.043786 0. (NOTE: 10 obs hidden.9 2.6 4874.133267 _ . 002665 -.000213 0.748598 -.

.314787 0. 236723 0...452854 0.568273 _.02....78357 0.242818 0.2 -1 0 .. 042442 ftfrbody pr(:tffb freiie bkfat .714719 0. . 582433 0. 02644 0..33713 0..14650 0.006722 1 .038732 0.94580 0. ..253312 _ .02 2 3 .. (NOTE: 27 obs hidden. 600 1500 1 -3 ..799288 -..007728 0.185 8.433957 .~3562 o .060204 0.600515 Plot Syiibol of Vl.3 .215769 -... 176650 -.579367 0.290547 0.. 582337 is value of Xl.129837 -..412326 0.002397 .. (NOTE: FOA S 36 obs hidden..97235 0.186705 0..719343 0.276561 0.581171 0.142995 0.58867 0.208017 0.072234 _. ... .00000 Eigenvectors X3 X4 X5 X6 X7 X8 X9 PRINS PRIN6 PRIN7 0.22 (C"Ontinued) Using R ~igenvalues of the ~orrelation Matrix Ugenvalue Difference Proportion 'Cuaiulative 2..109535 0..774926 0.2 i i -1 a Q2 1 2 3 ..) 38 obs hidden.31996 o .18581 0.315508 0.449931 0..127800 _. 59575 0.020929 0. 800 .99328 o . 042790 0.101315 0.09945 0.618117 _ .247479 0.03930 0..269947 . ***...42143 0.017768 PRINl PRIN2 PRIN3 PRIN4 0..450292 0... .065871 _.Y2.415709 .105912 0.74138 0..02..113356 0.355562 0.*.160238 yrhgt o . -.452345 . ..191018 0.047036 sa leht salewt ...04706 PRINI PRIN2 PRIN3 PRIN4 PRINS PRIN6 PRIN7 o .) 1200 8 8 8 8 88181 1000 8 118851 8151 Vl 1 800 8 88811111 1 8 15 1 1111155 55 1 1 5 5 600 i i 800 900 1000 1200 1100 1300 V2 Plot of VL.177061 0..12070 1. . 88560 0.) 1200 2500 VL SymbOl used is hi( ~ ***** 1000 VL 2000 . (NOTE: Syiibol used is Plot of Yl.434144 0.77969 4.

9439 O. It is domiated by Weight.9461 0.9439 0.314€78 -0.9045 0.082353 0.0000 0.87 152.0000 0.87/4673. The second component essentially contrasts Weight with the remaining body size varables.4091 £2 0.598371 0.008692 -0.194132 -0.1~6 8.9503 0.519785 -0.058127 0. Gir.719785 .9200 0.41'0333 0.6447 0.54 Eigenvectorsof S (in colums) -0.9559 0.0000 Eigenvalues of R 5.052267 0.261937 -0.958 or 95.22€606 0. Body lengt and Gir.8752 0.458838 -0. 403'672 -'0.561034 -0.9635 1.443264 The first component might be identified as a "size" component.599053 -0.070597 -0. although the sample correlation between the second and Neck is small (-. The first component explains 4478. Body length.043918 -0.012754 -0.9177 0.849339 -0.52 0. The first two components explain 99.9177 0. component and Head width.9270 0.580499 0.9271 0.216748 0.0.-0000 0.110784 -0.1 % of the total sample varance.848576 0.12 1.368552 -0.558334 0.0266 Eigenvectors of R (in colums) -0.0565 0.8752 1.9635 0.581252 -0.064458 0.313431 -0.846078 -0.0000 0.060354 0.073950 -0.695916 -0.928388 -0.001815 -0.019105 -Q.9013 0.251473 0.9437 0.892805 -0.368132 0.9270 0.074260 -0.0473 0.186741 0.228969 0.9503 O.291938 0.222694 -0.286817 0.032666 -'0.060162 0.9544 0.435168 .000202 -0.411999 -0.0-04276 0.9271 1.231095 0.389366 -0. These body measurement data can be effectively sumarze in one dienion.9025 0.033880 0.23 a) Using S Eigenvalues of S 4478.092026 0.128024 -0.0000 0.9013 1.4'04313 0.035396 0.887138 -0.012289 -0.532348 -0.9437 0.470832 -0.409938 -0.32 8.355060 -0.8% of the total sample varance.84 = .243840 -0.319513 -0.05). 9200 0.9544 1.440119 0.012490 0.9461 0.9559 0. 9025 0.318718 -0.47 32. b) Using R R 1. Neck.0492 0.1758 0.303143 -0. those varables with the largest sample varances. Head lengt.9045 0.

8.23 (Continued)

Again, the first principal component is a "size" component. All variables contribute equally to the first component. This component explains 5.6447/6 = .941 or 94.1% of the total sample variance. The second principal component contrasts Weight, Neck and Girth with Body length, Head length and Head width. The first two components explain 97% of the total sample variance. These data can be effectively summarized in one dimension.

c) The results are similar for both the covariance matrix S and the correlation matrix R. The first component in each analysis is a "size" component and explains almost all of the variation in the data. The analyses differ a bit with respect to the second and remaining components, but these latter components explain very little of the total sample variance.
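The percentages quoted in 8.23 can be checked directly from the two eigenvalue lists. The sketch below is only an illustration: the helper name `share_of_first` is hypothetical, and the eigenvalues are the ones reported above for S and R.

```python
# Eigenvalues reported in 8.23 for the covariance matrix S and correlation matrix R.
eig_S = [4478.87, 152.47, 32.32, 8.12, 1.52, 0.54]
eig_R = [5.6447, 0.1758, 0.0565, 0.0492, 0.0473, 0.0266]


def share_of_first(eigenvalues, k):
    """Proportion of total sample variance explained by the first k components."""
    return sum(eigenvalues[:k]) / sum(eigenvalues)


# Hypothetical helper applied to both analyses, for one and two components.
s1, s2 = share_of_first(eig_S, 1), share_of_first(eig_S, 2)
r1, r2 = share_of_first(eig_R, 1), share_of_first(eig_R, 2)

print(f"S: first component {s1:.3f}, first two {s2:.3f}")
print(f"R: first component {r1:.3f}, first two {r2:.3f}")
```

Both analyses put well over 90% of the total sample variance on the first component, which is the basis for the one-dimension summary.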

Wisconsin.9 319.7 6 971.6 7.1 35.2 101.8 359.5 24138728.1173 Principal components yl y2 y3 y4 y5 y6 1 1745.8 279.8 1079573.6 -109.9 15 3076.7 419.9 -273.5615 0.3092 -0.S -199.8 650.S -140.cmponents of the Madison.5541 0.2 4 -1360.5 636.4 85714.188 8.6 42771)7 .2 -242.0642 -0.3 43399 .711)3 -0..4932 -0.1 444.9 -44908 .5 -147.6 728.0 -1322.2662 -0.3 618.3675 -0.9 761592.5 123.7 -316.3 286.9 502.4898 -0.3404 -0.3 181437.2 13 326.5157 0.1 -422.0 94302. 3 110517.8 1478.3 -244785.3 1~1312.6 Eigenvalues of S 4045921. 2667 0.4 -379.1543 -0'.6 -9.7 572.1796 -0.4 -44908 .5 -69.5 5 -1255.1 2 -1096.9 139692.0359 -0.4 16 2261.8 652.8 -899.8 -122.6 800 7141 S -72093.3270 0 .8 367884 .3269 0.8 85714.9 2265078.2 -158.7 -72093.8 1251.8 11 -3931.0316 -0.9 224718 .9 12 -1392.6 -324.4308 0.6 490.5 123.8 9 -497.3071 -0.8 330923.6 284.6122 0.2 10 -2397.6 Eigenvectors of S -0.2 110517 .1 -105.9 560.3862 -0.4094 0.8 222491.0810 -0.1 7 1118.4 1399053.2 3 210.6 .0 85.9 -942.5 -972.1 700.3 1079573.1 372.1 1458543.~ -1113809.9 139692.0 209.8 1698324.9 207.1 -499.0008 -0.9 -3715.2 178.9 13563.5 -15. Police Department data XBAR 3557.8 222491.5 -149.4 -1479.0 14 3371.5 8 -1151.6 1752.6415 -0.0567 -0.9 11'61018.S -462615.1 288919.24 An ellipse format chart based on the first two principal.3 -53.7 924.2 88.4 -893.0 -2285.9 -60.8 -293. 4 ~244 785.3575 -0.1 -274. 9 111)1018.2 -1113809.4126 -0.6 365.8 -598.3 -4'6261S .9 -114.4062 -0.3 -593.8 -83.1 523.1 1448.7 222.4311 -0.1 101312.1544 0.8 330923.4 2676.9 -1029.5 -70.2 152.4821 0. 1 420.8 809.~ 4277'67 .1 43399 .4 -1688.7 -104.1 1819.3 3.6 -586.6 2011.5696 0 .8453 -0.

8.24 (Continued) The 95% control ellipse based on the first two principal components of overtime hours is

$2.5 \times 10^{-7}\, \hat{y}_1^2 + 4.4 \times 10^{-7}\, \hat{y}_2^2 = 5.99$

Period 12 looks unusual.

[Figure: 95% control ellipse, y2 versus y1]

8.25 A control chart based on the sum of squares $d_{Uj}^2$.

[Figure: sum of squares of the unexplained component of the jth deviation versus Period]

8.26 (a)-(c) Principal component analysis of the correlation matrix R.

Correlations: Indep, Supp, Benev, Conform, Leader

[Table of Pearson correlations]

Principal Component Analysis: Indep, Supp, Benev, Conform, Leader

Eigenanalysis of the Correlation Matrix

              PC1     PC2     PC3     PC4     PC5
Eigenvalue  2.1966  1.3682  0.7559  0.5888  0.0905
Proportion   0.439   0.274   0.151   0.118   0.018
Cumulative   0.439   0.713   0.864   0.982   1.000

[Table of coefficients for PC1 to PC5 by variable]

[Figure: Scree plot of Indep, ..., Leader; eigenvalue versus component number]

Using the scree plot and the proportion of variance explained, it appears as if 4 components should be retained. These components explain almost all (98%) of the variability. It is difficult to provide an interpretation of the components without knowing more about the subject matter. All four of the components represent contrasts of some form. The first component contrasts independence and leadership with benevolence and conformity. The second component contrasts support with conformity and leadership and so on.
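The retention decision in 8.26 rests on the cumulative proportion of variance. A minimal sketch of that rule follows, assuming a cutoff chosen by the analyst; the function name `components_needed` and the 0.95 cutoff are illustrative assumptions, not part of the original solution, which also relies on the scree plot.

```python
def components_needed(eigenvalues, threshold):
    """Smallest number of leading components whose cumulative share of
    total variance reaches the threshold (eigenvalues sorted descending)."""
    total = sum(eigenvalues)
    running = 0.0
    for k, lam in enumerate(eigenvalues, start=1):
        running += lam
        if running / total >= threshold:
            return k
    return len(eigenvalues)


# Eigenvalues of R from the 8.26 eigenanalysis.
eig_R = [2.1966, 1.3682, 0.7559, 0.5888, 0.0905]

# Assumed cutoff of 95%; three components reach only about 86%.
k = components_needed(eig_R, 0.95)
print(k)
```

With these eigenvalues the rule agrees with the choice of 4 components made above.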

[Figure: Scatterplot of y2hat vs y1hat]

The two dimensional plot of the scores on the first two components suggests that the two socioeconomic levels cannot be distinguished from one another nor can the two genders be distinguished. Observation #111 is a bit removed from the rest and might be called an outlier.

8.26 (a)-(d) Principal component analysis of the covariance matrix S.

Covariances: Indep, Supp, Benev, Conform, Leader

[Table of sample covariances]

Principal Component Analysis: Indep, Supp, Benev, Conform, Leader

Eigenanalysis of the Covariance Matrix

              PC1     PC2     PC3     PC4     PC5
Eigenvalue  68.752  31.509     ...
Proportion   0.484   0.222   0.163   0.115   0.017
Cumulative   0.484   0.706   0.868   0.983   1.000

[Table of coefficients for PC1 to PC5 by variable]

[Figure: Scree plot of Indep, ..., Leader (covariance matrix); eigenvalue versus component number]

Using the scree plot and the proportion of variance explained, it appears as if 4 components should be retained. These components explain almost all (98%) of the variability. All four of the components represent contrasts of some form. The first component contrasts independence and leadership with benevolence and conformity. The second component contrasts support with conformity and leadership and so on. The components are very similar to those obtained from the correlation matrix R. In this case, it makes little difference whether the components are obtained from the sample correlation matrix or the sample covariance matrix.

[Figure: Scatterplot of y2hatcov vs y1hatcov, with observations #111 and #104 labeled]

The two dimensional plot of the scores on the first two components suggests that the two socioeconomic levels cannot be distinguished from one another nor can the two genders be distinguished. Observations #111 and #104 are a bit removed from the rest and might be labeled outliers.

Large sample 95% confidence interval for $\lambda_1$:

$\left( \dfrac{68.752}{1 + 1.96\sqrt{2/130}},\ \dfrac{68.752}{1 - 1.96\sqrt{2/130}} \right) = (55.31,\ 90.83)$
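The interval above can be reproduced with a few lines of arithmetic. This is a sketch of the large-sample formula as it is used in the solution; the function name `eigenvalue_ci` is hypothetical, and n = 130 and z = 1.96 are taken from the expression above.

```python
import math


def eigenvalue_ci(lam_hat, n, z=1.96):
    """Large-sample interval lam/(1 + z*sqrt(2/n)) to lam/(1 - z*sqrt(2/n))."""
    half_width = z * math.sqrt(2.0 / n)
    return lam_hat / (1.0 + half_width), lam_hat / (1.0 - half_width)


# Largest eigenvalue of S and sample size as used in 8.26.
lower, upper = eigenvalue_ci(68.752, 130)
print(f"({lower:.2f}, {upper:.2f})")
```

The interval is not symmetric about 68.752 because the normal approximation is applied to the ratio of the estimate to the true eigenvalue rather than to the difference.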

8.27 (a)-(d) Principal component analysis of the correlation matrix R.

Correlations: BL, EM, SF, BS

[Table of Pearson correlations]

Principal Component Analysis: BL, EM, SF, BS

Eigenanalysis of the Correlation Matrix

              PC1     PC2     PC3     PC4
Eigenvalue  3.8395  0.1403  0.0126  0.0076
Proportion   0.960   0.035   0.003   0.002
Cumulative   0.960   0.995   0.998   1.000

[Table of coefficients for PC1 to PC4 by variable]

The proportion of variance explained and the scree plot below suggest that one principal component effectively summarizes the paper properties data. All the variables load about equally on this component so it might be labeled an index of paper strength.

[Figure: Scree plot; eigenvalue versus component number]

The plot below of the scores on the first two sample principal components does not indicate any obvious outliers.

[Figure: Scatterplot of y1hat vs y2hat]

8.27 (a)-(d) Principal component analysis of the covariance matrix S.

Covariances: BL, EM, SF, BS

[Table of sample covariances]

Principal Component Analysis: BL, EM, SF, BS

Eigenanalysis of the Covariance Matrix

[Table of eigenvalues, proportions, cumulative proportions and coefficients for PC1 to PC4]

The proportion of variance explained and the scree plot that follows suggest that one principal component effectively summarizes the paper properties data. The loadings of the variables on the first component are all positive, but there are some differences in magnitudes. However, the correlations of the variables with the first component are .998, .928, .990 and .989 for BL, EM, SF and BS respectively. Again, this component might be labeled an index of paper strength.

[Figure: Scree plot; eigenvalue versus component number]

The plot below of the scores on the first two sample principal components does not indicate any obvious outliers.

[Figure: Scatterplot of y1hatcov vs y2hatcov]

8.28 (a) See scatter plots below. Observations 25, 34, 69 and 72 are outliers.

[Figure: Scatterplot of Family vs DistRd]

[Figure: Scatterplot of DistRd vs Cattle]

(b) Principal component analysis of R follows. Removing the outliers has some but relatively little effect on the analysis. Five components explain about 90% of the total variability in the data set and seems a reasonable number given the scree plot.

68.132 0.111 PC4 0.219 -0 .198 . Cotton.249 PC6 -0.1114 0..216 0.751 0.247 0.011 -0.632 -0.311 0.033 -0.357 0.240 -0.3 Coirpone 45 Numbe 6 Prlnclp81 Compon8nt An8lysls: AdjF8m.293 0.048 0.030 0.121 0.059 0.011 0.361 -0.194 -0.181 -0.621 -0.569 0.009 -0.273 0.095 . 079 -0.988 1.137 0.745 0. 127 0.029 0.129 0.941 0.452 -0.554 0.373 0.088 0.616 0.149 -0 . · Eigenanlysis of the Correlation Matrix proportion 4.604 0.458 -0.249 0.089 0.363 0. Dlsd.342 -0.309 0.968 0.6058 0.030 -0.759 -0.146 -0. 000 PCL 0.014 -0.598 0.454 -0.579 -0. 002 -0 . 504 0. AdjMalz AdjSorU.000 variable AdjFam AdjOistRd AdjCotton AdjMaize Adj Sorg AdjMillet AdjBull AdjCattle AdjGoats pe 0.831 0.077 -0.024 0.460 Eigenvalue 0.818 0.499 -0.187 -'0.1443 0.013 C\lative 1. 0581 0.7918 1.067 0.045 0.118 0.269 0. 068 0.372 -0 .715 PC2 PC3 PC4 PC5 -0.645 0. MIII8 BulL..173 0.170 0.190 -0.385 -0.971 0. Møz Sor9.625 0.019 0.065 -0.110 0.021 0.098 -0.687 -0.34.139 0.486 0.134 0.218 0..274 0.048 -0.102 0.2720 0.300 -0 . 028 0.242 PC8 PC9 0.329 -0 .171 0.200 -0.056 0.065 0.067 0. 675 -0.352 0.043 -0.885 0_5044 0.242 0. AdjDlstRd.770 o .2400 0.509 -0.169 0.574 -0.444 -0 . .160 0.263 -0.445 0.027 0.116 -0.164 0.415 -0.077 0.465 0.284 0. 4381 Eigenvalue 4.434 0.123 -0.502 0.396 -0.355 0.606 0.100 -0.0845 0. 053 -0.041 0.204 0.797 -0.833 0.182 O.197 0.941 0.1851 O.021 -0.027 0.134 O.225 -0.040 0.012 Eigenvalue cuulative proportion C\lative variable Family OistRd cotton Maze sorg Millet Bull Cattle Gots 1.440 -0.337 -0.072 -0.392 0_407 PC7 -0.351 PC5 0.100 PC6 -0.3661 0.497 -0. 043 0.069 -0 . 1718 0.126 0.055 0.051 -0.038 0.049 -0.068 -0.6043 0.008 0.041 0.100 -0. 089 -0 .461 0.378 -0.402 PC7 PC8 PC9 -0.082 -0.360 0.1470 0.527 -0 .012 -0.278 0.496 -0.460 0.379 -0. AdjCotton.353 0.72 remove) Eigenalysis of the correlation Matrix proportion culative 0. 2364 1.246 0. 594 -0.361 0.465 0.411 -0.241 0.352 0.1182 proprtion 0.900 1.458 0.240 0.122 0.9205 0.446 0. 
131 Princlp81 Component An8lysls: F8mlly.987 Eigenvalue 0.030 0.388 -0.Outle 25.016 0.255 PC2 PC3 0.097 0.215 -0.229 0.

(c) All the variables (all crops, all livestock, family) except for distance to road (DistRd) load about equally on the first component. This component might be called a farm size component. Millet and sorghum load positively and distance to road and maize load negatively on the second component. Without additional subject matter knowledge, this component is difficult to interpret. The third component is essentially a distance to the road and goats component. This component might represent subsistence farms. The fourth component appears to be a contrast between distance to road and millet versus cattle and goats. Again, this component is difficult to interpret. The fifth component appears to contrast sorghum with millet.

8.29 (a) The 95% ellipse format chart using the first two principal components from the covariance matrix S (for the first 30 cases of the car body assembly data) is shown below. The ellipse consists of all $\hat{y}_1, \hat{y}_2$ such that

$\dfrac{\hat{y}_1^2}{\hat{\lambda}_1} + \dfrac{\hat{y}_2^2}{\hat{\lambda}_2} \le \chi_2^2(.05) = 5.99$

where $\hat{\lambda}_1 = .354$, $\hat{\lambda}_2 = .186$. Observations 3 and 11 lie outside the ellipse.

[Figure: Scatterplot of y2hat-y2bar vs y1hat-y1bar with the 95% ellipse]

(b) To construct the alternative control chart based upon unexplained components of the observations we note that $\bar{d}_U^2 = .4137$, $s_{d^2}^2 = .0782$ so

$c = \dfrac{.0782}{2(.4137)} = .0946, \qquad \nu = \dfrac{2(.4137)^2}{.0782} = 4.4$

Conservatively, we set the chi-squared degrees of freedom to $\nu = 5$ and the UCL becomes $c\,\chi_5^2(.05) = .0946(11.07) = 1.05$ or approximately 1.0. The alternative control chart is plotted on the next page and it appears as if multivariate observation 18 is out of control. For observation 18, $\hat{y}_4$ makes the largest contribution to $d_{U18}^2$ and the variables getting the most weight in $\hat{y}_4$ are the thickness measurements $x_1$ and $x_2$. Car body #18 could be examined at locations 1 and 2 to determine the cause of the unusual deviations in thickness from the nominal levels.

[Figure: alternative control chart of $d_{Uj}^2$ with UCL $= c\,\chi_5^2(.05) = 1.0$]
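The constants in 8.29(b) follow from matching the mean and variance of $d_U^2$ to a scaled chi-square distribution. The sketch below redoes that arithmetic under stated assumptions: the variable names are illustrative, the summary statistics .4137 and .0782 are the rounded values quoted above, and the chi-square percentage point 11.07 is copied from the solution rather than computed.

```python
# Summary statistics of the d^2_U values quoted in 8.29(b) (rounded as printed).
d_bar = 0.4137     # sample mean of d^2_U
s2_d = 0.0782      # sample variance of d^2_U

# Moment matching to c * chi-square(nu): mean = c*nu, variance = 2*c^2*nu.
c = s2_d / (2.0 * d_bar)
nu = 2.0 * d_bar ** 2 / s2_d

# Upper 5% point of chi-square with 5 d.f., taken from the text (not computed here).
chi2_5_upper_05 = 11.07
ucl = c * chi2_5_upper_05

print(f"c = {c:.4f}, nu = {nu:.2f}, UCL = {ucl:.2f}")
```

Because the inputs are rounded, c comes out near .0945 rather than the printed .0946; the UCL still rounds to 1.05.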

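Chapter 9

9.1 With $L' = [.9,\ .7,\ .5]$,

$LL' = \begin{bmatrix} .81 & .63 & .45 \\ .63 & .49 & .35 \\ .45 & .35 & .25 \end{bmatrix}, \qquad \Psi = \begin{bmatrix} .19 & 0 & 0 \\ 0 & .51 & 0 \\ 0 & 0 & .75 \end{bmatrix}$

so $\rho = LL' + \Psi$.

9.2 a) For m = 1,

$h_1^2 = \ell_{11}^2 = .81, \qquad h_2^2 = \ell_{21}^2 = .49, \qquad h_3^2 = \ell_{31}^2 = .25$

The communalities are those parts of the variances of the variables explained by the single factor.

b) $\mathrm{Corr}(Z_i, F_1) = \mathrm{Cov}(Z_i, F_1)$, i = 1, 2, 3. By (9-5), $\mathrm{Cov}(Z_i, F_1) = \ell_{i1}$. Thus $\mathrm{Corr}(Z_1, F_1) = \ell_{11} = .9$, $\mathrm{Corr}(Z_2, F_1) = \ell_{21} = .7$, $\mathrm{Corr}(Z_3, F_1) = \ell_{31} = .5$. The first variable, $Z_1$, has the largest correlation with the factor and therefore will probably carry the most weight in naming the factor.

9.3 a) $\tilde{L} = \sqrt{\lambda_1}\, e_1 = \sqrt{1.96}\,[.625,\ .593,\ .507]' = [.876,\ .831,\ .711]'$. Slightly different from result in Exercise 9.1. The specific variances are the diagonal elements of $\rho - \tilde{L}\tilde{L}'$, that is $\tilde{\psi}_i = 1 - \tilde{\ell}_{i1}^2$, approximately .233, .309 and .494.

b) Proportion of total variance explained $= \lambda_1 / p = 1.96/3 = .65$.

9.4 $\rho - \Psi = LL' = \begin{bmatrix} .81 & .63 & .45 \\ .63 & .49 & .35 \\ .45 & .35 & .25 \end{bmatrix}$

$L = \sqrt{\lambda_1}\, e_1 = \sqrt{1.55}\,[.7229,\ .5623,\ .4016]' = [.9,\ .7,\ .5]'$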

Result is consistent with results in Exercise 9.1. It should be since m = 1 common factor completely determines $\rho - \Psi = LL'$.

9.5 Since $\tilde{\Psi}$ is diagonal and $S - \tilde{L}\tilde{L}' - \tilde{\Psi}$ has zeros on the diagonal,

(sum of squared entries of $S - \tilde{L}\tilde{L}' - \tilde{\Psi}$) $\le$ (sum of squared entries of $S - \tilde{L}\tilde{L}'$).

By the hint, $S - \tilde{L}\tilde{L}' = \hat{\lambda}_{m+1}\hat{e}_{m+1}\hat{e}_{m+1}' + \cdots + \hat{\lambda}_p \hat{e}_p \hat{e}_p' = \hat{P}_{(2)}\hat{\Lambda}_{(2)}\hat{P}_{(2)}'$, which has sum of squared entries

$\mathrm{tr}[\hat{P}_{(2)}\hat{\Lambda}_{(2)}\hat{P}_{(2)}'(\hat{P}_{(2)}\hat{\Lambda}_{(2)}\hat{P}_{(2)}')'] = \mathrm{tr}[\hat{P}_{(2)}\hat{\Lambda}_{(2)}\hat{\Lambda}_{(2)}\hat{P}_{(2)}'] = \mathrm{tr}[\hat{\Lambda}_{(2)}\hat{\Lambda}_{(2)}\hat{P}_{(2)}'\hat{P}_{(2)}] = \mathrm{tr}[\hat{\Lambda}_{(2)}\hat{\Lambda}_{(2)}] = \hat{\lambda}_{m+1}^2 + \hat{\lambda}_{m+2}^2 + \cdots + \hat{\lambda}_p^2$

Therefore,

(sum of squared entries of $S - \tilde{L}\tilde{L}' - \tilde{\Psi}$) $\le \hat{\lambda}_{m+1}^2 + \hat{\lambda}_{m+2}^2 + \cdots + \hat{\lambda}_p^2$

9.6 a) Follows directly from hint.

b) Using the hint, we post multiply by $(LL' + \Psi)$ to get

$I = (\Psi^{-1} - \Psi^{-1}L(I + L'\Psi^{-1}L)^{-1}L'\Psi^{-1})(LL' + \Psi)$
$\ \ = \Psi^{-1}(LL' + \Psi) - \Psi^{-1}L(I + L'\Psi^{-1}L)^{-1}L'\Psi^{-1}(LL' + \Psi)$
$\ \ = \Psi^{-1}LL' + I - \Psi^{-1}L(I - (I + L'\Psi^{-1}L)^{-1})L' - \Psi^{-1}L(I + L'\Psi^{-1}L)^{-1}L'$ (use part (a))
$\ \ = \Psi^{-1}LL' + I - \Psi^{-1}LL' + \Psi^{-1}L(I + L'\Psi^{-1}L)^{-1}L' - \Psi^{-1}L(I + L'\Psi^{-1}L)^{-1}L' = I$

Note all these multiplication steps are reversible.

c) Multiplying the result in (b) by L we get

$(LL' + \Psi)^{-1}L = \Psi^{-1}L - \Psi^{-1}L(I + L'\Psi^{-1}L)^{-1}L'\Psi^{-1}L$
$\ \ = \Psi^{-1}L - \Psi^{-1}L(I - (I + L'\Psi^{-1}L)^{-1})$ (use part (a))
$\ \ = \Psi^{-1}L(I + L'\Psi^{-1}L)^{-1}$

Result follows by taking the transpose of both sides of the final equality.

9.7 From the equation $\Sigma = LL' + \Psi$, m = 1, we have

$\begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix} = \begin{bmatrix} \ell_{11}^2 + \psi_1 & \ell_{11}\ell_{21} \\ \ell_{11}\ell_{21} & \ell_{21}^2 + \psi_2 \end{bmatrix}$

so $\sigma_{11} = \ell_{11}^2 + \psi_1$, $\sigma_{22} = \ell_{21}^2 + \psi_2$ and $\sigma_{12} = \ell_{11}\ell_{21}$.

Let $\rho = \sigma_{12}/(\sqrt{\sigma_{11}}\sqrt{\sigma_{22}})$. Then, for any choice $|\rho|\sqrt{\sigma_{22}} \le \ell_{21} \le \sqrt{\sigma_{22}}$, set $\ell_{11} = \sigma_{12}/\ell_{21}$ and check $\sigma_{12} = \ell_{11}\ell_{21}$. We obtain

$\psi_1 = \sigma_{11} - \ell_{11}^2 = \sigma_{11} - \dfrac{\sigma_{12}^2}{\ell_{21}^2} \ge \sigma_{11} - \dfrac{\sigma_{12}^2}{\rho^2 \sigma_{22}} = \sigma_{11} - \sigma_{11} = 0$

and $\psi_2 = \sigma_{22} - \ell_{21}^2 \ge \sigma_{22} - \sigma_{22} = 0$. Since $\ell_{21}$ was arbitrary within a suitable interval, there are an infinite number of solutions to the factorization.

9.8 $\Sigma = LL' + \Psi$ for m = 1 implies

$1 = \ell_{11}^2 + \psi_1, \qquad .4 = \ell_{11}\ell_{21}, \qquad .9 = \ell_{11}\ell_{31}$
$1 = \ell_{21}^2 + \psi_2, \qquad .7 = \ell_{21}\ell_{31}$
$1 = \ell_{31}^2 + \psi_3$

Now $\ell_{11}/\ell_{21} = .9/.7$ and $\ell_{11}\ell_{21} = .4$, so $\ell_{11}^2 = (.9/.7)(.4)$ and $\ell_{11} = \pm .717$. Thus $\ell_{21} = \pm .558$. Finally, from $.9 = \ell_{11}\ell_{31}$, $\ell_{31} = \pm .9/.717 = \pm 1.255$.

Note all the loadings must be of the same sign because all the covariances are positive. We have

$LL' = \begin{bmatrix} .717 \\ .558 \\ 1.255 \end{bmatrix} [.717\ \ .558\ \ 1.255] = \begin{bmatrix} .514 & .4 & .9 \\ .4 & .311 & .7 \\ .9 & .7 & 1.575 \end{bmatrix}$

so $\psi_3 = 1 - 1.575 = -.575$, which is inadmissible as a variance.

9.9 (a) Stoetzel's interpretation seems reasonable. The first factor seems to contrast sweet with strong liquors.

(b)

[Figure: plot of the loadings, Factor 2 versus Factor 1; labeled points include Liquors, Kirsch, Mirabelle, Rum, Marc and Calvados]

It doesn't appear as if rotation of the factor axes is necessary.

9.10 (a) & (b) The specific variances and communalities, based on the unrotated factors, are given in the following table:

Variable          Specific Variance   Communality
Skull length           .5976             .4024
Skull breadth          .7582             .2418
Femur length           .1221             .8779
Tibia length           .0000            1.0000
Humerus length         .0095             .9905
Ulna length            .0938             .9062

(c) The proportion of variance explained by each factor is:

Factor 1: $\dfrac{1}{6}\sum_{i=1}^{6} \ell_{i1}^2 = \dfrac{4.0001}{6}$ or 66.7%

Factor 2: $\dfrac{1}{6}\sum_{i=1}^{6} \ell_{i2}^2 = \dfrac{.4177}{6} = .0696$ or about 7.0%

(d)

$R - \hat{L}_z\hat{L}_z' - \hat{\Psi}_z = \begin{bmatrix} 0 & .193 & -.017 & .000 & .000 & -.001 \\ .193 & 0 & -.032 & .000 & .001 & -.018 \\ -.017 & -.032 & 0 & .000 & .000 & .003 \\ .000 & .000 & .000 & 0 & .000 & .000 \\ .000 & .001 & .000 & .000 & 0 & .000 \\ -.001 & -.018 & .003 & .000 & .000 & 0 \end{bmatrix}$

9.11 Substituting the factor loadings given in the table (Exercise 9.10) into equation (9-45) gives

V (unrotated) = .01087
V (rotated) = .04692

Although the rotated loadings are to be preferred by the varimax ("simple structure") criterion, interpretation of the factors seems clearer with the unrotated loadings.

9.12 The covariance matrix for the logarithms of turtle measurements is:

$S = 10^{-3} \times \begin{bmatrix} 11.0720040 & 8.0191419 & 8.1596480 \\ 8.0191419 & 6.4167255 & 6.0052707 \\ 8.1596480 & 6.0052707 & 6.7727585 \end{bmatrix}$

The maximum likelihood estimates of the factor loadings for an m = 1 model are

Variable         Estimated factor loadings $F_1$
1. ln(length)        0.1021632
2. ln(width)         0.0752017
3. ln(height)        0.0765267

Therefore,

$\hat{L} = \begin{bmatrix} 0.1021632 \\ 0.0752017 \\ 0.0765267 \end{bmatrix}, \qquad \hat{L}\hat{L}' = 10^{-3} \times \begin{bmatrix} 10.4373 & 7.6828 & 7.8182 \\ 7.6828 & 5.6553 & 5.7549 \\ 7.8182 & 5.7549 & 5.8563 \end{bmatrix}$

(a) To find the specific variances $\hat{\psi}_i$, we use the equation $\hat{\psi}_i = \hat{s}_{ii} - \hat{h}_i^2$. Note that in this case, we should use $S_n$ to get $\hat{s}_{ii}$, not S, because the maximum likelihood estimation method is used.

$S_n = \dfrac{n-1}{n} S = \dfrac{23}{24} S = 10^{-3} \times \begin{bmatrix} 10.6107 & 7.685 & 7.8197 \\ 7.685 & 6.1494 & 5.7551 \\ 7.8197 & 5.7551 & 6.4906 \end{bmatrix}$

Thus we get

$\hat{\psi}_1 = 0.0001734, \qquad \hat{\psi}_2 = 0.0004941, \qquad \hat{\psi}_3 = 0.0006342$

(b) Since $\hat{h}_i^2 = \hat{\ell}_{i1}^2$ for an m = 1 model, the communalities are

$\hat{h}_1^2 = 0.0104373, \qquad \hat{h}_2^2 = 0.0056553, \qquad \hat{h}_3^2 = 0.0058563$

(c) The proportion explained by the factor is

$\dfrac{\hat{h}_1^2 + \hat{h}_2^2 + \hat{h}_3^2}{\hat{s}_{11} + \hat{s}_{22} + \hat{s}_{33}} = \dfrac{0.0219489}{0.0232507} = .9440$

(d) From (a)-(c), the residual matrix is:

$S_n - \hat{L}\hat{L}' - \hat{\Psi} = 10^{-6} \times \begin{bmatrix} 0 & 2.1673 & 1.4474 \\ 2.1673 & 0 & 0.1124971 \\ 1.4474 & 0.1124971 & 0 \end{bmatrix}$
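The steps of 9.12 can be retraced numerically. The following is a minimal sketch using only the covariance matrix and the maximum likelihood loadings printed above, with n = 24 as implied by the factor 23/24; the variable names are illustrative and the loadings are taken as given rather than estimated.

```python
n = 24  # sample size implied by S_n = (23/24) S

# Sample covariance matrix S of the log measurements (entries times 10^-3).
S = [
    [11.0720040, 8.0191419, 8.1596480],
    [8.0191419, 6.4167255, 6.0052707],
    [8.1596480, 6.0052707, 6.7727585],
]
# Maximum likelihood loadings for the m = 1 model, as printed in the solution.
loadings = [0.1021632, 0.0752017, 0.0765267]

# S_n uses divisor n, which is the version ML estimation works with.
Sn = [[(n - 1) / n * s * 1e-3 for s in row] for row in S]

# Communalities are the squared loadings when m = 1.
communalities = [l * l for l in loadings]

# Specific variances: diagonal of S_n minus communalities.
psi = [Sn[i][i] - communalities[i] for i in range(3)]

# Proportion of total sample variance explained by the single factor.
proportion = sum(communalities) / sum(Sn[i][i] for i in range(3))

# Residual matrix S_n - L L' - Psi (diagonal is zero by construction).
residual = [
    [Sn[i][j] - loadings[i] * loadings[j] - (psi[i] if i == j else 0.0)
     for j in range(3)]
    for i in range(3)
]

print([round(p, 7) for p in psi], round(proportion, 4))
```

The off-diagonal residuals come out on the order of $10^{-6}$, in line with the residual matrix in part (d); small differences in the last digits are expected from rounding of the printed loadings.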

100611 0.033857 -0.~ "!!S~-1 "'!..931029 0.021205 0..096522 0. /i ~/i ~ = /i and E f E = I .009639 -0.014563 -0.002626 0 -0..014563 0.008864 0.l.039634 0.00696 -0.035712 0.. RRS).005516 0..002626 -Q.006454 0.13 Equation (9-40) requires m ~ ¥2P+l .021205 0 0. 9.899389 0.207 9.062289 -0. Hêre we have m = 1. Ther-efore.063146 0 -0. market- value meaures provide evidence of profitabilty distinct from that provided by the accounting measures.058312 -0.036263 0 0.02449 0 The m=3 factor model appears appropriate.02678 0..¡g). REV). ALIl .107308 -0. .022111 -0.960366 (b) Residual Matrix o 0. Accounting meaures on assets (HRA.Et~~:: /i~fEA"l = /i"'/i"S = A. The second factor is related to accounting historical measures on equity (HRE.032646 0.004011 -0.093691 -0.015523 0.063146 -0.013953 -0. '..070351 0.14 Since "'~ Ä_l A~ Al ""1.006854 0.079682 0. we cannot separate accounting historical measures of profitabilty from accounting replacement measures.02145 -0.02449 -0.065101 0 0.015523 0. 004011 -0.97322 0.903478 0.15 (a) variable HRA HRE HRS RRA RRE RRS Q REV variance communality 0.010351 0.032645 0 0."'.188966 0.920318 0.005616 0.811034 0..005464 0.02145 -0.068312 -0.RRA) are weakly related to all factors. '" '" 1. .L = /i"tl1f~ V. "'At.052289 -0.00066 -0.013953 -0.009639 0. The third factor is historical related to accounting historical measures on sales (HRS.00066 -0.107308 -0. P = 3 and the sti"ict inequality docs not hold.078402 -0.022111 -0.068971 0.00696 0. However. A 1f 1f '1 = I. RRE).036263 0. L''l.068416 0.078402 -0. (c) The first factor is related to market-value measures -(Q.093691 -0. 9.133955 0.065101 -0.I .036712 0.058416 0.033867 0.866045 0.

PROBLEM 9.15

[Figures: three "Rotated Factor Pattern" plots of the loadings for HRA, HRE, HRS, RRA, RRE, RRS, Q and REV, with horizontal axes Factor 1, Factor 2 and Factor 1]

9.16 From (9-50), $\hat{f}_j = \hat{\Delta}^{-1}\hat{L}'\hat{\Psi}^{-1}(x_j - \bar{x})$ and

$\sum_{j=1}^{n} \hat{f}_j = \hat{\Delta}^{-1}\hat{L}'\hat{\Psi}^{-1}\sum_{j=1}^{n}(x_j - \bar{x}) = 0$

Since

$\sum_{j=1}^{n} \hat{f}_j\hat{f}_j' = \hat{\Delta}^{-1}\hat{L}'\hat{\Psi}^{-1}\left[\sum_{j=1}^{n}(x_j - \bar{x})(x_j - \bar{x})'\right]\hat{\Psi}^{-1}\hat{L}\hat{\Delta}^{-1} = n\,\hat{\Delta}^{-1}\hat{L}'\hat{\Psi}^{-1}S_n\hat{\Psi}^{-1}\hat{L}\hat{\Delta}^{-1}$

using (9A-1), which implies $S_n\hat{\Psi}^{-1}\hat{L} = \hat{L}(I + \hat{\Delta})$, together with $\hat{L}'\hat{\Psi}^{-1}\hat{L} = \hat{\Delta}$, we have

$\sum_{j=1}^{n} \hat{f}_j\hat{f}_j' = n\,\hat{\Delta}^{-1}\hat{\Delta}(I + \hat{\Delta})\hat{\Delta}^{-1} = n(I + \hat{\Delta}^{-1})$

a diagonal matrix. Consequently, the factor scores have sample mean vector 0 and zero sample covariances.

9.17 Using the information in Example 9.12, we have

$(\hat{L}_z'\hat{\Psi}_z^{-1}\hat{L}_z)^{-1} = \begin{bmatrix} .2220 & -.0283 \\ -.0283 & .0137 \end{bmatrix}$

which, apart from rounding error, is a diagonal matrix. Since the number in the (1,1) position, .2220, is appreciably different from 0, and the observations have been standardized, equation (9-57) suggests the regression and generalized least squares methods for computing factor scores could give somewhat different results.

1539 0.X4 Ini tial Factor Method: Maximum Likelihood Factor Pattern (m = 1) FACTOR1 BLUEGILL BCRAPPIE SBASS LBASS 0. 73867 -0.5385 0.70812 o .73867 SBASS LBASS o .0000 Factor Pattern (m = 1) Factor Pattern (m = 2) FACTORl FACTOR2 F ACTORl BCRAPP IE 0.1719 0.1969 0. Bluegil and Crappie load heavily on the first factor. 64983 ~ .18.13066 O.1107 Cumulati ve 0.5385 0. 48544 0.17543 0.41799 (c) Varimax rotation. Factor analysis of Wisconsin fish data (a) Principal component solution using Xl .19047 (b) lvlaximum likelihood solution using Xl .65863 LBASS 0. Principal Components Varimax Rotated Factor Pattern BLUEGILL BCRAPPIE SBASS LBASS FACTOR1 0. 36324 SBASS 0.3663 0.51319 For both solutions. 70439 LBASS 0. 76738 BLUEGILL BLUEGILL BCRAPPIE SBASS LBASS 0.48072 FACTOR2 0.19445 BCRAPPIE o .40581 o .1728 Proportion 0. while large- mouth and smallmouth bass load heavily on the second factor.93147 0. 63002 o .1539 0.X4 1 2 3 4 Ini tial Factor Method: Principal Components Eigenvalue 2.08767 0.67309 0.80526 0.77273 0.02251 BCRAPPIE o .16518 0. 50404 0.36549 o .96841 0. Note that rotation is not possible with 1 factor.4429 Difference 1.48073 0. 64983 o .7354 0 .210 9.4350i 0 .77273 -0.85703 0.28186 0.65312 Factor Pattern (m = 2) F ACTOR1 F ACTOR2 BLUEGILL 0.25907 SBASS 0.98748 -0. .37743 0.62774 Maximum Likelihood Varimax Rotated Factor Pattern F ACTOR1 F ACTOR2 BLUEGILL 0.76738 0.6157 0.7876 0.8893 1.

21097 0. 0707 o .51192 0.3925 0. 13806 BLUEGILL 0.6644 Difference 1.0762 o . 80285 Varimax Rotated Factor Pattern F ACTORl F ACTOR2 F ACTOR3 o .24062 NPIKE o .49190 0.72422 0. and the Walleye and smallmouth bass on the third factor. 60333.20017 0. 85090 -0. 22600 0 . 06520 o . 00000 BLUEGILL 1 .02285 BLUEGILL 0.4242 0. 06257 0 .12720 -0.5711 0.28458 0. 02359 0 .31567 0.05767 0.3925 0.21480 0.04905 0.47779 The first principal component factor influences the Bluegil. 72944 -0.18156 o . 12927 -0. 29435 O.86227 0.20771 O.01286 0. 00000 o .0876 0.29875 LBASS o .23481 o . The Northern Pike alone loads heavily on the second factor.49746 WALLEYE 0.1640 0.07998 LBASS WALLEYE NPIKE 5 o .58051 SBASS 0.2830 0.3199 0.33099 -0. 26350 o . .11256 -0.7352 0.76170 0. 0 .42801 0. 00000 0.26931 0.22770 -0.71176 0.X6 1 2 3 4 Initial Factor Method: Principal Components Eigenvalue 2. 0834 o .01989 BCRAPPIE o . 46530 o .05282 0.9293 1 . 26232 -0 . Crappie and the Bas. The MLE solution is different.1107 Cumulative 0.18979 BCRAPPIE 0.92348 -0.20739 o .46222 0.3549 1. 96466 SBASS o .97853 0.0000 F ACTOR3 -0.8459 Factor Pattern (m = 3) F ACTORl F ACTOR2 o .46485 0.0719 0. 5004 6 o .1640 Proport ion 0. 83342 -0.211 (d) Factor analysis using Xl .54231 SBASS LBASS WALLEYE NPIKE 0.50492 o . 99637 0 . 00000 0.24459 -0.00311 -0.74189 0.9843 0. 13392 -0.44657 -0.14613 Initial Factor Method: Maximum Likelihood Factor Pattern FACTORl FACTOR2 F ACTOR3 o .1786 0. 39334 0 . 46665 Varima Rotated Factor Pattern F ACTOR1 F ACTOR2 F ACTOR3 BLUEGILL BCRAPPIE SBASS LBASS WALLEYE NPIKE o .06957 BCRAPPIE 0.03199 -0.47611 -0.

Abstract Math 1 2 3 4 5 6 7 VP Factor Factor Factor 1 2 3 0.250 0.374 0. nOOO .4481 .912 0.566 0. ti es 1 _ Growth 2 Profi ts 3 Newaccts 4 Creative 5 Mechani c 6 Abstract 7 Math O.968.9615 0. (b) and (c) l1aximum Likelihood (m = 3) lJNROTATED FACTOR LOADINGS (PATTRN) FOR l1AXIMU~' LIKELIHOOD CANONICAL FACTORS Factor '1 Growth 1 Profits 2 3 4 5 6 7 Newaccts Creati ve r~echani c Abstract Math VP 0.0352 .212 9.0000 0.181 O~OOO 1 .255 0.0876 .632 26'Z 3.729 ROTATED FACTOR LOADINGS (PATTERN) Growth Prof..653 .544 0.433 0.9124 1 . 967 O.794 0.295 0.520 1 ..019 0.0385 .921 o .179 0.355 0.541 0.208 0.-0.772 0. ts Newaccts Creat.0000 0.509 0. O~ 919 0.437 0..527 0. 1 84 0.389 0.774 0.426 .437 0. 720 1 .721 0.300 Communa 1..9631 .953 0.9648 O. 0.'0369 '0.S519 . 464 Specifi~ Variances 1 .19 (a).0000 . 4"54 0.054 0.000 0.347 0.316 0. ve Mechani c .570 '0.334 0. Factor Factor 2 3 0.295 3~ 180 1 .

746 ..espectively.386 .0 . we see that the last two factors ar.637 . '(4) Using (9-39) \.56£ 1. .esents the observed carrel ations quite well.213 1.0 (Symetri c) 1.single variables IIcreativell and "abstra'Ct" r.0 (Symmetri c) .386 1.147 .e dominated by the..572 1.0 R = .0 It is clear from an examination of the r.708 . provide intei:-retations for the factors.0 1.0 + 'l = .(LL i +'1) that an m = 3 factor sol ution repr.694 .884 .843 1.674 .826 . m = 3 we have 43 833 1 n (.) x32(. However.641 1.0 ""A .542 .848 . A R .948 ..465 . If we consióer the.413 . 00007593l\ = 62 1 .0 .944 .679 . . = 7.147 .591 .646 .455 .0 Ll .674 .696 .566 1.853 . .925 .923 1. .0 1.572 . p. it is dlfficul t to .926 1.927 .542 . The first factor links the salespeople performance variables wi th ma th a bi 1 i ty. rotated loadings. .912 .o1)= 11. '575 .413 .0 1.3 .700 1. .591 .641 .esidual matrix .000018427).0 .iith n = 50.700 .0 .

214 so \'le reject HO:r = LL' + 'l for m = 3. -.) ~' = (1.465.criteri-on. wi th f = LzR~ : - Principal components (m = 3) Maximum 1 ikel i hood (m = 3) f' = (.20 Xs ~ -. .78 11.679. .98 L(symetric) 2.686.~7 S = 300.673.852.1.23 6. Principal components (m = 3) l' = (..computed Factor scores using weighted least squares can 'only be A_l for the principal component sol utions si nce '1 cannot be com" puted for the maximum likelihood solutions.957. -. Using (9-50). . -. .129. AA " note that the matrices R. ('1 has zeros on the main diagonal for the maximum lik~lihood solutions). A_1 have. He . t~e residual matrix above indicates a good fit for m = 3..= 3 factor models appear to fit by the' x'.395) f' = (-.497) Using the regression method for computing factor scores. . Neither.522.344. .36 3. (e.59 -2.751) .2~3..805) 9.271.13 31. m. 1.of the m = 2. we .52 ) . . Again. LL' + V have small determinants and rounding error could affect the calculation of the test statistic. 1.78 3u.70Z.

96 I 5. the pri ncipal component sol ution generally produces uniformly small~r ~stimates of the specific variances. On the other hand.oadings obtåined from the sample A Z '. ..42 .) Xs (N02) X6 (03) -.17 17. the first factor is dominated by Xl = solar . correlation matrix R. X2 = solar radi~tion and the pair X5 = NOZ and ~6 = OJ .ith m = 2) does a better job. (For t see problem 9. For thë unrotated m = L solution.61 I .37 i -.23). Note: Maximum 1 ikol ihood estimates of the loadings for m = 2 may be di ff1cul t to obtain for some computer packages without good . .215 (a) Princi pa 1 components (m = 2) Factor 1 Factor 2 1 oadi ngs 1 oadi ngs i Xl el.between the paJr Xl = wind.radiation and Xs = °3.74 1.32 -. (c) Haximum likelihood estimation (\. of accounti ng for the covari ances inS than the m = 2 principal component sol uti on. The second factor seems t~ be a contrast . One choice for initial esti- mates of the comnunallties are the communalities from the m = 2 principal components solution.lind) X2 ~solar rad. estimates of the communålities.19 I Cb) Maximum 1 ikol ihood estimates of the loadings are obtained from L = ~z where Lz a~e the l.

530 7 8 9 10 :¡ . for maximum 1 ikel ihood estimat. . Recall solar radiation and ozone have the larg~st sample variances.. the fact~r scores generated by the we._.546 -0.544 -0.070 0. l"e factor scores generated by the regression method wi th ..492 0.'332 5 6 0.~gain the ff. ghted 1 eas~ squares formul as in (9-SQ) wi 11 be identical. ozone.io: .129 4 0. f1 - 0. The second' factor" might ba interpretad as a contrast bebieen wind and the pair of pollutants N02 and 03.316 0.790 -0.01. to som~ extent. Similarly.456 0.23~) are giv€n -l below for the first 10 case~.252 0.370 -0.724 -0.. seeproblem9..es.179 " f2 -0. tl23 p.rst factoi.515 0.2 -0.384 -::0.509 -0. This will affect the estimated loadings obtained by the principal component method. Case 2 3 . L =i D~Lz and S = O'lRO\ the factor scores gener~ted by the equations for tj in (9-58) will be identical.22 (a) Since. maximum likelihood estimates (m =2. is dominated by solar radiation and.' " 9..

368 'f .56 Factor 1 .950 ~O.217 (b) Factor scores using principal component estimates (m = 2) and (9-51) for the fit.52 Xs ( NOZ) .74 . 083 1 .492 10 -0.717 0.856 4 0.259 0.203 -0.) .st 10 cases are given below: Case 1 2 3 .394 5 6 7 8 9 f2 0. 9. ngs 1 oadi ngs loadin~s Facto r 1 -.05 Q£ 2 Factor I -. Principal components (m = 2) Factor 2 Rotated load.24 -.447 0.53 I .77 ~'.23 .410 ~0. 029 1.518 0.937 -0..30 . X2 (solar rad.072 (c) The sets of factor scores are quite different.65 -. 646 ~ 1 .31 C! -..04 (Æ .48 . Factor scores depend heavily on the method used to estimate loadings and specific variances as well as the method us~d to g~nerate them.049 0.795 o .1 68 0..20 Xl Xs (wind) (°3) -. 811 0. f1 '" 1 .

It might be called a "pollutant transportU factor.43 J ~ 1 (wi nd) Rota teet loadi ngs 2 -. Some of the observed carrel ations between the variables are vary small implying that a m = 1 or m = 2 factor model for these 'four variables will not be a completely satiSfactory description of the under~ 'lying structure.03 C& I -.25 -. However. Al so the two sol ution methods give somewhat different results indicating the solution is not ve~ stabl e.27 C: em IX5 (N02)' . the second factor appears to compare one of the pollutants with wind.econd factor-.04 .20 and 9. We may need about as many factors as vari~blas. \4e note that the intèrpretations of the factors might differ depending upon the choice of R or S (see problems 9.32 -.21) for analysis.218 l1aximum likeli'hood (m = 2) Factor 1 Factor 2 1 oadi ngs loadings Factor 1 Factor -. If this is . we see that both solution methods yield similar estimated loadings for the first .19 ~6 (°3) .50 . It mi ght be called a "ozone pollution factor'l.09 -X ' 2 (solar rad.factr.38 ..) . there is nothing to be gained by proposing a fa-ctor model. .17 .the ca~e.65 -.10 Examining the rotated loadings. There are some differences for the s.

756 0.807 -0.026 -.371 -0.180 1.192 .2236 0.541 -0.696 MedianHorne Variable Population Variance % Var 1..837 -0.0 -. . shown below.0 .313 -. Consequently.274 0.9919 0.381 PerCen tProDeg 0.. SçreêPlqlofPopulation.460 -0.0 . there is no sharp elbow.685..512 0.180 .584 0.830 0.0 2 4 3 Factr 5 Numbe Principal Component Factor Analysis (m = 3) Unrotated Factor Loadings and Communalities Factor1 Factor2 Factor3 Communality 0.0 The correlations are relatively small with the possible exception of .398 1. the correlation between Percent Professional Degree and Median Home Value.064 -0.153 0.676 PerCen tGovEmp 0.685 R = .0 -.026 .708 PerCentEmp::16 0.065 . reinforces this conjecture.5 lI :i ii ~ 1.8642 0. However.173 4.373 .295 0. we present a factor analysis with m = 3 factors for both the principal components and maximum likelihood solutions.0 1.411 1.24 -. The scree plot.9'62 -0.119 .5 0.870 0.3675 0.065 1.373 -.. a factor analysis with fewer than 4 or 5 factors may be problematic.729 0.192 1.0 lI ai iü 0. MedianHøme 2.119 .209 -0.219 9.010 1. The scree plot falls off almost linearly.411 -.685 -.313 -.845 .010 .

019 Variable Population -0..216 Factor Score Coefficients Factor1 Factor2 Factor3 0. I 0 -1 . . .522 0. .313 PerCentEmp~16 0.281 4.701 -0.362 PerCen tGovEmp Variable Population MedianHome Variance % Var Factor3 -Coiiunality -0.169 0. .~.082 1. -3 -2 -1 o Firs i Faêtr 2 4 3 Maximum Likelihood Factor Analysis (m = 3) * NOTE * Heywood case Unrotated Factor Loadings and Coiiunalities Factor1 Factor2 -0. -2 . . ~ 1 æ . .028 -0. . . .138 -0.103 0.830 Variance 1.Q_85Q'/ -0.' . .4050 0. 'a ..~ .6043 0..0011 -0. .099 0. 3 2 . .999 0.102 C-d.755 . MedianHome (PC:) 4 ..395 -0.348 1.298 0.845 PerCen tGovEmp % Var 0.7382 0. . .047 -0.208 1.756 -0.020 -0.070 Score Plot of Population.870 MedianHome ~ 0.0419 0.226 1. . . . .801ì -0.160 0.277 _.000 0..544 PerCentProDeg PerCentEmp~16 PerCen tGovEmp MedianHome 0.147 PerCen tProDeg PerCentEmp~16 0.940 -0.118 ~ ~ 0. . .658 -0.052 0. .1310 0.220 Rotated Factor Loadings and Communalities Varimax Rotation Variable Population Factor1 Factor2 Factor3 Coiiunal i ty 0.49'6 3.015 1.577 0.2236 0.278 -0.941 0. . . . .059 -0.000 1. ..807 0... .962 0.000 0. i . . .009 -0.146 0.7772 0. .321 11.068 0.135 -0.059 -0. .989 PerCen tProDeg -0. . . .0803 0.321 1.984 0.109 -0.

7772 Factor1 Factor2 -0. 4 1. -3 -2 -1 o 1 2 3 4 fil'Fllctr A m = 3 factor solution explains from 75% to 85% of the variance depending on the solution method. .090 PerCentProDeg 0. We call this factor an "employment" factor.1740 0.001 -0. . Using the rotated loadings. .046 0.755 0... MedianHoníé (MU£') of . . . . It is difficult to label this factor but since income is probably somewhere in this mix. The second and third factors from the two solutions are similar as well. -ø i: 8 . .5750 0.333 PerCen tGovEmp Variable population ~ MedianHome -.000 -0. ~ . ~ : 0 . 000 -0. . it might be labeled an "affluence" or "white collar" factor. . . . . . . Factor scores for the first two factors from the two solutions methods are similaro .. .002 -0.145 1. iX -1 . ..221 ~ c¡ ~ 0.159 -0. The third factor is clearly a "population" factor.165 0. . . The second factor is a bipolar factor with large loadings (in absolute value) on Percent Employed over 16 and Percent Government Employment.496 1.041 -0.046 -0.047 PerCentEmp::16 -0. . 1 .177 0.137 -0..984 0. 2 .053 1.070 PerCentEmp::16 -0.001 MedianHome Factor3 -1..010 PerCen tGovEmp -0. .. the first factor in both methods has large loadings on Percent Professional Degree and Median Home Value.430 0.298 0.000 Rotated Factor Loadings and Communalities Varimax Rotation Factor1 Factor2 0. .. .. . . . .235 1. 000 1. . .061 0.315 Variance % Var Factor3 Conuunality 0... . 0282 3 . . 017 PerCen tProDeg 1.. . .036 -0.206 Factor Score Coefficients Variable population Plot Score Population. -2 . . . .. 025 0. .155 -0.

90.809 90.033. T1i'Outli.343 H)4.Z80 101. 287./ave 317.808 .ers..625 S -' 94.329 (Symmetric) A m = 1 factor model appears to represent these data qui te well .15.992.524 ..009 -.802 The factor scores produced from the two sol ution methods ar.. 297. 1 . loadings Maximum Likelihood Fa ctor 1 loadings Shocki. 275.9% Proportion .734 87. 291. Stati c test 1 . 320. Static test 2 307. Variance Expl ained Factor scores (m = 1) using the ~gression method for the first 'few cases are: Principal Components Maximum L i kel ihood -.186 81 . were i'Óentifi.ery similar.e 4. spet:imens 9 and 16.d in 'Exæipl.e v.719 . Vibration 293. .804 . The correlation between the two sets of sc~~es is .1% 86.25 105.761 76.242 94.530 1.222 9.204 91 . Pri nci pa 1 Components Factor 1.

The maximum likelihood.4 205. 0 Percentage Vari ance ' . estimates of the factor loadings for ii = Z we're not o.3 31. .9 310. Factor 10adinas 26.7 Li tter 4.9 -6.i 370.4i Maximum Likelihood Litter 1 . c) It is only necessary to r~tate the m = 2 solution.9 182.8i Explained.8 Percentage Variance Explained 76.2 Litter 3 31. 32.30.5 1. 68.8 ' '~ v.4% l! b) " 9. Factor 2 '1 oadi naS I.4i 76. .7 30.5 344.0 32.5 1 98.6 Litter 4 '.2 Li tter 2 " 30.2 Litter 2 30.4 471. Litter 1 21.2 271 .Factor 1 11 oadi nQS 1 'P .223 9.4 529.9 309.5 18.btained due to convergence difficul ti es in the computer program.26 a) Principal Compûn~nts L m = i( Factor 1 loadi nos lm=2( '1 .2 Litter 3 28.0 27.9 -8.0 245.4 -4.

(m = 2) Rotated loadin9s 'l .27 .53.4% 40..06 .44 .36 .78 . Percentage Variance Expl ained' 76...5~ .33 .5 13.85 -.32 Litter 4 .20 .91 .87 -°.71 Li tter 3 .Z 11.7 33.86 .5% 9.8 Litter 3 14. ance .59 .6% i Factor 1 .8 Percentage Var.4 Litter 4 31. 2 10a~ings .4 12.91 Li tter 2 .224 Principal Components (m = 2) Rata ted 1 oadi ngs FactOr 1 'Factor 2 Litter 1 26.14 . J Factor 2 .21 . loadings Litter 1 .4 li tter 2 27. Factor 1 Factor.5~ 32.15 .Principal Components.44 45.87 .12 .4% Explained 9.

78 . .34' Litter 2 . .297 .81 .17 litter 3 .ßl Percentage Variance 68.. _ = .225 Maximum Likelihood (m = 1) . Factor 1 '1 . loadings 1 Li tter 1 .81 Expl ai ned '" "-1 f = L R z z.34 .91 .39 litter 4 .

02 0.00 0.15532 0.86309 2.061 0.129 278.00755 0.50918 1.014 278.373 -5.21616 3.725 0.34456 0.20276 0.749 6.74546 0.55435 10.380 -0.158 3000m Mara thon Variance % Var ~ 16.02141 0.89130 Marathon 0.270 274. 200m(s).984 4.28 The covariance matrix S (see below) is dominated by the marathon since the marathon times are given in minutes. the first factor explains about 98% of the variance and the largest factor loading is associated with the marathon.23388 4.178 o ~~ Marathon (i5.881 1.431 0.38 0.869 % Var -0.53984 0.66476 10. 21965 Principal Component Factor Analysis of S (m = 2) Unrotated Factor Loadings and Communalities Variable Factorl Factor2 Communality 0.725 -1.438-' 0.785 400m(s) 0.999 Rotated Factor Loadings and Communalities Varimax Rotation Variable Factor1 Factor2 Communality 0.767 -2.36 0.06617 0. Covariances: 100m(s).08389 0.06138 0.42682 28.124 -0.19284 0.238 270.073 0.749 -0.308 0. 400m(s). the principal component solution with m = 2 is given below.38 0. with either unrotated or rotated loadings.90373 0.075 800m 0.124 -0.18181 0.052 0. however.226 9.27015 1.999 1500m 3000m 0.02770 0.052 -0.143 -0. Using the rotated loadings. The first factor might be labeled a "running endurance" factor but this factor provides us with little insight into the nature of the running events.051 -0.217 1500m 0. 800m.401 1.07418 0. 3000m.517' Variance 242.453 -0.582 0. the first factor explains about 87% of the varance and again the largest loading is associated with the marathon. 1500m.027 0. Marathon 100m(s) 200m(s) 400m(s) 800m 1500m 3000m 100m(s) 200m(s) 400m(s) 800m o .270 36.267 100m (s) 0. 030 0. It is better to factor analyze the correlation matrix R in this case. It is unlikely that a factor analysis wil be useful. Using the unrotated loadings.453 270.38499 6.70609 .33418 Marathon Marathon 270.006 0.230 0. The second factor.640 200m( s) 6.172 100m(s) 200m(s) 400m(s) 800m 150 Om 3000m 0. 
explains relatively little of the remaining variance and can be ignored.
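The point made in 9.28, that one variable on a much larger scale dominates a factor analysis of S but not of R, can be illustrated with a minimal sketch. The data below are synthetic and purely hypothetical (they are not the track records), and the helper name pc_factor_loadings is an assumption introduced only for this illustration. It uses the principal component loadings, sqrt(eigenvalue) times eigenvector.

```python
import numpy as np

def pc_factor_loadings(mat, m):
    """Principal component factor loadings: sqrt(eigenvalue) * eigenvector."""
    eigvals, eigvecs = np.linalg.eigh(mat)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]             # reorder, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :m] * np.sqrt(eigvals[:m])
    share = eigvals[:m] / np.trace(mat)           # proportion of total variance per factor
    return loadings, share

# Hypothetical data: three correlated variables, the last on a much larger scale
rng = np.random.default_rng(0)
common = rng.normal(size=(200, 1))
X = common + 0.5 * rng.normal(size=(200, 3))
X[:, 2] *= 60.0                                   # inflate the spread of one variable

S = np.cov(X, rowvar=False)
R = np.corrcoef(X, rowvar=False)

load_S, share_S = pc_factor_loadings(S, m=1)
load_R, share_R = pc_factor_loadings(R, m=1)

print("first-factor share, S:", share_S[0])
print("first-factor share, R:", share_R[0])
print("loadings from S:", load_S[:, 0])
```

With S, the first factor absorbs nearly all of the total variance and its largest loading falls on the inflated variable. With R, each variable contributes unit variance, so no single variable dominates merely because of its units.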

854 0. _.801 0.728 0.906 0.828 0.856 5.Maràthon(èortlilàtion lbtnx) II .669 200m(s) 400m (s) 800m 150 Om 3000m Mara thon 200m (s) 400m (s) 800m 1500m 3000m 0.677 0. 200m(s).227 The correlation matrix ~ for the women's track records follows.871 0.090 6.934 0.867 0.941 0.887 0.830 0.960 0.921 0.791 0. Marathon 100m(s) 0.4363 0.919 7 . Correlations: 100m(s).720 0.910 0.973 0.933 Variable Factor1 100m(s) 200m(s) 400m(s) 800m 150 Om 3000m Marathon Variance % Var 0. 400m(s).ii ØI ¡¡ 2 1 o 1 3 4 5 i 6 F.799 The scree plot below suggests at most a m = 2 factor solution. 800m. 3000m.ctrNumbet Principal Component Factor Analysis of R (m =2) Unrotated Factor Loadings and Communalities Communality 0. Scree Plot of iOOm($l.~ ¡c 3 .951 0. 1500m.680 0.820 0.8076 0.782 0.809 0..919 0.909 0.674 0.905 0.940 0.6287 0.732 0.938 0.923 0.806 0.

(..240 -0. . .984 0. . ..919 3..~i. .4363 0.259 0.801 0.2032 0.525 400m(s) 0. .c . .440 Factor Score Coefficients Variable Factor1 Factor2 -0. .479 Variance % Var 6. ~-1 tI .085 6.Marâthôll(PC:.481 3000m 0.856 0.921 0. .886 4 5 .934 0.848 0.244 -0.280 0.480 100m(s) -0. .972 800m 1500m 3000m Marathon Variance % Var o . 90'6 0.940 0. . j 0 :.592? 0.172 0.255 Marathon 0.445 Plot of 10011(5). . -2 . . . .0833 0. . .035 800m 0. . #11 -3 -2 -1 o 1 Factr 2 3 Firs Maximum Likelihood Factor Analysis of R (m = 2) Unrotated Factor Loadings and Communalities Variable Communality 100m(s) 200m(s) 400m(s) o . . .976 0.3530 0. . ..960 0.933 0. .386 1500m 0. . .. '#1/0 . Score 2 !1 .. 6 i 04 o .. 6'62 5 . ..828 100m(s) 200m(s) 400m(s) 800m 1500m 3000m Marathon 3. ..488 200m(s) -0..#3/ . .919 0. . . ..228 Rotated Factor Loadings and Communalities Varimax Rotation Variable Communali ty 0.288 -0. rn=2..

481 1.369 -0.237 200m(s) -0. 1500m. 200m..B56 0. .019 400m(s) -0.229 Rotated Factor Loadings and Communalities Varimax Rotation Variable Factorl Marathon 0. ~ .036 0. *":u. . All the running events load highly on this factor. . 3000m.107 0. .317 -0. The plots of the factor scores indicate that observations #46 (Samoa).1B06 100m(s) 200m(s) 400m(s) 800m 150 Om 3000m % Var 0..886 Variable Factor1 Fat:tor2 100m(s) -0. Using the unrotated loadings.... . #'1 . Maràthon(JiL~.662 3. Note. l 1 = l .024 BOOm 1500m 3000m Marathon 0.395 .. the remaining running events in each case have moderately large loadings on the factor. 400m) with the longer events (800m. m=2) 3 . -1 -2 -2 -1 o 1firs Fad 2 3 4 The results from the two solution methods are very similar.972 0. for both factors. #11 (Cook Islands) and #31 (North Korea) are outliers.2032 0.915 0.077 0.#31 2 .984 0. marathon).690 Variance 3.lS7 0. The two factor solution accounts for 89%-92% (depending on solution method) of the total variance. After rotation the overall excellence factor disappears and the first factor appears to represent "running endurance"-since the running events 800m through the marathon load highly on this factor.003 ScorePJol of 100m(s).879 0.449 0.. The second factor appears to contrast the shorter running events (100m. .025 -0.906 0.84B 0. . the first factor might be identified as a "running excellence" factor.595 0..::.I..728 0.976 0.772 0.455 0. .454 Communality 0.. 1° . . This bipolar factor might be called a "running speed-running endurance" factor.0225 0. The second factor might be classified as a "running speed" factor.432 Factor Score Coefficients 6.
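The effect of rotation described above (the general first factor disappears, while communalities are unchanged) can be sketched with a plain varimax iteration. This is a minimal sketch under stated assumptions: it is the raw varimax criterion without Kaiser normalization, so it need not reproduce the package output shown in this solution, and the unrotated loadings below are hypothetical values chosen only to mimic a general factor plus a bipolar factor.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Raw varimax rotation (no Kaiser normalization); returns rotated loadings and T."""
    p, k = loadings.shape
    T = np.eye(k)                                  # start from the identity rotation
    crit_old = 0.0
    for _ in range(max_iter):
        rotated = loadings @ T
        # gradient-like matrix of the varimax criterion
        grad = loadings.T @ (
            rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / p
        )
        u, s, vt = np.linalg.svd(grad)
        T = u @ vt                                 # nearest orthogonal matrix
        crit_new = s.sum()
        if crit_old != 0.0 and crit_new < crit_old * (1 + tol):
            break
        crit_old = crit_new
    return loadings @ T, T

# Hypothetical unrotated loadings: all variables load on factor 1,
# factor 2 contrasts the first three variables with the last three.
unrotated = np.array([
    [0.80,  0.40],
    [0.80,  0.50],
    [0.85,  0.30],
    [0.90, -0.20],
    [0.90, -0.35],
    [0.85, -0.40],
])

rotated, T = varimax(unrotated)
print(np.round(rotated, 3))
print("communalities before:", np.round((unrotated**2).sum(axis=1), 3))
print("communalities after: ", np.round((rotated**2).sum(axis=1), 3))
```

Because T is orthogonal, the communalities (row sums of squared loadings) and the total variance explained by the m factors are the same before and after rotation. Only the allocation of that variance between the factors changes.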

73215 0.08'5 o .230 9.926 Rotated Factor Loadings and Communalities Varimax Rotation communality 0.1184S78 0.0943056 Marml s Principal Component Factor Analysis of S (m = 2) unrotated Factor Loadings and communalities communality 0. 200m/s.1765843 0.0960189 0. sse -0.1083164 0.1377889 0. 33S 0.110 0.083 0.1667141 lOOmIs 200m/s 400m/s 800m/s 1500m/s 3000m/s 0.0966724 0.1465-604 . Marmls 3000m/s 1500ml s 800m/s 400ml s 200m/s 100ml s Marml s 0.116 0.1138699 0.81822 0.45423 0.148 variable lOOmIs 200m/s 400m/s 800m/s 1500m/s 3000m/s Marml s Variance 0.128 0.104 0.926 0.542 .287 0.0809409 0.025 0.148 variable 10 Oml s 200m/s 400m/s 800m/s 1500m/s 3000m/s Marrl s Variance 0.0921422 0.363 lOOmIs -0.0735228 0. 3000m/s. it is likely a factor analysis of S wil produce nearly the same results as a factor analysis of the correlation matrix R.110 0. The results for a m = 2 factor analysis of S using the principal component method are shown below.0905383 0.0810999 Marml s 0.0956063 0.168 0. 1500m/s. Since all the running event variables are now on a commensurate measurement scale.0749249 0.168 0.603 400m/s 800ml s 1500ml s 3000m/s Marmls 0.514 0.1054364 0.0954430 0.222 -0.36399 % Var 0.0997547 0.Q .116 0.171 -0.0864542 0. 400m/s.097 0.0650640 0.0822198 0.29 The covariance matrix S for the running events measured in meters/second is given below.829 0.1238405 0.800m/s.128 0.1437148 0.412 Factor Score Coefficients Factor2 variable Factor1 -0. Covariances: 100m/s.0933103 0. 28'0 -0 .1018807 0.1146714 0. A factor analysis of R follows.471 200m/s -0.066 0.306 -0.81822 0.066 0.08607 % Var 0.083 0.

A two factor solution seems waranted.906 0.875 0. 200m/s. Note.824 0. II ! D.0 1 2 3 4 5 component Number 6 7 .852 0.i: 0.797 0. 200m.866 0. 1500m/s. 400m) with the longer events (800m.854 Scree Plot of lOOmIs. the first factor might be identified as a "running same size loadings on excellence" factor. marathon). Correlations: 100m/s.906 0.2 0.816 0. 3000m. 3000m/s. 800m/s.938 0.804 0.7 D.675 0.776 0. All the running events have roughly the factor. This bipolar factor might be called a "running speed-running endurance" factor. for both factors. Marm/s (Correlation Matñx) 0.1 0. 400m/s.694 0.806 0.:lI 0.231 Using the unrotated loadings. The second factor appears to contrast the shorter running events (100m.6 .5 ¡. After rotation the overall excellence factor disappears and the first factor appears to represent "running endurance" since the running events 800m through the marathon have higher loadings on this factor.660 200m/s 400m/s 800m/s 1500ml s 3000m/s Is Marr 2 OOml s 400m/s 800m/s 1500ml s 3000ml s 0.729 0. the remaining running events in each case have moderate and roughly equal loadings on the factor. The two factor this solution accounts for 93% of the varance.741 0.731 0. 1500m.972 0...8 fl. Marm/s lOOmIs 0.672 0.3 0. The second factor might be classified as a "running speed" factor. The correlation matrix R is shown below along with the scree plot. .

499 400m/s 800m/s 15 OOml s 3000m/s Marr/ s 0. 0 . . . I) . . .4799 0..m=2) 3 . .960 0.481 6. -1 .358 0. ".914 0.947 0.875 3.6477 0. .833 6..232 Principal Component Factor Analysis of R (m = 2) Unrotated Factor Loadings and Communalities Communal i ty Vari.1125 0.771 0.248 0.400 0. o ~q!" l 1 11 . . .941 0.941 0.947 0.:: 'ii . .i \ 2 .#31 .418 0.able 0.926 0. . 'l .932 0. . ..tíifl'ØOm/s. ai o.Marm/s (PC.445 Factor Score Coefficients Variable Factor1 Factor2 -0..932 0.8323 0. .243 -0.875 lOOmIs 20 Oml s 40 Oml s 800m/s 1500m/s 3000m/s Is Marr Vari.911 0.914 0.142 0. -2 -4 -3 -2 -1 Firs Factr . 0'.484 0. . '8 0 . 0 0 .025 0. .4799 0..926 3.093 Rotated Factor Loadings and Communalities Varima Rotation Variable Factor1 10 Oml s 20 Oml s 400m/s 80 Oml s l500m/s 3000m/s Marml s Variance % Var Communality 0..3675 0.265 -0.489 lOOmIs -0. .960 0.(j.436 0.911 0.871 0. ..252 -0. i 2 . :t.. . .249 0. .ance % Var 5.293 ScoreP1. . .455 0.886 0.839 0. .484 20 Oml s -0.

949 0. . o lI -l . .233 Maximum Likelihood Factor Analysis of R (m = 2) Unrotated Factor Loadings and communalities Communa1i ty Variable 0. . 1 ~ l.2395 0. .. ".435 0. .971 0. m=2) . " . A . ..ii 3 .518 0. .896 0.. .737 lOOmIs 20 Oml s 400ml s 80 Oml s 1500m/s 3000m/s MannI s % Var 6.039 0.441 0. .048 40 Oml s 0. ..082 5.0165 0.463 Variance % Var 6.412 rO. .765 1500ml s 3 OOOml s MannI s 3. ..106 -0..5716 0.984 0. .073 10 Oml s -0.983 0...~ .984 0.:i .859 0.. -2 -3 -4 -3 -2 -1 Firs Fac:or o 1 2 .041 80 Oml s 1500m/s 3000m/s MannI s -0.812 Variance Rotated Factor Loadings and communalities Varimax Rotation Communality 0. . .6844 0.850 0.. " . " " .017 scOl'ePlotøf100mls. Marmls (MLE.521 -1.431 Factor Score Coefficients Variable Factor1 Factor2 -0..124 0. *q. . . . . .850 Variable Factor1 0. .:c: .971 0. .2560 0.014 0. 2 " " . .896 0.2560 0.737 0. . ~! '.... .914 0.167 -0.894 0.~ .983 0. . . .836 0.379 0. 100ml s 200m/s 400m/s 800m/s 26 0.894 3.122 2 OOml s -0.836 0.. . .

The results from the two solution methods are very similar and very similar to the principal component factor analysis of the covariance matrix S. Using the unrotated loadings, the first factor might be identified as a "running excellence" factor. All the running events load highly on this factor. The second factor appears to contrast the shorter running events (100m, 200m, 400m) with the longer events (800m, 1500m, 3000m, marathon). This bipolar factor might be called a "running speed-running endurance" factor. After rotation the overall excellence factor disappears and the first factor appears to represent "running endurance" since the running events 800m through the marathon load highly on this factor. The second factor might be classified as a "running speed" factor. Note, for both factors, the remaining running events in each case have moderately large loadings on the factor. The two factor solution accounts for 89%-93% (depending on solution method) of the total variance. The plots of the factor scores indicate that observations #46 (Samoa), #11 (Cook Islands) and #31 (North Korea) are outliers.

The results of the m = 2 factor analysis of women's track records when time is measured in meters per second are very much the same as the results for the m = 2 factor analysis of R presented in Exercise 9.28. If the correlation matrix R is factor analyzed, it makes little difference whether running event time is measured in seconds (or minutes) as in Exercise 9.28 or in meters per second. It does make a difference if the covariance matrix S is factor analyzed, since the measurement scales in Exercise 9.28 are quite different from the meters/second scale.
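The role of measurement scale can be checked numerically. The sketch below uses synthetic, hypothetical "times" (not the track records) and shows that converting one variable from seconds to minutes leaves R exactly unchanged while S changes. Note the assumption built into this example: seconds to minutes is a linear rescaling, under which correlations are exactly invariant. The conversion to meters per second is a reciprocal (nonlinear) transformation, so there R is only approximately preserved, which is consistent with the "very much the same" wording above.

```python
import numpy as np

# Hypothetical event times in seconds for 50 "countries" and 3 events
rng = np.random.default_rng(1)
times_sec = rng.normal(
    loc=[11.5, 23.0, 9000.0],      # assumed means, for illustration only
    scale=[0.4, 0.9, 600.0],       # assumed standard deviations
    size=(50, 3),
)

# Same data, but with the last event expressed in minutes
times_mixed = times_sec.copy()
times_mixed[:, 2] /= 60.0

S_sec = np.cov(times_sec, rowvar=False)
S_mix = np.cov(times_mixed, rowvar=False)
R_sec = np.corrcoef(times_sec, rowvar=False)
R_mix = np.corrcoef(times_mixed, rowvar=False)

same_R = np.allclose(R_sec, R_mix)   # correlations ignore a change of units
same_S = np.allclose(S_sec, S_mix)   # covariances do not

print("R unchanged by rescaling:", same_R)
print("S unchanged by rescaling:", same_S)
```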

688936 3.097 0. the first factor explains about 98% of the variance and the largest factor loading is associated with the marathon. The first factor might be labeled a "running endurance" factor but this factor provides us with little insight into the nature of the running events.541038 10 i OOOm Mara thon 2. with either unrotated or rotated loadings. however.124575 0.043 800m 0.069956 0.002751 0.119 '0.841 0.049 -0.158 -0.300903 0.033 1500m 0.223 0.983 1.537 -0.392 ~~J 0.007131 0.430489 .066193 0.034 -0.019 0.537 2.179 2.996 0.083 -0.643 80.832 0.234 0.135356 2.270 200m 2.130 71.573 0.107 0.229701 1. Covariances: 100m.023034 0.008264 0.152 100m 0. 262533 6. -0. 800m.134 -0. 10.262 0.125 0.317734 0.578875 1.399 -0.979 1.034 0. Using the unrotated loadings.406 -1.649 0.lb4 85.019 0.033 0.013 85.722 5000m 10. the first factor explains about 83% of the varance and again the largest loading is associated with the marathon. Using the rotated loadings.401 -0.666818 0.256022 0.529 14.OOOm.044 400m 0.178857 0. 5000m.025720 0. 1500m. It is unlikely that a factor analysis wil be useful.130 84.048973 0.105833 0.111044 0.30 The covariance matrix S (see below) is dominated by the marathon since the marathon times are given in minutes.002 0.015 0. the principal component solution with m = 2 is given below.074257 0. It is better to factor analyze the correlation matrx R in this case.643 80.034348 0.110 0. 200m. The second factor.002 -0.312 -0.031 -0.265613 1.849941 9.000m Mara thon Variance % Var ~ ~.996 Rotated Factor Loadings and Communalities Varimax Rotation Variable 100m 200m 400m 800m 1500m 5000m 10 i OOOm Mara thon Variance % Var Factor1 Factor2 Communality -0.235 9.234 2.853486 1.615 1.022929 0. 340139 0.057938 0. 400m.507 0.819569 14.342538 80.049 0. 
explains relatively little of the remaining variance and can be ignored.141 0.192564 0.649 0.378905 Principal Component Factor Analysis of S (m = 2) Unrotated Factor Loadings and Communalities Variable Factor1 Factor2 Communality 0.168473 0. Marathon 5000m 1500m 800m 400m 200m 100m 100m 200m 400m 800m 1500m 500 Om 10 i OOOm Marathon 10 i OOOm Marathon 0.

Marathon 200m 400m 800m lS00m SOOOm 10. SOOOm.000m Marathon The scree plot below suggests at most a m = 2 factor solution.7033 0.OOOm.804 0. 1S00m. 200m.123 -0.840 0.838 Fa tor2 0.3417 0.761 0..740 0.768 0..948 0.780 0.797 0.236 -0.807 0.878 0.376 0.988 0.236 The correlation matrix Rfor the men's track records follows.917 0.957 0.713 0. 800m.901 0.878 0.913 0.267 -0.954 100m 0.712 0.896 0.795 0. Correlations: 100m.766 0.861 0.917 6.309.947 0.861 0.721 0.772 0.914 0.896 0.937 0.845 0.843 0.715 0.944 0.748 0. 0.000m 0.080 7.766 0. 400m.423 0.920 0.276 Communality -0.676 200m 400m 800m lS00m SOOOm 10.969 0.972 0.944 0. S Fà. .847 0.6384 0.000m Mara thon Variance % Var 0.915 0.918 1 .rNumbe Principal Component Factor Analysis of R (m =2) Unrotated Factor Loadings and Communalities Variable Fa to 100m 200m 400m 800m lS00m SOOOm 10. 10.

.Scareeløt~f'100ml .780 0. . .176 0.988 0.927 0.237 Rotated Factor Loadings and Communalities Varimax Rotation Communality Variable 0.913 0.866 0..920 0.224 -0.0.2249 0.989 0.n.866 0.533 -0..814 0. Marathoia:lPÇ..420 .972 0.969 0. . ..091 7.840 0.802 0.847 0.. .788 0.989 0.349 0. .004 -0.413 -0.183 400m 800m 150 Om 5000m 10. -..283 200m 0.875 0. #-" .772 0. it tj 10 .277 0.4134 0.1432 0.893 .233 0.949 6.000m Marathon 0.380 0.515 Factor Score Coefficients Variable Factor1 Factor2 0.912 0. .000m Marathon Variance % Var r Communali ty 0.963 0.3417 0.ltn=2) .810 0.186 -0.403 4. . . o ~ -3 o -1 -2 3 1 Firs Fadr Maximum Likelihood Factor Analysis of R (m = 2) Unrotated Factor Loadings and Communalities Variable Fac 100m 200m 400m 800m 1S00m SOOOm 10..335 100m 0.7299 0.053 -0.586 -0. .991 0.944 0..937 100m 200m 400m 800m 1500m 5000m 10.000m Mara thon Variance % Var 7..1168 0.918 3.

1986 % Var 0.Seore9Jpt.209 -0.423 -0.893 Variance 3..051 .056 -0.963 0. Nevertheless.238 Rotated Factor Loadings and Communalities Varimax Rotation Variable Communal i ty 0..490 200m '0.'ll.9446 3. .044 400m 0. 800m and 1 SOOm events are in one group (positive loadings) and the 5000. . this bipolar factor might be called a . .125 100m 0. 1O. the 100m. 200m and 400m events have positive loadings and the 800m.. ..558 0. 1O. 5000m.866 0.994 -0.000m Marathon 0. -3 -2 ~1 o 1 2 3 Firs Factr The results from the two solution methods are very similar.000m and marathon events have negative loadings..912 100m 200m 400m 800m 150 Om 5000m 10..of1.104 -0.OOOm and marathon are in the other group (negative loadings). . .1432 0. Using the unrotated loadings.989 0.054 -0. the first factor might be identified as a "running excellence" factor. The second factor appears to contrast the shorter running events with the longer events although the nature of the contrast is a bit different for the two methods.. fi 'I1l 'i . iF II ...256 -0. '" e..761 0.493 0.. -2 .003 0. . L .011 800m 1500m 5000m 10. . For the principal component method.089 0. For the maximum likelihood method.988 0.866 0.All the running events load highly on this factor.nì::l) 3 . the 100m. 1.000m Marathon 7.. 1 .788 0. IS00m. . 400m.400 Factor Score Coefficients Variable Factorl Factor2 0. . 200m.772 0. .1 .'ØOJDI"" Marathon (ML£..

038 0.0431256 200m/s 400m/s 800m/s 1500m/s 0.. lOOmIs 2 OOml s 400m/s 800m/s 1500m/s 5000m/s 10.038 0.0469252 0. Note.923 0.076'6388 5000ml s 10. Th~ second factor might be classified as a "running speed" factor.079 Communali ty 0.094 0.844 0.0468840 0. the remaining running events in each case have moderately large loadings on the factor.093 0.293 0.0537207 0. 9.0942894 0. The results for a m = 2 factor analysis of S using the principal component method are shown below.49405 0. OOOml s Marathonr/s Variance % Var Fact 0.0432334 0.060 0.0567342 0. 200m/s.296 0.28. OOOml s Mara thonrl s lOOmIs 0.0617664 0.0599354 0. SOOOm/s.0648452 0. 10.239 "running speed-running endurance" factor.0688217 0.061 0.301 0.0937357 0.OOOm/s.0523058 0.219 0.. 1S00m/s.0905819 Principal Component Factor Analysis of S (m = 2) Unrotated Factor Loadings and Communalities Variable lOOmIs 2 OOml s 400m/s 800m/s 1500ml s 5000m/s 10. 800m/s.0428221 0.0909952 0.04622 0.256 0.092 0.0572512 0.066 0.0959398 0. Since all the running event variables are now on a commensurate measurement scale. OOOml s Marathonr/s 0.0448325 0.0979276 0.0558678 0.223 0. 400m/s.171 0. The two factor solution accounts for 89%-92% (depending on solution method) of the total varance..0482772 0. it is likely a factor analysis of S wil produce nearly the same results as a factor analysis of the correlation matrix R.0587731 0.0729140 0. The plots of the factor scores indicate that observations #46 (Samoa) and #11 (Cook Islands) are outliers.0314951 0. After rotation the overall excellence factor disappears and the first factor appears to represent "running endurance" since the running events 800m through the marathon load highly on this factor. Covariances: 100m/s.0553945 0. OOOml s Mara thonr/ s 5000m/s 10.0434979 0.0425034 0.05715£0 0.0736518 .0535265 0.54027 0.0562945 0. A factor analysis of R follows.0541911 0.31 The covariance matrix S for the running events measured in meters/second is given below. for both factors. 
The factor analysis of the men's track records is very much the same as that for the women's track records in Exercise 9.28.

092 0.275 0.000m/s Marathonr/s Variance % Var Communality 0. 5000m. After rotation the overall excellence factor disappears and the first factor appears to represent "running endurance" since the running events 1500m through the marathon have higher loadings on this factor. . 800m) with the longer events (1500m. 400m.379 0. The second factor might be classified as a "running speed" factor.093 0.151 .094 0.048 -0.240 0. the 800m run has about equal (in absolute value) loadings on both factors and the remaining running events in each 'Case have moderate and roughly equal loadings on the factor.561 -0.923 Factor Score Coefficients Variable 100m/s 2 OOm/ s 400m/s 800m/s lS00m/s 5000m/ s 10.273 0.038 0.078 0.489 0.000.066 0.54027 0.116 0. The correlation matrix R is shown next along with the scree plot.562 0. 10.105 0. The two factor solution accounts for 92% of the variance.038 0.159 0.240 Rotated Factor Loadings and Communalities varimax Rotation Factor1 Variable 100m/s 200m/s 400m/s 800m/ s 1S00m/ s 5 OOOm/ s 10.184 0.377 -0.21168 0. OOOm/ s Mara thonr/ s Factor1 Factor2 -0.283 0.334 Using the unrotated loadings.362 o .287 -0.061 0.32860 0.060 0. 200m.022 0. the first factor might be identified as a "running excellence" factor. 12 0. Note. All the running events have roughly the same size loadings on this factor.526 -0. The second factor appears to contrast the shorter running events (100m.254 -0.415 0.197 -0. This bipolar factor might be called a "running speed-running endurance" factor. marathon).080 0. A two factor solution seems warranted.

10.778 0.755 0.706 0. 800mls.784 0.965 0.913 7 8 .000m/s Marathonm/ s Variance % Var 6. SOOOm/s.OOOm/s.754 0.852 0.828 0. .758 0.895 0.929 10 Om/ s 20 Om/ s 40 Om/ s 80 Om/ s 1500m/ s 5000m/ s 10.833 0.6258 0.085 7.916 0.726 0.241 Correlations: 100m/s.661 0.986 0.800 0.939 0.913 0.745 0.935 10.794 0.760 0..968 0.3023 0.6765 0.834 0.872 5000m/s 10 i OOOm/s 100m/s 200m/s 400m/s 800m/ s 1500m/ s 5000m/s 10. 3 ¡¡ 2 1 o 1 2 3 4 5 6 Factr Number Principal Component Factor Analysis of R (m = 2) Unrotated Factor Loadings and Communalities Variable Communali ty 0. Maråthonl11$ (C()rrêlatiol1f1atr~) 7 5 l 4 'ii- i: 3.744 0.732 0.909 0.691 0.947 ScteePlot of lOOmIs..836 0.914 0. on 20 Om/ s 400m/s 800m/s 1500m/ s 0.899 0.697 0.OOOm/s Marathonm/s 0. 200m/s. 400m/s. OOOm/ s Marathonm/s 0.700 0. 1S00m/s.841 0.

913 0.m=2J ' .-.004 0.968 0.#'1(..405 -0.399 7.913 Factor Score Coefficients Factorl Factor2 Variable -0.834 0.315 -0. ..215 0.1907 0.914 0. .369 0.. ..418 -0.3023 0.882 0.242 Rotated Factor Loadings and Communalities Varima Rotation Variable Factorl Communality 0...270 -0.1116 0.805 0.895 0.896 Variance 4.566 -0.74i 800m/s l500m/s 5000m/s 0.178 0.186 10 Om/ s 20 Om/ s 40 Om/ s 800m/s 1500m/s 5000m/s 10.466 100m/ s 200m/s 400m/ s 0.514 10..841 0.236 0.056 0.. .- -2 ~1 0 Fiis Fact .965 0. "" .939 0. . . .261 ~PC. .371 0. 1 2 .929 0.178 0.522 -0.. .423 0.000m/s Marathonm/s 0.341 0. .000m/s Marathonm/s % Var 3..

094 6. .865 0..957 0. .985 0.055 -0.492 Variance Factor Score Coefficients Factor1 Factor2 0..000m/s Mara thonm/ s 0:773 0. .886 3.047 Score P.957 0. . .388 -0.1540 0.lotof 100m's.792 Variance Rotated Factor Loadings and Communalities Varima Rotation Variable Communality 0.777 0. .457 0. ..986 0.012 0.899 Factor1 Variable 100m/s 200m/s 400m/s 800m/s 1500m/s 5000m/ s 10. ..711 0. I . #t(L -2 .899 100m/ s 200m/s 400m/ s 800m/ s 1500m/s 5000m/ s 10.3380 0..986 0. *11 -3 -3 .7485 0.243 Maximum Likelihood Factor Analysis of R (m = 2) Unrotated Factor Loadings and Communalities Communality 0.0865 0. .777 0..089 0. . .f " D J . . 00 .859 0.986 0. . .000m/s Marathonm/s % Var 7.865 0.055 -0..951 -0.008 Variable lOOmIs 200m/s 400m/ s 800m/ s 1500m/ s 5000m/s 10. .859 0.08ti5 0.989 0.. .. .128 0. .870 0.928 0. .758 0. -2 -1 0 Firs factor 1 i . .9325 0. .758 0.797 0.268 -0..394 3.570 0. .942 % Var 7. . 2 .985 0.806 0.111 -0..219 -0. . l 1 .. .. . m=2) 3 . OOOm/ s Marathonm/s 0. Marathonm/s (MLE. . . . .046 0.886 0. . -.

The results from the two solution methods are very similar to each other and to the principal component factor analysis of the covariance matrix S. Using the unrotated loadings, the first factor might be identified as a "running excellence" factor. All the running events load highly on this factor. The second factor appears to contrast the shorter running events with the longer events, although there is some difference in the groupings depending on the solution method. The 800m and 1500m runs are in the longer group for the principal component method and in the shorter group for the maximum likelihood method. Nevertheless, this bipolar factor might be called a "running speed-running endurance" factor.

After rotation the overall excellence factor disappears and the first factor appears to represent "running endurance" since the running events 800m through the marathon load highly on this factor. The second factor might be classified as a "running speed" factor. Note that, for both factors, the remaining running events in each case have moderately large loadings on the factor. The two factor solution accounts for 89%-91% (depending on solution method) of the total variance. The plots of the factor scores indicate that observations #46 (Samoa) and #11 (Cook Islands) are outliers.

The results of the m = 2 factor analysis of men's track records when time is measured in meters per second are very much the same as the results for the m = 2 factor analysis of R presented in Exercise 9.30. If the correlation matrix R is factor analyzed, it makes little difference whether running event time is measured in seconds (or minutes) as in Exercise 9.30 or in meters per second. It does make a difference if the covariance matrix S is factor analyzed, since the measurement scales in Exercise 9.30 are quite different from the meters/second scale.
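The remark about measurement scales can be illustrated with a small sketch. It uses made-up record times (the names `t_400m_sec`, `t_marathon_sec` and the generating numbers are assumptions, not the exercise data) and shows only the exact case: a linear change of units, such as seconds to minutes, rescales a covariance but leaves the correlation untouched. Converting times to meters per second is a nonlinear change, so R is then only approximately unchanged, which is consistent with the "little difference" noted above.

```python
import random
import statistics


def covariance(x, y):
    """Sample covariance with divisor n - 1."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    """Sample correlation coefficient."""
    return covariance(x, y) / (covariance(x, x) * covariance(y, y)) ** 0.5


# Hypothetical record times (NOT the track data from the exercise).
random.seed(0)
t_400m_sec = [random.gauss(46.0, 1.5) for _ in range(30)]
# Marathon times loosely tied to the 400m times so the two are correlated.
t_marathon_sec = [170.0 * t + random.gauss(0.0, 300.0) for t in t_400m_sec]

# Linear change of units: marathon seconds -> minutes.
t_marathon_min = [t / 60.0 for t in t_marathon_sec]

cov_sec = covariance(t_400m_sec, t_marathon_sec)
cov_min = covariance(t_400m_sec, t_marathon_min)
r_sec = correlation(t_400m_sec, t_marathon_sec)
r_min = correlation(t_400m_sec, t_marathon_min)

print(f"covariance, marathon in seconds: {cov_sec:.3f}")
print(f"covariance, marathon in minutes: {cov_min:.3f}")
print(f"correlation in either unit:      {r_sec:.6f} vs {r_min:.6f}")
```

The covariance shrinks by the factor 60 while the correlation is the same in both units, which is why a factor analysis of S depends on the units chosen and a factor analysis of R does not, at least for linear changes of scale.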

0000 1.30219 YrHgt FtFrBody PrctFFB Frame BkFat SaleHt SaleWt from a covarance matrx and then rotates the scaled loadings.50159 X5 0 . 48777 X4 0 . 28442 X4 0. 1129 0.9378 o . 75340 0 . 36843 0. 8082 a .38394 -0. ()OO 4 3 2 1 Eigenvalue Factor Pattern FACTORl X3 a . 39838 o . 0001 a .0000 1 . 9998 o . 08393 o . 23003 a .33514 o . 36809 -0.03120 a .34428 0. 3948 a .~045 2 . 4292 4869.85951 o . 25282 -0. 39308 YrHgt FtFrBody PrctFFB Frame BkFat o .60974 -0.3163 a ..42819 0.32. 83599 'SaleWt Varimax Rotated Factor Pattern FACTOR1 FACTOR2 X3 0 . 43720 o .0000 o .6126 15704. 40890 a .21635 YrHgt FtF. 9996 5 6 7 3.88812 0.4688 O.13508 0. 94883 X6 0.83816 0.51405 X~ 0 .05273 F ACTOR3 0./ ç: . 0695 a . YrHgt a .25711 -0. 52282 -0. 00000 1 .55648 0. 0741 o .27102 o .42460 X4 0. 32637 X3 0.25026 X7 -0.90600 X5 0.94438 a . 0002 o .1914 o . 62342 a . 87'634 BAS scaws the loadings obtained PrctFFB SaleHt SaleWt The scaling is Î. 29875 -0 .49074 o .245 9. 39033 a .66769 X9 a . 18354 FtFrBody PrctFFB 0.2456 2. 00000 o . 96506 F ACTOR2 F ACTOR3 a .45576 X6 0.31943 0.11083 X8 0.16509 0. 8475 O.36162 -0. 65725 a .0000 1 . 00000 o . 00000 0. 62380 o . 26204 o . 00009 Varimax Rotated Factor Pattern F ACTORl F ACTOR2 FACTOR3 o .85244 a . 33038 o .41219 0. 8082 Cumulati ve 4874.44716 0.0000 1 .0000 a . 94363 Frame SaleHt SaleWt Initial Factor Method: Maximum Likelihood Factor Pattern X3 X4 X5 X6 X7 X8 X9 F ACTORl F ACTOR2 FACTOR3 o .01180 o . 75367 X5 0.15478 BkFat 0.46689 X9 -0. 00086 a .6748 5 . Factor analysis of data on bulls Factor analysis using sample covariance matrix S Initial Factor Method: Principal Components Difference Proportion 20579. 94025 -0.37408 X6 0. !J V"ü . 00598 a .50195 0.25853 0. 26667 19 (). 28992 SaleHt a .06913 X8 0 .38532 -0 . 64446 a . 18382 Frame BkFat 0.42166 X7 -0. 33505 o . 50894 o .48170 X7 0.rBody O.

21799 -0.etation of factiJl' from S.74140.94188 0.74194 SaleWt o .21811 o . 00000 a .0067 1 .4214 Initial Factor Method: Principal Components 6 7 0.37460 o . 0471 Difference 2.15210 0.54798 F ACTOR2 FACTOR3 -0 .85951 o .79502 Frame BkFat 0. 39308 o . 94025 o . 08393 O. 1858 o .88812 0.28442 0.2356 Proportion 0.25513 -0.7797 0.9458 o .7836 0. 39838 o .25711 -0. 24262 0.1910 0. 00000 F ACTOR2 F ACTOR3 1 .38949 -0.5887 0.33710. 00894 o .3200 0.69440 o .82646 YrHgt FtFrBody PrctFFB Frame BkFat SaleHt SaleWt Varimax Rotated Factor Pattern X3 X4 X5 X6 X7 X8 X9 F ACTOR1 FACTOR2 FACTOR3 0.01180 -0.13094 0.246 Factor analysis using sample correlation matrix R 1 2 3 4 Eigenvalue 4.42819 o .23541 -0.51405 0.37900 X8 0. 00000 YrHgt 0. 50159 0.36843 o .72177 X6 0.55648 X5 0. 0265 o .26667 0.83365 0.8856 0.27102 X8 0.27085 -0.44792 0 .01382 -0.9723 a .03335 0.78354 o .25026 0.16509 X4 0. 00598 0.0000 . 83700 X5 0.41206 0.35794 -0 .91927 X9 0.94883 0 .1059 0.12071.43720 X9 O. 00000 FtFrBody Pn:tFFB 0. 0209 Cumulati ve 0. 34932 0 . 9933 o . Factor Pattern F ACTOR1 X3 a . 0393 o .18382 X7 -0.88091 X7 -0 .23003 0 .5887 0.93812 0.5957 0.87071 0.15014 -0.48930 -0. 34428 o . 83599 SaleHt SaleWt Varimax Rotated Factor Pattern FACTORl FACTOR2 FACTOR3 X3 0. 52282 o .0602 5 O.75340 0.11715 -0. 21635 X6 0.41219 0 . 39692 YrHgt FtFrBody PrctFFB SaleHt Ini tial Factor Method: Maximum Likelihood Factor Pattern X3 X4 X5 X6 X7 X8 X9 FACTORl o .1465 o .36162 -0 .38772 o . 36484 a .26505 0. 0994 o .05273 0 .94438 0. . 28992 o . 85244 o .03120 Frame BkFat -0.62380 o . 25282 -0. 87634 YrHgt FtFrBody PrctFFB Frame BkFat SaleHt SaleWt The interpretation of factors from R is different of the interpr.91334 X4 a .06532 o .04948 0.

[Figure: Factor scores for the first two factors using S and varimax rotated PC estimates of factor loadings]

[Figure: Factor scores for the first two factors using R and varimax rotated PC estimates of factor loadings]

1966 % Var 0.173 -0.248 9. These correlations and the scree plot suggest m = 2 factors is probably too few. 5 8 Q) -0. ~ lU Unrotated Factor Loadings and Communalities Variable indep supp benev conform leader Variance 2. The correlations are relatively modest.864 .909 0.327 -0. Correlations: indep.492 -0. leader supp benev conform leader indep -0.018 -0.298 -0.leader (¡.009 -0.561 -0. 3682 0. supp.3207 0. 0.401 0..s 0.7559 0. Consequently.819 0.274 0. 1 3 2 Factr Number Principal Component Factor Analysis of R (m = 3) .670 0.256 1. An initial factor analysis with m = 2 confirms this conjecture.151 Communality 0.333 Scte Plot ofindep.163 0.979 4. benev.422 cr5~m..187 supp benev conform 0.33 The correlation matrix R and the scree plot follow.. conform. we give am =3 factor solution..0.439 Fàctor2 Fact~r3 l:9 .943 0.471 0.100 -0. .

..545 leader 0.1211 0..971 0.073 -0. . 5591 0. ..155 ~ -0. .086 0.819 0.864 0.240 0..010 conform 0. .. .i12 CO:890J ~1~ (~0.081 O.000 -0. .909 0.194 0. .832 0. m=3l 4 .532 CD indep supp benev conform leader Variance % Var and Communalities 1.indep.147 supp 0. .119 -0. 3114 0.1 0 Fii'l'actr -2 Maximum Likelihood Factor Analysis of R (m = 3) * NOTE * Heywood case Unrotated Factor Loadings Variable Factor3 Communality 1..000 0. ...249 Rotated Factor Loadings and Communalities Varimax Rotation Variable Fact~ Factor2 Factor3 indep supp benev (=0.670 0..U9" -0.310 0.372 -0. ..690 benev 0.589 1.41-a -0. .111 Variance 1.752 -0..127 -0. . ..: . . . . I" .6506 0. . . ...312 1. " -.Ieader (PC.824 0.. '\.362 -0.018 -0.. .277 -0. 5486 0.262 Factor Score Coefficients Variable Factorl Factor2 Factor3 indep -0. .203 . .379 (-OJ077 1. .129 0. ...272 4. . Score Plot of. .000 -0. -.790 1..3207 0.136 -0. . ..008 ..330 conform % Var Communali ty 0.3 ...979 0.943 0... 3587 1. .. . .000 1. .. 0133 4. ..003 leader -0. .. .

243 1.5. 000 -0. 5842 % Var 0. although all the factors ar bipolar.4-.454) -0.~ ~ Gb. 000 -0. this factor is a .0 8 -O.589 1. the arangement of relatively large loadings on each factor is quite different for the two solution methods.211 Using the unrotated loadings and including moderate loadings of magnitudes . The rotated loadings are consistent with one another for the two solution methods and. may be easier to interpret. The first factor is a contrast between Independence and the pair Benevolence and Conformity. The second factor is essentially a "leadership" factor although if moderate loadings are included.992 0.264 1. 000 4.123 -0 .515 ~ -0. 000 -1.129 0.098 leader -0.980) 0.213 Variance 1.219 -0.824 Factor Score Coefficients Variable Factor1 Factor2 Factor3 indep supp benev conform leader -0 .130 0.432) . 081 0.317 1.9681 cg. 000 O. Moreover. 069 -0. 024 -1.532 0. 000 1.1211 0.011 1. 000 O. 016 0.000 -0. the factors are all bipolar and appear to be difficult to interpret. Perhaps this factor could be classifed as a "conforming-not conforming" factor.3199 0. 2170 0.000 0.250 Rotated Factor Loadings and Communalities Varimax Rotation Variable indep supp benev conform ~ Factor2 Factor3 Communality IT 0.034 0.

Plots of factor scores from the two factor model suggest that observations 58. this factor is a contrast between Support and Conformity. SF. To our minds however.480272 . 59. the results for a m = 2 factor solution using both solution methods is also given.434307 0.984 0. Covariances: BL.302871 EM SF EM 1. the latter is difficult to interpret.987966 BS 1. 60 and 61 may be outliers. SF. 9. Teenagers with above average scores on Leadership tend to be above average on this factor.34 A factor analysis of the paper property variables with either S or R suggests a m = 1 factor solution is reasonable.513359 SF 4. 886636 0. EM.988 0.875 0.987585 2. For completeness. No outliers are immediately evident. while those with above average scores on Benevolence tend to be below average on this factor. Perhaps we could label this factor a "lead-follow" factor.972056 Correlations: BL.914 SF 0. BS BL BL 8. The third factor is essentially a "support" factor although. EM.975 BS o .140046 0. again.147318 0. The covariance matrix S and correlation matrix R follow along with a scree plot using R. All variables load highly on a single factor. BS BL EM SF EM 0.942 BS 0. if moderate loadings are used. The factor scores for the first two factors are similar for the two solutions methods.251 contrast between Leadership and Benevolence.

8395 % Var 0. This factor might be called a "paper properties index.664 1.991 0.255 The first factor explains 96% of the variance.960 3.449 0.988 0. All varables load highly and about equally on this factor. given their measurement scales.252 Principal Component Factor Analysis of S (m = 1) Unrotated Factor Loadings and Communalities Variable Factorl Communality BL EM SF BS Variance % Var 2.960 Factor Score Coefficients Variable Factor1 BL EM SF BS 0.248 0.468 11.101 0.8395 0.960 BL EM SF BS Variance 3.259 0.042 0.988 Factor Score Coefficients Variable Factor1 BL EM SF BS 0. load highly on this factor.285 0.734 0.878 0." .684 8.042 The first factor explains 99% of the total varance.258 0.905 0.441 2.984 0. 295 0. 295 11. Note: There is no factor rotation with one factor. Principal Component Factor Analysis of R (m = 1) Unrotated Factor Loadings and Communalities Variable Communality 0.188 0. All varables.

9798 0.761 0.571 0.960 0.568 Factor Score Coefficients Variable Factor1 Factor2 -0.995 Rotated Factor Loadings and Communalities Varimax Rotation Variable Factor1 Factor2 Communality BL EM SF BS Variance % Var 0.000 0. the factor might be called a "paper properties index.991 0.993 0.253 Maximum Likelihood Factor Analysis ofR (m = 1) * NOTE * Heywood case Unrotated Factor Loadings and Communalities Variable Fac BL EM SF BS Variance % Var 1.9798 0.999 0.980 BS Variance % Var 3.8395 0.098 0.522 0.493 0.7784 0.000 0.035 3.868 0.1403 0.650 BL 1. Again.975 0.835 0. The first factor explains about 95% of the varance and all varables load highly and about equally on this factor.996 2.000 0. 013 1 BS .427 3.7784 0.191 0.008 0.996 SF 0.7082 0.945 Factor Score Coefficients Variable Factor1 BL EM SF BS 1.991 -0.995 0.361 0.984 Communality 1.307 0. 3. 821 -1.993 -0.914 0.000 0.128 0.817 0.951 EM 0.9 2 BL 0.968 0.988.999 0.8£8 1 .235 EM 0.945 3.996 -0.000 The results are similar to the results for the principal component method.642 0.232 SF -0.000 0." Principal Component Factor Analysis of R (m = 2) Unrotated Factor Loadings and Communalities Variable Factor1 Factor2 Communality 0.852 0.271 7 1.

..2572 0.992 . 000 0.992 1. the second factor explains very little of the variance beyond that of the first factor and is not needed.757 0.492 1..984 1.070 3. 000 0.000 1.853 0. 795 -0.576 0.254 . . lr61 .. ..I .9700 0.. e.986 0.000 3..000 1.0. ". ..523 0.870 2. 000 -1..564 Factor2 Communality 0.641 0. . 000 3.922 and Communalities Factor2 Communality 0. .185 0.875 0.759 Factor2 -0.984 Rotated Factor Loadings and Communalities Varimax Rotation Variable Factor1 BL EM SF BS Variance % Var 0. .2800 0.809 0.#tõfJ . .428 Factor Score Coefficients Variable Factor1 BL EM SF BS -0.975 1.988 0. -1 0 Firs FaCtor Using the unrotated loadings. Maximum Likelihood Factor Analysis of R (m = 2) * NOTE * Heywood case Unrotated Factor Loadings Variable Factor BL EM SF BS Variance % Var 0.6900 0. The potential outlers are evident in the plot of factor scores. . . 016 -0.103 0. .. #S9 . Since the unrotated loadings provide a clear interpretation of the first factor there is no need to consider the rotated loadings.. .00 1.9700 0. #"-0 .986 0.7128 Ù . 000 0. .485 0.000 -1. 078 1..

A m = 1 factor solution using R appears to be the best choice. The covarance matrix S and correlation matrix R follow along with a scree plot using R.733 -0.784 0. LFF.00577 221. Since the unrotated loadings provide a clear interpretation of the first factor (paper properties index) there is no need to consider the rotated loadings. ZST AFL LFF FFF ZST AFL LFF FFF ZST -3.05161 -185.785 .25S The results are similar to the results for the principal component method.06227 3.21404 0. Plots of factor scores from the two factor model suggest that observations 60 and 61 and possibly observations 57.63707 0.35 A factor analysis of the pulp fiber characteristic varables with Sand R for m = 1 and m = 2 factors is summarized below. 9. the first factor explains 92% of the total variance and the second factor explains very little of the remaining variance.711 ZST 0.34760 308. FFF. ZST AFL LFF FFF LFF 0.906 FFF -0. S8 and 59 may be outliers.00087 0. FFF. LFF. Using the unrotated loadings. The same potential outlers are evident in the plot of factor scores.39989 -0.35980 Correlations: AFL. Covariances: AFL.793 -0.40633 0.

047 175. . LFF and ZST group.001 AFL LFF FFF ZST variance 455.860 455. LFF (long fiber fraction) and ZST (zero span tensile strength) may all have to do with paper strengt while FF (fine fiber fraction) may have something to do with paper quality.573 279.858 0. AFL (average fiber length).256 Principal Component Factor Analysis of S (m = 1) unrotated Factor Loadings and Communalities variable Communali ty 0.645 0.000 0.000 The first factor explains 86% of the total varance and represents a contrast between FF (with a negative loading) and the AFL. Perhaps we could label this factor a "strength--uality" factor.48 0.860 Factor Score Coefficients Variable Factor1 AFL LFF FFF ZST 0.433 -0.48 % Var 0. all with positive loadings.

090 0.1245 % Var 0.870 0.614 0. Maximum Likelihood Factor Analysis ofR (m = 1) Unrotated Factor Loadings and Communalities Variable Communali ty 0.877 0.132 The first factor explains 78% of the variance and the pattern of loadings is consistent with that of the m = 1 factor analysis of the covariance matrix R using the principal component method.422 0.781 Factor Score Coefficients Variable Factor1 AFL LFF FFF ZST 0.770 0. we might label this bi polar factor a "strength-quality" factor.394 -0.894 0. Again.1245 0.273 The first factor explains 84% of the variance and the pattern of loadings is consistent with that of the m = 1 factor analysis of the covarance matrix S.3577 0.3577 0.839 3.900 0.278 -0.261 0.717 AFL LFF FFF ZST Variance' 3. Again.257 Principal Component Factor Analysis of R (m = 1) Unrotated Factor Loadings and çommunalities Variable Communality 0.839 Factor Score Coefficients Variable Factor1 AFL LFF FFF ZST 0. .279 0.781 3.841 AFL LFF FFF ZST Variance % Var 3. we might label this bi polar factor a "strength-quality" factor.

256 variable ~ AFL .0176 0..7070 0.942 0.087 3. 075 -0.3493 0..t1~'! .927 0.288 LFF FFF ZST 3..696 0. ~: ~ .757 0.359 0. .953 Variable AFL LFF FFF 0. *' sf ..7070 0. Principal Component Factor Analysis of R (m = 2) Unrotated Factor Loadings and communalities Factor2 communality 0.949 0.. ...863 ZST 2.953 0. .942 0. .501 . .949 0.504 Variance % Var 3.. .422 Factor Score Coefficients Variable Factor1 Factor2 0.082 ZST 0. .613 AFL LFF FFF -0.3577 0. . 6893 0.258 Because the different measurement scales make the factor loadings obtained from the covariance matrix difficult to interpret. . l.-..863 0.#'"1 .429 1. 50 0. we continue with a factor analysis of the correlation matrix R with m = 2.927 1.*"'0 -4 -3 ~2 -1 Factor FirS 1 2 .... .839 Variance % Var Rotated Factor Loadings and Communalities Varimax Rotation Communality 0..

944 0.944 0.-:-.. .943 0.879 1.~~' Communal i ty 0...336 AFL LFF FFF ZST 0.205 -0. " .876 0..8 . 197 0.¡"..070 3.2 -1 o Firs Factr Examining the unrotated loadings for both solution methods.+$7 1l~"i. this factor has factor explains little (about 7%-8%) of moderate to very small loadings on all the variables with the possible exception of .033 0..503 3. Also.292 Variable AFL LFF FFF ZST Variance % Var 3.534 0..076 m=2) "'''c.943 0. .876 0.. . . . . ..752 0...423 -1. we see that the second the remaining variane.376 Factor Score Coefficients Variable Factor1 Factor2 -0...049 -0...2351 0.752 AFL LFF FFF .":.. . 01 ZST Variance % Var 2. . ~ .5146 0.809 Communality (-0: 38ij 0...259 Maximum Likelihood Factor Analysis of R (m = 2) Unrotated Factor Loadings and Communalities Factor2 -0.. -3 -2 . 5023 0. .879 Rotated Factor Loadings and Communalities Varimax Rotation Variable F5a~~..2796 0.: .5146 0.0124 0.922 0.101 0..

484 Goats ScreeP.382 0. 58 and 59 may be outliers. We give the results for m = 4 but. Cotton.022 0. Bull.081 0. A one factor model appears be sufficient in this case.679 -0.389 0.lotofFamily" . 4 and 5. Goats Bull Cattle Sorg Millet Maze Family DistRd Cotton DistRd -0.063 0. Potential choices for mare 2.357 0.088 Bull Cattle 0. Maze. Using the rotated loadings. here is a case where a factor model is not paricularly well defined. to our minds. there seems to be no gain in understanding from adding a second factor to the modeL.305 0.424 0.404 0. Cattle.353 0. LFF and ZST.031 0.568 -0. this factor might be called a "fine fiber" of "quality" factor.560 0.520 0.724 Cot ton 0.443 0. If retained..36 The correlation matrix R and the scree plot is shown below. 9..623 0.730 0.399 0.506 0.054 Maze 0.028 0.175 0.727 -0. 61 and. Sorg. 3.Goats i 2 3 4 5 6 Factor Number 7 8 9 .197 0. To summarize. Milet. plots of the factor scores for m = 2 suggest observations 60.084 0.383 0. the second factor looks much like the first factor for both solution methods. perhaps.336 -0.260 variable FF. observations 57. However.136 0. this factor appears to be a contrast between variable FF and the group of variables AF.765 0.217 0.071 Sorg Millet 0. DistRd. That is. After m = 2 there is no shar elbow in the scree plot and the plot falls off almost linearly. Correlations: Family.109 0.

224 Bull 0.261 Principal Component Factor Analysis of R (m = 4) F~ unrotated Factor Loadings and Communalities Variable Factor3 0.951í 8J11 0.374 0.706 0.614 -0. m=4.114 Goats SCOff: PlolõfFamily" mjGoats (PC.023 0.714' -0.~' Factor 2 3 4 .879 0.863 i -0.907 0.842 -0.ft :.148 0.460 Variance 0.092 0. '.3593 0.030 0. .r .633 -0.811 0.396 sorg Millet Bull Cattle 0.115 -0.047 0.494 Maze O.150 ~ ~~~ 0.210 0.8985 0.158 0.309 1. e.903 Family DistRd Cotton tt~~ -0. 0581 4.¡S' 1 Firs 'i .070 -0.974 61.344 Cotton 0. 'l3'! .118 Family DistRd Cotton Maze Sorg Millet Bull Cattle Goa ts Variance % Var F~ ~ 0.851 .171 -0. 28 0..963 0.260 -0.706 0.043 0.851 -0.3593 0.818 0.818 1.125 0. .108 Factor4 communality 0. 0291 0.078 -0.246 -0.026 \I .811 0:466 0.247 -0.t.907 0.178 Goa ts % Var Rotated 1.798 -0.226 Factor2 -0..1443 0.697 Millet 0.024 0.019 0.320 -0.856 0.629 EO.164 -0.042 0. 7 .~":.080 0.156 0.005 0.286 -0..798 -0. 724J ~~ 7.842 0. .165 0. ..197 Family -0.199 Sorg -0.HlO -0. -lll.029 0.211 1.180 0.068 0.145 7.183 (0'. .Oti3 -0. .~6) 2. t:i.112 0.986ì 0. .7 -#..614 0.856 0.063 Cattle -0.076 0.7840 0.110 0.535 0.090 -0.301 o 564 -0.6476 0.026 -0.338 -0.102 and Communalities Factor Loadings Varimax Rotation Variable ~ Factor4 Communality 0.118 0. · .329 0.001 -0. 0. ..114 Factor Score Coefficients Factor4 Variable Factor1 Factor2 Factor3 o . 0 -0.013 0..204 0.022 0.006 -0.175 Maze -0.014 DistRd -0.074 0. -1 o .974 0.9205 0.582 0.

141 Cattle -0. '" -. -.003 Millet -1.837 -0. -1 o 1 Firs Factr .023 -0..109 -rr.075 0. .161 -0.131 -0.151 64~ì Variable F~~~~ FamilY ..062 -0.303 0.017 -0.034 Maze Sorg -0.002 0.681 0.962 0. .649 -0.404 -0.440 0.362 -0.229 0.044 -0.010 -0.9322 0.4ti5 1.189 0.215 -0.009 -0.837 -0.249 Goa ts 2. ( " .1.385 0.466 Goa ts Variance % Var 2.782 -0.162 -0.990 0.465 -0.ti59 0.301 0.009 -0.15tì C=Ô.002 -0.869 0.307 0.9824 0.7 .4il 0.361 -0.143 Factor Score Coefficients Factor3 Factor4 Variable Factor1 Factor2 0.7340 0.148 0.162 -0. 7047 0.369 -0. 426 0.073 5. ":Js .369 -0.206 0.268 E§:m ~:$ 0.370 0.113 0.lf52.001 DistRd -0.869 -0.5841 0.374 0.331 Variance % Var Rotated 1.9322 0.290 0.896 0. "7(" .915 fO.2098 0. .003 0.995 Maze 0.262 Maximum Likelihood Factor Analysis ofR (m = 4) unrotated Factor Loadings and Communalities Factor3 Factor4 Communality 0.166 0.064 DistRd Cotton a1 ~ 0.065 communalities i ty Factor3 Factor4 Communal 0. .120 0.980 0.606 0.081 0.659 0.6610 0.074 0.#t.028 -0.089 0.016 0.246 1.033 cotton 0.091 -0..052 -0..103 0.023 Sorg 0.990 -0...247 -0.009 Goa ts .096 0. 2850 0. .558 fj~' rff Millet Bull Cattle -0.017 DistRd Cotton -0.649 -0.082 5.605 FamilY 0.025 -0.078 -0.782 -0. .013 FamilY -0...071 r=i0~5 Q112 21) 0.:~ .093 -0.109 0.026 Bull 0.962 0.009 0. l.044 -0.746 0.324 -0.025 0.185 0.189 ~ Factor Loadings and Varirnax Rotation Variable .7035 0. '.211 Maze Sorg Millet Bull Cattle 0.

The two solution methods for m = 4 factors produce somewhat different results. The patterns for unrotated loadings on the first two factors are similar but not identical. The patterns of loadings for the two solution methods on the third and fourth factors are quite different. Notice that DistRd does not load on any factor in the maximum likelihood solution; it appears to define factor 4 in the principal component solution. (From R we see that DistRd is not correlated with any of the other variables.)

The factor loading patterns are more alike for the two solution methods using the rotated loadings, although factors 2 & 3 in the principal component solution appear to be reversed in the maximum likelihood solution. The variables Family, Cotton, Maze, and Bullocks load highly on the first factor. Growing cotton and maize is labor intensive and bullocks are helpful. The first factor might be called a "family farm-row crop" factor. Variables Family, Sorghum, Millet and Goats load highly on the second factor (maximum likelihood solution) or the third factor (principal component solution). Millet and sorghum are grasses and may provide feed for goats. Consequently, the second (or third in the case of the principal component solution) factor might be called a "family farm-grass crop" factor. The third factor in the maximum likelihood solution (second factor in the principal component solution) may have different interpretations depending on the solution method but in both cases, Bullocks and Cattle load highly on this factor. Perhaps this factor could be called a "livestock" factor. The rotated loadings on factor 4 for the two methods are quite different. This factor is clearly a "distance to the road" factor in the principal component solution. The interpretation is not clear in the maximum likelihood solution. Again, DistRd does not load on any factor in the maximum likelihood solution.

The fact that the two solution methods produce somewhat different results and explain quite different proportions of the total variation (82% for principal components, 66% for maximum likelihood) reinforces the notion that a linear factor model is not particularly well defined for this problem. Plots of factor scores for the first two factors indicated there are several potential outliers. If these observations are removed, the results could change.
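The comparison of unrotated and rotated loadings, and the quoted proportions of total variation, rest on two simple computations. A minimal sketch, assuming a correlation matrix R is analyzed so that total variance equals the number of variables p: communalities are row sums of squared loadings, and each factor's share is its column sum of squared loadings divided by p. The loading values and the 35 degree rotation angle below are made up for illustration and are not taken from Exercise 9.36.

```python
import math

# Hypothetical unrotated loadings for p = 4 standardized variables, m = 2 factors.
# These numbers are illustrative only; they are not from the exercise.
loadings = [
    [0.80, 0.30],
    [0.75, 0.40],
    [0.60, -0.55],
    [0.50, -0.65],
]


def rotate(L, angle):
    """Post-multiply a p x 2 loading matrix by an orthogonal rotation matrix."""
    c, s = math.cos(angle), math.sin(angle)
    return [[a * c + b * s, -a * s + b * c] for a, b in L]


def communalities(L):
    """h_i^2 = sum over factors of the squared loadings for variable i."""
    return [sum(l * l for l in row) for row in L]


def proportion_explained(L):
    """Share of total standardized variance (p) attributed to each factor."""
    p = len(L)
    m = len(L[0])
    return [sum(row[j] ** 2 for row in L) / p for j in range(m)]


rotated = rotate(loadings, math.radians(35.0))

h2_before = communalities(loadings)
h2_after = communalities(rotated)
prop_before = proportion_explained(loadings)
prop_after = proportion_explained(rotated)

print("communalities before:", [round(h, 4) for h in h2_before])
print("communalities after: ", [round(h, 4) for h in h2_after])
print("per-factor share before:", [round(v, 4) for v in prop_before])
print("per-factor share after: ", [round(v, 4) for v in prop_after])
print("total share:", round(sum(prop_before), 4), round(sum(prop_after), 4))
```

An orthogonal rotation such as varimax redistributes explained variance among the factors but leaves each communality and the total proportion explained unchanged. Differences in total proportion, such as the 82% versus 66% quoted above, therefore come from the estimation method and not from the rotation.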

32XP) .1t 1/2x(1) 11 -= (0 a IJ1(. P2 = .30X~1) .iox~2) U2 = .23XF) + .36Xfl) .2 = o.VI = xfl) and Pi = .1xO)(X1J (l )= x(1) 2 2 Since !i t2~/2 = (1 OJ.l.Z * * a) Pi = . 'ui= el .-It_ t22 ~l tii _.:S)2J which has eigenvalues ~2 = (...49 b) Ui = . VI = xf2).(0 a 2() ( .36X~1) Vi = .30X~1) V2 = .95..Chapter 10 lO. are ~1 · (:1 and ': · (~l.95)2 and p. Thus Ui = x~l) .55. Thus (1) The normlized eigenvector. iO.20XP) + ..-1/2 11 ~i2. t-l/2lo .

677)2+(1.863Zi ( 2) + ..).14633 .1) and consequently the eigenvalues are the same.-.4) = 1.5)+( -.u5S) (.106ZZ Var(U2) = (_0. b) U2 = -.28919) .677) (..677)(.677i(I.17361 = ).17361.14633 .2_.4)= .0 Corr(UZ' V 2) = (-.OSSzll) ( 2) Vz = -.OSS)2_2(.45189-).3) + (.0S5)(.863) (.iO.28919 .S a) -1 -1 Q-1D 0-10 (.OO09) equation is the same as that of The characteristic ii/2 12 2~ 21 ii/2 (see Example 10.0 Var(V2) = 1..4S189 t11t12t22t21 =~ 11~~22~1 = .671ZP) +l.6)+(.5461 ).5457) P.70ti)(I.03 = p~ . .0005 .706)(-.05S)(. = ( À-.863) (1.

p lp Pi 1 Ui = f2(l + p) VI = 10.03 cos 61 + VI = .9953) 1 3.~6zll)-1.45Z~2) .5R.5 R. p=2.0IZl2) VI = 1. pî=O =-136. A Ui = 1.74 H~l): pi *0.07 Û1 = i.70x~2) e = 45.79Zl1i V2 = -. q=2.39) is not parti~ularly strong.49 cos A* 10.30zl1)+.10zl2)-.(.5 Value of Null hypothesi s Ho: t12 = E'12 = a test sta ti stic -136.65 A A Therefore.10.7 a) 0(p(1 * =. .b) n = 140. Reading ability (summarized by Ui) does correlate with arithmetic ability (summrize~ by Vi) but the correlation (represented by PI = .9 a) Pl= .= 4 radians A* d) PI = .03Zl1).¥p+q+1l = 136.(.84 .!.78 P2 = .72 A 'i VI = . 8444 i ( .46 Sin a1 Sin a2 e2 + .02zl2)+1.20xi2)+.48 = 23. n-l.57. U2 = .39 . reject Ho but do not reject H~l).8 c) 1 r2(1+p) 266 (X(1)+X(I) 1 2 ) i 2 (X (2) + X( 2) ) A* Pi = .9953) Upper 51 -Degrees of point of f Freedom distribution 4 9.

1.498).142z~1) Vi = 1.602Z12) -.11 Using the correlation matrix R and standardized variables.532) RVi.1973 d( omic.728z:2) + 1.079z~1) Û2 = i.002Z~l)-.1 60z:2) .130 Ûi = -. overlap.342 . but not much. The canonical correlation (.977Z~2) U1 . p.33. 10.*' 0. h' es 1standardized) Vi : i zl2) +Z~2) = a "pun. The first canonical correlation. the Z2). The Z(l).26 lz~2) V2 = -. p.17 b) Û1 = i. the canonical correlations and canonical variables follow.s are the banks.S are the oil companies. Û i is not highly correlated with any of the Z2). Vi is dominated by Royal Dutch Shell. although relatively small.539z:I)+ i. R" Z(2) = (. . =.982 . =.913 .S (oil companies) and ~ is not the Zl)'s (banks).318 .348. is significant. = 0 is not rejected at the 5% leveL.093 .410z~1) +. shment index" Punishment appears to be correlated with nonprimary homicides but not primary homi ci des.345z~2) Additional correlations: vi. P2 = .Z(2) = (. Focusing attention on the first pair of canonical variables.142z:1) -. Rvi. p.JO.348) between Û i and ~ suggests there is not much co-movement between the rates of return for the banks on one hand and the oil companies on the other. Moreover.nonprimary Zi( i) -. The second canonical correlation is not significant. Û i is dominated by Citibank.10 a) 267 A* A* Pi = . The first canonical varables highly correlated with any of differentiate stocks in different industries with some.174) Here H 0 : 1:12 c¡12) = 0 is rejected at the 5 % level and H cil) : Pi.266 .209z~l) + .z(l) = (.185)..003Z~i) V i = -. R ZCI) = (. .

98 20 Reject H 78. ?3 = .61 3. P3=O. P2 *0.19Z~ 2) c) Sample Correla tions Vari ables xU) Variables - Be tween Original Variables and Canonical A . "' P2 = . Ui Vi A X(2) .89 . VI may be considered a measure of family MstatuS" which is domin- ated by family income. of restaurant dining 2. 636..S1 of head of household d) - U1 is a measure of family entertainment outside the home.81 6 0 00 not r. ~4 -- . 77zI i ) +.13 .68 .256. a) 10. P4=O 0 .63 12 Reject H 16.99 . Reject Ho: A* P12 = 0 a t the 5: level but do Ui = not reject * H~I) = pi 4: 0.42 househol d 2. b) 268 P2 :I . annua 1 frequency .094 Degrees Conclusion of freedom at a level 309.909.29 .. 27Z~ 1 ) A VI = . annua 1 frequency of a ttendi n9 movi es age of head of . Essentially.69.35 . annual fami ly income . Va 1 ue of test statistic Null hypothesi s 1. .P1 = .eject Ho.12- Pl = . Ho: PI *0.A* a) ID.98 educa ti ona 1 1 eve 1 ." = P4=0 3.1 i. Ho: L12=P12=0 2 Ho: Pi *0.oszI 2) +. P2=.68 1. . 90Z~ 2) + . Variables Ui A Vi . family entertainment outside the home is positively associated with family income.19 P2 = a a t the 5: level.

Vi .55 z(l ) 3 Z(1) 4 Z(1) 5 Z(2) 1 .269 Z(1) i Z(1 ) 2 A . zll) and z¡i)) and "bad" aspects (Z3 (l! Z4 (1) ).03) G~J' G::: -::: Z(2) 2 .18 Z(2) 3 Z(2) 4 A b) U1 appears to measure qual i ty of wheat as a "contrast" between "good" aspects (Zl1). It appears to measure the quality of the flour as represented by z12). s harder to interpret.: .30J i~~J' r:~ -::: -::: -:.46 . z~2) and z~2). .98 -.

accounting meaures cannot be well explained. However.383659039 1. And the sample canonical variates are U1 U2 Raw Canonical Coetticients tor the Accounting measures ot protitability BRA -0..538822808 RRA 2. Ví is highly correlated with both of its components.5395. from the correlation between measures and canonical variates. 6946 Canonical Variables Canonical R-Squared 0.0968 BR 0.7228316516 -0.8702 0.4921 0.032933813 2.500804367 REV -0.4921 2 0.2910 CWlulati ve Proportion Proportion 0.38346956 RR -1 . 4225 0.794353012 HR 0.3114 Standardized Variance ot the Market measures ot protitability Explained by Their eim The epposi te Canonical Variables Canonical Variables Cumulative Canonical Cumulative Proportion Proportion R-Squared Proportion Proportion 1 0. Correlations Between the Accounting measures ot protitabili ty and Their Canonical Variables U1 U2 BRA 0.5665 0.431692979 2.7184 0.494697741 1.8298904995 U1 is most highly correlated with RRA and HRA and also HRS and RRS. In fact.7520.1298 1.2851 0.14 a) pi = 0.0000 0. .270 10. b) Standardized Variance ot the Accounting measures ot protitability Explained by Their elm The epposi te Canonical Variables Cumulative Proportion Proportion 1 0.8702 0.OOag .3930601511 -2. 0906 o .6041 0.0263 0.5655 0.6471230054 RR -1.5626 RRA 0 .0378 0.8110 0.2851 0.S60S . The second pair does not correlate well with their respective components. accounting measures on equity have weak correlation with Ûi.6190103052 V1 V2 Raw Canonical Coetticients tor the Market measures ot protitability Q 1..2910 0.5041 2 O.7749354333 -4.2133051339 -0.5299 Market measures can be well explained by its canonical variate 'C.2711 HR 0 . pi = 0.9655018549 RRE 0.

0000 Canonical R-Squared 0. 8002 o .271 RRE 0. 0046 Cumulative Propor'tion Proportion o . 0002 o .09S9 RR 0.1160 Standardized Variance ot Their Olm Cumative 2 Proportion the static measures Explained by The Opposite ~anonical Variables Canonical Variables 1 The Opposite Canoni~al Variables Cumulative Proportion Proportion 0.8840 0. 8003 Static meaures can be well explained by its canonical variate ill' Also.8334 0.0018933921 -0.6741 -0.8334 o .3814 Correlations Betveen the Market measures ot pro~itability and Their Canonical Variables V1 V2 Q 0. 0006 o .006663216 12 -0.008471036 14 0.7373 1 0.0681.8840 2 0.000696736 0.0077029513 Rav Canonical Coetticients tor the static measures 13 0. p.0036016621 -0.15 pi = 0. 8002 o .007828962 Standardized Variance ot the dynamic measure Explained by Their Olm Canonical Variables Proportion Cumulative Canonical Proportion R-Squared Proportion 0.7761 0.9886 0.0013448038 0.1508 REV 0.0046 o .7367 1 .4866 10.8736 0. dynamic meaures can be well explained by its canonical variate Vi. And the sample canonical variates are U1 U2 V1 V2 Rav Canonical Codticients tor the dynamc measure X1 0.9601 1.9129. .0399 0.9601 0.0000 0 .7367 0. = 0.

1) ) In((1 .009317525 I NSULRES 0 .p*D) ~ X. 023399723 -0.~(p + q . The large sample tests -en .16 From the computer output below.1) ) In((1 .2"(3 + 2 .p*~)(1 .0"08667216 Raw Canonical Coefficients for the Weight and Fasting WEIGHT 8.13. Canonical Correlation Analysis Adjusted Appr~x Canonical Correlation 1 0.015752 Canonical Correlation Analysis Raw Canonical Coefficients for the ~lucose and Insulin GLUCOSE 0.0655750801 -0.59 and -en .1 . we prefer to interpret the canonical variables obtained from S in terms of coeffcients of standardized variables.517345)2)(1-(..375167814 FASTING -0.0815z~i) Vi = i.272 10.020z~2) .0247524811 INSULIN -0.p*D) ~ XlP-lXq-il05) or 1 -(46 ..125508)2) J = 0:ô67 ~ X~(.1 .1) ) In¡(1 .019159052 0 .0131006541 0.7047zl1) + i.267646 0.517345 and P'2 = 0.4357zPJ .007324 0.1609z~2) The two insulin responses dominate Ûi while Vi consists primarily of the relative weight variable.125508.014438254 -0.517145 o .125508)2) J .125158 o .50 ~ X~(. the first two canonical correlations are ßi = 0.05) = 12. 125508 Canonical Standard Correlation Error 0. "009843 Squared Canonical Correlation 0.517345 2 O.(.05) or 1 -(46-1-2"(3+2-1) )In((1-(. Ûi .1 .~(p + q .05) = 5.q(. Even if the variables means were given.99 suggest that only the first pair of canonical variables are important.12~675138 ..
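The large-sample tests quoted in 10.16 can be reproduced with a short sketch. It assumes the statistic has the usual form -(n - 1 - (p + q + 1)/2) times the sum of ln(1 - rho_i^2) over the correlations being tested, with n = 46, p = 3, q = 2 as in the solution; the function name `bartlett_statistic` is hypothetical, and the chi-square critical values are simply the ones quoted in the text.

```python
import math

def bartlett_statistic(n, p, q, canonical_corrs, k=0):
    """Large-sample statistic for H0: all canonical correlations after the first k are zero."""
    factor = n - 1 - 0.5 * (p + q + 1)
    log_term = sum(math.log(1.0 - r ** 2) for r in canonical_corrs[k:])
    return -factor * log_term

n, p, q = 46, 3, 2                      # sample size and set sizes used in 10.16
rho = [0.517345, 0.125508]              # sample canonical correlations from the output

stat_all = bartlett_statistic(n, p, q, rho, k=0)    # compare with chi-square, p*q = 6 d.f.
stat_rest = bartlett_statistic(n, p, q, rho, k=1)   # compare with chi-square, (p-1)(q-1) = 2 d.f.

CHI2_6_05, CHI2_2_05 = 12.59, 5.99      # critical values quoted in the solution
reject_all = stat_all > CHI2_6_05
reject_rest = stat_rest > CHI2_2_05
```

With these inputs the first hypothesis is rejected and the second is not, which agrees with the conclusion that only the first canonical pair is important.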


Standardized Canonical Coefficients for the Glucose and Insulin

             PRIMARY1   PRIMARY2
GLUCOSE        0.4357     0.8232
INSULIN       -0.7047    -0.4547
INSULRES       1.0815    -0.4006

Standardized Canonical Coefficients for the Weight and Fasting

             SECONDA1   SECONDA2
WEIGHT         1.0202    -0.0475
FASTING       -0.1609     1.0086

Correlations Between the Glucose and Insulin and Their Canonical Variables

             PRIMARY1   PRIMARY2
GLUCOSE        0.3397     0.6838
INSULIN       -0.0502    -0.4565
INSULRES       0.7551    -0.5729

Correlations Between the Weight and Fasting and Their Canonical Variables

             SECONDA1   SECONDA2
WEIGHT         0.9875     0.1576
FASTING        0.0465     0.9989

10.17 The computer output below suggests that perhaps two canonical pairs of variables are useful. The
canonical correlations are 0.521594, 0.375256, 0.242181 and 0.136568. $\hat U_1$ ignores the
first smoking question and $\hat U_2$ ignores the third. $\hat V_1$ depends heavily on the difference of
annoyance and tenseness.
Even the second pairs do not explain their own variances very well: $R^2_{z^{(1)}|\hat U_2} = 0.1249$
and $R^2_{z^{(2)}|\hat V_2} = 0.0879$.
Canonical Correlation Analysis

       Canonical      Adjusted Canonical    Approx            Squared Canonical
       Correlation    Correlation           Standard Error    Correlation
  1     0.521594       0.520771              0.007280          0.272060
  2     0.375256       0.374364              0.008592          0.140817
  3     0.242181       0.241172              0.009414          0.058652
  4     0.136568       0.135586              0.009814          0.018651

Standardized Canonical Coefficients for the Smoking

          SMOKING1   SMOKING2   SMOKING3   SMOKING4
SMOK1      -0.0430     1.0898     1.1161    -1.0092
SMOK2       1.1622     0.6988    -1.4170     0.1732
SMOK3      -1.3753     0.2081     0.0156     1.6899
SMOK4       0.8909    -1.6506     0.8325    -0.2630

Standardized Canonical Coefficients for the Psych and Physical State


STATE

CONCEN

1

0.4733

ST A TE2

ST A TE3

-0.8141
-0.4510

o . 4946
o . 5909

ANNOY

-0 .7,806

SLEEP
TENSE

o .2567

-0 . ~052
o . 3800

ALERT

0.6919
-0.1451

-0 . 1840

0.6981
-0.4190
-1.5191

IRRIT AB

-0 . 0704

O. 6255

~O . 3343

TIRED

0.3127

o .5898

CONTENT

o .3364

o . 4869

o . 2276
o . 8334

STA TE4
-0 . 1 t5'04

-0.7193
0.624'6
o . 4376

-0.7253
0.87£0
U .1861

-0.6557

Canonical Structure
Correlations Between the Smoking and Their Canonical Variables

          SMOKING1   SMOKING2   SMOKING3   SMOKING4
SMOK1       0.4458     0.5278     0.6615     0.2917
SMOK2       0.7305     0.3822     0.1487     0.5461
SMOK3       0.2910     0.2664     0.4668     0.7915
SMOK4       0.6403    -0.0620     0.5586     0.5236

Correlations Between the Psychological and Physical
State and Their Canonical Variables

            STATE1    STATE2    STATE3    STATE4
CONCEN      0.7199   -0.3579    0.0125   -0.3137
ANNOY       0.3035    0.1365    0.3906   -0.4058
SLEEP       0.5995   -0.3490    0.3709    0.2586
TENSE       0.7015    0.3305    0.0053   -0.1861
ALERT       0.7290   -0.1539   -0.1459   -0.3681
IRRITAB     0.4585    0.3342    0.1211   -0.0805
TIRED       0.6905   -0.0267    0.2544    0.0749
CONTENT     0.5323    0.4350    0.3207   -0.5601


Canonical Redundancy Analysis

Raw Variance of the Smoking Explained by

       Their Own Canonical Variables                  The Opposite Canonical Variables
                     Cumulative     Canonical                       Cumulative
       Proportion    Proportion     R-Squared        Proportion     Proportion
  1     0.3068        0.3068         0.2721           0.0835         0.0835
  2     0.1249        0.4316         0.1408           0.0176         0.1011
  3     0.2474        0.6790         0.0587           0.0145         0.1155
  4     0.3210        1.0000         0.0187           0.0060         0.1215

Raw Variance of the Psychological and Physical State Explained by

       Their Own Canonical Variables                  The Opposite Canonical Variables
                     Cumulative     Canonical                       Cumulative
       Proportion    Proportion     R-Squared        Proportion     Proportion
  1     0.3705        0.3705         0.2721           0.1008         0.1008
  2     0.0879        0.4583         0.1408           0.0124         0.1132
  3     0.0617        0.5201         0.0587           0.0036         0.1168
  4     0.1032        0.6233         0.0187           0.0019         0.1187
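The redundancy figures for 10.17 can be checked from the canonical structure. The sketch below assumes that the "own" proportion for a canonical variable is the average squared correlation between that variate and the variables in its set, and that the "opposite" proportion is the own proportion times the squared canonical correlation. The function name `redundancy` is hypothetical; the numbers are the smoking correlations and canonical R-squared values from the output.

```python
# Correlations between the four smoking variables (rows) and their
# canonical variables SMOKING1..SMOKING4 (columns), taken from the output.
smoking_structure = [
    [0.4458, 0.5278, 0.6615, 0.2917],
    [0.7305, 0.3822, 0.1487, 0.5461],
    [0.2910, 0.2664, 0.4668, 0.7915],
    [0.6403, -0.0620, 0.5586, 0.5236],
]
canonical_r_squared = [0.2721, 0.1408, 0.0587, 0.0187]

def redundancy(structure, r_squared):
    """Return (own, opposite) proportions of variance explained per canonical variable."""
    n_vars = len(structure)
    own = [sum(row[j] ** 2 for row in structure) / n_vars
           for j in range(len(r_squared))]
    opposite = [o * r2 for o, r2 in zip(own, r_squared)]
    return own, opposite

own, opposite = redundancy(smoking_structure, canonical_r_squared)
cumulative_own = [sum(own[: j + 1]) for j in range(len(own))]
```

Because each column is averaged over all rows, the result does not depend on the order in which the smoking variables are listed.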

10.18 The canonical correlation analysis expressed in terms of standardized variables
follows. The $z^{(1)}$'s are the paper characteristic variables, the $z^{(2)}$'s are the pulp
fiber characteristic variables.
Canonical correlations:
$\hat\rho_1^* = .917$, $\hat\rho_2^* = .817$, $\hat\rho_3^* = .265$, $\hat\rho_4^* = .092$

First three canonical variate pairs:
$\hat U_1 = -1.505z_1^{(1)} - .212z_2^{(1)} + 1.998z_3^{(1)} + .676z_4^{(1)}$
$\hat V_1 = -.159z_1^{(2)} + .633z_2^{(2)} + .325z_3^{(2)} + .818z_4^{(2)}$

$\hat U_2 = -3.496z_1^{(1)} - 1.543z_2^{(1)} + 1.076z_3^{(1)} + 3.768z_4^{(1)}$
$\hat V_2 = .689z_1^{(2)} + 1.003z_2^{(2)} + .005z_3^{(2)} - 1.562z_4^{(2)}$

$\hat U_3 = -5.702z_1^{(1)} + 3.525z_2^{(1)} - 4.714z_3^{(1)} + 7.153z_4^{(1)}$
$\hat V_3 = -.513z_1^{(2)} + .077z_2^{(2)} - 1.663z_3^{(2)} - .779z_4^{(2)}$


Additional correlations:
$R_{\hat U_1, z^{(1)}} = (.935 \;\; .887 \;\; .977 \;\; .952)$, $R_{\hat V_1, z^{(2)}} = (.817 \;\; .906 \;\; -.650 \;\; .940)$
$R_{\hat U_1, z^{(2)}} = (.749 \;\; .831 \;\; -.596 \;\; .862)$, $R_{\hat V_1, z^{(1)}} = (.858 \;\; .814 \;\; .896 \;\; .873)$

Here $H_0: \Sigma_{12}\,(\rho_{12}) = 0$ is rejected at the 5% level and $H_0^{(1)}: \rho_1^* \ne 0,\ \rho_2^* = \rho_3^* = \rho_4^* = 0$ is
rejected at the 5% level. $H_0^{(2)}: \rho_1^* \ne 0,\ \rho_2^* \ne 0,\ \rho_3^* = \rho_4^* = 0$ is not rejected at the
5% level. The first two canonical correlations are significantly different from 0.
The last two canonical correlations are not significant.

The first canonical variable $\hat U_1$ explains 88% of the total standardized variance of
its set, the $z^{(1)}$'s. The first canonical variable $\hat V_1$ explains 70% of the total
standardized variance of its set, the $z^{(2)}$'s. The first canonical variates are good
summary measures of their respective sets of variables. Moreover, the first
canonical variates, which might be labeled a "paper characteristic index" and a
"pulp fiber strength-quality index", are highly correlated. There is a strong
association between an index of pulp fiber characteristics and an index of the
characteristics of paper made from them.
The second canonical variable $\hat U_2$ appears to be a contrast between the first two
variables, breaking length and elastic modulus, and the last two variables, stress
at failure and burst strength. However, the only moderately large (in absolute
value) correlation between the canonical variate and its component variables is
the correlation (-.428) between $\hat U_2$ and $z_2^{(1)}$, elastic modulus. The remaining
correlations are small. This canonical variable might be a "paper stretch" measure.
The canonical variable $\hat V_2$ appears to be determined by all variables except $z_3^{(2)}$,
fine fiber fraction. This canonical variable might be a "fiber length/strength"
measure. The second pair of canonical variates is also highly correlated.

10.19 The correlation matrix R and the canonical analysis for the standardized variables
follow. The $z^{(1)}$'s are the running speed events (100m, 400m, long jump), the
$z^{(2)}$'s are the arm strength events (discus, javelin, shot put).

1.0

.7926J
.4682

.4682

1.0

.4179
Rl1 = .5520 1.0 .4706

R22 = .4179

.4706.6386J
1.0
(.6386
1.0 .5520
.4752J

RI2 = R;i = .2100 .2116 .2539
.3998 .1821
.3102
(.3509

.4953

( .7926
1.0


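Canonical correlations:
$\hat\rho_1^* = .540$, $\hat\rho_2^* = .212$, $\hat\rho_3^* = .014$

Canonical variables:
$\hat U_1 = .540z_1^{(1)} - .120z_2^{(1)} + .633z_3^{(1)}$
$\hat V_1 = -.057z_1^{(2)} + .043z_2^{(2)} + 1.024z_3^{(2)}$

$\hat U_2 = 1.277z_1^{(1)} - .768z_2^{(1)} - .773z_3^{(1)}$
$\hat V_2 = -.422z_1^{(2)} - 1.0685z_2^{(2)} + .859z_3^{(2)}$

$\hat U_3 = .399z_1^{(1)} + .940z_2^{(1)} - .866z_3^{(1)}$
$\hat V_3 = 1.590z_1^{(2)} - .384z_2^{(2)} - 1.038z_3^{(2)}$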

Additional correlations:
$R_{\hat U_1, z^{(1)}} = (.662 \;\; .160 \;\; .732)$, $R_{\hat V_1, z^{(2)}} = (.772 \;\; .498 \;\; .999)$

Here $H_0: \Sigma_{12}\,(\rho_{12}) = 0$ is rejected at the 5% level and $H_0^{(1)}: \rho_1^* \ne 0,\ \rho_2^* = \rho_3^* = 0$ is
rejected at the 5% level. $H_0^{(2)}: \rho_1^* \ne 0,\ \rho_2^* \ne 0,\ \rho_3^* = 0$ is not rejected at the 5%
level. The first and second canonical correlations are significant. The third
canonical correlation is not significant.

We might identify $\hat U_1$ as a "running speed" measure since the 100m run and the
long jump receive the greatest weight in this canonical variate and also are each
highly correlated with $\hat U_1$. We might call $\hat V_1$ a "strength" or "arm strength"
measure since the shot put has a large coefficient in this canonical variate and the
discus, javelin and shot put are each highly correlated with $\hat V_1$.


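Chapter 11

11.1 (a) The linear discriminant function given in (11-19) is

$$\hat y = (\bar x_1 - \bar x_2)' S_{pooled}^{-1}\, x = \hat a' x$$

where

$$S_{pooled}^{-1} = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}$$

so the linear discriminant function is

$$\left( \begin{bmatrix} 3 \\ 6 \end{bmatrix} - \begin{bmatrix} 5 \\ 8 \end{bmatrix} \right)' \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix} x = [-2 \;\; 0]\, x = -2x_1$$

(b)

$$\hat m = \tfrac{1}{2}(\bar y_1 + \bar y_2) = \tfrac{1}{2}(\hat a' \bar x_1 + \hat a' \bar x_2) = -8$$

Assign $x_0'$ to $\pi_1$ if
$\hat y_0 = [-2 \;\; 0]\, x_0 \ge \hat m = -8$

and assign $x_0$ to $\pi_2$ otherwise.

Since $[-2 \;\; 0]\, x_0 = -4$ is greater than $\hat m = -8$, assign $x_0'$ to population $\pi_1$.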
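The arithmetic in 11.1 can be reproduced with a small sketch. The sample means (3, 6) and (5, 8), the pooled covariance matrix [[1, 1], [1, 2]] and the new observation (2, 7) are assumed here to be the values given in the exercise statement; the helper names `inverse_2x2` and `row_times_matrix` are hypothetical.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def row_times_matrix(v, m):
    """Row vector v times 2x2 matrix m."""
    return [v[0] * m[0][0] + v[1] * m[1][0], v[0] * m[0][1] + v[1] * m[1][1]]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

x1_bar, x2_bar = [3.0, 6.0], [5.0, 8.0]      # assumed sample mean vectors
s_pooled = [[1.0, 1.0], [1.0, 2.0]]          # assumed pooled covariance matrix
x0 = [2.0, 7.0]                              # assumed observation to classify

diff = [m1 - m2 for m1, m2 in zip(x1_bar, x2_bar)]
a_hat = row_times_matrix(diff, inverse_2x2(s_pooled))   # coefficients of the discriminant
m_hat = 0.5 * (dot(a_hat, x1_bar) + dot(a_hat, x2_bar)) # midpoint between the group means
y0 = dot(a_hat, x0)
population = 1 if y0 >= m_hat else 2
```

Under these assumed inputs the sketch gives the coefficient vector (-2, 0), the midpoint -8 and the score -4, so the observation is allocated to the first population.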


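11.2 (a) $\pi_1$ = Riding-mower owners; $\pi_2$ = Nonowners

Here are some summary statistics for the data in Example 11.1:

$$\bar x_1 = \begin{bmatrix} 109.475 \\ 20.267 \end{bmatrix}, \qquad \bar x_2 = \begin{bmatrix} 87.400 \\ 17.633 \end{bmatrix}$$

$$S_1 = \begin{bmatrix} 352.645 & -11.819 \\ -11.819 & 4.082 \end{bmatrix}, \qquad S_2 = \begin{bmatrix} 200.705 & -2.589 \\ -2.589 & 4.464 \end{bmatrix}$$

$$S_{pooled} = \begin{bmatrix} 276.675 & -7.204 \\ -7.204 & 4.273 \end{bmatrix}, \qquad S_{pooled}^{-1} = \begin{bmatrix} .00378 & .00637 \\ .00637 & .24475 \end{bmatrix}$$

The linear classification function for the data in Example 11.1 using (11-19)
is

$$\left( \begin{bmatrix} 109.475 \\ 20.267 \end{bmatrix} - \begin{bmatrix} 87.400 \\ 17.633 \end{bmatrix} \right)' \begin{bmatrix} .00378 & .00637 \\ .00637 & .24475 \end{bmatrix} x = [.100 \;\; .785]\, x$$

where

$$\hat m = \tfrac{1}{2}(\bar y_1 + \bar y_2) = \tfrac{1}{2}(\hat a' \bar x_1 + \hat a' \bar x_2) = 24.719$$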


(b) Assign an observation x to $\pi_1$ if

$0.100x_1 + 0.785x_2 \ge 24.72$

Otherwise, assign x to $\pi_2$.

Here are the observations and their classifications:

Owners
Observation    â'x0      Classification
     1         23.44     nonowner
     2         24.738    owner
     3         26.436    owner
     4         25.478    owner
     5         30.226    owner
     6         29.082    owner
     7         27.616    owner
     8         28.864    owner
     9         25.600    owner
    10         28.628    owner
    11         25.370    owner
    12         26.800    owner

Nonowners
Observation    â'x0      Classification
     1         25.886    owner
     2         24.608    nonowner
     3         22.982    nonowner
     4         23.334    nonowner
     5         25.216    owner
     6         21.736    nonowner
     7         21.500    nonowner
     8         24.044    nonowner
     9         20.614    nonowner
    10         21.058    nonowner
    11         19.090    nonowner
    12         20.918    nonowner

From this, we can construct the confusion matrix:

                      Predicted Membership
                        π1       π2      Total
Actual          π1      11        1        12
membership      π2       2       10        12

(c) The apparent error rate is $\frac{1+2}{12+12} = 0.125$.
(d) The assumptions are that the observations from $\pi_1$ and $\pi_2$ are from multivariate
normal distributions with equal covariance matrices, $\Sigma_1 = \Sigma_2 = \Sigma$.
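The confusion matrix and apparent error rate in 11.2 follow mechanically from the discriminant scores. The sketch below simply re-applies the cutoff 24.72 to the scores listed in the tables; the helper name `confusion_row` is hypothetical.

```python
THRESHOLD = 24.72   # classification cutoff from part (b)

# Discriminant scores a'x0 copied from the two tables in part (b).
owner_scores = [23.44, 24.738, 26.436, 25.478, 30.226, 29.082,
                27.616, 28.864, 25.600, 28.628, 25.370, 26.800]
nonowner_scores = [25.886, 24.608, 22.982, 23.334, 25.216, 21.736,
                   21.500, 24.044, 20.614, 21.058, 19.090, 20.918]

def confusion_row(scores, threshold):
    """Counts of observations assigned to pi_1 (owner) and pi_2 (nonowner)."""
    to_pi1 = sum(1 for s in scores if s >= threshold)
    return [to_pi1, len(scores) - to_pi1]

confusion = [confusion_row(owner_scores, THRESHOLD),
             confusion_row(nonowner_scores, THRESHOLD)]

# Off-diagonal entries are the misclassified observations.
misclassified = confusion[0][1] + confusion[1][0]
aper = misclassified / (len(owner_scores) + len(nonowner_scores))
```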
11.3 We need to show that the regions $R_1$ and $R_2$ that minimize the ECM are defined
by the values x for which the following inequalities hold:

$$R_1: \frac{f_1(x)}{f_2(x)} \ge \left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)$$

$$R_2: \frac{f_1(x)}{f_2(x)} < \left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)$$

Substituting the expressions for P(2|1) and P(1|2) into (11-5) gives

$$ECM = c(2|1)\,p_1 \int_{R_2} f_1(x)\,dx + c(1|2)\,p_2 \int_{R_1} f_2(x)\,dx$$

And since $\Omega = R_1 \cup R_2$,

$$1 = \int_{\Omega} f_1(x)\,dx = \int_{R_1} f_1(x)\,dx + \int_{R_2} f_1(x)\,dx$$

and thus,

$$ECM = c(2|1)\,p_1 \left(1 - \int_{R_1} f_1(x)\,dx\right) + c(1|2)\,p_2 \int_{R_1} f_2(x)\,dx$$

Since both of the integrals above are over the same region, we have

$$ECM = \int_{R_1} \left[ c(1|2)\,p_2 f_2(x) - c(2|1)\,p_1 f_1(x) \right] dx + c(2|1)\,p_1$$

The minimum is obtained when $R_1$ is chosen to be the region where the term in
brackets is less than or equal to 0. So choose $R_1$ so that

$c(2|1)\,p_1 f_1(x) \ge c(1|2)\,p_2 f_2(x)$, or

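$$\frac{f_1(x)}{f_2(x)} \ge \left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)$$

11.4 (a) The minimum ECM rule is given by assigning an observation x to $\pi_1$ if

$$\frac{f_1(x)}{f_2(x)} \ge \left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right) = \left(\frac{100}{50}\right)\left(\frac{.2}{.8}\right) = .5$$

and assigning x to $\pi_2$ if

$$\frac{f_1(x)}{f_2(x)} < \left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right) = .5$$

(b) Since $f_1(x) = .3$ and $f_2(x) = .5$,

$$\frac{f_1(x)}{f_2(x)} = .6 \ge .5$$

and assign x to $\pi_1$.

11.5

$$-\tfrac12 (x-\mu_1)'\Sigma^{-1}(x-\mu_1) + \tfrac12 (x-\mu_2)'\Sigma^{-1}(x-\mu_2)$$
$$= -\tfrac12\left[ x'\Sigma^{-1}x - 2\mu_1'\Sigma^{-1}x + \mu_1'\Sigma^{-1}\mu_1 - x'\Sigma^{-1}x + 2\mu_2'\Sigma^{-1}x - \mu_2'\Sigma^{-1}\mu_2 \right]$$
$$= -\tfrac12\left[ -2(\mu_1-\mu_2)'\Sigma^{-1}x + \mu_1'\Sigma^{-1}\mu_1 - \mu_2'\Sigma^{-1}\mu_2 \right]$$
$$= (\mu_1-\mu_2)'\Sigma^{-1}x - \tfrac12(\mu_1-\mu_2)'\Sigma^{-1}(\mu_1+\mu_2)$$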
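The minimum ECM rule used in 11.4 can be written as a two-line check. In this sketch the costs c(1|2) = 100 and c(2|1) = 50, the priors p1 = .8 and p2 = .2, and the density values .3 and .5 are taken from the solution; the function names `min_ecm_threshold` and `classify` are hypothetical.

```python
def min_ecm_threshold(cost_1_given_2, cost_2_given_1, p1, p2):
    """Right-hand side of the minimum ECM rule: (c(1|2)/c(2|1)) * (p2/p1)."""
    return (cost_1_given_2 / cost_2_given_1) * (p2 / p1)

def classify(f1, f2, threshold):
    """Allocate to population 1 when the density ratio reaches the threshold."""
    return 1 if f1 / f2 >= threshold else 2

threshold = min_ecm_threshold(100, 50, p1=0.8, p2=0.2)
population = classify(0.3, 0.5, threshold)
```

Raising the cost of misclassifying a population-2 item, or the prior weight of population 2, raises the threshold and shrinks the region allocated to population 1.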

11.6 a) $E(a'X \mid \pi_1) - m = a'\mu_1 - m = a'\mu_1 - \tfrac12 a'(\mu_1 + \mu_2) = \tfrac12 a'(\mu_1 - \mu_2) = \tfrac12 (\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 - \mu_2) \ge 0$ since $\Sigma^{-1}$ is positive definite.

b) $E(a'X \mid \pi_2) - m = a'\mu_2 - m = \tfrac12 a'(\mu_2 - \mu_1) = -\tfrac12 (\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 - \mu_2) \le 0$.

11.7 (a) Here are the densities:

[Plots of the densities $f_1(x)$ and $f_2(x)$ against x; the boundary between $R_1$ and $R_2$ is marked at x = 1/4 in one panel and at x = -1/3 in the other.]

(b) When $p_1 = p_2$ and c(1|2) = c(2|1), the classification regions are

$$R_1: \frac{f_1(x)}{f_2(x)} \ge 1, \qquad R_2: \frac{f_1(x)}{f_2(x)} < 1$$

These regions are given by $R_1: -1 \le x \le .25$ and $R_2: .25 < x \le 1.5$.

(c) When $p_1 = .2$, $p_2 = .8$, and c(1|2) = c(2|1), the classification regions are

$$R_1: \frac{f_1(x)}{f_2(x)} \ge \frac{p_2}{p_1} = 4, \qquad R_2: \frac{f_1(x)}{f_2(x)} < \frac{p_2}{p_1} = 4$$

These regions are given by $R_1: -1 \le x \le -1/3$ and $R_2: -1/3 < x \le 1.5$.

11.8 (a) Here are the densities:

[Plot of the densities $f_1(x)$ and $f_2(x)$ against x; the region $R_1$ lies between x = -1/2 and x = 1/6, with $R_2$ on either side.]

(b) When $p_1 = p_2$ and c(1|2) = c(2|1), the classification regions are

$$R_1: \frac{f_1(x)}{f_2(x)} \ge 1, \qquad R_2: \frac{f_1(x)}{f_2(x)} < 1$$

= l(2.~ = tt ~2 .~l ) so a'B .J~' a/La a1ta - whI.285 These regions are given by Ri : -1/2 =: x ~ 1/6 and R2 = -1.ua = !'((~1-~)(~1-~)' + (~2-~)(~2-~). ~ ll_l .u-_ ~2.ere + ) Thus ~ = 2' "_1~1 . 1/6 ~ x ~2.9 a'B .U_2) and 11_2 . .ua = a/La ! ~I (~1-~2)(~1-~2) I ~ ala .I .5 11.5 ~ x ~ -1/2.

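11.10 (a) Hotelling's two-sample $T^2$-statistic is

$$T^2 = (\bar x_1 - \bar x_2)' \left[ \left(\frac{1}{n_1} + \frac{1}{n_2}\right) S_{pooled} \right]^{-1} (\bar x_1 - \bar x_2) = [-3 \;\; -2] \left[ \left(\frac{1}{11} + \frac{1}{12}\right) \begin{bmatrix} 7.3 & -1.1 \\ -1.1 & 4.8 \end{bmatrix} \right]^{-1} \begin{bmatrix} -3 \\ -2 \end{bmatrix} = 14.52$$

Under $H_0: \mu_1 = \mu_2$,

$$T^2 \sim \frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1} F_{p,\, n_1+n_2-p-1}$$

Since $T^2 = 14.52 \ge \frac{(21)(2)}{20} F_{2,20}(.1) = 5.44$, we reject the null hypothesis $H_0: \mu_1 = \mu_2$ at the $\alpha = 0.1$ level of significance.

(b) Fisher's linear discriminant function is

$\hat y_0 = \hat a' x_0 = -.49x_1 - .53x_2$

(c) Here, $\hat m = -.25$. Assign $x_0'$ to $\pi_1$ if $-.49x_1 - .53x_2 + .25 \ge 0$. Otherwise assign $x_0$ to $\pi_2$.
For $x_0' = (0 \;\; 1)$, $\hat y_0 = -.53(1) = -.53$ and $\hat y_0 - \hat m = -.28 < 0$. Thus, assign $x_0$ to $\pi_2$.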
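The numbers in 11.10 can be checked with a short sketch. It assumes the mean difference (-3, -2), the pooled covariance matrix [[7.3, -1.1], [-1.1, 4.8]] and the sample sizes 11 and 12 are those given in the exercise statement, and it takes the midpoint -.25 from part (c); the helper name `inverse_2x2` is hypothetical.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

diff = [-3.0, -2.0]                       # assumed x1_bar - x2_bar
s_pooled = [[7.3, -1.1], [-1.1, 4.8]]     # assumed pooled covariance matrix
n1, n2 = 11, 12                           # assumed sample sizes

s_inv = inverse_2x2(s_pooled)
# Fisher's coefficients: a_hat' = (x1_bar - x2_bar)' S_pooled^{-1}
a_hat = [diff[0] * s_inv[0][0] + diff[1] * s_inv[1][0],
         diff[0] * s_inv[0][1] + diff[1] * s_inv[1][1]]

# Hotelling's T^2 = d' [(1/n1 + 1/n2) S_pooled]^{-1} d
t_squared = (a_hat[0] * diff[0] + a_hat[1] * diff[1]) / (1.0 / n1 + 1.0 / n2)

m_hat = -0.25                             # midpoint quoted in part (c)
x0 = [0.0, 1.0]
score = a_hat[0] * x0[0] + a_hat[1] * x0[1] - m_hat
population = 1 if score >= 0 else 2
```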

11.11 Assuming equal prior probabilities $p_1 = p_2 = \tfrac12$, and equal misclassification costs c(2|1) = c(1|2) = $10:

  c    P(B1|A2)   P(B2|A1)   P(A2 and B1)   P(A1 and B2)   P(error)   Expected cost
  9     0.006      0.691       0.003          0.346         0.349        3.49
 10     0.023      0.500       0.011          0.250         0.261        2.61
 11     0.067      0.309       0.033          0.154         0.188        1.88
 12     0.159      0.159       0.079          0.079         0.159        1.59
 13     0.309      0.067       0.154          0.033         0.188        1.88
 14     0.500      0.023       0.250          0.011         0.261        2.61

Using (11-5), the expected cost is minimized for c = 12 and the minimum expected cost is $1.59.

11.12 Assuming equal prior probabilities $p_1 = p_2 = \tfrac12$, and misclassification costs c(2|1) = $5 and c(1|2) = $15, expected cost = $5 P(A1 and B2) + $15 P(A2 and B1).

  c    P(B1|A2)   P(B2|A1)   P(A2 and B1)   P(A1 and B2)   P(error)   Expected cost
  9     0.006      0.691       0.003          0.346         0.349        1.78
 10     0.023      0.500       0.011          0.250         0.261        1.42
 11     0.067      0.309       0.033          0.154         0.188        1.27
 12     0.159      0.159       0.079          0.079         0.159        1.59
 13     0.309      0.067       0.154          0.033         0.188        2.48
 14     0.500      0.023       0.250          0.011         0.261        3.81

Using (11-5), the expected cost is minimized for c = 10.90 and the minimum expected cost is $1.27.

11.13 Assuming prior probabilities P(A1) = 0.25 and P(A2) = 0.75, and misclassification costs c(2|1) = $5 and c(1|2) = $15, expected cost = $5 P(B2|A1)(.25) + $15 P(B1|A2)(.75).

  c    P(B1|A2)   P(B2|A1)   P(A2 and B1)   P(A1 and B2)   P(error)   Expected cost
  9     0.006      0.691       0.005          0.173         0.178        0.93
 10     0.023      0.500       0.017          0.125         0.142        0.88
 11     0.067      0.309       0.050          0.077         0.127        1.14
 12     0.159      0.159       0.119          0.040         0.159        1.98
 13     0.309      0.067       0.231          0.017         0.248        3.56
 14     0.500      0.023       0.375          0.006         0.381        5.65

Using (11-5), the expected cost is minimized for c = 9.80 and the minimum expected cost is $0.88.

11.14 Using (11-21),

$$\hat a_1^* = \frac{\hat a}{\sqrt{\hat a'\hat a}} = \begin{bmatrix} .79 \\ -.61 \end{bmatrix} \quad \text{and} \quad \hat m_1^* = -0.10$$

Since $\hat a_1^{*\prime} x_0 = -0.14 \le \hat m_1^* = -0.10$, classify $x_0$ as $\pi_2$.

Using (11-22),

$$\hat a_2^* = \frac{\hat a}{\hat a_1} = \begin{bmatrix} 1.00 \\ -.77 \end{bmatrix} \quad \text{and} \quad \hat m_2^* = -0.12$$

Since $\hat a_2^{*\prime} x_0 = -0.18 \le \hat m_2^* = -0.12$, classify $x_0$ as $\pi_2$.

These results are consistent with the classification obtained for the case of equal prior probabilities in Example 11.3. These two classification results should be identical to those of Example 11.3.
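The expected-cost tables in 11.11 to 11.13 can be regenerated with a short sketch. It assumes, as the tabulated probabilities imply, that X is normal with standard deviation 2, with mean 10 under A1 and mean 14 under A2, and that an item is assigned to B1 when x is at most c. The function names `normal_cdf`, `expected_cost` and `optimal_cutoff` are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_cost(c, p1, cost_2_given_1, cost_1_given_2,
                  mu1=10.0, mu2=14.0, sigma=2.0):
    """Expected cost of misclassification for the cutoff rule 'assign to B1 if x <= c'."""
    p_b2_given_a1 = 1.0 - normal_cdf((c - mu1) / sigma)   # A1 item falls above c
    p_b1_given_a2 = normal_cdf((c - mu2) / sigma)         # A2 item falls below c
    return (cost_2_given_1 * p_b2_given_a1 * p1
            + cost_1_given_2 * p_b1_given_a2 * (1.0 - p1))

def optimal_cutoff(p1, cost_2_given_1, cost_1_given_2,
                   mu1=10.0, mu2=14.0, sigma=2.0):
    """Cutoff where f1(c)/f2(c) equals (c(1|2)/c(2|1)) * (p2/p1)."""
    ratio = (cost_1_given_2 / cost_2_given_1) * ((1.0 - p1) / p1)
    return 0.5 * (mu1 + mu2) - sigma ** 2 * math.log(ratio) / (mu2 - mu1)

cost_table_11_13 = {c: expected_cost(c, 0.25, 5.0, 15.0) for c in range(9, 15)}
cutoff_11_12 = optimal_cutoff(0.5, 5.0, 15.0)
cutoff_11_13 = optimal_cutoff(0.25, 5.0, 15.0)
```

The closed-form cutoff comes from setting the log density ratio of the two normal populations equal to the log of the cost-and-prior ratio, which is linear in c when the variances are equal.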

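11.15 The inequality

$$\frac{f_1(x)}{f_2(x)} \ge \left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)$$

defines the same region as

$$\ln f_1(x) - \ln f_2(x) \ge \ln\left[\left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)\right]$$

For a multivariate normal distribution,

$$\ln f_i(x) = -\tfrac12 \ln|\Sigma_i| - \tfrac{p}{2}\ln 2\pi - \tfrac12 (x-\mu_i)'\Sigma_i^{-1}(x-\mu_i), \qquad i = 1, 2$$

so

$$\ln f_1(x) - \ln f_2(x) = -\tfrac12 (x-\mu_1)'\Sigma_1^{-1}(x-\mu_1) + \tfrac12 (x-\mu_2)'\Sigma_2^{-1}(x-\mu_2) - \tfrac12 \ln\left(\frac{|\Sigma_1|}{|\Sigma_2|}\right)$$
$$= -\tfrac12\left[ x'\Sigma_1^{-1}x - 2\mu_1'\Sigma_1^{-1}x + \mu_1'\Sigma_1^{-1}\mu_1 - x'\Sigma_2^{-1}x + 2\mu_2'\Sigma_2^{-1}x - \mu_2'\Sigma_2^{-1}\mu_2 \right] - \tfrac12 \ln\left(\frac{|\Sigma_1|}{|\Sigma_2|}\right)$$
$$= -\tfrac12 x'(\Sigma_1^{-1} - \Sigma_2^{-1})x + (\mu_1'\Sigma_1^{-1} - \mu_2'\Sigma_2^{-1})x - k$$

where

$$k = \tfrac12 \ln\left(\frac{|\Sigma_1|}{|\Sigma_2|}\right) + \tfrac12\left(\mu_1'\Sigma_1^{-1}\mu_1 - \mu_2'\Sigma_2^{-1}\mu_2\right)$$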
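The identity derived in 11.15 can be spot-checked numerically for two bivariate normal populations. The sketch below compares the log density ratio computed directly with the expanded quadratic form; all parameter values are hypothetical and chosen only for illustration, as are the helper names.

```python
import math

def inverse_and_det(m):
    """Inverse and determinant of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]], det

def bilinear(u, m, v):
    """u' M v for 2-vectors u, v and a 2x2 matrix M."""
    return sum(u[i] * m[i][j] * v[j] for i in range(2) for j in range(2))

def log_ratio_direct(x, mu1, sigma1, mu2, sigma2):
    """ln f1(x) - ln f2(x) from the normal densities (the 2*pi terms cancel)."""
    inv1, det1 = inverse_and_det(sigma1)
    inv2, det2 = inverse_and_det(sigma2)
    d1 = [x[i] - mu1[i] for i in range(2)]
    d2 = [x[i] - mu2[i] for i in range(2)]
    return (-0.5 * math.log(det1 / det2)
            - 0.5 * bilinear(d1, inv1, d1) + 0.5 * bilinear(d2, inv2, d2))

def log_ratio_expanded(x, mu1, sigma1, mu2, sigma2):
    """-1/2 x'(S1^-1 - S2^-1)x + (mu1'S1^-1 - mu2'S2^-1)x - k."""
    inv1, det1 = inverse_and_det(sigma1)
    inv2, det2 = inverse_and_det(sigma2)
    quadratic = -0.5 * (bilinear(x, inv1, x) - bilinear(x, inv2, x))
    linear = bilinear(mu1, inv1, x) - bilinear(mu2, inv2, x)
    k = (0.5 * math.log(det1 / det2)
         + 0.5 * (bilinear(mu1, inv1, mu1) - bilinear(mu2, inv2, mu2)))
    return quadratic + linear - k

# Hypothetical parameters for illustration only.
mu1, sigma1 = [1.0, 2.0], [[2.0, 0.5], [0.5, 1.0]]
mu2, sigma2 = [0.0, -1.0], [[1.0, -0.3], [-0.3, 3.0]]
x = [0.7, 0.4]

direct = log_ratio_direct(x, mu1, sigma1, mu2, sigma2)
expanded = log_ratio_expanded(x, mu1, sigma1, mu2, sigma2)
```

When the two covariance matrices are equal, the quadratic term and the log-determinant term vanish and the expression reduces to the linear rule of 11.5.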

Q= i~ -i1-:1+-1 1 (~i iT t-~1 1.17 Assuming equal prior probabilties and misclassification costs c~2Ii) = $W and c(1/2) = $73.. (.i .~2T2 ~2 .i lnl+il .!:21' 1+-1) l' ~1 +~2 .1.16 Q = In .-1 t. 1+. ti = h = t. ._X 1'2 ll_Z . 1'1 ~i ..-2 x +i -+2 X + X t II . .LZ ..290 11..e2) l (~1 +!:Z) 11.2' ~Z It-'()'( 1+-1 ' = ~ l ~1 .1-2~2 1-2 .~2 Xo + J.1 = .-' 't-1 ' J ii + ~.112:E2 :to 1 -~l (IEil) _ ~( i~-l i -1 ) 2 n 1~21 2 1-1 i 1-1 .1 ) i +-.k where When k =12'(1n (I t ii ) 1.. (~-~2) 1 . fi(x) = .2' l:i ..89. Q-__ i ("(-i "(-i) (i "(-i i -i) 2 Xo LJi .i(:-~l) 'ti1 (~-~1) (f ¡(X)J 1 l' -1 + '2 In!t21 + 2'(~-~2) t. In the table below .i ~i .

30483 0. (22.00000 0. (20.00 -0. P('1ilx) P ('12 I Q x) 15)' 1. (26.00000 0.87 1l2 5. (c) and (d). (b).00000 0. (18.00000 Clasification 18. (14.21947 0.56 '1i '1i '11 ii2 '12 '1i '1i '11 '1i The quadratic discriminator was used to classify the observations in the above table.69517 0.99678 291' 1.01 37.99991 0.36731 0. (12. 00000 331' 1.291 x (10.89) = 2.46 24.54 9.54 -1.0 L c(211) Pi 10 Otherwise. (28. see the following plot.74 13.36 3. 00000 35)' 1. An observation x is classified as '11 íf Q ~ In r(C(112)) (P2)J = In (73.00322 0.95254 0.63269 0. classify x as '12.38 53.78053 0. 00000 0 17)' 19)' 21)' 23)' 25)' 27)' 0.04745 0. 50 40 30 0 0 0 0 0 C\ x' 20 10 0 o 20 10 )L1 30 .00009 0. (30. For (a). (24. 00000 31)' 1.00000 0.27 '1i 0. (16.

50 Classification 112 7(1 7í2 . a"'i Xo . 11.= 4. -m '1i Observation '12 Classification Observation '11 1 2 2.5 Assign x~ to '1i if -lxi + ~X2 .19 (a) The calculated values agree with those in Example 11.50 0.50 -2.7.5 ~ 0 Otherwise assign x~ to '12.ector of .4.17 '12 2 3 1 a-I Xo -. (b) Fisher's linear discriminant function is A AI 1 2 3 3 Yo = a Xo = --Xl + -X2 where 17 10 27 3 3 6 Yl = -.-1B since t-l t-l 1 B: = t c(~1-~2)(~1-~2)IC+. rñ = .. Y2 = -. m -1.292 11.83 '1i 3 -0.18 The vector: is an (unsealed) eigen'l.83 0..(~1-~2) = c2t-l (~1-~2) (~1-~2) i t-1 (~1-~2) = A t-1 (~1-~2) = A : where A = e2 (~1-~2) 't-l (!:1-~2) .

.ÎJI(x) i ÎJ~ (x) '12 Classification Obs.Xa. where D. ÎJ~ (x) ÎJH x ) Clasification 3 3 '1i 1 313 i3 7f2 2 i J! '1i 2 l3 i3 7fi 3 4 3 3 3 '12 3 19 3 4 1 3 21 3 3 7f2 The classification results are identical to those obtained in (b) 11.7.7. 11.23 (a) Here ar the normal probabiHty plot'S for each of the vaables Xi.XS . X4.20 The result obtained from this matrix identity is identical to the result of Example 11.Xi) '11 Obs.(x) = (x .X'2.oied(X .xd8~.293 The results from this table verify the confusion matrix given in Example 11. (c) This is the table of squared distances ÎJt( x) for the observations.


[Normal probability plots for each of the variables $x_1, \ldots, x_5$: ordered observations plotted against standard normal quantiles.]

Variables $x_1$, $x_3$, and $x_5$ appear to be nonnormal. The transformations $\ln(x_1)$, $\ln(x_3 + 1)$, and $\ln(x_5 + 1)$ appear to slightly improve normality.

(b) Using the original data, the linear discriminant function is:

$\hat y = \hat a' x = 0.023x_1 - 0.034x_2 + 0.21x_3 - 0.08x_4 - 0.25x_5$

where $\hat m = -23.23$.

Thus, we allocate $x_0$ to $\pi_1$ (NMS group) if

$\hat a' x_0 - \hat m = 0.023x_1 - 0.034x_2 + 0.21x_3 - 0.08x_4 - 0.25x_5 + 23.23 \ge 0$

Otherwise, allocate $x_0$ to $\pi_2$ (MS group).

(c) Confusion matrix:

                      Predicted Membership
                        π1       π2      Total
Actual          π1      66        3        69
membership      π2       7       22        29

APER $= \frac{3+7}{69+29} = .102$

This is the holdout confusion matrix:

                      Predicted Membership
                        π1       π2      Total
Actual          π1      64        5        69
membership      π2       8       21        29

$\hat E$(AER) $= \frac{5+8}{69+29} = .133$

11.24 (a) Here are the scatterplots for the pairs of observations $(x_1, x_2)$, $(x_1, x_3)$, and $(x_1, x_4)$:

[Scatterplots of $x_2$, $x_3$ and $x_4$ against $x_1$, with bankrupt and nonbankrupt firms plotted using different symbols.]

The data in the above plots appear to form fairly elliptical shapes, so bivariate normality does not seem like an unreasonable assumption.

32 Thus.67xi .0551 ( 0. For (Xi.Oæ37 J (c).00231 lO'M735 0.01077 ( 0.298 (b) '11 = bankrupt firms. (Nonbankrupt group).02847 0.02092 ( 0.04594 Fisher's linear discriminant function is y = â'x = -4.5.01751 0.01751 J 8 pooled = 0.X2): Xi - 8i - -0.5.196.0819 i ' ( -0.02847 J 82 - 0.32 ~ 0 Otherwise. (e) See the tables of part (g) (f) 0. '12 = nonbankrupt firms.rñ = -4.l2x2 where rñ = -.12x2 + . .00837 0.0688 X2 - 0. allocate Xo to '12 APER= :6 = . we allocate Xo to '1i (Bankrupt group) if âxo . (d).2354 i' 0.67xi .0442 0.

The classification rule for any of thee functions is to classify a new observation into 1ii (bankrupt firms) if the quadratic function is ~ 0. both with Pi = 0.(Xb X2). Fisher's linear discriminant function For the various classification rules and error rates for these variable pairs.~Xb X3). This is the table of quadratic functions for the variable pairs . and (Xb xs).299 Since 8i and 82 look quite different.05. and to classify the new observation into .5 and Pi = 0. see the following tables.

APER Variables (Xi.71 Here is a table of the APER and Ê(AER) for the various variable pairs and prior probabilties.95.14 6.13 0.08x3 .9ûxa Pi = 0.S9xiXa . xa) (Xi.64xi . For unequal priors.5 (Xl.39 o .17 3.17 0. Xa) Pi = 0. X4) Pi = 0.60X2 Pi = 0.22 0.05 -0.05 Pi = 0.39 0. . it appears that the (Xl.23 0.05 + - 0.5 Pi = 0.26 0.37 0.46xf.05 0.26 0. + 7. Variables Prior Quadratic function -61.69xi + 7.3.10.5 (xi.08 2. Pi = 0.05xi .05 and P2 = 0. the variable pair (xi.11 0.43x¡ .05 -i.X2) Pi = 0.11 3.11x4 Pi = 0. as it has the lowest APER. X4) Ê(APR) Pi = 0.84xiX2 + 407.4t) For equal priors.77xi + 35.10. Xa) vaiable pair is the best clasifer.75xiX4 + 8.55x~ + 3.22 0.s.5 Pi = 0.8.20 0.5 (Xi.20x~ + . X2) (Xi.30. X2) has the lowet APER. Notice in the table below that only the constant term changes when the prior probabilties ~hange.300 '12 (nonbankrupt firms) otherwise.

657 -2.65 14.03177 0.64 .24 - 2.03300 0.0258D () .02847 0.11 For Pi = D.03177 0.07543 0.02847 0.91 14.4264 -0.301 (h) When using all four variables (Xb X2l X3.00431 -0.2354 - 0.20.00873 D.69 - 5. X4).n5 : APER = :6 = .05 For Pi = 0.80 11.04441 0.748 1. 0.00431 0.00362 0.0003l 2.42 -20.6 = .0330u 1.493 -28.02092 0.00031 0.00873 1 :04596 0.07543 -u.3675 0.1'6455 0. 82 - Assign a new observation Xo to '1i if its quadratic function .0551 .623 11.02580 0.6 = .07.623 Pi = 0.04424 0.4368 0.050 -52.434 11.050 -52.412 -3.5 4.u023l 0. Ê(AER) = ¡~= .02618 0.493 1.03428 0.given below is less than 0: Prior Quadratic function -49.4337 8. - 8i X2 iJ.0819 - .04735 0.5 : APER = .03428 0.232 -20.00362 0.00662 0.00662 0.00837 0.5939 0. Ê(AER) = .336 -2.412 x'0 xo+ Xo Pi = 0.00837 0.0688 Xi -0.657 526.974 -11.

along with the discriminant line.1. Xg) coordinate system.25 (a) Fisher's linear discriminant function is Yo = a' Xo .: 0 Otherwise classify Xo to '12 (nonbankrupt firms). . This is the scatterplot of the data in the (xi.4 -0.80xi .rñ = -4.48xg + 3.6 -0.2 0. The APER is 2:l4 = .0 0.6 x1 (b) With data point 16 for the bankrupt firms delete.13.4 0.302 11.rñ . Fisher's linear discrimit .2 0.33 Classify Xo to '1i (bankrupt firms) if a' Xo . 5 4 C' 3 x 2 1 -0.

089.93xi .i. Fisher's linear discriminant function is given by Yo = a'xo ..O .3 = . X3). This is the scatterplot of the observations in the (Xl.11. The APER is 1.46x3 + 3.36 Classify Xo to '1i (bankrupt firms) if a/:.1.m . coordinate system with the discriminant lines for the three linear discriminant functions given abov.4 = .m = -5.e. With data point 13 for the nonbankrupt firms deleted.97x3 + 4. 2: 0 Otherwise classify Xo to '12 (nonbankrupt firms).303 function is given by Yo = a'a.31 Classify Xo to'1i (bankrupt firms) if a'xo . Als laheUed are observation 16 for bankrupt firms and obrvtion .o .: 0 Otherwise classify Xo to '12 (nonbankrupt firms).35xi . The APER is 1.m = -4.m..

304 13 for nonbankrupt firms.5492 T for HO: 0.307221 o .26 (a) The least squares regression results for the X.) ITI -0.081412 0.0001 Here are the dot diagrams of the fitted values for the bankrupt fims and for the nonbankrupt firms: .05956685 -0. 11. Z data are: Parameter Estimates Variable DF INTERCEP 1 X3 1 Parameter Estimate Standard Error Paramet-er=O Prob .604 5. 13488497 o .158 o . It appears that deleting these observations has changed the line signficantly.

60 0 . .53122 0.90 1....---... X3. 20 1.... . Thus..--Banupt .. . 30089 misclassify misclassify misclassify misclassify misclassify misclassify The confusion matrix is: Predicted Membership '11 '12 Actual membership '11 =1 19 2 '12 J 4 21 Total t . the APER is 2:t4 = .50 fitted values: This table summarizes the classification results using the OBS GROUP 13 16 31 banrupt banrupt nonbankr 34 38 41 nonbanr nonbanr nonbanr FITTED CLASSIFICATION --------------------------------------------o ..48329 o ... +~--------+---------+---------+---------+---------+----. . 57896 0.13..N onbanrupt o . . . X2.305 . . 06025 o . . 30 O.. X4 are: .--+---------+---------+----. +---------+---------+--.. 00 0 .47076 O. (b) The least squares regression results using all four variables Xi.

208915 Xl 1 o .214 -0 .268 3.r HO: 0.35 0 .335 1.32336357 T fo. 156317 0.306 Parameter Estimates Standard Error Parameter=O Pr.21845 The confusion matrix is: .55719 0. 3508 Here are the dot diagrams of the fitted values for the bankrupt firms a:nd for the nonbankrupt firms: _+_________+_________+_________+_________+_________+__---Banrupt _+_________+_________+_________+_________+_________+__---N onbankrupt -0.05 1.18615284 0.46653100 X2 X3 X4 1 1.7393 Variable DF Parameter Estimate INTERCEP 1 0.122 0.ob .40 This table summarizes the classification results using the fitted values: OBS GROUP FITTD CLASSIFICATION o .07030479 0. 00 0 .35 0 .2119 o . 149093 o . 72676 misclassify misclassify misclassify misclassify ---------------------------------------------15 16 20 banrupt banrupt banrupt 34 nonbanr 0. 2ô83 o . 0026 O.944 O. 90606395 1 o .305175 0. 70 1 .) ITI 1. 62997 o . 225972 1 -0.

+ 0 16 + 0.~ :~ j Thus.5 Fitted Values 11.5 0. with points 16 of the bankrupt firms and 13 of the nonbankrupt firms labelled.5 +.27 ~a) Plot'Üfthe4ata in the (Xi. Here is a scatterplot of the residuals against the fitted values.0 bankrupt nonbankrupt "'~ ° Cò + 0 c: ~ -0.0 1. + .5 . It appears that point 16 of the bankrupt firms is an outlier.~ 0 In -e :i "C 'ëi (I 0.X4) variabte space: 1. the APER is 3::1 = . ".0 . + °eo + ° ° 13 0 0.087.307 Predicted Membership Total '1i '12 Actual membership 18 3 1 24 F .

:.5 3. :. This indicates that the observations from '1i may have a different covariance matrix from the observations from '12 and '13.++++ :. :.++++++ + + + + + + + + + +++ ++ o X o Setosa Versiclor Virginic o 000 o 00 0 0 00 0000000000 0.0001 . :. + + :.'§ -i: 1. :. :. :2 :.0 3. :.145 Num DF Den DF Pr)J F 8 288 Q. :. :.¡'s is different from the of significance: Statistic WiJ.5 :.5 4.~2343B63 199. + + + + .3 vel'US others at the a = 0.0 :. :. However.5 o o 2. :. :. The points from all three groups appear to form an ellptical it appears that the points of '11 (Iris setosa) form an ellpæ with a different orientation than those of '12 (Iris versicolor) and 113 (Iri virginica). :.0 X2 (sepal width) 'Shape. 2. :. :.308 2.05 level Hi : at least one of the ¡. ~ . :. :.x'S' Lambda Value F o .0 + :. (b) Here are the results of a test of the null hypothesis Ho : Pi = 1L2 = ¡. :. :. :. . :.0 + ~ 00 0 0 2. :.5 Ii - + Jo + Q) 'V 1. :. :. :. :. :.

30X4 .44.13.6 '13 15A9X2 + 3'6.043 cf(xo) = -1.87x~ + 24.22.Xi) '11 -3.04 To classify the observation x~ .77 AQ d2 (xo) = 0.58x4 .38x4 .22.3lx2 + 1£.9.37.09x~ + 19..36.23 So classify Xo to '12 (Iris versicolor).54x2X4 .92 .26x4 .ß3x4 .92x2 + 12. 76x~ + 8.71x2 + 2.28x4 -59.05 level of significance.32x~ + 22.37.68X2 + 6. 36.Xi)' Sii(x .78 57.28.59.5 1.3.60x4 + 23. the plots give us reason to doubt the assumption of equal covariance matrices for the three groups. AQ di (xo) = -103.67 '12 -9. compute Jftxo) for i = 1.12 '12 i9.l~Spooledæi J dïÍ2. (c) '11= Iris setosa. As discussed earlier.94x2 + 7.57x2x4 .et.l(x . '12 = Iris versicolor '13 = Iris virginica The quadratic discriminant scores d~(x) given by (11-47) with Pi = P2 = P3 = l are: population ~(x) = _1 In ISil.00 .O) 1li .53 '13 -6..02x2 .16x2x4 . (d) The linear discriminant scores di(x) are: population I di(x) = ~SpooledX .2.75).309 Thus.g.47.. the null hypothesis Ho : J11 = J12 = J13 is æjected at the Q = 0. and classify Xo to the population for which ~(xo) is the la¡.73 '58.

1.~ 2.. :..: 0 for all i =l 2. construct dki(X) = dk(x) . ..di(æ) for all i "It.3: i 1 2 3 0 -30. . R2..5 0 0 000 00 0 0000000000 00 0 0 2. :.310 Since d¡(xo) is the largest for í = 2.. we allocate Xo to '12.. .0 :. :.80 0 -29.2.5 :. :.. . with the classification regions Ri.75) to'1i according to (11-52).i- U . :.94 -0. :. :.74 30. . :.5 eØ - + + + ã5 Co '" X 1..++++++ + + + + + + + + + + + + :: 0. . :. :. . .. .3. ..0 2. Then classify x to'1k if dki(X) . ..i ..2. :.0 :.0 0 3. and Rg delineated. :.: 0 for all i = 1.0 0 .5 1.94 0 1 J 2 i Since dki(XO) ... k = 1.. .i++++ . . :.5 X2 ~sepal width) 0 0 0 4. The results are the same for (c) and (d). :. using (11-52) Here is the scatterplot of the data in the (X2' X4) variable space.5 3. . Here is a table of dn(%o) for i. -..80 0. we classify the new observation x~ = i3. . ..74 29. 2. (e) To use rule (11-56). . :.

36.90 log 1' ..5 0 1... 10gY2) variable space: 0 2.5 -~ - Cl . these are the linear discriminant -scores dit x) for i = 1. .. + + t +.... However. labelled with a "*".8 1. from the observations from '12 and '13.0 0 0 0 0 0 00 0 0 0 o 0 0 0 * +++. 3.80 IQg Y2 ..ND CLASSIFIC.6 .. + .83 7r3 79.'l~~ ..97 log 1' ..04 11. +.')l 0.033. there also appears to be an outlier. (b).4. In '1i. Ê(AER) = itg = . 26. .SpooledZi '11 . ++:t++~ +.311 CHAPTER 11. 0.28 (a) This is the plot of the data in the (lOgYi.....TIOH 36 (f) The APER = ii~ = . 2.94 log Yi + 10. (c) Assuming equal covariance matri.10 log Yí + 13.30 ..t + + :P + J: +.81 7r2 75. For both variables log Yi...t..0 log(Y1 ) The points of all three groups appear to follow roughly an eliipse-like pattern.31...0 0 0 !b 1. V + + t ..~ . the orientation of the ellpse appears to be different for the observations from '11 (Iris setósa).37.SpooledX .. .4 0.ces and ivariate normal populations.Q o Setosa + Versiclor ~ Virginic 0 0 Oll 00 0 0 OCDai 0 0 0 0 00 0 0 o 0 2. DISCRIMINATION A. and log 1': population J df(X) = ä.82 log Yi + 28..lä.

93 '13 For variable 10gY2 only: population ¿¡(x) = ~SpooiedX .84 log Yi .7... Therefore.23 i50 . much.44 '13 16.312 For variable log Yi only: population ¿¡(x) = ~SpooledX .90 log Yi . (d) Given the bivarate normal-like scatter and the relatively large samples.(c) to differ.31.23 34 .30 85.læ~Spooleåæi '1i 40. .28.l~Spooledæi '11 30.93 log Y2 .17 27 . log Y2 + 8.33 150 .18 150 .12.73 '12 19. It is not as good at discriminating 7í2 from 1i3. 49 .33 iš -.87 Variables log Yl.. log Y2 log Yl log Y2.. i50 . 150 .52 log Y2 .54 APER E(AER) 26 . Using "shape" is effective in discriminating'1i (iris versicolor) from '12 and '13. shape is not an effective discriminator of all three species of iris. 49 . we do not expect the error rates in pars (b) and. The preceeding misclassification rates are not nearly as good as those in Ex:- ample 11.11.20 log Yi .33.. because of the overlap of '11 and '12 in both shape variables.82 '12 81. 34 .

009 ( 5.B- w-i _ 1518. A i a2 -0.3 Allocate x~ to '1k if EJ=i(âj(x . use (11-67) and compute EJ=i(âj(x .000193 0. Xi.11. X.000003 ( 0.646.11 (b) 1518. k L~_l(â'.(X .Xk)J2 1 2. and Spooled agree with the results for these quantities given in Example 11.99 3 2. classify Xo to '13 This result agrees with thedasifiation given in Example 11.74 J .009 J ).313 11.Xi))2 i = 1.191.74 258471.000193) _ ( 12.12 .i - 5.o.348899 0.63 2 16.Xk))2 ::E.=i (âj(æ .2. Any time there are three populations with only two discrim- .Xi))2 for alli i= Ie For :.21 497).'50 The eigenvalues and scaled eigenvectors of W-l Bare ). A' ai 0. X3.014 ( 0'2071 To classify x~ = (3.43 Thus.2 - 0.29 (a) The calculated values of Xl.

d3\~ where di(x) is given in the following table for i = 1.).39x4 .071 The holdout confusion matrix is: Predicted Membership '1i '12 '13 Total me~~~~hiP :: J ~ I ~ 1 :5 ( ~ E(AER)= 2+5~+3 = .11.93x4 + 1. and '13. 11.314 inants.70xi + 0.32x2 .14xs . '12.08xs . population di(x) = ~SpooledX .l6x3 + 5.35.2.0.12.30 (a) Assuming normality and equal covariance matrices for the three populations '1i.64xi + 0.78x3 + 8.20X2 . d2 \ æ ).78 '12 1.52x3 + 6.23.44.2.85xi + 0.61 (b) Confusion matrix is: Predicted Membership '1i '12 '13 Actual '11 membership '12 7 7í3 0 1 0 10 3 0 0 35 Total 7 11 38 And the APER O+5~+3 = .0.l3.lX~SpooledXi '11 0. the minimum TPM rule is given by: Allocate xto '1k if the linear discriminant score dk (x) = the largest of di (:.125 .58x2 .3. classification results using Fisher's Discriminants wil be identical to those using the sample distance method of Example 11.20 '13 2.33x4 .44xs .

ela- tions are smalL. The error rates (APER. appears to improve the normality of the data but the classification rule from these data has slightly higher error rates than the rule derived from the original data. y'. log X2. 00 500 0 0 :: cø 0 Q) 0 0 450 Q) i: "t 0 00 0 400 0 0 0 0 c6 0 0 0 0 0 00 õJ° 0 0 0000 000 °õJ + C\ + 0+ 0 + Gl X +0 0 + + + + 0+ + ~it + + a Alaskan + Canadian 60 80 t 120 + + + +++ + + + + 100 + ++ . Xl.14 are also slightly higher than those for the original data. the corr..equal -covariance matrioes does not seem unreasnable. .315 (c) One choice of transformations..31 (a) The data look fairly normaL. Thus the assumption of bivariate normal distributions with . 11.¡+ + + + 't + 1+ 350 300 + + + + + 140 160 180 X1 (Freshwater) Although the covariances have different signs for the two groups. log X4. Ê(AER)) for the linear discriminants in Example 11.

(b) The linear discriminant function is

â'x - m̂ = -0.13x1 + 0.052x2 - 5.54

Classify an observation x0 to π1 (Alaskan salmon) if â'x0 - m̂ ≥ 0 and classify x0 to π2 (Canadian salmon) otherwise.

[Dot diagrams of the discriminant scores for the Alaskan and Canadian groups]

It does appear that growth ring diameters separate the two groups reasonably well, as APER = .07 and Ê(AER) = .07.

(c) Here are the bivariate plots of the data for male and female salmon separately.

m= -0.31884 ( 181.+ + + + + + + + + 300 140 160 180 X1 (Freshwater) For the male salmon.~"E~" % 500 0 0 - 45 0 0 CI i: . + + + .97101 -ì97.8. Si -197.71015 1702.1667 ( 100.17210 141:643121 X2 The linear discriminant function for the male 'Salmon only Is â'x .64312 760.0 80 160 180 140 i mae ¡~:~:%~~.317 eo 100 12.12 Classify an observation Xo to 1ii (Alaskan salmon) if â'xo-m.71015 J ( ::::::: l' S2 141.erwIse.¡: 0 Cb 0 ct e o 0 0 0 0 o 0 400 C\ o 0 X 350 o il 0 0 o 0 o + o 000+ o + + + ++ +o + + ++++ + 0+ o 00 o + óJ 000 + o o ++ + ++ o+0 + + o + ++ + :.D56x2 .3333 i. xi 436. these are some summary statistics .12xi + 0.65036 ( 370..: 0 and clasify :c to '12 (Canadian -salmon) oth.

0 = .06 and E(AER)= 3.05X2 .2. . APER= 3i.w.21846 120.08 and E(AER)= 3:ä2 = . APER= 3tal = . these are some summary statistics Z¡ .318 Using this classification rule.66 Classify an observation xo to'1i (Alaskan salmon) if â'xo-m ~ 0 and classify xo to '12 (Canadian salmon) otherwise. It is unlikely that gender is a useful discriminatory varable. as splitting the data into female and male salmon did not improve the classification results greatly.06.13xi + O.72ûOO ( 289.rñ = -O. -210.91539 ( 336. For the female salmon.23231 1097. Using this classification rule.(:::::::: J' S2 120.0 = ..64000 1038.64000 J The linear discriminant function for the female salmon only is â' X .(4::::::: J' s.23231 i Z2 .33385 -210.

319 the data for the two groups: 11.2 + + 0 + + + 0 0 0 0 0 0 0 0 e + +0 + -0.0 and classify Xo to '12 (Obligatory carriers) otherwise.4 -0. (b) Assuming equal prior probabilties.0 + + C\ + X + + ++ + + + + ++ +0 0 +0 + ã' + %+ooo' + + ll 0 + ++õ + -0. the sample linear discriminant function is â'x .ri = i9.32 (a) Here is the bivarate plot of + + 0.: .l2x2 + 3.6 -0. airrier o -0.rñ . The holdout confusion matrix is . the bivariate normal assumption appears to be a reasonable one.2 + + + ++ ++ + + 0.0 X1 Because the points for both groups form fairly ellptical shapes. Normal -score plot-s fDr each group confirm this.32xi .4 o Noncrrer + Ob.l7.56 Classify an observation Xo to '1i (Noncarriers) if â'xo .2 0.

320 Predicted Membership '1i '12 Actual membership '11 j '12 26 4 8 37 Total t ~~ Ê(AER)= 4is8 = .17 '1i 3.112 2 3 -0.064 -0.rñ Classification 6.0'59 -0. noncarriers is i.rñ :.279 -0. The hold.019 4 5 6 7 8 9 10 0.123 -0.27 '11 3.63 lii 3.094 -0.090 -0.043 -0. .12x2 + 4.ut confusion matrix is .45 '11 (d) Assuming that the prior probabilty of obligatory carriers is ~ and that of .66 Classify an observation Xo to lii (Noncarriers) if â':.012 -0.o .050 -0.17.59 111 3.32xi .011 -0.126 â' x .62 lii 4.143 -0.052 -0.rñ = 19.04 7íi 1.16 (c) The classification results for the 10 new cases using the discriminant function in part (b): Case Xl X2 1 -0.068 0. the sample linear discriminant function is â':.68 '1i 3.210 -0.098 -0.98 '11 1. 0 and classify ::o to '12 (Obligatory carriers) otherwise.58 '1i 4.113 -0.037 -0.

27 4. here are Fisher's linear discriminants di d2 cÎi - -3737 + l26.0:03xg . (a) For '11 = Angus.0.0.08x3 .03xg -3881 + l28.88X3 .37 4.206.012 -0. X6 = Frame.08 2.052 -0.65x5 .08x5 .0. X7 = BkFat.090 -0.050 -0.68 5.ri -0.113 -0. X4 = FtFrBody.50X7 + 29.78 4. '12 = Hereford.019 7.068 0.48x4 + 19.011 -0.210 -0.69 4.143 -0.321 Predicted Membership '11 '12 Actual membership :: j ~~ I 2°7 Total t ~~ Ê(AER)= l~tO = 0.03xg -3686 + l27.037 -0.279 -0.55 Classification 7ri '1i '11 '11 7ri '1i '11 '1i '11 '11 11.84x7 + 28.098 -0.123 -0.205.0. Xa = SaleHt.80xa .24 The classification results for the 10 new cases using the discriminant function in part (b): Case 1 2 3 4 5 6 7 8 9 10 Xi X2 â'x .18x6 +265.112 -0.064 -0.206.48X4 + 19.72 5.70x3 .36x6 +245.126 -0.59xs .47X4 + l8.73 5.33x7 + 26.043 -0. and '13 = Simental.059 0.l5xa .33 Let X3 = YrHgt.22x6 +275.47xa .094 -0.0. and Xg = SaleWt.14 2.

2 :.20 .29 . 1525J we obtain di = 3596. XS. :.39 . + + % 0 eO + + ~ + 0 2 Angus Hereford Simental 4 y1-hat (b) Here is the APER and Ê(AER) for different subsets of the variable: Subset I APER Ê(AER) X3.17.13.25 . X7. ?:. X4.34 For . + -2 0 :. X7.thes . 00 0 + + 0 0+ .eaual costs. :. d2 = 3593.32. Hereford. so assign the new observation to '12. .22 .Xa X7. This is the plot of the discriminant scores in the two-dimensional discriminant space: 2 0 0 0 0 0 ~ 0 0 . and d3 = 3594. Xa X4.28 .X7 X4.XS X4.XS 11.54.36 .1000.32 '11 = General Mils.32 .322 When x~ = (50.XS XS. ~ ~ 0 :.7.24 . :.43 . and '13 = Quaker and assuming multivariate flmai data with a 'Cmmon covariance matdx. and equal pri.~ i.25 . Xs. "b + + ~ ~ 0 + + :.rct i C\ .73.22 .:.14 .13 .36 . :.46 . Xa XS.X7 Xs.21 .p + -1 0 + 0 + 00 :. X7. :. '12 = Kellogg. X6. Xai Xg X4. .: 0 8000 ~ :.31. :.

64x4 .0.: + lU i 0 .O.43x7 1.: -2 + + -3 -4 + .20X7 2.1. but also have low protein and carbohydrates. Mils + .02X69..69xs .02x65. and low fat.3.12xio . However.53x7 - 1.323 are Fisher's linear discriminant functions: di d2 d3 .32x3 + 4.14 .13xio The Kellogg cereals appear to have high protein.Olx65.36xg . fiber.: .07 . Here is a plot of the cereal data in two-dimension discriminant space: 2 0 o ar 0 1 .62xs .: .43.: -2 0 y1-hat 0 Gen..20xio .90XB + 1.: .l5x4 .23x3 + 3.t + + .79x4 . they also have high sugar.20xs .33.29x3 + 2.ü.50xg .22xB + . and carbohydrates.1.65xg ..c 0 C\ .: auar Kellog 2 . The Quaker cereals appear to have low sugar.07xB + 1.: 0 0 0 0 0 + 0 00+ + + + ~ + + o~++ -1 .

924 = .-. . ..429 0. . . .. _ _." ..924 .35 (a) Scatter plot of tail length and snout to vent length follows. .. . ~ ~ . . . ..d:-'.919 N = 66 N Correct = 61 E(AER) = 1 . ..--....931 Proportion 0.-.... ~ . Sêatterplotof SntoVnLength. .. . .-' _ _.6% Proportion Correct 0. il . .076 ~ 7......vs Ta..a ..310 TailLength Constant Male -41.."--" . ~ .039 SntoVnLength 0. ...d.501 0. ...324 11.163 -0.. . . ~ ~ ~ ~ ~ . .-.:':___::. 140160 180 liàjlLength OD) Linear Discriminant Function for Groups Female -36....----..... . . .046 sumary of Classification with Cross-validation Put into Group Female Male Total N N correct True Group Male Female 34 3 37 34 i 27 29 27 0. . .. . ..d. ..... It appears as if these variables wil effectively discriminate gender but wil be less successful in discriminating the age of the snakes. .-----.

7% (d) Linear Discriminant Function for Groups 2 3 4 -141. 94 SntoVnLength 0. a18 E(AER) = 1-. .33 SntoVnLength 0.808 0.14 0.60 0.824 0.2% Using only snout to vent length to discriminate the ages of the snakes is about as effective as using both tail length and snout to vent length.65 0.833= .76 0.808 19 23 19 0.167 ~ 16.41 Constant -79.53 Tai lLength Constant sumary of Classification with Cross-validation True Group Put into Group 2 3 4 Total N N correct proportion N = 66 2 3 13 4 2 4 0 2 21 0 3 17 13 26 21 21 23 21 0. there is a reasonably high proportion of misclassifications.913 Proportion Correct N Correct = 55 0.11 0.38 0.765 0.833 E(AER)= 1-.48 sumry of Classification with Cross-validation Put into Group 2 3 4 Total N N correct Proportion N = 66 True Group 2 3 4 14 1 3 21 0 4 0 4 17 14 26 21 0.818 = .182 ~ 18.45 0.826 N Correct = 54 Proportion Correct o.325 (e) Linear Discriminant Function for Groups 4 3 2 -112.36 -102.76 -193. Although in both cases.44 -145.

11.36 Logistic Regression Table

                                                        Odds     95% CI
Predictor      Coef         SE Coef        Z       P    Ratio   Lower   Upper
Constant       3.92484      6.31500      0.62   0.534
Freshwater     0.126051     0.0358536    3.52   0.000    1.13    1.06    1.22
Marine        -0.0485441    0.0145240   -3.34   0.001    0.95    0.93    0.98

Log-Likelihood = -19.394
Test that all slopes are zero: G = 99.841, DF = 2, P-Value = 0.000

The regression is significant (p-value = 0.000) and, retaining the constant term, the fitted function is

ln( p̂(z) / (1 - p̂(z)) ) = 3.925 + .126(freshwater growth) - .049(marine growth)

Consequently: Assign z to population 2 (Canadian) if ln( p̂(z) / (1 - p̂(z)) ) ≥ 0; otherwise assign z to population 1 (Alaskan).

The confusion matrix follows.

                  Predicted
                  1      2     Total
Actual     1     46      4      50
           2      3     47      50

APER = 7/100 = .07, or 7%.

This is the same APER produced by the linear classification function in Example 11.8.
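The classification rule and error rate in 11.36 can be checked with a few lines of Python. This is only a sketch: the function names and the two example fish are hypothetical, while the rounded coefficients and the confusion-matrix counts are the ones quoted in the solution.

```python
def logit(freshwater, marine):
    """Fitted log-odds of population 2 (Canadian), using the rounded coefficients."""
    return 3.925 + 0.126 * freshwater - 0.049 * marine


def classify(freshwater, marine):
    """Assign to population 2 (Canadian) when the fitted logit is >= 0, else 1 (Alaskan)."""
    return 2 if logit(freshwater, marine) >= 0 else 1


def aper(confusion):
    """Apparent error rate: off-diagonal count divided by the total count."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total


# Rows are actual populations 1 and 2, columns are predicted populations 1 and 2.
confusion = [[46, 4], [3, 47]]
print(aper(confusion))

# Two made-up fish (growth-ring measurements are hypothetical).
print(classify(100, 430), classify(140, 370))
```

The error rate depends only on the off-diagonal counts (4 + 3 out of 100), which is why a linear rule with the same misclassified counts gives the same APER.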

6 C-F o C-N .4 Y.2 .1 Democrat Y~s 1 -+ South a) Codes: Yes Repub 1 icanNo o -+ non-South No e.Cart~r: i 0 1 1 o o 2 2 t P.6 .327 Cha.4 .4 N-K .es No .8 . a+d = 3/5 = 60 Pair Coefficient (a+d)lp R-C .6 R-F R-N R-J R-K o .pter 12 12.6 F-N F-J F-K J-K .g.6 .4 .6 C-J C-K N-J . Reagan .4 .

6 R-J 0 0 0 .4 .571 .571 3 .6 .333 .429 .5 4:5 1 1 1 4.5 0 .2 9 9 9 14 14 12 6 12 6 0 0 0 .'5 4.5 5 6 7 5 6 .2 0 0 4.429 .667 .571 .5 14.4 .25 .571 .5 14.333 .571 .667 .333 .5 R-N .5 10 10 10 10 10 10 4.5 4.5 .75 .5 '1 .5 .571 .25 3 11 6 3 11 11 1i 6 3 3 3 6 '6 6 .25 .5 4.4 .4 .328 12.429 4.25 .5 4.889 .333 .571 0 .333 0 .5 13 10 4.4 .6 .75 .2 9 14 9 9 14 14 0 . .5 10 10 10 Rank Order Coeffi c i ent Pair 4.4 12.6 .429 Pair R-C R-F R-K C-F C-N C-J C-K F-N F-J F-K N-J H-K J-K .4 .25 .6 .4 .25 .143 .75 .2 .6 .429 .429 .75 .5 14.667 .5 14.5 .4 .5 14.8 .75 .25 .5 4.5 4.667 .5 4.1 b) RankOr¿~r Coeffi ci ent 1 2 3 1 2 3 .2 9 9 9 0 14 14 14 .667 7 3 3 1 1 .5 14.5 4.5 .8 .2 .5 0 .333 0 .5 10 10 .25 .5 .75 4.571 R-C R-F R-N R-J R-K C-F C-N C-J C-K F-N F-J F-K N-J N-K J-K 13 10 10 4.25 10 .5 4.333 6 3 .111 13 .333 .111 14 12 .

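With \bar{x} = (a+b)/p and \bar{y} = (a+c)/p,

\[ \sum (x_i - \bar{x})^2 = (a+b)\left(1 - \frac{a+b}{p}\right)^2 + (c+d)\left(0 - \frac{a+b}{p}\right)^2 = \frac{(c+d)(a+b)}{p} \]

\[ \sum (y_i - \bar{y})^2 = (a+c)\left(1 - \frac{a+c}{p}\right)^2 + (b+d)\left(0 - \frac{a+c}{p}\right)^2 = \frac{(a+c)(b+d)}{p} \]

\[ \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum (x_i y_i - x_i \bar{y} - \bar{x} y_i + \bar{x}\bar{y}) = a - \frac{(a+c)(a+b)}{p} - \frac{(a+b)(a+c)}{p} + p\,\frac{(a+b)(a+c)}{p^2} = \frac{a(a+b+c+d) - (a+c)(a+b)}{p} = \frac{ad - bc}{p} \]

Therefore

\[ r = \frac{(ad - bc)/p}{\left[(c+d)(a+b)(a+c)(b+d)\right]^{1/2}/p} = \frac{ad - bc}{\left[(a+b)(c+d)(a+c)(b+d)\right]^{1/2}} \]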
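A quick numerical check of this identity: the sketch below computes the ordinary product-moment correlation of two 0-1 vectors and compares it with (ad - bc) / [(a+b)(c+d)(a+c)(b+d)]^(1/2). The two vectors are made up for illustration and the helper names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Ordinary product-moment correlation of two equal-length sequences."""
    p = len(x)
    xbar, ybar = sum(x) / p, sum(y) / p
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def counts(x, y):
    """Cell counts of the 2x2 table: a = (1,1), b = (1,0), c = (0,1), d = (0,0)."""
    pairs = list(zip(x, y))
    return tuple(pairs.count(cell) for cell in [(1, 1), (1, 0), (0, 1), (0, 0)])


def r_from_counts(a, b, c, d):
    """Correlation written in terms of the 2x2 cell counts."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical binary variables measured on p = 8 items.
x = [1, 1, 1, 0, 0, 0, 1, 0]
y = [1, 0, 1, 0, 1, 0, 1, 0]

print(counts(x, y), pearson_r(x, y), r_from_counts(*counts(x, y)))
```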

es increases = c.4 Let c. -+ -+ 3 (D 0 Z (j 0 ~ ri. 2.330 12. i ncrea s. =-.5 a) Single linkage 2 1 3 4 (12) 3 1 0 (123) (12) 0 3 11 z o 4 3 4 5 4 3 o Dendogram ~ J :i 1. 2 cz so A 1 so t c2 increases as c. '3 'l (123) 4 4 4 o 4 (: oJ . 4 Finally. 1 + . a+d a+d c3 =(a+d)+2( b+c) c = 2 p 1 e3 so then c3 = 1 +2( c. 1_1) increases as c. Cz = c-1+3 so Cz increases as c3 increases 3 12.

All three methods produee the same hierarchical arrangements. S'(45)3 = . 12.l 12. 4 ...3 ~t j .5 b) c. Item 3 is somewhat different from the other items.g.2 1.18.r 5.-:3 .6 "3 4 Dendograms Complete Linkage 10 Average Li nkage .1 ~ . we have: i Single linkage '" A 3 s l.7 Treating correlations as similarity coefficients. 4 :z !. I i S45 =.(45)2 = . 4 :i 5'3. i-i -.-' ~ . S51) = .2 S '5.16 .Single Linkage e (¿ 4- . . 1.S1 Jrr- . i 1 I I i i .32.) Average Linkage Compl ete Li nkage Dendoaram Dendoaram 10 S e 4 ~ "3 2- Ll 1- :i :: :3 4 .68 S(45)1 = max (S4!. and so forth.w~ I'.331 12. 1.

c.g_. S5i) = .Sl--S: S(45)1 = min (S41.5 a 4 6 CD 8.5 a 1.es r~sults si~.l¡r to single linkage.1 Both methods arrive at nearly the same clust.eri ng.'3 .:: tt Average linkage pr~uc.68 ..15.. e3 . S(45)3 = .12 S(45)2 = . 0 . 12.1 ..332 i2~4S Complete linkage S45 = . ~-.21. 4 . :3 s.8 1 2 3 4 1 0 2 9 0 3 3 7 0 4 6 5 9 a 5 11 10 CD 8 5 1 -+ 2 (35) 1 0 2 9 0 (35 ) 7 8. and so forth..

) :: 1 1.0 t ..S I .: .3 :L i i. nkage a A1 though the vertical s~a les are differ~ntt all three linkage methods produce the same groupings.::3 4 S' '1 .2 '4 S" .i :3 cf s- Average L. · :i _i- 5' LL .9 Dendograms Singl e Linkage COl'pl ete Linkage 1. · 4. .333 12." ~ n . (Note different vertical 2 scales.

12.10 (a) ESS1 = (2 - 2)^2 = 0, ESS2 = (1 - 1)^2 = 0, ESS3 = (5 - 5)^2 = 0, and ESS4 = (8 - 8)^2 = 0.

(b) At step 2

Clusters                  Increase in ESS
{12}  {3}  {4}                  .5
{13}  {2}  {4}                 4.5
{14}  {2}  {3}                18.0
{1}  {23}  {4}                 8.0
{1}  {24}  {3}                24.5
{1}  {2}  {34}                 4.5

(c) At step 3

Clusters                  Increase in ESS
{12}  {34}                     5.0
{123}  {4}                     8.7

Finally all four together have ESS = (2 - 4)^2 + (1 - 4)^2 + (5 - 4)^2 + (8 - 4)^2 = 30.

12.11 K = 2 initial clusters (AB) and (CD)

          x̄1     x̄2
(AB)       3      1
(CD)       1      1

Final clusters (AD) and (BC)

          x̄1     x̄2
(AD)       4     2.5
(BC)       0     -.5

Squared distance to group centroids

Cluster      A        B        C        D
(AD)       3.25    29.25    27.25     3.25
(BC)      45.25     3.25     3.25    11.25
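The error sums of squares in 12.10 are easy to recompute. The sketch below assumes the four items have the measurements 2, 1, 5 and 8, as read off the ESS expressions in the solution; the helper names are hypothetical. It evaluates the ESS of every candidate pair at step 2 and the total ESS of the two candidate partitions at step 3.

```python
from itertools import combinations

# Item label -> measurement (assumed from the ESS expressions in the solution).
items = {1: 2, 2: 1, 3: 5, 4: 8}


def ess(values):
    """Error sum of squares: squared deviations from the cluster mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def partition_ess(groups):
    """Total ESS of a partition given as tuples of item labels."""
    return sum(ess([items[i] for i in g]) for g in groups)


# Step 2: every singleton has ESS 0, so merging a pair adds that pair's ESS.
step2 = {pair: ess([items[i] for i in pair]) for pair in combinations(items, 2)}
print(step2)

# Step 3: the two candidate partitions listed in part (c).
print(partition_ess([(1, 2), (3, 4)]), round(partition_ess([(1, 2, 3), (4,)]), 1))

# All four items in one cluster.
print(ess(list(items.values())))
```

Ward's method merges the pair with the smallest value at each step, which is why items 1 and 2 are joined first.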

12.12 K = 2 initial clusters (AC) and (BD)

          x̄1     x̄2
(AC)       3      .5
(BD)      -2     -.5

Final clusters (A) and (BCD)

          x̄1     x̄2
(A)        5      3
(BCD)     -1     -1

Squared distance to group centroids

Cluster      A      B      C      D
(A)          0     40     41     89
(BCD)       52      4      5      5

As expected, this result is the same as the result in Example 12.11. A graph of the items supports the (A) and (BCD) groupings.

12.13 K = 2 initial clusters (AB) and (CD)

          x̄1     x̄2
(AB)       2      2
(CD)      -1     -2

Final clusters (A) and (BCD)

          x̄1     x̄2
(A)        5      3
(BCD)     -1     -1

Squared distance to group centroids

Cluster      A      B      C      D
(A)          0     40     41     89
(BCD)       52      4      5      5

The final clusters (A) and (BCD) are the same as they are in Example 12.11. In this case we start with the same initial groups and the first, and only, reassignment is the same. It makes no difference if you start at the top or bottom of the list of items.
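The reassignment in 12.13 can be traced with a small sequential K-means sketch. The item coordinates below are an assumption (A = (5, 3), B = (-1, 1), C = (1, -2), D = (-3, -2), taken to be the data of Example 12.11); they are not printed in this solution, but they reproduce the centroids and squared distances in the tables above. The function names are hypothetical, and the procedure moves one item at a time to its nearest centroid, recomputing centroids after each move.

```python
# Assumed item coordinates (see the note above).
items = {"A": (5, 3), "B": (-1, 1), "C": (1, -2), "D": (-3, -2)}


def centroid(names):
    """Coordinate-wise mean of the named items."""
    pts = [items[n] for n in names]
    return tuple(sum(p[k] for p in pts) / len(pts) for k in range(2))


def sqdist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def sequential_kmeans(initial, order):
    """Visit items in `order`, moving each to the cluster with the nearest centroid."""
    clusters = [list(c) for c in initial]
    changed = True
    while changed:
        changed = False
        for name in order:
            cents = [centroid(c) for c in clusters]  # recomputed after every move
            home = next(k for k, c in enumerate(clusters) if name in c)
            nearest = min(range(len(clusters)), key=lambda k: sqdist(items[name], cents[k]))
            if nearest != home and len(clusters[home]) > 1:
                clusters[home].remove(name)
                clusters[nearest].append(name)
                changed = True
    return [sorted(c) for c in clusters]


final = sequential_kmeans([["A", "B"], ["C", "D"]], "ABCD")
table = {n: [sqdist(items[n], centroid(c)) for c in final] for n in items}
print(final, table)
```

Running the same call with the order "DCBA" gives the same final clusters, in line with the remark that the starting end of the list does not matter here.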

8 187.5134.6 227.9 283.0 C7 86.4 0.2 148.1 78.9 51.6206.0 0.6 160.9 C42 191.6 76.6 96.8 56.4 C29 107.5 45.1 68.6 106.4 10.2 31.1 280.178.2 100.1 208.9 190.1 75.6148.6 100.2115. 0 C8 15.2 155.2 14.5 C25 49.0 84.7 221.9 27.0 ~5.1 181.0 0.0 103.0 235.1 43.7 136.2111.0 99.8 177.1113.1 268.7 125.7 136.3 39.5 43.4 110.2 155.8 290.1 100.2 322.0 0.2 54 .5 45.9 148.9 80.8 122.4 71 .2 112.9 61.5 113.9 112.4205.5 90.0 C18 265.9 121.7130.0 183.4 51.1 38.6 21.2 105.3 93.1 46.1 177.5 139.6191.8 205.2 186.6 44.6 94.5 C19 68.3 0.1 251.5 82.1150.7 111.1 100.0 C5 103.9 C28 16.2 110.2 14.4 0 .7 160.9 C15 60.1 21.3 0.9.9 207.5 155.3 237.3 190.1128.0 141.8 122.1 194.3 28.8 285.9 C3S 116.8 151.8 111.4 209.8 100.8 215.3 34.7 99.5 59.2 C37 53.7123.2 63.2 C23 58.1 0.8 55.8169.7 218.8 C13 C14 C15 C16 C17 C18 C19 C20 C21 ~22 C23 C24 C13 0 .8 63.1 170.8 121.9 233.9 82.4 C41 197.6 41.4 58.5 120.1 62.2229.1 230.4 173.7 51.4 267.1 54.9 138.5 47.3 130.6 103.2 105.5 70.3 60.2 31.9 60.2 109.8 183.8 51.1 152.6140.4 51.7 78.8 129.8 65.8 49.3 82. 0 0 .0 36.4 136.2 51.5 121.8 83.5 127.1 0.0 C36 114. 2 72 .2 44. 8 32 .1 22.1 158.5 61.0 43.5 36. 5 0 .6 71.7 94.7 90.7 O.7 92.0 C6 72.0 268.7 130.3 190. 4 117.1 101.9 60.3 19.4 113.0 21.3 280.8 170.8 120.8 36.9 0.5 84.1 113.6 288.1 64.6 143.3 111.7 187.3 190. 9 10 .4 65.8 59.0 120.7 132.3 28.7 115.1 301.2 185.3139.6 90.8 60.6 180.0 C3 15.6 260.4220.5 111.2 90.1 C30 33.8 309.6 250.7 83.1 85.5 50.1 105.3 44.0 102.9 60.0 123. 
(a) The Euclidean distances between pairs of cereal brands CL C2 C3 C4 C5 C6 C7 C8 C9 CL0 ~11 C12 CL 0.9 123.0 111.6 0.3 114.1 112.3 50.2 60.7 76.2 21.2 C31 78.14.7 131.8 C34 173.~ CL1 81.2 51.6 113.8 127.0 C16127.6 43.9 81.1 51.9 54.4267.b 294.0 C15 213.4 270.0270.1 0.5 194.1 1'02.0 105.2 1-04.3 48.5 81.5 65.3 52.0 C19 212.7 114.3 51.8 90.7 C32 32.1 60.6100.9 148.0 68.2 189.6 90.9 1'01.8 71.~ 209.2 290.5 63.844.9 61.6 81.6 103.5 75.0 C9 46 .5 44.9 62.0 C2116.9 210.7 198.3 116.5 159.8 108.2 163.5 50.7 17.0 65.5 42.9 220.2 181.0 59.6 62.2 133.6 194.9 52.3 196.8 138.3 42.5 51.3 157.2 186.7 96.0 61.5 86.0 217. 20.7 C27134.6 C20116.8 16.4 112.6 102.6 21.2 102.1 91.2 79.9 69.'6 25.3 170.4 70.2 70.4 45.0 C13 163.0 C17 173.7 26.3 186.5 49.4 42.9 82.1 41.1 141.8 174. 1 65 .8 92.6 75.7 120.7 59. 7 48 .5 103.8 145.7 210.5 44.6 89.3 14.4 68.8 C17 23.5 90.3 142.4 121.6 203.2 29.4 112.7171.6 101.6 288.9 101.5 92.1 90.0 C14 46.7 87.3 150.7 0.7 321.5 183.4 37.0 155.4 177.2 220.0 C4 6 .7 214.7 189.9 C33 143.2 99.474.454.7 174.2 30.6278.8 51.1129.3 38.7 85.5 11'0.8 55.3 200.7 C26 182.5 1.1124.8 82.5 170.0 C18 134.9 90.0 56.5 68.4 1.0 94.7 69.6 141.4 52.4 181.2 68.3 121.4 0.9 22.3 67.8 60.7 C43 185.8 101.3 13.0 120.3 154.9 168.2 139.6 121.9 148.0 .336 12.3 69.0 157.2 103.1 122.7 111.8 60.8 C39 48.9 32.0 CL0 54.7 C24 68. 0 C14 127.3 C40 40.2 173.3 67.0 113.7 C22 98.8 270.9 16.0 54.1 112.6 78. 9 75 .8 286.9 60.4 92.2 221.0 C12 42.0 C16 46.9 59.2 61.1 89.6 C21103.6 144.0 61.5 36.6 C38 54. 6 54 .

7 231.2 136.8 ~4.9 75.3 190.6 101.9 209.4141.9 112.8 81.3144.0 56.1 132.5 163.2 91.3 80.0 152.1 96.4 C38 24.8 136.8 295.2 295.2 143.2 10.7 161.9 148.3 148.7 56.2 205.1 153.6 31.7 71.7 C39188.0 170.0 96.3 55.0 195.2 264.7 64.4 0.9207.7 93.8 150.1177.6 227.5 110.2 36.0 46.5 0.1 335.0 C43 237.2 0.2 94.7 C40146.1 193.1 162.3 31.6166.3 184.5 228.8 226.7 16.7 97.1 0.3 250.9 305.3122.2233.0 C37 43.9 130.5234.036.2 58.2.3204.6 140.4 95.2 117. 0 C38 27 .2 30.8 C31104.6 C25 C26 C27 C28 C29' C30 C31 C32 C33 C34C35 C36 C25 0 .2 70.2 151.7 49.8 50.7 250.3 148.8 0.5 71.4 156.2 0.0 28.0 136.8 0.6 117.7 58.3 157.9 117.1292.9 231.9 165.0 105.1227.1 108.0 121.3 C26 233.9 71.3 81.1278.7 78.7 0 .3 226.0 0 .8117.4 63.0141.8 107.7 C30 191.3327.0 C32 75.3 92.3 72.8 244.6 196 .1 131.4 135.9 0.9 C37 C38 C39 C40 C41 C42 C43 C37 0 .3 375.7 197.5141.4 186.5 54.8 294.0 C40 90.9 133.4 80.0 C31 111.0 113.6 41.0 93.9 214.2 149.6 71.3 121.7 149.1 257.0 123.4300.1 232.8 C37182.1 197.3 141.0 108.9 120.1 195.1 114.6 34.2 36.3 98.4 57.9214.6 62.6 30.8 154.6 203.8 24.5 175.7 173.6 27.2 116.1131.6 293.7 96.7 146.5 83.7 126.5 0.0 197.5 253.4 70.5 131.2 17.2 214.9 74.4 132.5 140.5 75.0 143.5 37.7 C43 229.5 283.0 110.0 C23 204.3 91.8 C27 67.7 81.0 170.6 27.1 102.2225.8 401.1 292.8 53.7 66~9 91.3220.1 C41 209.1 201.4 144.4 80.0 C33 122.1 78.1 213.4 C28174.3 94.3 158.0 C34 30.0 C35 91.2 75.8156.3 '204.0200.8 200.2 128.7231.1 200.9 C29 83.5 225.3 37.1 0.1 60.8 74.5 100.8 148.8 201.6 181.7 231.7 230.7 108 .4 248.2 140.9 153.2 C36 226.6214.6136.6 96.6 44.3 72.1 71.4166.663.6 71.2303.8 0.7 C32 150.8 140.7 0.4 91.9 17.1 93.0 296.1132.5 141.5 79.8 216.6 88.1 0.5 102.2 0.5 172.5 35.2 226.0 60.2 117.6 158.2 73.2 0.0 C39 20.7 107.4 108.9 188.7 124.1 0.7 74.2 286.5 102.9 297.8 52.278.7 152. 
0 C26 C27 C28 C29 213.4 36.7207.6 132.7289.4 225.8 10.6 96.6 95.8 167.8 180.6 21.8 0.0 210.9 132.0 O.3 204.7 131.5' 63.5 C40 86.9 88.3 46.7 59.9 152.3 200.2 186.6 70.1 C41 301.8 221.9 30.9 210.3142.8 49.1 89.6 62.5 47.8 139.4 282.1 97.1 248.1 83.6 33.3 159.0 98.4 178.7 C33 230.1101.5 141.5 324.1 88.6 150.0 C21 234.0 86.4 324.4 101.3297.2347.7 138.2 151.2 108.7 70.7 104.8 C42 210.5201.2 179.2 342.~ C34 215.337 C20 223.2 79.0 59.132.3 180.0150.1 233.5 196.1 322.1 147.2 79.6 142.3 78.8 148.1 270.1 227.8 204.0 236.4 93.0 C24 212.0 C30 20.1 135.2 227.5 140.7 178.7 72.5 130.3288. 3 C35 221.0 C25207.1 35.6 95.8 110.4 160.1 C42 277.4220.7 194.5 103.7254.5 172.4 24.5 121.1 177.1 227.8 C38198.6 165.3 151.7 140 .8 79.8 200.9230.2104.7 300.9 137.3 170.7 122.9 96.9 23.8 96.8 81.1252.4235.2 325.2 0.8 297.4158.5 86.2 81.0 C43218.4153.8 149.0 C36131.9 341.9 150.9 34.1 92.3 30.3 73.2 164.2 0 .0 C22 91.0 148.6 52.4 164.1 103.5 C39 30.1112.6 1.8 66.0 152.8 107.8 102.0' C42 237.1 214~1 131.5 81.2 66.4 315.7 87.0 251.6 195.0 C41 241.5 53.4 51.8 151.9 83.3 165.3 66.0 221.1 229.

00 o "'.. Single linkage ~ "" ì3 g g ~ o N CO.338 (b) Complete linkage produces results similar to single linkage......"" õú ~" ::~ 0l 00 . 8 '" "" ì3 ~ o . 8. Oõ MI ot.. N. _N 00 . Complete linkage 8.

6 123.9 1.0 5.1 3 86 .2 0.0 2.7 15.2 0.0 4 195.3 7. 5 26. Final cluster centers 1 2 3 4 1 110.8 15.8 55.1 1. 7 2.7 4 112.1 0.9 50.0 1 2 86.0 .4 132.0 162.0 for K = 4 5 6 7 8 0.34 4 Single C1 C2 C3 1 1 1 0.8 12.15.0 3 190.8 245.0 6.1 0. we use the means of the clusters identified by average linkae as the initial cluster centers. C35 C36 C37 1 1 1 1 C3B 1 C39 C40 C41 C42 C43 C11 C13 C18 C22 C27 C29 C31 ~34 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 C10 C12 C14 C15 C16 C17 C19 .4 10.7 171.5 10.c19 C21 C24 C26 C36 C41 C42 .8 225.c41 41 C42 41 Complete C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C12 C14 C16 C17 C20 C23 C25 C28 C30 C31 C32 C33 C35 C37 C38 C39 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ~40 1 CL1 11 11 11 11 11 11 15 15 15 15 15 15 .c13 C22 C27 C29 C34 C15 .0 2 114. C20 C23 C24 C25 C28 C30 C31 C32 C33 C35 C37 C38 C39 C40 C21 C26 C36 C41 C42 C43 C11 C13 C18 C22 C27 C29 C34 K = 4 1 CL 1 1 C2 C3 C4 C5 C6 1 1 C4 C5 C6 C7 C8 C9 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 C7 C8 C9 C12 C15 C17 C19 C20 C23 C24 C25 C28 C30 C33 C35 C37 1 1 1 C4 C5 1 1 C6 C7 C8 C9 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 C3B 1 C39 CL0 C11 1 2 C14 C16 C22 C29 2 2 2 2 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 3 2 C31 C32 C40 C21 C2q C36 C41 C42 C43 C13 C18 C27 3 3 3 4 4 4 (.0 2.8 5.4 4 clusters K = 3 C1 C2 C3 Distances between centers 1 2 3 4 2 2 2 3 3 3 1 1 1 1 1 1 1 C10 1 C11 1 C12 1 C13 1 C14 1 C15 1 C16 1 C17 1 C19 1 C20 1 C21 1 C22 1 C23 1 C24 1 C25 1 C27 1 C28 1 C29 1 C30 1 C31 1 C32 1 C33 1 C34 1 C3S 1 C36 1 C37 1 C38 1 C39 1 C40 1 C18 18 C26 26 C43 26 .339 12.7 275.5 3.4 3.9 215.c43 Ci8 1S 15 15 18 0. 3 o. In K-means method.0 K-means K = 2 1 CL 1 2 C2 C3 C4 C5 C6 C7 C8 C9 1 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 1 1 1 1 1 1 1 C10 C12 C14 C15 C16 C17 C19 C20 C21 C23 C24 C25 C26 C28 C30 C32 C33.

12.16 (a), (b) Dendrograms for single linkage and complete linkage follow. The dendrograms are similar; as examples, in both procedures, countries 11, 40 and 46 form a group at a relatively high level of distance, and countries 4, 27, 37, 43, 25 and 44 form a group at a relatively small distance. The clusters are more apparent in the complete linkage dendrogram and, depending on the distance level, might have as few as 3 or 4 clusters or as many as 6 or 7 clusters.


(c) The results for K = 4 and K = 6 clusters are displayed below. The results seem
reasonable and are consistent with the results for the linkage procedures.

Depending on use, K = 4 may be an adequate number of clusters.

Data Display
Countr ClustMemK=6

ClustMemK=4

1
2

6

2

2

4

3

4

1
4

4

5

3

1

6
7

6
6

2
2
2

8

1

9

4
1

10
11
12
13
14
15
16
17
18
19
20

21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43

44
45
46
47
48
49
50
51
52
53

54

5
3
2
6

3
6
6
2

2

2
3

1
4
2
1
2
2

1

2
1

1

4
1

4
4
4
6
6
6
3
3
2
1

             Number of       Within cluster     Average distance    Maximum distance
             observations    sum of squares     from centroid       from centroid
Cluster1         11             298.660              4.494               9.049
Cluster2         20             318.294              3.613               6.800
Cluster3          3             490.251             11.895              16.915
Cluster4         20             182.870              2.681               7.024

Wi thin

Average

Maximum

sum of

from

from

490.251
128.783

11.895
2.669

5.521

cluster distance distance

observations squares centroid "Centroid
11
20
3

20

4

4

1

4

4

4
3
6

Numer of clusters:

2

2
1

Numer of clusters:

4
2
4
4

4
2
2
2
1
1

4
1

4

4

6
4
5
3

2
4

2
4
2
2
5
1

4

3

1
4

4
4
3
2

6
6

4

6
1

4

3
6
2

1

2
1
2

4

6

             Number of       Within cluster     Average distance    Maximum distance
             observations    sum of squares     from centroid       from centroid
Cluster1         10              90.154              2.884               4.008
Cluster2          8              22.813              1.ti3               2.428
Cluster3          8             116.518              3.346               6.651
Cluster4         10              78.508              2.513               5.977
Cluster5          3             490.251             11.895              16.915
Cluster6         15             128.783              2.669               5.521

12.17 (a), (b) Dendrograms for single linkage and complete linkage follow. The
dendrograms are similar; as examples, in both procedures, countries 11 and 46
form a group at a relatively high level of distance, and countries 2, 19, 35, 4, 48
and 27 form a group at a relatively small distance. The clusters are more apparent
in the complete linkage dendrogram and, depending on the distance level, might
have as few as 3 or 4 clusters or as many as 6 or 7 clusters.

[Figure: dendrograms for single linkage and complete linkage; horizontal axis: Countries]
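To see why single and complete linkage can agree on some groups yet merge at different distance levels, a minimal agglomerative sketch follows. The five-item distance matrix is illustrative only (it is not the country data, which is not reproduced here), and the function name is hypothetical. Single linkage uses the minimum inter-item distance between clusters and complete linkage uses the maximum.

```python
# Illustrative lower-triangular distances between five items (not the country data).
raw = {
    (1, 2): 9, (1, 3): 3, (1, 4): 6, (1, 5): 11,
    (2, 3): 7, (2, 4): 5, (2, 5): 10,
    (3, 4): 9, (3, 5): 2,
    (4, 5): 8,
}
dist = {frozenset(pair): d for pair, d in raw.items()}


def agglomerate(dist, linkage):
    """Merge the two closest clusters until one remains.

    `linkage` is `min` for single linkage or `max` for complete linkage.
    Returns a list of (merged cluster, merge distance) in merge order.
    """
    labels = sorted({i for pair in dist for i in pair})
    clusters = [frozenset([i]) for i in labels]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(dist[frozenset((i, j))] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] | clusters[b]
        merges.append((sorted(merged), d))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


single = agglomerate(dist, min)
complete = agglomerate(dist, max)
print(single)
print(complete)
```

Both methods join items 3 and 5 first, but complete linkage postpones the later merges to larger distances, which tends to make the clusters in its dendrogram look more clearly separated.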

(c) The results for K = 4 and K = 6 clusters are displayed below. The results seem

reasonable and are consistent with the results for the linkage procedures.
Depending on use, K = 4 may be an adequate number of clusters. The results
for the men are similar to the results for the women.
Data Display

Country Cl us tMern=4
1
2
3

2

4
5
6

7

2

ClustMem=6

2

2

4
4

4
1
4

1

3

4

6
2
1

8

2

9

4

2

10
11

2
3

2
5

12
13

2

1

2

14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29

2

1
2
3

1
2

2

4
4
4

6
4

1

2

1
1

2
1

3

2
1
4

2

4
4

6

4
4

6
4
2

30
31
32

2
2

1

33

1

3

34
35
36
37
38
39
40
41
42
43
44
45
46
47
48

1
4

4

1

3

4

4

4

2
2
3

49
50
51
52
53
54

2

4
3

1

4

             Number of       Within cluster     Average distance    Maximum distance
             observations    sum of squares     from centroid       from centroid
Cluster1         10             169.042              3.910               5.950
Cluster2         21              73.281              1.684               3.041
Cluster3          2              49.174              4.959               4.959
Cluster4         21              56.295              1.481               3.249

Numer of clusters:

4
1
2
5
3

2

4
2

4

2

2
1
2

1
3

2

4

6

             Number of       Within cluster     Average distance    Maximum distance
             observations    sum of squares     from centroid       from centroid
Cluster1         12              26.806              1.418               2.413
Cluster2         15              18.764              1.048               1.844
Cluster3         10             169.042              3.910               5.950
Cluster4         10              10.137              0.935               1.559
Cluster5          2              49.174              4.959               4.959
Cluster6          5               6.451              1.092               1.£06

3

3
2

6

wi thin

1

4

"4
4

4

4

2
2

2
1
1

Numer of clusters:

/.U-t~\'CA

12.18.

The multidimensional scaling configuration is consistent with the locations of these cities on a map.

[Figure: two-dimensional multidimensional scaling configuration of the cities, with a north arrow. Labelled points: Superior; St. Paul, MN; Marshfield; Wausau; Appleton; Dubuque, IA; Madison; Monroe; Ft. Atkinson; Beloit; Milwaukee; Chicago, IL.]


12.19.

The stress of final configuration for q=5 is 0.000. The sites in 5 dimensions and the
plot of the sites in two dimensions are
COORDINATES IN 5 DIMENSIONS

v~ABLE

--------

PLO

DIMENSION

--------1

P1980918

A

.51

B
C
D
E

-1. 32

P1SS0960
P1S30987
P1361024

Pl3S100S
P1340945

G

P193ll31

Pl3ll137

P1301062
DIMENSION

F
H

I

.47
.39
.23
.47
.58

-1.12
-.22

2

-.28

3

4

5

-.68
.12
-.05 -.02
.69
.30
.06
-.07
.34
.09
.10
.05
.30 -.32
.12
.14 -.22 -.14 -.28
-,35 .46 .18 -.10
-.31 .~S - .01
-1. 12
,61 -.70 -.06
.01
.24
.62
.19
.05

2I
+
+
I
I
1I
+
+
I
B
I
E
DF
I
oI
+
C
+
I
AG
I
I
I
-1
+
H
+
I
I
I
I
-2
+
+
-2 -1 0 1 2
2

-+------------ --+--------------+--------------+--------------+-

-+--------------+ --------------+ --------------+--------------+-

DIMENSION 1

The results show a definite time pattern (where time of site is frequently determined
by C-14 and tree ring (lumber in great houses) dating).

12.20. A correspondence analysis of the mental health-socioeconomic data

A correspondence analysis plot of the mental health-socioeconomic data

[Figure: correspondence analysis plot. Mental health points: Well, Mild, Moderate, Impaired; socioeconomic status points: A, B, C, D, E; annotations λ1² = 0.026 and λ2² = 0.0014.]

u

v

-0.6922 0.1539 0.5588 0.4300
-0.1100 0.3665 -0.7007 0.6022

-0.6266 -0.2313 0.0843 -0.3341
-0.1521 -0.2516 -0.5109 -0.6407

0.0411 -0.8809 -0.0659 0.4670

o . 0265 0 . 5490 0 . 5869 -0. ~756

0.7121 0.2570 0.4388 0.4841

0.4097 0.4668 -0.5519 -0.2297
o . 6448 -0. ~032 0 . 2879 -~. 3062

lambda
0.1613 0.0371 0.0082 0.0000

Cumulative inertia
0.0260 0.0274 0.0275

Cumulative proportion
0.9475 0.9976 1.0000

The lowest economic class is located between moderate and impaired. The next lowest
class is closest to impaired.
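The cumulative inertia and cumulative proportion rows follow directly from the singular values listed under "lambda": each inertia is a squared singular value. A short sketch, using the rounded values printed above (so the last digit of a proportion can differ slightly from the printed output):

```python
from itertools import accumulate

# Singular values printed under "lambda" in the output above.
singular_values = [0.1613, 0.0371, 0.0082, 0.0000]

# Each principal inertia is a squared singular value.
inertias = [s ** 2 for s in singular_values]
total = sum(inertias)

cumulative_inertia = list(accumulate(inertias))
cumulative_proportion = [c / total for c in cumulative_inertia]

print([round(c, 4) for c in cumulative_inertia[:3]])
print([round(p, 4) for p in cumulative_proportion[:3]])
```

The first dimension alone accounts for roughly 95% of the total inertia, which is why a one- or two-dimensional plot is an adequate summary of this table.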


12.21. A correspondence analysis of the income and job satisfaction data

A correspondence analysis plot of the income and job satisfaction data

[Figure: correspondence analysis plot. Income points: < $25,000; $25,000 - $50,000; > $50,000. Job satisfaction points: VD, SD, MS, VS.]

u

V

-0.6272 -0.2392 0.7412

0.2956 0.8073 0.5107

-0.6503 -0.6661 -0.3561
-0.1944 0.5933 -0.7758

0.7206 -0.5394 0.4356

-0.3400 0.3159 0.2253
0.6510 -0.3233 -0.4696

lambda
0.1069 0.0106 0.0000

Cumulative inertia
0.0114 0.0116

Cumulative proportion
0.9902 1.0000

Very satisfied is closest to the highest income group, and very dissatisfied is below the
lowest income group. Satisfaction appears to increase with income.

12.22. A correspondence analysis of the Wisconsin forest data

[Figure: A correspondence analysis plot of the Wisconsin forest data. Site points S1 through S10; species points BurOak, BlackOak, WhiteOak, RedOak, AmericanElm, Basswood, Ironwood, SugarMaple; annotations λ1² = 0.537 and λ2² = 0.096.]

2464 -0.5154 0.3258 0.4029 -0.3393 -0. 20020 .7326 0.3232 -0.0644 0.3369 0.4260 0.0674 0.4200 -0.3622 0.2428 -0.2267 0.9155 0.4071 -0.3181 -0.1821 0.9950 1.0738 -0.0106 0.6329 0.3192 -0.2134 0.5817 -0.0000 .1607 0.0042 -0.7700 Cumulati ve proportion 0.0623 0.2172 -0.0763 -0.0831 -0.3140 0.3943 0 .1165 -0. 4626 0 .3366 0.4089 -0.6026 -0.0625 -0.0726 -0.3856 -0.4781 0.1950 -0. 1262 -0.1598 -0.3889 0.0925 -0.3511 -0.4005 0.3495 -0.3101 0.0518 o .6644 -0.2646 0 .6573 -0.0831 0. 2687 -0.9747 0.2685 0.1478 0.3484 -0.4271 -0.0937 0.2108 -0.0698 0.4562 -0.1520 -0.6122 -0.0307 -0.6298 0.1381 o .0616 0.0151 -0.3835 -0.1852 -0.4771 0.5587 -0.4689 -0.4391 0.1999 0.0772 -0.0544 -0.5718 0.4160 -0.3217 0.4850 -0.1590 -0. 3549 -0.7662 0.3412 0.0826 -0.3394 0.5327 -0.0582 0.4856 -0.4345 -0.1456 0.5090 -0.4907 V -0.1866 -0. 2022 0 .1355 0.3420 -0.0006 -0.7050 0.5367 0.5142 -0.2897 -0.3294 -0.6970 0.4080 0.1167 0.3877 -0.2476 0.1272 -0.3606 0.0291 0.5895 0.3006 0.0345 -0.1052 0.349 U -0.5994 0 .1596 -0.7506 0.1808 -0.0756 -0.1108 0.4247 -0.0164 -0.1685 0. 5400 0 . 2668 -0.3904 -0.3310 -0.7616 0.1567 0.0377 0.0978 -0.4985 0 .5382 -0.2343 -0.3150 0.1955 0.326S 0.0394 0.3412 lambda 0.2635 -0.0540 -0.8219 0.0820 -0.1968 -0.3772 -0.7086 -0.3634 -0.2507 -0.4079 -0.2745 0.2333 0.1726 0.9891 0.4138 -0.0000 Cumulati ve inertia 0.

12.23 We construct a biplot of the pottery type-site data, with row proportions as variables.

[Output: the sample covariance matrix S, the eigenvalues of S, the eigenvectors of S, and the St. Dev., Prop. of Var., and Cumulative Prop. for the components pc1 through pc4.]

As in the correspondence analysis, ...
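
The preprocessing behind a biplot of this kind can be sketched as follows: convert each row of counts to row proportions, then form the sample covariance matrix S of those proportions. The counts and function names below are hypothetical and are not the pottery type-site data. Because every row of proportions sums to one, each row of S sums to zero, so S always has one zero eigenvalue and the last principal component carries no variance.

```python
def row_proportions(table):
    """Divide each count by its row total."""
    return [[count / sum(row) for count in row] for row in table]


def sample_covariance(data):
    """Sample covariance matrix (divisor n - 1) of the columns of data."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [
        [
            sum((row[j] - means[j]) * (row[k] - means[k]) for row in data) / (n - 1)
            for k in range(p)
        ]
        for j in range(p)
    ]


# Hypothetical sites-by-types counts, for illustration only.
counts = [[12, 3, 5], [4, 9, 7], [6, 6, 8], [10, 2, 8]]

props = row_proportions(counts)
S = sample_covariance(props)
for row in S:
    print([round(value, 4) for value in row])
```

The eigenvalues and eigenvectors of S would then be obtained with a numerical library; that step is omitted here to keep the sketch dependency-free.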

12.24 We construct a biplot of the mental health-socioeconomic data, with column proportions as variables.

[Output: the sample covariance matrix S, the eigenvalues of S, the eigenvectors of S, and the St. Dev., Prop. of Var., and Cumulative Prop. for the components pc1 through pc4.]

[Figure: A biplot of the mental health-socioeconomic data. Horizontal axis: Comp. 1. Plotted labels include the mental health categories Well, Mild, Moderate, and Impaired and the socioeconomic categories A, C, D, and E.]

The biplot gives similar locations for health and socioeconomic status. A reflection about the 45 degree line would make them appear more alike.

12.25 A Procrustes analysis of archaeological data

[Figure: A two-dimensional representation of archaeological sites produced by metric multidimensional scaling. Horizontal axis: First Dimension. Points are labeled by site.]

[Figure: A two-dimensional representation of archaeological sites produced by nonmetric multidimensional scaling. Horizontal axis: First Dimension. Points are labeled by site.]

[Output: a table of the Metric MDS and Nonmetric MDS coordinates for the sites P1980918, P1931131, P1550960, P1530987, P1361024, P1351005, P1340945, P1311137, and P1301062, followed by the matrices U, Lambda, V, and Q. The matrix Q has diagonal entries 0.9969 and off-diagonal entries of magnitude 0.0784.]

To better align the metric and nonmetric solutions, we multiply the nonmetric scaling solution by the orthogonal matrix Q. This corresponds to clockwise rotation of the nonmetric solution by 4.5 degrees. After rotation, the sum of squared distances, 0.803, is reduced to the Procrustes measure of fit PR^2 = 0.756.
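
For two-dimensional configurations the best rotation has a closed form, which the following minimal sketch illustrates. It is restricted to pure rotations (no reflections) and assumes both configurations are already centered and on the same scale; the general orthogonal Q is obtained from a singular value decomposition, which is not done here. The point coordinates and function names are hypothetical; only the 4.5 degree angle echoes the solution above.

```python
import math


def rotate(points, theta):
    """Rotate 2-D points counterclockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def sum_sq_dist(a, b):
    """Sum of squared distances between corresponding points."""
    return sum((ax - bx) ** 2 + (ay - by) ** 2 for (ax, ay), (bx, by) in zip(a, b))


def best_rotation_angle(target, source):
    """Angle (radians) by which to rotate source to minimize sum_sq_dist to target."""
    num = sum(ty * sx - tx * sy for (tx, ty), (sx, sy) in zip(target, source))
    den = sum(tx * sx + ty * sy for (tx, ty), (sx, sy) in zip(target, source))
    return math.atan2(num, den)


# Hypothetical target configuration; the source is the same configuration
# turned 4.5 degrees counterclockwise, so the best fix is 4.5 degrees clockwise.
target = [(1.0, 0.5), (-0.8, 0.9), (0.3, -1.2), (-0.5, -0.2)]
source = rotate(target, math.radians(4.5))

theta = best_rotation_angle(target, source)
before = sum_sq_dist(target, source)
after = sum_sq_dist(target, rotate(source, theta))
print(math.degrees(theta), before, after)
```

A negative angle means a clockwise rotation. With real data the two configurations differ by more than a rotation, so the sum of squared distances after rotation stays above zero, as it does in the solution above.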

12.26 The dendrograms for clustering Mali Family Farms are given below for average linkage and Ward's method. The dendrograms are similar but a moderate number of distinct clusters is more apparent in the Ward's method dendrogram than the average linkage dendrogram. Both dendrograms suggest there may be as few as 4 clusters (indicated by the checkmarks in the figures) or perhaps as many as 7 or 8 clusters. Reading the "right" number of clusters from either dendrogram would depend on the use and require some subject matter knowledge.

[Figure: Mali Family Farms, Average Linkage, Euclidean Distance (dendrogram).]

[Figure: Mali Family Farms, Ward Linkage, Euclidean Distance (dendrogram).]

12.27 If average linkage and Ward's method clustering is used with the standardized Mali Family Farm observations, the results are somewhat different from those using the original observations and different from one another. The dendrograms follow. There could be as few as 4 clusters (indicated by the checkmarks in the figures) or there could be as many as 8 or 9 clusters or more. The distinct clusters are more clearly delineated in the Ward's method dendrogram and if we focus attention on the 4 marked clusters, we see the two procedures produce quite different results.

[Figure: Mali Family Farms (Standardized), Average Linkage, Euclidean Distance (dendrogram).]

[Figure: Mali Family Farms (Standardized), Ward Linkage, Euclidean Distance (dendrogram).]
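
One reason standardized and original observations can cluster differently is that Euclidean distance is dominated by whichever variable has the largest spread. The minimal sketch below uses four hypothetical two-variable observations (not the Mali Family Farm data) and hypothetical function names to show that an observation's nearest neighbor can change once each variable is divided by its sample standard deviation.

```python
import math


def standardize(data):
    """Center each column and divide by its sample standard deviation."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    sds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / (n - 1))
        for j in range(p)
    ]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in data]


def nearest_neighbor(data, i):
    """Index of the observation closest to observation i in Euclidean distance."""
    others = [k for k in range(len(data)) if k != i]
    return min(others, key=lambda k: math.dist(data[i], data[k]))


# Hypothetical observations: the second variable has a much larger spread.
farms = [[0.0, 0.0], [3.0, 10.0], [0.0, 20.0], [3.0, 30.0]]

raw_neighbor = nearest_neighbor(farms, 0)
std_neighbor = nearest_neighbor(standardize(farms), 0)
print(raw_neighbor, std_neighbor)
```

Any distance-based clustering method inherits this sensitivity, because the merges it makes depend on exactly these pairwise distances.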

12.28 The results for K = 5 and K = 6 clusters follow. The results seem reasonable and are similar to the results for Ward's method considered in Exercise 12.26. Note as the number of clusters increases from 5 to 6, cluster 1 in the K = 5 solution is partitioned into two clusters, 1 and 6, in the K = 6 solution; there is no change in the other clusters. Although not shown, K = 4 is a reasonable solution as well.

Number of clusters: 5

            Number of
            observations
Cluster1         6
Cluster2        11
Cluster3        35
Cluster4        12
Cluster5         8

Number of clusters: 6

            Number of
            observations
Cluster1         4
Cluster2        11
Cluster3        35
Cluster4        12
Cluster5         8
Cluster6         2

[Output: for each cluster, the within cluster sum of squares, the average distance from centroid, and the maximum distance from centroid; and a Data Display listing the cluster membership of each farm (ClustMem=5, ClustMem=6).]
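
The per-cluster quantities reported in output of this kind (number of observations, within cluster sum of squares, average and maximum distance from centroid) can be computed as in the following minimal K-means sketch. It is a plain Lloyd-style iteration on hypothetical points with hypothetical starting centroids, not the procedure or data used for the farms; it also assumes no cluster ends up empty when the summary is computed.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Alternate nearest-centroid assignment and centroid updates until stable."""
    for _ in range(max_iter):
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def cluster_summary(points, labels, centroids):
    """Size, within-cluster sum of squares, and average and maximum distance from centroid."""
    rows = []
    for k, center in enumerate(centroids):
        dists = [math.dist(p, center) for p, lab in zip(points, labels) if lab == k]
        rows.append({
            "size": len(dists),
            "within_ss": sum(d * d for d in dists),
            "avg_dist": sum(dists) / len(dists),
            "max_dist": max(dists),
        })
    return rows


# Hypothetical observations and starting centroids.
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 1.0), (9.0, 3.0), (9.0, 2.0)]
labels, centroids = kmeans(points, [(0.0, 0.0), (10.0, 0.0)])
summary = cluster_summary(points, labels, centroids)
print(labels)
print(summary)
```

Running the same routine for two values of K and comparing the membership lists is how one would see which clusters are unchanged and which are split.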

12.29 The results for K = 5 and K = 6 clusters follow. The results seem reasonable and are similar to the results for Ward's method considered in Exercise 12.27. Note as the number of clusters increases from 5 to 6, clusters 3 and 4 in the K = 5 solution lose 1 and 2 farms respectively to form cluster 6 in the K = 6 solution; there is no change in the other clusters. These results using standardized observations are somewhat different from the corresponding results using the original data. It makes a difference whether standardized or un-standardized observations are used.

Number of clusters: 5

            Number of
            observations
Cluster1         5
Cluster2         5
Cluster3        35
Cluster4        20
Cluster5         7

Number of clusters: 6

            Number of
            observations
Cluster1         5
Cluster2         5
Cluster3        34
Cluster4        18
Cluster5         7
Cluster6         3

[Output: for each cluster, the within cluster sum of squares, the average distance from centroid, and the maximum distance from centroid; and a Data Display listing the cluster membership of each farm (SdClusMem=5, SdClusMem=6).]

12.30 The cumulative lift (gains) chart is shown below. The y-axis shows the percentage of positive responses. This is the percentage of the total possible positive responses (20,000). The x-axis shows the percentage of customers contacted, which is a fraction of the 100,000 total customers. With no model, if we contact 10% of the customers we would expect 10%, or 2,000 = 0.1 x 20,000, of the positive responses. Our response model predicts 6,000 or 30% of the positive responses if we contact the top 10,000 customers. Consequently, the y-values at x = 10% shown in the chart are 10% for baseline (no model) and 30% for the gain (lift) provided by the model. Continuing this argument for other choices of x (% customers contacted) and cumulating the results produces the lift (gains) chart shown. We see, for example, if we contact the top 40% of the customers determined by the model, we expect to get 80% of the positive responses.

[Figure: Cumulative Gains Chart. Horizontal axis: % Customers Contacted (0 to 100). Vertical axis: % positive responses (0 to 100). Curves: Lift Curve and Baseline.]
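
The cumulating step described above can be sketched in a few lines. In the hypothetical decile counts below, only two facts come from the solution: the top 10% of customers yields 6,000 positive responses, and the top 40% yields 80% of the 20,000 total. The remaining counts and the function name are invented for illustration.

```python
def cumulative_gains(responses_per_group):
    """Cumulative percentage of all positive responses captured, group by group."""
    total = sum(responses_per_group)
    gains, running = [], 0
    for count in responses_per_group:
        running += count
        gains.append(100.0 * running / total)
    return gains


# Positive responses in each successive 10% of customers, ranked by the model.
# First entry (6000) and the four-decile total (16000) follow the text;
# the other entries are hypothetical and sum with them to 20000.
by_decile = [6000, 4000, 3500, 2500, 1500, 1000, 700, 400, 250, 150]

gains = cumulative_gains(by_decile)
baseline = [10.0 * (i + 1) for i in range(len(by_decile))]  # no model: x% contacted gives x%
lift_at_10 = gains[0] / baseline[0]

print(gains)
print(baseline)
print(lift_at_10)
```

Plotting gains and baseline against the percentage of customers contacted gives the two curves of the chart; the ratio of the two at any x is the lift at that depth.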

12.31 (a) The Mclust function, which selects the best overall model according to the BIC criterion, selects a mixture with four multivariate normal components. The four estimated centers are:

[Output: the estimated centers μ1, μ2, μ3, and μ4.]

and the estimated covariance matrices turn out to be restricted to be of the form η_k D where D is a diagonal matrix.

[Output: the estimated D = diag(...), the estimated scale factors η1, η2, η3, and η4, and the estimated proportions p1, p2, p3, and p4.]

This minimum BIC model has BIC of about -547. These four components are shown in the matrix scatter plot where the observations have been classified into one of the four populations. The matrix scatter plot of the true classification is given in the next figure. Comparing the matrix scatter plot of the four group classification with the matrix scatter plot of the true classification, we see how the oil samples from the Upper sandstone are essentially split into two groups. This is clear from comparing the two scatter plots for (x1, x2).

(b) The model chosen above has 4 multivariate normal components. We also repeat the analysis using the me function to select a mixture distribution with K = 3 components. We further restrict the covariance matrices to satisfy Σ_k = η_k D. The K = 3 groups selected by this function have estimated centers μ1, μ2, and μ3.

[Output: the estimated centers μ1, μ2, and μ3.]

[Figure 1: Classification into four groups using Mclust. Matrix scatter plot of the variables x1 through x5.]

[Figure 2: True classification into sandstone strata. Matrix scatter plot of the variables x1 through x5. Legend: Wilhelm, SubMuli, Upper.]

[Output for the K = 3 mixture: the estimated diagonal matrix D = diag(...), the estimated scale parameters η1, η2, and η3, and the estimated proportions p1, p2, and p3.]

The resulting BIC is about -534. If we use this method to classify the oil samples, the following samples are misclassified: 7 19 22 25 26 27 28 29 30 31 32 33 34 35 39 44 45 46 49 and the misclassification error rate is 33.93%.
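
The reported error rate can be checked from the list of misclassified samples. The total of 56 oil samples is an assumption here: it is not stated in this solution, but it is the sample size that reproduces the reported 33.93%.

```python
# Sample numbers listed as misclassified in the solution above.
misclassified = [7, 19, 22, 25, 26, 27, 28, 29, 30, 31,
                 32, 33, 34, 35, 39, 44, 45, 46, 49]

n_samples = 56  # assumed total number of oil samples (not stated in the solution)

error_rate = 100.0 * len(misclassified) / n_samples
print(len(misclassified), round(error_rate, 2))
```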
