"""" "'' ' Ill!
0 0 0 0
11111
Applied Multivariate Statistical Analysis

SIXTH EDITION

RICHARD A. JOHNSON
University of Wisconsin-Madison

DEAN W. WICHERN
Texas A&M University

Pearson Prentice Hall, Upper Saddle River, New Jersey 07458
To the memory of my mother and my father.
R. A. J.
To Dorothy, Michael, and Andrew.
D. W. W.
© 2007 Pearson Education, Inc.
Pearson Prentice Hall
Pearson Education, Inc.
Upper Saddle River, NJ 07458

All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher. Pearson Prentice Hall™ is a trademark of Pearson Education, Inc. Printed in the United States of America.

10 9 8 7 6 5 4 3 2 1

ISBN-13: 978-0-13-187715-3
ISBN-10: 0-13-187715-1
Contents
PREFACE

1 ASPECTS OF MULTIVARIATE ANALYSIS
    1.1 Introduction
    1.2 Applications of Multivariate Techniques
    1.3 The Organization of Data
        Arrays; Descriptive Statistics; Graphical Techniques
    1.4 Data Displays and Pictorial Representations
        Linking Multiple Two-Dimensional Scatter Plots; Graphs of Growth Curves; Stars; Chernoff Faces
    1.5 Distance
    1.6 Final Comments
    Exercises
    References

2 MATRIX ALGEBRA AND RANDOM VECTORS
    2.1 Introduction
    2.2 Some Basics of Matrix and Vector Algebra
        Vectors; Matrices
    2.3 Positive Definite Matrices
    2.4 A Square-Root Matrix
    2.5 Random Vectors and Matrices
    2.6 Mean Vectors and Covariance Matrices
        Partitioning the Covariance Matrix; The Mean Vector and Covariance Matrix for Linear Combinations of Random Variables; Partitioning the Sample Mean Vector and Covariance Matrix
    2.7 Matrix Inequalities and Maximization
    Supplement 2A: Vectors and Matrices: Basic Concepts
        Vectors; Matrices
    Exercises
    References

3 SAMPLE GEOMETRY AND RANDOM SAMPLING
    3.1 Introduction
    3.2 The Geometry of the Sample
    3.3 Random Samples and the Expected Values of the Sample Mean and Covariance Matrix
    3.4 Generalized Variance
        Situations in which the Generalized Sample Variance Is Zero; Generalized Variance Determined by |R| and Its Geometrical Interpretation; Another Generalization of Variance
    3.5 Sample Mean, Covariance, and Correlation as Matrix Operations
    3.6 Sample Values of Linear Combinations of Variables
    Exercises
    References

4 THE MULTIVARIATE NORMAL DISTRIBUTION
    4.1 Introduction
    4.2 The Multivariate Normal Density and Its Properties
        Additional Properties of the Multivariate Normal Distribution
    4.3 Sampling from a Multivariate Normal Distribution and Maximum Likelihood Estimation
        The Multivariate Normal Likelihood; Maximum Likelihood Estimation of μ and Σ; Sufficient Statistics
    4.4 The Sampling Distribution of X̄ and S
        Properties of the Wishart Distribution
    4.5 Large-Sample Behavior of X̄ and S
    4.6 Assessing the Assumption of Normality
        Evaluating the Normality of the Univariate Marginal Distributions; Evaluating Bivariate Normality
    4.7 Detecting Outliers and Cleaning Data
        Steps for Detecting Outliers
    4.8 Transformations to Near Normality
        Transforming Multivariate Observations
    Exercises
    References

5 INFERENCES ABOUT A MEAN VECTOR
    5.1 Introduction
    5.2 The Plausibility of μ0 as a Value for a Normal Population Mean
    5.3 Hotelling's T² and Likelihood Ratio Tests
        General Likelihood Ratio Method
    5.4 Confidence Regions and Simultaneous Comparisons of Component Means
        Simultaneous Confidence Statements; A Comparison of Simultaneous Confidence Intervals with One-at-a-Time Intervals; The Bonferroni Method of Multiple Comparisons
    5.5 Large Sample Inferences about a Population Mean Vector
    5.6 Multivariate Quality Control Charts
        Charts for Monitoring a Sample of Individual Multivariate Observations for Stability; Control Regions for Future Individual Observations; Control Ellipse for Future Observations; T² Chart for Future Observations; Control Charts Based on Subsample Means; Control Regions for Future Subsample Observations
    5.7 Inferences about Mean Vectors when Some Observations Are Missing
    5.8 Difficulties Due to Time Dependence in Multivariate Observations
    Supplement 5A: Simultaneous Confidence Intervals and Ellipses as Shadows of the p-Dimensional Ellipsoids
    Exercises
    References

6 COMPARISONS OF SEVERAL MULTIVARIATE MEANS
    6.1 Introduction
    6.2 Paired Comparisons and a Repeated Measures Design
        Paired Comparisons; A Repeated Measures Design for Comparing Treatments
    6.3 Comparing Mean Vectors from Two Populations
        Assumptions Concerning the Structure of the Data; Further Assumptions When n1 and n2 Are Small; Simultaneous Confidence Intervals; The Two-Sample Situation When Σ1 ≠ Σ2; An Approximation to the Distribution of T² for Normal Populations When Sample Sizes Are Not Large
    6.4 Comparing Several Multivariate Population Means (One-Way MANOVA)
        Assumptions about the Structure of the Data for One-Way MANOVA; A Summary of Univariate ANOVA; Multivariate Analysis of Variance (MANOVA)
    6.5 Simultaneous Confidence Intervals for Treatment Effects
    6.6 Testing for Equality of Covariance Matrices
    6.7 Two-Way Multivariate Analysis of Variance
        Univariate Two-Way Fixed-Effects Model with Interaction; Multivariate Two-Way Fixed-Effects Model with Interaction
    6.8 Profile Analysis
    6.9 Repeated Measures Designs and Growth Curves
    6.10 Perspectives and a Strategy for Analyzing Multivariate Models
    Exercises
    References

7 MULTIVARIATE LINEAR REGRESSION MODELS
    7.1 Introduction
    7.2 The Classical Linear Regression Model
    7.3 Least Squares Estimation
        Sum-of-Squares Decomposition; Geometry of Least Squares; Sampling Properties of Classical Least Squares Estimators
    7.4 Inferences About the Regression Model
        Inferences Concerning the Regression Parameters; Likelihood Ratio Tests for the Regression Parameters
    7.5 Inferences from the Estimated Regression Function
        Estimating the Regression Function at z0; Forecasting a New Observation at z0
    7.6 Model Checking and Other Aspects of Regression
        Does the Model Fit?; Leverage and Influence; Additional Problems in Linear Regression
    7.7 Multivariate Multiple Regression
        Likelihood Ratio Tests for Regression Parameters; Other Multivariate Test Statistics; Predictions from Multivariate Multiple Regressions
    7.8 The Concept of Linear Regression
        Prediction of Several Variables; Partial Correlation Coefficient
    7.9 Comparing the Two Formulations of the Regression Model
        Mean Corrected Form of the Regression Model; Relating the Formulations
    7.10 Multiple Regression Models with Time Dependent Errors
    Supplement 7A: The Distribution of the Likelihood Ratio for the Multivariate Multiple Regression Model
    Exercises
    References

8 PRINCIPAL COMPONENTS
    8.1 Introduction
    8.2 Population Principal Components
        Principal Components Obtained from Standardized Variables; Principal Components for Covariance Matrices with Special Structures
    8.3 Summarizing Sample Variation by Principal Components
        The Number of Principal Components; Interpretation of the Sample Principal Components; Standardizing the Sample Principal Components
    8.4 Graphing the Principal Components
    8.5 Large Sample Inferences
        Large Sample Properties of λ̂j and êj; Testing for the Equal Correlation Structure
    8.6 Monitoring Quality with Principal Components
        Checking a Given Set of Measurements for Stability; Controlling Future Values
    Supplement 8A: The Geometry of the Sample Principal Component Approximation
        The p-Dimensional Geometrical Interpretation; The n-Dimensional Geometrical Interpretation
    Exercises
    References

9 FACTOR ANALYSIS AND INFERENCE FOR STRUCTURED COVARIANCE MATRICES
    9.1 Introduction
    9.2 The Orthogonal Factor Model
    9.3 Methods of Estimation
        The Principal Component (and Principal Factor) Method; A Modified Approach: The Principal Factor Solution; The Maximum Likelihood Method; A Large Sample Test for the Number of Common Factors
    9.4 Factor Rotation
        Oblique Rotations
    9.5 Factor Scores
        The Weighted Least Squares Method; The Regression Method
    9.6 Perspectives and a Strategy for Factor Analysis
    Supplement 9A: Some Computational Details for Maximum Likelihood Estimation
        Recommended Computational Scheme; Maximum Likelihood Estimators of ρ = LzLz' + Ψz
    Exercises
    References

10 CANONICAL CORRELATION ANALYSIS
    10.1 Introduction
    10.2 Canonical Variates and Canonical Correlations
    10.3 Interpreting the Population Canonical Variables
        Identifying the Canonical Variables; Canonical Correlations as Generalizations of Other Correlation Coefficients; The First r Canonical Variables as a Summary of Variability; A Geometrical Interpretation of the Population Canonical Correlation Analysis
    10.4 The Sample Canonical Variates and Sample Canonical Correlations
    10.5 Additional Sample Descriptive Measures
        Matrices of Errors of Approximations; Proportions of Explained Sample Variance
    10.6 Large Sample Inferences
    Exercises
    References

11 DISCRIMINATION AND CLASSIFICATION
    11.1 Introduction
    11.2 Separation and Classification for Two Populations
    11.3 Classification with Two Multivariate Normal Populations
        Classification of Normal Populations When Σ1 = Σ2 = Σ; Scaling; Fisher's Approach to Classification with Two Populations; Is Classification a Good Idea?; Classification of Normal Populations When Σ1 ≠ Σ2
    11.4 Evaluating Classification Functions
    11.5 Classification with Several Populations
        The Minimum Expected Cost of Misclassification Method; Classification with Normal Populations
    11.6 Fisher's Method for Discriminating among Several Populations
        Using Fisher's Discriminants to Classify Objects
    11.7 Logistic Regression and Classification
        Introduction; The Logit Model; Logistic Regression Analysis; Classification; Logistic Regression with Binomial Responses
    11.8 Final Comments
        Including Qualitative Variables; Classification Trees; Neural Networks; Selection of Variables; Testing for Group Differences; Graphics; Practical Considerations Regarding Multivariate Normality
    Exercises
    References

12 CLUSTERING, DISTANCE METHODS, AND ORDINATION
    12.1 Introduction
    12.2 Similarity Measures
        Distances and Similarity Coefficients for Pairs of Items; Similarities and Association Measures for Pairs of Variables; Concluding Comments on Similarity
    12.3 Hierarchical Clustering Methods
        Single Linkage; Complete Linkage; Average Linkage; Ward's Hierarchical Clustering Method; Final Comments: Hierarchical Procedures
    12.4 Nonhierarchical Clustering Methods
        K-means Method; Final Comments: Nonhierarchical Procedures
    12.5 Clustering Based on Statistical Models
    12.6 Multidimensional Scaling
        The Basic Algorithm
    12.7 Correspondence Analysis
        Algebraic Development of Correspondence Analysis; Inertia; Interpretation in Two Dimensions; Final Comments
    12.8 Biplots for Viewing Sampling Units and Variables
        Constructing Biplots
    12.9 Procrustes Analysis: A Method for Comparing Configurations
        Constructing the Procrustes Measure of Agreement
    Supplement 12A: Data Mining
        Introduction; The Data Mining Process; Model Assessment
    Exercises
    References

APPENDIX
DATA INDEX
SUBJECT INDEX
Preface

INTENDED AUDIENCE

This book originally grew out of our lecture notes for an "Applied Multivariate Analysis" course offered jointly by the Statistics Department and the School of Business at the University of Wisconsin-Madison. Applied Multivariate Statistical Analysis, Sixth Edition, is concerned with statistical methods for describing and analyzing multivariate data. Data analysis, while interesting with one variable, becomes truly fascinating and challenging when several variables are involved. Researchers in the biological, physical, and social sciences frequently collect measurements on several variables. Modern computer packages readily provide the numerical results to rather complex statistical analyses. We have tried to provide readers with the supporting knowledge necessary for making proper interpretations, selecting appropriate techniques, and understanding their strengths and weaknesses. We hope our discussions will meet the needs of experimental scientists, in a wide variety of subject matter areas, as a readable introduction to the statistical analysis of multivariate observations.

LEVEL

Our aim is to present the concepts and methods of multivariate analysis at a level that is readily understandable by readers who have taken two or more statistics courses. We emphasize the applications of multivariate methods and, consequently, have attempted to make the mathematics as palatable as possible. We avoid the use of calculus. On the other hand, the concepts of a matrix and of matrix manipulations are important. We do not assume the reader is familiar with matrix algebra. Rather, we introduce matrices as they appear naturally in our discussions, and we then show how they simplify the presentation of multivariate models and techniques.

The introductory account of matrix algebra, in Chapter 2, highlights the more important matrix algebra results as they apply to multivariate analysis. The Chapter 2 supplement provides a summary of matrix algebra results for those with little or no previous exposure to the subject. This supplementary material helps make the book self-contained and is used to complete proofs. The proofs may be ignored on the first reading. In this way we hope to make the book accessible to a wide audience.

In our attempt to make the study of multivariate analysis appealing to a large audience of both practitioners and theoreticians, we have had to sacrifice a consistency of level. Some sections are harder than others. In particular, we have summarized a voluminous amount of material on regression in Chapter 7. The resulting presentation is rather succinct and difficult the first time through. We hope instructors will be able to compensate for the unevenness in level by judiciously choosing those sections, and subsections, appropriate for their students and by toning them down if necessary.

ORGANIZATION AND APPROACH

The methodological "tools" of multivariate analysis are contained in Chapters 5 through 12. These chapters represent the heart of the book, but they cannot be assimilated without much of the material in the introductory Chapters 1 through 4. Even those readers with a good knowledge of matrix algebra or those willing to accept the mathematical results on faith should, at the very least, peruse Chapter 3, "Sample Geometry," and Chapter 4, "Multivariate Normal Distribution."

Our approach in the methodological chapters is to keep the discussion direct and uncluttered. Typically, we start with a formulation of the population models, delineate the corresponding sample results, and liberally illustrate everything with examples. The examples are of two types: those that are simple and whose calculations can be easily done by hand, and those that rely on real-world data and computer software. These will provide an opportunity to (1) duplicate our analyses, (2) carry out the analyses dictated by exercises, or (3) analyze the data using methods other than the ones we have used or suggested.

The division of the methodological chapters (5 through 12) into three units allows instructors some flexibility in tailoring a course to their needs. Possible sequences for a one-semester (two-quarter) course are indicated schematically. [Schematic of possible course sequences, beginning with "Getting Started," Chapters 1-4.] Each instructor will undoubtedly omit certain sections from some chapters to cover a broader collection of topics than is indicated by these two choices.

For most students, we would suggest a quick pass through the first four chapters (concentrating primarily on the material in Chapter 1; Sections 2.1, 2.2, 2.3, 2.5, 2.6, and 3.6; and the "assessing normality" material in Chapter 4) followed by a selection of methodological topics. For example, one might discuss the comparison of mean vectors, principal components, factor analysis, discriminant analysis, and clustering. The discussions could feature the many "worked out" examples included in these sections of the text. Instructors may rely on diagrams and verbal descriptions to teach the corresponding theoretical developments. If the students have uniformly strong mathematical backgrounds, much of the book can successfully be covered in one term.

We have found individual data-analysis projects useful for integrating material from several of the methods chapters. Here, our rather complete treatments of multivariate analysis of variance (MANOVA), regression analysis, factor analysis, canonical correlation, discriminant analysis, and so forth are helpful, even though they may not be specifically covered in lectures.

CHANGES TO THE SIXTH EDITION

New material. Users of the previous editions will notice several major changes in the sixth edition.

- Twelve new data sets, including national track records for men and women, psychological profile scores, car body assembly measurements, cell phone tower breakdowns, pulp and paper properties measurements, Mali family farm data, stock price rates of return, and Concho water snake data.
- Thirty-seven new exercises and twenty revised exercises, with many of these exercises based on the new data sets.
- Four new data-based examples and fifteen revised examples.
- Six new or expanded sections:
  1. Section 6.6 Testing for Equality of Covariance Matrices
  2. Section 11.7 Logistic Regression and Classification
  3. Section 12.5 Clustering Based on Statistical Models
  4. Expanded Section 6.3 to include "An Approximation to the Distribution of T² for Normal Populations When Sample Sizes Are Not Large"
  5. Expanded Sections 7.6 and 7.7 to include Akaike's Information Criterion
  6. Consolidated previous Sections 11.3 and 11.5 on two-group discriminant analysis into single Section 11.3

To make the methods of multivariate analysis more prominent in the text, we have removed the long proofs of several results and placed them on a web site accessible through www.prenhall.com/statistics. Click on "Multivariate Statistics" and then click on our book.

Web Site. All full data sets saved as ASCII files that are used in the book are available on the web site.

Instructors' Solutions Manual. An Instructors' Solutions Manual is available on the author's website accessible through www.prenhall.com/statistics. For information on additional for-sale supplements that may be used with the book or additional titles of interest, please visit the Prentice Hall web site at www.prenhall.com.

ACKNOWLEDGMENTS

We thank many of our colleagues who helped improve the applied aspect of the book by contributing their own data sets for examples and exercises. A number of individuals helped guide various revisions of this book, and we are grateful for their suggestions: Christopher Bingham, University of Minnesota; Steve Coad, University of Michigan; Richard Kiltie, University of Florida; Sam Kotz, George Mason University; Him Koul, Michigan State University; Bruce McCullough, Drexel University; Shyamal Peddada, University of Virginia; K. Sivakumar, University of Illinois at Chicago; Eric Smith, Virginia Tech; and Stanley Wasserman, University of Illinois at Urbana-Champaign. Their comments and suggestions are largely responsible for the present iteration of this work.

We would also like to give special thanks to Wai Kwong Cheang, Shanhong Guan, Jialiang Li, and Zhiguo Xiao for their help with the calculations for many of the examples. We must thank Dianne Hall for her valuable help with the Solutions Manual, Steve Verrill for computing assistance throughout, and Alison Pollack for implementing a Chernoff faces program. We are indebted to Cliff Gilman for his assistance with the multidimensional scaling examples discussed in Chapter 12. Jacquelyn Forer did most of the typing of the original draft manuscript, and we appreciate her expertise and willingness to endure the cajoling of authors faced with publication deadlines. Finally, we would like to thank Petra Recter, Debbie Ryan, Michael Bell, Linda Behrens, Joanne Wendelken, and the rest of the Prentice Hall staff for their help with this project. We also acknowledge the feedback of the students we have taught these past 35 years in our applied multivariate analysis courses.

R. A. Johnson (rich@stat.wisc.edu)
D. W. Wichern (dwichern@tamu.edu)
Chapter 1

ASPECTS OF MULTIVARIATE ANALYSIS

1.1 Introduction

Scientific inquiry is an iterative learning process. Objectives pertaining to the explanation of a social or physical phenomenon must be specified and then tested by gathering and analyzing data. In turn, an analysis of the data gathered by experimentation or observation will usually suggest a modified explanation of the phenomenon. Throughout this iterative learning process, variables are often added or deleted from the study. Thus, the complexities of most phenomena require an investigator to collect observations on many different variables. This book is concerned with statistical methods designed to elicit information from these kinds of data sets. Because the data include simultaneous measurements on many variables, this body of methodology is called multivariate analysis.

The need to understand the relationships between many variables makes multivariate analysis an inherently difficult subject. Often, the human mind is overwhelmed by the sheer bulk of the data. Additionally, more mathematics is required to derive multivariate statistical techniques for making inferences than in a univariate setting. We have chosen to provide explanations based upon algebraic concepts and to avoid the derivations of statistical results that require the calculus of many variables. Our objective is to introduce several useful multivariate techniques in a clear manner, making heavy use of illustrative examples and a minimum of mathematics. Nonetheless, some mathematical sophistication and a desire to think quantitatively will be required.

Most of our emphasis will be on the analysis of measurements obtained without actively controlling or manipulating any of the variables on which the measurements are made. Only in Chapters 6 and 7 shall we treat a few experimental plans (designs) for generating data that prescribe the active manipulation of important variables. Although the experimental design is ordinarily the most important part of a scientific investigation, it is frequently impossible to control the
generation of appropriate data in certain disciplines. (This is true, for example, in business, economics, ecology, geology, and sociology.) You should consult [6] and [7] for detailed accounts of design principles that, fortunately, also apply to multivariate situations.

It will become increasingly clear that many multivariate methods are based upon an underlying probability model known as the multivariate normal distribution. Other methods are ad hoc in nature and are justified by logical or common-sense arguments. Regardless of their origin, multivariate techniques must, invariably, be implemented on a computer.
Recent advances in computer technology have been accompanied by the development of rather sophisticated statistical software packages, making the implementation step easier.

Multivariate analysis is a "mixed bag." It is difficult to establish a classification scheme for multivariate techniques that is both widely accepted and indicates the appropriateness of the techniques. One classification distinguishes techniques designed to study interdependent relationships from those designed to study dependent relationships. Another classifies techniques according to the number of populations and the number of sets of variables being studied. Chapters in this text are divided into sections according to inference about treatment means, inference about covariance structure, and techniques for sorting or grouping. This should not, however, be considered an attempt to place each method into a slot. Rather, the choice of methods and the types of analyses employed are largely determined by the objectives of the investigation. In Section 1.2, we list a smaller number of practical problems designed to illustrate the connection between the choice of a statistical method and the objectives of the study. These problems, plus the examples in the text, should provide you with an appreciation of the applicability of multivariate techniques across different fields.

The objectives of scientific investigations to which multivariate methods most naturally lend themselves include the following:

1. Data reduction or structural simplification. The phenomenon being studied is represented as simply as possible without sacrificing valuable information. It is hoped that this will make interpretation easier.
2. Sorting and grouping. Groups of "similar" objects or variables are created, based upon measured characteristics. Alternatively, rules for classifying objects into well-defined groups may be required.
3. Investigation of the dependence among variables. The nature of the relationships among variables is of interest. Are all the variables mutually independent or are one or more variables dependent on the others? If so, how?
4. Prediction. Relationships between variables must be determined for the purpose of predicting the values of one or more variables on the basis of observations on the other variables.
5. Hypothesis construction and testing. Specific statistical hypotheses, formulated in terms of the parameters of multivariate populations, are tested. This may be done to validate assumptions or to reinforce prior convictions.

We conclude this brief overview of multivariate analysis with a quotation from F. H. C. Marriott [19], page 89. The statement was made in a discussion of cluster analysis, but we feel it is appropriate for a broader range of methods. It allows one to maintain a proper perspective and not be overwhelmed by the elegance of some of the theory:

If the results disagree with informed opinion, do not admit a simple logical interpretation, and do not show up clearly in a graphical presentation, they are probably wrong. There is no magic about numerical methods, and many ways in which they can break down. They are a valuable aid to the interpretation of data, not sausage machines automatically transforming bodies of numbers into packets of scientific fact.

1.2 Applications of Multivariate Techniques

The published applications of multivariate methods have increased tremendously in recent years. It is now difficult to cover the variety of real-world applications of these methods with brief discussions, as we did in earlier editions of this book. However, in order to give some indication of the usefulness of multivariate techniques, we offer the following short descriptions of the results of studies from several disciplines. These descriptions are organized according to the categories of objectives given in the previous section. Of course, many of our examples are multifaceted and could be placed in more than one category.

Data reduction or simplification

- Using data on several variables related to cancer patient responses to radiotherapy, a simple measure of patient response to radiotherapy was constructed. (See Exercise 1.15.)
- Track records from many nations were used to develop an index of performance for both male and female athletes. (See [8] and [22].)
- Multispectral image data collected by a high-altitude scanner were reduced to a form that could be viewed as images (pictures) of a shoreline in two dimensions. (See [23].)
- Data on several variables relating to yield and protein content were used to create an index to select parents of subsequent generations of improved bean plants. (See [13].)
- A matrix of tactic similarities was developed from aggregate data derived from professional mediators. From this matrix the number of dimensions by which professional mediators judge the tactics they use in resolving disputes was determined. (See [21].)

Sorting and grouping

- Data on several variables related to computer use were employed to create clusters of categories of computer jobs that allow a better determination of existing (or planned) computer utilization. (See [2].)
- Measurements of several physiological variables were used to develop a screening procedure that discriminates alcoholics from nonalcoholics. (See [26].)
- Data related to responses to visual stimuli were used to develop a rule for separating people suffering from a multiple-sclerosis-caused visual pathology from those not suffering from the disease. (See Exercise 1.14.)
- The U.S. Internal Revenue Service uses data collected from tax returns to sort taxpayers into two groups: those that will be audited and those that will not. (See [31].)
Investigation of the dependence among variables

- Data on several variables were used to identify factors that were responsible for client success in hiring external consultants. (See [12].)
- Measurements of variables related to innovation, on the one hand, and variables related to the business environment and business organization, on the other hand, were used to discover why some firms are product innovators and some firms are not. (See [3].)
- Measurements of pulp fiber characteristics and subsequent measurements of characteristics of the paper made from them are used to examine the relations between pulp fiber properties and the resulting paper properties. The goal is to determine those fibers that lead to higher quality paper. (See [17].)
- The associations between measures of risk-taking propensity and measures of socioeconomic characteristics for top-level business executives were used to assess the relation between risk-taking behavior and performance. (See [18].)

Prediction

- The associations between test scores, and several high school performance variables, and several college performance variables were used to develop predictors of success in college. (See [10].)
- Data on several variables related to the size distribution of sediments were used to develop rules for predicting different depositional environments. (See [7] and [20].)
- Measurements on several accounting and financial variables were used to develop a method for identifying potentially insolvent property-liability insurers. (See [28].)
- cDNA microarray experiments (gene expression data) are increasingly used to study the molecular variations among cancer tumors. A reliable classification of tumors is essential for successful diagnosis and treatment of cancer. (See [9].)

Hypotheses testing

- Experimental data on several variables were used to see whether the nature of the instructions makes any difference in perceived risks, as quantified by test scores. (See [27].)
- Data on many variables were used to investigate the differences in structure of American occupations to determine the support for one of two competing sociological theories. (See [16] and [25].)
- Data on several variables were used to determine whether different types of firms in newly industrialized countries exhibited different patterns of innovation. (See [15].)
- Several pollution-related variables were measured to determine whether levels for a large metropolitan area were roughly constant throughout the week, or whether there was a noticeable difference between weekdays and weekends. (See Exercise 1.6.)

The preceding descriptions offer glimpses into the use of multivariate methods in widely diverse fields.
• Experimental data on several variables were used to see whether the nature of the instructions makes any difference in perceived risks. (See [10).) • Measurements on several accounting and financial variables were used to develop a method for identifying potentially insolvent propertyliability insurers. graphs and tabular arrangements are important aids in data analysis. (See [27]. or experimental unit. Arrays Multivariate data arise whenever an investigator.) • The associations between measures of risktaking propensity and measures of socioeconomic characteristics for toplevel business executives were used to assess the relation between risktaking behavior and performance. That is. (See [7] and [20].) • cDNA microarray experiments (gene expression data) are increasingly used to study the molecular variations among cancer tumors. or trial. These measurements (commonly called data) must frequently be arranged and displayed in various ways.
"'" Xjk . there will be p sample means: Xk Variable 1 (dollar sales): 42 52 48 58 Variable 2 (number of books): 4 5 4 3 Using the notation just introduced. for p variables. And the average of the squares of the distances of all of the numbers from the mean provides a measure of the spread. as gains are attained in both (1) describing numerical calculations as operations on arrays and (2) the implementation of the calculations on computers. The sample mean can be computed from the n measurements on each of the p variables. in the numbers. is known as the sample standard deviation. and its very mass poses a serious obstacle to any attempt to visually extract pertinent information. . or variation.6 Chapter 1 Aspects of MuItivariate Analysis Example 1. it is convenient to use double subscripts on the variances in order to indicate their positions in the array.. Let Xll. Let the first variable be total dollar sales and the second variable be number of books sold. l is the sample mean of the 2 Sk XiI'S. is small.1 rather than n. is a descriptive statistic that provides a measure of locationthat is. we are concerned only with their value as devices for displaying data. At this point. We consider the manipulation of arrays of numbers in Chapter 2. . although the S2 notation is traditionally used to indicate the sample variance. among other things. we have Xll = X12 = =  2: Xjk n j=l j=l 1 n k = 1. n). . We shall rely most heavily on descriptive statistics that measure location. then Xl is also called the sample mean for the first variable.. In this situation. Later we shall see that there are theoretical reasons for doing this."'" Xjl  n _2 xd and the data array X is X = with four rows and two columns. and it is particularly appropriate if the number of measurements.p (11) A measure of spread is provided by the sample variance. which now use many languages and statistical packages to perform array operations. . The formal definitions of these quantities follow. in general. variation. known as descriptive statistics. the number of books sold and the total amount of each sale. . the arithmetic average. so that. Consider n pairs of measurements on each of variables 1 and 2: [xu]. Much of the information contained in the data can be assessed by calculating certain summary numbers.p Descriptive Statistics A large data set is bulky. X2I>"" Xnl be n measurements on the first variable. The two versions of the sample variance will always be differentiated by displaying the appropriate expression. . Then the arithmetic average of these measurements is (13) I I The square root of the sample variance..2. or sample mean. . Each receipt provided. we introduce the notation Skk to denote the same variance computed from measurements on the kth variable.1 (A data array) A selection of four receipts from a university bookstore was obtained in order to investigate the nature of book sales. Therefore.Xk 1 ( _ n k = 1. n. a "central value" for a set of numbers. . and we have the notational identities k=I.. we shall eventually consider an array of quantities in which the sample variances lie along the main diagonal.. 1\vo comments are in order. are r I The Organization of Data 7 If the n measurements represent a subset of the full set of measurements that might have been observed. many authors define the sample variance with a divisor of n . in tabular form. This measure of variation uses the same units as the observations.2. Second.. First. X12 [X2l]. Xjl and Xj2 are observed on the jth experimental item (j = 1. 
42 52 48 58 4l 5 4 3 = .p (12) j=l • Considering data in the form of arrays facilitates the exposition of the subject matter and allows numerical calculations to be performed in an orderly and efficient manner..2. defined for n measurements on the first variable as X4l X42 42 4 X2l X22 = 52 = 5 X3l X32 = 48 = 4 = 58 = 3 where Xl 2 SI = . Then we can re&ard the corresponding numbers on the receipts as four measurements on two variables. we have )2 . . A measure of linear association between the measurements of variables 1 and 2 is provided by the sample covariance . ..2. [Xnl] X22 Xn2 That is. and linear association.. We adopt this terminology because the bulk of this book is devoted to procedUres designed to analyze samples of measurements from larger collections. •. In general. The efficiency is twofold. For example. Suppose the data.
n. .2. Here r measures the strength of the linear association. The value of r must be between 1 and +1 inclusive. provided that the constants a and c have the same sign. This measure of the linear association between two variables does not depend on the units of measurement. Sl2 will be negative. Suspect observa.p. Sample variances and covariances [u Spl 'pI Sl2 S22 S2p '" sp2 spp Sample correlations R r12 1 'p2 '" r2p ] ] (18) 1 .Xi)(Xjk . If there is no particular association between the values for the two variables. this implies a lack of linear association between the components. 3.2. sl2 will be positive.2. . convey all there is to know about the aSSOCIatIOn between two variables.. in general.p. .) (Xjk . . These quantities are Wkk :L (Xji j=l n = x.y calculated and analyzed.. . the sign of r indicates the direction of the association: r < 0 implies a tendency for one value in the pair to be larger than its average when the other is smaller than its average.by these statistics. where the product of the square roots of the sample variances provides the standardization. Arrays of Basic Descriptive Statistics Sample means cause both sets are centered at zero and expressed in standard deviation units. They provide cogent numerical sum aSSOCIatIOn the data do not exhibit obvious nonlinear patterns of aSSOCIation and when WIld observations are not present. Nonlinear associations can exist that are not revealed .2. see [14]). The sample covariance Sik j The Organization of Data.p (16) and Wik = 2: (Xji j=l n x. . p. The value of rik remains unchanged if the measurements of the ith variable are changed to Yji = aXji + b. 9 = :L n 1 n _ (Xji . .p (14) measures the association between the ·ith and kth variables.. k = 1. n. . and r > 0 implies a tendency for one value of the pair to be large when the other value is large and also for both values to be small together.. We note that the covariance reduces to the sample variance when i = k. k = 1.2. Notice that rik has the same value whether n or n . Sik = Ski for all i and k . The final descriptive statistic considered here is the sample correlation coefficient (or Pearson's productmoment correlation coefficient. The sample correlation coefficient is a standardized version of the sample covariance.2. j = 1. covariance and correlation coeffi are routi':lel. and the values of the kth variable are changed to Yjk = CXjk + d. ...2.Xk) (15) j=I 2: (Xjk  n Xk)2 k = 1. If large values from one variable occur with small values for the other variable. The descriptive statistics computed from n measurements on p variables can also be organized into arrays. Sl2 will be approximately zero. and the small values also occur together. Although the signs of the sample correlation and the sample covariance are the same. Note rik = rki for all i and k. . and Sik' The sample correlation coefficient rik can also be viewed as a sample co variance.p (17) = 1. If r = 0. . sa. or association along a line. p and k = 1.. the correlation is ordinarily easier to interpret because its magnitude is bounded. If large values for one variable are observed in conjunction with large values for the other variable. little exists. . To summarize.Xk) i j=l = 1. Suppose the original values 'Xji and Xjk are replaced by standardized values for i (Xji . 
The sample correlation coefficient is just the sample covariance of the standardized observations.2.1 is chosen as the common divisor for Sii.8 Chapter 1 Aspects of Multivariate Analysis f if or the average product of the deviations from their respective means. The sample correlation coefficient for the ith and kth variables is defined as The Sik and rik do not. . . Otherwise. The values of Sik and rik should be quoted both with and without these observations. Their values are less informative other kinds of association. In spite of these shortcomings. On the other hand. .. these quantities can be very sensIttve to "wild" observations ("outIiers") and may indicate association when in fact. The sum of squares of the deviations from the mean and the sum of crossproduct deviations are often of interest themselves... ..2...Xk) i = 1.. . j == 1.tions must be accounted for by correcting obvious recording mIstakes and by takmg actions consistent with the identified causes.. Covariance and corr'elation provide measures of lmear aSSOCIatIOn... 2. ... . . the sample correlation r has the following properties: Sn = 1. Moreover.) (Xjk .
.= vs. . Although it is impossIble to simultaneously plot all the measurements made on several variables and study configurations. Simple. Each. we have a total of four measurements (observations) on each variable.5 ! '" :a = 1.50l + (48 .tent al. The first subscript on an entry in arrays Sn and R indicates the row.5J 0 4 6 8 ! Sn = [ 1.4? + (4 .36 = rl2 so . the entries in symmetric positions about the main northwestsoutheast diagonals in arrays Sn and R are the same. The coordinates of the points are determined by the measurements: (3.1. (4. The arrays Sn and R consist of p rows and p columns.4)2) 4 2: xd • • • • • • = ..4) + (52 .1.lE 10 Chapter 1 Aspects of Multivariate Analysis The sample mean array is denoted by X. receipt yields a pair of measurements. VS.5 2 4 6 7 8 10 2 5 5 7. duced in Example 1. aids in data analysis. plots of individual variables and plots of pairs of variables can stIll be very informative. and number of books sold. many valuable insights can be obtained from !he data by plots with paper and pencil. the sample variance and array by the capital letter Sn.7. Sn' and R.50)(5 . Since there are four receipts.4) + (58 .5 The sample variances and covariances are Sll = 4 data ?lotted as seven points in two dimensions (each axis represent a vanable) III FIgure 1. then..5 • • 2 4 ! 5 8 6 Dot diagram • ! ! 10 I .2 (The arrays .5). Sophisticated computer programs and display equipn.50)(4 . = . two.36 1 lE Graphical Techniques are but frequently neglected. . It is good statistical practIce to plot paIrs of varIables and visually inspect the pattern of association.5. On the other hand.5 Sl2 = 2: j=l (Xjl  XI)( Xj2  X2) • • • CS •• Cl • 8 6 4 2 8 6 4 2 • = . or three dimenSIOns WIth relatIve ease.5). Since Sik = Ski and rik = rki for all i and k.50)(3 .. and the sample correlation array by R.50)(4 . R _ [ Sl2 V34 v'3 1.low the luxury of visually examining data in one. 2: j=l 4 (Xjl  xd + (58 .4f + (5 . the following seven pairs of measurements on two variables: Variable 1 Variable2 . SR' and R for bivariate data) Consider the data intro The Organization of Data The sample correlation is r12 r21 II = . XI Figure 1.4) + (48 . (5. Example 1. The resulting twodimensional plot IS known as a scatter diagram or scatter plot.50)2 + (52 . Find the arrays X.5). Thesample means are Xl = 1 2: 4 4 Xjl j=l j=l = 1(42 + 52 + 48 + 58) = 50 X2 = 12: Xj2 = + 5 + 4 + 3) = 4 (Xl): (X2): 3 5 4 5. total dollar sales.4» S21 = Sl2 and 34 1. p. The subscrIpt on the array Sn is a mnemonic device used to remind you that n is employed as a divisor for the elements Sik' The size of all of the arrays is determined by the number of variables.50)2) = 34 X2 X2 = S22 = . The array x is a single column with p rows.c. and the arrays are said to be symmetric. Consider.1 A scatter plot and marginal dot diagrams.5 .4f + (3 . yet elegant and for data are available in [29]..50)2 (Xj2  • 10 10 j=l 1«4 . the second subscript indicates the column.
but that the scatter diagrams are decidedly different.3 Profits per employee and number of employees for 16 publishing firms..1 and 1.1.1. The information contained in the singlevariable dot diagrams can be used to calculate the sample means Xl and X2 and the sample variances SI 1 and S22' (See Exercise 1..3 (The effect of unusual observations on sample correlations) Some fi. we find that the marginal dot diagrams are the same. the descriptive statistics for the individual variables Xl. In Figure 1.39 . and their coordinates can be used to calculate the sample covariance s12' In the scatter diagram of Figure 1. but comparatively small (negative) profits per employee. Sports Illustrated magazine provided data on Xl = player payroll for National League East baseball teams. Of course.2 Scatter plot and dot diagrams for rearranged data.1 are separate plots of the observed values of variable 1 and the observed values of variable 2.2. large values of Xl are paired with small values of X2 and small values of Xl with large values of X2' Consequently.• 12 Chapter 1 Aspects of Multivariate Analysis Also shown in Figure 1. Comparing Figures 1. will now be negative. but the sample covariance S12. article on money in sports. Dot diagrams and scatter plots contain different kinds of information. respectively. These plots are called (marginal) dot diagrams.1. The scatter plot in Figure 1. but is "typical" in terms of profits per employee. SI 1> and S22 remain unchanged. 1990.3. this causeeffect relationship cannot be substantiated. X2 40 5 4 5. The different orientations of the data in Figures 1. Example 1.56 = { _ . They can be obtained from the original observations or by projecting the points in the scatter diagram onto each coordinate axis.) The scatter and dot diagrams for the "new" data are shown in Figure 1. The next two examples further illustrate the information that can be conveyed by a graphic display.2. I I • f The sample correlation coefficient computed from the values of Xl and X2 is . Dun & Bradstreet is the largest firm in terms of number of employees.39 ..50 for all 16 firms for all firms but Dun & Bradstreet for all firms but Time Warner for all firms but Dun & Bradstreet and Time Warner r12 X2 X2 • • • •• • • 10 8 10 8 • • • •• 4 It is clear that atypical observations can have a considerable effect on the sample correlation coefficient. • 6 6 4 2 4 2 0 2 • 6 • 8 10 XI t • 2 • 4 t • t 6 t 8 I 10 . they are nqt competitors. nancial data representing jobs and productivity for the 16 largest publishing firms appeared in an article in Forbes magazine on April 30.' '' 30 20 0 0 •• • • (We have simply rearranged the values of variable 1. The data for the pair of variables Xl = employees Gobs) and X2 = profits per employee (productivity) are graphed in Figure 1. f tE Co] 10 0 .§ 8'. X2. We have added data on X2 = wonlost percentage "for 1977.4 supports the claim that a championship team can be bought..5 S. XI Figure 1.2. TIme Warner has a "typical" number of employees. large values of Xl occur with large values of X2 and small values of Xl with small values of X2' Hence. We have labeled two "unusual" observations.1978. so that the measurements on the variables Xl and X2 were as follows: Variable 1 Variable 2 (Xl): (X2): The Organization of Data 13 Example 1. the fact that the marginal dot diagrams are the same in the two cases is not immediately apparent from the scatter plots.1 and 1. The results are given in Table 1. As an illustration. 
0 •• • • • • • Time Warner • Dun & Bradstreet • 1 10 Employees (thousands) Figure 1. suppose the data preceding Figure 1.5 5 6 4 2 7 2 10 8 5 3 7.2 are not discernible from the marginal dot diagrams alone.4 (A scatter plot for baseball data) In a July 17. statistics cannot answer the question: Could the Mets have won with $4 million to spend on player salaries? . The information in the marginal dot diagrams is not sufficient for constructing the scatter plot. The two types of graphical procedures complement one another. because the experiment did not include a random assignment of payrolls. At the same time.) The scatter diagram indicates the orientation of the points. S12 will be positive. which measures the association between pairs of variables.1 had been paired differently. Thus.
21 78.4 Salaries and wonlost percentage from Table 1.91 122.802 .22 54.31 112.805 .60 116.35 68. 74. Because of the orientation of fibers within the paper.725.51 56.840 .62 53.815 .20 131.759 .80 124.788 .841 . we have regarded the six paired observations in Table 1.50 127.2 PaperQuality Measurements 15 Team Philadelphia Phillies Pittsburgh Pirates St.802 .42 72.5.70 115.80 135.500 .824 .826 .1 as the coordinates of six points in twodimensional space.801 Machine direction 121.971 .10 125.80 125.820 .  Example I.91 68.33 76.02 73.816 .20 74.41 73.70.89 71.70 121.68 50. Table 1.31 114.25 72.23 68.463 . The latter are on a different scale with this 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 .81 109.836 .788 .776 .00 131.802 .1 1977 Salary and Final Record for the National League East X2= The Organization of Data Table 1.33 75.822 .48 70.758 Source: Data courtesy of SONOCO Products Company.832 .80 130.796 .803 .469.450 1.39 69.40 120.822 .93 53.4.497.51 109.1.10 130.20 120.60 103. The scatter plots are arranged as the offdiagonal elements of a covariance array and box plots as the diagonal elements.828 .90 123.12 71.795 .645.816 .845 .20 118.70 126.41 127.53 70.822 .10 70.782 .575 1.68 78.70 117.88 68.54 71.395 Strength Specimen 1 2 3 4 5 6 7 8 9 10 Density . To construct the scatter plot in Figure 1.475 1.90 124.836 .900 2.10 118.42 70.810 .623 .10 115. page' 16. the machine direction.52 48.782.51 110.80 107.70 129.806 .50 126.10 131. or at right angles to.875 1.28 76.842 .93 53.80 Cross direction 70.843 .10 120.485.819 .60 118.67 52.85 51.770 .512 .47 78.30 115.00 125. .42 11 • o ••• • • Player payroll in millions of dollars Figure 1.29 72.800 wonlost percentage .593 .772 .10 50.10 79. Louis Cardinals Chicago Cubs Montreal Expos New York Mets Xl = player payroll 3.759 .2 shows the measured values of Xl X2 X3 = density (grams/cubic centimeter) = strength (pounds) in the machine direction = strength (pounds) in the cross direction A novel graphic presentation of these data appears in Figure 1.31 110. it has a different strength when measured in the direction produced by the machine than when measured across.50 127.71 113.75 80.68 74.14 Chapter 1 Aspects of Multivariate Analysis Table 1.772 .64 76.S (Multiple scatter plots for paper strength measurements) Paper is manufactured in continuous sheets several feet wide. The figure allows us to examine visually the grouping of teams with respect to the variables total payroll and wonlost percentage.60 53.
. Consider the natural extension of the scatter plot to p dimensions. However.. ..5 74.0 67.97 Strength (MD) Strength (CD) The Organization of Data 17 n Points in p Dimensions (pDimensional Scatter Plot).. Limited as we are to a three:dimensional world. .067 10.33 Med 70.:.. . . Max .5 150. .890 SVL 73.5 64. .::. . 9.0 69. . this kind of analysis can be misleading if one variable has a much larger variance than the others. . .015 10. where the p measurements ·i " 0 Med Min .:.0 HLS 136. . The next example illustrates a threedimensional scatter plot...2. .. .* ••••*' .. These scatter plot arrays are further pursued in our discussion of new software graphics in the next section. If the task is not too great to warrant the effort.76 .5 .0 77..0 123. Min 103.6.004 8..0 47. . ... : : . .0 135.. 0.5 68. S " '" OIl .888 7.493 ..5.0 61.0 129. these representations can actually be graphed.70 .601 7.e' . we first calculate the standardized values. Knowing the position on a line along the major axes of the cloud of poinfs would be almost as good as knowing the three measurements Mass.5 Scatter plots and boxplots of paperquality data from Thble 1.. Although there are three size measurements..0 118.5 135.0 Lizard 14 15 16 17 18 19 20 21 22 23 24 25 Mass 10. :.93 Example 1. . p variables are simultaneously oon items.81 0.0 59. data provide an important conceptual framework for Vlewmg multIvanable methods... is given in grams while the snoutvent length (SVL) and hind limb span (HLS) are given in millimeters.273 2. for all pairs. .. '...4 .199 6. Med r r T ' I I 121.0 66.0 126. 4*.r .135.610 7.213 8. there is one unusual observation: the density of specimen 25.. The scatter plots can be mspected for patterns and unusual observations. In cases where it is possible to capture the essence of the data m three dimensions.5 69. we cannot always picture an entire set of data. Min 48..0 116. The weight.3 Lizard Size Data Lizard 1 2 3 4 5 6 7 8 9 10 11 12 13 Mass 5. or mass. we can ask whether or not most of the variation is primarily restricted to two dimensions or even to one dimension...5 142.063 6. Bonine. . .0 75.622 SVL 59. SVL.. Max : : ... Xjp units along the pth axis.953 7. .447 15.. .0 75.0 62..0 74. .149 9..0 141.= 16 Chapter 1 Aspects of Multivariate Analysis Density Max 0. . However. Clearly most of the variation is scatter about a onedimensional straight line.. so that the jth point is Xjl units along the first axis. important variables and.. . .0 97.158 12. In Figure 1..0 117... The coordinate axes are taken to correspond to the variables.. software so we use only the overall shape to provide information on and possible outliers for each individual characteristic. Consequently.. .5 62.5 67.0 In the general multiresponse situation.610 11.. . we construct the threedimensional scatter plot in Figure 1.0 162.0 70. .0 133.0 116. '. and HLS... The data are displayed in Table 1.401 9.978 6. .5 136. To help answer questions regarding reduced dimensionality.733 12. .0 86... Table 1.. Some of the scatter plots have patterns suggesting that there are two separate clumps of observations.: .526 10.5 123.. Groupings of items will manifest themselves in this representation. . Source: Data courtesy of Kevin E.6 (Looking for lowerdimensional structure) A zoologist obtained measurements on n = 25 lizards known scientifically as Cophosaurus texanus.132 6.049 5. .0 137. T 80.0 140.:..5 63.. but also will show similarities (and differences) among the n items. 
so that the variables contribute equally to the variation in the scatter plot. Figure 1.7 gives the three-dimensional scatter plot for the standardized variables. Most of the variation can be explained by a single variable determined by a line through the cloud of points.
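For readers who want to carry out this standardization numerically, a minimal Python sketch follows; the data array here is a hypothetical stand-in, not the lizard measurements of Table 1.3.

```python
import numpy as np

# Hypothetical 5 x 3 data array (rows = items, columns = variables);
# stand-in values only, not the data of Table 1.3.
X = np.array([[5.2, 59.0, 113.5],
              [10.5, 75.0, 140.0],
              [9.1, 69.5, 133.0],
              [6.0, 62.0, 118.0],
              [12.6, 80.5, 150.5]])

xbar = X.mean(axis=0)       # sample means xbar_k
s = X.std(axis=0)           # sqrt(s_kk), with divisor n as in this chapter

Z = (X - xbar) / s          # z_jk = (x_jk - xbar_k) / sqrt(s_kk)

print(Z.round(3))
print(Z.mean(axis=0).round(10))  # each standardized column has mean 0
print(Z.std(axis=0))             # ... and standard deviation 1
```

After this transformation, each variable contributes equally to the spread of the point cloud, which is the property the text requires before judging dimensionality.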
A three-dimensional scatter plot can often reveal group structure.

Example 1.7 (Looking for group structure in three dimensions) Referring to Example 1.6, it is interesting to see if male and female lizards occupy different parts of the three-dimensional space containing the size data. The gender, by row, for the lizard data in Table 1.3 are

f m f f m f m f m f m f m
m m m f m m m f f m f f

Figure 1.8 repeats the scatter plot for the original variables but with males marked by solid circles and females by open circles. Clearly, males are typically larger than females.

[Figure 1.6: 3D scatter plot of lizard data from Table 1.3.]
[Figure 1.7: 3D scatter plot of standardized lizard data.]
[Figure 1.8: 3D scatter plot of male and female lizards.]

p Points in n Dimensions. The n observations of the p variables can also be regarded as p points in n-dimensional space. Each column of X determines one of the points. The ith column, consisting of all n measurements on the ith variable, determines the ith point. In Chapter 3, we show how the closeness of points in n dimensions can be related to measures of association between the corresponding variables.
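Three-dimensional scatter plots like those in Figures 1.6 through 1.8 can be produced with standard plotting software. A minimal sketch, again with hypothetical stand-in values for mass, SVL, and HLS:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical size measurements (not the actual Table 1.3 values).
mass = np.array([5.2, 10.5, 9.1, 6.0, 12.6])
svl = np.array([59.0, 75.0, 69.5, 62.0, 80.5])
hls = np.array([113.5, 140.0, 133.0, 118.0, 150.5])

fig = plt.figure()
ax = fig.add_subplot(projection='3d')   # one coordinate axis per variable
ax.scatter(mass, svl, hls)
ax.set_xlabel('Mass')
ax.set_ylabel('SVL')
ax.set_zlabel('HLS')
plt.show()
```

Interactive rotation of the axes, available in most plotting environments, corresponds to the spinning discussed in the next section.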
1.4 Data Displays and Pictorial Representations

The rapid development of powerful personal computers and workstations has led to a proliferation of sophisticated statistical software for data analysis and graphics. It is often possible to sit at one's desk and examine the nature of multidimensional data with clever computer-generated pictures. These pictures are valuable aids in understanding data and often prevent many false starts and subsequent inferential problems. As we shall see in Chapters 8 and 12, there are several techniques that seek to represent p-dimensional observations in few dimensions such that the original distances (or similarities) between pairs of observations are (nearly) preserved. In general, if multidimensional observations can be represented in two dimensions, then outliers, relationships, and distinguishable groupings can often be discerned by eye. We shall discuss and illustrate several methods for displaying multivariate data in two dimensions. One good source for more discussion of graphical methods is [11].

Linking Multiple Two-Dimensional Scatter Plots

One of the more exciting new graphical procedures involves electronically connecting many two-dimensional scatter plots.

Example 1.8 (Linked scatter plots and brushing) To illustrate linked two-dimensional scatter plots, we refer to the paper-quality data in Table 1.2. These data represent measurements on the variables x_1 = density, x_2 = strength in the machine direction, and x_3 = strength in the cross direction. Figure 1.9 shows two-dimensional scatter plots for pairs of these variables organized as a 3 x 3 array. For example, the picture in the upper left-hand corner of the figure is a scatter plot of the pairs of observations (x_1, x_3). That is, the x_1 values are plotted along the horizontal axis, and the x_3 values are plotted along the vertical axis. The lower right-hand corner of the figure contains a scatter plot of the observations (x_3, x_1). That is, the axes are reversed. Corresponding interpretations hold for the other scatter plots in the figure. Notice that the variables and their three-digit ranges are indicated in the boxes along the SW-NE diagonal.

The operation of marking (selecting) the obvious outlier in the (x_1, x_3) scatter plot of Figure 1.9 creates Figure 1.10(a), where the outlier is labeled as specimen 25 and the same data point is highlighted in all the scatter plots. Specimen 25 also appears to be an outlier in the (x_1, x_2) scatter plot but not in the (x_2, x_3) scatter plot. The operation of deleting this specimen leads to the modified scatter plots of Figure 1.10(b).

[Figure 1.9: Scatter plots for the paper-quality data of Table 1.2.]
[Figure 1.10: Modified scatter plots for the paper-quality data with outlier (25) (a) selected and (b) deleted.]

From Figure 1.10, we notice that some points in, for example, the (x_2, x_3) scatter plot seem to be disconnected from the others. Selecting these points, using the (dashed) rectangle (see page 22), highlights the selected points in all of the other scatter plots and leads to the display in Figure 1.11(a). Further checking revealed that specimens 16-21, specimen 34, and specimens 38-41 were actually specimens from an older roll of paper that was included in order to have enough plies in the cardboard being manufactured. Deleting the outlier and the cases corresponding to the older paper and adjusting the ranges of the remaining observations leads to the scatter plots in Figure 1.11(b).

The operation of highlighting points corresponding to a selected range of one of the variables is called brushing. Brushing could begin with a rectangle, as in Figure 1.11(a), but then the brush could be moved to provide a sequence of highlighted points. The process can be stopped at any time to provide a snapshot of the current situation.

[Figure 1.11: Modified scatter plots with (a) group of points selected and (b) points, including specimen 25, deleted and the scatter plots rescaled.]
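A static version of the scatter-plot array in Figure 1.9 can be generated in a few lines; the sketch below uses pandas, with hypothetical stand-in values for the three paper-quality variables. (Interactive selection and brushing of the kind described above require specialized dynamic-graphics software.)

```python
import pandas as pd
from pandas.plotting import scatter_matrix
import matplotlib.pyplot as plt

# Hypothetical stand-in for Table 1.2 (density, machine- and
# cross-direction strength); not the actual specimen values.
df = pd.DataFrame({'x1_density': [.80, .82, .84, .76, .97],
                   'x2_machine': [121.4, 127.7, 129.2, 118.0, 126.1],
                   'x3_cross':   [70.4, 72.5, 78.2, 68.1, 76.8]})

# Off-diagonal panels are pairwise scatter plots; the diagonal shows each
# variable's distribution (histograms here, box plots in Figure 1.5).
scatter_matrix(df, diagonal='hist')
plt.show()
```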
Scatter plots like those in Example 1.8 are extremely useful aids in data analysis. Another important new graphical technique uses software that allows the data analyst to view high-dimensional data as slices of various three-dimensional perspectives. This can be done dynamically and continuously until informative views are obtained. A comprehensive discussion of dynamic graphical methods is available in [1]. A strategy for on-line multivariate exploratory graphical analysis, motivated by the need for a routine procedure for searching for structure in multivariate data, is given in [32].

Example 1.9 (Rotated plots in three dimensions) Four different measurements of lumber stiffness are given in Table 4.3, page 186. In Example 4.14, specimen (board) 16 and possibly specimen (board) 9 are identified as unusual observations. Figures 1.12(a), (b), and (c) contain perspectives of the stiffness data in the x_1, x_2, x_3 space. These views were obtained by continually rotating and turning the three-dimensional coordinate axes. Spinning the coordinate axes allows one to get a better understanding of the three-dimensional aspects of the data. Figure 1.12(d) gives one picture of the stiffness data in x_2, x_3, x_4 space. Notice that Figures 1.12(a) and (d) visually confirm specimens 9 and 16 as outliers. Specimen 9 is very large in all three coordinates. A counterclockwise-like rotation of the axes in Figure 1.12(a) produces Figure 1.12(b), and the two unusual observations are masked in this view. A further spinning of the x_2, x_3 axes gives Figure 1.12(c); one of the outliers (16) is now hidden.

Additional insights can sometimes be gleaned from visual inspection of the slowly spinning data. It is this dynamic aspect that statisticians are just beginning to understand and exploit.

[Figure 1.12: Three-dimensional perspectives for the lumber stiffness data. (a) Outliers clear. (b) Outliers masked. (c) Specimen 9 large. (d) Good view of x_2, x_3, x_4 space.]

Plots like those in Figure 1.12 allow one to identify readily observations that do not conform to the rest of the data and that may heavily influence inferences based on standard data-generating models.
Figure 1. Notice that Figures 1. A further spinning of the X2. the individual curves should be replotted in an array where similaritIes an? dIfferences are easily observed.0 2. and the two unusual observations are masked in this view. an assistant later failed to convert the field readings to kilograms when creating the electronic database.10 (Arrays of growth curves) The Alaska Fish and Game Department monitors grizzly bears with the goal of maintaining a healthy population. and 5 years of age. 66. Table 1.14 gives the array of seven curves for weIght.13 Combined growth curves for weight for seven female grizzly bears.3. repeated measurements of the same characteristic on the same unit or subject can give rise to a growth curve if an increasing. Specimen 9 is very large in all three coordinates. X3. pattern is expected. one of the outliers (16) is now hidden. The noticeable exception to a common pattern is the curve for bear 5. The correct weights are (45. Roberts. This is an example of a growth curve. . in this case. Is this an outlier or just natural variation in the population? In the field. Figure 1. we plot the weights versus the ages and then connect the weights at successive years by straight lines.5 3.e!' 100 50 0 / 2 3 4 Year 100 50 0 1 2 3 4 Year 5 Oj 5 68 5 Source: Data courtesy of H. Figure 1. or even an increasing followed by a decreasing. X3 axes gives Figure 1.5 5. 112) kilograms. . bears are weighed on a scale that Table 1. In general. First. plot. B. X4 space.12(d) gives one picture of the stiffness data in X2. Bear I 150 Bear 2 150 Bear 3 150 150 Bear 4 100 50 0 2 3 4 50 0 2 3 4 Year 5 §IOO 50 0 2 3 4 Year Bear 7 100 50 0 2 3 4 Year 5 5 5 Year Bear 5 Bear 6 150 150 Bear 1 2 3 4 5 6 7 Wt2 48 59 61 54 100 68 Wt3 59 68 77 43 145 82 95 Wt4 95 102 93 104 185 95 109 Wt5 Lngth2 Lngth3 82 102 107 104 247 118 111 141 140 145 146 150 142 139 157 168 162 159 158 140 171 Lngth4 168 174 172 176 168 178 176 Lngth5 183 170 177 171 175 189 175 150 100 50 / 2 3 4 Year . Measurements of length are taken with a steel tape. Bears are shot with a dart to induce sleep and weighed on a scale hanging from a tripod.12(b). for each bear.12(c). decreasing. Further inspection revealed that.12(a) and (d) visually confirm specimens 9 and 16 as outliers.ombmed. 200 150 100 50 2.5 Year 4.0 4. Additional insights can sometimes be gleaned from visual inspection of the slowly spinning data. 84.14 Individual growth curves for weight for female grizzly bears.12 allow one to identify readily observations that do not conform to the rest of the data and that may heavily influence inferences based on standard datagenerating models.0 3.13 shows the growth curves for all seven bears.24 Chapter 1 Aspects of Multivariate Analysis Data Displays and Pictorial Representations 25 understanding of the threedimensional aspects of the data. Figure 1. A counterclockwiselike rotation of the axes in Figure 1. _ Plots like those in Figure 1.4 gives the weights (wt) in kilograms and lengths (lngth) in centimeters of seven female bears at 2. Example 1.ecause it can be difficult to inspect visually the individual growth curves in a c.0 Graphs of Growth Curves When the height of a young child is measured at each birthday.4 Female Bear Data Figure 1.12(a) produces Figure 1. This gives an approximation to growth curve for weight. reads pounds. the points can be plotted and then connected by lines to produce a graph.4. It is this dynamic aspect that statisticians are just beginning to understand and exploit. 
Some growth curves look linear and others quadratic.
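Array displays like Figure 1.14 are easy to reproduce with standard plotting software; the sketch below plots the weights of Table 1.4 as recorded, so bear 5's unconverted values are included, just as in the figure.

```python
import matplotlib.pyplot as plt

ages = [2, 3, 4, 5]
# Weights (kg) of the seven female bears from Table 1.4, as recorded.
weights = [[48, 59, 95, 82], [59, 68, 102, 102], [61, 77, 93, 107],
           [54, 43, 104, 104], [100, 145, 185, 247], [68, 82, 95, 118],
           [68, 95, 109, 111]]

fig, axes = plt.subplots(2, 4, sharex=True, sharey=True)
for k, (ax, wt) in enumerate(zip(axes.flat, weights), start=1):
    ax.plot(ages, wt)              # connect successive years by straight lines
    ax.set_title(f'Bear {k}')
axes.flat[-1].set_visible(False)   # seven curves in a 2 x 4 array
plt.show()
```

Plotting all seven curves on one pair of axes instead reproduces the combined display of Figure 1.13.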
Figure 1.15 gives a growth curve array for length. One bear seemed to get shorter from 2 to 3 years old, but the researcher knows that the steel tape measurement of length can be thrown off by the bear's posture when sedated.

[Figure 1.15: Individual growth curves for length for female grizzly bears.]

We now turn to two popular pictorial representations of multivariate data in two dimensions: stars and Chernoff faces.

Stars

Suppose each data unit consists of nonnegative observations on p >= 2 variables. In two dimensions, we can construct circles of a fixed (reference) radius with p equally spaced rays emanating from the center of the circle. The lengths of the rays represent the values of the variables. The ends of the rays can be connected with straight lines to form a star. Each star represents a multivariate observation, and the stars can be grouped according to their (subjective) similarities.

It is often helpful, when constructing the stars, to standardize the observations. In this case some of the observations will be negative. The observations can then be reexpressed so that the center of the circle represents the smallest standardized observation within the entire data set.

Example 1.11 (Utility data as stars) Stars representing the first 5 of the 22 public utility firms in Table 12.4, page 688, are shown in Figure 1.16. There are eight variables; see Table 12.4. The stars are distorted octagons. The observations on all variables were standardized. Among the first five utilities, the smallest standardized observation for any variable was -1.6. Treating this value as zero, the variables are plotted on identical scales along eight equiangular rays originating from the center of the circle. The variables are ordered in a clockwise direction, beginning in the 12 o'clock position.

At first glance, none of these utilities appears to be similar to any other. However, because of the way the stars are constructed, each variable gets equal weight in the visual impression. If we concentrate on the variables 6 (sales in kilowatt-hour [kWh] use per year) and 8 (total fuel costs in cents per kWh), then Boston Edison and Consolidated Edison are similar (small variable 6, large variable 8), and Arizona Public Service, Central Louisiana Electric, and Commonwealth Edison are similar (moderate variable 6, moderate variable 8).

[Figure 1.16: Stars for the first five public utilities: Arizona Public Service (1), Boston Edison Co. (2), Central Louisiana Electric Co. (3), Commonwealth Edison Co. (4), Consolidated Edison Co. (NY) (5).]
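The construction just described translates directly into code. The sketch below draws a single star for one hypothetical observation on p = 8 standardized variables, already shifted so that the smallest value in the data set sits at the center.

```python
import numpy as np
import matplotlib.pyplot as plt

# One hypothetical observation on p = 8 standardized, shifted variables
# (illustrative values only, not the utility data of Table 12.4).
values = np.array([1.2, 0.4, 2.1, 0.9, 1.6, 0.2, 1.1, 1.8])

p = len(values)
# Eight equiangular rays, clockwise from the 12 o'clock position.
angles = np.pi / 2 - 2 * np.pi * np.arange(p) / p

x = values * np.cos(angles)
y = values * np.sin(angles)

# Join the ray ends with straight lines, closing the star at the end.
plt.plot(np.append(x, x[0]), np.append(y, y[0]))
plt.gca().set_aspect('equal')
plt.show()
```

Drawing one such star per observation, arranged in a grid, gives a display like Figure 1.16.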
Chernoff Faces

People react to faces. Chernoff [4] suggested representing p-dimensional observations as a two-dimensional face whose characteristics (face shape, mouth curvature, nose length, eye size, pupil position, and so forth) are determined by the measurements on the p variables. As originally designed, Chernoff faces can handle up to 18 variables. The assignment of variables to facial features is done by the experimenter, and different choices produce different results. Some iteration is usually necessary before satisfactory representations are achieved. Chernoff faces appear to be most useful for verifying (1) an initial grouping suggested by subject-matter knowledge and intuition or (2) final groupings produced by clustering algorithms.

Example 1.12 (Utility data as Chernoff faces) From the data in Table 12.4, the 22 public utility companies were represented as Chernoff faces. We have the following correspondences:

Variable                                    | Facial characteristic
X_1: Fixed-charge coverage                  | Half-height of face
X_2: Rate of return on capital              | Face width
X_3: Cost per kW capacity in place          | Position of center of mouth
X_4: Annual load factor                     | Slant of eyes
X_5: Peak kWh demand growth from 1974       | Eccentricity (height/width) of eyes
X_6: Sales (kWh use per year)               | Half-length of eye
X_7: Percent nuclear                        | Curvature of mouth
X_8: Total fuel costs (cents per kWh)       | Length of nose

The Chernoff faces are shown in Figure 1.17. We have subjectively grouped "similar" faces into seven clusters. If a smaller number of clusters is desired, we might combine clusters 5, 6, and 7 and, perhaps, clusters 2 and 3 to obtain four or five clusters. For our assignment of variables to facial features, the firms group largely according to geographical location.

[Figure 1.17: Chernoff faces for 22 public utilities, grouped into seven clusters.]

Constructing Chernoff faces is a task that must be done with the aid of a computer. The data are ordinarily standardized within the computer program as part of the process for determining the locations, sizes, and orientations of the facial characteristics. With some training, we can use Chernoff faces to communicate similarities or dissimilarities, as the next example indicates.

Example 1.13 (Using Chernoff faces to show changes over time) Figure 1.18 illustrates an additional use of Chernoff faces. (See [24].) In the figure, the faces are used to track the financial well-being of a company over time. As indicated, each facial feature represents a single financial indicator, and the longitudinal changes in these indicators are thus evident at a glance.

[Figure 1.18: Chernoff faces over time (1975-1979); the labeled features represent financial indicators such as liquidity, profitability, and leverage.]
Chernoff faces have also been used to display differences in multivariate observations in two dimensions. For example, the coordinate positions in two dimensions might represent latitude and longitude (geographical location), and the faces might represent multivariate measurements on several U.S. cities. Additional examples of this kind are discussed in [30].

There are several ingenious ways to picture multivariate data in two dimensions. We have described some of them. Further advances are possible and will almost certainly take advantage of improved computer graphics.

1.5 Distance

Although they may at first appear formidable, most multivariate techniques are based upon the simple concept of distance. Straight-line, or Euclidean, distance should be familiar. If we consider the point P = (x_1, x_2) in the plane, the straight-line distance, d(O, P), from P to the origin O = (0, 0) is, according to the Pythagorean theorem,

d(O, P) = \sqrt{x_1^2 + x_2^2}    (1-9)

The situation is illustrated in Figure 1.19. In general, if the point P has p coordinates so that P = (x_1, x_2, ..., x_p), the straight-line distance from P to the origin O = (0, 0, ..., 0) is

d(O, P) = \sqrt{x_1^2 + x_2^2 + ... + x_p^2}    (1-10)

(See Chapter 2.) All points (x_1, x_2, ..., x_p) that lie a constant squared distance, such as c^2, from the origin satisfy the equation

d^2(O, P) = x_1^2 + x_2^2 + ... + x_p^2 = c^2    (1-11)

Because this is the equation of a hypersphere (a circle if p = 2), points equidistant from the origin lie on a hypersphere. The straight-line distance between two arbitrary points P and Q with coordinates P = (x_1, x_2, ..., x_p) and Q = (y_1, y_2, ..., y_p) is given by

d(P, Q) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + ... + (x_p - y_p)^2}    (1-12)

[Figure 1.19: Distance given by the Pythagorean theorem.]

Straight-line, or Euclidean, distance is unsatisfactory for most statistical purposes. This is because each coordinate contributes equally to the calculation of Euclidean distance. When the coordinates represent measurements that are subject to random fluctuations of differing magnitudes, it is often desirable to weight coordinates subject to a great deal of variability less than those that are not highly variable. This suggests a different measure of distance.

Our purpose now is to develop a "statistical" distance that accounts for differences in variation and, in due course, the presence of correlation. Because our choice will depend upon the sample variances and covariances, at this point we use the term statistical distance to distinguish it from ordinary Euclidean distance. It is statistical distance that is fundamental to multivariate analysis.

To begin, we take as fixed the set of observations graphed as the p-dimensional scatter plot. From these, we shall construct a measure of distance from the origin to a point P = (x_1, x_2, ..., x_p). In our arguments, the coordinates (x_1, x_2, ..., x_p) of P can vary to produce different locations for the point. The data that determine distance will, however, remain fixed.

To illustrate, suppose we have n pairs of measurements on two variables each having mean zero. Call the variables x_1 and x_2, and assume that the x_1 measurements vary independently of the x_2 measurements.(1) In addition, assume that the variability in the x_1 measurements is larger than the variability in the x_2 measurements. A scatter plot of the data would look something like the one pictured in Figure 1.20.

[Figure 1.20: A scatter plot with greater variability in the x_1 direction than in the x_2 direction.]

Glancing at Figure 1.20, we see that values which are a given deviation from the origin in the x_1 direction are not as "surprising" or "unusual" as values equidistant from the origin in the x_2 direction. This is because the inherent variability in the x_1 direction is greater than the variability in the x_2 direction. Consequently, large x_1 coordinates (in absolute value) are not as unexpected as large x_2 coordinates. It seems reasonable, then, to weight an x_2 coordinate more heavily than an x_1 coordinate of the same value when computing the "distance" to the origin.

(1) At this point, "independently" means that the x_2 measurements cannot be predicted with any accuracy from the x_1 measurements, and vice versa.
One way to proceed is to divide each coordinate by the sample standard deviation. Therefore, upon division by the standard deviations, we have the "standardized" coordinates x_1^* = x_1/\sqrt{s_{11}} and x_2^* = x_2/\sqrt{s_{22}}. The standardized coordinates are now on an equal footing with one another. After taking the differences in variability into account, we determine distance using the standard Euclidean formula.

Thus, a statistical distance of the point P = (x_1, x_2) from the origin O = (0, 0) can be computed from its standardized coordinates as

d(O, P) = \sqrt{(x_1^*)^2 + (x_2^*)^2} = \sqrt{(x_1/\sqrt{s_{11}})^2 + (x_2/\sqrt{s_{22}})^2} = \sqrt{x_1^2/s_{11} + x_2^2/s_{22}}    (1-13)

Comparing (1-13) with (1-9), we see that the difference between the two expressions is due to the weights k_1 = 1/s_{11} and k_2 = 1/s_{22} attached to x_1^2 and x_2^2 in (1-13). Note that if the sample variances are the same, k_1 = k_2, then x_1^2 and x_2^2 will receive the same weight.
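As a quick numerical illustration of (1-13), the sketch below uses the variances s_{11} = 4 and s_{22} = 1 that reappear in Example 1.14.

```python
import math

def statistical_distance(x1, x2, s11, s22):
    # d(O, P) of Equation (1-13): Euclidean distance in the
    # standardized coordinates x1/sqrt(s11) and x2/sqrt(s22).
    return math.sqrt(x1**2 / s11 + x2**2 / s22)

# With s11 = 4 and s22 = 1, a point 2 units out along x1 is the same
# statistical distance from the origin as a point 1 unit out along x2.
print(statistical_distance(2, 0, 4, 1))    # 1.0
print(statistical_distance(0, 1, 4, 1))    # 1.0
print(math.hypot(2, 0), math.hypot(0, 1))  # Euclidean: 2.0 versus 1.0
```

The Euclidean distances in the last line differ by a factor of 2, even though both points are equally "surprising" relative to the scatter; the statistical distance corrects for this.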
In cases where the weights are the same, it is convenient to ignore the common divisor and use the usual Euclidean distance formula. In other words, if the variability in the x_1 direction is the same as the variability in the x_2 direction, and the x_1 values vary independently of the x_2 values, Euclidean distance is appropriate.

Using (1-13), we see that all points which have coordinates (x_1, x_2) and are a constant squared distance c^2 from the origin must satisfy

x_1^2/s_{11} + x_2^2/s_{22} = c^2    (1-14)

Equation (1-14) is the equation of an ellipse centered at the origin whose major and minor axes coincide with the coordinate axes. That is, the statistical distance in (1-13) has an ellipse as the locus of all points a constant distance from the origin. This general case is shown in Figure 1.21.

[Figure 1.21: The ellipse of constant statistical distance d^2(O, P) = x_1^2/s_{11} + x_2^2/s_{22} = c^2.]

Example 1.14 (Calculating a statistical distance) A set of paired measurements (x_1, x_2) on two variables yields \bar{x}_1 = \bar{x}_2 = 0, s_{11} = 4, and s_{22} = 1. Suppose the x_1 measurements are unrelated to the x_2 measurements; that is, measurements within a pair vary independently of one another. Since the sample variances are unequal, we measure the square of the distance of an arbitrary point P = (x_1, x_2) to the origin O = (0, 0) by

d^2(O, P) = x_1^2/4 + x_2^2/1

All points (x_1, x_2) that are a constant distance 1 from the origin satisfy the equation

x_1^2/4 + x_2^2/1 = 1

The coordinates of some points a unit distance from the origin are presented in the following table:

Coordinates (x_1, x_2) | Distance: x_1^2/4 + x_2^2/1 = 1
(0, 1)                 | 0^2/4 + 1^2/1 = 1
(0, -1)                | 0^2/4 + (-1)^2/1 = 1
(2, 0)                 | 2^2/4 + 0^2/1 = 1
(1, \sqrt{3}/2)        | 1^2/4 + (\sqrt{3}/2)^2/1 = 1

A plot of the equation x_1^2/4 + x_2^2/1 = 1 is an ellipse centered at (0, 0) whose major axis lies along the x_1 coordinate axis and whose minor axis lies along the x_2 coordinate axis. The half-lengths of these major and minor axes are \sqrt{4} = 2 and \sqrt{1} = 1, respectively. The ellipse of unit distance is plotted in Figure 1.22. All points on the ellipse are regarded as being the same statistical distance from the origin, in this case a distance of 1.

[Figure 1.22: Ellipse of unit statistical distance d^2(O, P) = x_1^2/4 + x_2^2/1 = 1.]

The expression in (1-13) can be generalized to accommodate the calculation of statistical distance from an arbitrary point P = (x_1, x_2) to any fixed point Q = (y_1, y_2). If we assume that the coordinate variables vary independently of one another, the distance from P to Q is given by

d(P, Q) = \sqrt{(x_1 - y_1)^2/s_{11} + (x_2 - y_2)^2/s_{22}}    (1-15)

The extension of this statistical distance to more than two dimensions is straightforward. Let the points P and Q have p coordinates such that P = (x_1, x_2, ..., x_p) and Q = (y_1, y_2, ..., y_p). Suppose Q is a fixed point [it may be the origin O = (0, 0, ..., 0)] and the coordinate variables vary independently of one another. Let s_{11}, s_{22}, ..., s_{pp} be sample variances constructed from n measurements on x_1, x_2, ..., x_p, respectively. Then the statistical distance from P to Q is

d(P, Q) = \sqrt{(x_1 - y_1)^2/s_{11} + (x_2 - y_2)^2/s_{22} + ... + (x_p - y_p)^2/s_{pp}}    (1-16)
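The p-dimensional form (1-16) is equally direct to compute; the sketch below checks that the four points tabulated in Example 1.14 all lie a unit statistical distance from the origin.

```python
import math

def stat_dist(x, y, s):
    # Equation (1-16): coordinates assumed to vary independently;
    # s holds the sample variances s_11, ..., s_pp.
    return math.sqrt(sum((xi - yi)**2 / sii
                         for xi, yi, sii in zip(x, y, s)))

origin = (0, 0)
s = (4, 1)  # s11 = 4, s22 = 1 as in Example 1.14
for point in [(0, 1), (0, -1), (2, 0), (1, math.sqrt(3) / 2)]:
    print(point, stat_dist(point, origin, s))  # each distance is 1.0
```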
All points P that are a constant squared distance from Q lie on a hyperellipsoid centered at Q whose major and minor axes are parallel to the coordinate axes. We note the following:

1. The distance of P to the origin O is obtained by setting y_1 = y_2 = ... = y_p = 0 in (1-16).
2. If s_{11} = s_{22} = ... = s_{pp}, the Euclidean distance formula in (1-12) is appropriate.

The distance in (1-16) still does not include most of the important cases we shall encounter, because of the assumption of independent coordinates. The scatter plot in Figure 1.23 depicts a two-dimensional situation in which the x_1 measurements do not vary independently of the x_2 measurements. In fact, the coordinates of the pairs (x_1, x_2) exhibit a tendency to be large or small together, and the sample correlation coefficient is positive. Moreover, the variability in the x_2 direction is larger than the variability in the x_1 direction.

[Figure 1.23: A scatter plot for positively correlated measurements and a rotated coordinate system.]

What is a meaningful measure of distance when the variability in the x_1 direction is different from the variability in the x_2 direction and the variables x_1 and x_2 are correlated? Actually, we can use what we have already introduced, provided that we look at things in the right way. From Figure 1.23, we see that if we rotate the original coordinate system through the angle θ while keeping the scatter fixed and label the rotated axes x~_1 and x~_2, the scatter in terms of the new axes looks very much like that in Figure 1.20. (You may wish to turn the book to place the x~_1 and x~_2 axes in their customary positions.) This suggests that we calculate the sample variances using the x~_1 and x~_2 coordinates and measure distance as in Equation (1-13). That is, with reference to the x~_1 and x~_2 axes, we define the distance from the point P = (x~_1, x~_2) to the origin O = (0, 0) as

d(O, P) = \sqrt{x~_1^2/s~_{11} + x~_2^2/s~_{22}}    (1-17)

where s~_{11} and s~_{22} denote the sample variances computed with the x~_1 and x~_2 measurements.
The relation between the original coordinates (x_1, x_2) and the rotated coordinates (x~_1, x~_2) is provided by

x~_1 = x_1 cos(θ) + x_2 sin(θ)
x~_2 = -x_1 sin(θ) + x_2 cos(θ)    (1-18)

Given the relations in (1-18), we can formally substitute for x~_1 and x~_2 in (1-17) and express the distance in terms of the original coordinates. After some straightforward algebraic manipulations, the distance from P = (x_1, x_2) to the origin O = (0, 0) can be written in terms of the original coordinates x_1 and x_2 of P as

d(O, P) = \sqrt{a_{11} x_1^2 + 2 a_{12} x_1 x_2 + a_{22} x_2^2}    (1-19)

where the a's are numbers such that the distance is nonnegative for all possible values of x_1 and x_2. Here a_{11}, a_{12}, and a_{22} are determined by the angle θ, and by s_{11}, s_{12}, and s_{22} calculated from the original data.(2) The particular forms for a_{11}, a_{12}, and a_{22} are not important at this point. What is important is the appearance of the cross-product term 2 a_{12} x_1 x_2 necessitated by the nonzero correlation r_{12}.

Equation (1-19) can be compared with (1-13). The expression in (1-13) can be regarded as a special case of (1-19) with a_{11} = 1/s_{11}, a_{22} = 1/s_{22}, and a_{12} = 0.

In general, the statistical distance of the point P = (x_1, x_2) from the fixed point Q = (y_1, y_2) for situations in which the variables are correlated has the general form

d(P, Q) = \sqrt{a_{11}(x_1 - y_1)^2 + 2 a_{12}(x_1 - y_1)(x_2 - y_2) + a_{22}(x_2 - y_2)^2}    (1-20)

and can always be computed once a_{11}, a_{12}, and a_{22} are known. In addition, the coordinates of all points P = (x_1, x_2) that are a constant squared distance c^2 from Q satisfy

a_{11}(x_1 - y_1)^2 + 2 a_{12}(x_1 - y_1)(x_2 - y_2) + a_{22}(x_2 - y_2)^2 = c^2    (1-21)

By definition, this is the equation of an ellipse centered at Q. The graph of such an equation is displayed in Figure 1.24. The major (long) and minor (short) axes are indicated. They are parallel to the x~_1 and x~_2 axes. For the choice of a_{11}, a_{12}, and a_{22} in footnote 2, the x~_1 and x~_2 axes are at an angle θ with respect to the x_1 and x_2 axes.

[Figure 1.24: Ellipse of points a constant distance from the point Q.]

(2) Specifically,

a_{11} = cos^2(θ)/(cos^2(θ)s_{11} + 2 sin(θ)cos(θ)s_{12} + sin^2(θ)s_{22}) + sin^2(θ)/(cos^2(θ)s_{22} - 2 sin(θ)cos(θ)s_{12} + sin^2(θ)s_{11})

a_{22} = sin^2(θ)/(cos^2(θ)s_{11} + 2 sin(θ)cos(θ)s_{12} + sin^2(θ)s_{22}) + cos^2(θ)/(cos^2(θ)s_{22} - 2 sin(θ)cos(θ)s_{12} + sin^2(θ)s_{11})

a_{12} = cos(θ)sin(θ)/(cos^2(θ)s_{11} + 2 sin(θ)cos(θ)s_{12} + sin^2(θ)s_{22}) - sin(θ)cos(θ)/(cos^2(θ)s_{22} - 2 sin(θ)cos(θ)s_{12} + sin^2(θ)s_{11})
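Although the particular forms in footnote 2 are unimportant for the development, they are easy to evaluate mechanically. The sketch below computes a_{11}, a_{12}, and a_{22} from θ and from s_{11}, s_{12}, s_{22}, and then evaluates (1-19); the numerical inputs are illustrative choices, not data from the text.

```python
import math

def weights(theta, s11, s12, s22):
    # a11, a12, a22 of footnote 2; the two denominators are the sample
    # variances along the rotated x~1 and x~2 axes.
    c, s = math.cos(theta), math.sin(theta)
    d1 = c*c*s11 + 2*s*c*s12 + s*s*s22
    d2 = c*c*s22 - 2*s*c*s12 + s*s*s11
    a11 = c*c/d1 + s*s/d2
    a22 = s*s/d1 + c*c/d2
    a12 = c*s/d1 - s*c/d2
    return a11, a12, a22

def dist(x1, x2, a11, a12, a22):
    # Equation (1-19).
    return math.sqrt(a11*x1**2 + 2*a12*x1*x2 + a22*x2**2)

# Illustrative values: theta = 26 degrees, hypothetical variances/covariance.
a11, a12, a22 = weights(math.radians(26), s11=7.0, s12=3.0, s22=4.0)
print(dist(4, -2, a11, a12, a22))
```

Computing the same distance via the rotated coordinates (1-17) and (1-18) gives the identical answer, which is the point of Exercise 1.9.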
The generalization of the distance formulas of (1-19) and (1-20) to p dimensions is straightforward. Let P = (x_1, x_2, ..., x_p) be a point whose coordinates represent variables that are correlated and subject to inherent variability. Let O = (0, 0, ..., 0) denote the origin, and let Q = (y_1, y_2, ..., y_p) be a specified fixed point. Then the distances from P to O and from P to Q have the general forms

d(O, P) = \sqrt{a_{11}x_1^2 + a_{22}x_2^2 + ... + a_{pp}x_p^2 + 2a_{12}x_1x_2 + 2a_{13}x_1x_3 + ... + 2a_{p-1,p}x_{p-1}x_p}    (1-22)

and

d(P, Q) = \sqrt{a_{11}(x_1 - y_1)^2 + a_{22}(x_2 - y_2)^2 + ... + a_{pp}(x_p - y_p)^2 + 2a_{12}(x_1 - y_1)(x_2 - y_2) + 2a_{13}(x_1 - y_1)(x_3 - y_3) + ... + 2a_{p-1,p}(x_{p-1} - y_{p-1})(x_p - y_p)}    (1-23)

where the a's are numbers such that the distances are always nonnegative. We note that the distances in (1-22) and (1-23) are completely determined by the coefficients (weights) a_{ik}, i = 1, 2, ..., p, k = 1, 2, ..., p. These coefficients can be set out in the rectangular array

[ a_{11}  a_{12}  ...  a_{1p} ]
[ a_{12}  a_{22}  ...  a_{2p} ]    (1-24)
[  ...     ...    ...   ...   ]
[ a_{1p}  a_{2p}  ...  a_{pp} ]

where the a_{ik}'s with i ≠ k are displayed twice, since they are multiplied by 2 in the distance formulas. Consequently, the entries in this array specify the distance functions. The a_{ik}'s cannot be arbitrary numbers; they must be such that the computed distance is nonnegative for every pair of points. (See Exercise 1.10.) Contours of constant distances computed from (1-22) and (1-23) are hyperellipsoids. A hyperellipsoid resembles a football when p = 3; it is impossible to visualize in more than three dimensions. Algebraic expressions for the squares of the distances in (1-22) and (1-23) are known as quadratic forms and, in particular, positive definite quadratic forms. It is possible to display these quadratic forms in a simpler manner using matrix algebra; we shall do so in Section 2.3 of Chapter 2.

The need to consider statistical rather than Euclidean distance is illustrated heuristically in Figure 1.25. Figure 1.25 depicts a cluster of points whose center of gravity (sample mean) is indicated by the point Q. Consider the Euclidean distances from the point Q to the point P and the origin O. The Euclidean distance from Q to P is larger than the Euclidean distance from Q to O. However, P appears to be more like the points in the cluster than does the origin. If we take into account the variability of the points in the cluster and measure distance by the statistical distance in (1-20), then Q will be closer to P than to O. This result seems reasonable, given the nature of the scatter.

[Figure 1.25: A cluster of points relative to a point P and the origin.]

Other measures of distance can be advanced. (See Exercise 1.12.) At times, it is useful to consider distances that are not related to circles or ellipses. Any distance measure d(P, Q) between two points P and Q is valid provided that it satisfies the following properties, where R is any other intermediate point:

d(P, Q) = d(Q, P)
d(P, Q) > 0 if P ≠ Q
d(P, Q) = 0 if P = Q
d(P, Q) ≤ d(P, R) + d(R, Q)  (triangle inequality)    (1-25)
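A quick numerical check of whether a coefficient array defines a legitimate distance is possible: the squared distance is a quadratic form, and it is nonnegative for every point exactly when the symmetric array of weights has no negative eigenvalues (a fact developed in Chapter 2). The sketch below applies this check to two illustrative arrays.

```python
import numpy as np

def is_valid_distance_array(A):
    # The quadratic form x'Ax is nonnegative for all x exactly when the
    # symmetric array A has no negative eigenvalues.
    return bool(np.all(np.linalg.eigvalsh(A) >= 0))

# a11 = 4, a22 = 1, a12 = -1: defines a legitimate statistical distance.
print(is_valid_distance_array(np.array([[4., -1.], [-1., 1.]])))  # True

# a11 = 1, a22 = -2, a12 = 0: the "squared distance" x1^2 - 2*x2^2
# can be negative, so this is not a valid distance.
print(is_valid_distance_array(np.array([[1., 0.], [0., -2.]])))   # False
```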
1.6 Final Comments

We have attempted to motivate the study of multivariate analysis and to provide you with some rudimentary, but important, methods for organizing, summarizing, and displaying data. In addition, a general concept of distance has been introduced that will be used repeatedly in later chapters.

Exercises

1.1. Consider the seven pairs of measurements (x_1, x_2) plotted in Figure 1.1:

x_1: 3  4  2  6  8  2  5
x_2: 5  5.5  4  7  10  5  7.5

Calculate the sample means \bar{x}_1 and \bar{x}_2, the sample variances s_{11} and s_{22}, and the sample covariance s_{12}.

1.2. A morning newspaper lists the following used-car prices for a foreign compact with age x_1 measured in years and selling price x_2 measured in thousands of dollars:

x_1: 1  2  3  3  4  5  6  8  9  11
x_2: 18.95  19.00  17.95  15.54  14.00  12.95  8.94  7.49  6.00  3.99

(a) Construct a scatter plot of the data and marginal dot diagrams.
(b) Infer the sign of the sample covariance s_{12} from the scatter plot.
(c) Compute the sample means \bar{x}_1 and \bar{x}_2 and the sample variances s_{11} and s_{22}. Compute the sample covariance s_{12} and the sample correlation coefficient r_{12}. Interpret these quantities.
(d) Display the sample mean array \bar{x}, the sample variance-covariance array S_n, and the sample correlation array R using (1-8).

1.3. The following are five measurements on the variables x_1, x_2, and x_3:

x_1: 9  2  6  5  8
x_2: 12  8  6  4  10
x_3: 3  4  0  2  1

Find the arrays \bar{x}, S_n, and R.

1.4. The world's 10 largest companies yield the following data:

The World's 10 Largest Companies

Company             | x_1 = sales (billions) | x_2 = profits (billions) | x_3 = assets (billions)
Citigroup           | 108.28 | 17.05 | 1,484.10
General Electric    | 152.36 | 16.59 |   750.33
American Intl Group |  95.04 | 10.91 |   766.42
Bank of America     |  65.45 | 14.14 | 1,110.46
HSBC Group          |  62.97 |  9.52 | 1,031.29
ExxonMobil          | 263.99 | 25.33 |   195.26
Royal Dutch/Shell   | 265.19 | 18.54 |   193.83
BP                  | 285.06 | 15.73 |   191.11
ING Group           |  92.01 |  8.10 | 1,175.16
Toyota Motor        | 165.68 | 11.13 |   211.15

From www.Forbes.com, partially based on Forbes, The Forbes Global 2000, April 18, 2005.

(a) Plot the scatter diagram and marginal dot diagrams for variables x_1 and x_2. Comment on the appearance of the diagrams.
(b) Compute \bar{x}_1, \bar{x}_2, s_{11}, s_{22}, s_{12}, and r_{12}. Interpret r_{12}.

1.5. Use the data in Exercise 1.4.
(a) Plot the scatter diagrams and dot diagrams for (x_2, x_3) and (x_1, x_3). Comment on the patterns.
(b) Compute the \bar{x}, S_n, and R arrays for (x_1, x_2, x_3).

1.6. The data in Table 1.5 are 42 measurements on air-pollution variables recorded at 12:00 noon in the Los Angeles area on different days. (See also the air-pollution data on the web at www.prenhall.com/statistics.)
(a) Plot the marginal dot diagrams for all the variables.
(b) Construct the \bar{x}, S_n, and R arrays, and interpret the entries in R.

Table 1.5  Air-Pollution Data

Wind (x_1) | Solar radiation (x_2) | CO (x_3) | NO (x_4) | NO_2 (x_5) | O_3 (x_6) | HC (x_7)
[42 daily measurements.]
Source: Data courtesy of Professor G. C. Tiao.
(a) Construct the twodimensional scatter plot for variables X2 and X3 and the marginal dot diagrams (or histograms). . Interpret the pairwise correlations. = (distance)2 to the origin 0 = (0. 2» = 1 + 1 = 2. (See also the mineralcontent data on the web at www. DefinethedistancefromthepointP = (Xl> X 2) (a) xi + + XIX2 = (distance)2 (b) xi . Consider the following eight pairs of measurements on two variables XI and x2: XI 4 5 c D E 1. and SI2 from (a).) Two different visual stimuli (SI and S2) produced responses in both the left eye (L) and the right eye (R) of subjects in the study groups. calculate the measurements on vanables XI and as: uming that the original coordmate axes are rotated through an angle of () . 2» = deeD. Sn.1 = = 1 1.0) using (119) and the expressions for all' a22. an investigator measured the mineral content of bones by photon absorptiometry.2).14. the distance between intersections (D. Define the distance between any two intersections (points) on the grid to be the "city block" distance. on a 13 scale). 2» = 1 + 1 = 2. 1). 1.26 0 [given cos (26 0 ) = .' . 1). such as sore throat or nausea). (C. Interpret the pairwise correlations. on a 15 scale).10. Sn. (The triangle mequahty IS more dIfficult to venfy. You are given the following n = 3 observations on p = 2 variables: Variable 1: Xll Variable 2: XI2 =2 X21 X22 =3 = 2 X31 X32 = 4 =1 =4 (a) Plot the pairs of observations in the twodimensional "variable space. 1.) 1. on a 03 scale). (a) Plot the twodimensional scatter diagram for the variables X2 and X4 for the multiplesclerosis group.16. 2). 1) to the point Q = (I.7.2) to the origin 0 = (0. and compute SII. Note: You will need SIl.8. (Within rounding error. (C. stant squared statistical distance 1 from the pomt Q. 2» + deeD. X3 (difference between responses of eyes to stimulus SI. and (C. on a 15 scale).7 (See also the radiotherapy data on the web at www. Sn. The following exercises contain fairly extensive data sets.2. (c) Using the Xl and X2 measurements from (b).)The data consist of average ratings over the course of treatment for patients undergoing radiotherapy.1/3 a 2 = 4/27 and aI2'= 1/9. (E. 1). 1. P) = max(lxd." That is. Sn. ISIL . (C. and R arrays for the nonmultipleSclerosis and multiplesclerosis groups separately.) 1. (C.com/statistics. (b) Plot the locus of points whose squared distance from the origin is (c) Generalize the foregoing distance expression to points in p dimenSIOns. Compare the distance calculated here with the distance calculated USIng the XI and X2 values in (d).prenhall. and R arrays. 3).899 and sin (26 ) = . S22.8. and S12: (b) Using (118). X3 (amount of sleep. Note: You will need SIl and S22 from (c). which we might call deeD. is given by deeD. X4 (amount of food consumed. (D. and al2 m footnote 2. 1» + d«C.6 contains some of the raw data discussed in Section 1.41 Assume the distance along a street between two intersections in either the NS or EW direction is 1 unit. and R arrays. 2). Some of the 98 measurements described in Section 1. 1). that is. 2 . 1: I 13 A I ge city has major roads laid out in a grid pattern. Sketch the focus of pomts that are a conWI all .com/statistics. 1). Also. 1) and (C. Xs (appetite. deeD. The values recorded in the table include Xl (subject's age). (b) Compute the X.4) to the origin. At the start of a study to determine whether exercise or dietary supplements would slow bone loss in older women.prenhall. 2». Comment on the appearance of the diagram. A computer may be necessary for the required calculations. 
The following exercises contain fairly extensive data sets. A computer may be necessary for the required calculations.

1.14. Table 1.6 contains some of the raw data discussed in Section 1.2. (See also the multiple-sclerosis data on the web at www.prenhall.com/statistics.) Two different visual stimuli (S1 and S2) produced responses in both the left eye (L) and the right eye (R) of subjects in the study groups. The values recorded in the table include x_1 (subject's age); x_2 (total response of both eyes to stimulus S1, that is, S1L + S1R); x_3 (difference between responses of eyes to stimulus S1, |S1L - S1R|); and so forth.
(a) Plot the two-dimensional scatter diagram for the variables x_2 and x_4 for the multiple-sclerosis group. Comment on the appearance of the diagram.
(b) Compute the \bar{x}, S_n, and R arrays for the non-multiple-sclerosis and multiple-sclerosis groups separately.

1.15. Some of the 98 measurements described in Section 1.2 are listed in Table 1.7. (See also the radiotherapy data on the web at www.prenhall.com/statistics.) The data consist of average ratings over the course of treatment for patients undergoing radiotherapy. Variables measured include x_1 (number of symptoms, such as sore throat or nausea); x_2 (amount of activity, on a 1-5 scale); x_3 (amount of sleep, on a 1-5 scale); x_4 (amount of food consumed, on a 1-3 scale); x_5 (appetite, on a 1-5 scale); and x_6 (skin reaction, on a 0-3 scale).
(a) Construct the two-dimensional scatter plot for variables x_2 and x_3 and the marginal dot diagrams (or histograms). Do there appear to be any errors in the x_3 data?
(b) Compute the \bar{x}, S_n, and R arrays. Interpret the pairwise correlations.

1.16. At the start of a study to determine whether exercise or dietary supplements would slow bone loss in older women, an investigator measured the mineral content of bones by photon absorptiometry. Measurements were recorded for three bones on the dominant and nondominant sides and are shown in Table 1.8. (See also the mineral-content data on the web at www.prenhall.com/statistics.) Compute the \bar{x}, S_n, and R arrays. Interpret the pairwise correlations.
Table 1.6  Multiple-Sclerosis Data

Non-Multiple-Sclerosis Group: Subject number, x_1 (Age), x_2 (S1L + S1R), x_3 (|S1L - S1R|), x_4 (S2L + S2R), x_5 (|S2L - S2R|). [69 subjects.]
Multiple-Sclerosis Group: the same five variables. [29 subjects.]
Source: Data courtesy of Dr. G. G. Celesia.

Table 1.7  Radiotherapy Data

x_1 (Symptoms) | x_2 (Activity) | x_3 (Sleep) | x_4 (Eat) | x_5 (Appetite) | x_6 (Skin reaction)
[Average ratings for patients undergoing radiotherapy.]
Values of x_2 and x_3 less than 1.0 are due to errors in the data-collection process. Rows containing values of x_2 and x_3 less than 1.0 may be omitted.
Source: Data courtesy of Mrs. Annette Tealey, R.N.

Table 1.8  Mineral Content in Bones

Subject number | Dominant radius | Radius | Dominant humerus | Humerus | Dominant ulna | Ulna
[Measurements for 25 subjects.]
Source: Data courtesy of Everett Smith.
or 42.2 .765 .873 .9.8 8. Rows containing values of X2 and X3 less than 1.757 .059 2.7 Radiotherapy Data Xl X2 X3 X4 X5 X6 Symptoms .851 .4 5.273 2.674 . Some of the data described in Section 1.2 304.4 .0 .
Table 1.9  National Track Records for Women

Country | 100 m (s) | 200 m (s) | 400 m (s) | 800 m (min) | 1500 m (min) | 3000 m (min) | Marathon (min)
[Records for 54 countries, from Argentina through the United States; the Argentine 100-m record, for example, is 11.57 s.]
Source: IAAF/ATFS Track and Field Handbook for Helsinki 2005 (courtesy of Ottavio Castellini).
1.20. Refer to the bankruptcy data in Table 11.4, page 657, and on the web at www.prenhall.com/statistics.
(a) Using appropriate computer software, view the entire data set in x_1, x_2, x_3 space. Rotate the coordinate axes in various directions. Check for unusual observations.
(b) Highlight the set of points corresponding to the bankrupt firms. Examine various three-dimensional perspectives. Are there some orientations of three-dimensional space for which the bankrupt firms can be distinguished from the nonbankrupt firms? Are there observations in each of the two groups that are likely to have a significant impact on any rule developed to classify firms based on the sample means, variances, and covariances calculated from these data? (See Exercise 11.24.)

1.21. Refer to the milk transportation-cost data in Table 6.10, page 345, and on the web at www.prenhall.com/statistics.
(a) Using appropriate computer software, view the entire data set in three dimensions. Rotate the coordinate axes in various directions. Check for unusual observations.
(b) Highlight the set of points corresponding to gasoline trucks. Do any of the gasoline-truck points appear to be multivariate outliers? (See Exercise 6.17.) Are there some orientations of x_1, x_2, x_3 space for which the set of points representing gasoline trucks can be readily distinguished from the set of points representing diesel trucks?

1.22. Refer to the oxygen-consumption data in Table 6.12, page 348, and on the web at www.prenhall.com/statistics.
(a) Using appropriate computer software, view the entire data set in three dimensions employing various combinations of three variables to represent the coordinate axes. Begin with the x_1, x_2, x_3 space.
(b) Check this data set for outliers.

1.23. Using the data in Table 11.9, page 666, and on the web at www.prenhall.com/statistics, represent the cereals in each of the following ways.
(a) Stars.
(b) Chernoff faces. (Experiment with the assignment of variables to facial characteristics.)

1.24. Using the utility data in Table 12.4, page 688, and on the web at www.prenhall.com/statistics, represent the public utility companies as Chernoff faces with assignments of variables to facial characteristics different from those considered in Example 1.12. Compare your faces with the faces in Figure 1.17. Are different groupings indicated?
"Dynamic Graphics for Data Analysis.4 310. "Multivariate Analysis of National Track Records. 1. and G.0 521. Check for outliers.19 1." Applied Statistics.25 .5 1217. and T.8 32..59 (a) Compute the X. S. "Canonical Correlation Analysis in a Predictive System. H.8 FtFrBody 1128 1108 1011 993 996 997 991 928 990 992 PrctFFB 70. M. Manufacturers. paperback). . 2 (1970). WhichthreedimensionaI display appears to result in the best separation of the three breeds of bulls? Table 1.0 51. Visually group the companies into four or five clusters.1 51.1 71. Hulbert.35 .0 : SaleWt 1720 1575 1410 1595 1488 1454 1475 1375 1564 1458 2. W.10 Data on Bulls Breed 1 1 1 1 1 8 8 8 8 8 SalePr 2200 2250 . C. 68. Also included in the taBle are the selling prices (SalePr) of these bulls. "Profiles of Product Innovators among Large U. Yellowstone Yosemite Zion Size (acres) 47.8 71. 38.26. Do some of these variables appear to distinguish one breed from another? (b) View the data in three dimensions using the variables Breed." Journal of the American Statistital Association. 43. Lehman. 1625 4600 2150 1450 1200 1425 1250 1500 YrHgt 51. Sampling Techniques (3rd ed. no.1 56. and A. Experimental Designs (2nd ed.10 .27. "Information Contained in Sediment Size Analysis. (c) Would the correlation in Part b change if you measure size in square miles instead of acres? Explain.53 1.com/statistics. 1. SO.11 presents the 2005 attendance (millions) at the fIfteen most visited national parks and their size (acres). Using the data in Table 12. 1.0 70. and BkFat. and SaleHt.84 3.6 922. Table 1.10 (see the bull data on the web at www.8 5. P. Drop this point andrecaIculate the correlation coefficient. R. 8. "Using Faces to Represent Points in KDimensional Space Graphically. Dunham. Interpret the pairwise correlations. S. Cochran.34 3. 4 (1975). Cox.05 1. Chernoff. New York: John Wiley.25 .17 2. Farley. Frame. R. 4 (1987). Benjamin. 110115.prenhaIl.4 70.7 7 6 6 6 7 .355395. Cleveland. no. Davis. "Clustering Categories for Better Prediction of Computer Resources Utilization.. 342 (1973).23 4. Sn.6 73.11 Attendance and Size of National Parks N ationaI Park Arcadia Bruce Canyon Cuyahoga Valley Everglades Grand Canyon Grand Teton Great Smoky Hot Springs Olympic Mount Rainier Rocky Mountain Shenandoah . "Comparison of Discrimination Methods for the Classification ofThmors Using Gene Expression Data.. 1. 10.7 235. (a) Create a scatter plot and calculate the correlliltion coefficient.2 51.361368.40 2. 5. 9.4 35. R. 2 (1992).6 Visitors (millions) 2.3542. 3. FtFrBody." Journal of Experimental Education. 2.. A.9 72.8 761. B.0 2219.." Statistical Science. W.02 2. Y. no.30 2. 4. G.3 146. 2 (1991). Are the breeds well separated in this coordinate system? (c) Repeat part b using Breed.9 53.14 1. Igbaria. J. W. 40.4 55.8 70. 6." The American Statistician.15 55. no." Management Science. G. and R arrays. 105112." Mathematical Geology..9 54. BkFat .9 68. Dawkins.2 54." Journal of the American Statistical Association. 157169. S. New York: John Wiley.9 55. Speed.6 Frame 7 7 6 8 7 : References 1.6 68.15 .O 50.prenhall. 1992. The column headings (variables) are defined as follows: I Angus Breed = 5 Hereford { 8 Simental FtFrBody = Fat free body (pounds) Frame = Scale from 1 (small) to 8 (large) SaleHt = Sale height at shoulder (inches) Y rHgt = Yearling height at shoulder (inches) PrctFFB = Percent fatfree body BkFat = Back fat (inches) SaleWt = Sale weight (pounds) References 47 (b) Identify the park that is unusual.25. 
References

1. Becker, R. A., W. S. Cleveland, and A. R. Wilks. "Dynamic Graphics for Data Analysis." Statistical Science, 2, no. 4 (1987), 355-395.
2. Benjamin, Y., and M. Igbaria. "Clustering Categories for Better Prediction of Computer Resources Utilization." Applied Statistics, 40, no. 2 (1991), 295-307.
3. Capon, N., J. Farley, D. Lehman, and J. Hulbert. "Profiles of Product Innovators among Large U.S. Manufacturers." Management Science, 38, no. 2 (1992), 157-169.
4. Chernoff, H. "Using Faces to Represent Points in K-Dimensional Space Graphically." Journal of the American Statistical Association, 68, no. 342 (1973), 361-368.
5. Cochran, W. G. Sampling Techniques (3rd ed.). New York: John Wiley, 1977.
6. Cochran, W. G., and G. M. Cox. Experimental Designs (2nd ed., paperback). New York: John Wiley, 1992.
7. Davis, J. C. "Information Contained in Sediment Size Analysis." Mathematical Geology, 2, no. 2 (1970), 105-112.
8. Dawkins, B. "Multivariate Analysis of National Track Records." The American Statistician, 43, no. 2 (1989), 110-115.
9. Dudoit, S., J. Fridlyand, and T. P. Speed. "Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data." Journal of the American Statistical Association, 97, no. 457 (2002), 77-87.
10. Dunham, R. B., and D. J. Kravetz. "Canonical Correlation Analysis in a Predictive System." Journal of Experimental Education, 43, no. 4 (1975), 35-42.
11. Everitt, B. Graphical Techniques for Multivariate Data. New York: North-Holland, 1978.
12. Gable, G. G. "A Multidimensional Model of Client Success When Engaging External Consultants." Management Science, 42, no. 8 (1996), 1175-1198.
13. Halinar, J. C. "Principal Component Analysis in Plant Breeding." Unpublished report based on data collected by Dr. F. A. Bliss, University of Wisconsin, 1979.
14. Johnson, R. A., and G. K. Bhattacharyya. Statistics: Principles and Methods (5th ed.). New York: John Wiley, 2005.
15. Kim, L., and Y. Kim. "Innovation in a Newly Industrializing Country: A Multiple Discriminant Analysis." Management Science, 31, no. 3 (1985), 312-322.
16. Klatzky, S. R., and R. W. Hodge. "A Canonical Correlation Analysis of Occupational Mobility." Journal of the American Statistical Association, 66, no. 333 (1971), 16-22.
17. Lee, J. "Relationships Between Properties of Pulp-Fibre and Paper." Unpublished doctoral thesis, University of Toronto, Faculty of Forestry, 1992.
18. MacCrimmon, K., and D. Wehrung. "Characteristics of Risk Taking Executives." Management Science, 36, no. 4 (1990), 422-435.
19. Marriott, F. H. C. The Interpretation of Multiple Observations. London: Academic Press, 1974.
20. Mather, P. M. "Study of Factors Influencing Variation in Size Characteristics in Fluvioglacial Sediments." Mathematical Geology, 4, no. 3 (1972), 219-234.
21. McLaughlin, M., et al. "Professional Mediators' Judgments of Mediation Tactics: Multidimensional Scaling and Cluster Analysis." Journal of Applied Psychology, 76, no. 3 (1991), 465-473.
22. Naik, D. N., and R. Khattree. "Revisiting Olympic Track Records: Some Practical Considerations in the Principal Component Analysis." The American Statistician, 50, no. 2 (1996), 140-144.
23. Nason, G. "Three-Dimensional Projection Pursuit." Applied Statistics, 44, no. 4 (1995), 411-430.
24. Smith, M., and R. Taffler. "Improving the Communication Function of Published Accounting Statements." Accounting and Business Research, 14, no. 54 (1984), 139-146.
25. Spenner, K. I. "From Generation to Generation: The Transmission of Occupation." Ph.D. dissertation, University of Wisconsin, 1977.
26. Tabakoff, B., et al. "Differences in Platelet Enzyme Activity between Alcoholics and Nonalcoholics." New England Journal of Medicine, 318, no. 3 (1988), 134-139.
27. Timm, N. H. Multivariate Analysis with Applications in Education and Psychology. Monterey, CA: Brooks/Cole, 1975.
28. Trieschmann, J. S., and G. E. Pinches. "A Multivariate Model for Predicting Financially Distressed P-L Insurers." Journal of Risk and Insurance, 40, no. 3 (1973), 327-338.
29. Tukey, J. W. Exploratory Data Analysis. Reading, MA: Addison-Wesley, 1977.
30. Wainer, H., and D. Thissen. "Graphical Data Analysis." Annual Review of Psychology, 32 (1981), 191-241.
31. Wartzman, R. "Don't Wave a Red Flag at the IRS." The Wall Street Journal (February 24, 1993), C1, C15.
32. Weihs, C., and H. Schmidli. "OMEGA (On Line Multivariate Exploratory Graphical Analysis): Routine Searching for Structure." Statistical Science, 5, no. 2 (1990), 175-226.
and D. M. 2005. "OMEGA (On Line Multivariate Exploratory Graphical Analysis): Routine Searching for Structure. Halinar. Graphical Techniques for Multivariate Data. and R. Marriott. 16. "A Multidimensional Model of Client Success when Engaging External Consultants." Accounting and Business Research. 17. S. 1975. N. "Improving the Communication Function of Published Accounting Statements. 15.327338. no. for instance." Management Science." The American Statistician. 1974. London: Academic Press. and G." Unpublished doctoral thesis. University of Wisconsin. 14." Unpublished report based on data collected by Dr. 50. 31. 13. 31. 25. Gable. "From Generation to Generation: The nansmission of Occupation. Lee.. . Thissen. X2. "Revisiting Olympic Track Records: Some Practical Considerations in the Principal Component Analysis.
A vector x can be represented geometrically as a directed line in n dimensions, with component x1 along the first axis, x2 along the second axis, ..., and xn along the nth axis. This is illustrated in Figure 2.1 for n = 3.

[Figure 2.1  The vector x′ = [1, 3, 2].]

A vector can be expanded or contracted by multiplying it by a constant c. In particular, we define the vector cx as

    cx = [cx1; cx2; ...; cxn]

That is, cx is the vector obtained by multiplying each element of x by c. [See Figure 2.2(a).]

Two vectors may be added. Addition of x and y is defined as

    x + y = [x1 + y1; x2 + y2; ...; xn + yn]

so that x + y is the vector with ith element xi + yi. The sum of two vectors emanating from the origin is the diagonal of the parallelogram formed with the two original vectors as adjacent sides. [See Figure 2.2(b).]

[Figure 2.2  Scalar multiplication and vector addition.]

A vector has both direction and length. In n = 2 dimensions, we consider the vector x = [x1; x2]. The length of x, written Lx, is defined to be

    Lx = √(x1² + x2²)

Geometrically, the length of a vector in two dimensions can be viewed as the hypotenuse of a right triangle; this is demonstrated schematically in Figure 2.3.

[Figure 2.3  Length of x = √(x1² + x2²).]

The length of a vector x′ = [x1, x2, ..., xn], with n components, is defined by

    Lx = √(x1² + x2² + ··· + xn²)        (2-1)

Multiplication of a vector x by a scalar c changes the length. From Equation (2-1),

    Lcx = √(c²x1² + c²x2² + ··· + c²xn²) = |c| √(x1² + x2² + ··· + xn²) = |c| Lx        (2-2)

Multiplication by c does not change the direction of the vector x if c > 0. However, a negative value of c creates a vector with a direction opposite that of x. From (2-2), it is clear that x is expanded if |c| > 1 and contracted if 0 < |c| < 1. Choosing c = 1/Lx, we obtain the unit vector x/Lx, which has length 1 and lies in the direction of x.
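The text itself contains no computer code, but the length facts in (2-1) and (2-2) are easy to check numerically. The following is a minimal sketch of ours in Python with the NumPy library, using the vector of Figure 2.1:

    import numpy as np

    x = np.array([1.0, 3.0, 2.0])        # the vector of Figure 2.1
    L_x = np.sqrt(x @ x)                 # length, equation (2-1): sqrt(14) = 3.742
    print(L_x)

    c = -2.0
    L_cx = np.sqrt((c * x) @ (c * x))    # equation (2-2): |c| * L_x
    print(np.isclose(L_cx, abs(c) * L_x))

    u = x / L_x                          # unit vector in the direction of x
    print(np.sqrt(u @ u))                # 1.0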
A second geometrical concept is angle. Consider two vectors in a plane and the angle θ between them, as in Figure 2.4. From the figure, θ can be represented as the difference between the angles θ1 and θ2 formed by the two vectors and the first coordinate axis. Since, by definition,

    cos(θ1) = x1/Lx,  sin(θ1) = x2/Lx,  cos(θ2) = y1/Ly,  sin(θ2) = y2/Ly

and

    cos(θ) = cos(θ2 − θ1) = cos(θ2) cos(θ1) + sin(θ2) sin(θ1)

the angle θ between the two vectors x′ = [x1, x2] and y′ = [y1, y2] is specified by

    cos(θ) = (y1/Ly)(x1/Lx) + (y2/Ly)(x2/Lx) = (x1y1 + x2y2)/(Lx Ly)        (2-3)

We find it convenient to introduce the inner product of two vectors. For n = 2 dimensions, the inner product of x and y is

    x′y = x1y1 + x2y2

With this definition and Equation (2-3), Lx = √(x′x) and cos(θ) = x′y/(√(x′x) √(y′y)).

For an arbitrary number of dimensions n, we define the inner product of x and y as

    x′y = x1y1 + x2y2 + ··· + xnyn        (2-4)

The inner product is denoted by either x′y or y′x. Using the inner product, we have the natural extension of length and angle to vectors of n components:

    Lx = length of x = √(x′x)        (2-5)

    cos(θ) = x′y/(Lx Ly) = x′y/(√(x′x) √(y′y))        (2-6)

Since cos(90°) = cos(270°) = 0 and cos(θ) = 0 only if x′y = 0, we say that x and y are perpendicular when x′y = 0.

[Figure 2.4  The angle θ between x′ = [x1, x2] and y′ = [y1, y2].]

Example 2.1 (Calculating lengths of vectors and the angle between them)  Given the vectors x′ = [1, 3, 2] and y′ = [−2, 1, −1], find 3x and x + y. Next, determine the length of x, the length of y, and the angle between x and y. Also, check that the length of 3x is three times the length of x.

First, 3x′ = [3, 9, 6] and (x + y)′ = [1 − 2, 3 + 1, 2 − 1] = [−1, 4, 1]. Next, x′x = 1² + 3² + 2² = 14, y′y = (−2)² + 1² + (−1)² = 6, and x′y = 1(−2) + 3(1) + 2(−1) = −1. Therefore,

    Lx = √14 = 3.742,  Ly = √6 = 2.449

and

    cos(θ) = x′y/(Lx Ly) = −1/(3.742 × 2.449) = −.109,  so θ = 96.3°

Finally, L3x = √(3² + 9² + 6²) = √126 and 3Lx = 3√14 = √126, showing L3x = 3Lx. ■

A pair of vectors x and y of the same dimension is said to be linearly dependent if there exist constants c1 and c2, both not zero, such that c1x + c2y = 0. A set of vectors x1, x2, ..., xk is said to be linearly dependent if there exist constants c1, c2, ..., ck, not all zero, such that

    c1x1 + c2x2 + ··· + ckxk = 0        (2-7)

Linear dependence implies that at least one vector in the set can be written as a linear combination of the other vectors. Vectors of the same dimension that are not linearly dependent are said to be linearly independent.
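Example 2.1 can be verified in a few lines of NumPy; this sketch is our illustration, not part of the text:

    import numpy as np

    x = np.array([1.0, 3.0, 2.0])
    y = np.array([-2.0, 1.0, -1.0])

    print(x @ y)                               # inner product x'y = -1, equation (2-4)
    L_x, L_y = np.sqrt(x @ x), np.sqrt(y @ y)  # 3.742 and 2.449, equation (2-5)
    cos_theta = (x @ y) / (L_x * L_y)          # -0.109, equation (2-6)
    print(np.degrees(np.arccos(cos_theta)))    # about 96.3 degrees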
Example 2.2 (Identifying linearly independent vectors)  Consider the set of vectors

    x1′ = [1, 2, 1],  x2′ = [1, 0, −1],  x3′ = [1, −2, 1]

Setting c1x1 + c2x2 + c3x3 = 0 implies that

    c1 + c2 + c3 = 0
    2c1 − 2c3 = 0
    c1 − c2 + c3 = 0

with the unique solution c1 = c2 = c3 = 0. As we cannot find three constants c1, c2, and c3, not all zero, such that c1x1 + c2x2 + c3x3 = 0, the vectors x1, x2, and x3 are linearly independent. ■

The projection (or shadow) of a vector x on a vector y is

    projection of x on y = (x′y/y′y) y = (x′y/Ly)(1/Ly) y        (2-8)

where the vector (1/Ly)y has unit length. The length of the projection is

    length of projection = |x′y|/Ly = Lx |x′y|/(Lx Ly) = Lx |cos(θ)|        (2-9)

where θ is the angle between x and y. (See Figure 2.5.)

[Figure 2.5  The projection of x on y.]

Matrices

A matrix is any rectangular array of real numbers. We denote an arbitrary array of n rows and p columns by

    A (n×p) = [a11 a12 ··· a1p; a21 a22 ··· a2p; ⋮ ; an1 an2 ··· anp]

Many of the vector concepts just introduced have direct generalizations to matrices.

The transpose operation A′ of a matrix changes the columns into rows, so that the first column of A becomes the first row of A′, the second column becomes the second row, and so forth.

Example 2.3 (The transpose of a matrix)  If

    A (2×3) = [3 −1 2; 1 5 4]

then

    A′ (3×2) = [3 1; −1 5; 2 4] ■

A matrix may also be multiplied by a constant c. The product cA is the matrix that results from multiplying each element of A by c; that is, cA = {caij}.

Two matrices A and B of the same dimensions can be added. The sum A + B has (i, j)th entry aij + bij.

Example 2.4 (The sum of two matrices and multiplication of a matrix by a constant)  If

    A (2×3) = [0 3 1; 1 −1 1]  and  B (2×3) = [1 −2 −3; 2 5 1]

then

    4A = [0 12 4; 4 −4 4]  and  A + B = [1 1 −2; 3 4 2] ■
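As a brief illustration of ours (again assuming only NumPy), the matrix operations of Examples 2.3 and 2.4 translate directly:

    import numpy as np

    A = np.array([[3.0, -1.0, 2.0],
                  [1.0,  5.0, 4.0]])
    print(A.T)                        # transpose, as in Example 2.3

    A = np.array([[0.0,  3.0, 1.0],
                  [1.0, -1.0, 1.0]])
    B = np.array([[1.0, -2.0, -3.0],
                  [2.0,  5.0,  1.0]])
    print(4 * A)                      # [[0, 12, 4], [4, -4, 4]], as in Example 2.4
    print(A + B)                      # [[1, 1, -2], [3, 4, 2]]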
It is also possible to define the multiplication of two matrices if the dimensions of the matrices conform in the following manner: When A is (n × k) and B is (k × p), so that the number of elements in a row of A is the same as the number of elements in a column of B, we can form the matrix product AB. An element of the new matrix AB is formed by taking the inner product of each row of A with each column of B.

The matrix product AB is the (n × p) matrix whose entry in the ith row and jth column is the inner product of the ith row of A and the jth column of B:

    (i, j) entry of AB = ai1b1j + ai2b2j + ··· + aikbkj = Σ_{ℓ=1}^k aiℓ bℓj        (2-10)

When k = 4, we have four products to add for each entry in the matrix AB: the ith row of A, [ai1 ai2 ai3 ai4], is matched against the jth column of B, [b1j; b2j; b3j; b4j].

Example 2.5 (Matrix multiplication)  If

    A = [3 −1 2; 1 5 4],  b = [−2; 7; 9],  and  C = [2 0; 1 −1]

then

    A (2×3) b (3×1) = [3(−2) + (−1)(7) + 2(9); 1(−2) + 5(7) + 4(9)] = [5; 69]

and

    C (2×2) A (2×3) = [2(3) + 0(1)  2(−1) + 0(5)  2(2) + 0(4); 1(3) − 1(1)  1(−1) − 1(5)  1(2) − 1(4)]
                    = [6 −2 4; 2 −6 −2] ■

When a matrix B consists of a single column, it is customary to use the lowercase b vector notation.

Example 2.6 (Some typical products and their dimensions)  Let b′ = [7, −3, 6] and c′ = [5, 8, −4]. The product b′c is a 1 × 1 matrix, or a single number:

    b′c = 7(5) + (−3)(8) + 6(−4) = −13

The product bc′ is a matrix whose row dimension equals the dimension of b and whose column dimension equals that of c:

    bc′ = [7; −3; 6][5 8 −4] = [35 56 −28; −15 −24 12; 30 48 −24]

This product is unlike b′c, which is a single number. The product d′Ab, for conformable d′ and A, is likewise a 1 × 1 matrix, or a single number, here 26. ■

Square matrices will be of special importance in our development of statistical methods. A square matrix is said to be symmetric if A = A′ or, equivalently, if aij = aji for all i and j.
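Matrix products such as those in Examples 2.5 and 2.6 can be confirmed with NumPy's @ operator and np.outer; this is our sketch, not the text's:

    import numpy as np

    A = np.array([[3.0, -1.0, 2.0],
                  [1.0,  5.0, 4.0]])
    b = np.array([-2.0, 7.0, 9.0])
    print(A @ b)                      # [5, 69], as in Example 2.5

    C = np.array([[2.0,  0.0],
                  [1.0, -1.0]])
    print(C @ A)                      # [[6, -2, 4], [2, -6, -2]]

    bb = np.array([7.0, -3.0, 6.0])
    c = np.array([5.0, 8.0, -4.0])
    print(bb @ c)                     # b'c = -13, a single number
    print(np.outer(bb, c))            # bc', a 3 x 3 matrix, as in Example 2.6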
Example 2.7 (A symmetric matrix)  The matrix

    A = [3 5; 5 −2]

is symmetric; the matrix

    B = [3 6; 4 −2]

is not symmetric. ■

When two square matrices A and B are of the same dimension, both products AB and BA are defined, although they need not be equal. (See Supplement 2A.) If we let I denote the square matrix with ones on the diagonal and zeros elsewhere, it follows from the definition of matrix multiplication that the (i, j)th entry of AI is ai1 × 0 + ··· + ai,j−1 × 0 + aij × 1 + ai,j+1 × 0 + ··· + aik × 0 = aij, so AI = A. Similarly, IA = A, so

    I (k×k) A (k×k) = A (k×k) I (k×k) = A (k×k)   for any A (k×k)        (2-11)

The matrix I acts like 1 in ordinary multiplication (1·a = a·1 = a), so it is called the identity matrix.

The fundamental scalar relation about the existence of an inverse number a⁻¹, such that a⁻¹a = aa⁻¹ = 1 if a ≠ 0, has the following matrix algebra extension: If there exists a matrix B such that

    B (k×k) A (k×k) = A (k×k) B (k×k) = I (k×k)

then B is called the inverse of A and is denoted by A⁻¹.

The technical condition that an inverse exists is that the k columns a1, a2, ..., ak of A are linearly independent. That is, the existence of A⁻¹ is equivalent to

    c1a1 + c2a2 + ··· + ckak = 0 only if c1 = ··· = ck = 0        (2-12)

(See Result 2A.9 in Supplement 2A.)

Example 2.8 (The existence of a matrix inverse)  For

    A = [3 2; 4 1]

you may verify that

    [−.2 .4; .8 −.6][3 2; 4 1] = [(−.2)3 + (.4)4  (−.2)2 + (.4)1; (.8)3 + (−.6)4  (.8)2 + (−.6)1] = [1 0; 0 1]

so [−.2 .4; .8 −.6] is A⁻¹. We note that

    c1[3; 4] + c2[2; 1] = [0; 0]

implies that c1 = c2 = 0, so the columns of A are linearly independent. This confirms the condition stated in (2-12). ■

A method for computing an inverse, when one exists, is given in Supplement 2A. The routine, but lengthy, calculations are usually relegated to a computer, especially when the dimension is greater than three. Even so, you must be forewarned that if the column sum in (2-12) is nearly 0 for some constants c1, ..., ck, then the computer may produce incorrect inverses due to extreme errors in rounding. It is always good to check the products AA⁻¹ and A⁻¹A for equality with I when A⁻¹ is produced by a computer package. (See Exercise 2.10.)

Diagonal matrices have inverses that are easy to compute. For example, the diagonal matrix with diagonal elements a11, a22, ..., a55 has inverse equal to the diagonal matrix with diagonal elements 1/a11, 1/a22, ..., 1/a55, provided all the aii ≠ 0.

Another special class of square matrices with which we shall become familiar are the orthogonal matrices, characterized by

    QQ′ = Q′Q = I  or  Q′ = Q⁻¹        (2-13)

The name derives from the property that if Q has ith row qi′, then QQ′ = I implies that qi′qi = 1 and qi′qj = 0 for i ≠ j, so the rows have unit length and are mutually perpendicular (orthogonal). According to the condition Q′Q = I, the columns have the same property.

We conclude our brief introduction to the elements of matrix algebra by introducing a concept fundamental to multivariate statistical analysis. A square matrix A is said to have an eigenvalue λ, with corresponding eigenvector x ≠ 0, if

    Ax = λx        (2-14)
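Following the advice above about checking computed inverses, here is a short NumPy sketch of ours using the matrix of Example 2.8:

    import numpy as np

    A = np.array([[3.0, 2.0],
                  [4.0, 1.0]])
    A_inv = np.linalg.inv(A)
    print(A_inv)                      # [[-0.2, 0.4], [0.8, -0.6]], as in Example 2.8

    # Good practice: confirm both products equal I before trusting a computed inverse
    print(np.allclose(A @ A_inv, np.eye(2)))
    print(np.allclose(A_inv @ A, np.eye(2)))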
Ordinarily, we normalize x so that it has length unity; that is, 1 = x′x. It is convenient to denote normalized eigenvectors by e, and we do so in what follows. Sparing you the details of the derivation (see [1]), we state the following basic result: Let A be a k × k square symmetric matrix. Then A has k pairs of eigenvalues and eigenvectors, namely,

    λ1, e1;  λ2, e2;  ...;  λk, ek        (2-15)

The eigenvectors can be chosen to satisfy 1 = e1′e1 = ··· = ek′ek and be mutually perpendicular, ei′ej = 0 for i ≠ j. The eigenvectors are unique unless two or more eigenvalues are equal.

Example 2.9 (Verifying eigenvalues and eigenvectors)  Let

    A = [1 −5; −5 1]

Then, since

    [1 −5; −5 1][1; −1] = [6; −6] = 6[1; −1]

λ1 = 6 is an eigenvalue with corresponding eigenvector x1 = [1; −1]. Normalizing, e1′ = [1/√2, −1/√2] is its corresponding normalized eigenvector, since the sum of the squares of its elements is unity. You may wish to show that a second eigenvalue-eigenvector pair is λ2 = −4, e2′ = [1/√2, 1/√2]. ■

A method for calculating the λ's and e's is described in Supplement 2A. It is instructive to do a few sample calculations to understand the technique. We usually rely on a computer when the dimension of the square matrix is greater than two or three.

The spectral decomposition of a k × k symmetric matrix A is given by

    A = λ1 e1e1′ + λ2 e2e2′ + ··· + λk ekek′        (2-16)

where λ1, λ2, ..., λk are the eigenvalues of A and e1, e2, ..., ek are the associated normalized eigenvectors. Thus, ei′ei = 1 for i = 1, 2, ..., k, and ei′ej = 0 for i ≠ j. (A proof of Equation (2-16) is beyond the scope of this book. The interested reader will find a proof in [6], Chapter 8.)
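A numerical illustration of (2-16), written by us and not part of the text, uses NumPy's symmetric eigensolver on the matrix of Example 2.10, which follows:

    import numpy as np

    A = np.array([[13.0, -4.0,  2.0],
                  [-4.0, 13.0, -2.0],
                  [ 2.0, -2.0, 10.0]])
    lam, P = np.linalg.eigh(A)        # eigenvalues in ascending order: 9, 9, 18
    print(lam)

    # Rebuild A from the spectral decomposition (2-16): sum of lam_i * e_i e_i'
    A_rebuilt = sum(lam[i] * np.outer(P[:, i], P[:, i]) for i in range(3))
    print(np.allclose(A, A_rebuilt))  # True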
Example 2.10 (The spectral decomposition of a matrix)  Consider the symmetric matrix

    A = [13 −4 2; −4 13 −2; 2 −2 10]

The eigenvalues obtained from the characteristic equation |A − λI| = 0 are λ1 = 9, λ2 = 9, and λ3 = 18 (Definition 2A.30). The corresponding eigenvectors e1, e2, and e3 are the (normalized) solutions of the equations Aei = λiei for i = 1, 2, 3. Thus, Ae1 = λ1e1 gives

    13e11 − 4e21 + 2e31 = 9e11
    −4e11 + 13e21 − 2e31 = 9e21
    2e11 − 2e21 + 10e31 = 9e31

Moving the terms on the right of the equals sign to the left yields three homogeneous equations in three unknowns, but two of the equations are redundant. Selecting one of the equations and arbitrarily setting e11 = 1 and e21 = 1, we find that e31 = 0. Consequently, the normalized eigenvector is e1′ = [1/√(1² + 1² + 0²), 1/√(1² + 1² + 0²), 0] = [1/√2, 1/√2, 0], since the sum of the squares of its elements is unity. You may verify that e2′ = [1/√18, −1/√18, −4/√18] is also an eigenvector for 9 = λ2, and e3′ = [2/3, −2/3, 1/3] is the normalized eigenvector corresponding to the eigenvalue λ3 = 18. The spectral decomposition of A is then

    A = λ1 e1e1′ + λ2 e2e2′ + λ3 e3e3′

or

    [13 −4 2; −4 13 −2; 2 −2 10]
      = 9 [1/√2; 1/√2; 0][1/√2 1/√2 0] + 9 [1/√18; −1/√18; −4/√18][1/√18 −1/√18 −4/√18]
        + 18 [2/3; −2/3; 1/3][2/3 −2/3 1/3]

as you may readily verify. ■

2.3 Positive Definite Matrices

The study of the variation and interrelationships in multivariate data is often based upon distances and the assumption that the data are multivariate normally distributed. Squared distances (see Chapter 1) and the multivariate normal density can be expressed in terms of matrix products called quadratic forms (see Chapter 4). Consequently, it should not be surprising that quadratic forms play a central role in multivariate analysis. In this section, we consider quadratic forms that are always nonnegative and the associated positive definite matrices.

Because x′Ax has only squared terms xi² and product terms xixk, it is called a quadratic form. When a k × k symmetric matrix A is such that

    0 ≤ x′Ax        (2-17)

for all x′ = [x1, x2, ..., xk], both the matrix A and the quadratic form are said to be nonnegative definite. If equality holds in (2-17) only for the vector x′ = [0, 0, ..., 0], then A or the quadratic form is said to be positive definite. In other words, A is positive definite if

    0 < x′Ax        (2-18)

for all vectors x ≠ 0.

Example 2.11 (A positive definite matrix and quadratic form)  Show that the matrix for the following quadratic form is positive definite:

    3x1² + 2x2² − 2√2 x1x2

To illustrate the general approach, we first write the quadratic form in matrix notation as

    [x1 x2][3 −√2; −√2 2][x1; x2] = x′Ax

By Definition 2A.30, the eigenvalues of A are the solutions of the equation |A − λI| = 0, or (3 − λ)(2 − λ) − 2 = 0. The solutions are λ1 = 4 and λ2 = 1. Using the spectral decomposition in (2-16), we can write

    A = λ1 e1e1′ + λ2 e2e2′ = 4 e1e1′ + e2e2′

where e1 and e2 are the normalized and orthogonal eigenvectors associated with the eigenvalues λ1 = 4 and λ2 = 1, respectively. Because 4 and 1 are scalars, premultiplication and postmultiplication of A by x′ and x, respectively, give

    x′Ax = 4 x′e1e1′x + x′e2e2′x = 4y1² + y2² ≥ 0

with y1 = x′e1 = e1′x and y2 = x′e2 = e2′x. We now show that y1 and y2 are not both zero and, consequently, that x′Ax = 4y1² + y2² > 0, or A is positive definite.

From the definitions of y1 and y2, we have

    y = [y1; y2] = [e1′; e2′][x1; x2]  or  y (2×1) = E (2×2) x (2×1)

Now E is an orthogonal matrix and hence has inverse E′. Thus, x = E′y. But x is a nonzero vector, and 0 ≠ x = E′y implies that y ≠ 0. ■

Using the spectral decomposition, we can easily show that a k × k symmetric matrix A is a positive definite matrix if and only if every eigenvalue of A is positive. (See Exercise 2.17.) A is a nonnegative definite matrix if and only if all of its eigenvalues are greater than or equal to zero.
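The eigenvalue criterion just stated suggests a simple computational check. The sketch below is ours, applied to the matrix of the quadratic form in Example 2.11:

    import numpy as np

    A = np.array([[3.0,           -np.sqrt(2.0)],
                  [-np.sqrt(2.0),  2.0         ]])
    print(np.linalg.eigvalsh(A))      # [1, 4]: all positive, so A is positive definite

    rng = np.random.default_rng(0)
    x = rng.standard_normal(2)        # an arbitrary nonzero vector
    print(x @ A @ x > 0)              # the quadratic form x'Ax is positive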
Assume for the moment that the p elements x1, x2, ..., xp of a vector x are realizations of p random variables X1, X2, ..., Xp. As we pointed out in Chapter 1, we can regard these elements as the coordinates of a point in p-dimensional space, and the "distance" of the point [x1, x2, ..., xp]′ to the origin can, and in this case should, be interpreted in terms of standard deviation units. In this way, we can account for the inherent uncertainty (variability) in the observations. Points with the same associated "uncertainty" are regarded as being at the same distance from the origin.

If we use the distance formula introduced in Chapter 1 [see Equation (1-22)], the distance from the origin satisfies the general formula

    (distance)² = a11x1² + a22x2² + ··· + appxp² + 2(a12x1x2 + a13x1x3 + ··· + ap−1,p xp−1xp)

provided that (distance)² > 0 for all [x1, x2, ..., xp] ≠ [0, 0, ..., 0]. Setting aik = aki, i ≠ k, we have

    0 < (distance)² = x′Ax   for x ≠ 0        (2-19)

From (2-19), we see that the p × p symmetric matrix A is positive definite. In sum, distance is determined from a positive definite quadratic form x′Ax. Conversely, a positive definite quadratic form can be interpreted as a squared distance.

If the square of the distance from the point x′ = [x1, x2, ..., xp] to the origin is given by x′Ax, where A is a p × p symmetric positive definite matrix, then the square of the distance from x to an arbitrary fixed point μ′ = [μ1, μ2, ..., μp] is given by the general expression (x − μ)′A(x − μ).

Expressing distance as the square root of a positive definite quadratic form allows us to give a geometrical interpretation based on the eigenvalues and eigenvectors of the matrix A. For example, suppose p = 2. Then the points x′ = [x1, x2] of constant distance c from the origin satisfy

    x′Ax = a11x1² + 2a12x1x2 + a22x2² = c²

By the spectral decomposition, as in Example 2.11,

    A = λ1 e1e1′ + λ2 e2e2′,  so  x′Ax = λ1(x′e1)² + λ2(x′e2)²

Now, c² = λ1y1² + λ2y2² is an ellipse in y1 = x′e1 and y2 = x′e2 because λ1, λ2 > 0 when A is positive definite. We easily verify that x = cλ1^(−1/2)e1 satisfies x′Ax = λ1(cλ1^(−1/2) e1′e1)² = c². Similarly, x = cλ2^(−1/2)e2 gives the appropriate distance in the e2 direction. Thus, the points at distance c lie on an ellipse whose axes are given by the eigenvectors of A, with lengths proportional to the reciprocals of the square roots of the eigenvalues. The constant of proportionality is c. The situation is illustrated in Figure 2.6.

[Figure 2.6  Points a constant distance c from the origin (p = 2, 1 ≤ λ1 < λ2).]

If p > 2, the points x′ = [x1, x2, ..., xp] a constant distance c = √(x′Ax) from the origin lie on hyperellipsoids c² = λ1(x′e1)² + ··· + λp(x′ep)², whose axes are given by the eigenvectors of A. The half-length in the direction ei is equal to c/√λi, i = 1, 2, ..., p, where λ1, λ2, ..., λp are the eigenvalues of A.
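The half-length fact c/√λi can be checked numerically. In this sketch of ours (the matrix is again that of Example 2.11, and c = 1 is an arbitrary choice), each constructed axis point lands exactly on the curve x′Ax = c²:

    import numpy as np

    A = np.array([[3.0,           -np.sqrt(2.0)],
                  [-np.sqrt(2.0),  2.0         ]])
    c = 1.0
    lam, P = np.linalg.eigh(A)
    for i in range(2):
        e = P[:, i]                       # normalized eigenvector e_i
        half_len = c / np.sqrt(lam[i])    # half-length of the ellipse axis along e_i
        x = half_len * e                  # point on the ellipse
        print(half_len, x @ A @ x)        # the quadratic form equals c^2 = 1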
2.4 A Square-Root Matrix

The spectral decomposition allows us to express the inverse of a square matrix in terms of its eigenvalues and eigenvectors, and this leads to a useful square-root matrix.

Let A be a k × k positive definite matrix with the spectral decomposition A = Σ_{i=1}^k λi eiei′. Let the normalized eigenvectors be the columns of another matrix P = [e1, e2, ..., ek]. Then

    A = Σ_{i=1}^k λi eiei′ = PΛP′        (2-20)

where PP′ = P′P = I and Λ is the diagonal matrix with diagonal elements λ1, λ2, ..., λk, each λi > 0. Thus,

    A⁻¹ = PΛ⁻¹P′ = Σ_{i=1}^k (1/λi) eiei′        (2-21)

since (PΛ⁻¹P′)PΛP′ = PΛ⁻¹ΛP′ = PP′ = I.

Next, let Λ^(1/2) denote the diagonal matrix with √λi as the ith diagonal element. The matrix Σ_{i=1}^k √λi eiei′ = PΛ^(1/2)P′ is called the square root of A and is denoted by A^(1/2):

    A^(1/2) = Σ_{i=1}^k √λi eiei′ = PΛ^(1/2)P′        (2-22)

The square-root matrix of a positive definite matrix A has the following properties:

1. (A^(1/2))′ = A^(1/2) (that is, A^(1/2) is symmetric).
2. A^(1/2) A^(1/2) = A.
3. (A^(1/2))⁻¹ = Σ_{i=1}^k (1/√λi) eiei′ = PΛ^(−1/2)P′, where Λ^(−1/2) is a diagonal matrix with 1/√λi as the ith diagonal element.
4. A^(1/2) A^(−1/2) = A^(−1/2) A^(1/2) = I, and A^(−1/2) A^(−1/2) = A⁻¹, where A^(−1/2) = (A^(1/2))⁻¹.
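These four properties, and the inverse formula (2-21), are easy to confirm numerically. This sketch is ours, using the matrix of Example 2.10:

    import numpy as np

    A = np.array([[13.0, -4.0,  2.0],
                  [-4.0, 13.0, -2.0],
                  [ 2.0, -2.0, 10.0]])
    lam, P = np.linalg.eigh(A)
    A_half = P @ np.diag(np.sqrt(lam)) @ P.T          # A^(1/2) = P Lambda^(1/2) P', (2-22)

    print(np.allclose(A_half, A_half.T))              # property 1: symmetric
    print(np.allclose(A_half @ A_half, A))            # property 2: A^(1/2) A^(1/2) = A
    A_inv = P @ np.diag(1 / lam) @ P.T                # equation (2-21)
    print(np.allclose(A_inv, np.linalg.inv(A)))       # matches the direct inverse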
JL2) (X2 . (Tf = It will be convenient in later sections to denote the marginal variances by (T.JL2)2 (Xp .ILl) = [ E(Xp . the collective behavior of the P random variables Xl. .. i. As we have already noted in this book. than the more traditional The behavior of any pair of random variables. . and Xb is described by their joint probability function..JLz)Z .) and (Tt = E (Xi . (Tik = E(X.. .'" xp).2.) (225) XiPi(Xi) 00 00' (x. and X k are independent (229) alIxj L (x. Xk) (226) and JL. equivalently. the independence condition becomes fik(Xi..12.JLP)] E(X2 . Xp]. More generally.JLd 2 (X2 ..JL2) E(XI . . Thus. and X k are not independent.f(x) will often be the multivariate normal density function. . Specifically.JLp) E(XI .)2 p..) The means and covariances of the P X 1 random vector X can be set out as matrices. Xk)' The P continuous random variables Xl. X 2.(x. the covariance becomes the marginal variance. such as X.JLp) E(Xp . . Statistical independence has an important implication for covariance.(x. P. .) . for all pairs of values xi.. .TheneachelementofXisa random variable with its own marginal probability distripution. . and X k are continuous random variables with joint density fik(Xi.JLI)2 E(X2 .. When X. and X k are said to be statistically independent.JLP:) (Xl  JLI) . so that (227) = E (Xl . we shall adopt this notation . Xp are mutually statistically independent if their joint density can be factored as (228) for all ptuples (Xl> X2.68 Chapter 2 Matrix Algebra and Random Vectors Mean Vectors and Covariance Matrices 69 2..JLp)(XI  JLI) JLI) (Xl ... k = 1./L)(X . x 2. Xp] isap x 1 random vector...JLk)Pik(Xi. X k ) = 0.JLi)2. Xp or.JL2). p.able '" 'with probability density function fi(Xi) JL. if Xi is a continuous random vari.. X k are discrete random variables with joint probability function Pike Xi.. .. .. 2. . and X k :5 Xk] can be written as the product of the corresponding marginal probabilities.I = E(X .) if Xi is a discrete random variable with probability function P. (Xl ./L)'. respectively. The expected value of each element is contained in the vector of means /L = E(X). and a measure of the linear association between them is provided by the covariance !1 ! aUXi 1 L 00 00 x. Xk) all xk if X. (See Chapter 4. The factorization in (228) implies that Cov (X. Specifically. X k ) = O.. ... but X..JLl)(Xp .) If the joint probability P[ Xi :5 X.JLP)] .JL2) E(Xz . X 2. is described by a joint probability density function f(XI' X2.JLi)(Xk .'" xp) = f(x).6 Mean Vectors and Covariance Matrices SupposeX' = [Xl.i and the pep .JLp)(X2 .JLI)(X2 . there are situations where Cov(Xi . rather and consequently. .1Lz):(XI [ (Xp .( x) dx. and JLk. (See Example 2. and the P variances (T. ·. if Xi is a discrete random variable with probability function p. Xk.. are the marginal means. the random vector X' = [Xl. X 2. When i = k. (See [5].JLlt.) The marginal means JLi and variances (Tf are defined as JLi = E (X.)(Xk .JLI)(X2 . . X k are continuous random variables with the joint density functionfik(x.(Xi) if X. then X.JLp)2 E(XI . Xk) all E(X) = = = /L [ E(Xp) JLp E(XI)] [JLI] (230) and L L Xi (X..JL.ILz)(Xp .1)/2 distinct covariances (Tik(i < k) are contained in the symmetric variancecovariance matrix . [.  ut. i = 1.JLI)(Xp . Xk) = fi(Xi)fk(Xk) for all pairs (Xi.(x) dx.JLk) if X.ILz)(XI .. if Xi is a continuous random variable with probability '" 'density function fi( x. xd and marginal densities fi(Xi) and fk(Xk). The converse of (229) is not true in general.(Xp JLp) (Xp . (X2 .
It will be convenient in later sections to denote the marginal variances by σii rather than the more traditional σi², and consequently, we shall adopt this notation.

The means and covariances of the p × 1 random vector X can be set out as matrices. The expected value of each element is contained in the vector of means μ = E(X), and the p variances σii and the p(p − 1)/2 distinct covariances σik (i < k) are contained in the symmetric variance-covariance matrix Σ = E(X − μ)(X − μ)′. Specifically,

    E(X) = [E(X1); E(X2); ...; E(Xp)] = [μ1; μ2; ...; μp] = μ        (2-30)

and

    Σ = E(X − μ)(X − μ)′
      = E[(X1 − μ1)²          (X1 − μ1)(X2 − μ2)  ···  (X1 − μ1)(Xp − μp);
          (X2 − μ2)(X1 − μ1)  (X2 − μ2)²          ···  (X2 − μ2)(Xp − μp);
          ⋮ ;
          (Xp − μp)(X1 − μ1)  (Xp − μp)(X2 − μ2)  ···  (Xp − μp)²]        (2-31)

Example 2.13 (Computing the covariance matrix)  Find the covariance matrix for the two random variables X1 and X2 introduced in Example 2.12 when their joint probability function p12(x1, x2) is represented by the entries in the body of the following table:

    x1 \ x2 |   0     1   | p1(x1)
    --------+-------------+-------
      −1    |  .24   .06  |  .3
       0    |  .16   .14  |  .3
       1    |  .40   .00  |  .4
    --------+-------------+-------
    p2(x2)  |  .8    .2   |  1

We have already shown that μ1 = E(X1) = .1 and μ2 = E(X2) = .2. In addition,

    σ11 = E(X1 − μ1)² = Σ (x1 − .1)² p1(x1)
        = (−1 − .1)²(.3) + (0 − .1)²(.3) + (1 − .1)²(.4) = .69
    σ22 = E(X2 − μ2)² = (0 − .2)²(.8) + (1 − .2)²(.2) = .16
    σ12 = E(X1 − μ1)(X2 − μ2) = Σ over all pairs (x1 − .1)(x2 − .2) p12(x1, x2)
        = (−1 − .1)(0 − .2)(.24) + (−1 − .1)(1 − .2)(.06) + ··· + (1 − .1)(1 − .2)(.00) = −.08
    σ21 = σ12 = −.08

Consequently, with X′ = [X1, X2],

    μ = E(X) = [.1; .2]  and  Σ = [.69 −.08; −.08 .16] ■

We note that the computation of means, variances, and covariances for discrete random variables involves summation, while analogous computations for continuous random variables involve integration.

Because σik = E(Xi − μi)(Xk − μk) = σki, it is convenient to write the matrix appearing in (2-31) as

    Σ = E(X − μ)(X − μ)′ = [σ11 σ12 ··· σ1p; σ12 σ22 ··· σ2p; ⋮ ; σ1p σ2p ··· σpp]        (2-32)

We shall refer to μ and Σ as the population mean (vector) and population variance-covariance (matrix), respectively. The multivariate normal distribution is completely specified once the mean vector μ and variance-covariance matrix Σ are given (see Chapter 4), so it is not surprising that these quantities play an important role in many multivariate procedures.
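The moment calculations of Example 2.13 can be organized compactly with arrays. This is our own NumPy sketch; the probabilities are those of the table above:

    import numpy as np

    x1_vals = np.array([-1.0, 0.0, 1.0])
    x2_vals = np.array([0.0, 1.0])
    p12 = np.array([[0.24, 0.06],     # rows: x1 = -1, 0, 1; columns: x2 = 0, 1
                    [0.16, 0.14],
                    [0.40, 0.00]])

    p1, p2 = p12.sum(axis=1), p12.sum(axis=0)      # marginals (.3, .3, .4) and (.8, .2)
    mu = np.array([x1_vals @ p1, x2_vals @ p2])    # [0.1, 0.2]

    d1, d2 = x1_vals - mu[0], x2_vals - mu[1]
    s11 = (d1**2) @ p1
    s22 = (d2**2) @ p2
    s12 = d1 @ p12 @ d2                            # sum of (x1-mu1)(x2-mu2) p12(x1,x2)
    print(mu, np.array([[s11, s12], [s12, s22]]))  # Sigma = [[.69, -.08], [-.08, .16]]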
It is frequently informative to separate the information contained in variances σii from that contained in measures of association and, in particular, the measure of association known as the population correlation coefficient ρik. The correlation coefficient ρik is defined in terms of the covariance σik and variances σii and σkk as

    ρik = σik/(√σii √σkk)        (2-33)

The correlation coefficient measures the amount of linear association between the random variables Xi and Xk.

Let the population correlation matrix be the p × p symmetric matrix

    ρ = [1  σ12/(√σ11 √σ22)  ···  σ1p/(√σ11 √σpp);
         σ12/(√σ11 √σ22)  1  ···  σ2p/(√σ22 √σpp);
         ⋮ ;
         σ1p/(√σ11 √σpp)  σ2p/(√σ22 √σpp)  ···  1]
      = [1 ρ12 ··· ρ1p; ρ12 1 ··· ρ2p; ⋮ ; ρ1p ρ2p ··· 1]        (2-34)

and let the p × p standard deviation matrix be

    V^(1/2) = [√σ11 0 ··· 0; 0 √σ22 ··· 0; ⋮ ; 0 0 ··· √σpp]        (2-35)

Then it is easily verified (see Exercise 2.23) that

    V^(1/2) ρ V^(1/2) = Σ        (2-36)

and

    ρ = (V^(1/2))⁻¹ Σ (V^(1/2))⁻¹        (2-37)

That is, Σ can be obtained from V^(1/2) and ρ, whereas ρ can be obtained from Σ. Moreover, the expression of these relationships in terms of matrix operations allows the calculations to be conveniently implemented on a computer.

Example 2.14 (Computing the correlation matrix from the covariance matrix)  Suppose

    Σ = [4 1 2; 1 9 −3; 2 −3 25]

Obtain V^(1/2) and ρ. Here

    V^(1/2) = [√4 0 0; 0 √9 0; 0 0 √25] = [2 0 0; 0 3 0; 0 0 5]

and, from (2-37),

    ρ = (V^(1/2))⁻¹ Σ (V^(1/2))⁻¹ = [1 1/6 1/5; 1/6 1 −1/5; 1/5 −1/5 1] ■
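Since (2-36) and (2-37) are just matrix products, Example 2.14 is a one-screen computation. The sketch below is our illustration:

    import numpy as np

    Sigma = np.array([[4.0,  1.0,  2.0],
                      [1.0,  9.0, -3.0],
                      [2.0, -3.0, 25.0]])
    V_half = np.diag(np.sqrt(np.diag(Sigma)))           # V^(1/2) = diag(2, 3, 5), (2-35)
    V_half_inv = np.linalg.inv(V_half)
    rho = V_half_inv @ Sigma @ V_half_inv               # equation (2-37)
    print(rho)          # [[1, 1/6, 1/5], [1/6, 1, -1/5], [1/5, -1/5, 1]]
    print(np.allclose(V_half @ rho @ V_half, Sigma))    # equation (2-36)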
Partitioning the Covariance Matrix

Often, the characteristics measured on individual trials will fall naturally into two or more groups. As examples, consider measurements of variables representing consumption and income or variables representing personality traits and physical characteristics. One approach to handling these situations is to let the characteristics defining the distinct groups be subsets of the total collection of characteristics. If the total collection is represented by a (p × 1)-dimensional random vector X, the subsets can be regarded as components of X and can be sorted by partitioning X.

In general, we can partition the p characteristics contained in the p × 1 random vector X into, for instance, two groups of size q and p − q, respectively. For example, we can write

    X = [X1; ...; Xq; Xq+1; ...; Xp] = [X(1); X(2)]  and  μ = E(X) = [μ(1); μ(2)]        (2-38)

where X(1) is q × 1 and X(2) is (p − q) × 1. From the definitions of the transpose and matrix multiplication,

    (X(1) − μ(1))(X(2) − μ(2))′
      = [X1 − μ1; X2 − μ2; ...; Xq − μq][Xq+1 − μq+1, Xq+2 − μq+2, ..., Xp − μp]
      = [(X1 − μ1)(Xq+1 − μq+1)  (X1 − μ1)(Xq+2 − μq+2)  ···  (X1 − μ1)(Xp − μp);
         ⋮ ;
         (Xq − μq)(Xq+1 − μq+1)  (Xq − μq)(Xq+2 − μq+2)  ···  (Xq − μq)(Xp − μp)]

Upon taking the expectation of the matrix (X(1) − μ(1))(X(2) − μ(2))′, we get

    E(X(1) − μ(1))(X(2) − μ(2))′ = [σ1,q+1 σ1,q+2 ··· σ1p; ⋮ ; σq,q+1 σq,q+2 ··· σqp] = Σ12        (2-39)

which is a matrix containing all of the covariances between a component of X(1) and a component of X(2). Note that the matrix Σ12 is not necessarily symmetric or even square.

Making use of the partitioning in Equation (2-38), we can easily demonstrate that

    Σ = E(X − μ)(X − μ)′ = [Σ11 (q×q)  Σ12 (q×(p−q)); Σ21 ((p−q)×q)  Σ22 ((p−q)×(p−q))]        (2-40)

with Σ12 = Σ21′. The covariance matrix of X(1) is Σ11, that of X(2) is Σ22, and that of elements from X(1) and X(2) is Σ12 (or Σ21). It is sometimes convenient to use the Cov(X(1), X(2)) notation, where Cov(X(1), X(2)) = Σ12.
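In code, partitioning (2-40) amounts to slicing. The following short sketch of ours uses the covariance matrix of Example 2.14 with q = 1, an arbitrary choice for illustration:

    import numpy as np

    Sigma = np.array([[4.0,  1.0,  2.0],
                      [1.0,  9.0, -3.0],
                      [2.0, -3.0, 25.0]])
    q = 1                                    # X(1) = [X1], X(2) = [X2, X3]
    Sigma11 = Sigma[:q, :q]
    Sigma12 = Sigma[:q, q:]                  # covariances between X(1) and X(2)
    Sigma21 = Sigma[q:, :q]
    Sigma22 = Sigma[q:, q:]
    print(np.allclose(Sigma12, Sigma21.T))   # Sigma12 = Sigma21', as in (2-40)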
The Mean Vector and Covariance Matrix for Linear Combinations of Random Variables

Recall that if a single random variable, such as X1, is multiplied by a constant c, then E(cX1) = cE(X1) = cμ1 and Var(cX1) = E(cX1 − cμ1)² = c²Var(X1) = c²σ11. If X2 is a second random variable and a and b are constants, then, using additional properties of expectation, we get

    E(aX1 + bX2) = aE(X1) + bE(X2) = aμ1 + bμ2
    Cov(aX1, bX2) = E(aX1 − aμ1)(bX2 − bμ2) = ab E(X1 − μ1)(X2 − μ2) = ab Cov(X1, X2) = abσ12

Finally,

    Var(aX1 + bX2) = E[(aX1 + bX2) − (aμ1 + bμ2)]² = E[a(X1 − μ1) + b(X2 − μ2)]²
                   = E[a²(X1 − μ1)² + b²(X2 − μ2)² + 2ab(X1 − μ1)(X2 − μ2)]
                   = a²σ11 + b²σ22 + 2abσ12        (2-41)

With c′ = [a, b], aX1 + bX2 can be written as

    [a b][X1; X2] = c′X

Similarly, E(aX1 + bX2) = aμ1 + bμ2 can be expressed as c′μ. If we let Σ = [σ11 σ12; σ12 σ22] be the variance-covariance matrix of X, Equation (2-41) becomes

    Var(aX1 + bX2) = Var(c′X) = c′Σc        (2-42)

since c′Σc = [a b][σ11 σ12; σ12 σ22][a; b] = a²σ11 + 2abσ12 + b²σ22.

The preceding results can be extended to a linear combination of p random variables: The linear combination c′X = c1X1 + ··· + cpXp has

    mean = E(c′X) = c′μ
    variance = Var(c′X) = c′Σc        (2-43)

where μ = E(X) and Σ = Cov(X). In general, consider the q linear combinations of the p random variables X1, ..., Xp:

    Z1 = c11X1 + c12X2 + ··· + c1pXp
    ⋮
    Zq = cq1X1 + cq2X2 + ··· + cqpXp

or

    Z = [Z1; Z2; ...; Zq] = [c11 c12 ··· c1p; ⋮ ; cq1 cq2 ··· cqp][X1; X2; ...; Xp] = CX        (2-44)

The linear combinations Z = CX have

    μZ = E(Z) = E(CX) = CμX
    ΣZ = Cov(Z) = Cov(CX) = CΣXC′        (2-45)

where μX and ΣX are the mean vector and variance-covariance matrix of X, respectively.

Example 2.15 (Means and covariances of linear combinations)  Let X′ = [X1, X2] be a random vector with mean vector μX′ = [μ1, μ2] and variance-covariance matrix ΣX = [σ11 σ12; σ12 σ22]. Find the mean vector and covariance matrix for the linear combinations

    Z1 = X1 − X2,  Z2 = X1 + X2

or

    Z = [Z1; Z2] = [1 −1; 1 1][X1; X2] = CX

in terms of μX and ΣX. Here

    μZ = E(Z) = CμX = [1 −1; 1 1][μ1; μ2] = [μ1 − μ2; μ1 + μ2]

and

    ΣZ = Cov(Z) = CΣXC′ = [1 −1; 1 1][σ11 σ12; σ12 σ22][1 1; −1 1]
       = [σ11 − 2σ12 + σ22   σ11 − σ22;  σ11 − σ22   σ11 + 2σ12 + σ22]

Note that if σ11 = σ22, that is, if X1 and X2 have equal variances, the off-diagonal terms in ΣZ vanish. This demonstrates the well-known result that the sum and difference of two random variables with identical variances are uncorrelated. ■

We shall rely heavily on the result in (2-45) in our discussions of principal components and factor analysis in Chapters 8 and 9.
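A quick numerical check of Example 2.15 and of (2-45) follows. The values of mu and Sigma below are hypothetical numbers of ours, chosen with sigma11 = sigma22 so that the uncorrelatedness of the sum and difference is visible:

    import numpy as np

    mu = np.array([2.0, 5.0])         # illustrative values, not from the text
    Sigma = np.array([[3.0, 1.0],     # sigma11 = sigma22 = 3 (hypothetical)
                      [1.0, 3.0]])

    C = np.array([[1.0, -1.0],        # Z1 = X1 - X2
                  [1.0,  1.0]])       # Z2 = X1 + X2
    print(C @ mu)                     # mu_Z = [-3, 7], equation (2-45)
    print(C @ Sigma @ C.T)            # Sigma_Z = [[4, 0], [0, 8]]: off-diagonals vanish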
The results in (2-36), (2-37), and (2-40) also hold if the population quantities are replaced by their appropriately defined sample counterparts.

Partitioning the Sample Mean Vector and Covariance Matrix

Many of the matrix results in this section have been expressed in terms of population means and variances (covariances). Let x̄′ = [x̄1, ..., x̄p] be the vector of sample averages constructed from n observations on p variables X1, ..., Xp, and let

    Sn = [s11 ··· s1p; ⋮ ; sp1 ··· spp],  sik = (1/n) Σ_{j=1}^n (xji − x̄i)(xjk − x̄k)

be the corresponding sample variance-covariance matrix.

The sample mean vector and the covariance matrix can be partitioned in order to distinguish quantities corresponding to groups of variables:

    x̄ = [x̄1; ...; x̄q; x̄q+1; ...; x̄p] = [x̄(1); x̄(2)]
    Sn = [S11 (q×q)  S12; S21  S22]        (2-46)

where x̄(1) and x̄(2) are the sample mean vectors constructed from observations x(1) = [x1, ..., xq]′ and x(2) = [xq+1, ..., xp]′, respectively; S11 is the sample covariance matrix computed from observations x(1); S22 is the sample covariance matrix computed from observations x(2); and S12 = S21′ is the sample covariance matrix for elements of x(1) and elements of x(2).

2.7 Matrix Inequalities and Maximization

Maximization principles play an important role in several multivariate techniques. Linear discriminant analysis, for example, is concerned with allocating observations to predetermined groups. The allocation rule is often a linear function of measurements that maximizes the separation between groups relative to their within-group variability. As another example, principal components are linear combinations of measurements with maximum variability. The matrix inequalities presented in this section will easily allow us to derive certain maximization results, which will be referenced in later chapters.

Cauchy-Schwarz Inequality. Let b and d be any two p × 1 vectors. Then

    (b′d)² ≤ (b′b)(d′d)        (2-47)

with equality if and only if b = cd (or d = cb) for some constant c.

Proof. The inequality is obvious if either b = 0 or d = 0. Excluding this possibility, consider the vector b − xd, where x is an arbitrary scalar. Since the length of b − xd is positive for b − xd ≠ 0, in this case

    0 < (b − xd)′(b − xd) = b′b − xd′b − xb′d + x²d′d = b′b − 2x(b′d) + x²(d′d)

The last expression is quadratic in x. If we complete the square by adding and subtracting the scalar (b′d)²/d′d, we get

    0 < b′b − (b′d)²/d′d + [(b′d)²/d′d − 2x(b′d) + x²(d′d)]
      = b′b − (b′d)²/d′d + (d′d)(x − b′d/d′d)²

The term in brackets is zero if we choose x = b′d/d′d, so we conclude that

    0 < b′b − (b′d)²/d′d,  or  (b′d)² < (b′b)(d′d)   if b ≠ xd for some x

Note that if b = cd, then 0 = (b − cd)′(b − cd), and the same argument produces (b′d)² = (b′b)(d′d). ■

A simple, but important, extension of the Cauchy-Schwarz inequality follows directly.

Extended Cauchy-Schwarz Inequality. Let b and d be any two p × 1 vectors, and let B be a p × p positive definite matrix. Then

    (b′d)² ≤ (b′Bb)(d′B⁻¹d)        (2-49)

with equality if and only if b = cB⁻¹d (or d = cBb) for some constant c.

Proof. The inequality is obvious when b = 0 or d = 0. For cases other than these, consider the square-root matrix B^(1/2) defined in terms of its eigenvalues λi and the normalized eigenvectors ei as B^(1/2) = Σ_{i=1}^p √λi eiei′ [see also (2-22)]. If we set

    B^(−1/2) = Σ_{i=1}^p (1/√λi) eiei′        (2-48)

it follows that

    b′d = b′Id = b′B^(1/2)B^(−1/2)d = (B^(1/2)b)′(B^(−1/2)d)

and the proof is completed by applying the Cauchy-Schwarz inequality to the vectors (B^(1/2)b) and (B^(−1/2)d). ■
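A numerical spot-check of (2-47) and (2-49) takes only a few lines. The vectors and the positive definite matrix B below are arbitrary choices of ours for illustration:

    import numpy as np

    rng = np.random.default_rng(2)
    b, d = rng.standard_normal(4), rng.standard_normal(4)
    print((b @ d)**2 <= (b @ b) * (d @ d))           # Cauchy-Schwarz, (2-47)

    B = np.array([[2.0, 0.5, 0.0, 0.0],              # a positive definite matrix (assumed)
                  [0.5, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 3.0, 1.0],
                  [0.0, 0.0, 1.0, 2.0]])
    lhs = (b @ d)**2
    rhs = (b @ B @ b) * (d @ np.linalg.inv(B) @ d)   # extended Cauchy-Schwarz, (2-49)
    print(lhs <= rhs)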
The extended Cauchy-Schwarz inequality gives rise to the following maximization result.

Maximization Lemma. Let B (p×p) be positive definite and d (p×1) be a given vector. Then, for an arbitrary nonzero vector x (p×1),

    max_{x≠0} (x′d)²/(x′Bx) = d′B⁻¹d        (2-50)

with the maximum attained when x = cB⁻¹d for any constant c ≠ 0.

Proof. By the extended Cauchy-Schwarz inequality, (x′d)² ≤ (x′Bx)(d′B⁻¹d). Because x ≠ 0 and B is positive definite, x′Bx > 0. Dividing both sides of the inequality by the positive scalar x′Bx yields the upper bound

    (x′d)²/(x′Bx) ≤ d′B⁻¹d

Taking the maximum over x gives Equation (2-50), because the bound is attained for x = cB⁻¹d. ■

A final maximization result will provide us with an interpretation of eigenvalues.

Maximization of Quadratic Forms for Points on the Unit Sphere. Let B (p×p) be a positive definite matrix with eigenvalues λ1 ≥ λ2 ≥ ··· ≥ λp ≥ 0 and associated normalized eigenvectors e1, e2, ..., ep. Then

    max_{x≠0} x′Bx/x′x = λ1   (attained when x = e1)
    min_{x≠0} x′Bx/x′x = λp   (attained when x = ep)        (2-51)

Moreover,

    max_{x ⊥ e1,...,ek} x′Bx/x′x = λk+1   (attained when x = ek+1, k = 1, 2, ..., p − 1)        (2-52)

where the symbol ⊥ is read "is perpendicular to."

Proof. Let P (p×p) be the orthogonal matrix whose columns are the eigenvectors e1, e2, ..., ep, and let Λ be the diagonal matrix with eigenvalues λ1, λ2, ..., λp along the main diagonal. Let B^(1/2) = PΛ^(1/2)P′ [see (2-22)] and y (p×1) = P′x. Then x ≠ 0 implies y ≠ 0, and

    x′Bx/x′x = x′B^(1/2)B^(1/2)x/(x′PP′x) = y′Λy/y′y = Σ_{i=1}^p λi yi² / Σ_{i=1}^p yi² ≤ λ1

Setting x = e1 gives y = P′e1 = [1; 0; ...; 0], since ek′e1 = 1 for k = 1 and ek′e1 = 0 otherwise. For this choice of x, y′Λy/y′y = λ1/1 = λ1, so the bound λ1 is attained. A similar argument produces the second part of (2-51).

Now, x = Py = y1e1 + y2e2 + ··· + ypep, so x ⊥ e1, ..., ek implies

    0 = ei′x = y1ei′e1 + y2ei′e2 + ··· + ypei′ep = yi,  i ≤ k

Therefore, for x perpendicular to the first k eigenvectors,

    x′Bx/x′x = Σ_{i=k+1}^p λi yi² / Σ_{i=k+1}^p yi² ≤ λk+1

Taking yk+1 = 1 and yk+2 = ··· = yp = 0 gives the asserted maximum. ■

For a fixed x0 ≠ 0, x0′Bx0/x0′x0 has the same value as x′Bx, where x′ = x0′/√(x0′x0) is of unit length. Consequently, Equation (2-51) says that the largest eigenvalue, λ1, is the maximum value of the quadratic form x′Bx for all points x whose distance from the origin is unity. Similarly, λp is the smallest value of the quadratic form for all points x one unit from the origin. The largest and smallest eigenvalues thus represent extreme values of x′Bx for points on the unit sphere. The "intermediate" eigenvalues of the p × p positive definite matrix B also have an interpretation as extreme values when x is further restricted to be perpendicular to the earlier choices.
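The bounds in (2-51) can be observed empirically by evaluating the ratio x′Bx/x′x at many random points. This is our own sketch, using the matrix of Example 2.10:

    import numpy as np

    B = np.array([[13.0, -4.0,  2.0],
                  [-4.0, 13.0, -2.0],
                  [ 2.0, -2.0, 10.0]])
    lam = np.linalg.eigvalsh(B)               # ascending: 9, 9, 18

    rng = np.random.default_rng(0)
    X = rng.standard_normal((10000, 3))       # many random nonzero vectors
    ratios = np.einsum('ij,jk,ik->i', X, B, X) / (X * X).sum(axis=1)
    # Every ratio lies between the smallest and largest eigenvalues
    print(lam[0] <= ratios.min(), ratios.max() <= lam[-1])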
Supplement 2A

VECTORS AND MATRICES: BASIC CONCEPTS

Vectors

Many concepts, such as a person's health or intellectual abilities, cannot be adequately quantified as a single number. Rather, several different measurements x1, x2, ..., xm are required.

Definition 2A.1. An m-tuple of real numbers (x1, x2, ..., xi, ..., xm) arranged in a column is called a vector and is denoted by a boldfaced, lowercase letter. Vectors are said to be equal if their corresponding entries are the same.

Definition 2A.2 (Scalar multiplication). Let c be an arbitrary scalar. Then the product cx is a vector with ith entry cxi.

Definition 2A.3 (Vector addition). The sum of two vectors x and y, each having the same number of entries, is that vector z = x + y with ith entry zi = xi + yi.

Taking the zero vector, 0, to be the m-tuple (0, 0, ..., 0) and the vector −x to be the m-tuple (−x1, −x2, ..., −xm), the two operations of scalar multiplication and vector addition can be combined in a useful manner.

Definition 2A.4. A set of vectors x1, x2, ..., xk is said to be linearly dependent if there exist k numbers (a1, a2, ..., ak), not all zero, such that

    a1x1 + a2x2 + ··· + akxk = 0

Otherwise the set of vectors is said to be linearly independent.

If one of the vectors, for example, xi, is 0, the set is linearly dependent. (Let ai be the only nonzero coefficient in Definition 2A.4.)

Definition 2A.5. The space of all real m-tuples, with scalar multiplication and vector addition as just defined, is called a vector space.

Definition 2A.6. The vector y = a1x1 + a2x2 + ··· + akxk is a linear combination of the vectors x1, x2, ..., xk. The set of all linear combinations of x1, x2, ..., xk is called their linear span.

Definition 2A.7. Any set of m linearly independent vectors is called a basis for the vector space of all m-tuples of real numbers.

Every vector can be expressed as a unique linear combination of a fixed basis. With m = 4, the usual choice of a basis is

    [1; 0; 0; 0],  [0; 1; 0; 0],  [0; 0; 1; 0],  [0; 0; 0; 1]

These familiar vectors, with a one as an entry and zeros elsewhere, are linearly independent: setting a1x1 + a2x2 + a3x3 + a4x4 = 0 in Definition 2A.4 implies that a1 = a2 = a3 = a4 = 0. Any vector x can be uniquely expressed as

    x = x1[1; 0; 0; 0] + x2[0; 1; 0; 0] + x3[0; 0; 1; 0] + x4[0; 0; 0; 1]

A vector consisting of m elements may be regarded geometrically as a point in m-dimensional space. For example, with m = 2, the vector x may be regarded as representing the point in the plane with coordinates x1 and x2. Vectors have the geometrical properties of length and direction.

Definition 2A.8. The length of a vector of m elements emanating from the origin is given by the Pythagorean formula:

    Lx = length of x = √(x1² + x2² + ··· + xm²)

For example, the length of x′ = [4, −3, 0, 1] is Lx = √(4² + (−3)² + 0² + 1²) = √26 = 5.10.

Definition 2A.9. The angle θ between two vectors x and y, both having m entries, is defined from

    cos(θ) = (x1y1 + x2y2 + ··· + xmym)/(Lx Ly)

where Lx = length of x and Ly = length of y, and x1, ..., xm and y1, ..., ym are the elements of x and y, respectively.

Definition 2A.10. The inner (or dot) product of two vectors x and y with the same number of entries is defined as the sum of component products:

    x′y = x1y1 + x2y2 + ··· + xmym

We use the notation x′y or y′x to denote this inner product. With the x′y notation, we may express the length of a vector and the cosine of the angle between two vectors as

    Lx = length of x = √(x′x),  cos(θ) = x′y/(√(x′x) √(y′y))

For example, let x′ = [−1, 5, 2, −2] and y′ = [4, −3, 0, 1]. Then

    length of x = √((−1)² + 5² + 2² + (−2)²) = √34 = 5.83
    length of y = √(4² + (−3)² + 0² + 1²) = √26 = 5.10

and

    cos(θ) = [(−1)4 + 5(−3) + 2(0) + (−2)1]/(5.83 × 5.10) = −.706

Consequently, θ = 135°.

As another example, with k = 3 vectors of m = 3 entries each, the vectors x1′ = [1, 1, 1], x2′ = [2, 5, −1], and x3′ = [0, 1, −1] are a linearly dependent set, since any one can be written as a linear combination of the others (for example, x2 = 2x1 + 3x3).
When the angle between two vectors x and y is θ = 90° or 270°, we say that x and y are perpendicular. Since cos(θ) = 0 only if θ = 90° or 270°, the condition becomes

    x and y are perpendicular if x′y = 0

We write x ⊥ y.

Result 2A.1 (Gram-Schmidt Process). Given linearly independent vectors x1, x2, ..., xk, there exist mutually perpendicular vectors u1, u2, ..., uk with the same linear span. These may be constructed sequentially by setting

    u1 = x1
    u2 = x2 − (x2′u1/u1′u1)u1
    ⋮
    uk = xk − Σ_{j=1}^{k−1} (xk′uj/uj′uj)uj

We can also convert the u's to unit length by setting zj = uj/√(uj′uj). In this construction, (xk′zj)zj is the projection of xk on zj, and Σ_{j=1}^{k−1} (xk′zj)zj is the projection of xk on the linear span of x1, x2, ..., xk−1. ■

Result 2A.2. (a) z is perpendicular to every vector if and only if z = 0. (b) If z is perpendicular to each vector x1, x2, ..., xk, then z is perpendicular to their linear span. (c) Mutually perpendicular vectors are linearly independent. ■

Definition 2A.11. The projection (or shadow) of a vector x on a vector y is

    projection of x on y = (x′y/y′y) y

If y has unit length, so that Ly = 1, the projection of x on y is (x′y)y. If y1, y2, ..., yr are mutually perpendicular, the projection of a vector x on the linear span of y1, y2, ..., yr is

    (x′y1/y1′y1)y1 + (x′y2/y2′y2)y2 + ··· + (x′yr/yr′yr)yr
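The Gram-Schmidt construction of Result 2A.1 translates directly into code. This short implementation is ours (the input vectors are those of Example 2.2 in the chapter):

    import numpy as np

    def gram_schmidt(vectors):
        """Result 2A.1: mutually perpendicular unit vectors with the same linear span."""
        us = []
        for x in vectors:
            # subtract the projection of x on each previously constructed u
            u = x - sum((x @ u) / (u @ u) * u for u in us)
            us.append(u)
        return [u / np.sqrt(u @ u) for u in us]      # normalize to unit length

    x1 = np.array([1.0, 2.0, 1.0])
    x2 = np.array([1.0, 0.0, -1.0])
    x3 = np.array([1.0, -2.0, 1.0])
    z1, z2, z3 = gram_schmidt([x1, x2, x3])
    print(np.round([z1 @ z2, z1 @ z3, z2 @ z3], 10))  # all zero: mutually perpendicular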
Matrices

Definition 2A.12. An m × k matrix, generally denoted by a boldface uppercase letter such as A, B, R, and so forth, is a rectangular array of elements having m rows and k columns.

Examples of matrices are the 3 × 3 identity matrix I = [1 0 0; 0 1 0; 0 0 1] and the 1 × 1 matrix E = [e1]. In our work, the matrix elements will be real numbers or functions taking on values in the real numbers.

Since matrices can be considered as vectors set side by side, it is natural to define multiplication by a scalar and the addition of two matrices with the same dimensions.

Definition 2A.13. Two matrices A = {aij} and B = {bij} are said to be equal, written A = B, if aij = bij, i = 1, 2, ..., m, j = 1, 2, ..., k. That is, two matrices are equal if (a) their dimensionality is the same and (b) every corresponding element is the same.

Definition 2A.14. The dimension (abbreviated dim) of an m × k matrix is the ordered pair (m, k); m is the row dimension and k is the column dimension. The dimension of a matrix is frequently indicated in parentheses below the letter representing the matrix. Thus, the m × k matrix A is denoted by A (m×k). An m × 1 matrix is referred to as a column vector; a 1 × k matrix is referred to as a row vector. In the preceding examples, the dimension of the matrix I is 3 × 3, and this information can be conveyed by writing I (3×3).

Definition 2A.15. The transpose of the matrix A, denoted by A′, is the k × m matrix with elements aji, i = 1, 2, ..., m, j = 1, 2, ..., k. That is, the transpose of the matrix A is obtained from A by interchanging the rows and columns.

As an example, if

    A (2×3) = [2 1 4; 7 3 6]    then    A′ (3×2) = [2 7; 1 3; 4 6]

Definition 2A.16 (Matrix addition). Let the matrices A and B both be of dimension m × k with arbitrary elements aij and bij. The sum of the matrices A and B is an m × k matrix C, written C = A + B, such that the arbitrary element of C is given by

    cij = aij + bij,  i = 1, 2, ..., m, j = 1, 2, ..., k

Note that the addition of matrices is defined only for matrices of the same dimension. For example,

    [3 4; 2 0] + [3 4; 4 5] = [6 8; 6 5]

Definition 2A.17 (Scalar multiplication). Let c be an arbitrary scalar and A = {aij}. Then cA = Ac = B = {bij}, where bij = caij = aijc, i = 1, 2, ..., m, j = 1, 2, ..., k.

Multiplication of a matrix by a scalar produces a new matrix whose elements are the elements of the original matrix, each multiplied by the scalar.

Definition 2A.18 (Matrix subtraction). Let A = {aij} and B = {bij} be matrices of equal dimension. Then the difference between A and B, written A − B, is an m × k matrix C = {cij} given by

    C = A − B = A + (−1)B

That is, cij = aij + (−1)bij = aij − bij, i = 1, 2, ..., m, j = 1, 2, ..., k.

Result 2A.3. For all matrices A, B, and C (of equal dimension) and scalars c and d, the following hold:

(a) (A + B) + C = A + (B + C)
(b) A + B = B + A
(c) c(A + B) = cA + cB
(d) (c + d)A = cA + dA
(e) (A + B)′ = A′ + B′ (That is, the transpose of the sum is equal to the sum of the transposes.)
(f) (cd)A = c(dA)
(g) (cA)′ = cA′ ■
Definition 2A.20. If an arbitrary matrix A has the same number of rows and columns, then A is called a square matrix. The matrices I and E given after Definition 2A.12 are square matrices.

Definition 2A.21. Let A be a k × k (square) matrix. Then A is said to be symmetric if A = A′. That is, A is symmetric if aij = aji, i = 1, 2, ..., k, j = 1, 2, ..., k. Examples of symmetric matrices are the identity matrix I and any square matrix whose entries satisfy aij = aji, such as [a c; c b].

Definition 2A.22. The k × k identity matrix, denoted by I (k×k), is the square matrix with ones on the main (NW-SE) diagonal and zeros elsewhere. The 3 × 3 identity matrix is shown after Definition 2A.12.

Definition 2A.23 (Matrix multiplication). The product AB of an m × n matrix A = {aij} and an n × k matrix B = {bij} is the m × k matrix C whose elements are

    cij = Σ_{ℓ=1}^n aiℓ bℓj,  i = 1, 2, ..., m, j = 1, 2, ..., k

Note that for the product AB to be defined, the column dimension of A must equal the row dimension of B. If that is so, then the row dimension of AB equals the row dimension of A, and the column dimension of AB equals the column dimension of B.

For example, let

    A (2×3) = [3 −1 2; 4 0 5]  and  B (3×2) = [3 4; 6 −2; 4 3]

Then

    A (2×3) B (3×2) = [11 20; 32 31] = [c11 c12; c21 c22]

where

    c11 = (3)(3) + (−1)(6) + (2)(4) = 11
    c12 = (3)(4) + (−1)(−2) + (2)(3) = 20
    c21 = (4)(3) + (0)(6) + (5)(4) = 32
    c22 = (4)(4) + (0)(−2) + (5)(3) = 31

As an additional example, consider the product of two vectors. If x and y are vectors of the same dimension, such as n × 1, both of the products x′y and xy′ are defined. In particular,

    y′x = x′y = x1y1 + x2y2 + ··· + xnyn

a single number, while xy′ is an n × n matrix with (i, j)th element xiyj.

Result 2A.4. For all matrices A, B, and C (of dimensions such that the indicated products are defined) and a scalar c,

(a) c(AB) = (cA)B
(b) A(BC) = (AB)C
(c) A(B + C) = AB + AC
(d) (B + C)A = BA + CA
(e) (AB)′ = B′A′

More generally, for any xj such that Axj is defined,

    Σ_{j=1}^n Axj = A Σ_{j=1}^n xj ■
There are several important differences between the algebra of matrices and the algebra of real numbers. Two of these differences are as follows:

1. Matrix multiplication is, in general, not commutative. That is, in general, AB ≠ BA. Even when both products are defined, they need not be equal. For instance (an example of ours, illustrating the recoverable arithmetic of the original),

    [2 1; 3 4][4 1; 0 1] = [8 3; 12 7]   but   [4 1; 0 1][2 1; 3 4] = [11 8; 3 4]

2. Let 0 denote the zero matrix, that is, the matrix with zero for every element. In the algebra of real numbers, if the product of two numbers, ab, is zero, then a = 0 or b = 0. In matrix algebra, however, the product of two nonzero matrices may be the zero matrix. That is,

    A (m×n) B (n×k) = 0 (m×k)

does not imply that A = 0 or B = 0. For example (again our illustration),

    [1 −1; −1 1][1 1; 1 1] = [0 0; 0 0]

Definition 2A.24. The determinant of the square k × k matrix A = {aij}, denoted by |A|, is the scalar

    |A| = a11   if k = 1
    |A| = Σ_{j=1}^k a1j |A1j| (−1)^(1+j)   if k > 1

where A1j is the (k − 1) × (k − 1) matrix obtained by deleting the first row and jth column of A. Also, |A| = Σ_{j=1}^k aij |Aij| (−1)^(i+j), with the ith row in place of the first row.

Examples of determinants (evaluated using Definition 2A.24) are

    |1 3; 6 4| = 1|4|(−1)² + 3|6|(−1)³ = 1(4) + 3(6)(−1) = −14

In general, |a11 a12; a21 a22| = a11a22 − a12a21.

    |3 1 6; 7 4 5; 2 −7 1| = 3|4 5; −7 1|(−1)² + 1|7 5; 2 1|(−1)³ + 6|7 4; 2 −7|(−1)⁴
                           = 3(39) + 1(−3)(−1) + 6(−57) = 117 + 3 − 342 = −222

    |1 0 0; 0 1 0; 0 0 1| = 1|1 0; 0 1|(−1)² + 0 + 0 = 1(1) = 1

If I is the k × k identity matrix, |I| = 1.

The determinant of any 3 × 3 matrix can also be computed by summing the products of elements along the main diagonals of the familiar "diagonals" diagram and subtracting the products along the off diagonals:

    |a11 a12 a13; a21 a22 a23; a31 a32 a33|
      = a11|a22 a23; a32 a33|(−1)² + a12|a21 a23; a31 a33|(−1)³ + a13|a21 a22; a31 a32|(−1)⁴
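The cofactor expansions above agree with a library determinant routine; this check is our own sketch:

    import numpy as np

    A = np.array([[1.0, 3.0],
                  [6.0, 4.0]])
    print(np.linalg.det(A))           # -14, matching 1(4) - 3(6)

    B = np.array([[3.0,  1.0, 6.0],
                  [7.0,  4.0, 5.0],
                  [2.0, -7.0, 1.0]])
    print(np.linalg.det(B))           # -222, matching the cofactor expansion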
+ Xk3b where 3i is the ith column of A. Equivalently. Definition 2A. it is called singUlar. If a matrix fails to be nonsingular. Definition 2A.. This is no coincidence. A square matrix A (kXk) is nonsingular i f A x (kxk)(kXl) (kXl) 0 implies that x (kxl) (kXI) 0 . Note that the column rank of A is also 2. In fact. written as vectors.( that is. The column rank of a matrix Definition 2A. Note iliat Ax = X13I + X232 + . 1 5 n 0 1 1 1] 5 1 1 1 [23J [ 1 5 X ::. ::. Definition 2A. • Thus. Result 2A. a square matrix is nonsingular if its rank is equal to the number of rows (or columns) it has. row vectors). • We next want to state a result that describes some properties of the determinant. then B = AI.26.T. Then there is a unique k X k matrix B such that AB = BA = I where I is the k X k identity matrix. but in general. as the following result indicates. This procedure is not valid for matrices of higher dimension.2S. Let A be a nonsingular square matrix of dimension k X k. if BA = I or AB = I.S. [2 3J = [1 0J Result 2A. and both products must equal I. . The row rank of a matrix is the maximum number of linearly independent rows. we must first introduce some notions related to matrix inverses. The row rank and the column rank of a matrix are equal.94 Chapter 2 Matrix Algebra and Random Vectors Vectors and Matrices: Basic Concepts 95 lines in the following diagram.. so that the condition of nonsingularity is just the statement that the columns of A are linearly independent. [23J 1 5 has AI = [ i ::.24 can be employed to evaluate these determinants.2T. A = since For example. let the matrix is the rank of its set of columns. Result 2A.6. = [ ::. The B such that AB = BA = I is called the inverse of A and is denoted by AI. However. The rows of A. the rank of a matrix is either the row rank or the column rank. were shown to be linearly dependent after Definition 2A.::. For example. considered as vectors .6. consIdered as vectors. since (3) The inverse of any 2 2 matrix . is given by (b) The inverse of any 3 X 3 matrix but columns 1 and 2 are linearly independent.
AA' = I.All = 0 are called the eigenvalues (or characteristic roots) of a matrix A. _ Definition 2A. so the columns are also mutually perpendicular and have unit lengths. . IA II AI I = 1.soAA' = A'A = AA. Result 2A. Let A and B be k X k matrices and c be a scalar. where c is a scalar.. that is. ith entry [lA. Then the following hold: (a) (AI). . _ An example of an orthogonal matrix is (b) IAI "# (c) There exists a matrix AI such that AA. it is clear that IA I "# 0 if the inverse is to exist.ll.=1 written tr (A). KI has j.13. For an orthogonal matrix.I = AlA = o. Let A = {a.or Result 2A. (a) IAI = lA' I (b)· If each element of a row (column) of A is zero. that is. then I A I = 0 (c) If any two rows (columns) of A are identical. Then the scalars AI. Ak satisfying the polynomial equation I A . considered as vectors. For example.AI I = 0 (as a function of A) is called the characteristic equation. For a square matrix A of dimension k X k. Some of these proofs are rather complex and beyond the scope of this book.29. and let the indicated inverses exist. The equation IA . Result 2A. where A.30.. are mutually perpendicular and have unit lengths. The trace of the matrix A. = (AT I (b) (ABt l = B1AI The determinant has the following properties. We verify that AA = I = AA' = A'A. Definition 2A. Let A be a k X k square matrix and I be the k X k identity matrix. A square matrix A is said to be orthogonal if its rows. You are referred to [6} for proofs of parts of Results 2A. 96 Chapter 2 Matrix Algebra and Random Vectors is given by 12 22 a231 la /a a32 a33 a32 a33 Vectors and Matrices: Basic Concepts 97 Result 2A.j is the matrix obtained from A by deleting the ith row and jth column. tr (A) = 2: k aii' . (f) IcA I n 2 . the following are equivalent: (a) A x = (kXk)(kx1) (kXI) 0 implies x = (kXI) (kxl) 0 (A is nonsingular). j (c) In general.A = A'. A matrix A is orthogonal if and only if AI = A'. Az.12. Square matrices are best understood in terms of quantities called eigenvalues and eigenvectors.1 o.j} be a k X k square matrix.2B.9 and 2A. Result 2A. then I A I = 0 (d) If A is nonsingular.NIAIJ(lr .1 Z 2 I 1 2" I 2 1 2 1 I 2 A 2 1 Jlrl 2 2 2 I 1 2 2 A 0 1 0 0 0 0 1 0 I so A' = AI. al31 la a22 a23 12 al31 (a) tr(cA) = c tr(A) (b) tr(A ± B) = tr(A) ± tr(B) _AI = TAT 1 21 aZ31 la a3J a33 zl jail a31 a33 ll al3I_lall al31 aZI aZ3 la a2l l1 (c) tr(AB) = tr(BA) (d) tr(BIAB) = tr(A) (e) tr(AA') = anI Ia a121 la a31 a32 a31 a32 a121 a22 i=1 j=1 2: 2: afj k k In both (a) and (b). _ A = Definition 2A. (kXk) I . Let A and B be square matrices of the same dimension. let (e) IABI = IAIIBI = ck I A I. then IA I = 1/1 AI I. is the sum of the diagonal elements.II. AA' = A' A = I. that is. . Let A and B be k X k square matrices.9.  2 1 2" I 2 2 I 1 2 2 2 I 2 I '2 1 1 2 I 2 1 2" 22 I 2 1 1 1 Clearly. and A must be an orthogonal matrix.
Xl = Xl Xl or Xl = .30.A 2 = _A3 + 36..AI) That is. A2 = 9. 9. Xk] and A is a k X k symmetr ic matrix.. Xk is Q(x) = x'Ax.405A + 1458 = 0 is an eigenve ctor correspo nding to the eigenva lue 1.2X2 There are many solution s for Xl and X2' Setting X2 = 1 (arbitrar ily) gives Xl = 2. It is usual practice to determi ne an so that It has length unity. X2.\2 .31. that is. That is.A)(3  Then \1 A 3 AI A) = 0 From the first expressi on. and 18 are the eigenva lues ofA. we take e = x/YX'X as the elgenve ctor correspo nding to A. I An equivalent condition for A to be a solution of the eigenval ueeige nvector equation is IA . ' . Let 3.. by Result 2A. the columns of A .A 4 13 . and hence. ••. as asserted. The determin ed by solving the followmg equatIOns: G X2) = XI + X3] 2XlX2 +  with these eigenva lues can be Q(x) = [Xl X2 [! o 2 2 X3 = xi + 6XIX2  4XZX3 + symmet ric square matrix can be reconst ructured from its eigenva lues and The particul ar express ion reveals the relative importa nce of paIr accordm g to the relative size of the eigenva lue and the directio n of the elgenve ctor. The eigenva lues of A are 3 + 3X2 = X2 A Then the equation . From the second Xl = 3Xj Xl + implies that Xl = 0 and x2 3X2 expressi on.32.[13 4 2] = 13 2 2 10 2 2 lA . Al = 1 and A2 and 1. and A3 = 18.9(b) ..AI) + .x2.98 Chapter 2 Matrix Algebra and Random Vectors .All = 4 13 . eigenve ctor correspo nding to the eigenva lue 3.AI I = O. the eigenve ctor for A = 1 is [2/v'S . we have shown that the eigenvalues of Note that a quadrat icform can be written as Q(x) = Q(x) = [Xl 2: 2: a/jx/xj' For example. 1/v'S]. ifAx = Ax.AI are linearly depende nt so. * et = is an. This follows because the stateme nt that A x = Ax for some A and x 0 implies that * Definition2A. For example . . Definition 2A.AI I = 0. 9.. . . Vectors and Matrices: Basic Concept s 99 = (1 . = 3xz = 1 (arbitrar ily).AI)x = Xl colj(A . implies that there are two roots. A quadraticform Q(x) in thekvar iables Xl. and hence. 0= (A . If x is a nonzero vector ( x 0) such that (kXI) (kXI) (kXl) Ax = Ax then x is said to be an eigenvector (characteristic vector) of the matrix A associat ed with the eigenvalue A. 2 10 . Following Definiti on 2A. /=1 j=l k k IA A= are Al = 1 and A2 = 3.A has three roots: Al = 9." " where x' = [Xl... Let A be a square matrix of dimension k X k and let A be an eigenvalue of A. + Xk colk(A .
12)(')' .5A + 6. the eigenvalues are = A[l . V. and consequently.2) where U has m orthogonal eigenvectors of AA' as its columns.All = A2 . The nonzero eigenvalues are the same as those of AA'. V has k orthogonal eigenvectors of A' A as its columns. SingularValue Decomposition. > 0 = (for m> k). let A = [2.16 .16 (A .). .3 . Vr = [VI' V2.Then Vi = natively. . . and Ar is an r X r diagonal matrix with diagonal entries Ai' . Consequently. U2.1 S.. Let A be an m X k matrix of real numbers.=1 2: A.8J .6 . e. Result 2A. The positive constants Ai are called the singular values of A. Specifically."" V r ].2 + [1.15..6 1.4J . 2. Ur]..')'1 I = _..3)(A . 2/VS] and ez = [2/VS.4 1.22')'2 .4 You may verify Utat the eigenvalues ')' = A2 of AA' satisfy the equation ')'2 . The matrix expansion for the singularvalue decomposition written in terms of the full dimensional matrices U.. let so A has eigenvalues Al = 3 and A2 = 2. the Vi are the eigenvectors of A' A with the same nonzero eigenvalues At. Then there exist an m X m orthogonal matrix U and a k X k orthogonal matrix V such that A = UAV' where Ute m X k matrix A has (i.vj = r [ 10 UrArV. . Un and r orthogonal k X Lunit vectors VI..10).=1 2: Aieiej k • with At.. ')'2 = = 10.u.2J 2.100 Chapter 2 Matrix Algebra and Random Vectors Result 2A.14.8 = A (mXm)(mxk)(kxk) U A V' lA ... A'Av2 = [ where U r = [UI> U2.. respectively. The corresponding eigenvectors are et = [1/VS.12)(')' . For example. The Spectral Decomposition. and the eigenvalues are ')'1 = AI = 12. + 120 = (y. Also. Then A can be expressed in terms of its k eigenvalueeigenvector pairs (Ai. rather than a square.. • The singularvalue decomposition can also be expressed as a matrix expansion that depends on the rank r of A. If A is a rectangular matrix. .4 2. A computer calculation gives the eigenvectors I VI 2 1 ] v2 ' = [2 = [1 v'6 v'6 v'6' VS 1 0 ] . A is (mXk) . there exist r positive constants AI. .2 ..10). . mine m. Uten the vectors in the expansion of A are the eigenvectors of the square matrices AA' and A' A. 10'_1 Th e J The ideas that lead to the spectral decomposition can be extended to provide a decomposition for a rectangular. and ')'3 = = O. .22. so IA' A . l/VS]. An r orthogonal m X 1 unit vectors U1. V" such that A = UI = Vi V2 an U2 Vi V2' respectively. Let A be a k x k symmetric matrix. .. k) and the other entries are zero.) as A = For example. so AA'Ui = A7ui 101 . . A2.2 Then Vectors and Matrices: Basic Concepts Here AA' has eigenvalueeigenvector pairs (At.). and A is specified in Result 2A. Then A= [ A 13 3 11 1J = [ 2.. Ui). i) entry Ai 0 for i = 1. Vz.120')' = ')'( ')' . matrix. . and V3 VS 1 = [ v30 Eigenvectors VI and V2 can be verified by checking: 10 A'Avl = . .4 AA' [: : :J[: 12 1 aJnd d ')': = = J [1: I:J eigenvectors are = [.8 .
(AB)' = B'A' (mXk) 1 (kxt) .)AB = BI(AIA)B = BIB = I.B)') = tr[V'(A .3] = [2 2 DJ and [11. A = [ 3 1 1J 1) 1 3. 1'. graph [5 . 3] andy' = [1.B)VV'(A .6. Let = V'BV. It minimizes tr[(A .B)VV'(A . I'. (ii) the angle between x and y. . and (AAi)' = (AI). is . having the same dimension but lower rank. (c) Smce x = 3 and y = 1. ' . due to Eckart and Young ([2]).I To establish this result.B)'V) = tr[(A where C .I )' (AB)' = B' A' For general A and B . and (iii) the projection of y on x. The other Cu = O.1A. or error of approximation.4.B)(A . (a) 5A (b) BA (c) A'B' (d) C'B (e) Is AB defined? Result 2A.102 Chapter 2 Matrix Algebra and Random Vectors Taking Al Ais Exercises 103 = VU and A2 = v1O. 1].3. s = J U J B = and is the ranks least squares approximation to A.11J=[2. When AI and B. Verify the following properties of the transpose when A (a) (b) (c) (d) = i=1 2: AiDi v. Given the matrices v'2 The equality may be checked by carrying out the operations on the righthand side. 2.OJ.3.3 . Let A be an m X k matrix of real numbers with m k and singular value decomposition VAV'. the minimum occurs when Cij = Ofor i '* j and cns = Ai for i=1 the s largest singular values. Part b follows from (B.1 . . 2: AT.5. (a) (A. Letx' = [5. 1. (a) Graph the two vectors.)l = (AI).=s+1 k (A')' = A (C.exist.B)'j Hint: Part a can be proved br noting that AAI = I.31.B)(A . we use vV' squares as tr[(A . The minimum value.16. prove each of the following. = Im and VV' = Ik • to write the sum of 2. .2.B)(A .1.A'. Lets < k = rank (A). That is.C)') = i=1 j=1 2: 2: (Aij  m k Cij? = i=1 2: (Ai  m + 2:2: CTj i"j 2. we find that the singularvalue decomposition of Exercises 2.)l = (C. (b) (AB)I = BIA. 2.B)') over all m X k matrices B having rank no greater than s.2. Then B 2. The singularvalue decomposition is closely connected to a result concerning the approximation of a rectangular matrix by a lowerdimensional matrix.C)(A .3. the sum of squared differences i=1 j=1 2: 2: (aij  m k bijf = tr[(A . v'6 + v'6 _1 2 J 1 VS VS 1 DJ (b) (i) length of x. Check that Q = is an orthogonal matrix.B)'] perform the indicated multiplications. Clearly. Cii)2 [ 5 12J IT IT 12 5 IT IT = tr[UV'(A . UBV' = As or B = 2: Ai Di vi· (a) Is A symmetric? (b) Show that A is positive definite. If a m X k matrix A is approximated by B.
Hint: Set y = A x so that y'y = x' A' A x.8. A quadratic form x' A x is said to be positive definite if the matrix A is positive definite. (d) Find the eigenvaiues and eigenvectors of AI. since III = IP'PI = IP'IIPI.) Show Properties (1)(4) of the squareroot matrix in (222). Show that Q' (pXp)(pXp)(pxp) = II I. i *. Apply Exercise 2.I/2 = A. since Q'Q = I.11. we can write 0 = IQ' 11 A . x2) whose "distances" from the origin are given by c = 4xt 2 2. (b) Write the spectral decomposition of A. (See Result 2AIS) Using the matrix (b) Compute the eigenvalues and eigenvectors of AI. (c) Obtain the singularvalue decomposition of A. (c) Write the spectral decomposition of AI. + O. Multiply on the left by e' so that e' Ae = Ae' e.19..9. Given the matrix A = G for c = 1 and for c = 4. Show that I Q I = + 1 or 1 if Q is a p X P orthogonal matrix. Verify the relationships V I/ 2pV I!2 = I and p = (Vlf2rII(VI/2rl.eie.7. (pXp) Hint: Let A be an eigenvalue of A.002001 J (a) Calculate A' A and obtain its eigenvalues and eigenvectors.6. (c) The eigenvalues and eigenvectors of II. (See Result 2A1S) Using the matrix These matrices are identical except for a small difference in the (2.1 S. small changesperhaps caused by roundingcan give substantially different inverses.11. 2.3.wherePP' = P'P = I. where Ae = Ae.j.1I(e). from Result 2A. Check that the nonzero eigenvalues are the same as those in part a. 2.11(e). Let A be as in Exercise 2. What will happen as c2 increases? 2.001 4. Let X have covariance matrix A Q and A have the same eigenvalues if Q is orthogonal. Moreover. 2.001 4. I A I = a" A" + 0 + . . By Exercise 2.11. Show that the determinant of the p X P diagonal matrix A = {aij} with aij = 0. 2. 8 6 9 rr. (The A. 2.1f2A1/ 2 = I.All. Check that the nonzero eigenvalues are the same as those in part a. (b) Calculate AA' and obtain its eigenvalues and eigenvectors.AlII Q I = IQ' AQ .001 4.14. Then A' A is a symmetric p that A' A is necessarily nonnegative definite. 2. Repeat for the submatrix All obtained by deleting the first row and first column of A.=1 2.104 Chapter 2 Matrix Algebra and Random Vectors 2.. 2.2XIX2 positive definite? 2.13 and Result 2A. lA I = IPAP' I = IP IIAP' I = IP 11 A liP' I = I A 1111.13. 2.001J 4.. Consequently. (c) Find AI. A = PAP' with P'P = I.. a p p.23. 2. Then 0 = 1 A . . using the matrix A in Exercise 2. = . Show that the determinant of a square symmetric p x p matrix A can be expressed as the product of its eigenvalues AI. Determine the major and minor axes of the ellipses of constant distances and their associated lengths. Prove that every eigenvalue of a k x k positive definite matrix A is positive. A Ai.'s and the e.=1 VA.24.AI I. = PA J/ 2P'. 2. Exercises 105 Let A be as given in Exercise 2. IQ" Q' I = IQ 12.I. where I is the p X . deter. IQ 12 use Exercise 2. (c) Obtain the decomposition of A.10. Determine the spectral decomposition (216) of A. Consider the matrices A = [:. . Show that AI ='= (3)B. Consider the sets of points (XI. Consider an arbitrary n X p matrix A. 2. IA I = Hint: From (216) and (220). 1A 1 = a" a22 . (a) Find AI. Determine the squareroot matrix AI/2.. p is the p X P population correlatIOn matnx [EquatIOn (234)].17. A 8J = [. Sketch the ellipses of constant distances and comment on their pOSitions.2) position.12.002 and 4 B = [ 4. (b) Calculate A' A and obtain its eigenvalues and eigenvectors. the columns of A (and B) are nearly linearly dependent. Hint: I QQ' I = II I.8.8. Also. 
Thus.'s are ' I the eigenvalues and associated normalized eigenvectors of the matrix A.18. and show that A I/ 2A. and compare it with that of A from Exercise 2. Hint: By Definition 2A24.22. X P matrix. Also.11. thus. is given by the product of the diagonal elements.P matrix (232)]. and V /2 is the population standard deviation matrix [Equation (235)]. Hint: Consider the definition of an eigenvalue. Let AI/2 (mXm) 2 +  2v'2XIX2 2 find the eigenvalues Al and A2 and the associated nonnalized eigenvectors el and e2. 2.. (a) Determine the eigenvalues and eigenvectors of A. From Result 2A. . Show Find (a) II (b) The eigenvalues and eigenvectors of I.16.20. 2. Ap. 2. Is the quadratic form 3xt + . that is.21. mine AI/2. Now (a) Calculate AA' and obtain its eigenvalues and eigenvectors.
and X 3. + C2p(Xp .ILm) has expected value and consider the linear combinations AX(!) and BX(2). X (2) (j) COY (AX(J). BX(2) 2 . .30. Let X have covariance matrix = [Xl> X 2. IL4.2. Find (a) E(X(J) (b) E(AX(l) (c) Cov(X(l) (d) COY (AX(!) (e) E(X(2) (f) E(BX(2) (g) COY (X(2) (h) Cov (BX(2) (i) COY (X(l).: = [ILl> IL2..ILe) (Xm .E(Z2) = C21(XI . This verifies the offdiagonal elements CIxC' in (245) or diagonal elements if Cl = C2' Hint: By (243).31. X 2.IL2) + .4X2 if XI and X 2 are independent random variables. 2.ILl) + CdX2 . Let A = (1 2J and B = C=n + Clp(Xp . but with A and B replaced by Verify the last step by the definition of matrix multiplication.ILp). .ILp»J. + C2p(Xp ..26.E(Z2»J = E[(cll(XI .X3 (f) 3XI . I. + C2p(Xp .] (a) Findpl3' (b) Find the correlation between XI and + 2... C2 pJ. A = [1 1 J and B =  ] . Repeat Exercise 2. X 3..sJ· Partition X into 107 2.29. IL3.nd X'" (a) Determine p V 1/2. (b) Multiply your matrices to check the relation VI/2pVI/2 = 2. The product (Cu(XI . X5J with mean vector I = 25 2 [ 4 2 4] 4 1 1 9 where X = X (2) .' . X 3.Zz) = E[(ZI . xl" [. 1J and variancecovariance matrix 3 0 Ix = Partition X as o f 2 1 1 2 0 where Cl = [CJl..ILl) + . .x = [4.106 Chapter 2 Matrix Algebra and Random Vectors Exercises 2.ILp) and Z2 ..28.25.27. Show that Let I be the covariance matrix of X with general element (Tik' Partition I into the covariance matrices of X(l) and X(2) and the covariance matrix of an element of X(1) and an element of X (2)..E(ZI) = Cl1(XI . Consider the arbitrary random vector X' . X 4 J with mean vector Jl..IL2) + . You are given the random vector X' = [XI' X 2.ZI ..ILl) + '" + CIP(Xp .. 2.30.2X2 (b) XI + 3X2 (c) XI + X 2 + X3 (e) XI + 2X2 . . Use I as given in Exercise 2.SOCov(ZI.E(Zd)(Z2 .. (a) XI .ILl) + C22(X2 . cl2.. Cl PJ and ci = [C2l> C22. Derive expressions for the mean and variances of the following linear combinations in terms of the means and covariances of the random variables XI. X 4.ILp»(C21(XI .IL p»(C21(XI .ILp» = cu(Xe (=1 m=1 ILe») C2m(Xm  ILm») = 2: 2: p p CJ(C2 m(Xe .25. The same steps hold for all elements.ILd + C22(X2 . Jl.3.IL2) + .ILl) + '" + Clp(Xp .
E(Xij + Yij ) = E(X ij ) + E(Yi) by a umvanate property of expectation. Hint: X. Verify the CauchySchwan inequality (b'd)2 s (b'b)(d'd). You are given the random vector X' = [Xl.6.3.37. j)th element of E(X) + E(Y).4. j)th entry bCkCkj = dCj' So A(BC) has (i. Fmd the maximum and minimum values of the quadratic form + + all points x' = [x I . 2.2. + has Xij + Yij as its element.3S.32.40. Using the b' = [4.3] and d' = [1. . X 3 . fmd the maximum value of x' A x for x' x = 1. (c) Which pairs of linear combinations have zero covariances? . C aieXCkbkj. j)th element and consider the linear combinations AX(I) and BX(2).'x = [2. X 2.33.32. X2] such that x' x = 1.108 Chapter 2 Matrix Algebra and Random Vectors 2. X 2 . 1].j)th entry by the additive property of expectation.j)th entry t aicbckCkj Hint: BC has (e. and this last quantity is the (i. 1.x = [3.0] and d' = [1. 3 1 2 I 1 2 0 0 for 4 0 2 1 0 2.38. X3] if A = 2. 1 (a) Find E (AX). X(2) COy (AX(I). Show that Partition X as 2 2 10 s Let A =D and B= G (rXs)(sXt)(tXV) t A B C has (i.. Find the maximum and minimum values of the ratio x' Ax/x'x for any nonzero vectors x' = [Xl> X2. Next (see Exercise 2.36. With A as given in Exercise 2.AXB has (i. .0] and variancecovariance matrix 4 1 Exercises 109 2.3. 2. but with X partitioned as Ix = [30 o 1 1 0 3 0 0 0 3 o 0 0 0 Let and with A and B replaced by A = 3 11 0J and B = [11 12J A = [1 1 1 1 2 2. 1. j)th element of AE(X)B. (b) Find Cov (AX).. 2.0] and variancecovariance matrix 2. Repeat Exercise 2.41. Now. 6XIX2 Ix = 1 1.39. Consider the vectorsb' = [2. Verify (224): E(X + Y) = E(X) + E(Y) and E(AXB) = AE(X)B. You are given the random vector X' = [XI.34. BX(2) e aiCXCkbkj) = aj{E(XCk)bkj k e k which is the (i. and k (g) (h) (i) (j) COy (X(2) Cov (BX(2) COy (X(l). the variances and covariances ofAX. the mean of AX. X 4 ] with mean vector IJ.39).4. Xs] with mean vector IJ. 2. Find (a) E(X(l) (b) E(AX(I) (c) Cov(X(1) (d) COV(AX(l) (e) E(X(2) (f) E(BX(2) 2.1]' verify the extended CauchySchwarz inequality (b'd) s (b'Bb)(d'B1d) if I 2: 6 1 I 2: 1 1 0 0 1 B= [ 2 2 2J 5 2.
1969. if n observations have been obtained. NJ: Prentice Hall. P. As in Chapter 1. and 1. 1997. K.4. The connection between K. statistical inferences are based on a solid foundation. Englewood Cliffs. young.: Xnl Xn2 ••• x np ..) Philadelphia: Soc for Industrial & Applied Math (SIAM). R. Furthermore.. 2.. CA: Wadsworth. 2005. it is this structure of the random sample that justifies a particular choice of distance and dictates the geometry for the ndimensional representation of the data. Sn. when data can be treated as a random sample. Applied Linear Algebra (3rd ed. Daniel. .). Repeat Exercise 2. Halmos.2 The Geometry of the Sample A single multivariate observation is the collection of measurements on p different variables taken on the same item or trial. and G. BeIlman. In later sections we use matrix algebra to provide concise expressions for the matrix products and sums that allow us to calculate x and Sn directly from the data matrix X. 1993. SAMPLE GEOMETRY AND RANDOM SAMPLING 3. 5. 3. 1988. called generalized variance.41.. 1 (1936). Introduction to Analysis (2nd ed. A. Noble. "The Approximation of One Matrix by Another of Lower Rank. and G. 6. we introduce a single number.211218. Johnson. Statistics: Principles and Methods (5th ed. 4. R. Simply stated. F. C. W. Eckart. and R.0 Chapter 2 Matrix Algebra and Random Vectors 2. we do so in Section 3. FiniteDimensional Vector Spaces. A." Psychometrika. Bhattacharyya.3 we introduce the assumption that the observations constitute a random sample. Ultimately. R.42. we can now delve deeper into the geometrical interpretations of the descriptive statistics K. Sn. Many of our explanations use the representation of the columns of X as p vectors in n dimensions. using the notion of matrix products. Belmont. Introduction to Matrices with Applications in Statistics. the entire data set can be placed in an n X p array (matrix): X (nxp) Xl1 = "' r XZl : X12 X22 XIPj X2p ".2. New York: SpringerVeriag. This generalization of variance is an integral part of the comparison of multivariate means. Graybill. 3. random sampling implies that (1) measurements taken on different items (or trials) are unrelated to one another and (2) the joint distribution of all p variables remains the same for all items. to describe variability. and the means and covariances for linear combinations of variables is also clearly delineated. B. but with References 1. In Section 3.) New York: John Wiley. Returning to geometric interpretations in Section 3.1 Introduction With the vector concepts introduced in the previous chapter.
and locate xon the resulting diagram. as in the pdimensional scatter plot. this scaUer plot representation cannot actually be graphed. the rows of X represent n points in pdimensional space.2 (Data as p vectors in n dimensions) Plot the following data as p = 2 vectors in n = 3 space: . '" XI ] xZp : = [YI i Yz i " (32) Example 3.5).x. . We can write Xll The Geometry of the Sample 2 5 4 113 . The alternative geometrical representation is constructed by considering the data as p vectors in ndimensional space.x @x 3 x 2 • 3 2 . As we have seen."" xnd is determined by the ntuple of all measurements on the ith variable. each of which has p components. In this geometrical representation. If the points are regarded as solid spheres.1 (Computing the mean vector) Compute the mean vector x from the xnp data matrix. In general. is the center of balance. pdimensional scatter plot. We shall be manipulating these quantities shortly using the algebra of vectors discussed in Chapter 2. . ..1 A plot of the data matrix X as n = 3 points in p = 2 space. xnd are the n measurements on the first variable. . A single numerical measure of variability is provided by the determinant of the sample variancecovariance matrix." The sample then consists of n measurements. the sample mean vector X. Moreover. . and it is quantified by the sample variancecovariance matrix Sn. When p is greate: 3. we depict Yb"" YP as vectors rather than points. locations and variability of the points.3] andx3 = [3. the ith point yi = [Xli. Xl> has coordinates xi = [4. The row vector xj. xnp nth (multivariate) observation Figure 3. Plot the n = 3 data points in p = 2 space. representing the jth observation.. Since the entire set of measurements is often one particular realization of what might have been observed. The first point. . Variability occurs in more than one direction. Yet the ?f the data as n points in p dimensions provides insights that are not readIly avallable from algebraic expressions. we say that the data are a sample of size n from a "population. Example 3.1). contains the coordinates of point. . For the. XZI. the concepts illustrated for p = 2 or p = 3 remain valid for the other cases.111 Chapter 3 Sample Geometry and Random Sampling Each row of X represents a multivariate observation. Let x (nxp) = : : XnI Xn 2 P ". X2i. the remaining two points are xi = [1. . 2 1 1 X12 X22 XI P] X2p 1st '(multivariate) observation 2 2 3 4 5 X (nXp) = [ : Xnl · · ·  _ X2 . Then the coordinates of the first point yi = [Xll. Here we take the elements of the columns of the data matrix to be the coordinates of the vectors. The scatter plot of n points in pdlmensIOnal space provIdes mformatlOn on the . Similarly. Finally. the data can be ploUed in two different ways. Figure 3.1 shows that x is the balance point (center of gravity) of the scatter . . given by (18).
so the vector (l/Vii)I has unit length in the equalangle direction. i = 1.2..}/n = yjI/n corresponds to the multiple of 1 required to give the projection of Yi onto the line determined by 1.5]. we have the decomposition I 15 ]. or mean corrected. Decomposition of the Yi vectors into mean components and deviation from the mean components is shown in Figure 3. By selecting an appropriate threedimensional perspectivethat is. for the data given in Example 3.3.. di = Yi . Since the specific choice of axes is not relevant to the geometry. by (28)." n . however.]. it is possible. Figure 3. It is possible to give a geometrical interpretation of the process of finding a sample mean.3. to illustrate certain algebraic statistical concepts in terms of only two or three vectors of any dimension n. and volume. These vectors are shown in Figure 3. and 3. The projection of Yi on the unit vector (1/ vn)I is. angle. Xl = (4 . Unfortunately.XiI. with the right choice of axes.XiI = Xli . can span no more than a threedimensional space. i = 1. and consequently.2 A plot of the data matrix X as p = 2 vectors in n = 3space. 3 Many of the algebraic expressions we shall encounter in multivariate analysis can be related to the geometrical notions of length.1 xI+X2'+"'+xnl I . x2i. It turns out. label the coordinate axes 1. (To simplify the notation.. .X· [ ':_' Xi (34) Xni  Hereyi = [4. the sample mean Xi = (Xli + x2i + . = (1.I 14 Chapter 3 Sample Geometry and Random Sampling The Geometry of the Sample Further.. for each Yi. Thus. The deviation. the subscript n will be dropped when the dimension of the vector 1" is clear from the context.XiI. so That is. . the ndimensional representation of the data matrix X may not seem like a particularly useful device for n > 3. even if n dimensional. vector is 5 1 6 Figure 3. .) The vector 1 forms equal angles with each of the n coordinate axes.XiI.3 for p = 3 and n = 3.3] andyz = [1. we shall always . This is important because geometrical representations ordinarily facilitate understanding and lead to further insights. Example 3. a portion of the ndimensional space containing the three vectors of interesta view is obtained that preserves both lengths and angles.2. Consider the vector Y. We start by defining the n X 1 vector 1.1. = [Xli. that geometrical relationships and the associated statistical concepts depicted for any three vectors remain valid regardless of their dimension. just as two vectors with any number of components must lie in a plane. _ The elements of d i are the deviations of the measurements on the ith variable from their sample mean.2.2: 1 1 ) 1."" xn.3 The decomposition of Yi into a mean component XiI and a deviation component d i = Yi . where XiI is perpendicular to Yi . + xn.2. we are limited to visualizing objects in three dimensions..1].I Yi'(Vii Vii .Xi Here. . 1..Xi] X2.1 (33) + 3)/3 = 2 and X2 = (1 + 3 + 5)/3 = 3.3 (Decomposing a vector into its mean and deviation components) Let us carry out the decomposition of Yi into xjI and d i = Yi . This follows because three vectors.
4 (Calculating Sn and R from deviation vectors) Given the deviation vectors in Example 3. Now consider the squared lengths of the deviation vectors. For any two deviation vectors d i and db did k = j=l 2: (Xji  n Xi)(Xjk  Xk) (36) A similar result holds for x2 1 and d 2 = Y2  x21. we get For the time being. the sample correlation will be approximately zero.xiI.4 The deviation vectors d i from Figure 3. the sample correlation will be close to 1. the length is proportional to the standard deviation. Longer vectors represent more variability than shorter vectors. we see that the squared length is proportional to the variance of the measurements on the ith variable.e 3.xII are perpendicular. Example 3. We have translated the deviation vectors to the origin without changing their lengths or orientations. we obtain I \ and = did i = j=l ± (Xji  xi (35) (Length of deviation vector)2 = sum of squared deviations \ We note that xII and d l = Yl .3.4.using (35) and (36). Equivalently. _ _______ _ _________________ Figure 3. From Example 3.116 Chapter 3 Sample Geometry and Random Sampling The Geometry of the Sample 1I 7 Consequently. if the two deviation vectors have nearly the same orientation. because From (13).3 is given in Figure 3. From (26). Thus. A plot of the deviation vectors of Figur.3. the sample correlation will be close to 1. The decomposition is Let fJ ik denote the angle formed by the vectors d i and d k . Using (25) and (34). If the two vectors are oriented in nearly opposite directions. we obtain so that [see (15)] (37) The cosine of the angle is the sample correlation coefficient. we are interested in the deviation (or residual) vectors d. = Yi . . let us compute the sample variancecovariance matrix Sn and sample correlation matrix R using the geometrical concepts just introduced.3. 3 or. If the two vectors are nearly perpendicular.
be predicted exactly. 2. and we have the random matrix or SII = ¥. that the data have not yet been observed. Consequently. k )th entry in the data matrix be the random variable X jk • Each set of measurements Xj on p variables is a random vector. The sample correlation rik is the cosine of the angle between d i and d k • 1.189 1 A random sample can now be defined. The information comprising Sn is obtained from the deviation vectors d i = Yi . The measurements from different trials must.Xi)" The square of the length ofdi is nSii. and (n . Xl. X np . 3.Xi. 1 The square of the length and the inner product are (n .x.Xni .I)s. X 2 ..3 Random Samples and the Expected Values of the Sample Mean and Covariance Matrix In order to study the sampling variability of statistics such as xand Sn with the ultimate aim of making inferences. is related to the length of the projection of Yi on 1. and projection have provided us with a geometrical interpretation of the sample.X2i . but we intend to collect n sets of measurements on p variables.. their values cannot. Xl> X 2.."" xp). Xjp]. we treat them as random variables. . then. . are shown in Figure 3. The vector XiI has length Vii 1 ith sample mean. angle.1 is used in the definitions of the sample variance and covariance. where f(xj) = !(Xj!. (38) Xn! or S12 = Consequently.. . and R = [1 .XiI = [Xli .. however. ••.5. the vector 1 is the vector XiI. in (38) represent independent observations from a common joint distribution with density function f(x) = f(xl> X2. . Xi. Xj2"'" Xjp) is the density function for the jth row vector. The projection of a column Yi of the data matrix 5 4 Figure 3.2P = . respectively. when the divisor n . Suppose. If the row vectors Xl.. Now. Xn form a random sample if their joint density function is given by the product f(Xl)!(X2)'" f(xn). and the (inner) product between d i and d k is nSik. we expect this to be the case. then Xl.l)s.5 The deviation vectors d 1 andd2· These vectors. Two points connected with the definition of random sample merit special attention: 1. we need to make assumptions about the variables whose oDserved values constitute the data set X.189J .. Also. . X j2 . such as Xi = [Xjl . Mathematically."". in general. In this context. Indeed. We summarize as follows: Geometrical Interpretation of the Sample X onto the equal angular Xi I.v Random Samples and the Expected Values of the Sample Mean and Covariance Matrix 1.k. be independent. let the (j. translated to the origin..1 3. Xn are said to form a random sample from f(x). Before the measurements are made. Therefore. The measurements of the p variables in a single trial. . . Xll or S22 = Finally. will usually be correlated. . .19 I 18 Chapter 3 Sample Geometry and Random Sampling 3 The concepts of length. X (nXp) = r X 21 : XIPJ x.
except for a slight but fortunate downturn in paper and paperboard waste in 2003.p. and respondents were asked to give information on the regions visited. X2k. as with sets of p stock prices or p economic indicators.3 2.1 1 Ee: (1 1 1 (310) so [n/(n . +.Xn = 1) 17.. The following eJglmples illustrate these remarks.. Euclidean distance appears appropriate if the components of a vector are independent and have the same vari= [Xlk' X 2k>'.. Here one would expect the sample observations to conform closely to the criterion for a random sample from the population of users or potential users. !(Yk) = !(Xlk.X2k> .p. as in the "statistical" distances or quadratic forms introduced in Chapters 1 and 2. Example 3.1 • As we have argued heuristically in Chapter 1. We would then be led to consider a distance function in which the coordinates were weighted unequally. The total wilQerness area was divided into subregions. Suppose we consider the location ofthe kthcolumn of X.l: n 1 Sn) = l: n . The method followed was to select persons randomly (perhaps using a random· number table) from all those who entered the wilderness area during a particular week. if one of the samplers had waited at a campsite far in the interior of the area and interviewed only canoeists who reached that spot.4 Random Samples and the Expected Values of the Sample Mean and Covariance Matrix 121 If the n components are not independent or the marginal distributions are not identical.1) ]Sn is an unbiased estimator of l:. Next.) ) n = 2 (1 n n (X t . Table 3. Should these measurements on X t = [Xl> X 2 ] be treated as a random sample of size n = 7? No! In fact. Now.)' n j=l [=1 1 n n . E(S) n Thus. . X = (Xl + X 2 + ...1 Solid Waste Year 1960 29.9 1980 55.)(X . Proof.. in millions of tons."" Xnk) = !k(Xlk)!k(X2k)'" !k(Xnk) and. The repeated use of the properties of expectation in (224) for two vectors gives 1970 44. 1 1 1 1 Yl (X .·' X nk ] ances. =p.6 (A nonrandom sample) Because of concerns with future solidwaste disposal. Example 3. E(X) = p.p.120 Chapter 3 Sample Geometry and Random Sampling 2. All persons were likely to be in the sample.p. Xn be a random sample from a joint distribution that has mean vector p. .7 24..7 1995 81.p.7 2003 83. an ongoing study concerns the gross weight of municipal solid waste generated per year in the United States (Environmental Protection Agency). Violations of the tentative assumption of independence can have a serious impact on the quality of statistical inferences. we can see how X and Sn fare as point estimators of the corresponding population mean vector p.Xnk)' When the measurements X lk ...8 1990 72. .1... Result 3. Estimated amounts attributed to Xl = paper and paperboard waste and X2 = plastic waste.. lengths of stay in the wilderness area for dif• ferent canoeists from this group would all tend to be large. 1 Cov(X) =l: (popUlation mean vector) population variancecovariance matrix) ( divided by sample size (39) n For the covariance matrix Sn. successive measurements would not be independent.1. both variables are increasing over time.Xl + .•• .. each coordinate Xjk contributes equally to the location through the identical marginal distributions !k( Xj k)' + + . Certain conclusions can be reached concerning the sampling distributions of X and Sn without making further assumptions regarding the form of the underlying joint distribution of the variables. = n l : = l: . a naturalresource manager took a survey of users. and covariance matrix l:. + Xn)/n.. + ..l: = (l/n)l:. 
so the more popular entrances were represented by larger proportions of canoeists. + 1 = .p. Then X is an unbiased estimator of p..) ) ' t=l (Xj .7 18. consequently.. The location of this point is determined by the joint probability distribution !(Yk) = !(Xlk.. X nk are a random sample.5 (Selecting a random sample) As a preliminary step in designing a permit system for utilizing a wilderness canoe area without overcrowding. . + . X 2k . while Sn is a biased estimator with (bias) = E(Sn) ..9 2000 87. the influence of individual measurements (coordinates) on location is asymmetrical.)' = ( 1 n (Xj .E(Xd + . On the other hand.. .7 Xl (paper) X2 (plastics) E(X) = E .:E(Xn) =. regarded as a point in n dimensions. For instance.. The independence of measurements from trial to trial may not hold when the variables are likely to drift over time. and covariance matrix l:. the notion of statistical independence has important implications for measuring distance. + .. and its covariance matrix is That is.p.p. Let Xl' X 2 ..X2 + .p.2 .E(X2 ) + .2 6.)(X t . and other variables. are given for selected years in Table 3. lengths of stay. In particular. + .1 26.
1) potentially different covariances. we obtain £.):I n Here S. Consideration of bias motivates a slightly modified definition of the sample variancecovariance matrix. n:I nlLlL' . which reduces to the usual sample variance of a single characteristic when p = 1.1 j=1 £.1 as a divisor.. k)th element of (Xi . n terms 1 =   n .1):I The sample covariance matrix contains p variances and !p(p . since 2: (Xi i=1 n X) = 0 and nX' = 2: X. or E(rik) . The matrix representing sums of squares and cross products can then be written as :L (Xji  n Xi)(X/ k  X k ).) Consequently.1)1 i=1 To obtain the expected value of Sn' we first note that (Xii . ..1 shows that the (i.IL )(Xe . = j=1 2: XiX. Sometimes it is desirable to assign a single numerical value for the variation expressed by S.1)1 For j "# e.X)(x· . This definition of sample covariance is commonly used in many multivariate test statistics.!.122 Chapter 3 Sample Geometry and R(lndom Sampling Generalized Variance 123 so Result 3. [nl (n ..16. + :I) . [See Exercise 3..1) E(Sn) = . This determinant 2 is called the generalized sample variance: Generalized sample variance = l Sll SIp it follows immediately that Isi (312) (n . (X· . the individual sample standard deviations VS.. are not unbiased estimators of the corresponding population quantities VU..X) (Xj .n :I ILIL' 1=1 (n . Therefore. One choice for a value is the determinant of S.1L)(X j each Xi' we have CoveX)  IL)' is the common population covariance matrix. the correlation coefficients rik are not unbiased estimators of the population quantities Pik' However.Pik> can usually be ignored if the sample size n is moderately large. Therefore..: I • 2 Definition 2A. j=1 E(XjX.24 defines "determinant" and indicates one method for calculating the value of a determinant.IL)' ) 1 =n 2 (:I + :I + . When p variables are observed on each unit. .X k )..) .IL)' is zero because the entry is the covariance between a component of Xi and a component of Xe.!.. its expected value is n For any random vector V with E(V) = ILv and Cov (V) = :Iv.. the sample variance is often used to describe the amount of variation in the measurements on that variable.nXx' i=1 n 3. without a subscript.VU. Result 3. .] Therefore.x)' 1 1 (311) n (. the bias E .= + (1) + = n (1In) (± XiX.(n:I) = 2 . E(XjXj) = :I = S = + ILIL' and E(XX') 1 = :I + ILIL' n Using these results.X)'.17 and (229). we have E(VV') :Iv + ILvlLv· (See Exercise 3.1 provides us with an unbiased estimator S of :I: i=1 :L (Xii  n Xi) (Xik .nE(XX') and thus.. Moreover. the variation is described by the sample variancecovariance matrix .for (Unbiased) Sample VarianceCovariance Matrix S= Sn (n n)  = n12 ( n E(Xi = . has (i. it will replace Sn as the sample covariance matrix in most of the material throughout the rest of this book. since Sn = .1) ]Sn is an unbiased estimator of (Fi k' However.nxx'). of Since:I = E(Xj . k)th entry.4 Generalized Variance With a single variable. (n .X k ) is the (i. k)th entry (n .IL)(Xi .n .XJ (Xik . calculated with either n or n . and these are independent.. each entry in E(Xj .
will be small if just one of the Sii is small or one of the deviation vectors lies nearly in the (hyper) plane formed by the others.ISnl = I[(n .I = n volume? . for a fixed set of data.1\ I . then.x21. LdJ and the area of the trapezoid is / Ld J sin ((1) /L d2 .43) = 26.1)/n]IpSI = I[(n l)/njIpIlSI = [en . 3 ..l)/nJPISI.6 (a) "Large" generalized sample variance for p = 3. obtained from the data in the April 30. Consider the area generated within the plane by two deviation vectors d l = YI .l)/njS.6(a) and (b) show trapezoidal regions.11.04 68. the trapezoid has very little height above the plane.I. Figures 3. or / S /.. some information about the sample is lost in the process.67 d .r12) = (n = . 3 If generalized variance is defmed in terms of the samplecovariance matrix S. /S/ I . 1990. is 3 Forbes " .dp = Yp .sll s2z r iz = Sl1 S 22(1 pr . we have the diagram dl (a) (b) Figure 3. r12 = (n l)"Vsl1 szz (1 . we can (315) = I V j=l ± (xj1 . d 2 = Yz .67) .(68.7 (Calculating a generalized variance) Employees (Xl) and profits per Generalized Variance 125 .l .Xl)Z = V(n  I)Sl1 and cos«(1) = Therefore. d p .rI2) = .. generated by p = 3 residual vectors.43 Evaluate the generalized variance. volume will increase if the residual vectors of fixed length are moved until they are at right angles to one another. \ \ \ S = [252.I . .3. we compute /S/ 68.XiI (or is increased. In this case.xpl. d" = (252. 266): GeneraIized sample variance = /S/ = (n 1)P(volume)Z From (35) and (37). .riz Equation (315) says that the generalized sample variance.XII and d 2 = Yz . 2 \'. Height=Ldl sin «(I) /S/ = (areafj(n . A geometrical interpretation of / S / will help us appreciate its strengths and weaknesses as a descriptive summary. . Let Ldl be the length of d l and Ldz the length of d z . using (315). In addition.p. On the other hand. as in Figure 3. = [en . In the second case. when p > 1. where d 3 1ies nearly in me plane formed by d 1 and d 2 . Of course. \ \ \ . .1 deviation vectors d l . By elementary geometry.124 Chapter 3 Sample Geometry and Random Sampling Example 3.487 • '_2 The generalized sample variance provides one way of writing the information on all variances and covariances as a single number. we can also write the following: Generalized sample variance = IS. Area Also. it is clear from the geometry that volume.::J I I = Sl1 S2Z I . or / S /. the volume. Consequently. employee (X2) for the 16 largest publishing firms in the United States are shown in Figure 1.XII. . d z. magazine article.x21. or both. Since cosz( (1) express this area as + sin ( (1) 2 = 1.43)(68.I)Z Assuming now that / S / = (n .04)(123.6(a).l)(pl) (volume )2 holds for the volume generated in n space by the p . . The sample covariance matrix. For a fixed sample size.I . is 3 proportional to the square of the volume generated by the p deviation vectors d l = YI .. corresponding to "large" and "small" generalized variances. we see that = 3.43J 123. will increase when the length of any d i = Yi . This is the situation in Figure 3. using Result 2A. 1\' I( I" I'. we can establish the following general result for p deviation vectors by induction (see [1].6(b). (b) "Small" generalized sample variance for p If we compare (314) with (313).
we notice that if (A. It can be shown using integral calculus that the volume of this hyperellipsoid is related to 1 S I.7 gives three scatter plots with very different patterns of correlation..x)'SI(X . The eigenvalues and eigenvectors extracted from S further describe the pattern in the scatter plot. . " The meancentered ellipse. air on . as the following example shows.XI."" xpJ.. Although the generalized variance has some intuitively pleasing geometrical interpretations.8 (c) Each covariance matrix S contains the information on the variability of the component variables and also the information required to calculate the correlation coefficient. 7 x. it suffers from a basic weakness as a descriptive summary of the sample covariance matrix S. 1] £or a I1 three cases. 4 • • •• • • • • • • •• • • •• • •• • • • • • • •• • • .42 = (A .:.x) ::s c2 To describe this ellipse as in S ti 2 3 ' I eigenvalueeigenvecto.he eigenva]lueeigenvector pairs Al = 9 ei = [1/\1'2 1/\/2] and "2 . oS c2} = kplSII/2cP (b) • • • 7 . A large volume corresponds to a large generalized variance.'.::th = . (x .5)2 . . In this sense.. and the covariance matrices are •e • • • • • 7 x.x). the? mu1tlplymg on the left by SI givesSISe = ASle or S e " e erefore usmg t h · I ' extends cvX.SI(X . X2.. the coordinates x/ = [Xl> X2"'" xp) of the points a constant distance c from x satisfy (x . where f(z) denotes the gamma function evaluated . e) is an SI That' if S _ A P S. kp = 2u1'/2/ p r(p/2). we know that the e11ipse the eigenvalues satisfy 0= (A .] Equation (316) defines a hyperellipsoid (an ellipse if p = 2) centered at X.2jSll is the squared distance from XI to XI in standard deviation units. • • • •• .x) = (XI . (x .!? The .1) 4 at z.e) IS an elgenvalueeigenvector pair for I' _ . and SI playing the role of A. . .r = 0 S = [45 4J 5' r = . IS .r =._ • Example 3.7 Scatter plots with three different orientations.8 S = [30 3DJ .$ 126 Chapter 3 Sample Geometry and Random Sampling tL Generalized Variance Generalized variance also has interpretations in the pspace scatter plot representa_ tion of the data.x)/SI(x. With these choices. In particular.i) = 127 7 Cl [When p = 1.8 (Interpreting the generalized variance) Figure 3.l w e !.i) or (Volume of ellipsoid)2 = (constant) (generalized sample variance) where the constant kp is rather formidable..ction of ues from S. For those who are curious. All three data sets have x' = [2. Consider the measure of distance_ given in the comment below (219).1 J. in the dir. • ••• • . 1/\/2 . For S= Figure 3. with xplaying the role of the fixed point p.x)'SI(x .9)(A .e2 = 1/ v2. . Volume of {x: (x . S captures the orientation and size of the pattern of scatter.e.the? (A . The most intuitive interpretation concerns the spread of the scatter about the sample mean point x' = [XI. . with center x' = [2 .1. . S= 4J 4 5 [5 . •• ..
8(c). for S= [ 5 4J 4 5' the eigenval1les satisfy o= = (A . we can often sketch the axes of the meancentered ellipse by eye. di = [Xli . the meancentered ellipsoid based on SI [see (316)] has axes.3). "2 7 7 • • • • • • . 0] and A2 = 3. the choice C Z = 5. Next. The situation for p > 2 can be even more obscure. ei = [I. In two dimensions. • • • • • • •• • O! . Notice how the directions are the natural axes for the ellipse. this is a case where one of the deviation vectorsfor instance. The scaled eigenvectors 3V5. therefore.7. • • 7 XI • • • • • • •• • • • •• • • • • • ••• • • •• • ••• • • . Generalized variance is easier to interpret when the two or more samples (patterns) being compared have nearly the same orientations. Notice that our three patterns of scatter appear to cover approximately the same area. IS I can be expressed as the product AIAz'" Ap of the eigenvalues of S.(4)Z (A .for Finally.• . Note: Here the generalized variance 1SI gives the same value. x2 7 Situations in which the Generalized Sample Variance Is Zero The generalized sample variance will be zero in certain situations.99 will produce an ellipse that contains approximately 95% of the observations. ei = [1/V2.8(b).. From Exercise 2..99 el and V5.12.99 ez are drawn in Figure 3. As we have shown geometrically.· . .'s (see Section 2. 1/V2J and A2 = 1. The vectors 3v'5. . .i)'SI(X . Consequently.Xl Xl X21 .128 Chapter 3 Sample Geometry and Random Sampling Generalized Variance 129 In p = 2 dimensions. for all three patterns. the eigenvector approach also works for high dimensions where the data cannot be examined visually. 1S I = 9.xdlies in the (hyper) plane generated by d 1 . Moreover.99 e2 are drawn in Figure 3.3)z and we arbitrarily choose the eigerivectors so that Al = 3. . el = [1/V2. We shall pursue this topic later when we discuss principal components.8 Axes of the meancentered 95% ellipses for the scatter plots in Figure 3.8(a). whose lengths are proportional to the square roots of the A. The ellipses that summarize the variability (x .9) (A . dil> di+l>"" d p .: [0. Xlp X2p  Xp Xp (318) Xnl  Xl  X np  • •• (nxp) XI (nxI)(lxp) i' Figure 3. It is useful. and observe that the lengths of these scaled eigenvectors are comparable to the size of the pattern in each direction. can be expressed as a linear combination of the other columns. as well as their product. 1]. However. in the sense that at least one column of the matrix of deviations. ei . But generalized variance does not contain any information on the orientation of the patterns. since all have IS I = 9. it is often desirable to provide more than the single number 1S I _as a summary of S. • (c) xi :[ Xn  xi  . A generalized variance of zero is indicative of extreme degeneracy.• • • • • and we determine theeigenvalueeigenvectorpairs Al = 9. . = . (a) (b) As Example 3.i) :5 c2 do have exactly the same area [see (317)]. i'] i' X = [Xll . These eigenvalues then provide information on the variability in all directions in the pspace representation of the data.Xi'"'' Xni .5)Z .1) the eigenvalues satisfy 0= (A . different correlation structures are not detected by IS I. to report their individual values. . 1/V2J. The vectors v'3 v'5]9 el and v'3 v'5:99 ez are drawn in Figure 3.99 el and V5.8 demonstrates. .
130 Chapter 3 Sample Geometry and Random Sampling Result 3. if 1S 1 = 0. and determine the nature of the dependency in the data. an algebra score and a geometry score could be combined to give a total math score. in the columns of S. 1].li')a =0 and from Definition 2A.9 (A case where the generalized variance is zero) Show that 1 S 1 = 0 for (3X3) and determine the degeneracy. Premultiplying by a' yields 0= a'(X .5 The deviation (column) vectors are di = [2. with their associated difficulties.) This means that one of the deviation vectorsfor example. Thus. In the other direction. the columns of (X . + apcolp(X . Here x' = [3..3 4. as you may verify.li')' (X . so the same a corresponds to a linear dependency. before the situation was unmasked. al coll(S) + .1. or class midterm and final exam scores summed to give total points. The generalized variance is zero when.2..1. Proof. These data could be the number of successful phone solicitations per day by a parttime and a fulltime employee. A singular covariance matrix occurs when. at least one deviation vector lies in the (hyper) plane formed by all linear combinations of the othersthat is.24. and = d l + 2d2 . and only when. Consequently.Ix') and (n . then there is some linear combination Sa of the columns of S such that Sa = O. respectively. Since d3 1 9 16 10] X= 10 12 13 [ 4 12 2 5 8 3 11 14 where the third column is the sum of first two columns. for instance. investigators are sometimes unpleasantly surprised to find a case of zero generalized variance. d lies in the plane generated by the other two residual vectors. 0 = (n .3 = = 1 1 1 01 4 .1. (n .li') are linearly dependent. 3 (3X3) . the data are test scores and the investigator has included variables that are sums of the others.li')'(X . there is column degeneracy. Example 3.10 (Creating new variables that lead to a zero generalized variance) Consider the data matrix X  lX' = [ 4. Show that the generalized variance 1S 1 = 0. 0 3 4 But then.. for the length to equal zero. For example.li') are linearly dependent.li')'(X .0) + 0 = .0. We have encountered several such cases. there is a linear combination of the columns such that 0= al coll(X .. This common practice of creating new variables that are sums of the original variables and then including them in the data set has caused enough lost time that we emphasize the necessity of being alert to avoid these consequences. + (.li')a = Lfxb')a ISI=3!! = = tl(1)4 3 (1 .li')a = O.Ix')' (X .li') 3 Generalized Variance 13 1 = (X  li')a for some a". If the ct>lumns of the deviation matrix (X .li') a. so 1. (Note that there 3 is row degeneracy also. We have d = [0.1)Sa = (X . so the third column is the total number of successful solicitations per day.1)S = (X . IJ. so that S does not have an inverse. So.1)Sa figure 3. 1S 1 = O. Example 3.[ S  _J 1 ! 2 0] ! 1 2 .9.3 X = 4 1 6 4 0 4 1 2 5] [ = When large data sets are sent and received electronically. + ap colp(S) = Sa = 0. by Result 2A. we must have (X . d z = [1. the total weight of a number of chemicals was included along with that of each component. 1J. 5J. Once. That is.9 A case where the threedimensional volume is zero (/SI = 0). the threedimensional volume is zero. when the columns of the matrix of deviations in (318) are linearly dependent. .9 and may be verified algebraically by showing that IS I = O. This case is illustrated in Figure 3. = (X .li') + . 0 • and.
+ a3(Xj3 . For any fixed sample.I)Sa = (X . with entries Xjk .9. one should retain measurements on a (presumed) causal variable instead of those on a secondary characteristic..Xk) coln(X .Xk) COII(X .X3) = 0 forallj In addition. is zero.. if the three columns of the data matrix X satisfy a linear constraint al xjl + a2Xj2 + a3xj3 = c.li' is less than or equal to n . We showed that if condition (3) is satisfiedthat is. if condition (1) holds.li')' sum to the zero vector. we can write.1 linearly independent row vectorscol2(X li')'. + (Xnk . Since Sa = 0 = 0 a.5 S= 0 2. If n :s.Xl) + l(xj2 .5 2. p. has led to the case of a zero generalized variance.5 2. (3) a/xj = c for .TherankofSisthereforelessthanorequal to n . the generalized variance IS I = 2. if the values for one variable can be expressed in terms of the othersthen the generalized variance is zero because S has a zero eigenvalue. The corresponding reduced data matrix will then lead to a covariance matrix of full rank and a nonzero generalized variance..132 Chapter 3 Sample Geometry and Random Sampling Generalized Variance I 33 We find that the mean corrected data matrix. from Result 2A.li/)' colk(X .9. is a constant. whichas noted at the beginning of the proofis less than or equal to p . We shall return to this subject in our discussion of principal components. The linear combination of the mean corrected data.0 3 =0 In general.li/)'. After substituting for rowl(X .li') = (Xlk .X2) for all j. IS I = o.0 ais a scaled eigenvector of S with eigenvalue O. the n row vectors in (318) sum to the zero vector. so that al(Xjl . unaware of any extra variables that are linear combinations of the others.5 0 2. we can express colk(S) as a linear combination of the at most n . using a. the inclusion of the third variable. In any statistical analysis.1 because n :s.1) colk(S) = (X . so the columns of the original data matrix satisfy a linear constraint with c = O. that IS I = O. (sample size) :s. We must show that the rank of S is less than or equal to p and then apply Result 2A. That is. colk(S). is less than or equal to p . a constant for all j. In the other direction.. • . we. and S is singular.li/)/(X li/)a = (X . p.. Proof. in turn. Result 3. then the eigenvector a gives coefficients for the linear dependency of the mean corrected data. Here the third variable is actually the sum of the first two variables.1. (number of variables). The linear combination of the original data.li)'(X .5 (n . then IS I = 0 for all samples. can be written as a linear combination of the columns of (X . then alxl + a2 x2+ a3 x3 = c. .li/)O = 0 and Sa = 0 establishes the linear dependency of the columns of S.li/) (pxn) (nxp) Sa = 0 [ [ 25 25 5. The existence of this linear combination means that the rank of X . [2.1. if we were unaware of the dependency in this example.5]' 2..1) S (pXp) = (X . . = 0 1 (n . we settle for delineating some simple conditions for S to be of full rank or of reduced rank. in this case.coln(X li/)'. Thus. COlI (X .  Since the column vectors of (X .li')' as the negative of the sum of the remaining column vectors. Since (n . Whenever a nonzero vector a satisfies one of the following three conditions. which. Because we have the special case c = 0.li/)a 0 and the columns of the mean corrected data matrix are linearly dependent.X3) = 0 = (X . At this point.li')' in the preceding equation.. since 2. We verify that.x) = 0 for allj '" ' allj (c = a/x) . 
a computer calculation would find an eigenvalue proportional to a/ = [1.1.5 3  Let us summarize the important equivalent conditions for a generalized variance to be zero that we discussed in the preceding example. can fID? them by calculating the eigenvectors of S and identifying the one assocIated WIth a zero eigenvalue. When there is a choice. The resulting covariance matrix is ..0 1 o[ = the kth column of S. we see that a is a scaled eigenvector of S associated with an eigenvalue of zero.5 5.. it satisfies all of them: (1) Sa = 0 'v' (2) a/(xj .X2) + (l)(xj3 .52 X 5 + 0 + 0  2. the constraint establishes the fact that the columns of the data matrix are linearly dependent.li/)' The coefficients reveal that l(xjl .li')' + . which is linearly related to the first two.5 . using a. This gives rise to an important diagnostic: If we are. that is. This implies.xb is X fi' +1 2. IS I = 0 means that the measurements on some variables should be removed from the study as far as the mathematical computations are concerned. 1). the sum of the first two variables minus the third is a constant c for all n units.. The question of which measurements to remove in degenerate cases is not easy to answer.3. for example.1.Xl) + az(Xj2 . Hence. That is. In particular. Whenever the columns of the mean corrected data matrix are linearly dependent.
Result 3.4. Let the p × 1 vectors x₁, x₂, ..., xₙ, where xj' is the jth row of the data matrix X, be realizations of the independent random vectors X₁, X₂, ..., Xₙ. Then

1. If the linear combination a'Xj has positive variance for each constant vector a ≠ 0, then, provided that p < n, S has full rank with probability 1 and |S| > 0.
2. If, with probability 1, a'Xj is a constant (c, say) for all j, then |S| = 0.

Proof. (Part 2). If a'Xj = a₁Xj1 + a₂Xj2 + ... + apXjp = c with probability 1, then the sample mean of this linear combination is also c, since c = (a₁xj1 + ... + apxjp)/n summed over j equals a₁x̄₁ + ... + apx̄p = a'x̄. Hence a'(xj - x̄) = 0 for all j, which is condition (2) above, indicating linear dependence, and |S| = 0. The proof of Part (1) is difficult and can be found in [2]. ■

Generalized Variance Determined by |R| and Its Geometrical Interpretation

The generalized sample variance is unduly affected by the variability of measurements on a single variable. For example, suppose some sii is either large or quite small. Then, geometrically, the corresponding deviation vector di = (yi - x̄i1) will be very long or very short and will therefore clearly be an important factor in determining volume. Consequently, it is sometimes useful to scale all the deviation vectors so that they have the same length. Scaling the residual vectors is equivalent to replacing each original observation xjk by its standardized value (xjk - x̄k)/√skk. The sample covariance matrix of the standardized variables is then R, the sample correlation matrix of the original variables. (See Exercise 3.13.) We define

(Generalized sample variance of the standardized variables) = |R|   (3-19)

Since the resulting vectors

[(x1k - x̄k)/√skk, (x2k - x̄k)/√skk, ..., (xnk - x̄k)/√skk] = (yk - x̄k1)'/√skk

all have squared length n - 1, the generalized sample variance of the standardized variables will be large when these vectors are nearly perpendicular and will be small when two or more of these vectors are in almost the same direction. Employing the argument leading to (3-7), we readily find that the cosine of the angle θik between (yi - x̄i1)/√sii and (yk - x̄k1)/√skk is the sample correlation coefficient rik. Therefore, |R| is large when all the rik are nearly zero, and it is small when one or more of the rik are nearly +1 or -1. Let (yi - x̄i1)/√sii, i = 1, 2, ..., p, be the deviation vectors of the standardized variables. The ith deviation vector lies in the direction of di, but all have a squared length of n - 1. The volume generated in p-space by these deviation vectors can be related to the generalized sample variance: the same steps that lead to (3-15) produce

(n - 1)⁻ᵖ(volume)² of the standardized variables = |R|   (3-20)

The volume generated by deviation vectors of the standardized variables is illustrated in Figure 3.10 for the two sets of deviation vectors graphed in Figure 3.6. A comparison of Figures 3.10 and 3.6 reveals that the influence of the d₂ vector (large variability in X₂) on the squared volume |S| is much greater than its influence on the squared volume |R|.

Figure 3.10 The volume generated by equal-length deviation vectors of the standardized variables.
The quantities |S| and |R| are connected by the relationship

|S| = (s₁₁s₂₂···spp)|R|   (3-21)

so

(n - 1)ᵖ|S| = (n - 1)ᵖ(s₁₁s₂₂···spp)|R|   (3-22)

[The proof of (3-21) is left to the reader as Exercise 3.12.] Interpreting (3-22) in terms of volumes, we see from (3-15) and (3-20) that the squared volume (n - 1)ᵖ|S| is proportional to the squared volume (n - 1)ᵖ|R|. The constant of proportionality is the product of the variances, which, in turn, is proportional to the product of the squares of the lengths (n - 1)sii of the di. Equation (3-21) shows, algebraically, how a change in the measurement scale of X₁, for example, will alter the relationship between the generalized variances. Since |R| is based on standardized measurements, it is unaffected by the change in scale. However, the relative value of |S| will be changed whenever the multiplicative factor s₁₁ changes.

Example 3.11 (Illustrating the relation between |S| and |R|) Let us illustrate the relationship in (3-21) for the generalized variances |S| and |R| when p = 3. Suppose

S = [ 4  3  1
      3  9  2
      1  2  1 ]

Then s₁₁ = 4, s₂₂ = 9, and s₃₃ = 1. Using Definition 2A.24,

|S| = 4(9 - 4) - 3(3 - 2) + 1(6 - 9) = 20 - 3 - 3 = 14

Since r₁₂ = s₁₂/(√s₁₁√s₂₂) = 1/2, r₁₃ = 1/2, and r₂₃ = 2/3,

R = [ 1    1/2  1/2
      1/2  1    2/3
      1/2  2/3  1   ]

and |R| = 1(1 - 4/9) - (1/2)(1/2 - 1/3) + (1/2)(1/3 - 1/2) = 7/18. It then follows that

14 = |S| = s₁₁s₂₂s₃₃|R| = 4 · 9 · 1 · (7/18) = 14   (check) ■

Another Generalization of Variance

We conclude this discussion by mentioning another generalization of variance. Specifically, we define the total sample variance as the sum of the diagonal elements of the sample variance-covariance matrix S. Thus,

Total sample variance = s₁₁ + s₂₂ + ... + spp   (3-23)

Geometrically, the total sample variance is the sum of the squared lengths of the p deviation vectors d₁ = (y₁ - x̄₁1), ..., dp = (yp - x̄p1), divided by n - 1. The total sample variance criterion pays no attention to the orientation (correlation structure) of the residual vectors. For instance, it assigns the same value to both sets of residual vectors (a) and (b) in Figure 3.6.

Example 3.12 (Calculating the total sample variance) Calculate the total sample variance for the variance-covariance matrices S in Examples 3.7 and 3.9. From Example 3.7,

S = [ 252.04  -68.43
      -68.43  123.67 ]

and

Total sample variance = s₁₁ + s₂₂ = 252.04 + 123.67 = 375.71

From Example 3.9,

S = [ 3     -3/2  0
      -3/2  1     1/2
      0     1/2   1   ]

and

Total sample variance = s₁₁ + s₂₂ + s₃₃ = 3 + 1 + 1 = 5 ■
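The arithmetic in Examples 3.11 and 3.12 is easy to confirm by machine. The following is a minimal sketch, assuming NumPy is available:

```python
# Verify |S| = (s11 * s22 * s33) |R| and the total sample variance (3-23)
# for the matrix S of Example 3.11.
import numpy as np

S = np.array([[4.0, 3.0, 1.0],
              [3.0, 9.0, 2.0],
              [1.0, 2.0, 1.0]])

d = np.sqrt(np.diag(S))                          # standard deviations
R = S / np.outer(d, d)                           # sample correlation matrix

print(np.linalg.det(S))                          # 14
print(np.prod(np.diag(S)) * np.linalg.det(R))    # also 14, confirming (3-21)
print(np.trace(S))                               # total sample variance 4 + 9 + 1
```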
3.5 Sample Mean, Covariance, and Correlation as Matrix Operations

We have developed geometrical representations of the data matrix X and the derived descriptive statistics x̄ and S. In addition, it is possible to link algebraically the calculation of x̄ and S directly to X using matrix operations. The resulting expressions, which depict the relation between x̄, S, and the full data set X concisely, are easily programmed on electronic computers.

We have it that x̄i = (x1i·1 + x2i·1 + ... + xni·1)/n = yi'1/n. Therefore,

x̄ = [ x̄₁; x̄₂; ...; x̄p ] = (1/n) X'1   (3-24)

That is, x̄ is calculated from the transposed data matrix by postmultiplying by the vector 1 and then multiplying the result by the constant 1/n. Next, we create an n × p matrix of means by transposing both sides of (3-24) and premultiplying by 1; that is,

1x̄' = (1/n) 11'X   (3-25)

Subtracting this result from X produces the n × p matrix of deviations (residuals)

X - (1/n)11'X = (I - (1/n)11')X   (3-26)

Now, the matrix (n - 1)S representing sums of squares and cross products is just the transpose of the matrix (3-26) times the matrix itself, or

(n - 1)S = ((I - (1/n)11')X)'((I - (1/n)11')X) = X'(I - (1/n)11')'(I - (1/n)11')X = X'(I - (1/n)11')X

since (I - (1/n)11') is symmetric and idempotent: using 1'1 = n,

(I - (1/n)11')(I - (1/n)11') = I - (1/n)11' - (1/n)11' + (1/n²)11'11' = I - (1/n)11'

To summarize, the matrix expressions relating x̄ and S to the data set X are

x̄ = (1/n)X'1,   S = (1/(n - 1)) X'(I - (1/n)11')X   (3-27)

The relations in (3-27) show clearly how matrix operations on the data matrix X lead to x̄ and S. Once S is computed, it can be related to the sample correlation matrix R, and the resulting expression can also be "inverted" to relate R to S. We first define the p × p sample standard deviation matrix D^(1/2) and compute its inverse, (D^(1/2))⁻¹ = D^(-1/2). Let

D^(1/2) = diag(√s₁₁, √s₂₂, ..., √spp)   (3-28)

so that D^(-1/2) = diag(1/√s₁₁, 1/√s₂₂, ..., 1/√spp). Since R has entries rik = sik/(√sii√skk), we have

R = D^(-1/2) S D^(-1/2)   (3-29)

Postmultiplying and premultiplying both sides of (3-29) by D^(1/2) and noting that D^(-1/2)D^(1/2) = D^(1/2)D^(-1/2) = I gives

S = D^(1/2) R D^(1/2)   (3-30)

That is, R can be obtained from the information in S, whereas S can be obtained from D^(1/2) and R. Equations (3-29) and (3-30) are sample analogs of (2-36) and (2-37).
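The expressions (3-27) through (3-29) translate almost line for line into matrix software. A minimal sketch, assuming NumPy is available, using the 3 × 3 data matrix of Example 3.9 for illustration:

```python
# Compute xbar, S, and R directly from the data matrix via (3-24)-(3-29).
import numpy as np

X = np.array([[1.0, 2.0, 5.0],
              [4.0, 1.0, 6.0],
              [4.0, 0.0, 4.0]])
n = X.shape[0]
one = np.ones((n, 1))

xbar = (X.T @ one) / n                      # (3-24)
H = np.eye(n) - one @ one.T / n             # I - (1/n)11'
S = X.T @ H @ X / (n - 1)                   # (3-27)
Dinv = np.diag(1.0 / np.sqrt(np.diag(S)))   # D^(-1/2)
R = Dinv @ S @ Dinv                         # (3-29)
print(xbar.ravel(), S, R, sep="\n")
```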
3.6 Sample Values of Linear Combinations of Variables

We have introduced linear combinations of p variables in Section 2.6. In many multivariate procedures, we are led naturally to consider a linear combination of the form

c'X = c₁X₁ + c₂X₂ + ... + cpXp

whose observed value on the jth trial is

c'xj = c₁xj1 + c₂xj2 + ... + cpxjp,   j = 1, 2, ..., n   (3-31)

The n derived observations in (3-31) have

Sample mean = (c'x₁ + c'x₂ + ... + c'xₙ)/n = c'(x₁ + x₂ + ... + xₙ)(1/n) = c'x̄   (3-32)

Since (c'xj - c'x̄)² = (c'(xj - x̄))² = c'(xj - x̄)(xj - x̄)'c, we have

Sample variance = [(c'x₁ - c'x̄)² + (c'x₂ - c'x̄)² + ... + (c'xₙ - c'x̄)²]/(n - 1)
                = c'[ ((x₁ - x̄)(x₁ - x̄)' + ... + (xₙ - x̄)(xₙ - x̄)')/(n - 1) ]c

or

Sample variance of c'X = c'Sc   (3-33)

Equations (3-32) and (3-33) are sample analogs of (2-43). They correspond to substituting the sample quantities x̄ and S for the "population" quantities μ and Σ, respectively, in (2-43). Now consider a second linear combination

b'X = b₁X₁ + b₂X₂ + ... + bpXp

whose observed value on the jth trial is b'xj, j = 1, 2, ..., n   (3-34). It follows from (3-32) and (3-33) that the sample mean and variance of these derived observations are b'x̄ and b'Sb. Moreover, the sample covariance computed from pairs of observations on b'X and c'X is

Sample covariance = [(b'x₁ - b'x̄)(c'x₁ - c'x̄) + (b'x₂ - b'x̄)(c'x₂ - c'x̄) + ... + (b'xₙ - b'x̄)(c'xₙ - c'x̄)]/(n - 1)
                  = b'[ ((x₁ - x̄)(x₁ - x̄)' + ... + (xₙ - x̄)(xₙ - x̄)')/(n - 1) ]c

or

Sample covariance of b'X and c'X = b'Sc   (3-35)

In sum, we have the following result.

Result 3.5. The linear combinations b'X = b₁X₁ + ... + bpXp and c'X = c₁X₁ + ... + cpXp have sample means, variances, and covariances that are related to x̄ and S by

Sample mean of b'X = b'x̄
Sample mean of c'X = c'x̄
Sample variance of b'X = b'Sb
Sample variance of c'X = c'Sc
Sample covariance of b'X and c'X = b'Sc   (3-36) ■
Example 3.13 (Means and covariances for linear combinations) We shall consider two linear combinations and their derived values for the n = 3 observations given in Example 3.9 as

X = [ x₁₁ x₁₂ x₁₃        [ 1  2  5
      x₂₁ x₂₂ x₂₃    =     4  1  6
      x₃₁ x₃₂ x₃₃ ]        4  0  4 ]

Consider the two linear combinations

b'X = [2  2  -1][X₁; X₂; X₃] = 2X₁ + 2X₂ - X₃

and

c'X = [1  -1  3][X₁; X₂; X₃] = X₁ - X₂ + 3X₃

The observed values of b'X are

b'x₁ = 2(1) + 2(2) - (5) = 1
b'x₂ = 2(4) + 2(1) - (6) = 4
b'x₃ = 2(4) + 2(0) - (4) = 4

so that

Sample mean = (1 + 4 + 4)/3 = 3
Sample variance = [(1 - 3)² + (4 - 3)² + (4 - 3)²]/(3 - 1) = 3

In a similar manner, the n = 3 observations on c'X are

c'x₁ = 1(1) - 1(2) + 3(5) = 14
c'x₂ = 1(4) - 1(1) + 3(6) = 21
c'x₃ = 1(4) - 1(0) + 3(4) = 16

and

Sample mean = (14 + 21 + 16)/3 = 17
Sample variance = [(14 - 17)² + (21 - 17)² + (16 - 17)²]/(3 - 1) = 13

Moreover, the sample covariance, computed from the pairs of observations (b'x₁, c'x₁), (b'x₂, c'x₂), and (b'x₃, c'x₃), is

Sample covariance = [(1 - 3)(14 - 17) + (4 - 3)(21 - 17) + (4 - 3)(16 - 17)]/(3 - 1) = 9/2

Alternatively, using (3-36) with the sample mean vector x̄ and sample covariance matrix S from Example 3.9,

x̄ = [ 3; 1; 5 ],   S = [ 3     -3/2  0
                          -3/2  1     1/2
                          0     1/2   1   ]

we find

Sample mean of b'X = b'x̄ = 3   (check)
Sample mean of c'X = c'x̄ = 17   (check)
Sample variance of b'X = b'Sb = 3   (check)
Sample variance of c'X = c'Sc = 13   (check)
Sample covariance of b'X and c'X = b'Sc = 9/2   (check)

As these last results check with the corresponding sample quantities computed directly from the observations on the linear combinations, we see that, if only the descriptive statistics are of interest, we do not even need to calculate the observations b'xj and c'xj; the quantities x̄ and S derived from the original data matrix X suffice. ■
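Every check in Example 3.13 can be reproduced at once by stacking b' and c' as the rows of a matrix A, anticipating Result 3.6 below. A minimal sketch, assuming NumPy is available:

```python
# Sample moments of the two linear combinations of Example 3.13,
# computed both from the derived observations and via (3-36).
import numpy as np

X = np.array([[1.0, 2.0, 5.0],
              [4.0, 1.0, 6.0],
              [4.0, 0.0, 4.0]])
A = np.array([[2.0,  2.0, -1.0],    # b'
              [1.0, -1.0,  3.0]])   # c'

Y = X @ A.T                         # observed values of b'X and c'X
print(Y.mean(axis=0))               # [3, 17] = A @ xbar
print(np.cov(Y, rowvar=False))      # [[3, 4.5], [4.5, 13]] = A S A'
```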
The mean and covariance relations in Result 3.5 pertain to any number of linear combinations. Consider the q linear combinations

ai1X₁ + ai2X₂ + ... + aipXp,   i = 1, 2, ..., q   (3-37)
These can be expressed in matrix notation as

[ a₁₁X₁ + a₁₂X₂ + ... + a₁pXp      [ a₁₁ a₁₂ ... a₁p   [ X₁
  a₂₁X₁ + a₂₂X₂ + ... + a₂pXp   =    a₂₁ a₂₂ ... a₂p     X₂    = AX   (3-38)
  ...                                ...                 ...
  aq1X₁ + aq2X₂ + ... + aqpXp ]      aq1 aq2 ... aqp ]   Xp ]

Taking the ith row of A, ai', to be b' and the kth row of A, ak', to be c', we see that Equations (3-36) imply that the ith row of AX has sample mean ai'x̄ and that the ith and kth rows of AX have sample covariance ai'Sak. Note that ai'Sak is the (i, k)th element of ASA'.

Result 3.6. The q linear combinations AX in (3-38) have sample mean vector Ax̄ and sample covariance matrix ASA'. ■

Exercises

3.1. Given the data matrix

X = [ 9  1
      5  3
      1  2 ]

(a) Graph the scatter plot in p = 2 dimensions. Locate the sample mean on your diagram. (b) Sketch the n = 3-dimensional representation of the data, and plot the deviation vectors y₁ - x̄₁1 and y₂ - x̄₂1. (c) Sketch the deviation vectors in (b) emanating from the origin. Calculate the lengths of these vectors and the cosine of the angle between them. Relate these quantities to Sn and R.

3.2. Given the data matrix

X = [ 3   4
      6  -2
      3   1 ]

(a) Graph the scatter plot in p = 2 dimensions, and locate the sample mean on your diagram. (b) Sketch the n = 3-space representation of the data, and plot the deviation vectors y₁ - x̄₁1 and y₂ - x̄₂1. (c) Sketch the deviation vectors in (b) emanating from the origin. Calculate their lengths and the cosine of the angle between them. Relate these quantities to Sn and R.

3.3. Perform the decomposition of y₁ into x̄₁1 and y₁ - x̄₁1 using the first column of the data matrix in Example 3.9.

3.4. Use the six observations on the variable X₁, in units of millions, from Table 1.1. (a) Find the projection on 1' = [1, 1, 1, 1, 1, 1]. (b) Calculate the deviation vector y₁ - x̄₁1. Relate its length to the sample standard deviation. (c) Graph (to scale) the triangle formed by y₁, x̄₁1, and y₁ - x̄₁1. Identify the length of each component in your graph. (d) Repeat Parts a-c for the variable X₂ in Table 1.1. (e) Graph (to scale) the two deviation vectors y₁ - x̄₁1 and y₂ - x̄₂1. Calculate the value of the angle between them.

3.5. Calculate the generalized sample variance |S| for (a) the data matrix X in Exercise 3.1 and (b) the data matrix X in Exercise 3.2.

3.6. Consider the data matrix

X = [ -1  3  -2
       2  4   2
       5  2   3 ]

(a) Calculate the matrix of deviations (residuals), X - 1x̄'. Is this matrix of full rank? Explain. (b) Determine S, and calculate the generalized sample variance |S|. Interpret the latter geometrically. (c) Using the results in (b), calculate the total sample variance. [See (3-23).]

3.7. Sketch the solid ellipsoids (x - x̄)'S⁻¹(x - x̄) ≤ 1 [see (3-16)] for the three matrices

S = [ 5  4     S = [ 5  -4      S = [ 3  0
      4  5 ],       -4   5 ],         0  3 ]

(Note that these matrices have the same generalized variance |S|.)

3.8. Given

S = [ 1  0  0      and   S = [ 1     -1/2  -1/2
      0  1  0                  -1/2  1     -1/2
      0  0  1 ]                -1/2  -1/2  1    ]

(a) Calculate the total sample variance for each S. Compare the results. (b) Calculate the generalized sample variance for each S, and compare the results. Comment on the discrepancies, if any, found between the total sample variances in Part a and the generalized sample variances in Part b.
3.9. The following data matrix contains data on test scores, with x₁ = score on first test, x₂ = score on second test, and x₃ = total score on the two tests:

X = [ 12  17  29
      18  20  38
      14  16  30
      20  18  38
      16  19  35 ]

(a) Obtain the mean corrected data matrix, and verify that the columns are linearly dependent. Specify an a' = [a₁, a₂, a₃] vector that establishes the linear dependence. (b) Obtain the sample covariance matrix S, and verify that the generalized variance is zero. Also, show that Sa = 0, so a can be rescaled to be an eigenvector corresponding to eigenvalue zero. (c) Verify that the third column of the data matrix is the sum of the first two columns. That is, show that there is linear dependence, with a₁ = 1, a₂ = 1, and a₃ = -1.

3.10. When the generalized variance is zero, it is the columns of the mean corrected data matrix Xc = X - 1x̄' that are linearly dependent, not necessarily those of the data matrix itself. Given the data

X = [ 3  1  0
      6  4  6
      4  2  2
      7  0  3
      5  3  4 ]

(a) Obtain the mean corrected data matrix, and verify that the columns are linearly dependent. Specify an a' = [a₁, a₂, a₃] vector that establishes the dependence. (b) Obtain the sample covariance matrix S, and verify that the generalized variance is zero. (c) Show that the columns of the data matrix are linearly independent in this case.

3.11. Use the sample covariance obtained in Example 3.7 to verify (3-29) and (3-30).

3.12. Show that |S| = (s₁₁s₂₂···spp)|R|. Hint: From Equation (3-30), S = D^(1/2)RD^(1/2). Taking determinants gives |S| = |D^(1/2)||R||D^(1/2)|. (See Result 2A.11.) Now examine |D^(1/2)|.

3.13. Given a data matrix X and the resulting sample correlation matrix R, consider the standardized observations (xjk - x̄k)/√skk, k = 1, 2, ..., p, j = 1, 2, ..., n. Show that these standardized quantities have sample covariance matrix R.

3.14. Consider the data matrix X in Exercise 3.1. We have n = 3 observations on p = 2 variables X₁ and X₂. Form the linear combinations

c'X = [-1  2][X₁; X₂] = -X₁ + 2X₂
b'X = [2  3][X₁; X₂] = 2X₁ + 3X₂

(a) Evaluate the sample means, variances, and covariance of b'X and c'X from first principles. That is, calculate the observed values of b'X and c'X, and then use the sample mean, variance, and covariance formulas. (b) Calculate the sample means, variances, and covariance of b'X and c'X using (3-36). Compare the results in (a) and (b).

3.15. Repeat Exercise 3.14 using the data matrix

X = [ 1  4  3
      6  2  6
      8  3  3 ]

and the linear combinations b'X = [1  1  1][X₁; X₂; X₃] and c'X = [1  2  -3][X₁; X₂; X₃].

3.16. Let V be a vector random variable with mean vector E(V) = μV and covariance matrix E(V - μV)(V - μV)' = ΣV. Show that E(VV') = ΣV + μVμV'.

3.17. Show that, if X (p × 1) and Z (q × 1) are independent, then each component of X is independent of each component of Z. Hint:

P[X₁ ≤ x₁, X₂ ≤ x₂, ..., Xp ≤ xp and Z₁ ≤ z₁, ..., Zq ≤ zq] = P[X₁ ≤ x₁, ..., Xp ≤ xp]·P[Z₁ ≤ z₁, ..., Zq ≤ zq]

by independence. Let x₂, ..., xp and z₂, ..., zq tend to infinity, to obtain

P[X₁ ≤ x₁ and Z₁ ≤ z₁] = P[X₁ ≤ x₁]·P[Z₁ ≤ z₁]

for all x₁, z₁. So X₁ and Z₁ are independent. Repeat for other pairs.

3.18. Energy consumption in 2001, by state, from the major sources

x₁ = petroleum            x₂ = natural gas
x₃ = nuclear electric power   x₄ = hydroelectric power

is recorded in quadrillions (10¹⁵) of BTUs (Source: Statistical Abstract of the United States 2006). The resulting mean and covariance matrix are

x̄ = [ 0.766          S = [ 0.856  0.635  0.173  0.096
      0.508                0.635  0.568  0.128  0.067
      0.438                0.173  0.128  0.171  0.039
      0.161 ]              0.096  0.067  0.039  0.043 ]

(a) Using the summary statistics, determine the sample mean and variance of a state's total energy consumption for these major sources. (b) Determine the sample mean and variance of the excess of petroleum consumption over natural gas consumption. Also find the sample covariance of this variable with the total variable in part a.

3.19. Using the summary statistics for the first three variables in Exercise 3.18, verify the relation |S| = (s₁₁s₂₂s₃₃)|R|.
3.20. In northern climates, roads must be cleared of snow quickly following a storm. One measure of storm severity is x₁ = its duration in hours, while the effectiveness of snow removal can be quantified by x₂ = the number of hours crews, men, and machines spend to clear snow. Here are the results for 25 incidents in Wisconsin.

[Table 3.2 Snow Data: the observed (x₁, x₂) pairs for the 25 incidents.]

(a) Find the sample mean and variance of the difference x₂ - x₁ by first obtaining the summary statistics. (b) Obtain the mean and variance by first obtaining the individual values xj2 - xj1, for j = 1, 2, ..., 25, and then calculating the mean and variance. Compare these values with those obtained in part a.

References

1. Anderson, T. W. An Introduction to Multivariate Statistical Analysis (3rd ed.). New York: John Wiley, 2003.
2. Eaton, M., and M. Perlman. "The Non-Singularity of Generalized Sample Covariance Matrices." Annals of Statistics, 1 (1973), 710-717.

Chapter 4

THE MULTIVARIATE NORMAL DISTRIBUTION

4.1 Introduction

A generalization of the familiar bell-shaped normal density to several dimensions plays a fundamental role in multivariate analysis. In fact, most of the techniques encountered in this book are based on the assumption that the data were generated from a multivariate normal distribution. While real data are never exactly multivariate normal, the normal density is often a useful approximation to the "true" population distribution.

One advantage of the multivariate normal distribution stems from the fact that it is mathematically tractable and "nice" results can be obtained. This is frequently not the case for other data-generating distributions. Of course, mathematical attractiveness per se is of little use to the practitioner. It turns out, however, that normal distributions are useful in practice for two reasons: First, the normal distribution serves as a bona fide population model in some instances; second, the sampling distributions of many multivariate statistics are approximately normal, regardless of the form of the parent population, because of a central limit effect.

To summarize, many real-world problems fall naturally within the framework of normal theory. The importance of the normal distribution rests on its dual role as both population model for certain natural phenomena and approximate sampling distribution for many statistics.

4.2 The Multivariate Normal Density and Its Properties

The multivariate normal density is a generalization of the univariate normal density to p ≥ 2 dimensions. Recall that the univariate normal distribution, with mean μ and variance σ², has the probability density function

f(x) = (1/√(2πσ²)) e^(-[(x - μ)/σ]²/2),   -∞ < x < ∞   (4-1)
A plot of this function yields the familiar bell-shaped curve shown in Figure 4.1. Also shown in the figure are approximate areas under the curve within ±1 standard deviation and ±2 standard deviations of the mean. These areas represent probabilities, and thus, for the normal random variable X,

P(μ - σ ≤ X ≤ μ + σ) ≈ .68
P(μ - 2σ ≤ X ≤ μ + 2σ) ≈ .95

Figure 4.1 A normal density with mean μ and variance σ² and selected areas under the curve.

It is convenient to denote the normal density function with mean μ and variance σ² by N(μ, σ²). Therefore, N(10, 4) refers to the function in (4-1) with μ = 10 and σ = 2. This notation will be extended to the multivariate case later. The term

((x - μ)/σ)² = (x - μ)(σ²)⁻¹(x - μ)   (4-2)

in the exponent of the univariate normal density function measures the square of the distance from x to μ in standard deviation units. This can be generalized for a p × 1 vector x of observations on several variables as

(x - μ)'Σ⁻¹(x - μ)   (4-3)

The p × 1 vector μ represents the expected value of the random vector X, and the p × p matrix Σ is the variance-covariance matrix of X. [See (2-30) and (2-31).] We shall assume that the symmetric matrix Σ is positive definite, so the expression in (4-3) is the square of the generalized distance from x to μ.

The multivariate normal density is obtained by replacing the univariate distance in (4-2) by the multivariate generalized distance of (4-3) in the density function of (4-1). When this replacement is made, the univariate normalizing constant (2π)^(-1/2)(σ²)^(-1/2) must be changed to a more general constant that makes the volume under the surface of the multivariate density function unity for any p. This is necessary because, in the multivariate case, probabilities are represented by volumes under the surface over regions defined by intervals of the xi values. It can be shown (see [1]) that this constant is (2π)^(-p/2)|Σ|^(-1/2), and consequently, a p-dimensional normal density for the random vector X' = [X₁, X₂, ..., Xp] has the form

f(x) = (1/((2π)^(p/2)|Σ|^(1/2))) e^(-(x - μ)'Σ⁻¹(x - μ)/2)   (4-4)

where -∞ < xi < ∞, i = 1, 2, ..., p. We shall denote this p-dimensional normal density by Np(μ, Σ), which is analogous to the notation in the univariate case.

Example 4.1 (Bivariate normal density) Let us evaluate the p = 2-variate normal density in terms of the individual parameters μ₁ = E(X₁), μ₂ = E(X₂), σ₁₁ = Var(X₁), σ₂₂ = Var(X₂), and ρ₁₂ = σ₁₂/(√σ₁₁√σ₂₂) = Corr(X₁, X₂).

Using Result 2A.8, we find that the inverse of the covariance matrix

Σ = [ σ₁₁  σ₁₂
      σ₁₂  σ₂₂ ]

is

Σ⁻¹ = (1/(σ₁₁σ₂₂ - σ₁₂²)) [ σ₂₂   -σ₁₂
                            -σ₁₂  σ₁₁  ]

Introducing the correlation coefficient ρ₁₂ by writing σ₁₂ = ρ₁₂√σ₁₁√σ₂₂, we obtain σ₁₁σ₂₂ - σ₁₂² = σ₁₁σ₂₂(1 - ρ₁₂²), and the squared distance becomes

(x - μ)'Σ⁻¹(x - μ)
  = [σ₂₂(x₁ - μ₁)² + σ₁₁(x₂ - μ₂)² - 2ρ₁₂√σ₁₁√σ₂₂(x₁ - μ₁)(x₂ - μ₂)]/(σ₁₁σ₂₂(1 - ρ₁₂²))
  = (1/(1 - ρ₁₂²)) [ ((x₁ - μ₁)/√σ₁₁)² + ((x₂ - μ₂)/√σ₂₂)² - 2ρ₁₂((x₁ - μ₁)/√σ₁₁)((x₂ - μ₂)/√σ₂₂) ]   (4-5)

The last expression is written in terms of the standardized values (x₁ - μ₁)/√σ₁₁ and (x₂ - μ₂)/√σ₂₂. Next, since |Σ| = σ₁₁σ₂₂ - σ₁₂² = σ₁₁σ₂₂(1 - ρ₁₂²), we can substitute for Σ⁻¹ and |Σ| in (4-4) to get the expression for the bivariate (p = 2) normal density involving the individual parameters μ₁, μ₂, σ₁₁, σ₂₂, and ρ₁₂:

f(x₁, x₂) = (1/(2π√(σ₁₁σ₂₂(1 - ρ₁₂²))))
            × exp{ -(1/(2(1 - ρ₁₂²))) [ ((x₁ - μ₁)/√σ₁₁)² + ((x₂ - μ₂)/√σ₂₂)²
                  - 2ρ₁₂((x₁ - μ₁)/√σ₁₁)((x₂ - μ₂)/√σ₂₂) ] }   (4-6)

The expression in (4-6) is somewhat unwieldy, and the compact general form in (4-4) is more informative in many ways. On the other hand, the expression in (4-6) is useful for discussing certain properties of the normal distribution. For example, if the random variables X₁ and X₂ are uncorrelated, so that ρ₁₂ = 0, the joint density can be written as the product of two univariate normal densities, each of the form of (4-1). That is, f(x₁, x₂) = f(x₁)f(x₂) and X₁ and X₂ are independent. [See (2-28).] This result is true in general. (See Result 4.5.) ■
Two bivariate distributions with σ₁₁ = σ₂₂ are shown in Figure 4.2. In Figure 4.2(a), X₁ and X₂ are independent (ρ₁₂ = 0). In Figure 4.2(b), ρ₁₂ = .75. Notice how the presence of correlation causes the probability to concentrate along a line.

Figure 4.2 Two bivariate normal distributions. (a) σ₁₁ = σ₂₂ and ρ₁₂ = 0. (b) σ₁₁ = σ₂₂ and ρ₁₂ = .75.

From the expression in (4-4) for the density of a p-dimensional normal variable, it should be clear that the paths of x values yielding a constant height for the density are ellipsoids. That is, the multivariate normal density is constant on surfaces where the square of the distance (x - μ)'Σ⁻¹(x - μ) is constant. These paths are called contours:

Constant probability density contour = {all x such that (x - μ)'Σ⁻¹(x - μ) = c²}
                                     = surface of an ellipsoid centered at μ

The axes of each ellipsoid of constant density are in the direction of the eigenvectors of Σ⁻¹, and their lengths are proportional to the reciprocals of the square roots of the eigenvalues of Σ⁻¹. Fortunately, we can avoid the calculation of Σ⁻¹ when determining the axes, since these ellipsoids are also determined by the eigenvalues and eigenvectors of Σ. We state the correspondence formally for later reference.

Result 4.1. If Σ is positive definite, so that Σ⁻¹ exists, then Σe = λe implies Σ⁻¹e = (1/λ)e, so (λ, e) is an eigenvalue-eigenvector pair for Σ corresponding to the pair (1/λ, e) for Σ⁻¹. Also, Σ⁻¹ is positive definite.

Proof. For Σ positive definite and e ≠ 0 an eigenvector, we have 0 < e'Σe = e'(Σe) = e'(λe) = λe'e = λ. Moreover, e = Σ⁻¹(Σe) = Σ⁻¹(λe) = λΣ⁻¹e, and division by λ > 0 gives Σ⁻¹e = (1/λ)e. Thus, (1/λ, e) is an eigenvalue-eigenvector pair for Σ⁻¹. Also, for any p × 1 vector x, by (2-21),

x'Σ⁻¹x = x'( Σᵢ₌₁ᵖ (1/λᵢ)eᵢeᵢ' )x = Σᵢ₌₁ᵖ (1/λᵢ)(x'eᵢ)² ≥ 0

since each term λᵢ⁻¹(x'eᵢ)² is nonnegative. In addition, x'eᵢ = 0 for all i only if x = 0. So x ≠ 0 implies that Σᵢ₌₁ᵖ (1/λᵢ)(x'eᵢ)² > 0, and it follows that Σ⁻¹ is positive definite. ■

The following summarizes these concepts: Contours of constant density for the p-dimensional normal distribution are ellipsoids defined by x such that

(x - μ)'Σ⁻¹(x - μ) = c²   (4-7)

These ellipsoids are centered at μ and have axes ±c√λᵢ eᵢ, where Σeᵢ = λᵢeᵢ for i = 1, 2, ..., p. A contour of constant density for a bivariate normal distribution with σ₁₁ = σ₂₂ is obtained in the following example.
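Before turning to the example, the agreement between the general form (4-4) and the parameter-by-parameter form (4-6) is easy to confirm numerically. A minimal sketch, assuming NumPy and SciPy are available; the evaluation point and parameter values are illustrative:

```python
# Check that (4-6) and (4-4) give the same bivariate normal density value.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])
s11, s22, rho = 1.0, 1.0, 0.75
Sigma = np.array([[s11, rho * np.sqrt(s11 * s22)],
                  [rho * np.sqrt(s11 * s22), s22]])

def f46(x1, x2):
    # density written with the individual parameters, as in (4-6)
    z1, z2 = (x1 - mu[0]) / np.sqrt(s11), (x2 - mu[1]) / np.sqrt(s22)
    quad = (z1**2 + z2**2 - 2 * rho * z1 * z2) / (1 - rho**2)
    return np.exp(-quad / 2) / (2 * np.pi * np.sqrt(s11 * s22 * (1 - rho**2)))

print(f46(0.5, -0.3))
print(multivariate_normal(mu, Sigma).pdf([0.5, -0.3]))   # same value, via (4-4)
```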
Example 4.2 (Contours of the bivariate normal density) We shall obtain the axes of constant probability density contours for a bivariate normal distribution when σ₁₁ = σ₂₂. From (4-7), these axes are given by the eigenvalues and eigenvectors of Σ. Here |Σ - λI| = 0 becomes

0 = | σ₁₁ - λ   σ₁₂
      σ₁₂       σ₁₁ - λ | = (σ₁₁ - λ)² - σ₁₂² = (λ - σ₁₁ - σ₁₂)(λ - σ₁₁ + σ₁₂)

Consequently, the eigenvalues are λ₁ = σ₁₁ + σ₁₂ and λ₂ = σ₁₁ - σ₁₂. The eigenvector e₁ is determined from

[ σ₁₁  σ₁₂    [ e₁        [ e₁
  σ₁₂  σ₁₁ ]    e₂ ] = (σ₁₁ + σ₁₂)  e₂ ]

or

σ₁₁e₁ + σ₁₂e₂ = (σ₁₁ + σ₁₂)e₁
σ₁₂e₁ + σ₁₁e₂ = (σ₁₁ + σ₁₂)e₂

These equations imply that e₁ = e₂, and after normalization, the first eigenvalue-eigenvector pair is

λ₁ = σ₁₁ + σ₁₂,   e₁' = [1/√2, 1/√2]

Similarly, λ₂ = σ₁₁ - σ₁₂ yields the eigenvector e₂' = [1/√2, -1/√2].

When the covariance σ₁₂ (or correlation ρ₁₂) is positive, λ₁ = σ₁₁ + σ₁₂ is the largest eigenvalue, and its associated eigenvector e₁' = [1/√2, 1/√2] lies along the 45° line through the point μ' = [μ₁, μ₂]. This is true for any positive value of the covariance (correlation). Since the axes of the constant-density ellipses are given by ±c√λ₁ e₁ and ±c√λ₂ e₂ [see (4-7)], and the eigenvectors each have length unity, the major axis will be associated with the largest eigenvalue. For positively correlated normal random variables, then, the major axis of the constant-density ellipses will be along the 45° line through μ. (See Figure 4.3.) When the covariance (correlation) is negative, λ₂ = σ₁₁ - σ₁₂ will be the largest eigenvalue, and the major axes of the constant-density ellipses will lie along a line at right angles to the 45° line through μ. (These results are true only for σ₁₁ = σ₂₂.)

To summarize, the axes of the ellipses of constant density for a bivariate normal distribution with σ₁₁ = σ₂₂ are determined by

±c√(σ₁₁ + σ₁₂) [1/√2; 1/√2]   and   ±c√(σ₁₁ - σ₁₂) [1/√2; -1/√2] ■

Figure 4.3 A constant-density contour for a bivariate normal distribution with σ₁₁ = σ₂₂ and σ₁₂ > 0 (or ρ₁₂ > 0).

We show in Result 4.7 that the choice c² = χ²p(α), where χ²p(α) is the upper (100α)th percentile of a chi-square distribution with p degrees of freedom, leads to contours that contain (1 - α) × 100% of the probability. Specifically, the following is true for a p-dimensional normal distribution:

The solid ellipsoid of x values satisfying (x - μ)'Σ⁻¹(x - μ) ≤ χ²p(α) has probability 1 - α.   (4-8)

The constant-density contours containing 50% and 90% of the probability under the bivariate normal surfaces in Figure 4.2 are pictured in Figure 4.4.

Figure 4.4 The 50% and 90% contours for the bivariate normal distributions in Figure 4.2.

The p-variate normal density in (4-4) has a maximum value when the squared distance in (4-3) is zero, that is, when x = μ. Thus, μ is the point of maximum density, or mode, as well as the expected value of X, or mean. The fact that μ is the mean of the multivariate normal distribution follows from the symmetry exhibited by the constant-density contours: these contours are centered, or balanced, at μ.
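The contour calculation is routine in software. A minimal sketch, assuming NumPy and SciPy are available, for an illustrative Σ with σ₁₁ = σ₂₂ = 2 and σ₁₂ = 1:

```python
# Axes of the constant-density ellipse (4-7) and the 90% contour of (4-8).
import numpy as np
from scipy.stats import chi2

Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])        # sigma11 = sigma22, sigma12 > 0

lam, e = np.linalg.eigh(Sigma)        # eigenvalues in ascending order
c2 = chi2.ppf(0.90, df=2)             # c^2 = chi-square upper 10th percentile
half_lengths = np.sqrt(c2 * lam)      # half-lengths c * sqrt(lambda_i)

print(lam)            # [1, 3] = sigma11 -/+ sigma12
print(e)              # columns proportional to (1,-1)/sqrt(2) and (1,1)/sqrt(2)
print(half_lengths)   # minor and major half-lengths of the 90% ellipse
```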
Additional Properties of the Multivariate Normal Distribution

Certain properties of the normal distribution will be needed repeatedly in our explanations of statistical models and methods. These properties make it possible to manipulate normal distributions easily and, as we suggested in Section 4.1, are partly responsible for the popularity of the normal distribution. The key properties, which we shall soon discuss in some mathematical detail, can be stated rather simply. The following are true for a random vector X having a multivariate normal distribution:

1. Linear combinations of the components of X are normally distributed.
2. All subsets of the components of X have a (multivariate) normal distribution.
3. Zero covariance implies that the corresponding components are independently distributed.
4. The conditional distributions of the components are (multivariate) normal.

These statements are reproduced mathematically in the results that follow. Many of these results are illustrated with examples. The proofs that are included should help improve your understanding of matrix manipulations and also lead you to an appreciation for the manner in which the results successively build on themselves. Result 4.2 can be taken as a working definition of the normal distribution. With this in hand, the subsequent properties are almost immediate. Our partial proof of Result 4.2 indicates how the linear combination definition of a normal density relates to the multivariate density in (4-4).

Result 4.2. If X is distributed as Np(μ, Σ), then any linear combination of variables a'X = a₁X₁ + a₂X₂ + ... + apXp is distributed as N(a'μ, a'Σa). Also, if a'X is distributed as N(a'μ, a'Σa) for every a, then X must be Np(μ, Σ).

Proof. The expected value and variance of a'X follow from (2-43). Proving that a'X is normally distributed if X is multivariate normal is more difficult. You can find a proof in [1]. The second part of Result 4.2 is also demonstrated in [1]. ■

Example 4.3 (The distribution of a linear combination of the components of a normal random vector) Consider the linear combination a'X of a multivariate normal random vector determined by the choice a' = [1, 0, ..., 0]. Since a'X = X₁, a'μ = μ₁, and a'Σa = σ₁₁, it follows from Result 4.2 that X₁ is distributed as N(μ₁, σ₁₁). More generally, the marginal distribution of any component Xi of X is N(μi, σii). ■
Result 4.3. If X is distributed as Np(μ, Σ), the q linear combinations

AX = [ a₁₁X₁ + ... + a₁pXp
       ...
       aq1X₁ + ... + aqpXp ]

are distributed as Nq(Aμ, AΣA'). Also, X + d, where d is a p × 1 vector of constants, is distributed as Np(μ + d, Σ).

Proof. The expected value E(AX) and the covariance matrix of AX follow from (2-45). Any linear combination b'(AX) is a linear combination of X, of the form a'X with a = A'b. Thus, the conclusion concerning AX follows from Result 4.2. The second part of the result can be obtained by considering a'(X + d) = a'X + (a'd), where a'X is distributed as N(a'μ, a'Σa). It is known from the univariate case that adding a constant a'd to the random variable a'X leaves the variance unchanged and translates the mean to a'μ + a'd = a'(μ + d). Since a was arbitrary, X + d is distributed as Np(μ + d, Σ). ■

Example 4.4 (The distribution of two linear combinations of the components of a normal random vector) For X distributed as N₃(μ, Σ), find the distribution of

[ X₁ - X₂    [ 1  -1  0    [ X₁
  X₂ - X₃ ] =  0  1  -1 ]    X₂   = AX
                             X₃ ]

By Result 4.3, the distribution of AX is multivariate normal with mean

Aμ = [ μ₁ - μ₂
       μ₂ - μ₃ ]

and covariance matrix

AΣA' = [ σ₁₁ - 2σ₁₂ + σ₂₂             σ₁₂ + σ₂₃ - σ₂₂ - σ₁₃
         σ₁₂ + σ₂₃ - σ₂₂ - σ₁₃        σ₂₂ - 2σ₂₃ + σ₃₃       ]

Alternatively, the mean vector Aμ and covariance matrix AΣA' may be verified by direct calculation of the means and covariances of the two random variables Y₁ = X₁ - X₂ and Y₂ = X₂ - X₃. ■

We have mentioned that all subsets of a multivariate normal random vector X are themselves normally distributed. We state this property formally as Result 4.4.

Result 4.4. All subsets of X are normally distributed. If we respectively partition X, its mean vector μ, and its covariance matrix Σ as

X = [ X₁ (q × 1)            μ = [ μ₁ (q × 1)           Σ = [ Σ₁₁  Σ₁₂
      X₂ ((p - q) × 1) ]          μ₂ ((p - q) × 1) ]         Σ₂₁  Σ₂₂ ]

then X₁ is distributed as Nq(μ₁, Σ₁₁).

Proof. Set A (q × p) = [ I (q × q) ⋮ 0 (q × (p - q)) ] in Result 4.3, and the conclusion follows. ■

To apply Result 4.4 to an arbitrary subset of the components of X, we simply relabel the subset of interest as X₁ and select the corresponding component means and covariances as μ₁ and Σ₁₁.
Example 4.5 (The distribution of a subset of a normal random vector) If X is distributed as N₅(μ, Σ), find the distribution of [X₂; X₄]. We set

X₁ = [ X₂; X₄ ],   μ₁ = [ μ₂; μ₄ ],   Σ₁₁ = [ σ₂₂  σ₂₄
                                              σ₂₄  σ₄₄ ]

and note that, with this assignment, X, μ, and Σ can respectively be rearranged and partitioned so that the components X₂ and X₄ (and the corresponding means and covariances) appear first, with the remaining components X₁, X₃, X₅ making up X₂ in the partition of Result 4.4. Thus, from Result 4.4, the distribution of [X₂; X₄] is

N₂(μ₁, Σ₁₁) = N₂( [ μ₂; μ₄ ], [ σ₂₂  σ₂₄
                                σ₂₄  σ₄₄ ] )

It is clear from this example that the normal distribution for any subset can be expressed by simply selecting the appropriate means and covariances from the original μ and Σ. The formal process of relabeling and partitioning is unnecessary. ■

We are now in a position to state that zero correlation between normal random variables or sets of normal random variables is equivalent to statistical independence.

Result 4.5.
(a) If X₁ (q₁ × 1) and X₂ (q₂ × 1) are independent, then Cov(X₁, X₂) = 0, a q₁ × q₂ matrix of zeros.
(b) If [X₁; X₂] is Nq₁+q₂( [μ₁; μ₂], [ Σ₁₁ Σ₁₂; Σ₂₁ Σ₂₂ ] ), then X₁ and X₂ are independent if and only if Σ₁₂ = 0.
(c) If X₁ and X₂ are independent and are distributed as Nq₁(μ₁, Σ₁₁) and Nq₂(μ₂, Σ₂₂), respectively, then [X₁; X₂] has the multivariate normal distribution

Nq₁+q₂( [ μ₁; μ₂ ], [ Σ₁₁  0
                      0'   Σ₂₂ ] )

Proof. (See Exercise 4.14 for partial proofs based upon factoring the density function when Σ₁₂ = 0.) ■
Example 4.6 (The equivalence of zero covariance and independence for normal variables) Let X (3 × 1) be N₃(μ, Σ) with

Σ = [ 4  1  0
      1  3  0
      0  0  2 ]

Are X₁ and X₂ independent? What about (X₁, X₂) and X₃? Since X₁ and X₂ have covariance σ₁₂ = 1, they are not independent. However, partitioning X and Σ as

X = [ X₁; X₂ ⋮ X₃ ],   Σ = [ 4  1 ⋮ 0
                             1  3 ⋮ 0
                             ⋯⋯⋯⋯⋯⋯
                             0  0 ⋮ 2 ] = [ Σ₁₁  Σ₁₂
                                            Σ₂₁  Σ₂₂ ]

we see that X₁ = [X₁; X₂] and X₃ have covariance matrix Σ₁₂ = [0; 0]. Therefore, (X₁, X₂) and X₃ are independent by Result 4.5. This implies X₃ is independent of X₁ and also of X₂. ■

Result 4.6. Let X = [X₁ (q × 1); X₂ ((p - q) × 1)] be distributed as Np(μ, Σ) with μ = [μ₁; μ₂],

Σ = [ Σ₁₁  Σ₁₂
      Σ₂₁  Σ₂₂ ]

and |Σ₂₂| > 0. Then the conditional distribution of X₁, given that X₂ = x₂, is normal and has

Mean = μ₁ + Σ₁₂Σ₂₂⁻¹(x₂ - μ₂)

and

Covariance = Σ₁₁ - Σ₁₂Σ₂₂⁻¹Σ₂₁

Note that the covariance does not depend on the value x₂ of the conditioning variable.

Proof. We shall give an indirect proof. (See Exercise 4.13, which uses the densities directly.) Take

A (p × p) = [ I (q × q)            -Σ₁₂Σ₂₂⁻¹
              0                    I ((p - q) × (p - q)) ]

so

A(X - μ) = [ X₁ - μ₁ - Σ₁₂Σ₂₂⁻¹(X₂ - μ₂)
             X₂ - μ₂                      ]

is jointly normal with covariance matrix

AΣA' = [ Σ₁₁ - Σ₁₂Σ₂₂⁻¹Σ₂₁   0
         0'                  Σ₂₂ ]

Since X₁ - μ₁ - Σ₁₂Σ₂₂⁻¹(X₂ - μ₂) and X₂ - μ₂ have zero covariance, they are independent. Moreover, the quantity X₁ - μ₁ - Σ₁₂Σ₂₂⁻¹(X₂ - μ₂) has distribution Nq(0, Σ₁₁ - Σ₁₂Σ₂₂⁻¹Σ₂₁). Given that X₂ = x₂, μ₁ + Σ₁₂Σ₂₂⁻¹(x₂ - μ₂) is a constant. Because X₁ - μ₁ - Σ₁₂Σ₂₂⁻¹(X₂ - μ₂) and X₂ - μ₂ are independent, the conditional distribution of X₁ - μ₁ - Σ₁₂Σ₂₂⁻¹(x₂ - μ₂) is the same as the unconditional distribution of X₁ - μ₁ - Σ₁₂Σ₂₂⁻¹(X₂ - μ₂), namely Nq(0, Σ₁₁ - Σ₁₂Σ₂₂⁻¹Σ₂₁). Equivalently, given that X₂ = x₂, X₁ is distributed as Nq(μ₁ + Σ₁₂Σ₂₂⁻¹(x₂ - μ₂), Σ₁₁ - Σ₁₂Σ₂₂⁻¹Σ₂₁). ■
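The conditional mean and covariance of Result 4.6 amount to a few lines of linear algebra in software. A minimal sketch, assuming NumPy is available; the partition point q and all numerical values are illustrative:

```python
# Conditional distribution of X1 given X2 = x2 (Result 4.6).
import numpy as np

mu = np.array([1.0, 2.0, 0.0])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])
q = 1                                            # X1 = first q components
S11, S12 = Sigma[:q, :q], Sigma[:q, q:]
S21, S22 = Sigma[q:, :q], Sigma[q:, q:]

x2 = np.array([1.5, -0.5])                       # observed value of X2
cond_mean = mu[:q] + S12 @ np.linalg.solve(S22, x2 - mu[q:])
cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)
print(cond_mean, cond_cov)
```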
Example 4.7 (The conditional density of a bivariate normal distribution) The conditional density of X₁, given that X₂ = x₂, for any bivariate distribution, is defined by

f(x₁ | x₂) = {conditional density of X₁ given that X₂ = x₂} = f(x₁, x₂)/f(x₂)

where f(x₂) is the marginal distribution of X₂. If f(x₁, x₂) is the bivariate normal density, show that f(x₁ | x₂) is

N( μ₁ + (σ₁₂/σ₂₂)(x₂ - μ₂),  σ₁₁ - σ₁₂²/σ₂₂ )

Here σ₁₁ - σ₁₂²/σ₂₂ = σ₁₁(1 - ρ₁₂²). The two terms involving x₁ - μ₁ in the exponent of the bivariate normal density [see Equation (4-6)] become, apart from the multiplicative constant -1/(2(1 - ρ₁₂²)),

(x₁ - μ₁)²/σ₁₁ - 2ρ₁₂(x₁ - μ₁)(x₂ - μ₂)/(√σ₁₁√σ₂₂)
   = (1/σ₁₁)[ x₁ - μ₁ - ρ₁₂(√σ₁₁/√σ₂₂)(x₂ - μ₂) ]² - (ρ₁₂²/σ₂₂)(x₂ - μ₂)²

Because ρ₁₂ = σ₁₂/(√σ₁₁√σ₂₂), or ρ₁₂√σ₁₁/√σ₂₂ = σ₁₂/σ₂₂, the complete exponent is

-(1/(2σ₁₁(1 - ρ₁₂²)))( x₁ - μ₁ - (σ₁₂/σ₂₂)(x₂ - μ₂) )² - (1/2)(x₂ - μ₂)²/σ₂₂

The constant term 2π√(σ₁₁σ₂₂(1 - ρ₁₂²)) also factors as

√(2π)√σ₂₂ · √(2π)√(σ₁₁(1 - ρ₁₂²))

Dividing the joint density of X₁ and X₂ by the marginal density

f(x₂) = (1/(√(2π)√σ₂₂)) e^(-(x₂ - μ₂)²/(2σ₂₂))

and canceling terms yields the conditional density

f(x₁ | x₂) = (1/(√(2π)√(σ₁₁(1 - ρ₁₂²)))) e^(-( x₁ - μ₁ - (σ₁₂/σ₂₂)(x₂ - μ₂) )²/(2σ₁₁(1 - ρ₁₂²))),   -∞ < x₁ < ∞

Thus, with our customary notation, the conditional distribution of X₁, given that X₂ = x₂, is N(μ₁ + (σ₁₂/σ₂₂)(x₂ - μ₂), σ₁₁(1 - ρ₁₂²)). Now, Σ₁₁ - Σ₁₂Σ₂₂⁻¹Σ₂₁ = σ₁₁ - σ₁₂²/σ₂₂ = σ₁₁(1 - ρ₁₂²) and Σ₁₂Σ₂₂⁻¹ = σ₁₂/σ₂₂, agreeing with Result 4.6, which we obtained by an indirect method. ■

For the multivariate normal situation, it is worth emphasizing the following:

1. All conditional distributions are (multivariate) normal.
2. The conditional mean is of the form

μ₁ + β₁,q+1(x_q+1 - μ_q+1) + ... + β₁,p(x_p - μ_p)
⋮
μ_q + β_q,q+1(x_q+1 - μ_q+1) + ... + β_q,p(x_p - μ_p)   (4-9)

where the β's are defined by

Σ₁₂Σ₂₂⁻¹ = [ β₁,q+1  β₁,q+2  ...  β₁,p
             β₂,q+1  β₂,q+2  ...  β₂,p
             ⋮
             β_q,q+1 β_q,q+2 ...  β_q,p ]

3. The conditional covariance, Σ₁₁ - Σ₁₂Σ₂₂⁻¹Σ₂₁, does not depend upon the value(s) of the conditioning variable(s).

We conclude this section by presenting two final properties of multivariate normal random vectors. One has to do with the probability content of the ellipsoids of constant density. The other discusses the distribution of another form of linear combinations. The chi-square distribution determines the variability of the sample variance s² = s₁₁ for samples from a univariate normal population. It also plays a basic role in the multivariate case.

Result 4.7. Let X be distributed as Np(μ, Σ) with |Σ| > 0. Then

(a) (X - μ)'Σ⁻¹(X - μ) is distributed as χ²p, where χ²p denotes the chi-square distribution with p degrees of freedom.
(b) The Np(μ, Σ) distribution assigns probability 1 - α to the solid ellipsoid {x: (x - μ)'Σ⁻¹(x - μ) ≤ χ²p(α)}, where χ²p(α) denotes the upper (100α)th percentile of the χ²p distribution.

Proof. We know that χ²p is defined as the distribution of the sum Z₁² + Z₂² + ... + Zp², where Z₁, Z₂, ..., Zp are independent N(0, 1) random variables. Next, by the spectral decomposition [see Equations (2-16) and (2-21) with A = Σ, and see Result 4.1],

Σ⁻¹ = Σᵢ₌₁ᵖ (1/λᵢ)eᵢeᵢ',   where Σeᵢ = λᵢeᵢ, so Σ⁻¹eᵢ = (1/λᵢ)eᵢ

Consequently,

(X - μ)'Σ⁻¹(X - μ) = Σᵢ₌₁ᵖ (1/λᵢ)(X - μ)'eᵢeᵢ'(X - μ) = Σᵢ₌₁ᵖ (1/λᵢ)(eᵢ'(X - μ))² = Σᵢ₌₁ᵖ [ (1/√λᵢ)eᵢ'(X - μ) ]² = Σᵢ₌₁ᵖ Zᵢ²

where Z = A(X - μ) with

A = [ e₁'/√λ₁
      e₂'/√λ₂
      ⋮
      ep'/√λp ]

Since X - μ is distributed as Np(0, Σ), Result 4.3 gives Z = A(X - μ) distributed as Np(0, AΣA'), and computation shows that AΣA' = I. So Z₁, Z₂, ..., Zp are independent standard normal variables, and we conclude that (X - μ)'Σ⁻¹(X - μ) has a χ²p distribution. For Part b, we note that P[(X - μ)'Σ⁻¹(X - μ) ≤ c²] is the probability assigned to the ellipsoid (x - μ)'Σ⁻¹(x - μ) ≤ c² by the density Np(μ, Σ). But from Part a, P[(X - μ)'Σ⁻¹(X - μ) ≤ χ²p(α)] = 1 - α, and Part b holds. ■
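Part (b) can be illustrated by simulation. A minimal sketch, assuming NumPy and SciPy are available; the parameters are illustrative:

```python
# Monte Carlo check of (4-8): the ellipsoid with c^2 = chi2_p(alpha)
# should capture about (1 - alpha) of the probability.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

X = rng.multivariate_normal(mu, Sigma, size=100_000)
d = X - mu
sq_dist = np.einsum('ij,jk,ik->i', d, np.linalg.inv(Sigma), d)
print(np.mean(sq_dist <= chi2.ppf(0.95, df=2)))   # close to 0.95
```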
Remark (Interpretation of statistical distance). Result 4.7 provides an interpretation of a squared statistical distance. When X is distributed as Np(μ, Σ), (X - μ)'Σ⁻¹(X - μ) is the squared statistical distance from X to the population mean vector μ. If one component has a much larger variance than another, it will contribute less to the squared distance. Moreover, two highly correlated random variables will contribute less than two variables that are nearly uncorrelated. Essentially, the use of the inverse of the covariance matrix (1) standardizes all of the variables and (2) eliminates the effects of correlation. In terms of Σ^(-1/2) [see (2-22)], Z = Σ^(-1/2)(X - μ) has a Np(0, I) distribution, and

(X - μ)'Σ⁻¹(X - μ) = Z'Z = Z₁² + Z₂² + ... + Zp²

The squared statistical distance is calculated as if, first, the random vector X were transformed to p independent standard normal random variables and then the usual squared distance, the sum of the squares of the variables, were applied.

Next, consider the linear combination of vector random variables

c₁X₁ + c₂X₂ + ... + cnXn = [X₁ ⋮ X₂ ⋮ ... ⋮ Xn] c   (4-10)

where c (n × 1) = [c₁, c₂, ..., cn]'. This linear combination differs from the linear combinations considered earlier in that it defines a p × 1 vector random variable that is a linear combination of vectors. Previously, we discussed a single random variable that could be written as a linear combination of other univariate random variables.

Result 4.8. Let X₁, X₂, ..., Xn be mutually independent with Xj distributed as Np(μj, Σ). (Note that each Xj has the same covariance matrix Σ.) Then

V₁ = c₁X₁ + c₂X₂ + ... + cnXn

is distributed as Np( Σⱼ₌₁ⁿ cjμj, (Σⱼ₌₁ⁿ cj²)Σ ). Moreover, V₁ and V₂ = b₁X₁ + b₂X₂ + ... + bnXn are jointly multivariate normal with covariance matrix

[ (Σⱼ cj²)Σ   (b'c)Σ
  (b'c)Σ      (Σⱼ bj²)Σ ]

Consequently, V₁ and V₂ are independent if b'c = Σⱼ₌₁ⁿ cjbj = 0.

Proof. By Result 4.5(c), the np-component vector

X (np × 1) = [ X₁; X₂; ...; Xn ]

is distributed as Nnp(μ, Σx), where μ = [μ₁; μ₂; ...; μn] and Σx is block diagonal with Σ in each of the n diagonal blocks. The choice

A (2p × np) = [ c₁I  c₂I  ...  cnI
                b₁I  b₂I  ...  bnI ]

where I is the p × p identity matrix, gives

AX = [ V₁
       V₂ ]

and AX is normal N₂p(Aμ, AΣxA') by Result 4.3. Straightforward block multiplication shows that AΣxA' has first block diagonal term

[c₁Σ, c₂Σ, ..., cnΣ][c₁I, c₂I, ..., cnI]' = (Σⱼ cj²)Σ

second block diagonal term (Σⱼ bj²)Σ, and off-diagonal term

[c₁Σ, c₂Σ, ..., cnΣ][b₁I, b₂I, ..., bnI]' = (Σⱼ cjbj)Σ

This last term is the covariance matrix for V₁, V₂. Consequently, when b'c = Σⱼ cjbj = 0, so that (Σⱼ cjbj)Σ = 0 (a p × p matrix of zeros), V₁ and V₂ are independent by Result 4.5(b). ■
For sums of the type in (4-10), the property of zero correlation is equivalent to requiring the coefficient vectors b and c to be perpendicular.

Example 4.8 (Linear combinations of random vectors) Let X₁, X₂, X₃, and X₄ be independent and identically distributed 3 × 1 random vectors with

μ = [ 3; -1; 1 ]   and   Σ = [ 3   -1  1
                               -1  1   0
                               1   0   2 ]

We first consider a linear combination a'X₁ of the three components of X₁. This is a random variable with mean

a'μ = 3a₁ - a₂ + a₃

and variance

a'Σa = 3a₁² + a₂² + 2a₃² - 2a₁a₂ + 2a₁a₃

That is, a linear combination a'X₁ of the components of a random vector is a single random variable consisting of a sum of terms that are each a constant times a variable. This is very different from a linear combination of random vectors, say,

c₁X₁ + c₂X₂ + c₃X₃ + c₄X₄

which is itself a random vector. Here each term in the sum is a constant times a random vector. Now consider the two linear combinations of random vectors

(1/2)X₁ + (1/2)X₂ + (1/2)X₃ + (1/2)X₄   and   X₁ + X₂ + X₃ - 3X₄

Find the mean vector and covariance matrix for each linear combination of vectors and also the covariance between them. By Result 4.8 with c₁ = c₂ = c₃ = c₄ = 1/2, the first linear combination has mean vector

(c₁ + c₂ + c₃ + c₄)μ = 2μ = [ 6; -2; 2 ]

and covariance matrix

(c₁² + c₂² + c₃² + c₄²)Σ = 1·Σ = Σ

For the second linear combination of random vectors, we apply Result 4.8 with b₁ = b₂ = b₃ = 1 and b₄ = -3 to get mean vector

(b₁ + b₂ + b₃ + b₄)μ = 0μ = 0

and covariance matrix

(b₁² + b₂² + b₃² + b₄²)Σ = 12Σ = [ 36   -12  12
                                   -12  12   0
                                   12   0    24 ]

Finally, the covariance matrix for the two linear combinations of random vectors is

(Σⱼ cjbj)Σ = (1/2 + 1/2 + 1/2 - 3/2)Σ = 0Σ = 0

Every component of the first linear combination of random vectors has zero covariance with every component of the second linear combination of random vectors. If, in addition, each X has a trivariate normal distribution, then the two linear combinations have a joint six-variate normal distribution, and the two linear combinations of vectors are independent. ■
Similarly. We can add and subtract i = {l/n) term n (Xj  p. (Xj .3 Sampling from a Multivariate Normal Distribution and Maximum likelihood Estimation We discussed sampling and selecting random samples briefly in Chapter 3. Then tr(x'(Ax» = tr«Ax)x'). At this point.)')]/2} (415) . and the maximizing parameter values are called maximum likelihood estimates. (x.p.) Consequently.Xn as fixed and consider the joint density of Equation (411) evaluated at these values. To do so.X n Let x' be the matrix B with rn = 1.)') p. Xn are mutually independent and each has distribution Np(p.: (Xj  x + x . The result is the likelihood function. + A • k Now the exponent in the joint density in (411) can be simplified..jcji as exp { tr[l:l(jt (Xj .)' n Result 4. ••• . . We shaH need some additionai properties for the trace of a square matrix.)'I1(xj .x)' + j=l 2. we shallbe concerned with samples from multivariate normal populationin particular.i) = . k k are both matrices of zeros. we can write the joint density of a random sample from a multivariate normal population as joint density Of} = (27T { Xl>X .and the result follows. l:). we note thatx'Ax is a scalar..b.28 and Result 2A. the joint density function of all the observations is the product of the marginal normal densities: Joint density } = { ofX 1. Since Xl. using Equations (413) and (414). and covariance matrix l:. Many good statistical procedures employ values for the popUlation parameters that "best" explain the observed data.15. so 1=1.p. and l: for a muItivariate normal population.i)(i .)' J=1 n j=1 n (Xj  x)(Xj .168 Chapter 4 The Multivariate Normal Distribution Sampling from a Muitivariate Normal Distribution and Maximum Likelihood Estimation 169 4.(xj .. We pointed out in Result 2A. J=1 fI {(27T)P III 1(2 j=1 = tr[l:\xj .: (x  n p. m X k and k X rn.)'I.: (Xj  n ± Xj in each p.p.. A • 2 k • Therefore.bij . This follows because BC has j=1 rnp(2/l: In/2 + n(x .)'] (412) 2. now considered as a function of p.)(Xj .)' (414) p.) j=1 2.p. Then (a) x'Ax = tr(x'Ax) = tr(Axx') i=1 because the crossproduct terms.p.. Xn represent a random sample from a multivariate normal population with mean vector p.jCji) = tr(BC).)'l:I(Xj .12(b). where pp' = I and A is a diagonal matrix with entries AI.») Next.: (Xj  x)(Xj .p. One meaning of best is to select the parameter values that maximize the joint density evaluated at the observations.i)' P. The Multivariate Normal likelihood Let us assume that the p X 1 vectors Xl. respectively. Let A be a k x k symmetric matrix and x be a k X 1 vector.p.)]_ j=1 (413) since the trace of a sum of matrices is equal to the sum of the traces of the matrices. tr(A) = tr(P'AP) = tr(APP') = tr(A) = Al + A2 + .p. in Equation (411).p. In this section.X 2"".») p.p. .)(i  2. X 2.. For Part a.sox'Ax = tr(x'Ax).: Ai. X 2 .p.)(i . ) .p.p.)(Xj .) = _ = __ 1_ _ (27T )np(21 I In(2 ) (411) j=1 2. according to Result 2A.p.9(a). and the properties of the trace are discussed m DefmlUon 2A.=1 Cj.)' to give j=1 2.)' and 2.p. we take the observations Xl'X2''''. By Result 4. )(Xj . with the sampling distribution of X and S. )(Xj  2.: (i j=1 n i)'.: b.) = tr[(xj .• . This technique is called maximum likelihood estimation.: tr[l:l(xj  = (Xj  P.)'l:\Xj . they may be substituted for the x .p.)(Xj = = X + X .of a is of its diagonal elements. (See Exercise 4.12. so tr (BC) = element of CB is Cj.p.jcj. the jth diagonal i: tr(CB) = j=1 ±(± .X 2 n X Proof.). X2.: (Xj  n 1 p. its ith diagonal element.i)' + n(i . J=1 (b) tr (A) = 2. 
Now the exponent in the joint density in (4-11) can be simplified. By Result 4.9(a),

(xj - μ)'Σ⁻¹(xj - μ) = tr[(xj - μ)'Σ⁻¹(xj - μ)] = tr[Σ⁻¹(xj - μ)(xj - μ)']   (4-12)

Next,

Σⱼ₌₁ⁿ (xj - μ)'Σ⁻¹(xj - μ) = Σⱼ₌₁ⁿ tr[Σ⁻¹(xj - μ)(xj - μ)'] = tr[ Σ⁻¹( Σⱼ₌₁ⁿ (xj - μ)(xj - μ)' ) ]   (4-13)

since the trace of a sum of matrices is equal to the sum of the traces of the matrices. We can add and subtract x̄ = (1/n)Σⱼ₌₁ⁿ xj in each term (xj - μ) in Σⱼ₌₁ⁿ (xj - μ)(xj - μ)' to give

Σⱼ₌₁ⁿ (xj - x̄ + x̄ - μ)(xj - x̄ + x̄ - μ)' = Σⱼ₌₁ⁿ (xj - x̄)(xj - x̄)' + n(x̄ - μ)(x̄ - μ)'   (4-14)

because the cross-product terms, Σⱼ₌₁ⁿ (xj - x̄)(x̄ - μ)' and Σⱼ₌₁ⁿ (x̄ - μ)(xj - x̄)', are both matrices of zeros, since Σⱼ₌₁ⁿ (xj - x̄) = 0. (See Exercise 4.15.) Consequently, using Equations (4-13) and (4-14), we can write the joint density of a random sample from a multivariate normal population as

{Joint density of X₁, X₂, ..., Xn} = (2π)^(-np/2)|Σ|^(-n/2) exp{ -tr[ Σ⁻¹( Σⱼ₌₁ⁿ (xj - x̄)(xj - x̄)' + n(x̄ - μ)(x̄ - μ)' ) ]/2 }   (4-15)

Substituting the observed values x₁, x₂, ..., xn into the joint density yields the likelihood function. We shall denote this function by L(μ, Σ), to stress the fact that it is a function of the (unknown) population parameters μ and Σ. Thus, when the vectors xj contain the specific numbers actually observed, we have

L(μ, Σ) = (1/((2π)^(np/2)|Σ|^(n/2))) e^(-tr[ Σ⁻¹( Σⱼ₌₁ⁿ (xj - x̄)(xj - x̄)' + n(x̄ - μ)(x̄ - μ)' ) ]/2)   (4-16)

It will be convenient in later sections of this book to express the exponent in the likelihood function (4-16) in different ways. In particular, we shall make use of the identity

tr[ Σ⁻¹( Σⱼ₌₁ⁿ (xj - x̄)(xj - x̄)' + n(x̄ - μ)(x̄ - μ)' ) ] = tr[ Σ⁻¹( Σⱼ₌₁ⁿ (xj - x̄)(xj - x̄)' ) ] + n(x̄ - μ)'Σ⁻¹(x̄ - μ)   (4-17)

Maximum Likelihood Estimation of μ and Σ

The next result will eventually allow us to obtain the maximum likelihood estimators of μ and Σ.
Result 4.10. Given a p × p symmetric positive definite matrix B and a scalar b > 0, it follows that

(1/|Σ|ᵇ) e^(-tr(Σ⁻¹B)/2) ≤ (1/|B|ᵇ)(2b)^(pb) e^(-bp)

for all positive definite Σ (p × p), with equality holding only for Σ = (1/(2b))B.

Proof. Let B^(1/2) be the symmetric square root of B [see Equation (2-22)], so B^(1/2)B^(1/2) = B, B^(1/2)B^(-1/2) = I, and B^(-1/2)B^(-1/2) = B⁻¹. Then

tr(Σ⁻¹B) = tr[(Σ⁻¹B^(1/2))B^(1/2)] = tr[B^(1/2)(Σ⁻¹B^(1/2))] = tr[B^(1/2)Σ⁻¹B^(1/2)]

Let η be an eigenvalue of B^(1/2)Σ⁻¹B^(1/2). This matrix is positive definite because y'B^(1/2)Σ⁻¹B^(1/2)y = (B^(1/2)y)'Σ⁻¹(B^(1/2)y) > 0 if B^(1/2)y ≠ 0 or, equivalently, y ≠ 0. Thus, the eigenvalues ηᵢ of B^(1/2)Σ⁻¹B^(1/2) are positive, by Exercise 2.17. Result 4.9(b) then gives

tr(Σ⁻¹B) = tr(B^(1/2)Σ⁻¹B^(1/2)) = Σᵢ₌₁ᵖ ηᵢ

and |B^(1/2)Σ⁻¹B^(1/2)| = ∏ᵢ₌₁ᵖ ηᵢ by Exercise 2.12. From the properties of determinants,

|B^(1/2)Σ⁻¹B^(1/2)| = |B^(1/2)||Σ⁻¹||B^(1/2)| = |Σ⁻¹||B^(1/2)||B^(1/2)| = |Σ⁻¹||B| = (1/|Σ|)|B|

so 1/|Σ| = (1/|B|)∏ᵢ₌₁ᵖ ηᵢ. Combining the results for the trace and the determinant yields

(1/|Σ|ᵇ) e^(-tr(Σ⁻¹B)/2) = (1/|B|ᵇ)( ∏ᵢ₌₁ᵖ ηᵢᵇ e^(-ηᵢ/2) )

But the function ηᵇe^(-η/2) has a maximum, with respect to η, of (2b)ᵇe^(-b), occurring at η = 2b. The choice ηᵢ = 2b, for each i, therefore gives

(1/|Σ|ᵇ) e^(-tr(Σ⁻¹B)/2) ≤ (1/|B|ᵇ)(2b)^(pb) e^(-bp)

The upper bound is uniquely attained when Σ = (1/(2b))B, since, for this choice,

B^(1/2)Σ⁻¹B^(1/2) = B^(1/2)(2b)B⁻¹B^(1/2) = (2b)I

so tr(Σ⁻¹B) = tr[(2b)I] = 2bp, and 1/|Σ| = |(2b)I|/|B| = (2b)ᵖ/|B|. Straightforward substitution for tr[Σ⁻¹B] and 1/|Σ|ᵇ yields the bound asserted. ■

The maximum likelihood estimates of μ and Σ are those values, denoted by μ̂ and Σ̂, that maximize the function L(μ, Σ) in (4-16). The estimates μ̂ and Σ̂ will depend on the observed values x₁, x₂, ..., xn through the summary statistics x̄ and S.

Result 4.11. Let X₁, X₂, ..., Xn be a random sample from a normal population with mean μ and covariance Σ. Then

μ̂ = X̄   and   Σ̂ = (1/n) Σⱼ₌₁ⁿ (Xj - X̄)(Xj - X̄)' = ((n - 1)/n)S

are the maximum likelihood estimators of μ and Σ, respectively. Their observed values, x̄ and (1/n)Σⱼ₌₁ⁿ (xj - x̄)(xj - x̄)', are called the maximum likelihood estimates of μ and Σ.

Proof. The exponent in the likelihood function [see Equation (4-16)], apart from the multiplicative factor -1/2, is [see (4-17)]

tr[ Σ⁻¹( Σⱼ₌₁ⁿ (xj - x̄)(xj - x̄)' ) ] + n(x̄ - μ)'Σ⁻¹(x̄ - μ)

By Result 4.1, Σ⁻¹ is positive definite, so the distance (x̄ - μ)'Σ⁻¹(x̄ - μ) > 0 unless μ = x̄. Thus, the likelihood is maximized with respect to μ at μ̂ = x̄. It remains to maximize

L(μ̂, Σ) = (1/((2π)^(np/2)|Σ|^(n/2))) e^(-tr[Σ⁻¹B]/2)

over Σ, where B = Σⱼ₌₁ⁿ (xj - x̄)(xj - x̄)'. By Result 4.10 with b = n/2, the maximum occurs at Σ̂ = (1/n)Σⱼ₌₁ⁿ (xj - x̄)(xj - x̄)', as stated. The maximized likelihood is

L(μ̂, Σ̂) = (1/(2π)^(np/2)) e^(-np/2) (1/|Σ̂|^(n/2))   (4-18) ■
The maximum likelihood estimators are random quantities. They are obtained by replacing the observations x₁, x₂, ..., xn in the expressions for μ̂ and Σ̂ with the corresponding random vectors, X₁, X₂, ..., Xn. We note that the maximum likelihood estimator X̄ is a random vector and the maximum likelihood estimator Σ̂ is a random matrix. The maximum likelihood estimates are their particular values for the given data set. Also, since |Σ̂| = [(n - 1)/n]ᵖ|S|, the maximum of the likelihood in (4-18) can be written

L(μ̂, Σ̂) = constant × (generalized variance)^(-n/2)   (4-19)

The generalized variance determines the "peakedness" of the likelihood function and, consequently, is a natural measure of variability when the parent population is multivariate normal.

Maximum likelihood estimators possess an invariance property. Let θ̂ be the maximum likelihood estimator of θ, and consider estimating the parameter h(θ), which is a function of θ. Then the maximum likelihood estimate of

h(θ)  (a function of θ)   is given by   h(θ̂)  (the same function of θ̂)   (4-20)

(See [1] and [15].) For example:

1. The maximum likelihood estimator of μ'Σ⁻¹μ is μ̂'Σ̂⁻¹μ̂, where μ̂ = X̄ and Σ̂ = ((n - 1)/n)S are the maximum likelihood estimators of μ and Σ, respectively.
2. The maximum likelihood estimator of √σii is √σ̂ii, where σ̂ii = (1/n)Σⱼ₌₁ⁿ (Xji - X̄i)² is the maximum likelihood estimator of σii = Var(Xi).

Sufficient Statistics

From expression (4-15), the joint density depends on the whole set of observations x₁, x₂, ..., xn only through the sample mean x̄ and the sum-of-squares-and-cross-products matrix Σⱼ₌₁ⁿ (xj - x̄)(xj - x̄)' = (n - 1)S. We express this fact by saying that x̄ and (n - 1)S (or S) are sufficient statistics:

Let X₁, X₂, ..., Xn be a random sample from a multivariate normal population with mean μ and covariance Σ. Then X̄ and S are sufficient statistics.   (4-21)

The importance of sufficient statistics for normal populations is that all of the information about μ and Σ in the data matrix X is contained in x̄ and S, regardless of the sample size n. This generally is not true for nonnormal populations. Since many multivariate techniques begin with sample means and covariances, it is prudent to check on the adequacy of the multivariate normal assumption. (See Section 4.6.) If the data cannot be regarded as multivariate normal, techniques that depend solely on x̄ and S may be ignoring other useful sample information.
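Result 4.11 is easy to verify numerically: the divisor n, rather than n - 1, is the only difference between Σ̂ and S. A minimal sketch, assuming NumPy is available; the simulated data are illustrative:

```python
# Maximum likelihood estimates: mu_hat = xbar, Sigma_hat = ((n-1)/n) S.
import numpy as np

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=50)
n = X.shape[0]

mu_hat = X.mean(axis=0)
Sigma_hat = (X - mu_hat).T @ (X - mu_hat) / n     # divisor n, not n - 1
S = np.cov(X, rowvar=False)
print(np.allclose(Sigma_hat, (n - 1) / n * S))    # True
```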
The result for the multivariate case ($p \ge 2$) is analogous in that $\bar{X}$ has a normal distribution with mean $\mu$ and covariance matrix $(1/n)\Sigma$.

For the sample covariance matrix, the sampling distribution of $\sum_{j=1}^{n}(X_j-\bar{X})(X_j-\bar{X})' = (n-1)S$ is a natural generalization of the chi-square. It is called the Wishart distribution, after its discoverer; it is defined as the distribution of a sum of independent products of multivariate normal random vectors. Specifically,

$$W_m(\cdot \,|\, \Sigma) = \text{Wishart distribution with } m \text{ d.f.} = \text{distribution of } \sum_{j=1}^{m} Z_j Z_j' \tag{4-22}$$

where the $Z_j$ are each independently distributed as $N_p(0, \Sigma)$.

We summarize the sampling distribution results as follows:

Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a $p$-variate normal distribution with mean $\mu$ and covariance matrix $\Sigma$. Then
1. $\bar{X}$ is distributed as $N_p(\mu, (1/n)\Sigma)$;
2. $(n-1)S$ is distributed as a Wishart random matrix with $n-1$ d.f.;
3. $\bar{X}$ and $S$ are independent.    (4-23)

Because $\Sigma$ is unknown, the distribution of $\bar{X}$ cannot be used directly to make inferences about $\mu$. However, $S$ provides independent information about $\Sigma$, and the distribution of $S$ does not depend on $\mu$. This allows us to construct a statistic for making inferences about $\mu$, as we shall see in Chapter 5.

For the present, we record some further results from multivariate distribution theory. The following properties of the Wishart distribution are derived directly from its definition as a sum of the independent products $Z_j Z_j'$. Proofs can be found in [1].

Properties of the Wishart Distribution
1. If $A_1$ is distributed as $W_{m_1}(A_1 \,|\, \Sigma)$ independently of $A_2$, which is distributed as $W_{m_2}(A_2 \,|\, \Sigma)$, then $A_1 + A_2$ is distributed as $W_{m_1+m_2}(A_1 + A_2 \,|\, \Sigma)$. That is, the degrees of freedom add. (4-24)
2. If $A$ is distributed as $W_m(A \,|\, \Sigma)$, then $CAC'$ is distributed as $W_m(CAC' \,|\, C\Sigma C')$.

Although we do not have any particular need for the probability density function of the Wishart distribution, it may be of some interest to see its rather complicated form. The density does not exist unless the sample size $n$ is greater than the number of variables $p$. When it does exist, its value at the positive definite matrix $A$ is

$$w_{n-1}(A \,|\, \Sigma) = \frac{|A|^{(n-p-2)/2}\, e^{-\mathrm{tr}(A\Sigma^{-1})/2}}{2^{p(n-1)/2}\, \pi^{p(p-1)/4}\, |\Sigma|^{(n-1)/2} \prod_{i=1}^{p} \Gamma\!\left(\tfrac{1}{2}(n-i)\right)}, \qquad A \text{ positive definite} \tag{4-25}$$

where $\Gamma(\cdot)$ is the gamma function. (See [1] and [11].)

4.5 Large-Sample Behavior of X-bar and S

Suppose the quantity $X$ is determined by a large number of independent causes $V_1, V_2, \ldots, V_n$, where the random variables $V_i$ representing the causes have approximately the same variability. If $X$ is the sum

$$X = V_1 + V_2 + \cdots + V_n$$

then the central limit theorem applies, and we conclude that $X$ has a distribution that is nearly normal. This is true for virtually any parent distribution of the $V_i$'s, provided that $n$ is large enough.

The univariate central limit theorem also tells us that the sampling distribution of the sample mean, $\bar{X}$, for a large sample size is nearly normal, whatever the form of the underlying population distribution. A similar result holds for many other important univariate statistics.

It turns out that certain multivariate statistics, like $\bar{X}$ and $S$, have large-sample properties analogous to their univariate counterparts. As the sample size is increased without bound, certain regularities govern the sampling variation in $\bar{X}$ and $S$, irrespective of the form of the parent population. Therefore, the conclusions presented in this section do not require multivariate normal populations. The only requirements are that the parent population, whatever its form, have a mean $\mu$ and a finite covariance $\Sigma$.

Result 4.12 (Law of large numbers). Let $Y_1, Y_2, \ldots, Y_n$ be independent observations from a population with mean $E(Y_i) = \mu$. Then

$$\bar{Y} = \frac{Y_1 + Y_2 + \cdots + Y_n}{n}$$

converges in probability to $\mu$ as $n$ increases without bound. That is, for any prescribed accuracy $\varepsilon > 0$, $P[-\varepsilon < \bar{Y} - \mu < \varepsilon]$ approaches unity as $n \to \infty$.

Proof. See [9]. ■

As a direct consequence of the law of large numbers, which says that each $\bar{X}_i$ converges in probability to $\mu_i$, $i = 1, 2, \ldots, p$,

$$\bar{X} \text{ converges in probability to } \mu \tag{4-26}$$

Also, each sample covariance $s_{ik}$ converges in probability to $\sigma_{ik}$, $i, k = 1, 2, \ldots, p$, and

$$S \ (\text{or } \hat{\Sigma} = S_n) \text{ converges in probability to } \Sigma \tag{4-27}$$

Statement (4-27) follows from writing

$$(n-1)s_{ik} = \sum_{j=1}^{n}(X_{ji}-\bar{X}_i)(X_{jk}-\bar{X}_k) = \sum_{j=1}^{n}(X_{ji}-\mu_i+\mu_i-\bar{X}_i)(X_{jk}-\mu_k+\mu_k-\bar{X}_k) = \sum_{j=1}^{n}(X_{ji}-\mu_i)(X_{jk}-\mu_k) + n(\bar{X}_i-\mu_i)(\bar{X}_k-\mu_k)$$

Letting $Y_j = (X_{ji}-\mu_i)(X_{jk}-\mu_k)$, with $E(Y_j) = \sigma_{ik}$, we see that the first term in $s_{ik}$ converges to $\sigma_{ik}$ and the second term converges to zero, by applying the law of large numbers.
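The sampling-distribution summary (4-23) is easy to illustrate by simulation. The sketch below is not part of the text; it is a minimal Python/NumPy check, with made-up parameter values, that the empirical covariance of repeated sample means approaches $(1/n)\Sigma$, as item 1 of (4-23) asserts.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, 2.0])                      # illustrative population mean
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])     # illustrative population covariance
n, reps = 20, 5000                             # sample size, number of replicates

# Draw `reps` independent samples of size n and record each sample mean.
xbars = np.array([
    rng.multivariate_normal(mu, Sigma, size=n).mean(axis=0)
    for _ in range(reps)
])

print(np.cov(xbars, rowvar=False))   # should be close to Sigma / n
print(Sigma / n)
```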
The approximation provided by the central limit theorem applies to discrete, as well as continuous, multivariate populations. Mathematically, the limit is exact, and the approach to normality is often fairly rapid. Moreover, from the results in Section 4.4, we know that $\bar{X}$ is exactly normally distributed when the underlying population is normal. Thus, we would expect the central limit theorem approximation to be quite good for moderate $n$ when the parent population is nearly normal. The statement concerning $\bar{X}$ is made even more precise by a multivariate version of the central limit theorem.

Result 4.13 (The central limit theorem). Let $X_1, X_2, \ldots, X_n$ be independent observations from any population with mean $\mu$ and finite covariance $\Sigma$. Then

$$\sqrt{n}\,(\bar{X}-\mu) \text{ has an approximate } N_p(0, \Sigma) \text{ distribution}$$

for large sample sizes. Here $n$ should also be large relative to $p$.

Proof. See [1]. ■

Result 4.7 can be used to show that $n(\bar{X}-\mu)'\Sigma^{-1}(\bar{X}-\mu)$ has a $\chi^2_p$ distribution when $\bar{X}$ is distributed as $N_p(\mu, (1/n)\Sigma)$ or, equivalently, when $\sqrt{n}(\bar{X}-\mu)$ has an $N_p(0, \Sigma)$ distribution. The $\chi^2_p$ distribution is approximately the sampling distribution of $n(\bar{X}-\mu)'\Sigma^{-1}(\bar{X}-\mu)$ when $\bar{X}$ is approximately normally distributed. By (4-27), $S$ is close to $\Sigma$ with high probability when $n$ is large. Consequently, replacing $\Sigma$ by $S$ in the approximating normal distribution for $\bar{X}$, or in distances of the form $n(\bar{X}-\mu)'S^{-1}(\bar{X}-\mu)$, will have a negligible effect on subsequent calculations.

We summarize the major conclusions of this section as follows:

Let $X_1, X_2, \ldots, X_n$ be independent observations from a population with mean $\mu$ and finite (nonsingular) covariance $\Sigma$. Then

$$\sqrt{n}\,(\bar{X}-\mu) \text{ is approximately } N_p(0, \Sigma)$$
and
$$n(\bar{X}-\mu)'S^{-1}(\bar{X}-\mu) \text{ is approximately } \chi^2_p \tag{4-28}$$

for $n - p$ large.

4.6 Assessing the Assumption of Normality

As we have pointed out, most of the statistical techniques discussed in subsequent chapters assume that each vector observation $X_j$ comes from a multivariate normal distribution. On the other hand, in situations where the sample size is large and the techniques depend solely on the behavior of $\bar{X}$, or distances involving $\bar{X}$ of the form $n(\bar{X}-\mu)'S^{-1}(\bar{X}-\mu)$, the assumption of normality for the individual observations is less crucial. But to some degree, the quality of inferences made by these methods depends on how closely the true parent population resembles the multivariate normal form. It is imperative, then, that procedures exist for detecting cases where the data exhibit moderate to extreme departures from what is expected under multivariate normality.

We want to answer this question: Do the observations $X_j$ appear to violate the assumption that they came from a normal population? Based on the properties of normal distributions, we know that all linear combinations of normal variables are normal and the contours of the multivariate normal density are ellipsoids. Therefore, we address these questions:

1. Do the marginal distributions of the elements of $X$ appear to be normal? What about a few linear combinations of the components $X_i$?
2. Do the scatter plots of pairs of observations on different characteristics give the elliptical appearance expected from normal populations?
3. Are there any "wild" observations that should be checked for accuracy?

It will become clear that our investigations of normality will concentrate on the behavior of the observations in one or two dimensions (for example, marginal distributions and scatter plots). To some extent, we must pay a price for concentrating on univariate and bivariate examinations of normality: We can never be sure that we have not missed some feature that is revealed only in higher dimensions. (It is possible, for example, to construct a nonnormal bivariate distribution with normal marginals. [See Exercise 4.8.]) Yet many types of nonnormality are often reflected in the marginal distributions and scatter plots. Moreover, for most practical work, one-dimensional and two-dimensional investigations are ordinarily sufficient. Fortunately, pathological data sets that are normal in lower dimensional representations, but nonnormal in higher dimensions, are not frequently encountered in practice.

Evaluating the Normality of the Univariate Marginal Distributions

Dot diagrams for smaller $n$ and histograms for $n > 25$ or so help reveal situations where one tail of a univariate distribution is much longer than the other. If the histogram for a variable $X_i$ appears reasonably symmetric, we can check further by counting the number of observations in certain intervals. A univariate normal distribution assigns probability .683 to the interval $(\mu_i - \sqrt{\sigma_{ii}},\, \mu_i + \sqrt{\sigma_{ii}})$ and probability .954 to the interval $(\mu_i - 2\sqrt{\sigma_{ii}},\, \mu_i + 2\sqrt{\sigma_{ii}})$. Consequently, with a large sample size $n$, we expect the observed proportion $\hat{p}_{i1}$ of the observations lying in the interval $(\bar{x}_i - \sqrt{s_{ii}},\, \bar{x}_i + \sqrt{s_{ii}})$ to be about .683. Similarly, the observed proportion $\hat{p}_{i2}$ of the observations in $(\bar{x}_i - 2\sqrt{s_{ii}},\, \bar{x}_i + 2\sqrt{s_{ii}})$ should be about .954. Using the normal approximation to the sampling distribution of $\hat{p}_i$ (see [9]), we observe that either

$$|\hat{p}_{i1} - .683| > 3\sqrt{\frac{(.683)(.317)}{n}} = \frac{1.396}{\sqrt{n}}$$

or

$$|\hat{p}_{i2} - .954| > 3\sqrt{\frac{(.954)(.046)}{n}} = \frac{.628}{\sqrt{n}} \tag{4-29}$$

would indicate departures from an assumed normal distribution for the $i$th characteristic. When the observed proportions are too small, parent distributions with thicker tails than the normal are suggested.
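The interval counts in (4-29) are straightforward to automate. The sketch below is not from the text; it is a small Python/NumPy helper (the name `interval_check` is mine) that computes the proportions and flags the two criteria in (4-29) for a single variable.

```python
import numpy as np

def interval_check(x):
    """Flag departures from normality using the proportion criteria (4-29)."""
    x = np.asarray(x, dtype=float)
    n, xbar, s = len(x), x.mean(), x.std(ddof=1)
    p1 = np.mean(np.abs(x - xbar) <= s)        # proportion within 1 std dev
    p2 = np.mean(np.abs(x - xbar) <= 2 * s)    # proportion within 2 std devs
    flag1 = abs(p1 - 0.683) > 1.396 / np.sqrt(n)
    flag2 = abs(p2 - 0.954) > 0.628 / np.sqrt(n)
    return p1, p2, (flag1 or flag2)
```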
Plots are always useful devices in any data analysis. Special plots called Q-Q plots can be used to assess the assumption of normality. These plots can be made for the marginal distributions of the sample observations on each variable. They are, in effect, plots of the sample quantile versus the quantile one would expect to observe if the observations actually were normally distributed. When the points lie very nearly along a straight line, the normality assumption remains tenable. Normality is suspect if the points deviate from a straight line. Moreover, the pattern of the deviations can provide clues about the nature of the nonnormality. Once the reasons for the nonnormality are identified, corrective action is often possible. (See Section 4.8.)

To simplify notation, let $x_1, x_2, \ldots, x_n$ represent $n$ observations on any single characteristic $X_i$. Let $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$ represent these observations after they are ordered according to magnitude. For example, $x_{(2)}$ is the second smallest observation and $x_{(n)}$ is the largest observation. The $x_{(j)}$'s are the sample quantiles. When the $x_{(j)}$ are distinct, exactly $j$ observations are less than or equal to $x_{(j)}$. (This is theoretically always true when the observations are of the continuous type, which we usually assume.) The proportion $j/n$ of the sample at or to the left of $x_{(j)}$ is often approximated by $(j - \frac{1}{2})/n$ for analytical convenience.*

For a standard normal distribution, the quantiles $q_{(j)}$ are defined by the relation

$$P[Z \le q_{(j)}] = \int_{-\infty}^{q_{(j)}} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz = p_{(j)} = \frac{j - \frac{1}{2}}{n} \tag{4-30}$$

(See Table 1 in the appendix.) Here $p_{(j)}$ is the probability of getting a value less than or equal to $q_{(j)}$ in a single drawing from a standard normal population.

The idea is to look at the pairs of quantiles $(q_{(j)}, x_{(j)})$ with the same associated cumulative probability $(j - \frac{1}{2})/n$. If the data arise from a normal population, the pairs $(q_{(j)}, x_{(j)})$ will be approximately linearly related, since $\sigma q_{(j)} + \mu$ is nearly the expected sample quantile.

*The $\frac{1}{2}$ in the numerator of $(j - \frac{1}{2})/n$ is a "continuity" correction. Some authors (see [5] and [10]) have suggested replacing $(j - \frac{1}{2})/n$ by $(j - \frac{3}{8})/(n + \frac{1}{4})$.

Example 4.9 (Constructing a Q-Q plot) A sample of $n = 10$ observations gives the values in the following table:

Ordered observations x_(j) | Probability levels (j - 1/2)/n | Standard normal quantiles q_(j)
-1.00 | .05 | -1.645
 -.10 | .15 | -1.036
  .16 | .25 |  -.674
  .41 | .35 |  -.385
  .62 | .45 |  -.125
  .80 | .55 |   .125
 1.26 | .65 |   .385
 1.54 | .75 |   .674
 1.71 | .85 |  1.036
 2.30 | .95 |  1.645

Here, for example,

$$P[Z \le .385] = \int_{-\infty}^{.385} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz = .65 \quad \text{[See (4-30).]}$$

Let us now construct the Q-Q plot and comment on its appearance. The Q-Q plot for the foregoing data, which is a plot of the ordered data $x_{(j)}$ against the normal quantiles $q_{(j)}$, is shown in Figure 4.5. The pairs of points $(q_{(j)}, x_{(j)})$ lie very nearly along a straight line, and we would not reject the notion that these data are normally distributed, particularly with a sample size as small as $n = 10$. ■

[Figure 4.5 A Q-Q plot for the data in Example 4.9.]

The calculations required for Q-Q plots are easily programmed for electronic computers. Many statistical programs available commercially are capable of producing such plots.**

The steps leading to a Q-Q plot are as follows:
1. Order the original observations to get $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ and their corresponding probability values $(1 - \frac{1}{2})/n, (2 - \frac{1}{2})/n, \ldots, (n - \frac{1}{2})/n$;
2. Calculate the standard normal quantiles $q_{(1)}, q_{(2)}, \ldots, q_{(n)}$; and
3. Plot the pairs of observations $(q_{(1)}, x_{(1)}), (q_{(2)}, x_{(2)}), \ldots, (q_{(n)}, x_{(n)})$, and examine the "straightness" of the outcome.

Q-Q plots are not particularly informative unless the sample size is moderate to large, for instance, $n \ge 20$. There can be quite a bit of variability in the straightness of the Q-Q plot for small samples, even when the observations are known to come from a normal population.

**A better procedure is to plot $(m_{(j)}, x_{(j)})$, where $m_{(j)} = E(z_{(j)})$ is the expected value of the $j$th-order statistic in a sample of size $n$ from a standard normal distribution. For large sample sizes, the two approaches are nearly the same (see [13]).
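The three steps translate directly into code. The following sketch is mine, not the authors'; it is a minimal Python illustration using NumPy and SciPy, reusing the data of Example 4.9.

```python
import numpy as np
from scipy import stats

x = np.array([-1.00, -0.10, 0.16, 0.41, 0.62, 0.80, 1.26, 1.54, 1.71, 2.30])

n = len(x)
x_ord = np.sort(x)                          # step 1: order the observations
probs = (np.arange(1, n + 1) - 0.5) / n     # probability levels (j - 1/2)/n
q = stats.norm.ppf(probs)                   # step 2: standard normal quantiles
# step 3: plot the pairs (q_(j), x_(j)) and examine straightness,
# e.g., with matplotlib: plt.scatter(q, x_ord)
print(np.column_stack([x_ord, probs, q]))
```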
The straightness of the Q-Q plot can be measured by calculating the correlation coefficient of the points in the plot. The correlation coefficient for the Q-Q plot is defined by

$$r_Q = \frac{\displaystyle\sum_{j=1}^{n}(x_{(j)}-\bar{x})(q_{(j)}-\bar{q})}{\sqrt{\displaystyle\sum_{j=1}^{n}(x_{(j)}-\bar{x})^2}\;\sqrt{\displaystyle\sum_{j=1}^{n}(q_{(j)}-\bar{q})^2}} \tag{4-31}$$

and a powerful test of normality can be based on it. (See [5], [10], and [12].) Formally, we reject the hypothesis of normality at level of significance $\alpha$ if $r_Q$ falls below the appropriate value in Table 4.2.

Table 4.2 Critical Points for the Q-Q Plot Correlation Coefficient Test for Normality

Sample size n | alpha = .01 | alpha = .05 | alpha = .10
  5  | .8299 | .8788 | .9032
 10  | .8801 | .9198 | .9351
 15  | .9126 | .9389 | .9503
 20  | .9269 | .9508 | .9604
 25  | .9410 | .9591 | .9665
 30  | .9479 | .9652 | .9715
 35  | .9538 | .9682 | .9740
 40  | .9599 | .9726 | .9771
 45  | .9632 | .9749 | .9792
 50  | .9671 | .9768 | .9809
 55  | .9695 | .9787 | .9822
 60  | .9720 | .9801 | .9836
 75  | .9771 | .9838 | .9866
100  | .9822 | .9873 | .9895
150  | .9879 | .9913 | .9928
200  | .9905 | .9931 | .9942
300  | .9935 | .9953 | .9960

Example 4.10 (A Q-Q plot for radiation data) The quality-control department of a manufacturer of microwave ovens is required by the federal government to monitor the amount of radiation emitted when the doors of the ovens are closed. Observations of the radiation emitted through closed doors of $n = 42$ randomly selected ovens were made. The data are listed in Table 4.1.

Table 4.1 Radiation Data (Door Closed)

Oven no. | Radiation | Oven no. | Radiation | Oven no. | Radiation
 1 | .15 | 16 | .10 | 31 | .10
 2 | .09 | 17 | .02 | 32 | .20
 3 | .18 | 18 | .10 | 33 | .11
 4 | .10 | 19 | .01 | 34 | .30
 5 | .05 | 20 | .40 | 35 | .02
 6 | .12 | 21 | .10 | 36 | .20
 7 | .08 | 22 | .05 | 37 | .20
 8 | .05 | 23 | .03 | 38 | .30
 9 | .08 | 24 | .05 | 39 | .30
10 | .10 | 25 | .15 | 40 | .40
11 | .07 | 26 | .10 | 41 | .30
12 | .02 | 27 | .15 | 42 | .05
13 | .01 | 28 | .09 |    |
14 | .10 | 29 | .08 |    |
15 | .10 | 30 | .18 |    |

Source: Data courtesy of J. D. Cryer.

In order to determine the probability of exceeding a prespecified tolerance level, a probability distribution for the radiation emitted was needed. Can we regard the observations here as being normally distributed?

A computer was used to assemble the pairs $(q_{(j)}, x_{(j)})$ and construct the Q-Q plot, pictured in Figure 4.6. It appears from the plot that the data as a whole are not normally distributed. The points indicated by the circled locations in the figure are outliers, values that are too large relative to the rest of the observations.

For the radiation data, several observations are equal. When this occurs, those observations with like values are associated with the same normal quantile. This quantile is calculated using the average of the quantiles the tied observations would have if they all differed slightly. ■

[Figure 4.6 A Q-Q plot of the radiation data (door closed) from Example 4.10. (The integers in the plot indicate the number of points occupying the same location.)]
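As a quick illustration of the $r_Q$ test, the following sketch (not from the text; Python with NumPy/SciPy, and the helper name `r_Q` is mine) computes (4-31) and can be compared against a critical point read from Table 4.2.

```python
import numpy as np
from scipy import stats

def r_Q(x):
    """Correlation coefficient (4-31) between ordered data and normal quantiles."""
    x_ord = np.sort(np.asarray(x, dtype=float))
    n = len(x_ord)
    q = stats.norm.ppf((np.arange(1, n + 1) - 0.5) / n)
    return np.corrcoef(x_ord, q)[0, 1]   # Pearson correlation equals (4-31)

x = np.array([-1.00, -0.10, 0.16, 0.41, 0.62, 0.80, 1.26, 1.54, 1.71, 2.30])
print(r_Q(x))   # about .994; see Example 4.11 below
# Reject normality at the 10% level if r_Q < .9351 (Table 4.2, n = 10).
```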
Example 4.11 (A correlation coefficient test for normality) Let us calculate the correlation coefficient $r_Q$ from the Q-Q plot of Example 4.9 (see Figure 4.5) and test for normality.

Using the information from Example 4.9, we have $\bar{x} = .770$ and

$$\sum_{j=1}^{10}(x_{(j)}-\bar{x})q_{(j)} = 8.584, \quad \sum_{j=1}^{10}(x_{(j)}-\bar{x})^2 = 8.472, \quad \text{and} \quad \sum_{j=1}^{10} q_{(j)}^2 = 8.795$$

Since always $\bar{q} = 0$,

$$r_Q = \frac{8.584}{\sqrt{8.472}\,\sqrt{8.795}} = .994$$

A test of normality at the 10% level of significance is provided by referring $r_Q = .994$ to the entry in Table 4.2 corresponding to $n = 10$ and $\alpha = .10$. This entry is .9351. Since $r_Q > .9351$, we do not reject the hypothesis of normality. ■

Instead of $r_Q$, some software packages evaluate the original statistic proposed by Shapiro and Wilk [12]. Its correlation form corresponds to replacing $q_{(j)}$ by a function of the expected value of standard normal order statistics and their covariances. We prefer $r_Q$ because it corresponds directly to the points in the normal-scores plot. For large sample sizes, the two statistics are nearly the same (see [13]), so either can be used to judge lack of fit.

Linear combinations of more than one characteristic can be investigated. Many statisticians suggest plotting $\hat{e}_1' x_j$, where $S\hat{e}_1 = \hat{\lambda}_1 \hat{e}_1$, in which $\hat{\lambda}_1$ is the largest eigenvalue of $S$. Here $x_j' = [x_{j1}, x_{j2}, \ldots, x_{jp}]$ is the $j$th observation on the $p$ variables $X_1, X_2, \ldots, X_p$. The linear combination $\hat{e}_p' x_j$ corresponding to the smallest eigenvalue is also frequently singled out for inspection. (See Chapter 8 and [6] for further details.)

Evaluating Bivariate Normality

We would like to check on the assumption of normality for all distributions of $2, 3, \ldots, p$ dimensions. However, as we have pointed out, for practical work it is usually sufficient to investigate the univariate and bivariate distributions. We considered univariate marginal distributions earlier. It is now of interest to examine the bivariate case.

In Chapter 1, we described scatter plots for pairs of characteristics. If the observations were generated from a multivariate normal distribution, each bivariate distribution would be normal, and the contours of constant density would be ellipses. The scatter plot should conform to this structure by exhibiting an overall pattern that is nearly elliptical.

Moreover, by Result 4.7, the set of bivariate outcomes $x$ such that

$$(x-\mu)'\Sigma^{-1}(x-\mu) \le \chi_2^2(.5)$$

has probability .5. Thus, we should expect roughly the same percentage, 50%, of sample observations to lie in the ellipse given by

$$\{\text{all } x \text{ such that } (x-\bar{x})'S^{-1}(x-\bar{x}) \le \chi_2^2(.5)\}$$

where we have replaced $\mu$ by its estimate $\bar{x}$ and $\Sigma^{-1}$ by its estimate $S^{-1}$. If not, the normality assumption is suspect.

Example 4.12 (Checking bivariate normality) Although not a random sample, data consisting of the pairs of observations ($x_1$ = sales, $x_2$ = profits) for the 10 largest companies in the world are listed in Exercise 1.4. These data give

$$\bar{x} = \begin{bmatrix} 155.60 \\ 14.70 \end{bmatrix}, \qquad S = \begin{bmatrix} 7476.45 & 303.62 \\ 303.62 & 26.19 \end{bmatrix}$$

so

$$S^{-1} = \frac{1}{103{,}623.12}\begin{bmatrix} 26.19 & -303.62 \\ -303.62 & 7476.45 \end{bmatrix} = \begin{bmatrix} .000253 & -.002930 \\ -.002930 & .072148 \end{bmatrix}$$

From Table 3 in the appendix, $\chi_2^2(.5) = 1.39$. Thus, any observation $x' = [x_1, x_2]$ satisfying

$$\begin{bmatrix} x_1 - 155.60 \\ x_2 - 14.70 \end{bmatrix}' \begin{bmatrix} .000253 & -.002930 \\ -.002930 & .072148 \end{bmatrix} \begin{bmatrix} x_1 - 155.60 \\ x_2 - 14.70 \end{bmatrix} \le 1.39$$

is on or inside the estimated 50% contour. Otherwise the observation is outside this contour. The first pair of observations in Exercise 1.4 is $[x_1, x_2]' = [108.28, 17.05]$. In this case the quadratic form equals 1.61 > 1.39, and this point falls outside the 50% contour. The remaining nine points have generalized distances from $\bar{x}$ of .30, .62, 1.16, 1.30, 1.64, 1.71, 1.79, 3.53, and 4.38, respectively. Since four of these distances are less than 1.39, a proportion, .40, of the data falls within the 50% contour. If the observations were normally distributed, we would expect about half, or 5, of them to be within this contour. Our sample size of 10 is too small to reach a firm conclusion from this difference in proportions. ■

Computing the fraction of the points within a contour and subjectively comparing it with the theoretical probability is a useful, but rather rough, procedure.
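The 50% contour check is easy to carry out numerically. The sketch below is mine, not the authors'; it is a minimal Python/NumPy/SciPy helper (the name `contour_proportion` is mine) that computes the generalized distances and the proportion of observations inside the estimated contour.

```python
import numpy as np
from scipy import stats

def contour_proportion(X, level=0.5):
    """Proportion of rows of X inside the estimated 100*level% normal contour."""
    X = np.asarray(X, dtype=float)
    dev = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    # squared distances (x_j - xbar)' S^{-1} (x_j - xbar)
    d2 = np.einsum('ij,jk,ik->i', dev, S_inv, dev)
    cutoff = stats.chi2.ppf(level, df=X.shape[1])   # e.g. chi2_2(.5) = 1.39
    return d2, np.mean(d2 <= cutoff)
```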
A somewhat more formal method for judging the joint normality of a data set is based on the squared generalized distances

$$d_j^2 = (x_j-\bar{x})'S^{-1}(x_j-\bar{x}), \qquad j = 1, 2, \ldots, n \tag{4-32}$$

where $x_1, x_2, \ldots, x_n$ are the sample observations. The procedure we are about to describe is not limited to the bivariate case; it can be used for all $p \ge 2$.

When the parent population is multivariate normal and both $n$ and $n - p$ are greater than 25 or 30, each of the squared distances $d_1^2, d_2^2, \ldots, d_n^2$ should behave like a chi-square random variable. [See Result 4.7 and Equations (4-26) and (4-27).] Although these distances are not independent or exactly chi-square distributed, it is helpful to plot them as if they were. The resulting plot is called a chi-square plot or gamma plot, because the chi-square distribution is a special case of the more general gamma distribution. (See [6].)

To construct the chi-square plot,
1. Order the squared distances in (4-32) from smallest to largest as $d_{(1)}^2 \le d_{(2)}^2 \le \cdots \le d_{(n)}^2$.
2. Graph the pairs $(q_{c,p}((j-\frac{1}{2})/n),\, d_{(j)}^2)$, where $q_{c,p}((j-\frac{1}{2})/n)$ is the $100(j-\frac{1}{2})/n$ quantile of the chi-square distribution with $p$ degrees of freedom.

Quantiles are specified in terms of proportions, whereas percentiles are specified in terms of percentages.

The plot should resemble a straight line through the origin having slope 1. A systematic curved pattern suggests lack of normality. One or two points far above the line indicate large distances, or outlying observations, that merit further attention.

Example 4.13 (Constructing a chi-square plot) Let us construct a chi-square plot of the generalized distances given in Example 4.12. The ordered distances and the corresponding chi-square percentiles for $p = 2$ and $n = 10$ are listed in the following table:

 j | d^2_(j) | q_{c,2}((j - 1/2)/10)
 1 |  .30 |  .10
 2 |  .62 |  .33
 3 | 1.16 |  .58
 4 | 1.30 |  .86
 5 | 1.61 | 1.20
 6 | 1.64 | 1.60
 7 | 1.71 | 2.10
 8 | 1.79 | 2.77
 9 | 3.53 | 3.79
10 | 4.38 | 5.99

A graph of the pairs $(q_{c,2}((j-\frac{1}{2})/10),\, d_{(j)}^2)$ is shown in Figure 4.7. The points in Figure 4.7 are reasonably straight. Given the small sample size, it is difficult to reject bivariate normality on the evidence in this graph. If further analysis of the data were required, it might be reasonable to transform them to observations more nearly bivariate normal. Appropriate transformations are discussed in Section 4.8. ■

[Figure 4.7 A chi-square plot of the ordered distances in Example 4.13.]

In addition to inspecting univariate plots and scatter plots, we should check multivariate normality by constructing a chi-square or $d^2$ plot. Figure 4.8 contains chi-square plots based on two computer-generated samples of 30 four-variate normal random vectors. As expected, the plots have a straight-line pattern, but the top two or three ordered squared distances are quite variable.

[Figure 4.8 Chi-square plots for two simulated four-variate normal data sets with n = 30.]
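A chi-square plot takes only a few lines of code. The sketch below is not part of the text; it is a minimal Python illustration using NumPy, SciPy, and matplotlib (the function name `chisquare_plot` is mine), implementing the two construction steps for the distances (4-32).

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

def chisquare_plot(X):
    """Chi-square (gamma) plot of the ordered squared distances (4-32)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    dev = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.sort(np.einsum('ij,jk,ik->i', dev, S_inv, dev))    # step 1
    q = stats.chi2.ppf((np.arange(1, n + 1) - 0.5) / n, df=p)  # step 2
    plt.scatter(q, d2)
    plt.axline((0, 0), slope=1)    # reference line through the origin
    plt.xlabel('chi-square quantile')
    plt.ylabel('ordered squared distance')
    plt.show()
```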
The next example contains a real data set comparable to the simulated data sets that produced the plots in Figure 4.8.

Example 4.14 (Evaluating multivariate normality for a four-variable data set) The data in Table 4.3 were obtained by taking four different measures of stiffness, $x_1, x_2, x_3$, and $x_4$, of each of $n = 30$ boards. The first measurement involves sending a shock wave down the board, the second measurement is determined while vibrating the board, and the last two measurements are obtained from static tests. The squared distances $d_j^2 = (x_j-\bar{x})'S^{-1}(x_j-\bar{x})$ are also presented in the table.

Table 4.3 Four Measurements of Stiffness

Observation no. | x1 | x2 | x3 | x4 | d^2
 1 | 1889 | 1651 | 1561 | 1778 |   .60
 2 | 2403 | 2048 | 2087 | 2197 |  5.48
 3 | 2119 | 1700 | 1815 | 2222 |  7.62
 4 | 1645 | 1627 | 1110 | 1533 |  5.21
 5 | 1976 | 1916 | 1614 | 1883 |  1.40
 6 | 1712 | 1712 | 1439 | 1546 |  2.22
 7 | 1943 | 1685 | 1271 | 1671 |  4.99
 8 | 2104 | 1820 | 1717 | 1874 |  1.49
 9 | 2983 | 2794 | 2412 | 2581 | 12.26
10 | 1745 | 1600 | 1384 | 1508 |   .77
11 | 1710 | 1591 | 1518 | 1667 |  1.93
12 | 2046 | 1907 | 1627 | 1898 |   .46
13 | 1840 | 1841 | 1595 | 1741 |  2.70
14 | 1867 | 1685 | 1493 | 1678 |   .13
15 | 1859 | 1649 | 1389 | 1714 |  1.08
16 | 1954 | 2149 | 1180 | 1281 | 16.85
17 | 1325 | 1170 | 1002 | 1176 |  3.50
18 | 1419 | 1371 | 1252 | 1308 |  3.99
19 | 1828 | 1634 | 1602 | 1755 |  1.36
20 | 1725 | 1594 | 1313 | 1646 |  1.46
21 | 2276 | 2189 | 1547 | 2111 |  9.90
22 | 1899 | 1614 | 1422 | 1477 |  5.06
23 | 1633 | 1513 | 1290 | 1516 |   .80
24 | 2061 | 1867 | 1646 | 2037 |  2.54
25 | 1856 | 1493 | 1356 | 1533 |  4.58
26 | 1727 | 1412 | 1238 | 1469 |  3.40
27 | 2168 | 1896 | 1701 | 1834 |  2.38
28 | 1655 | 1675 | 1414 | 1597 |  3.00
29 | 2326 | 2301 | 2065 | 2234 |  6.28
30 | 1490 | 1382 | 1214 | 1284 |  2.58

Source: Data courtesy of William Galligan.

The marginal distributions appear quite normal (see Exercise 4.33), with the possible exception of specimen (board) 9. To further evaluate multivariate normality, we constructed the chi-square plot shown in Figure 4.9. The plot is reasonably straight, except for the two specimens with the largest squared distances, which are clearly removed from the straight-line pattern. Together, with the next largest point or two, they make the plot appear curved at the upper end. We will return to a discussion of this plot in Example 4.15. ■

[Figure 4.9 A chi-square plot for the data in Example 4.14.]

We close this section by noting that all measures of goodness of fit suffer the same serious drawback. When the sample size is small, only the most aberrant behavior will be identified as lack of fit. On the other hand, very large samples invariably produce statistically significant lack of fit. Yet the departure from the specified distribution may be very small and technically unimportant to the inferential conclusions. We have discussed some rather simple techniques for checking the multivariate normality assumption, and we advocate calculating the $d_j^2$ [see Equation (4-32)] and comparing the results with chi-square quantiles. For example, $p$-variate normality is indicated if

1. Roughly half of the $d_j^2$ are less than or equal to $q_{c,p}(.50)$; and
2. A plot of the ordered squared distances versus the quantiles $q_{c,p}((j-\frac{1}{2})/n)$ is nearly a straight line having slope 1 that passes through the origin.

(See [6] for a more complete exposition of methods for assessing normality.)

4.7 Detecting Outliers and Cleaning Data

Most data sets contain one or a few unusual observations that do not seem to belong to the pattern of variability produced by the other observations. With data on a single characteristic, unusual observations are those that are either very large or very small relative to the others. The situation can be more complicated with multivariate data. Before we address the issue of identifying these outliers, we must emphasize that not all outliers are wrong numbers. They may, justifiably,
be part of the group and may lead to a better understanding of the phenomena being studied.

Outliers are best detected visually whenever this is possible. When the number of observations $n$ is large, dot plots are not feasible. When the number of characteristics $p$ is large, the large number of scatter plots $p(p-1)/2$ may prevent viewing them all. Even so, we suggest first visually inspecting the data whenever possible.

What should we look for? For a single random variable, the problem is one dimensional, and we look for observations that are far from the others. Figure 4.10 shows a situation with two unusual observations, one univariate and one bivariate. The data point circled in the upper right corner of the figure is detached from the pattern, and its second coordinate is large relative to the rest of the $x_2$ measurements, as shown by the vertical dot diagram. The second outlier, also circled, is far from the elliptical pattern of the rest of the points; but, separately, each of its components has a typical value. This outlier cannot be detected by inspecting the marginal dot diagrams.

[Figure 4.10 Two outliers; one univariate and one bivariate.]

In higher dimensions, there can be outliers that cannot be detected from the univariate plots or even the bivariate scatter plots. Here a large value of $(x_j-\bar{x})'S^{-1}(x_j-\bar{x})$ will suggest an unusual observation, even though it cannot be seen visually.

Steps for Detecting Outliers
1. Make a dot plot for each variable.
2. Make a scatter plot for each pair of variables.
3. Calculate the standardized values $z_{jk} = (x_{jk}-\bar{x}_k)/\sqrt{s_{kk}}$ for $j = 1, 2, \ldots, n$ and each column $k = 1, 2, \ldots, p$. Examine these standardized values for large or small values.
4. Calculate the generalized squared distances $(x_j-\bar{x})'S^{-1}(x_j-\bar{x})$. Examine these distances for unusually large values. In a chi-square plot, these would be the points farthest from the origin.

In step 3, "large" must be interpreted relative to the sample size and number of variables. There are $n \times p$ standardized values. When $n = 100$ and $p = 5$, there are 500 values. You expect 1 or 2 of these to exceed 3 or be less than $-3$, even if the data came from a multivariate distribution that is exactly normal. As a guideline, 3.5 might be considered large for moderate sample sizes. (A code sketch implementing steps 3 and 4 appears after this discussion.)

In step 4, "large" is measured by an appropriate percentile of the chi-square distribution with $p$ degrees of freedom. If the sample size is $n = 100$, we would expect 5 observations to have values of $d_j^2$ that exceed the upper fifth percentile of the chi-square distribution, even if the data were multivariate normal. A more extreme percentile must serve to determine observations that do not fit the pattern of the remaining data.

The data we presented in Table 4.3 concerning lumber have already been cleaned up somewhat. Similar data sets from the same study also contained data on $x_5$ = tensile strength. Nine observation vectors, out of the total of 112, are given as rows in the following table, along with their standardized values.

  x1  |  x2  |  x3  |  x4  |  x5
 1631 | 1528 | 1452 | 1559 | 1602
 1770 | 1677 | 1707 | 1738 | 1785
 1376 | 1190 |  723 | 1285 | 2791
 1705 | 1577 | 1332 | 1703 | 1582
 1643 | 1535 | 1510 | 1494 | 1553
 1567 | 1510 | 1301 | 1405 | 1698
 1528 | 1591 | 1714 | 1685 | 1764
 1803 | 1826 | 1748 | 2746 | 1551
 1587 | 1554 | 1352 | 1554 | (illegible in the source)

(The accompanying standardized values $z_1, \ldots, z_5$, which are unrecoverable from the source except for the two flagged entries, are omitted here; the entries $x_3 = 723$, $x_4 = 2746$, and $x_5 = 2791$ are the ones singled out below.) The standardized values are based on the sample mean and variance, calculated from all 112 observations.
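Steps 3 and 4 are simple to implement. The following sketch is not from the text; it is a small Python/NumPy helper (the name `outlier_screens` is mine) that returns the standardized values and the generalized squared distances so that both screens can be applied at once.

```python
import numpy as np

def outlier_screens(X):
    """Standardized values (step 3) and squared distances (step 4)."""
    X = np.asarray(X, dtype=float)
    z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # z_jk = (x_jk - xbar_k)/sqrt(s_kk)
    dev = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.einsum('ij,jk,ik->i', dev, S_inv, dev)
    return z, d2
```

In practice one would then flag entries with |z| above roughly 3.5 and distances beyond an extreme chi-square percentile, for example `scipy.stats.chi2.ppf(0.995, p)`.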
There are two extreme standardized values; both are too large, with standardized values over 4. During their investigation, the researchers recorded measurements by hand in a logbook and then performed calculations that produced the values given in the table. When they checked their records regarding the values pinpointed by this analysis, errors were discovered. The value $x_5 = 2791$ was corrected to 1241, and $x_4 = 2746$ was corrected to 1670. Incorrect readings on an individual variable are quickly detected by locating a large leading digit for the standardized value.

The next example returns to the data on lumber.

Example 4.15 (Detecting outliers in the data on lumber) Table 4.4 contains the data in Table 4.3, along with the standardized observations. These data consist of four different measures of stiffness $x_1, x_2, x_3$, and $x_4$, on each of $n = 30$ boards. Recall that the first measurement involves sending a shock wave down the board, the second measurement is determined while vibrating the board, and the last two measurements are obtained from static tests. The standardized measurements are

$$z_{jk} = \frac{x_{jk}-\bar{x}_k}{\sqrt{s_{kk}}}, \qquad j = 1, 2, \ldots, 30; \; k = 1, 2, 3, 4$$

and the squares of the distances are $d_j^2 = (x_j-\bar{x})'S^{-1}(x_j-\bar{x})$.

[Table 4.4, Four Measurements of Stiffness with Standardized Values, repeats the 30 observations of Table 4.3 together with the standardized value $z_{jk}$ for each measurement and the squared distances $d^2$; the individual standardized values are not reproduced here.]

The last column of the table reveals that specimen 16 is a multivariate outlier, since $\chi_4^2(.005) = 14.86$ and $d_{16}^2 = 16.85$; yet all of its individual measurements are well within their respective univariate scatters. Specimen 9 also has a large $d^2$ value ($d_9^2 = 12.26$).

The two specimens (9 and 16) with large squared distances stand out as clearly removed from the rest of the pattern in Figure 4.9. Once these two points are removed, the remaining pattern conforms to the expected straight-line relation. Scatter plots for the lumber stiffness measurements are given in Figure 4.11.

[Figure 4.11 Scatter plots for the lumber stiffness data with specimens 9 and 16 plotted as solid dots.]

The solid dots in these figures correspond to specimens 9 and 16. Although the dot for specimen 16 stands out in all the plots, the dot for specimen 9 is "hidden" in the scatter plot of $x_3$ versus $x_4$ and nearly hidden in that of $x_1$ versus $x_3$. However, specimen 9 is clearly identified as a multivariate outlier when all four variables are considered. Scientists specializing in the properties of wood conjectured that specimen 9 was unusually clear and therefore very stiff and strong. It would also appear that specimen 16 is a bit unusual, since both of its dynamic measurements are above average and the two static measurements are low. Unfortunately, it was not possible to investigate this specimen further because the material was no longer available. ■
If outliers are identified, they should be examined for content, as was done in the case of the data on lumber stiffness in Example 4.15. Depending upon the nature of the outliers and the objectives of the investigation, outliers may be deleted or appropriately "weighted" in a subsequent analysis. Even though many statistical techniques assume normal populations, those based on the sample mean vectors usually will not be disturbed by a few moderate outliers. Hawkins [7] gives an extensive treatment of the subject of outliers.

4.8 Transformations to Near Normality

If normality is not a viable assumption, what is the next step? One alternative is to ignore the findings of a normality check and proceed as if the data were normally distributed. This practice is not recommended, since, in many instances, it could lead to incorrect conclusions. A second alternative is to make nonnormal data more "normal looking" by considering transformations of the data. Normal-theory analyses can then be carried out with the suitably transformed data.

Transformations are nothing more than a reexpression of the data in different units. For example, when a histogram of positive observations exhibits a long right-hand tail, transforming the observations by taking their logarithms or square roots will often markedly improve the symmetry about the mean and the approximation to a normal distribution. It frequently happens that the new units provide more natural expressions of the characteristics being studied.

Appropriate transformations are suggested by (1) theoretical considerations or (2) the data themselves (or both). It has been shown theoretically that data that are counts can often be made more normal by taking their square roots. Similarly, the logit transformation applied to proportions and Fisher's z-transformation applied to correlation coefficients yield quantities that are approximately normally distributed.

Helpful Transformations to Near Normality

Original Scale | Transformed Scale
1. Counts, $y$ | $\sqrt{y}$
2. Proportions, $\hat{p}$ | $\mathrm{logit}(\hat{p}) = \frac{1}{2}\log\!\left(\dfrac{\hat{p}}{1-\hat{p}}\right)$
3. Correlations, $r$ | Fisher's $z(r) = \frac{1}{2}\log\!\left(\dfrac{1+r}{1-r}\right)$    (4-33)

In many instances, the choice of a transformation to improve the approximation to normality is not obvious. For such cases, it is convenient to let the data suggest a transformation. A useful family of transformations for this purpose is the family of power transformations.

Power transformations are defined only for positive variables. However, this is not as restrictive as it seems, because a single constant can be added to each observation in the data set if some of the values are negative.

Let $x$ represent an arbitrary observation. The power family of transformations is indexed by a parameter $\lambda$. A given value for $\lambda$ implies a particular transformation. For example, consider $x^{\lambda}$ with $\lambda = -1$. Since $x^{-1} = 1/x$, this choice of $\lambda$ corresponds to the reciprocal transformation. We can trace the family of transformations as $\lambda$ ranges from negative to positive powers of $x$. For $\lambda = 0$, we define $x^{0} = \ln x$. A sequence of possible transformations is

$$\ldots, \; x^{-1} = \frac{1}{x}, \; x^{0} = \ln x, \; x^{1/4} = \sqrt[4]{x}, \; x^{1/2} = \sqrt{x}, \; x^{2}, \; x^{3}, \ldots$$

where the transformations to the left of $\ln x$ shrink large values of $x$ and those to the right increase large values of $x$. To select a power transformation, an investigator looks at the marginal dot diagram or histogram and decides whether large values have to be "pulled in" or "pushed out" to improve the symmetry about the mean. Trial-and-error calculations with a few of the foregoing transformations should produce an improvement. The final choice should always be examined by a Q-Q plot or other checks to see whether the tentative normal assumption is satisfactory.

The transformations we have been discussing are data based in the sense that it is only the appearance of the data themselves that influences the choice of an appropriate transformation. There are no external considerations involved, although the transformation actually used is often determined by some mix of information supplied by the data and extra-data factors, such as simplicity or ease of interpretation.

A convenient analytical method is available for choosing a power transformation. We begin by focusing our attention on the univariate case. Box and Cox [3] consider the slightly modified family of power transformations

$$x^{(\lambda)} = \begin{cases} \dfrac{x^{\lambda}-1}{\lambda} & \lambda \ne 0 \\[1ex] \ln x & \lambda = 0 \end{cases} \tag{4-34}$$

which is continuous in $\lambda$ for $x > 0$. (See [8].) Given the observations $x_1, x_2, \ldots, x_n$, the Box-Cox solution for the choice of an appropriate power $\lambda$ is the solution that maximizes the expression

$$\ell(\lambda) = -\frac{n}{2}\ln\left[\frac{1}{n}\sum_{j=1}^{n}\left(x_j^{(\lambda)} - \overline{x^{(\lambda)}}\right)^2\right] + (\lambda-1)\sum_{j=1}^{n}\ln x_j \tag{4-35}$$

We note that $x_j^{(\lambda)}$ is defined in (4-34) and

$$\overline{x^{(\lambda)}} = \frac{1}{n}\sum_{j=1}^{n} x_j^{(\lambda)} = \frac{1}{n}\sum_{j=1}^{n}\left(\frac{x_j^{\lambda}-1}{\lambda}\right) \tag{4-36}$$

is the arithmetic average of the transformed observations. The first term in (4-35) is, apart from a constant, the logarithm of a normal likelihood function, after maximizing it with respect to the population mean and variance parameters.

The calculation of $\ell(\lambda)$ for many values of $\lambda$ is an easy task for a computer. It is helpful to have a graph of $\ell(\lambda)$ versus $\lambda$, as well as a tabular display of the pairs $(\lambda, \ell(\lambda))$, in order to study the behavior near the maximizing value $\hat{\lambda}$. For instance, if either $\lambda = 0$ (logarithm) or $\lambda = \frac{1}{2}$ (square root) is near $\hat{\lambda}$, one of these may be preferred because of its simplicity.

Rather than program the calculation of (4-35), some statisticians recommend the equivalent procedure of fixing $\lambda$, creating the new variable

$$y_j^{(\lambda)} = \frac{x_j^{\lambda}-1}{\lambda\left[\left(\prod_{i=1}^{n} x_i\right)^{1/n}\right]^{\lambda-1}}, \qquad j = 1, \ldots, n \tag{4-37}$$

and then calculating the sample variance. The minimum of the variance occurs at the same $\lambda$ that maximizes (4-35).

Comment. It is now understood that the transformation obtained by maximizing $\ell(\lambda)$ usually improves the approximation to normality. However, there is no guarantee that even the best choice of $\lambda$ will produce a transformed set of values that adequately conform to a normal distribution. The outcomes produced by a transformation selected according to (4-35) should always be carefully examined for possible violations of the tentative assumption of normality. This warning applies with equal force to transformations selected by any other technique.
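Maximizing (4-35) over a grid of $\lambda$ values is simple in practice. The sketch below is mine, not the authors'; it is a minimal Python/NumPy evaluation of $\ell(\lambda)$ (the function name `box_cox_ell` is mine). SciPy users could instead call `scipy.stats.boxcox`, which maximizes an equivalent log-likelihood.

```python
import numpy as np

def box_cox_ell(x, lam):
    """Evaluate the Box-Cox criterion ell(lambda) of (4-35) for positive data x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # x^(lambda) as defined in (4-34); treat lambda near 0 as the log case
    xl = np.log(x) if abs(lam) < 1e-12 else (x**lam - 1.0) / lam
    return -(n / 2) * np.log(np.mean((xl - xl.mean())**2)) + (lam - 1) * np.log(x).sum()

lams = np.arange(-1.0, 1.51, 0.01)
# Hypothetical usage, assuming the door-closed radiation data are stored in `x`:
# ell = np.array([box_cox_ell(x, l) for l in lams])
# print(lams[ell.argmax()])   # about .28 for the data of Example 4.16
```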
Example 4.16 (Determining a power transformation for univariate data) We gave readings of the microwave radiation emitted through the closed doors of $n = 42$ ovens in Example 4.10. The Q-Q plot of these data in Figure 4.6 indicates that the observations deviate from what would be expected if they were normally distributed. Since all the observations are positive, let us perform a power transformation of the data which, we hope, will produce results that are more nearly normal. Restricting our attention to the family of transformations in (4-34), we must find that value of $\lambda$ maximizing the function $\ell(\lambda)$ in (4-35).

The pairs $(\lambda, \ell(\lambda))$ are listed in the following table for several values of $\lambda$:

lambda | ell(lambda) || lambda | ell(lambda)
-1.00 |  70.52 ||  .40 | 106.20
 -.90 |  75.65 ||  .50 | 105.50
 -.80 |  80.46 ||  .60 | 104.43
 -.70 |  84.94 ||  .70 | 103.03
 -.60 |  89.06 ||  .80 | 101.33
 -.50 |  92.79 ||  .90 |  99.34
 -.40 |  96.10 || 1.00 |  97.10
 -.30 |  98.97 || 1.10 |  94.64
 -.20 | 101.39 || 1.20 |  91.96
 -.10 | 103.35 || 1.30 |  89.10
  .00 | 104.83 || 1.40 |  86.07
  .10 | 105.84 || 1.50 |  82.88
  .20 | 106.39 ||
  .30 | 106.51 ||

The curve of $\ell(\lambda)$ versus $\lambda$ that allows the more exact determination $\hat{\lambda} = .28$ is shown in Figure 4.12.

[Figure 4.12 Plot of ell(lambda) versus lambda for the radiation data (door closed).]

It is evident from both the table and the plot that a value of $\hat{\lambda}$ around .30 maximizes $\ell(\lambda)$. For convenience, we choose $\hat{\lambda} = .25$. The data $x_j$ were reexpressed as

$$x_j^{(1/4)} = \frac{x_j^{1/4}-1}{\tfrac{1}{4}}, \qquad j = 1, 2, \ldots, 42$$

and a Q-Q plot was constructed from the transformed quantities. This plot is shown in Figure 4.13. The quantile pairs fall very close to a straight line, and we would conclude from this evidence that the $x_j^{(1/4)}$ are approximately normal. ■

[Figure 4.13 A Q-Q plot of the transformed radiation data (door closed). (The integers in the plot indicate the number of points occupying the same location.)]
"p A.50 The procedure just described is equivalent to making each marginal distribution approximately normal.00 n = 2"InIS(A)1 1. n where Here Xlk> X2b"" Xnk are the n observations on the kth variable. .. 2. JP 2.14 on page 199 shows QQ plots of the untransformed and transformed dooropen radiation data.. . .J. n (xAi l '" X(Ak) = _ '" _1_ _ £.. .0 1.1) j=! n L n Inxj2 + . .. in practical applications this may be good enough.) x(l) = 1 XAp  _I_P_ _ 1 Ap where AI. the plot indicate the number of pomts occupymg the same (The mtegers III location.0 3. Although normal marginals are not sufficient to ensure that the joint distribution is normal. The approximate maximizing value was A= . "2.) ..1)/Ak ..J A (A. j = 1. . 2. . Example 4. see [8]. a power transformation for these data was selected by maximizing £(A) in (435). In accordance with the procedure outlined in Example 4. + (A p . p.30.0 qljl flgu..0 1..17 (Determining power transformations for bivariate data) Radiation measurements were also recorded through the open doors of the n = 42 microwave ovens introduced in Example 4. Ap obtained from the preceding transformations and iterate toward the set of values A' = (A'I. are the values that individually maximize (438).16. j = 1.10. and Ak' The latter likelihood is generated by pretending there is some Ak for which the observations .. ... re 4 13 A QQ plot of the transformed data (d?or closed).J Ik £. The jth transformed mulis the anthmetlc averag tivariate observation is Maximizing (440) not only is substantially more difficult than maximizing the individual expressions in (438).196 Chapter 4 The Multivariate Normal Distribution X (114) Transformations to Near Normality 197 (jI . . which collectively maximizes 1. . . but also is unlikely to yield remarkably better results. The amount of radiation emitted through the open doors of these ovens is listed in Table 4.50 + (A] 1) j=1 L Jl Inxjl + (A2 . we could start with the values AI. k = 1.00 2. respectively. If not. . (These data were actually . A2. See [3] and [2] for detailed discussions of the univariate and multivariate cases. .. .0 ..5. n have a normal distribution. A 2 .1) '" In X· £..2. e of the transformed observations.00 j=! (440) where SeA) is the sample covariance matrix computed from 3. whereas the method based on (438) corresponds to maximizing the kth univariate likelihood over JLb akk.) _ Xk  1" n j=l 1) (439) n j=l k . 1: and A. The selection method based on Equation (440) is equivalent to maximizing a muItivariate likelihood over ft. Figure 4.0 2. (Also.' . Ap].
Example 4.17 (Determining power transformations for bivariate data) Radiation measurements were also recorded through the open doors of the $n = 42$ microwave ovens introduced in Example 4.10. The amount of radiation emitted through the open doors of these ovens is listed in Table 4.5.

Table 4.5 Radiation Data (Door Open)

Oven no. | Radiation | Oven no. | Radiation | Oven no. | Radiation
 1 | .30 | 16 | .20 | 31 | .10
 2 | .09 | 17 | .04 | 32 | .10
 3 | .30 | 18 | .10 | 33 | .10
 4 | .10 | 19 | .01 | 34 | .30
 5 | .10 | 20 | .60 | 35 | .12
 6 | .12 | 21 | .12 | 36 | .25
 7 | .09 | 22 | .10 | 37 | .20
 8 | .10 | 23 | .05 | 38 | .40
 9 | .09 | 24 | .05 | 39 | .33
10 | .10 | 25 | .15 | 40 | .32
11 | .07 | 26 | .30 | 41 | .12
12 | .05 | 27 | .15 | 42 | .12
13 | .01 | 28 | .09 |    |
14 | .45 | 29 | .09 |    |
15 | .12 | 30 | .28 |    |

Source: Data courtesy of J. D. Cryer.

In accordance with the procedure outlined in Example 4.16, a power transformation for these data was selected by maximizing $\ell(\lambda)$ in (4-35). The approximate maximizing value was $\hat{\lambda} = .30$. Figure 4.14 shows Q-Q plots of the untransformed and transformed door-open radiation data. (These data were actually transformed by taking the fourth root, as in Example 4.16.) It is clear from the figure that the transformed data are more nearly normal, although the normal approximation is not as good as it was for the door-closed data.

[Figure 4.14 Q-Q plots of (a) the original and (b) the transformed radiation data (with door open). (The integers in the plot indicate the number of points occupying the same location.)]

Let us denote the door-closed data by $x_{11}, x_{21}, \ldots, x_{42,1}$ and the door-open data by $x_{12}, x_{22}, \ldots, x_{42,2}$. Choosing a power transformation for each set by maximizing the expression in (4-35) is equivalent to maximizing $\ell_k(\lambda)$ in (4-38) with $k = 1, 2$. Thus, using the outcomes from Example 4.16 and the foregoing results, we have $\hat{\lambda}_1 = .30$ and $\hat{\lambda}_2 = .30$. These powers were determined for the marginal distributions of $x_1$ and $x_2$.

We can consider the joint distribution of $x_1$ and $x_2$ and simultaneously determine the pair of powers $(\lambda_1, \lambda_2)$ that makes this joint distribution approximately bivariate normal. To do this, we must maximize $\ell(\lambda_1, \lambda_2)$ in (4-40) with respect to both $\lambda_1$ and $\lambda_2$. We computed $\ell(\lambda_1, \lambda_2)$ for a grid of $\lambda_1, \lambda_2$ values covering $0 \le \lambda_1 \le .50$ and $0 \le \lambda_2 \le .50$, and we constructed the contour plot shown in Figure 4.15. We see that the maximum occurs at about $(\hat{\lambda}_1, \hat{\lambda}_2) = (.16, .16)$. The "best" power transformations for this bivariate case do not differ substantially from those obtained by considering each marginal distribution. ■

As we saw in Example 4.17, making each marginal distribution approximately normal is roughly equivalent to addressing the bivariate distribution directly and making it approximately normal. It is generally easier to select appropriate transformations for the marginal distributions than for the joint distributions.
If the data include some large negative values and have a single long tail, a more general transformation (see Yeo and Johnson [14]) should be applied.

$$x^{(\lambda)} = \begin{cases} \{(x+1)^{\lambda}-1\}/\lambda & x \ge 0, \; \lambda \ne 0 \\ \ln(x+1) & x \ge 0, \; \lambda = 0 \\ -\{(-x+1)^{2-\lambda}-1\}/(2-\lambda) & x < 0, \; \lambda \ne 2 \\ -\ln(-x+1) & x < 0, \; \lambda = 2 \end{cases}$$

[Figure 4.15 Contour plot of ell(lambda1, lambda2) for the radiation data.]

Exercises

4.1. Consider a bivariate normal distribution with $\mu_1 = 1$, $\mu_2 = 3$, $\sigma_{11} = 2$, $\sigma_{22} = 1$, and $\rho_{12} = -.8$.
(a) Write out the bivariate normal density.
(b) Write out the squared statistical distance expression $(x-\mu)'\Sigma^{-1}(x-\mu)$ as a quadratic function of $x_1$ and $x_2$.

4.2. Consider a bivariate normal population with $\mu_1 = 0$, $\mu_2 = 2$, $\sigma_{11} = 2$, $\sigma_{22} = 1$, and $\rho_{12} = .5$.
(a) Write out the bivariate normal density.
(b) Write out the squared generalized distance expression $(x-\mu)'\Sigma^{-1}(x-\mu)$ as a function of $x_1$ and $x_2$.
(c) Determine (and sketch) the constant-density contour that contains 50% of the probability.

4.3. Let $X$ be $N_3(\mu, \Sigma)$ with $\mu' = [-3, 1, 4]$ and

$$\Sigma = \begin{bmatrix} 1 & -2 & 0 \\ -2 & 5 & 0 \\ 0 & 0 & 2 \end{bmatrix}$$

Which of the following random variables are independent? Explain.
(a) $X_1$ and $X_2$  (b) $X_2$ and $X_3$  (c) $(X_1, X_2)$ and $X_3$  (d) $(X_1+X_2)/2$ and $X_3$  (e) $X_2$ and $X_2 - \frac{5}{2}X_1 - X_3$

4.4. Let $X$ be $N_3(\mu, \Sigma)$ with $\mu' = [2, -3, 1]$ and

$$\Sigma = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 3 & 2 \\ 1 & 2 & 2 \end{bmatrix}$$

(a) Find the distribution of $3X_1 - 2X_2 + X_3$.
(b) Relabel the variables if necessary, and find a $2 \times 1$ vector $a$ such that $X_2$ and $X_2 - a'\begin{bmatrix} X_1 \\ X_3 \end{bmatrix}$ are independent.

4.5. Specify each of the following.
(a) The conditional distribution of $X_1$, given that $X_2 = x_2$ for the joint distribution in Exercise 4.2.
(b) The conditional distribution of $X_2$, given that $X_1 = x_1$ and $X_3 = x_3$ for the joint distribution in Exercise 4.3.
(c) The conditional distribution of $X_3$, given that $X_1 = x_1$ and $X_2 = x_2$ for the joint distribution in Exercise 4.4.

4.6. Let $X$ be distributed as $N_3(\mu, \Sigma)$, where $\mu' = [1, -1, 2]$ and

$$\Sigma = \begin{bmatrix} 4 & 0 & -1 \\ 0 & 5 & 0 \\ -1 & 0 & 2 \end{bmatrix}$$

Which of the following random variables are independent? Explain.
(a) $X_1$ and $X_2$  (b) $X_1$ and $X_3$  (c) $X_2$ and $X_3$  (d) $(X_1, X_3)$ and $X_2$  (e) $X_1$ and $X_1 + 3X_2 - 2X_3$

4.7. Refer to Exercise 4.6 and specify each of the following.
(a) The conditional distribution of $X_1$, given that $X_3 = x_3$.
(b) The conditional distribution of $X_1$, given that $X_2 = x_2$ and $X_3 = x_3$.

4.8. (Example of a nonnormal bivariate distribution with normal marginals.) Let $X_1$ be $N(0, 1)$, and let

$$X_2 = \begin{cases} -X_1 & \text{if } -1 \le X_1 \le 1 \\ X_1 & \text{otherwise} \end{cases}$$

Show each of the following.
(a) $X_2$ also has an $N(0, 1)$ distribution.
(b) $X_1$ and $X_2$ do not have a bivariate normal distribution.
Hint: (a) Since $X_1$ is $N(0,1)$, $P[-1 < X_1 \le x] = P[-x \le X_1 < 1]$ for any $x$. When $-1 < x_2 < 1$, $P[X_2 \le x_2] = P[X_2 \le -1] + P[-1 < X_2 \le x_2] = P[X_1 \le -1] + P[-1 < -X_1 \le x_2] = P[X_1 \le -1] + P[-x_2 \le X_1 < 1]$. But $P[-x_2 \le X_1 < 1] = P[-1 < X_1 \le x_2]$ from the symmetry argument in the first line of this hint. Thus, $P[X_2 \le x_2] = P[X_1 \le -1] + P[-1 < X_1 \le x_2] = P[X_1 \le x_2]$, which is a standard normal probability. (b) Consider the linear combination $X_1 - X_2$, which equals zero with probability $P[|X_1| > 1] = .3174$; a bivariate normal pair cannot have a nondegenerate linear combination with an atom at zero.

4.9. Refer to Exercise 4.8, but modify the construction by replacing the break point 1 by $c$ so that

$$X_2 = \begin{cases} -X_1 & \text{if } -c \le X_1 \le c \\ X_1 & \text{elsewhere} \end{cases}$$

Show that $c$ can be chosen so that $\mathrm{Cov}(X_1, X_2) = 0$, but that the two random variables are not independent.
Hint: For $c = 0$, evaluate $\mathrm{Cov}(X_1, X_2) = E[X_1(X_1)]$. For $c$ very large, evaluate $\mathrm{Cov}(X_1, X_2) = E[X_1(-X_1)]$; by continuity, some intermediate $c$ gives covariance zero.

4.10. Show each of the following.

(a) $\begin{vmatrix} A & 0 \\ 0' & B \end{vmatrix} = |A|\,|B|$  (b) $\begin{vmatrix} A & C \\ 0' & B \end{vmatrix} = |A|\,|B|$ for $|A| \ne 0$

Hint: (a) Write the determinant as the product of the determinants of $\begin{bmatrix} A & 0 \\ 0' & I \end{bmatrix}$ and $\begin{bmatrix} I & 0 \\ 0' & B \end{bmatrix}$. Expanding the determinant of the second matrix by the first row (see Definition 2A.24) gives 1 times a determinant of the same form, with the order of $I$ reduced by one; this procedure is repeated until $1 \times |B|$ is obtained. Similarly, expanding the determinant of the first matrix by the last row gives $|A|$. (b) Factor as in (a) using $\begin{bmatrix} I & A^{-1}C \\ 0' & I \end{bmatrix}$, whose determinant is 1.

4.11. Show that, if $A$ is square,

$$|A| = |A_{22}|\,|A_{11} - A_{12}A_{22}^{-1}A_{21}| \quad \text{for } |A_{22}| \ne 0$$
$$|A| = |A_{11}|\,|A_{22} - A_{21}A_{11}^{-1}A_{12}| \quad \text{for } |A_{11}| \ne 0$$

Hint: Partition $A$ and verify that

$$\begin{bmatrix} I & -A_{12}A_{22}^{-1} \\ 0' & I \end{bmatrix}\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\begin{bmatrix} I & 0 \\ -A_{22}^{-1}A_{21} & I \end{bmatrix} = \begin{bmatrix} A_{11} - A_{12}A_{22}^{-1}A_{21} & 0 \\ 0' & A_{22} \end{bmatrix}$$

Take determinants on both sides of this equality. Use Exercise 4.10 for the first and third determinants on the left and for the determinant on the right.

4.12. Show that, for $A$ symmetric,

$$A^{-1} = \begin{bmatrix} I & 0 \\ -A_{22}^{-1}A_{21} & I \end{bmatrix}\begin{bmatrix} (A_{11} - A_{12}A_{22}^{-1}A_{21})^{-1} & 0 \\ 0' & A_{22}^{-1} \end{bmatrix}\begin{bmatrix} I & -A_{12}A_{22}^{-1} \\ 0' & I \end{bmatrix}$$

Thus, $(A_{11} - A_{12}A_{22}^{-1}A_{21})^{-1}$ is the upper left-hand block of $A^{-1}$.
Hint: Premultiply the expression in the hint to Exercise 4.11 by $\begin{bmatrix} I & -A_{12}A_{22}^{-1} \\ 0' & I \end{bmatrix}^{-1}$ and postmultiply by $\begin{bmatrix} I & 0 \\ -A_{22}^{-1}A_{21} & I \end{bmatrix}^{-1}$. Take inverses of the resulting expression.

4.13. Show the following if $X$ is $N_p(\mu, \Sigma)$ with $|\Sigma| \ne 0$.
(a) Check that $|\Sigma| = |\Sigma_{22}|\,|\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}|$. (Note that $|\Sigma|$ can be factored into the product of contributions from the marginal and conditional distributions.)
(b) Check that

$$(x-\mu)'\Sigma^{-1}(x-\mu) = [x_1 - \mu_1 - \Sigma_{12}\Sigma_{22}^{-1}(x_2-\mu_2)]'(\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})^{-1}[x_1 - \mu_1 - \Sigma_{12}\Sigma_{22}^{-1}(x_2-\mu_2)] + (x_2-\mu_2)'\Sigma_{22}^{-1}(x_2-\mu_2)$$

(Thus, the joint density exponent can be written as the sum of two terms corresponding to contributions from the conditional and marginal distributions.)
(c) Given the results in Parts a and b, identify the marginal distribution of $X_2$ and the conditional distribution of $X_1$ given $X_2 = x_2$.
Hint: (a) Apply Exercise 4.11. (b) Note from Exercise 4.12 that we can write $(x-\mu)'\Sigma^{-1}(x-\mu)$ as a product involving the three block matrices of that exercise; if we group the product so that $\begin{bmatrix} I & -\Sigma_{12}\Sigma_{22}^{-1} \end{bmatrix}\begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{bmatrix} = x_1 - \mu_1 - \Sigma_{12}\Sigma_{22}^{-1}(x_2-\mu_2)$, the result follows.

4.14. If $X$ is distributed as $N_p(\mu, \Sigma)$ with $|\Sigma| \ne 0$, show that the joint density can be written as the product of marginal densities for

$$X_1 \; (q \times 1) \quad \text{and} \quad X_2 \; ((p-q) \times 1) \quad \text{if } \Sigma_{12} = 0 \; (q \times (p-q))$$

Hint: Show by block multiplication that $\begin{bmatrix} \Sigma_{11}^{-1} & 0 \\ 0' & \Sigma_{22}^{-1} \end{bmatrix}$ is the inverse of $\Sigma = \begin{bmatrix} \Sigma_{11} & 0 \\ 0' & \Sigma_{22} \end{bmatrix}$. Then write

$$(x-\mu)'\Sigma^{-1}(x-\mu) = (x_1-\mu_1)'\Sigma_{11}^{-1}(x_1-\mu_1) + (x_2-\mu_2)'\Sigma_{22}^{-1}(x_2-\mu_2)$$

Note that $|\Sigma| = |\Sigma_{11}|\,|\Sigma_{22}|$ from Exercise 4.10(a). Now factor the joint density.

4.15. Show that $\sum_{j=1}^{n}(x_j-\bar{x})(\bar{x}-\mu)'$ and $(\bar{x}-\mu)\sum_{j=1}^{n}(x_j-\bar{x})'$ are both $p \times p$ matrices of zeros. Here $x_j' = [x_{j1}, x_{j2}, \ldots, x_{jp}]$, $j = 1, 2, \ldots, n$, and $\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j$.

4.16. Let $X_1, X_2, X_3$, and $X_4$ be independent $N_p(\mu, \Sigma)$ random vectors.
(a) Find the marginal distributions for each of the random vectors

$$V_1 = \tfrac{1}{4}X_1 - \tfrac{1}{4}X_2 + \tfrac{1}{4}X_3 - \tfrac{1}{4}X_4 \quad \text{and} \quad V_2 = \tfrac{1}{4}X_1 + \tfrac{1}{4}X_2 - \tfrac{1}{4}X_3 - \tfrac{1}{4}X_4$$

(b) Find the joint density of the random vectors $V_1$ and $V_2$ defined in (a).

4.17. Let $X_1, X_2, X_3, X_4$, and $X_5$ be independent and identically distributed random vectors with mean vector $\mu$ and covariance matrix $\Sigma$. Find the mean vector and covariance matrices for each of the two linear combinations of random vectors

$$\tfrac{1}{5}X_1 + \tfrac{1}{5}X_2 + \tfrac{1}{5}X_3 + \tfrac{1}{5}X_4 + \tfrac{1}{5}X_5$$

and

$$X_1 - X_2 + X_3 - X_4 + X_5$$

in terms of $\mu$ and $\Sigma$. Also, obtain the covariance between the two linear combinations of random vectors.

4.18. Find the maximum likelihood estimates of the $2 \times 1$ mean vector $\mu$ and the $2 \times 2$ covariance matrix $\Sigma$ based on the random sample

$$X = \begin{bmatrix} 3 & 6 \\ 4 & 4 \\ 5 & 7 \\ 4 & 7 \end{bmatrix}$$

from a bivariate normal population.

4.19. Let $X_1, X_2, \ldots, X_{20}$ be a random sample of size $n = 20$ from an $N_6(\mu, \Sigma)$ population. Specify each of the following completely.
(a) The distribution of $(X_1-\mu)'\Sigma^{-1}(X_1-\mu)$
(b) The distributions of $\bar{X}$ and $\sqrt{n}(\bar{X}-\mu)$
(c) The distribution of $(n-1)S$

4.20. For the random variables $X_1, X_2, \ldots, X_{20}$ in Exercise 4.19, specify the distribution of $B(19S)B'$ in each case.

(a) $B = \begin{bmatrix} 1 & -\tfrac{1}{2} & -\tfrac{1}{2} & 0 & 0 & 0 \\ 0 & 0 & 0 & \tfrac{1}{2} & \tfrac{1}{2} & -1 \end{bmatrix}$  (b) $B = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix}$

4.21. Let $X_1, \ldots, X_{60}$ be a random sample of size 60 from a four-variate normal distribution having mean $\mu$ and covariance $\Sigma$. Specify each of the following completely.
(a) The distribution of $\bar{X}$
(b) The distribution of $(X_1-\mu)'\Sigma^{-1}(X_1-\mu)$
(c) The distribution of $n(\bar{X}-\mu)'\Sigma^{-1}(\bar{X}-\mu)$
(d) The approximate distribution of $n(\bar{X}-\mu)'S^{-1}(\bar{X}-\mu)$

4.22. Let $X_1, X_2, \ldots, X_{75}$ be a random sample from a population distribution with mean $\mu$ and covariance matrix $\Sigma$. What is the approximate distribution of each of the following?
(a) $\bar{X}$
(b) $n(\bar{X}-\mu)'S^{-1}(\bar{X}-\mu)$

4.23. Consider the annual rates of return (including dividends) on the Dow-Jones industrial average for the years 1996-2005. These data, multiplied by 100, are

-0.6  3.1  25.3  -16.8  -7.1  -6.2  25.2  22.6  16.1  26.0

(a) Construct a Q-Q plot. Do the data seem to be normally distributed? Explain.
(b) Carry out a test of normality based on the correlation coefficient $r_Q$. [See (4-31).] Let the significance level be $\alpha = .10$.

4.24. Exercise 1.4 contains data on three variables for the world's 10 largest companies as of April 2005. For the sales ($x_1$) and profits ($x_2$) data:
(a) Construct Q-Q plots. Do these data appear to be normally distributed? Explain.
(b) Carry out a test of normality based on the correlation coefficient $r_Q$. [See (4-31).] Set the significance level at $\alpha = .10$. Do the results of these tests corroborate the results in Part a?
(b) Carry out a test of normality based on the correlation coefficient rQ. [See (4-31).] Set the significance level at α = .10. Do the results of these tests corroborate the results in Part a?

4.25. Refer to the data for the world's 10 largest companies in Exercise 1.4. Construct a chi-square plot using all three variables. The chi-square quantiles are

    0.3518  0.7978  1.2125  1.6416  2.1095  2.6430  3.2831  4.1083  5.3170  7.8147

4.26. Exercise 1.2 gives the age x1, measured in years, as well as the selling price x2, measured in thousands of dollars, for n = 10 used cars. These data are reproduced as follows:
    x1 (age, years):         1      2      3      3      4      5      6      8      9     11
    x2 (price, $1000s):  18.95  19.00  17.95  15.54  14.00  12.95   8.94   7.49   6.00   3.99

(a) Use the results of Exercise 1.2 to calculate the squared statistical distances (xj - x̄)'S^{-1}(xj - x̄), j = 1, 2, ..., 10, where xj' = [xj1, xj2].
(b) Using the distances in Part a, determine the proportion of the observations falling within the estimated 50% probability contour of a bivariate normal distribution.
(c) Order the distances in Part a and construct a chi-square plot.
(d) Given the results in Parts b and c, are these data approximately bivariate normal? Explain.

4.27. Consider the radiation data (with door closed) in Example 4.10. Construct a Q-Q plot of the natural logarithms of these data. [Note that the natural logarithm transformation corresponds to the value λ = 0 in (4-34).] Do the natural logarithms appear to be normally distributed? Compare your results with Figure 4.13. Does the choice λ = 1/4 or λ = 0 make much difference in this case?

The following exercises may require a computer.

4.28. Consider the air-pollution data given in Table 1.5. Construct a Q-Q plot for the solar radiation measurements and carry out a test for normality based on the correlation coefficient rQ [see (4-31)]. Let α = .05 and use the entry corresponding to n = 40 in Table 4.2.

4.29. Given the air-pollution data in Table 1.5, examine the pairs X5 = NO2 and X6 = O3 for bivariate normality.
(a) Calculate statistical distances (xj - x̄)'S^{-1}(xj - x̄), j = 1, 2, ..., 42, where xj' = [xj5, xj6].
(b) Determine the proportion of observations falling within the approximate 50% probability contour of a bivariate normal distribution.
(c) Construct a chi-square plot of the ordered distances in Part a.

4.30. Consider the used-car data in Exercise 4.26.
(a) Determine the power transformation λ̂1 that makes the x1 values approximately normal. Construct a Q-Q plot for the transformed data.
(b) Determine the power transformation λ̂2 that makes the x2 values approximately normal. Construct a Q-Q plot for the transformed data.
(c) Determine the power transformations λ̂' = [λ̂1, λ̂2] that make the [x1, x2] values jointly normal using (4-40). Compare the results with those obtained in Parts a and b.

4.31. Examine the marginal normality of the observations on variables X1, X2, ..., X5 for the multiple-sclerosis data in Table 1.6. Treat the non-multiple-sclerosis and multiple-sclerosis groups separately. Use whatever methodology, including transformations, you feel is appropriate.

4.32. Examine the marginal normality of the observations on variables X1, X2, ..., X6 for the radiotherapy data in Table 1.7. Use whatever methodology, including transformations, you feel is appropriate.

4.33. Examine the marginal and bivariate normality of the observations on variables X1, X2, X3, and X4 for the data in Table 4.3.

4.34. Examine the data on bone mineral content in Table 1.8 for marginal and bivariate normality.
4.35. Examine the data on paper-quality measurements in Table 1.2 for marginal and multivariate normality.
4.36. Examine the data on women's national track records in Table 1.9 for marginal and multivariate normality.
4.37. Refer to Exercise 1.18. Convert the women's track records in Table 1.9 to speeds measured in meters per second. Examine the data on speeds for marginal and multivariate normality.
4.38. Examine the data on bulls in Table 1.10 for marginal and multivariate normality. Consider only the variables YrHgt, FtFrBody, PrctFFB, BkFat, SaleHt, and SaleWt.
4.39. The data in Table 4.6 (see the psychological profile data: www.prenhall.com/statistics) consist of 130 observations generated by scores on a psychological test administered to Peruvian teenagers (ages 15, 16, and 17). For each of these teenagers the gender (male = 1, female = 2) and socioeconomic status (low = 1, medium = 2) were also recorded. The scores were accumulated into five subscale scores labeled independence (indep), support (supp), benevolence (benev), conformity (conform), and leadership (leader).
Table 4.6 Psychological Profile Data

    Indep  Supp  Benev  Conform  Leader  Gender  Socio
      27    13     14      20      11       2      1
      12    13     24      25       6       2      1
      14    20     15      16       7       2      1
      18    20     17      12       6       2      1
       9    22     22      21       6       2      1
       ⋮     ⋮      ⋮       ⋮       ⋮       ⋮      ⋮
      10    11     26      17       7       1      2
      14    12     14      10      22       1      2
      19    11     23      29      13       2      2
      27    19     22      11       9       2      2
      10    17     22      18       8       2      2

Source: Data courtesy of C. Soto.
(a) Examine each of the variables independence, support, benevolence, conformity, and leadership for marginal normality.
(b) Using all five variables, check for multivariate normality.
(c) Refer to Part a. For those variables that are nonnormal, determine the transformation that makes them more nearly normal.
4.40. Consider the data on national parks in Exercise 1.27.
(a) Comment on any possible outliers in a scatter plot of the original variables.
(b) Determine the power transformation λ̂1 that makes the x1 values approximately normal. Construct a Q-Q plot of the transformed observations.
(c) Determine the power transformation λ̂2 that makes the x2 values approximately normal. Construct a Q-Q plot of the transformed observations.
(d) Determine the power transformations for approximate bivariate normality using (4-40).
4.41. Consider the data on snow removal in Exercise 3.20.
(a) Comment on any possible outliers in a scatter plot of the original variables.
(b) Determine the power transformation λ̂1 that makes the x1 values approximately normal. Construct a Q-Q plot of the transformed observations.
(c) Determine the power transformation λ̂2 that makes the x2 values approximately normal. Construct a Q-Q plot of the transformed observations.
(d) Determine the power transformations for approximate bivariate normality using (4-40).
References
1. Anderson, T. W. An Introduction to Multivariate Statistical Analysis (3rd ed.). New York: John Wiley, 2003.
2. Andrews, D. F., R. Gnanadesikan, and J. L. Warner. "Transformations of Multivariate Data." Biometrics, 27, no. 4 (1971), 825-840.
3. Box, G. E. P., and D. R. Cox. "An Analysis of Transformations" (with discussion). Journal of the Royal Statistical Society (B), 26, no. 2 (1964), 211-252.
4. Daniel, C., and F. S. Wood. Fitting Equations to Data: Computer Analysis of Multifactor Data. New York: John Wiley, 1980.
5. Filliben, J. J. "The Probability Plot Correlation Coefficient Test for Normality." Technometrics, 17, no. 1 (1975), 111-117.
6. Gnanadesikan, R. Methods for Statistical Data Analysis of Multivariate Observations (2nd ed.). New York: Wiley-Interscience, 1977.
7. Hawkins, D. M. Identification of Outliers. London, UK: Chapman and Hall, 1980.
8. Hernandez, F., and R. A. Johnson. "The Large-Sample Behavior of Transformations to Normality." Journal of the American Statistical Association, 75, no. 372 (1980), 855-861.
9. Hogg, R. V., A. T. Craig, and J. W. McKean. Introduction to Mathematical Statistics (6th ed.). Upper Saddle River, N.J.: Prentice Hall, 2004.
10. Looney, S. W., and T. R. Gulledge, Jr. "Use of the Correlation Coefficient with Normal Probability Plots." The American Statistician, 39, no. 1 (1985), 75-79.
11. Mardia, K. V., J. T. Kent, and J. M. Bibby. Multivariate Analysis (paperback). London: Academic Press, 2003.
12. Shapiro, S. S., and M. B. Wilk. "An Analysis of Variance Test for Normality (Complete Samples)." Biometrika, 52, no. 4 (1965), 591-611.
13. Verrill, S., and R. A. Johnson. "Tables and Large-Sample Distribution Theory for Censored-Data Correlation Statistics for Testing Normality." Journal of the American Statistical Association, 83, no. 404 (1988).
14. Yeo, I., and R. A. Johnson. "A New Family of Power Transformations to Improve Normality or Symmetry." Biometrika, 87, no. 4 (2000), 954-959.
15. Zehna, P. "Invariance of Maximum Likelihood Estimators." Annals of Mathematical Statistics, 37, no. 3 (1966), 744.
Chapter 5

INFERENCES ABOUT A MEAN VECTOR

5.1 Introduction

This chapter is the first of the methodological sections of the book. We shall now use the concepts and results set forth in Chapters 1 through 4 to develop techniques for analyzing data. A large part of any analysis is concerned with inference, that is, reaching valid conclusions concerning a population on the basis of information from a sample.

At this point, we shall concentrate on inferences about a population mean vector and its component parts. Although we introduce statistical inference through initial discussions of tests of hypotheses, our ultimate aim is to present a full statistical analysis of the component means based on simultaneous confidence statements. One of the central messages of multivariate analysis is that p correlated variables must be analyzed jointly. This principle is exemplified by the methods presented in this chapter.

5.2 The Plausibility of μ0 as a Value for a Normal Population Mean

Let us start by recalling the univariate theory for determining whether a specific value μ0 is a plausible value for the population mean μ. From the point of view of hypothesis testing, this problem can be formulated as a test of the competing hypotheses

    H0: μ = μ0   and   H1: μ ≠ μ0

Here H0 is the null hypothesis and H1 is the (two-sided) alternative hypothesis. If X1, X2, ..., Xn denote a random sample from a normal population, the appropriate test statistic is

    t = (X̄ - μ0)/(s/√n),   where X̄ = (1/n) Σ_{j=1}^n Xj  and  s² = (1/(n-1)) Σ_{j=1}^n (Xj - X̄)²

This test statistic has a student's t-distribution with n - 1 degrees of freedom (d.f.). We reject H0, that μ0 is a plausible value of μ, if the observed |t| exceeds a specified percentage point of a t-distribution with n - 1 d.f. Rejecting H0 when |t| is large is equivalent to rejecting H0 if its square,

    t² = (X̄ - μ0)²/(s²/n) = n(X̄ - μ0)(s²)^{-1}(X̄ - μ0)                                      (5-1)

is large. The variable t² in (5-1) is the square of the distance from the sample mean X̄ to the test value μ0. The units of distance are expressed in terms of s/√n, or estimated standard deviations of X̄. Once X̄ and s² are observed, the test becomes: Reject H0 in favor of H1, at significance level α, if

    n(X̄ - μ0)(s²)^{-1}(X̄ - μ0) > t²_{n-1}(α/2)                                               (5-2)

where t_{n-1}(α/2) denotes the upper 100(α/2)th percentile of the t-distribution with n - 1 d.f.

If H0 is not rejected, we conclude that μ0 is a plausible value for the normal population mean. Are there other values of μ which are also consistent with the data? The answer is yes! In fact, there is always a set of plausible values for a normal population mean. From the well-known correspondence between acceptance regions for tests of H0: μ = μ0 versus H1: μ ≠ μ0 and confidence intervals for μ, we have

    {Do not reject H0: μ = μ0 at level α}   is equivalent to
    {μ0 lies in the 100(1 - α)% confidence interval x̄ ± t_{n-1}(α/2) s/√n}

or

    x̄ - t_{n-1}(α/2) s/√n ≤ μ0 ≤ x̄ + t_{n-1}(α/2) s/√n                                       (5-3)

The confidence interval consists of all those values μ0 that would not be rejected by the level α test of H0: μ = μ0.

Before the sample is selected, the 100(1 - α)% confidence interval in (5-3) is a random interval because the endpoints depend upon the random variables X̄ and s. The probability that the interval contains μ is 1 - α; among large numbers of such independent intervals, approximately 100(1 - α)% of them will contain μ.
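The univariate computations above reduce to a few lines of code. The following is a minimal sketch, assuming NumPy and SciPy are available; the sample values are hypothetical and serve only to illustrate (5-1) through (5-3).

```python
import numpy as np
from scipy import stats

x = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.4])    # hypothetical sample
mu0 = 4.0                                       # hypothesized mean
n = len(x)
xbar, s = x.mean(), x.std(ddof=1)

t2 = n * (xbar - mu0) ** 2 / s ** 2             # squared distance, as in (5-1)
crit = stats.t.ppf(1 - 0.05 / 2, df=n - 1) ** 2 # t_{n-1}(alpha/2)^2, alpha = .05
print("reject H0:", t2 > crit)                  # the test (5-2)

# the 95% confidence interval (5-3): all mu0 not rejected at level alpha
half = stats.t.ppf(1 - 0.05 / 2, df=n - 1) * s / np.sqrt(n)
print("interval:", (xbar - half, xbar + half))
```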
Consider now the problem of determining whether a given p × 1 vector μ0 is a plausible value for the mean of a multivariate normal distribution. We shall proceed by analogy to the univariate development just presented. A natural generalization of the squared distance in (5-1) is its multivariate analog

    T² = (X̄ - μ0)' ((1/n)S)^{-1} (X̄ - μ0) = n(X̄ - μ0)' S^{-1} (X̄ - μ0)                      (5-4)

where

    X̄ = (1/n) Σ_{j=1}^n Xj,   S = (1/(n-1)) Σ_{j=1}^n (Xj - X̄)(Xj - X̄)',   and   μ0 = [μ10, μ20, ..., μp0]'

The statistic T² is called Hotelling's T² in honor of Harold Hotelling, a pioneer in multivariate analysis, who first obtained its sampling distribution. Here (1/n)S is the estimated covariance matrix of X̄. (See Result 3.1.)

If the observed statistical distance T² is too large, that is, if x̄ is "too far" from μ0, the hypothesis H0: μ = μ0 is rejected. It turns out that special tables of T² percentage points are not required for formal tests of hypotheses. This is true because

    T² is distributed as ((n-1)p/(n-p)) F_{p,n-p}                                            (5-5)

where F_{p,n-p} denotes a random variable with an F-distribution with p and n - p d.f.

To summarize, we have the following: Let X1, X2, ..., Xn be a random sample from an Np(μ, Σ) population. Then with X̄ = (1/n) Σ_{j=1}^n Xj and S = (1/(n-1)) Σ_{j=1}^n (Xj - X̄)(Xj - X̄)',

    α = P[T² > ((n-1)p/(n-p)) F_{p,n-p}(α)]
      = P[n(X̄ - μ)'S^{-1}(X̄ - μ) > ((n-1)p/(n-p)) F_{p,n-p}(α)]                             (5-6)

whatever the true μ and Σ. Here F_{p,n-p}(α) is the upper (100α)th percentile of the F_{p,n-p} distribution.

Statement (5-6) leads immediately to a test of the hypothesis H0: μ = μ0 versus H1: μ ≠ μ0. At the α level of significance, we reject H0 in favor of H1 if the observed

    T² = n(x̄ - μ0)'S^{-1}(x̄ - μ0) > ((n-1)p/(n-p)) F_{p,n-p}(α)                             (5-7)

It is informative to discuss the nature of the T²-distribution briefly and its correspondence with the univariate test statistic. In Section 4.4, we described the manner in which the Wishart distribution generalizes the chi-square distribution. We can write

    T² = √n(X̄ - μ0)' ( Σ_{j=1}^n (Xj - X̄)(Xj - X̄)' / (n-1) )^{-1} √n(X̄ - μ0)

which combines a normal, Np(0, Σ), random vector and a Wishart, W_{p,n-1}(Σ), random matrix in the form

    T² = (multivariate normal)' (Wishart random matrix / d.f.)^{-1} (multivariate normal)
       = Np(0, Σ)' [ (1/(n-1)) W_{p,n-1}(Σ) ]^{-1} Np(0, Σ)                                  (5-8)

This is analogous to

    t² = (normal random variable) (scaled chi-square random variable / d.f.)^{-1} (normal random variable)

for the univariate case. Since the multivariate normal and Wishart random variables are independently distributed [see (4-23)], their joint density function is the product of the marginal normal and Wishart distributions. Using calculus, the distribution (5-5) of T² as given previously can be derived from this joint distribution and the representation (5-8).

It is rare, in multivariate situations, to be content with a test of H0: μ = μ0, where all of the mean vector components are specified under the null hypothesis. Ordinarily, it is preferable to find regions of μ values that are plausible in light of the observed data. We shall return to this issue in Section 5.4.

Example 5.1 (Evaluating T²) Let the data matrix for a random sample of size n = 3 from a bivariate normal population be

    X = [ 6  9
         10  6
          8  3]

Evaluate the observed T² for μ0' = [9, 5]. What is the sampling distribution of T² in this case?

We find

    x̄1 = (6 + 10 + 8)/3 = 8,   x̄2 = (9 + 6 + 3)/3 = 6

and

    s11 = [(6 - 8)² + (10 - 8)² + (8 - 8)²]/2 = 4
    s12 = [(6 - 8)(9 - 6) + (10 - 8)(6 - 6) + (8 - 8)(3 - 6)]/2 = -3
    s22 = [(9 - 6)² + (6 - 6)² + (3 - 6)²]/2 = 9

so

    x̄ = [8, 6]'   and   S = [ 4 -3
                              -3  9]

Thus,

    S^{-1} = (1/((4)(9) - (-3)(-3))) [9 3; 3 4] = (1/27) [9 3; 3 4]

and, from (5-4),

    T² = 3[8 - 9, 6 - 5] (1/27)[9 3; 3 4] [8 - 9; 6 - 5] = 3[-1, 1] (1/27) [-6; 1] = 7/9

Before the sample is selected, T² has the distribution of a ((3 - 1)2/(3 - 2)) F_{2,3-2} = 4F_{2,1} random variable. ■

The next example illustrates a test of the hypothesis H0: μ = μ0 using data collected as part of a search for new diagnostic techniques at the University of Wisconsin Medical School.

Example 5.2 (Testing a multivariate mean vector with T²) Perspiration from 20 healthy females was analyzed. Three components, X1 = sweat rate, X2 = sodium content, and X3 = potassium content, were measured, and the results, which we call the sweat data, are presented in Table 5.1. Test the hypothesis H0: μ' = [4, 50, 10] against H1: μ' ≠ [4, 50, 10] at level of significance α = .10.

Table 5.1 Sweat Data

    Individual   X1 (Sweat rate)   X2 (Sodium)   X3 (Potassium)
        1             3.7             48.5            9.3
        2             5.7             65.1            8.0
        3             3.8             47.2           10.9
        4             3.2             53.2           12.0
        5             3.1             55.5            9.7
        6             4.6             36.1            7.9
        7             2.4             24.8           14.0
        8             7.2             33.1            7.6
        9             6.7             47.4            8.5
       10             5.4             54.1           11.3
       11             3.9             36.9           12.7
       12             4.5             58.8           12.3
       13             3.5             27.8            9.8
       14             4.5             40.2            8.4
       15             1.5             13.5           10.1
       16             8.5             56.4            7.1
       17             4.5             71.6            8.2
       18             6.5             52.8           10.9
       19             4.1             44.1           11.2
       20             5.5             40.9            9.4

Source: Courtesy of Dr. Gerald Bargman.

Computer calculations provide

    x̄ = [4.640, 45.400, 9.965]',   S = [ 2.879  10.010  -1.810
                                         10.010 199.788  -5.640
                                         -1.810  -5.640   3.628]

and

    S^{-1} = [ .586  -.022   .258
              -.022   .006  -.002
               .258  -.002   .402]

We evaluate

    T² = 20 [4.640 - 4, 45.400 - 50, 9.965 - 10] S^{-1} [4.640 - 4; 45.400 - 50; 9.965 - 10]
       = 20 [.640, -4.600, -.035] [.467; -.042; .160] = 9.74

Comparing the observed T² = 9.74 with the critical value

    ((n-1)p/(n-p)) F_{p,n-p}(.10) = (19(3)/17) F_{3,17}(.10) = 3.353(2.44) = 8.18

we see that T² = 9.74 > 8.18, and consequently, we reject H0 at the 10% level of significance.

We note that H0 will be rejected if one or more of the component means, or some combination of means, differs too much from the hypothesized values [4, 50, 10]. At this point, we have no idea which of these hypothesized values may not be supported by the data.

We have assumed that the sweat data are multivariate normal. The Q-Q plots constructed from the marginal distributions of X1, X2, and X3 all approximate straight lines. Moreover, scatter plots for pairs of observations have approximate elliptical shapes, and we conclude that the normality assumption was reasonable in this case. (See Exercise 5.4.) ■
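Examples 5.1 and 5.2 can be reproduced with a few lines of code. The sketch below, assuming NumPy and SciPy are available, evaluates T² of (5-4) for the Example 5.1 data and compares it with the F-based critical value of (5-7); the variable names are our own and not part of any standard library.

```python
import numpy as np
from scipy import stats

X = np.array([[6.0, 9.0],
              [10.0, 6.0],
              [8.0, 3.0]])        # n = 3 observations on p = 2 variables
mu0 = np.array([9.0, 5.0])        # hypothesized mean vector

n, p = X.shape
xbar = X.mean(axis=0)             # [8, 6]
S = np.cov(X, rowvar=False)       # unbiased sample covariance, divisor n - 1

d = xbar - mu0
T2 = n * d @ np.linalg.solve(S, d)   # n (xbar - mu0)' S^{-1} (xbar - mu0)
print(T2)                            # 7/9, as in Example 5.1

# critical value ((n-1)p/(n-p)) F_{p,n-p}(alpha); here n - p = 1 d.f.
alpha = 0.10
crit = (n - 1) * p / (n - p) * stats.f.ppf(1 - alpha, p, n - p)
print("reject H0:", T2 > crit)
```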
One feature of the T²-statistic is that it is invariant (unchanged) under changes in the units of measurements for X of the form

    Y = C X + d,   C nonsingular                                                             (5-9)

A transformation of the observations of this kind arises when a constant bi is subtracted from the ith variable to form Xi - bi and the result is multiplied by a constant ai > 0 to get ai(Xi - bi). Premultiplication of the centered and scaled quantities ai(Xi - bi) by any nonsingular matrix will yield Equation (5-9). As an example, the operations involved in changing Xi to ai(Xi - bi) correspond exactly to the process of converting a temperature from a Fahrenheit to a Celsius reading.

Given observations x1, x2, ..., xn and the transformation in (5-9), it immediately follows from Result 3.6 that

    ȳ = C x̄ + d   and   Sy = (1/(n-1)) Σ_{j=1}^n (yj - ȳ)(yj - ȳ)' = C S C'

Moreover, by (2-24) and (2-45),

    μY = E(Y) = E(CX + d) = E(CX) + E(d) = Cμ + d

Therefore, T² computed with the y's and a hypothesized value μY,0 = Cμ0 + d is

    T² = n(ȳ - μY,0)' Sy^{-1} (ȳ - μY,0)
       = n(C(x̄ - μ0))' (CSC')^{-1} (C(x̄ - μ0))
       = n(x̄ - μ0)' C'(CSC')^{-1} C (x̄ - μ0)
       = n(x̄ - μ0)' C'(C')^{-1} S^{-1} C^{-1} C (x̄ - μ0) = n(x̄ - μ0)' S^{-1} (x̄ - μ0)

The last expression is recognized as the value of T² computed with the x's.
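The invariance property just derived is easy to confirm numerically. The following sketch uses hypothetical data and an arbitrarily chosen nonsingular C, and checks that T² computed from yj = Cxj + d (with hypothesized value Cμ0 + d) matches T² computed from the xj.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))        # hypothetical sample, n = 20, p = 3
mu0 = np.zeros(3)

def t2(X, mu0):
    """Hotelling's T^2 of (5-4) for a data matrix X and hypothesized mean."""
    n = X.shape[0]
    d = X.mean(axis=0) - mu0
    S = np.cov(X, rowvar=False)
    return n * d @ np.linalg.solve(S, d)

C = np.array([[2.0, 0.0, 1.0],
              [0.0, -1.0, 3.0],
              [1.0, 1.0, 0.0]])      # any nonsingular matrix (det = -5 here)
d = np.array([5.0, -2.0, 0.5])

Y = X @ C.T + d                      # y_j = C x_j + d for each observation
print(np.isclose(t2(X, mu0), t2(Y, C @ mu0 + d)))   # True: T^2 is invariant
```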
5.3 Hotelling's T² and Likelihood Ratio Tests
We introduced the T²-statistic by analogy with the univariate squared distance t². There is a general principle for constructing test procedures called the likelihood ratio method, and the T²-statistic can be derived as the likelihood ratio test of H0: μ = μ0. The general theory of likelihood ratio tests is beyond the scope of this book. (See [3] for a treatment of the topic.) Likelihood ratio tests have several optimal properties for reasonably large samples, and they are particularly convenient for hypotheses formulated in terms of multivariate normal parameters. We know from (4-18) that the maximum of the multivariate normal likelihood as μ and Σ are varied over their possible values is given by
    max_{μ,Σ} L(μ, Σ) = (2π)^{-np/2} |Σ̂|^{-n/2} e^{-np/2}                                    (5-10)

where

    Σ̂ = (1/n) Σ_{j=1}^n (xj - x̄)(xj - x̄)'   and   μ̂ = x̄ = (1/n) Σ_{j=1}^n xj

are the maximum likelihood estimates. Recall that μ̂ and Σ̂ are those choices for μ and Σ that best explain the observed values of the random sample.

Under the hypothesis H0: μ = μ0, the normal likelihood specializes to

    L(μ0, Σ) = (2π)^{-np/2} |Σ|^{-n/2} exp( -(1/2) Σ_{j=1}^n (xj - μ0)'Σ^{-1}(xj - μ0) )

The mean μ0 is now fixed, but Σ can be varied to find the value that is "most likely" to have led, with μ0 fixed, to the observed sample. This value is obtained by maximizing L(μ0, Σ) with respect to Σ.

Following the steps in (4-13), the exponent in L(μ0, Σ) may be written as

    -(1/2) Σ_{j=1}^n (xj - μ0)'Σ^{-1}(xj - μ0) = -(1/2) Σ_{j=1}^n tr[Σ^{-1}(xj - μ0)(xj - μ0)']
                                               = -(1/2) tr[ Σ^{-1} ( Σ_{j=1}^n (xj - μ0)(xj - μ0)' ) ]

Applying Result 4.10 with B = Σ_{j=1}^n (xj - μ0)(xj - μ0)' and b = n/2, we have

    max_Σ L(μ0, Σ) = (2π)^{-np/2} |Σ̂0|^{-n/2} e^{-np/2}                                      (5-11)

with

    Σ̂0 = (1/n) Σ_{j=1}^n (xj - μ0)(xj - μ0)'

To determine whether μ0 is a plausible value of μ, the maximum of L(μ0, Σ) is compared with the unrestricted maximum of L(μ, Σ). The resulting ratio is called the likelihood ratio statistic. Using Equations (5-10) and (5-11), we get

    Likelihood ratio = Λ = max_Σ L(μ0, Σ) / max_{μ,Σ} L(μ, Σ) = ( |Σ̂| / |Σ̂0| )^{n/2}        (5-12)

The equivalent statistic Λ^{2/n} = |Σ̂|/|Σ̂0| is called Wilks' lambda. If the observed value of this likelihood ratio is too small, the hypothesis H0: μ = μ0 is unlikely to be true and is, therefore, rejected. Specifically, the likelihood ratio test of H0: μ = μ0 against H1: μ ≠ μ0 rejects H0 if

    Λ = ( |Σ̂| / |Σ̂0| )^{n/2} < cα                                                           (5-13)

where cα is the lower (100α)th percentile of the distribution of Λ. (Note that the likelihood ratio test statistic is a power of the ratio of generalized variances.) Fortunately, because of the following relation between T² and Λ, we do not need the distribution of the latter to carry out the test.
Result 5.1. Let X1, X2, ..., Xn be a random sample from an Np(μ, Σ) population. Then the test in (5-7) based on T² is equivalent to the likelihood ratio test of H0: μ = μ0 versus H1: μ ≠ μ0 because

    Λ^{2/n} = ( 1 + T²/(n - 1) )^{-1}                                                        (5-14)

Proof. Let the (p + 1) × (p + 1) matrix

    A = [ Σ_{j=1}^n (xj - x̄)(xj - x̄)'    √n(x̄ - μ0) ]   [ A11  A12 ]
        [ √n(x̄ - μ0)'                        -1      ] = [ A21  A22 ]

By Exercise 4.11, |A| = |A22||A11 - A12 A22^{-1} A21| = |A11||A22 - A21 A11^{-1} A12|, from which we obtain

    (-1) | Σ_{j=1}^n (xj - x̄)(xj - x̄)' + n(x̄ - μ0)(x̄ - μ0)' |
        = | Σ_{j=1}^n (xj - x̄)(xj - x̄)' | (-1) ( 1 + n(x̄ - μ0)' ( Σ_{j=1}^n (xj - x̄)(xj - x̄)' )^{-1} (x̄ - μ0) )

(Since Σ_{j=1}^n (xj - x̄)(xj - x̄)' = (n - 1)S, the last factor equals 1 + T²/(n - 1).) Since, by (4-14),

    Σ_{j=1}^n (xj - μ0)(xj - μ0)' = Σ_{j=1}^n (xj - x̄)(xj - x̄)' + n(x̄ - μ0)(x̄ - μ0)'

the foregoing equality involving determinants can be written

    | Σ_{j=1}^n (xj - μ0)(xj - μ0)' | = | Σ_{j=1}^n (xj - x̄)(xj - x̄)' | ( 1 + T²/(n - 1) )

or

    |n Σ̂0| = |n Σ̂| ( 1 + T²/(n - 1) )

Thus,

    Λ^{2/n} = |Σ̂| / |Σ̂0| = ( 1 + T²/(n - 1) )^{-1}                                          (5-14)

Here H0 is rejected for small values of Λ^{2/n} or, equivalently, large values of T². The critical values of T² are determined by (5-6). ■

Incidentally, relation (5-14) shows that T² may be calculated from two determinants, thus avoiding the computation of S^{-1}. Solving (5-14) for T², we have

    T² = (n - 1) |Σ̂0| / |Σ̂| - (n - 1)
       = (n - 1) | Σ_{j=1}^n (xj - μ0)(xj - μ0)' | / | Σ_{j=1}^n (xj - x̄)(xj - x̄)' | - (n - 1)   (5-15)

Likelihood ratio tests are common in multivariate analysis. Their optimal large-sample properties hold in very general contexts, as we shall indicate shortly. They are well suited for the testing situations considered in this book. Likelihood ratio methods yield test statistics that reduce to the familiar F- and t-statistics in univariate situations.

General Likelihood Ratio Method

We shall now consider the general likelihood ratio method. Let θ be a vector consisting of all the unknown population parameters, and let L(θ) be the likelihood function obtained by evaluating the joint density of X1, X2, ..., Xn at their observed values x1, x2, ..., xn. The parameter vector θ takes its value in the parameter set Θ. For example, in the p-dimensional multivariate normal case, θ' = [μ1, ..., μp, σ11, ..., σ1p, σ22, ..., σ2p, ..., σp-1,p, σpp] and Θ consists of the p-dimensional space, where -∞ < μ1 < ∞, ..., -∞ < μp < ∞, combined with the [p(p + 1)/2]-dimensional space of variances and covariances such that Σ is positive definite. Therefore, Θ has dimension ν = p + p(p + 1)/2. Under the null hypothesis H0: θ = θ0, θ is restricted to lie in a subset Θ0 of Θ. For the multivariate normal situation with μ = μ0 and Σ unspecified, Θ0 = {μ1 = μ10, μ2 = μ20, ..., μp = μp0; σ11, ..., σ1p, σ22, ..., σ2p, ..., σp-1,p, σpp with Σ positive definite}, so Θ0 has dimension ν0 = 0 + p(p + 1)/2 = p(p + 1)/2.

A likelihood ratio test of H0: θ ∈ Θ0 rejects H0 in favor of H1: θ ∉ Θ0 if

    Λ = max_{θ ∈ Θ0} L(θ) / max_{θ ∈ Θ} L(θ) < c                                             (5-16)

where c is a suitably chosen constant. Intuitively, we reject H0 if the maximum of the likelihood obtained by allowing θ to vary over the set Θ0 is much smaller than the maximum of the likelihood obtained by varying θ over all values in Θ. When the maximum in the numerator of expression (5-16) is much smaller than the maximum in the denominator, Θ0 does not contain plausible values for θ.

In each application of the likelihood ratio method, we must obtain the sampling distribution of the likelihood-ratio test statistic Λ. Then c can be selected to produce a test with a specified significance level α. However, when the sample size is large and certain regularity conditions are satisfied, the sampling distribution of -2 ln Λ is well approximated by a chi-square distribution. This attractive feature accounts, in part, for the popularity of likelihood ratio procedures.
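Relation (5-15) can also be checked numerically. The sketch below, again with hypothetical data, computes T² from the two determinants and compares it with the direct form n(x̄ - μ0)'S^{-1}(x̄ - μ0).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(15, 2))          # hypothetical sample, n = 15, p = 2
mu0 = np.array([0.2, -0.1])
n = X.shape[0]
xbar = X.mean(axis=0)

A0 = (X - mu0).T @ (X - mu0)          # sum (x_j - mu0)(x_j - mu0)'
A = (X - xbar).T @ (X - xbar)         # sum (x_j - xbar)(x_j - xbar)'

# determinant form (5-15): no matrix inverse of S is required
T2_det = (n - 1) * (np.linalg.det(A0) / np.linalg.det(A) - 1)

# direct form (5-4) for comparison
d = xbar - mu0
S = A / (n - 1)
T2_dir = n * d @ np.linalg.solve(S, d)
print(np.isclose(T2_det, T2_dir))     # True
```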
5.4 Confidence Regions and Simultaneous Comparisons of Component Means

To obtain our primary method for making inferences from a sample, we need to extend the concept of a univariate confidence interval to a multivariate confidence region. Let θ be a vector of unknown population parameters and Θ be the set of all possible values of θ. A confidence region is a region of likely θ values. This region is determined by the data, and for the moment, we shall denote it by R(X), where X = [X1, X2, ..., Xn]' is the data matrix.

The region R(X) is said to be a 100(1 - α)% confidence region if, before the sample is selected,

    P[R(X) will cover the true θ] = 1 - α                                                    (5-17)

This probability is calculated under the true, but unknown, value of θ.

The confidence region for the mean μ of a p-dimensional normal population is available from (5-6). Before the sample is selected,

    P[n(X̄ - μ)'S^{-1}(X̄ - μ) ≤ ((n-1)p/(n-p)) F_{p,n-p}(α)] = 1 - α

whatever the values of the unknown μ and Σ. In words, X̄ will be within [(n-1)p F_{p,n-p}(α)/(n-p)]^{1/2} of μ, with probability 1 - α, provided that distance is defined in terms of nS^{-1}. For a particular sample, x̄ and S can be computed, and the inequality n(x̄ - μ)'S^{-1}(x̄ - μ) ≤ [(n-1)p/(n-p)] F_{p,n-p}(α) will define a region R(X) within the space of all possible parameter values. In this case, the region will be an ellipsoid centered at x̄. This ellipsoid is the 100(1 - α)% confidence region for μ.

A 100(1 - α)% confidence region for the mean of a p-dimensional normal distribution is the ellipsoid determined by all μ such that

    n(x̄ - μ)'S^{-1}(x̄ - μ) ≤ (p(n-1)/(n-p)) F_{p,n-p}(α)                                    (5-18)

where x̄ = (1/n) Σ_{j=1}^n xj, S = (1/(n-1)) Σ_{j=1}^n (xj - x̄)(xj - x̄)', and x1, x2, ..., xn are the sample observations.

To determine whether any μ0 lies within the confidence region (is a plausible value for μ), we need to compute the generalized squared distance n(x̄ - μ0)'S^{-1}(x̄ - μ0) and compare it with [p(n-1)/(n-p)] F_{p,n-p}(α). If the squared distance is larger than [p(n-1)/(n-p)] F_{p,n-p}(α), μ0 is not in the confidence region. Since this is analogous to testing H0: μ = μ0 versus H1: μ ≠ μ0 [see (5-7)], we see that the confidence region of (5-18) consists of all μ0 vectors for which the T²-test would not reject H0 in favor of H1 at significance level α.

For p ≥ 4, we cannot graph the joint confidence region for μ. However, we can calculate the axes of the confidence ellipsoid and their relative lengths. These are determined from the eigenvalues λi and eigenvectors ei of S. As in (4-7), the directions and lengths of the axes of

    n(x̄ - μ)'S^{-1}(x̄ - μ) ≤ c² = (p(n-1)/(n-p)) F_{p,n-p}(α)

are determined by going √λi c/√n = √λi √(p(n-1) F_{p,n-p}(α)/(n(n-p))) units along the eigenvectors ei. That is, beginning at the center x̄, the axes of the confidence ellipsoid are

    ±√λi √( (p(n-1)/(n(n-p))) F_{p,n-p}(α) ) ei,   where S ei = λi ei,  i = 1, 2, ..., p      (5-19)

The ratios of the λi's will help identify relative amounts of elongation along pairs of axes.

Example 5.3 (Constructing a confidence ellipse for μ) Data for radiation from microwave ovens were introduced in Examples 4.10 and 4.17. Let

    x1 = fourth root of measured radiation with door closed
    x2 = fourth root of measured radiation with door open

For the n = 42 pairs of transformed observations, we find that

    x̄ = [.564, .603]',   S = [.0144 .0117; .0117 .0146],   S^{-1} = [203.018 -163.391; -163.391 200.228]

The eigenvalue and eigenvector pairs for S are

    λ1 = .026,  e1' = [.704, .710]
    λ2 = .002,  e2' = [-.710, .704]

The 95% confidence ellipse for μ consists of all values (μ1, μ2) satisfying

    42[.564 - μ1, .603 - μ2] [203.018 -163.391; -163.391 200.228] [.564 - μ1; .603 - μ2] ≤ (2(41)/40) F_{2,40}(.05)

or, since F_{2,40}(.05) = 3.23,

    42(203.018)(.564 - μ1)² + 42(200.228)(.603 - μ2)² - 84(163.391)(.564 - μ1)(.603 - μ2) ≤ 6.62

To see whether μ' = [.562, .589] is in the confidence region, we compute the left-hand side with μ1 = .562 and μ2 = .589 and obtain 1.30 ≤ 6.62. We conclude that μ' = [.562, .589] is in the region. Equivalently, a test of H0: μ = [.562, .589]' would not be rejected in favor of H1: μ ≠ [.562, .589]' at the α = .05 level of significance.

The joint confidence ellipsoid is plotted in Figure 5.1. The center is at x̄' = [.564, .603], and the half-lengths of the major and minor axes are given by

    √λ1 √( (p(n-1)/(n(n-p))) F_{p,n-p}(α) ) = √.026 √( (2(41)/(42(40))) 3.23 ) = .064

and

    √λ2 √( (p(n-1)/(n(n-p))) F_{p,n-p}(α) ) = √.002 √( (2(41)/(42(40))) 3.23 ) = .018

respectively. The axes lie along e1' = [.704, .710] and e2' = [-.710, .704] when these vectors are plotted with x̄ as the origin. An indication of the elongation of the confidence ellipse is provided by the ratio of the lengths of the major and minor axes. This ratio is √λ1/√λ2 = .161/.045 = 3.6; the length of the major axis is 3.6 times the length of the minor axis. ■

[Figure 5.1: A 95% confidence ellipse for μ based on the microwave-radiation data.]

Simultaneous Confidence Statements

While the confidence region n(x̄ - μ)'S^{-1}(x̄ - μ) ≤ c², for c a constant, correctly assesses the joint knowledge concerning plausible values of μ, any summary of conclusions ordinarily includes confidence statements about the individual component means. In so doing, we adopt the attitude that all of the separate confidence statements should hold simultaneously with a specified high probability. It is the guarantee of a specified probability against any statement being incorrect that motivates the term simultaneous confidence intervals. We begin by considering simultaneous confidence statements which are intimately related to the joint confidence region based on the T²-statistic.

Let X have an Np(μ, Σ) distribution and form the linear combination

    Z = a1X1 + a2X2 + ... + apXp = a'X

From (2-43),

    μZ = E(Z) = a'μ   and   σZ² = Var(Z) = a'Σa

Moreover, by Result 4.2, Z has an N(a'μ, a'Σa) distribution. If a random sample X1, X2, ..., Xn from the Np(μ, Σ) population is available, a corresponding sample of Z's can be created by taking linear combinations

    Zj = a1Xj1 + a2Xj2 + ... + apXjp = a'Xj,   j = 1, 2, ..., n

The sample mean and variance of the observed values z1, z2, ..., zn are, by (3-36),

    z̄ = a'x̄   and   sz² = a'Sa                                                              (5-20)

where x̄ and S are the sample mean vector and covariance matrix of the xj's, respectively.

Simultaneous confidence intervals can be developed from a consideration of confidence intervals for a'μ for various choices of a. The argument proceeds as follows. For a fixed and σZ² unknown, a 100(1 - α)% confidence interval for μZ = a'μ is based on student's t-ratio

    t = (z̄ - μZ)/(sz/√n) = √n(a'x̄ - a'μ)/√(a'Sa)

and leads to the statement

    a'x̄ - t_{n-1}(α/2) √(a'Sa)/√n ≤ a'μ ≤ a'x̄ + t_{n-1}(α/2) √(a'Sa)/√n                     (5-21)

where t_{n-1}(α/2) is the upper 100(α/2)th percentile of a t-distribution with n - 1 d.f.

Inequality (5-21) can be interpreted as a statement about the components of the mean vector μ. For example, with a' = [1, 0, ..., 0], a'μ = μ1, and (5-21) becomes the usual confidence interval for a normal population mean. (Note, in this case, that a'Sa = s11.) Clearly, we could make several confidence statements about the components of μ, each with associated confidence coefficient 1 - α, by choosing different coefficient vectors a. However, the confidence associated with all of the statements taken together is not 1 - α.

Intuitively, it would be desirable to associate a "collective" confidence coefficient of 1 - α with the confidence intervals that can be generated by all choices of a. However, a price must be paid for the convenience of a large confidence coefficient: intervals that are wider (less precise) than the interval of (5-21) for a specific choice of a.

Given a data set x1, x2, ..., xn and a particular a, the confidence interval in (5-21) is that set of a'μ values for which

    |t| = | √n(a'x̄ - a'μ)/√(a'Sa) | ≤ t_{n-1}(α/2)

or, equivalently,

    t² = n(a'x̄ - a'μ)²/(a'Sa) = n(a'(x̄ - μ))²/(a'Sa) ≤ t²_{n-1}(α/2)                        (5-22)

A simultaneous confidence region is given by the set of a'μ values such that t² is relatively small for all choices of a. It seems reasonable to expect that the constant t²_{n-1}(α/2) in (5-22) will be replaced by a larger value, c², when statements are developed for many choices of a.

Considering the values of a for which t² ≤ c², we are naturally led to the determination of

    max_a t² = max_a n(a'(x̄ - μ))²/(a'Sa)

Using the maximization lemma (2-50) with x = a, d = (x̄ - μ), and B = S, we get

    max_a n(a'(x̄ - μ))²/(a'Sa) = n[ max_a (a'(x̄ - μ))²/(a'Sa) ] = n(x̄ - μ)'S^{-1}(x̄ - μ) = T²   (5-23)

with the maximum occurring for a proportional to S^{-1}(x̄ - μ).

Result 5.3. Let X1, X2, ..., Xn be a random sample from an Np(μ, Σ) population with Σ positive definite. Then, simultaneously for all a, the interval

    ( a'X̄ - √( (p(n-1)/(n-p)) F_{p,n-p}(α) ) √(a'Sa/n),  a'X̄ + √( (p(n-1)/(n-p)) F_{p,n-p}(α) ) √(a'Sa/n) )

will contain a'μ with probability 1 - α.

Proof. From (5-23),

    T² = n(x̄ - μ)'S^{-1}(x̄ - μ) ≤ c²   implies   n(a'x̄ - a'μ)²/(a'Sa) ≤ c²   for every a

or

    a'x̄ - c√(a'Sa/n) ≤ a'μ ≤ a'x̄ + c√(a'Sa/n)   for every a

Choosing c² = p(n-1) F_{p,n-p}(α)/(n-p) [see (5-6)] gives intervals that will contain a'μ for all a, with probability 1 - α = P[T² ≤ c²]. ■

It is convenient to refer to the simultaneous intervals of Result 5.3 as T²-intervals, since the coverage probability is determined by the distribution of T². The successive choices a' = [1, 0, ..., 0], a' = [0, 1, 0, ..., 0], and so on through a' = [0, ..., 0, 1], for which a'Sa = s11, s22, ..., spp, allow us to conclude that

    x̄i - √( (p(n-1)/(n-p)) F_{p,n-p}(α) ) √(sii/n) ≤ μi ≤ x̄i + √( (p(n-1)/(n-p)) F_{p,n-p}(α) ) √(sii/n),
    i = 1, 2, ..., p                                                                         (5-24)

all hold simultaneously with confidence coefficient 1 - α. Note that, without modifying the coefficient 1 - α, we can make statements about the differences μi - μk corresponding to a' = [0, ..., 0, ai, 0, ..., 0, ak, 0, ..., 0], where ai = 1 and ak = -1. In this case a'Sa = sii - 2sik + skk, and we have the statement

    x̄i - x̄k ± √( (p(n-1)/(n-p)) F_{p,n-p}(α) ) √( (sii - 2sik + skk)/n )   contains μi - μk   (5-25)

The simultaneous T² confidence intervals are ideal for "data snooping." The confidence coefficient 1 - α remains unchanged for any choice of a, so linear combinations of the components μi that merit inspection based upon an examination of the data can be estimated. In addition, according to the results in Supplement 5A, we can include the statements that (μi, μk) belongs to the sample mean-centered ellipse

    n [x̄i - μi, x̄k - μk] [sii sik; sik skk]^{-1} [x̄i - μi; x̄k - μk] ≤ (p(n-1)/(n-p)) F_{p,n-p}(α)

and still maintain the confidence coefficient (1 - α) for the whole set of statements.

The simultaneous T² confidence intervals for the individual components of a mean vector are just the shadows, or projections, of the confidence ellipsoid on the component axes. This connection between the shadows of the ellipsoid and the simultaneous confidence intervals given by (5-24) is illustrated in the next example.

Example 5.4 (Simultaneous confidence intervals as shadows of the confidence ellipsoid) In Example 5.3, we obtained the 95% confidence ellipse for the means of the fourth roots of the door-closed and door-open microwave radiation measurements. The 95% simultaneous T² intervals for the two component means are, from (5-24),

    .564 ± √((2(41)/40) 3.23) √(.0144/42)   or   (.516, .612)   for μ1
    .603 ± √((2(41)/40) 3.23) √(.0146/42)   or   (.555, .651)   for μ2

In Figure 5.2, we have redrawn the 95% confidence ellipse from Example 5.3. The 95% simultaneous intervals are shown as shadows, or projections, of this ellipse on the axes of the component means. ■

[Figure 5.2: Simultaneous T²-intervals for the component means as shadows of the confidence ellipse on the axes, microwave radiation data.]

Example 5.5 (Constructing simultaneous confidence intervals and ellipses) The scores obtained by n = 87 college students on the College Level Examination Program (CLEP) subtest X1 and the College Qualification Test (CQT) subtests X2 and X3 are given in Table 5.2 for X1 = social science and history, X2 = verbal, and X3 = science.

[Table 5.2: College test data, the n = 87 observations on X1 (social science and history), X2 (verbal), and X3 (science). Source: Data courtesy of Richard W. Johnson.]

These data give

    x̄ = [526.29, 54.69, 25.13]'   and   S = [5808.06  597.84  222.03
                                              597.84  126.05   23.39
                                              222.03   23.39   23.11]

Let us compute the 95% simultaneous confidence intervals for μ1, μ2, and μ3. We have

    (p(n-1)/(n-p)) F_{p,n-p}(.05) = (3(87-1)/(87-3)) F_{3,84}(.05) = (3(86)/84)(2.7) = 8.29

and we obtain the simultaneous confidence statements [see (5-24)]

    526.29 ± √8.29 √(5808.06/87)   or   502.77 ≤ μ1 ≤ 549.81
     54.69 ± √8.29 √(126.05/87)    or    51.22 ≤ μ2 ≤ 58.16
     25.13 ± √8.29 √(23.11/87)     or    23.65 ≤ μ3 ≤ 26.61

With the possible exception of the verbal scores, the marginal Q-Q plots and two-dimensional scatter plots do not reveal any serious departures from normality for the college qualification test data. (See Exercise 5.18.) Moreover, the sample size is large enough to justify the methodology, even though the data are not quite normally distributed. (See Section 5.5.)

The simultaneous T²-intervals above are wider than univariate intervals because all three must hold with 95% confidence. They may also be wider than necessary, because, with the same confidence, we can make statements about differences. For instance, with a' = [0, 1, -1], the interval for μ2 - μ3 has endpoints

    (x̄2 - x̄3) ± √( (p(n-1)/(n-p)) F_{p,n-p}(.05) ) √( (s22 - 2s23 + s33)/n )
        = (54.69 - 25.13) ± √8.29 √( (126.05 - 2(23.39) + 23.11)/87 ) = 29.56 ± 3.12

so (26.44, 32.68) is a 95% confidence interval for μ2 - μ3. Simultaneous intervals can also be constructed for the other differences.

Finally, we can construct confidence ellipses for pairs of means, and the same 95% confidence holds. For example, for the pair (μ2, μ3), we have

    87 [54.69 - μ2, 25.13 - μ3] [126.05 23.39; 23.39 23.11]^{-1} [54.69 - μ2; 25.13 - μ3] ≤ 8.29

This ellipse is shown in Figure 5.3, along with the 95% confidence ellipses for the other two pairs of means. The projections, or shadows, of these ellipses on the axes are also indicated, and these projections are the T²-intervals. ■

[Figure 5.3: 95% confidence ellipses for pairs of means and the simultaneous T²-intervals, college test data.]

A Comparison of Simultaneous Confidence Intervals with One-at-a-Time Intervals

An alternative approach to the construction of confidence intervals is to consider the components μi one at a time, as suggested by (5-21) with a' = [0, ..., 0, ai, 0, ..., 0], where ai = 1. This approach ignores the covariance structure of the p variables and leads to the intervals

    x̄i - t_{n-1}(α/2) √(sii/n) ≤ μi ≤ x̄i + t_{n-1}(α/2) √(sii/n),   i = 1, 2, ..., p         (5-27)

Although prior to sampling, the ith interval has probability 1 - α of covering μi, we do not know what to assert, in general, about the probability of all intervals containing their respective μi's. As we have pointed out, this probability is not 1 - α.

To shed some light on the problem, consider the special case where the observations have a joint normal distribution and

    Σ = [σ11   0  ...   0
           0  σ22 ...   0
           ⋮    ⋮        ⋮
           0    0  ... σpp]

Since the observations on the first variable are independent of those on the second variable, and so on, the product rule for independent events can be applied. Before the sample is selected,
P[allt_intervalsin(527)containthelL.3 95 % confidence ellipses for pairs of means and the simultaneous T 2 intervalscollege test data.a. in this case the simultaneous intervals are lOD( 4. For 1 ..95. we do not know what to assert.. say. if this is the only inference to be made.74..960 p=4 4.
1  (:1 + : + .327.Ll:$ XI + fij _ t (!:. . 3 .L2+ ···+ a J.... all iJ .ed from !llectio n of data. p[x.".651 . ILl and ILz.560 .l)pFp. Bonferron i 00 =1a .m [C ] = 1. noting that n = 42 and 1 1(. denote a confiden ce state.p ) rs2j 2p '1.607 . m. . Often.e e interval estimates for the restricted set consIstm g Let us develop ..603 ± 2...Lj of J.. The percent age point tn_l(a/2p ) replaces V(n . . _ The method for inequality carrying that name. + : ) mtenns .3 and 5. : \j. from (528).P(Cjtrue » rsn = 1 ..) fs2i.4 The 95% T2 and 95% Bonferroni simultaneous confiden ce intervals for the component meansm icrowav e radiation data.327 I0144 42 or . o . allows an investiInequality (528)... 2. . f th I ss important . of the fourth roots of the doorclo sed and doorop en measure ments with Cli = . i = 1... I earcomb matlOnS 311L.646 .' . with a· = a/m.a/m. XI  o . 1 .604 Figure S.560:$ IL2 :$ .false) i=l = 1 .2:.. If th: number m of 11 simultaneous confidence interval matIons s can be '+aJ .564 ± 2. the Bonferr oni interval falls within the T 2interval .h Therefo re.. Exercis e 5.05/2(2» = t41(.n _p(a)/(n .0125) = 2. ..555 X2 f¥:$ J.L.4.. pnor to the .3.. For each component mean. Consequ ently. small number of individual confidence statements. another chOice or . Example S. f the Bonferroni inequality. a special case 0 + + .327 ). . rs.LI ' 2 ( P ecise) than the simultan eous T 2mterval s.516 521 I 0. is also the flexibility of tion structure behmd the confl ence of important statements and balancin g it by controll ing the error rate for a group . contains J.a· i = 1.Lj] = 1 . to get 4 The Bonferroni Method of Multiple Comparisons . develop ed that are shorter pr mparis ons is called the Bonferroni method. p(C. . + a regardless of the correlagator to control the.p)..3. In t h ese SI ua I T d component means ILi or linear corn b" Result 5.552 .(al + a2 + .. attentIOn IS t?bla d better than the simultan eous intervals of 't fons it IS pOSSI e to 0 .k17·612 0. I ± t 111 2m '1. fonnation on the relative importa nce of these of the components J.2. Lackmg ID. overall al stat::nents. I = Figure 5.521 :$ ILl :$ .L are ..a. but otherwi se the intervals are of the same structur e.I ± t111 i = 1. We shall obtain the simultan eous 95% Bonferr oni confidence interval s for the means.500 . (1 1 .232 Chapter 5 Inferences about a Mean Vector Confidence Regions and Simultaneous Comparisons of Component Means 233 The stateme nts in (529) can be compar ed with those in (524).2... . The. we have. Wit an overall confidence . requI'red Let C. \j. the rectangu lar Goint) region formed by the two Bonferr oni interval s is contain ed in the rectangu lar region formed by the two T 2intervals.level greater than or equal to 1 .L2 :$ X2 + tnI(. + am) . IL2 from Figure 5. ..4 shows the 95% T2 simultan eous confiden ce intervals for ILl.' " 3 J. SIDce P[X.. Now (see ment about the value of aiIL WIth P i t r u e" .0146 42 or .P[ at least one Ci false] m i Xl ± t41(·0125) X2 ± t41(·0125) fsU = y. = alJ.:$ J.Lj.• m .:.32 /L'. because It IS develop. .Llssm a. If we are interest ed only in the compon ent means. we can make the following m = p statements.. confidence stateme nts about m linSuppose that.6 (Constructing Bonferroni simultaneous confidence interval s and comparing them with T2 intervals) Let us return to the microwa ve oven radiatio n data in Exampl es 5.05/2.2. We make use of the results in Exampl e 5. along with the correspo nding 95% Bonferr oni intervals . P[ all C true] = 1 .:. 
the Bonferr oni interval s provide more precise estimate s than we oooOd: I. nl (529) 0.m.646 contains J..6).
48 . the Bonferroni intervals will always be shorter.58 .n_p(a)/(n . Both tests of hypotheses and simultaneous confidence statements will then possess (approximately) their nominal levels. . How much shorter is indicated in Table 5.IL) is approximately X2 with p d. Let XI.2. if the observed n(x .(a») == 1 .69 .. in addition. the more efficiently the sample information will be utilized in making inferences. A closer examination. since (x... Xn be a random sample from a population with mean which does not depend on the random quantities Xand S.. Length of Bonferroni interval = Length of T2interval tn I (Cl/2m ) .15. . for a small number m of specified parametric functions a' IL.4 with the corresponding normal theory test in (57). the hypothesis Ho: fL = lLa is rejected in favor of HI: IL .17.16.91 . All largesample inferences about IL are based on a . From (428). tlH'''lUgOtlS the closer the underlying population is to multivariate normal. the 95% confidence region for IL gives the plausible values for the pairs (ILl.1L)'(n1Srl(X .75 . ILk)' i. . will contain a' IL. In fact. This follows directly from the fact that (n .p lLa.a)% simultaneous confidence statements s.91 4 . the sample meancentered ellipses .Fp' np( Cl) n. . and thus.5.80 . k = 1.88 .91 .66 15 25 50 100 00 Comparing the test in Result 5.) Result 5. On the other hand.. The advantages associated with large samples may be partially offset by a loss in sample information caused by using only the summary statistics X.4 is appropriate. and S. S) is a sufficient summary for normal populations [see (421)].fL) = n(X .4 for selected nand p. but the critical values are different.IL)'SI(X . n 2 . These procedures are summarized in Results 5.1) '". We see from Table 5. reveals that both tests yield essentially the same result in situations where the x2test of Result 5. IL and positive definite covariance matrix :to When n . On the other hand. tests of hypotheses and confidence regions for IL can be constructed without the assumption of a normal population.78 .f. and 5. 1L2) when the correlation between the measured variables is taken into account...fL) :5 A1.Cl = .90 . X 2. with probability approximately 1 .4 that the Bonferroni method provides shorter intervals when m = p. X 2.05/m m=p mately a.81 10 .62 . Let XI. however. we can make the 100(1 . Because they are easy to apply and provide the relatively short confidence intervals needed for inference. . we know that (X . As illustrated in Exercises 5.a (531) where is the upper (l00a)th percentile of the Equation (531) immediately leads to large sample tests of hypotheses and simultaneous confidence regions.5. for all pairs (lLi.(a) Large Sample Inferences about a Population Mean Vector When the sample size is large. Table S.fLo) > A1.29 .IL)'SI(X .95 and Cli = .p) and are approximately equal for n large relative to p. Consequently. • The Bonferroni intervals for linear combinations a' IL and the T 2intervals (recall Result 5. serious departures from a normal population can be overcome by large sample sizes.s XI ± V A1. a'X ± V Ja'sa .(a) V A1..a. we are able to make inferences about the population mean even though the parent distribution is discrete.4 (Length of Bonferroni Interval)/(Length of T 2Interval) for 1 .3) have the same general form: a'X ± (critical value) Consequently. .p is large. for large n. X2 ± f¥ fi} contains ILl contains 1L2 contains ILp and.p is large. 
Xn be a random sample from a population with mean IL and positive definite covariance :to If n .4.(a) • Here a) is the upper (100a )th percentile of a chisquare distribution with p dJ. _ n P[n(X .lLa)'SI(x . p.idistribution. at a level of significance approxi Result S.As we have pointed out.5.p . in every instance where Cli = Cl/ rn.4 and 5.. we see that the test statistics have the same structure. (See Tables 3 and 4 in the appendix.234 Chapter 5 Inferences about a Mean Vector Large Sample Inferences about a Population Mean Vector 235 the T 2intervals. for every a.l)pFp. we will often apply simultaneous tintervals based on the Bonferroni method.
i = 1.6 ± Y12. .6 ± Y12.53 :s f.0..L6 :s 23.L3 :s 36.6 35. .) 5.D2 26.. The Bonferroni simultaneous confidence intervals for the m = p statements about the individual means take the same form.2 with c2 = a).4 and 5. Summary statistics for part of the data setare given in Table 5..1 26. with approxi mately 90. .12 3. 28. The next example allows us to illustrate the construction of large sample simultaneous statements for all single mean components.L3 contains f. rs...5 :s f.. .z (a) Vrs.2 v'% Based.0. It is good statistical practice to subject these large sample inference procedures to the same checks required of the normaltheory methods. . certainly larger sample sizes are required for the asymptotic distributions to provide good approximations to the true distributions of various test statistics.1 22.% confidence. .1. . we simply state that f'I . for example. sample sizes in the range 3D to 50.5 are useful only for very large samples.2. is much different than an application with p = 52 and sample size 100 although both have n . Lacking definitive studies.1 ±YI2. and meter components of 1'0 do not appear to be plausible values for the corresponding means of Finnish scores. simultaneous 90.. where ai = 1. [2b.82 5.39 :s f.0]. > ± V Jf. . level of approximately 1 ..76 3. the true error rate may be far removed from the nominal level a..Li are obtained by the special choices a' = [0.3 12.. The question of what is a large sample size is not easy to answer. 22. including transformations. In one or two dimensions.L6 contains f.0..L2 contains f.0.0. 2p i = 1. "2 V. ..7 ± 20.2 23.2.22] Example S.L5 contains f. Specifically. The overall confidence.27 :s f. The ellipsoids for pairs of means follow from Result 5A. An application with p = 2 and sample size 50. upon thousands of American students. ' ..2..236 Chapter 5 Inferences about a Mean Vector Large Sample Inferences about a Population Mean Vector 237 Let us construct 90..2. As the number characteristics bec9mes large. In some instances.D2 35. . the oneatatime confidence intervals for individual means are 12thGrade Finnish Students Participating in a Standardization Program Raw score Variable Xl X2 X3 X4 X5 X6 X7 Mean (Xi) 28.Z f.5.0.31.27 :s f. :s Xi  2p f.61 :s f.7.2 ± Y12. are desirable.85 3. with c2 = The probability level is a consequence of (531)..22.L7 :s 24. the investigator could hypothesize the musical aptitude profile to be 1'0 = [31.67 or or or or or 34.0.. The statements for the f. .93 22.% simultaneous confidence intervals for the individual mean components f. If. .5. once again. . .P must be large and realize that the true case is more complicated.D2 34.% confidence limits are given by Xi Proof. on the basis of QQ plots and other investigative devices outliers and other forms of extreme departures are indicated (see.. tempo.4 34. . . .D2 22.p where z(a/2) is the upper l00(a/2)th percentile of the standard normal distribution.23.93 4. where = 12. When the sample size is large..7.39 21.a for all statements is. 0.D2 96 96 96 96 96 96 contains f..L2 :s 28. i = 1.L5 :s 24.Li :s Xi + Z (a) rs.S Musical Aptitude Profile Means and Standard Deviations for 96 We see from the simultaneous statements above that the melody.14 or 24. or Thus. extreme deviations could cause problems. Table S.2.Li' i = 1. 
From Result 5.27.4 ± Y12.7 (Constructing large sample simultaneous confidence intervals) A music educator tested thousands of FInnish students on their native musical ability in order to set national norms in Finland..3 = = = = = = = melody harmony tempo meter phrasing balance style . :s Xi . appropriate corrective actions.L7 26..Li :s Xi + Z (a) V.(a) "2 yrs..34. i = 1.13 vT2.75 32. perhaps.6 22. Results 5.LI :s 30. can usually be considered large.0. ai.2. Sell.76 5..L4 :s 36.7 Standard deviation (\t'S. Although small to moderate departures from normality do not cause any difficulties for n large. P .06 :s f. a result of the large • sample distribtltion theory summarized in (531). These statistics are based on a sample of n = 96 Finnish 12th graders.0.L4 contains f. but use the modified percentile z( a/2p) to give Source: Data courtesy ofY.LI contains f.. Methods for testing mean vectors of symmetric multivariate distributions that are relatively insensitive to departures from normality are discussed in [11]. ± Y12.p = 48. The first part follows from Result 5A.D2 23. p.02 4.
Table 5.6 gives the individual, Bonferroni, and chi-square-based (or shadow of the confidence ellipsoid) intervals for the musical aptitude data in Example 5.7.

Table 5.6 The Large Sample 95% Individual, Bonferroni, and T²-Intervals for the Musical Aptitude Data

Variable         One-at-a-time        Bonferroni         Shadow of Ellipsoid
                 Lower    Upper       Lower    Upper     Lower    Upper
x₁ = melody      26.95    29.25       26.52    29.68     25.90    30.30
x₂ = harmony     25.43    27.77       24.99    28.21     24.36    28.84
x₃ = tempo       34.64    36.16       34.35    36.45     33.94    36.86
x₄ = meter       33.18    35.22       32.79    35.61     32.24    36.16
x₅ = phrasing    22.85    24.35       22.57    24.63     22.16    25.04
x₆ = balance     21.21    22.79       20.92    23.08     20.50    23.50
x₇ = style       21.89    23.51       21.59    23.81     21.16    24.24

The one-at-a-time confidence intervals use z(.025) = 1.96. The simultaneous Bonferroni intervals use z(.025/7) = 2.69. The simultaneous T², or shadows of the ellipsoid, use √(χ²₇(.05)) = √14.07 = 3.75.

Although the sample size may be large, some statisticians prefer to retain the F- and t-based percentiles rather than use the chi-square or standard-normal-based percentiles. The latter constants are the infinite sample size limits of the former constants. The F and t percentiles produce larger intervals and, hence, are more conservative. Table 5.7 gives the individual, Bonferroni, and F-based (or shadow of the confidence ellipsoid) intervals for the musical aptitude data.

Table 5.7 The 95% Individual, Bonferroni, and T²-Intervals for the Musical Aptitude Data

Variable         One-at-a-time        Bonferroni         Shadow of Ellipsoid
                 Lower    Upper       Lower    Upper     Lower    Upper
x₁ = melody      26.93    29.27       26.48    29.72     25.77    30.43
x₂ = harmony     25.41    27.79       24.96    28.24     24.23    28.97
x₃ = tempo       34.63    36.17       34.33    36.47     33.85    36.95
x₄ = meter       33.16    35.24       32.76    35.64     32.13    36.27
x₅ = phrasing    22.84    24.36       22.55    24.66     22.08    25.12
x₆ = balance     21.20    22.80       20.90    23.10     20.41    23.59
x₇ = style       21.88    23.52       21.57    23.83     21.07    24.33

The one-at-a-time confidence intervals use t₉₅(.025) = 1.99. The simultaneous Bonferroni intervals use t₉₅(.025/7) = 2.75. The simultaneous T², or shadows of the ellipsoid, use √((95(7)/89) F₇,₈₉(.05)) = 3.97.

Comparing Table 5.7 with Table 5.6, we see that all of the intervals in Table 5.7 are larger. However, with the relatively large sample size n = 96, the differences are typically in the third, or tenths, digit.
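A sketch, assuming SciPy, of how the six multipliers quoted beneath Tables 5.6 and 5.7 can be computed; it makes the comparison between the large sample (z, chi-square) constants and the exact normal-theory (t, F) constants concrete.

```python
# Sketch: interval multipliers for n = 96, p = 7 at overall level alpha = .05.
import numpy as np
from scipy.stats import norm, t, f, chi2

n, p, alpha = 96, 7, 0.05
z_one  = norm.ppf(1 - alpha / 2)               # 1.96, one-at-a-time
z_bonf = norm.ppf(1 - alpha / (2 * p))         # 2.69, Bonferroni
c_chi2 = np.sqrt(chi2.ppf(1 - alpha, p))       # 3.75, shadow of ellipsoid
t_one  = t.ppf(1 - alpha / 2, df=n - 1)        # 1.99
t_bonf = t.ppf(1 - alpha / (2 * p), df=n - 1)  # 2.75
c_F    = np.sqrt((n - 1) * p / (n - p) * f.ppf(1 - alpha, p, n - p))  # 3.97
print(z_one, z_bonf, c_chi2, t_one, t_bonf, c_F)
```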
5.6 Multivariate Quality Control Charts

To improve the quality of goods and services, data need to be examined for causes of variation. When a manufacturing process is continuously producing items, or when we are monitoring activities of a service, data should be collected to evaluate the capabilities and stability of the process. When a process is stable, the variation is produced by common causes that are always present, and no one cause is a major source of variation.

The purpose of any control chart is to identify occurrences of special causes of variation that come from outside of the usual process. These causes of variation often indicate a need for a timely repair, but they can also suggest improvements to the process. Control charts make the variation visible and allow one to distinguish common from special causes of variation.

A control chart typically consists of data plotted in time order and horizontal lines, called control limits, that indicate the amount of variation due to common causes. One useful control chart is the X̄ chart (read X-bar chart). To create an X̄ chart:

1. Plot the individual observations or sample means in time order.
2. Create and plot the centerline x̄, the sample mean of all of the observations.
3. Calculate and plot the control limits given by

   Upper control limit (UCL) = x̄ + 3(standard deviation)
   Lower control limit (LCL) = x̄ − 3(standard deviation)

The standard deviation in the control limits is the estimated standard deviation of the observations being plotted. For single observations, it is often the sample standard deviation. If the means of subsamples of size m are plotted, then the standard deviation is the sample standard deviation divided by √m. The control limits of plus and minus three standard deviations are chosen so that there is a very small chance, assuming normally distributed data, of falsely signaling an out-of-control observation, that is, an observation suggesting a special cause of variation.

Example 5.8 (Creating a univariate control chart) The Madison, Wisconsin, police department regularly monitors many of its activities as part of an ongoing quality improvement program. Table 5.8 gives the data on five different kinds of overtime hours. Each observation represents a total for 12 pay periods, or about half a year.

We examine the stability of the legal appearances overtime hours. A computer calculation gives x̄₁ = 3558, and this is the centerline. Since individual values will be plotted, the sample standard deviation is √s₁₁ = 607, and the control limits are

UCL = x̄₁ + 3(607) = 5379
LCL = x̄₁ − 3(607) = 1737

The data, along with the centerline and control limits, are plotted as an X̄ chart in Figure 5.5.

Figure 5.5 The X̄ chart for x₁ = legal appearances overtime hours.

The legal appearances overtime hours are stable over the period in which the data were collected. The variation in overtime hours appears to be due to common causes, so no special-cause variation is indicated. ■
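A minimal sketch, assuming NumPy and the legal appearances column of Table 5.8, of the centerline and three-sigma limits computed in Example 5.8.

```python
# Sketch: univariate X-bar chart limits for the legal appearances overtime data.
import numpy as np

x = np.array([3387, 3109, 2670, 3125, 3469, 3120, 3671, 4531,
              3678, 3238, 3135, 5217, 3728, 3506, 3824, 3516])
center = x.mean()                  # about 3558
sd = x.std(ddof=1)                 # about 607
ucl, lcl = center + 3 * sd, center - 3 * sd
print(f"center {center:.0f}, UCL {ucl:.0f}, LCL {lcl:.0f}")
out = np.where((x > ucl) | (x < lcl))[0] + 1
print("periods signaling special-cause variation:", out)  # none for this series
```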
Table 5.8 Five Types of Overtime Hours for the Madison, Wisconsin, Police Department

x₁ Legal        x₂ Extraordinary   x₃ Holdover   x₄ COA (Compensatory    x₅ Meeting
Appearances     Event Hours        Hours         Overtime Allowed)       Hours
Hours                                            Hours
3387            2200               1181          14,861                   236
3109             875               3532          11,367                   310
2670             957               2502          13,329                  1182
3125            1758               4510          12,328                  1208
3469             868               3032          12,847                  1385
3120             398               2130          13,979                  1053
3671            1603               1982          13,528                  1046
4531             523               4675          12,699                  1100
3678            2034               2354          13,534                  1349
3238            1136               4606          11,609                  1150
3135            5326               3044          14,189                  1216
5217            1658               3340          15,052                   660
3728            1945               2111          12,236                   299
3506             344               1291          15,482                   206
3824             807               1365          14,900                   239
3516            1223               1175          15,236                   161

With more than one important characteristic, a multivariate approach should be used to monitor process stability. Such an approach can account for correlations between characteristics and will control the overall probability of falsely signaling a special cause of variation when one is not present. High correlations among the variables can make it impossible to assess the overall error rate that is implied by a large number of univariate charts.

The two most common multivariate charts are (i) the ellipse format chart and (ii) the T²-chart. Two cases that arise in practice need to be treated differently:

1. Monitoring the stability of a given sample of multivariate observations
2. Setting a control region for future observations

Initially, we consider the use of multivariate control procedures for a sample of multivariate observations x₁, ..., xₙ. Later, we discuss these procedures when the observations are subgroup means.

Charts for Monitoring a Sample of Individual Multivariate Observations for Stability

We assume that X₁, X₂, ..., Xₙ are independently distributed as Nₚ(μ, Σ). By Result 4.8,

X_j − X̄ = (1 − 1/n) X_j − (1/n) X₁ − ... − (1/n) X_{j−1} − (1/n) X_{j+1} − ... − (1/n) Xₙ

has

E(X_j − X̄) = 0   and   Cov(X_j − X̄) = (1 − 1/n)² Σ + (n − 1) n⁻² Σ = ((n − 1)/n) Σ

Each X_j − X̄ has a normal distribution, but X_j − X̄ is not independent of the sample covariance matrix S. However, to set control limits, we approximate that (X_j − X̄)' S⁻¹ (X_j − X̄) has a chi-square distribution.

Ellipse Format Chart. The ellipse format chart for a bivariate control region is the more intuitive of the charts, but its approach is limited to two variables. The two characteristics on the jth unit are plotted as a pair (x_{j1}, x_{j2}). The 95% quality ellipse consists of all x that satisfy

(x − x̄)' S⁻¹ (x − x̄) ≤ χ²₂(.05)    (5-32)
Example 5.9 (An ellipse format chart for overtime hours) Let us refer to Example 5.8 and create a quality ellipse for the pair of overtime characteristics (legal appearances, extraordinary event) hours. A computer calculation gives

x̄ = [3558, 1478]'   and   S = [ 367,844.7   −72,093.8 ]
                              [ −72,093.8  1,399,053.1 ]

We illustrate the quality ellipse format chart using the 99% ellipse, which consists of all x that satisfy (x − x̄)' S⁻¹ (x − x̄) ≤ χ²₂(.01). Here p = 2, so χ²₂(.01) = 9.21, and the ellipse becomes

(s₁₁ s₂₂ / (s₁₁ s₂₂ − s₁₂²)) [ (x₁ − x̄₁)²/s₁₁ − 2 s₁₂ (x₁ − x̄₁)(x₂ − x̄₂)/(s₁₁ s₂₂) + (x₂ − x̄₂)²/s₂₂ ] ≤ 9.21

This ellipse format chart is graphed, along with the pairs of data, in Figure 5.6.

Figure 5.6 The quality control 99% ellipse for legal appearances and extraordinary event overtime.

Notice that one point, indicated with an arrow, is definitely outside of the ellipse. When a point is out of the control region, individual X̄ charts are constructed. The X̄ chart for x₁ was given in Figure 5.5; that for x₂ is given in Figure 5.7.

Figure 5.7 The X̄ chart for x₂ = extraordinary event hours.

When the lower control limit is less than zero for data that must be nonnegative, it is generally set to zero. The LCL = 0 limit is shown by the dashed line in Figure 5.7.

Was there a special cause of the single point for extraordinary event overtime that is outside the upper control limit in Figure 5.7? During this period, the United States bombed a foreign capital, and students at Madison were protesting. A majority of the extraordinary overtime was used in that four-week period. Although, by its very definition, extraordinary overtime occurs only when special events occur and is therefore unpredictable, it still has a certain stability. ■

T²-Chart. A T²-chart can be applied to a large number of characteristics. Unlike the ellipse format, it is not limited to two variables. Moreover, the points are displayed in time order rather than as a scatter plot, and this makes patterns and trends visible. For the jth point, we calculate the T²-statistic

T²_j = (x_j − x̄)' S⁻¹ (x_j − x̄)    (5-33)

Notice that the T²-statistic is the same as the quantity d²_j used to test normality in Section 4.6. We then plot the T²-values on a time axis. The lower control limit is zero, and we use the upper control limit

UCL = χ²_p(.05),   or, sometimes, χ²_p(.01)

There is no centerline in the T²-chart.
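A sketch of the T²-statistics of (5-33) for the bivariate overtime data, assuming the pairing of the first two columns of Table 5.8 as reconstructed above; it anticipates the chart of Example 5.10.

```python
# Sketch: T^2 values and the 99% chi-square limit for the overtime pairs.
import numpy as np
from scipy.stats import chi2

legal = np.array([3387, 3109, 2670, 3125, 3469, 3120, 3671, 4531,
                  3678, 3238, 3135, 5217, 3728, 3506, 3824, 3516])
extra = np.array([2200,  875,  957, 1758,  868,  398, 1603,  523,
                  2034, 1136, 5326, 1658, 1945,  344,  807, 1223])
X = np.column_stack([legal, extra])
xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)                 # unbiased sample covariance
Sinv = np.linalg.inv(S)
d = X - xbar
T2 = np.einsum("ij,jk,ik->i", d, Sinv, d)   # (x_j - xbar)' S^{-1} (x_j - xbar)
ucl = chi2.ppf(0.99, df=2)                  # 9.21
print(np.round(T2, 2), " UCL:", round(ucl, 2))
print("out of control at periods:", np.where(T2 > ucl)[0] + 1)  # period 11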
Example 5.10 (A T²-chart for overtime hours) Using the police department data in Example 5.8, we construct a T²-plot based on the two variables x₁ = legal appearances hours and x₂ = extraordinary event hours. T²-charts with more than two variables are considered in Exercise 5.26. We take α = .01 to be consistent with the ellipse format chart in Example 5.9.

The T²-chart in Figure 5.8 reveals that the pair (legal appearances, extraordinary event) hours for period 11 is out of control. Further investigation, as in Example 5.9, confirms that this is due to the large value of extraordinary event overtime during that period. ■

Figure 5.8 The T²-chart for legal appearances hours and extraordinary event hours, α = .01.

When the multivariate T²-chart signals that the jth unit is out of control, it should be determined which variables are responsible. A modified region based on Bonferroni intervals is frequently chosen for this purpose. The kth variable is out of control if x_{jk} does not lie in the interval

(x̄_k − z(.005/p) √s_kk,  x̄_k + z(.005/p) √s_kk)

where p is the total number of measured variables.
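A sketch of this Bonferroni screen; the z-based limit used here is one common choice (an assumption of this illustration), with α = .01 matching the chart limit above.

```python
# Sketch: flag which of the p variables drove a T^2 signal at time j.
import numpy as np
from scipy.stats import norm

def out_of_control_vars(xj, xbar, S, alpha=0.01):
    p = len(xbar)
    zcrit = norm.ppf(1 - alpha / (2 * p))       # Bonferroni-adjusted percentile
    half = zcrit * np.sqrt(np.diag(S))
    return np.where(np.abs(xj - xbar) > half)[0]  # indices of flagged variables
```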
Example 5.11 (Control of robotic welders: more than T² needed) The assembly of a driveshaft for an automobile requires the circle welding of tube yokes to a tube. The inputs to the automated welding machines must be controlled to be within certain operating limits where a machine produces welds of good quality. In order to control the process, one process engineer measured four critical variables:

X₁ = Voltage (volts)
X₂ = Current (amps)
X₃ = Feed speed (in/min)
X₄ = (inert) Gas flow (cfm)

Table 5.9 gives the values of these variables at five-second intervals.

Table 5.9 Welder Data (cases 1 through 40: voltage X₁, current X₂, feed speed X₃, and gas flow X₄, recorded at five-second intervals)
Source: Data courtesy of Mark Abbotoy.

The normal assumption is reasonable for most variables, but we take the natural logarithm of gas flow. In addition, there is no appreciable serial correlation for successive observations on each variable.

A T²-chart for the four welding variables is given in Figure 5.9. The dotted line is the 95% limit and the solid line is the 99% limit. Using the 99% limit, no points are out of control, but case 31 is outside the 95% limit.

Figure 5.9 The T²-chart for the welding data with 95% and 99% limits.

What do the quality control ellipses (ellipse format charts) show for two variables? Most of the variables are in control. However, the 99% quality ellipse for ln(gas flow) and voltage, shown in Figure 5.10, reveals that case 31 is out of control, and this is due to an unusually large volume of gas flow.

Figure 5.10 The 99% quality control ellipse for ln(gas flow) and voltage.

The univariate X̄ chart for ln(gas flow), in Figure 5.11, shows that this point is outside the three sigma limits (UCL = 4.005, mean = 3.951, LCL = 3.896). It appears that gas flow was reset at the target for case 32. All of the points are then in control.

Figure 5.11 The univariate X̄ chart for ln(gas flow).

In this example, a shift in a single variable was masked with 99% limits, or almost masked (with 95% limits), by being combined into a single T²-value. ■

Control Regions for Future Individual Observations

The goal now is to use data x₁, x₂, ..., xₙ, collected when a process is stable, to set a control region for a future observation x or future observations. The region in which a future observation is expected to lie is called a forecast, or prediction, region. If the process is stable, we take the observations to be independently distributed as Nₚ(μ, Σ). Because these regions are of more general importance than just for monitoring quality, we give the basic distribution theory as Result 5.6.

Result 5.6. Let X₁, X₂, ..., Xₙ be independently distributed as Nₚ(μ, Σ), and let X be a future observation from the same distribution. Then

T² = (n/(n + 1)) (X − X̄)' S⁻¹ (X − X̄)   is distributed as   ((n − 1)p/(n − p)) F_{p, n−p}

and a 100(1 − α)% p-dimensional prediction ellipsoid is given by all x satisfying

(x − x̄)' S⁻¹ (x − x̄) ≤ ((n² − 1)p/(n(n − p))) F_{p, n−p}(α)

Proof. We first note that X − X̄ has mean 0. Since X is a future observation, X and X̄ are independent, so

Cov(X − X̄) = Cov(X) + Cov(X̄) = Σ + (1/n) Σ = ((n + 1)/n) Σ

and √(n/(n + 1)) (X − X̄) is distributed as Nₚ(0, Σ).
Now,

(√(n/(n + 1)) (X − X̄))' S⁻¹ (√(n/(n + 1)) (X − X̄))

which combines a multivariate normal, Nₚ(0, Σ), random vector and an independent Wishart, W_{p, n−1}(Σ), random matrix in the form

(multivariate normal)' ((Wishart random matrix)/(d.f.))⁻¹ (multivariate normal)

has the scaled F distribution claimed, according to (5-8) and the discussion on page 213. The constant for the ellipsoid follows from (5-6). ■

Note that the prediction region in Result 5.6 for a future observed value x is an ellipsoid. It is centered at the initial sample mean x̄, and its axes are determined by the eigenvectors of S. Since

P[(X − X̄)' S⁻¹ (X − X̄) ≤ ((n² − 1)p/(n(n − p))) F_{p, n−p}(α)] = 1 − α

before any new observations are taken, the probability that X will fall in the prediction ellipse is 1 − α.

Keep in mind that the current observations must be stable before they can be used to determine control regions for future observations. Based on Result 5.6, we obtain the two charts for future observations.

Control Ellipse for Future Observations. With p = 2, the 95% prediction ellipse in Result 5.6 specializes to

(x − x̄)' S⁻¹ (x − x̄) ≤ ((n² − 1)2/(n(n − 2))) F_{2, n−2}(.05)

Any future observation x is declared to be out of control if it falls out of the control ellipse.

Example 5.12 (A control ellipse for future overtime hours) In Example 5.9, we checked the stability of legal appearances and extraordinary event overtime hours. Let's use these data to determine a control region for future pairs of values. From Example 5.9 and Figure 5.6, we find that the pair of values for period 11 were out of control. We removed this point and determined the new 99% ellipse. All of the points are then in control, so they can serve to determine the 95% prediction region just defined for p = 2. This control ellipse is shown in Figure 5.12 along with the initial 15 stable observations. Any future observation falling in the ellipse is regarded as stable or in control. An observation outside of the ellipse represents a potential out-of-control observation or special-cause variation. ■

Figure 5.12 The 95% control ellipse for future legal appearances and extraordinary event overtime.

T²-Chart for Future Observations. For each new observation x, plot

T² = (n/(n + 1)) (x − x̄)' S⁻¹ (x − x̄)    (5-34)

in time order. Set LCL = 0, and take

UCL = ((n − 1)p/(n − p)) F_{p, n−p}(.05)

Points above the upper control limit represent potential special cause variation and suggest that the process in question should be examined to determine whether immediate corrective action is warranted. See [9] for discussion of other procedures.
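A minimal sketch of the future-observation chart of (5-34), assuming stable data x₁, ..., xₙ have already fixed x̄ and S.

```python
# Sketch: score a new observation x against the future-observation limit.
import numpy as np
from scipy.stats import f

def future_T2(x, xbar, Sinv, n):
    d = x - xbar
    return (n / (n + 1)) * d @ Sinv @ d      # statistic of (5-34)

def future_UCL(n, p, alpha=0.05):
    return (n - 1) * p / (n - p) * f.ppf(1 - alpha, p, n - p)
```

Because (n/(n + 1)) times the ellipsoid constant of Result 5.6 equals (n − 1)p F/(n − p), plotting the scaled statistic against this UCL is equivalent to checking membership in the prediction ellipsoid.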
Control Charts Based on Subsample Means

It is assumed that each random vector of observations from the process is independently distributed as Nₚ(μ, Σ). We proceed differently when the sampling procedure specifies that m > 1 units be selected, at the same time, from the process. From the first sample, we determine its sample mean X̄₁ and covariance matrix S₁. When the population is normal, these two random quantities are independent.

For a general subsample mean X̄_j, X̄_j − X̿ has a normal distribution with mean 0 and

Cov(X̄_j − X̿) = (1 − 1/n)² Cov(X̄_j) + ((n − 1)/n²) Cov(X̄₁) = ((n − 1)/n)(Σ/m)

where

X̿ = (1/n) Σⱼ₌₁ⁿ X̄_j

As will be described in Section 6.4, the sample covariances from the n subsamples can be combined to give a single estimate (called S_pooled in Chapter 6) of the common covariance Σ. This pooled estimate is

S = (1/n)(S₁ + S₂ + ... + Sₙ)

Here (nm − n)S is independent of each X̄_j and, therefore, of their mean X̿. Further, (nm − n)S is distributed as a Wishart random matrix with nm − n degrees of freedom. Notice that we are estimating Σ internally from the data collected in any given period. These estimators are combined to give a single estimator with a large number of degrees of freedom. Consequently,

T² = m(X̄_j − X̿)' S⁻¹ (X̄_j − X̿)    (5-35)

is distributed as

((n − 1)(m − 1)p/(nm − n − p + 1)) F_{p, nm−n−p+1}

Ellipse Format Chart. In an analogous fashion to our discussion on individual multivariate observations, the ellipse format chart for pairs of subsample means is

(x̄ − x̿)' S⁻¹ (x̄ − x̿) ≤ ((n − 1)(m − 1)2/(m(nm − n − 1))) F_{2, nm−n−1}(.05)    (5-36)

although the right-hand side is usually approximated as χ²₂(.05)/m. Subsamples corresponding to points outside of the ellipse should be carefully checked for changes in the behavior of the quality characteristics being measured. The interested reader is referred to [10] for additional discussion.

T²-Chart. To construct a T²-chart with subsample data and p characteristics, we plot the quantity T²_j = m(x̄_j − x̿)' S⁻¹ (x̄_j − x̿) for j = 1, 2, ..., n, where the

UCL = ((n − 1)(m − 1)p/(nm − n − p + 1)) F_{p, nm−n−p+1}(.05)

The UCL is often approximated as χ²_p(.05) when n is large. Values of T²_j that exceed the UCL correspond to potentially out-of-control or special cause variation, which should be checked. (See [10].)

Control Regions for Future Subsample Observations

Once data are collected from the stable operation of a process, they can be used to set control limits for future observed subsample means. If X̄ is a future subsample mean, then X̄ − X̿ has a multivariate normal distribution with mean 0 and

Cov(X̄ − X̿) = Cov(X̄) + (1/n) Cov(X̄₁) = ((n + 1)/n)(Σ/m)

Control Ellipse for Future Subsample Means. The prediction ellipse for a future subsample mean for p = 2 characteristics is defined by the set of all x̄ such that

(x̄ − x̿)' S⁻¹ (x̄ − x̿) ≤ ((n + 1)(m − 1)2/(m(nm − n − 1))) F_{2, nm−n−1}(.05)    (5-37)

where, again, the right-hand side is usually approximated as χ²₂(.05)/m.

T²-Chart for Future Subsample Means. As before, we bring n/(n + 1) into the control limit and plot the quantity

T² = m(x̄ − x̿)' S⁻¹ (x̄ − x̿)

for future sample means in chronological order. The upper control limit is then

UCL = ((n + 1)(m − 1)p/(nm − n − p + 1)) F_{p, nm−n−p+1}(.05)

The UCL is often approximated as χ²_p(.05) when n is large. Points outside of the prediction ellipse or above the UCL suggest that the current values of the quality characteristics are different in some way from those of the previous stable process. This may be good or bad, but almost certainly warrants a careful search for the reasons for the change.
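A sketch of the two F-based control limits just given, assuming n rational subgroups of size m on p characteristics.

```python
# Sketch: upper control limits for subsample-mean T^2 charts.
from scipy.stats import f

def ucl_current(n, m, p, alpha=0.05):
    # limit for T2_j = m (xbar_j - xbarbar)' S^{-1} (xbar_j - xbarbar), j = 1..n
    df2 = n * m - n - p + 1
    return (n - 1) * (m - 1) * p / df2 * f.ppf(1 - alpha, p, df2)

def ucl_future(n, m, p, alpha=0.05):
    # limit for a future subsample mean
    df2 = n * m - n - p + 1
    return (n + 1) * (m - 1) * p / df2 * f.ppf(1 - alpha, p, df2)
```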
5.7 Inferences about Mean Vectors When Some Observations Are Missing

Often, some components of a vector observation are unavailable. This may occur because of a breakdown in the recording equipment or because of the unwillingness of a respondent to answer a particular item on a survey questionnaire. The best way to handle incomplete observations, or missing values, depends, to a large extent, on the experimental context. If the pattern of missing values is closely tied to the value of the response, such as people with extremely high incomes who refuse to respond in a survey on salaries, subsequent inferences may be seriously biased. To date, no statistical techniques have been developed for these cases. However, we are able to treat situations where data are missing at random, that is, cases in which the chance mechanism responsible for the missing values is not influenced by the value of the variables.

A general approach for computing maximum likelihood estimates from incomplete data is given by Dempster, Laird, and Rubin [5]. Their technique, called the EM algorithm, consists of an iterative calculation involving two steps. We call them the prediction and estimation steps:

1. Prediction step. Given some estimate θ̃ of the unknown parameters, predict the contribution of any missing observation to the (complete-data) sufficient statistics.
2. Estimation step. Use the predicted sufficient statistics to compute a revised estimate of the parameters.

The calculation cycles from one step to the other, until the revised estimates do not differ appreciably from the estimate obtained in the previous iteration.

When the observations X₁, X₂, ..., Xₙ are a random sample from a p-variate normal population, the prediction-estimation algorithm is based on the complete-data sufficient statistics [see (4-21)]

T₁ = Σⱼ₌₁ⁿ X_j = nX̄   and   T₂ = Σⱼ₌₁ⁿ X_j X_j' = (n − 1)S + nX̄X̄'

In this case, the algorithm proceeds as follows: We assume that the population mean and variance, μ and Σ, respectively, are unknown and must be estimated.

Prediction step. For each vector x_j with missing values, let x_j⁽¹⁾ denote the missing components and x_j⁽²⁾ denote those components which are available. Thus, x_j' = [x_j⁽¹⁾', x_j⁽²⁾']. Given estimates μ̃ and Σ̃ from the estimation step, use the mean of the conditional normal distribution of x⁽¹⁾, given x⁽²⁾, to estimate the missing values. That is,

x̃_j⁽¹⁾ = E(X_j⁽¹⁾ | x_j⁽²⁾; μ̃, Σ̃) = μ̃⁽¹⁾ + Σ̃₁₂ Σ̃₂₂⁻¹ (x_j⁽²⁾ − μ̃⁽²⁾)    (5-38)

estimates the contribution of x_j⁽¹⁾ to T₁. Next, the predicted contribution of x_j⁽¹⁾ to T₂ is

(x_j⁽¹⁾ x_j⁽¹⁾')~ = E(X_j⁽¹⁾ X_j⁽¹⁾' | x_j⁽²⁾; μ̃, Σ̃) = Σ̃₁₁ − Σ̃₁₂ Σ̃₂₂⁻¹ Σ̃₂₁ + x̃_j⁽¹⁾ x̃_j⁽¹⁾'    (5-39)

and (x_j⁽¹⁾ x_j⁽²⁾')~ = x̃_j⁽¹⁾ x_j⁽²⁾'. The contributions in (5-38) and (5-39) are summed over all x_j with missing components. The results are combined with the sample data to yield T̃₁ and T̃₂.

Estimation step. Compute the revised maximum likelihood estimates (see Result 4.11):

μ̃ = T̃₁/n,   Σ̃ = (1/n) T̃₂ − μ̃ μ̃'    (5-40)

We illustrate the computational aspects of the prediction-estimation algorithm in Example 5.13.
Example 5.13 (Illustrating the EM algorithm) Estimate the normal population mean μ and covariance Σ using the incomplete data set

X = [ −  0  3 ]
    [ 7  2  6 ]
    [ 5  1  2 ]
    [ −  −  5 ]

Here n = 4, p = 3, and parts of observation vectors x₁ and x₄ are missing.

We obtain the initial sample averages

μ̃₁ = (7 + 5)/2 = 6,   μ̃₂ = (0 + 2 + 1)/3 = 1,   μ̃₃ = (3 + 6 + 2 + 5)/4 = 4

from the available observations. Substituting these averages for any missing values, so that x̃₁₁ = 6, for example, we can obtain initial covariance estimates. We shall construct these estimates using the divisor n because the algorithm eventually produces the maximum likelihood estimate Σ̂. Thus,

σ̃₁₁ = ((6 − 6)² + (7 − 6)² + (5 − 6)² + (6 − 6)²)/4 = 1/2,   σ̃₂₂ = 1/2,   σ̃₃₃ = 5/2
σ̃₁₂ = ((6 − 6)(0 − 1) + (7 − 6)(2 − 1) + (5 − 6)(1 − 1) + (6 − 6)(1 − 1))/4 = 1/4
σ̃₁₃ = 1,   σ̃₂₃ = 3/4

The prediction step consists of using the initial estimates μ̃ and Σ̃ to predict the contributions of the missing values to the sufficient statistics T₁ and T₂ [see (5-38) and (5-39)].

The first component of x₁ is missing, so we partition μ̃ and Σ̃ as

μ̃ = [μ̃₁ | μ̃₂, μ̃₃]' = [μ̃⁽¹⁾' | μ̃⁽²⁾']',   Σ̃ = [ σ̃₁₁ | σ̃₁₂ σ̃₁₃ ] = [ Σ̃₁₁  Σ̃₁₂ ]
                                              [ ----+--------- ]   [ Σ̃₂₁  Σ̃₂₂ ]
                                              [ σ̃₁₂ | σ̃₂₂ σ̃₂₃ ]
                                              [ σ̃₁₃ | σ̃₂₃ σ̃₃₃ ]

and predict

x̃₁₁ = μ̃₁ + Σ̃₁₂ Σ̃₂₂⁻¹ [x₁₂ − μ̃₂, x₁₃ − μ̃₃]' = 6 + [1/4, 1] [ 1/2  3/4 ]⁻¹ [ 0 − 1 ] = 5.73
                                                          [ 3/4  5/2 ]    [ 3 − 4 ]

(x̃₁₁²)~ = σ̃₁₁ − Σ̃₁₂ Σ̃₂₂⁻¹ Σ̃₂₁ + x̃₁₁² = 32.99

(x̃₁₁ [x₁₂, x₁₃])~ = 5.73 [0, 3] = [0, 17.18]
For the two missing components of x₄, we partition μ̃ and Σ̃ analogously, with x⁽¹⁾ = [x₄₁, x₄₂]' and x⁽²⁾ = x₄₃, and predict

[x̃₄₁, x̃₄₂]' = [μ̃₁, μ̃₂]' + [1, 3/4]' (5/2)⁻¹ (5 − 4) = [6.4, 1.3]'

for the contribution to T₁. Also, from (5-39),

( [x₄₁² , x₄₁x₄₂; x₄₁x₄₂ , x₄₂²] )~ = Σ̃₁₁ − Σ̃₁₂ σ̃₃₃⁻¹ Σ̃₂₁ + [6.4, 1.3]'[6.4, 1.3] = [ 41.06  8.27 ]
                                                                                   [  8.27  1.97 ]

and ([x₄₁, x₄₂]' x₄₃)~ = [6.4, 1.3]'(5) = [32.0, 6.5]'.

Thus, the predicted complete-data sufficient statistics are

T̃₁ = [ 5.73 + 7 + 5 + 6.4 ]   [ 24.13 ]
     [ 0 + 2 + 1 + 1.3    ] = [  4.30 ]
     [ 3 + 6 + 2 + 5      ]   [ 16.00 ]

T̃₂ = [ 32.99 + 7² + 5² + 41.06   0 + 7(2) + 5(1) + 8.27   17.18 + 7(6) + 5(2) + 32.0 ]   [ 148.05   27.27  101.18 ]
     [           ·               0² + 2² + 1² + 1.97       0(3) + 2(6) + 1(2) + 6.5  ] = [  27.27    6.97   20.50 ]
     [           ·                         ·               3² + 6² + 2² + 5²         ]   [ 101.18   20.50   74.00 ]

This completes one prediction step. The next estimation step, using (5-40), provides the revised estimates²

μ̃ = (1/4) T̃₁ = [6.03, 1.08, 4.00]'

Σ̃ = (1/4) T̃₂ − μ̃ μ̃' = [ .61  .33  1.17 ]
                        [ .33  .59   .83 ]
                        [ 1.17 .83  2.50 ]

Note that σ̃₁₁ = .61 and σ̃₂₂ = .59 are larger than the corresponding initial estimates obtained by replacing the missing observations on the first and second variables by the sample means of the remaining values. The third variance estimate σ̃₃₃ remains unchanged, because it is not affected by the missing components.

The iteration between the prediction and estimation steps continues until the elements of μ̃ and Σ̃ remain essentially unchanged. Calculations of this sort are easily handled with a computer. ■

² The final entries in Σ̃ are exact to two decimal places.
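A minimal sketch of the prediction-estimation cycle, assuming NumPy and using the data of Example 5.13 with missing entries coded as NaN. The first pass through the loop reproduces the revised estimates above; continued cycling refines them toward the maximum likelihood estimates.

```python
# Sketch: EM (prediction-estimation) iteration for a normal sample with
# values missing at random.
import numpy as np

X = np.array([[np.nan, 0., 3.],
              [7., 2., 6.],
              [5., 1., 2.],
              [np.nan, np.nan, 5.]])
n, p = X.shape

# initial estimates: fill with the column means of the available values
mu = np.nanmean(X, axis=0)
Xf = np.where(np.isnan(X), mu, X)
Sigma = (Xf - mu).T @ (Xf - mu) / n            # divisor n (ML form)

for _ in range(50):
    T1 = np.zeros(p)
    T2 = np.zeros((p, p))
    for x in X:
        miss = np.isnan(x)                     # missing components, x^(1)
        obs = ~miss                            # available components, x^(2)
        xhat = x.copy()
        C = np.zeros((p, p))                   # conditional covariance term
        if miss.any():
            B = Sigma[np.ix_(miss, obs)] @ np.linalg.inv(Sigma[np.ix_(obs, obs)])
            xhat[miss] = mu[miss] + B @ (x[obs] - mu[obs])        # step (5-38)
            C[np.ix_(miss, miss)] = (Sigma[np.ix_(miss, miss)]
                                     - B @ Sigma[np.ix_(obs, miss)])
        T1 += xhat
        T2 += np.outer(xhat, xhat) + C                            # step (5-39)
    mu = T1 / n                                                   # step (5-40)
    Sigma = T2 / n - np.outer(mu, mu)

print(np.round(mu, 2))
print(np.round(Sigma, 2))
```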
Now consIder the large sample nominal 95% confidence ellipsoid for JL.256 Chapter 5 Inferences about a Mean Vector Once final estimates jL and i are obtained and relatively few missing compo_ nents occur in X. this assumption may not be valid. Let the p X 1 random vector X t follow the multivariate AR(l) model ellIpsOId has large sample coverage probability the observations are related by our autoregressive model.ssuI?ption is crucial. To make the easy. through the coefficient matrix <1>. . we must be dubious of any computational scheme that fills in values as if they were lost at random. this :5 This ellipsoid has large sample coverage probability .p)'iI(it . Xn constitute a random sample.950 . .p) + et (542) cp 0 . The predictionestimation algorithm we discussed is developed on the. we have assumed that the multivariate observations Xl. n _ 1 n * (X t . suppose the underlying process has <I> = cpI where Icp I < 1. X 2.8 Difficulties Due to Time Dependence in Multivariate Observations For the methods described in this chapter. The presence of even a moderate amount of time dependence among the observations can cause serious difficulties for tests.··. The independ:nce a. for large n.950 .742 .989 .871 . under multivariate normality.I .) = <1>'1. We will illustrate the nature of the difficulty when the time dependence can be represented as a multivariate first order autoregressive [AR(l)] model.I as the independent variable. If missing values are related to the response levels. which are all constructed assuming that independence holds.548 . dependent.JL) :5 5.834 . and (543) Caution.p):5 (541) Difficulties Due to Time Dependence in Multivariate Observations 257 As shown in Johnson and Langeland [8].632 .X)' Ix as an approximate 100(1 .405 . it seems reasonable to treat allpsuchthatn(jL .95 if the observations are inde (1 .950 .X)(Xt .090 The AR(l) model (542) relates the observation at time t. The simultaneous confidence·.25 . According to Table 5. When more than a few values are missing. then handling the missing values as suggested may introduce serious biases into the estimation procedures. they are independent of one another.999 1.25 1 2 5 10 15 Xt .10.10 shows how the coverage probability is related to the coefficient cp and the number of variables p. statements would then follow as in Section 5. the autoregressive model says the observations are independent. Consequently. the coverage probability can drop very low to 632 even for the bivariate case.1.193 .998 .JL) is approximately normal with mean 0 and covariance matrix given by (543). confidence regions.950 . ' . basis that component observations are missing at random. Under this model Cov (Xt' X t.000 .CP)(l + Table 5. 1 S =.P = <I>(X t . and simultaneous confidence intervals. where the arrow above indicates convergence in probability.. . 5: I 0 Coverage Probability of the Nominal 95% Confidence EllIpSOId . Vn (X . The name autoregressive model comes from the fact that (542) looks like a multivariate version of a regression with X t as the dependent variable and the previous value X t . and the results based on this assumptIOn can be very mlsleadmg If the observations are. missing values are related to the responses being measured. it is imperative that the investigator search for the systematic causes that created them.751 . Moreover. however. Further.950 . If the observations are collected over time. but with x replaced by jL and S replaced by I.641 . {all JL such that n(X . 
Table 5.10 Coverage Probability of the Nominal 95% Confidence Ellipsoid

          φ
p      −.25      0      .25      .5
1      .989    .950    .871    .742
2      .993    .950    .834    .632
5      .998    .950    .751    .405
10     .999    .950    .641    .193
15    1.000    .950    .548    .090

According to Table 5.10, the coverage probability can drop very low, to .632, even for the bivariate case. The independence assumption is crucial, and the results based on this assumption can be very misleading if the observations are, in fact, dependent.
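A simulation sketch, assuming NumPy and SciPy, that estimates the true coverage of the nominal 95% ellipsoid under the AR(1) model (5-42) with Φ = φI; for φ = .25 and p = 2 the estimate should come out near the .834 entry of Table 5.10, well below .95.

```python
# Sketch: Monte Carlo coverage of the nominal 95% ellipsoid under AR(1) data.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
phi, p, n, reps = 0.25, 2, 500, 1000
crit = chi2.ppf(0.95, p)
cover = 0
for _ in range(reps):
    X = np.empty((n, p))
    x = np.zeros(p)
    for t in range(n):
        x = phi * x + rng.standard_normal(p)   # AR(1) step with mu = 0
        X[t] = x
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    T2 = n * xbar @ np.linalg.inv(S) @ xbar    # n(xbar - mu)' S^{-1} (xbar - mu)
    cover += (T2 <= crit)
print(cover / reps)
```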
Supplement 5A

SIMULTANEOUS CONFIDENCE INTERVALS AND ELLIPSES AS SHADOWS OF THE p-DIMENSIONAL ELLIPSOIDS

We begin this supplementary section by establishing the general result concerning the projection (shadow) of an ellipsoid onto a line.

Result 5A.1. Let the constant c > 0 and positive definite p × p matrix A determine the ellipsoid {z: z'A⁻¹z ≤ c²}. For a given vector u ≠ 0, and z belonging to the ellipsoid, we have

(projection (shadow) of {z'A⁻¹z ≤ c²} on u) = (c √(u'Au) / u'u) u

which extends from 0 along u with length c √(u'Au / u'u). When u is a unit vector, the shadow extends c √(u'Au) units, so |z'u| ≤ c √(u'Au). The shadow also extends c √(u'Au) units in the −u direction.

Proof. By Definition 2A.12, the projection of any z on u is given by (z'u) u / u'u. Its squared length is (z'u)² / u'u. We want to maximize this shadow over all z with z'A⁻¹z ≤ c². The extended Cauchy-Schwarz inequality in (2-49) states that (b'd)² ≤ (b'Bb)(d'B⁻¹d), with equality when b = kB⁻¹d. Setting b = z, d = u, and B = A⁻¹, we obtain

(u'u)(length of projection)² = (z'u)² ≤ (z'A⁻¹z)(u'Au) ≤ c² u'Au   for all z: z'A⁻¹z ≤ c²

The choice z = cAu / √(u'Au) yields equalities and thus gives the maximum shadow, besides belonging to the boundary of the ellipsoid. That is, z'A⁻¹z = c² u'Au / u'Au = c² for this z that provides the longest shadow. ■

Result 5A.2. Suppose that the ellipsoid {z: z'A⁻¹z ≤ c²} is given and that U = [u₁ ⋮ u₂] is arbitrary but of rank two. Then

{z in the ellipsoid based on A⁻¹ and c²}   implies that   {U'z is in the ellipse based on (U'AU)⁻¹ and c²}

that is, z'U(U'AU)⁻¹U'z ≤ c² for all U.

Proof. We first establish a basic inequality. Set P = A^{1/2} U (U'AU)⁻¹ U' A^{1/2}, where A = A^{1/2} A^{1/2}. Note that P = P' and P² = P, so (I − P)P' = P − P² = 0. Next, using A⁻¹ = A^{−1/2} A^{−1/2}, we write z'A⁻¹z = (A^{−1/2}z)'(A^{−1/2}z) and A^{−1/2}z = PA^{−1/2}z + (I − P)A^{−1/2}z. Then

z'A⁻¹z = (PA^{−1/2}z + (I − P)A^{−1/2}z)'(PA^{−1/2}z + (I − P)A^{−1/2}z)
       = (PA^{−1/2}z)'(PA^{−1/2}z) + ((I − P)A^{−1/2}z)'((I − P)A^{−1/2}z)
       ≥ z'A^{−1/2} P A^{−1/2} z = z'U(U'AU)⁻¹U'z    (5A-1)

Since z'A⁻¹z ≤ c² and U was arbitrary, the result follows. ■

Our next result establishes the two-dimensional confidence ellipse as a projection of the p-dimensional ellipsoid. Projection on a plane is simplest when the two vectors u₁ and u₂ determining the plane are first converted to perpendicular vectors of unit length. (See Result 2A.3.)

Figure 5.13 The shadow of the ellipsoid z'A⁻¹z ≤ c² on the u₁, u₂ plane is an ellipse.

Result 5A.3. Given the ellipsoid {z: z'A⁻¹z ≤ c²} and two perpendicular unit vectors u₁ and u₂, the projection (or shadow) of {z'A⁻¹z ≤ c²} on the u₁, u₂ plane results in the two-dimensional ellipse {(U'z)'(U'AU)⁻¹(U'z) ≤ c²}, where U = [u₁ ⋮ u₂].

Proof. By Result 2A.3, the projection of a vector z on the u₁, u₂ plane is (u₁'z)u₁ + (u₂'z)u₂ = UU'z. The projection of the ellipsoid {z: z'A⁻¹z ≤ c²} consists of all UU'z with z'A⁻¹z ≤ c². Consider the two coordinates U'z of the projection U(U'z). Let z belong to the set {z: z'A⁻¹z ≤ c²}, so that UU'z belongs to the shadow of the ellipsoid. By Result 5A.2, (U'z)'(U'AU)⁻¹(U'z) ≤ c², so U'z belongs to the coefficient-vector ellipse. Conversely, let a be a vector in the u₁, u₂ plane whose coefficients belong to the ellipse {a'(U'AU)⁻¹a ≤ c²}. If we set z = AU(U'AU)⁻¹a, it follows that U'z = U'AU(U'AU)⁻¹a = a and z'A⁻¹z = a'(U'AU)⁻¹U'A A⁻¹ AU(U'AU)⁻¹a = a'(U'AU)⁻¹a ≤ c². Thus, the ellipse contains only coefficient vectors from the projection of {z: z'A⁻¹z ≤ c²}, and the shadow and the ellipse coincide. ■

Remark. Projecting the ellipsoid z'A⁻¹z ≤ c² first to the u₁, u₂ plane and then to the line u₁ is the same as projecting it directly to the line determined by u₁. In the context of confidence ellipsoids, the shadows of the two-dimensional ellipses give the single-component intervals.

Remark. Results 5A.2 and 5A.3 remain valid if U = [u₁, ..., u_q] consists of 2 < q ≤ p linearly independent columns.
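A short sketch, assuming NumPy, of the shadow length of Result 5A.1; taking A = S/n, c² a chi-square or F-based constant, and u a coordinate axis gives the component intervals of a confidence ellipsoid.

```python
# Sketch: half-length of the shadow of {z : z' A^{-1} z <= c^2} on a line.
import numpy as np

def shadow_half_length(A, c, u):
    u = u / np.linalg.norm(u)          # unit vector along the line
    return c * np.sqrt(u @ A @ u)      # c * sqrt(u'Au), per Result 5A.1

A = np.array([[4.0, 1.0],
              [1.0, 2.0]])
c = 2.0
print(shadow_half_length(A, c, np.array([1.0, 0.0])))  # 2*sqrt(4) = 4
```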
Exercises

5.1. (a) Evaluate T², for testing H₀: μ' = [7, 11], using the data

X = [ 2  12 ]
    [ 8   9 ]
    [ 6   9 ]
    [ 8  10 ]

(b) Specify the distribution of T² for the situation in (a).
(c) Using (a) and (b), test H₀ at the α = .05 level. What conclusion do you reach?

5.2. Using the data in Exercise 5.1, verify that T² remains unchanged if each observation x_j, j = 1, 2, 3, 4, is replaced by Cx_j, where

C = [ 1  −1 ]
    [ 1   1 ]

Note that the transformed observations Cx_j = [x_{j1} − x_{j2}, x_{j1} + x_{j2}]' yield the data matrix

[ (2 − 12)  (2 + 12) ]
[ (8 − 9)   (8 + 9)  ]
[ (6 − 9)   (6 + 9)  ]
[ (8 − 10)  (8 + 10) ]

5.3. (a) Use expression (5-15) to evaluate T² for the data in Exercise 5.1.
(b) Use the data in Exercise 5.1 to evaluate Λ in (5-13). Also, evaluate Wilks' lambda.

5.4. Use the sweat data in Table 5.1. (See Example 5.2.)
(a) Determine the axes of the 90% confidence ellipsoid for μ. Determine the lengths of these axes.
(b) Construct Q-Q plots for the observations on sweat rate, sodium content, and potassium content, respectively. Construct the three possible scatter plots for pairs of observations. Does the multivariate normal assumption seem justified in this case? Comment.

5.5. The quantities x̄, S, and S⁻¹ are given in Example 5.3 for the transformed microwave-radiation data. Conduct a test of the null hypothesis H₀: μ' = [.55, .60] at the α = .05 level of significance. Is your result consistent with the 95% confidence ellipse for μ pictured in Figure 5.1? Explain.

5.6. Verify the Bonferroni inequality in (5-28) for m = 3.
Hint: A Venn diagram for the three events C₁, C₂, and C₃ may help.

5.7. Use the sweat data in Table 5.1 (see Example 5.2) to find simultaneous 95% T² confidence intervals for μ₁, μ₂, and μ₃ using Result 5.3. Construct the 95% Bonferroni intervals using (5-29). Compare the two sets of intervals.

5.8. From (5-23), we know that T² is equal to the largest squared univariate t-value constructed from the linear combination a'x_j with a = S⁻¹(x̄ − μ₀). Using the results in Example 5.3 and the H₀ in Exercise 5.5, evaluate a for the transformed microwave-radiation data. Verify that the t²-value computed with this a is equal to T² in Exercise 5.5.

5.9. Harry Roberts, a naturalist for the Alaska Fish and Game department, studies grizzly bears with the goal of maintaining a healthy population. Measurements on n = 61 bears provided the following summary statistics (see also Exercise 8.23):

Variable        Weight   Body length   Neck    Girth   Head length   Head width
                (kg)     (cm)          (cm)    (cm)    (cm)          (cm)
Sample mean x̄   95.52    164.38        55.69   93.39   17.98         31.13

Covariance matrix

S = [ 3266.46  1343.97   731.54  1175.50  162.68  238.37 ]
    [ 1343.97   721.91   324.25   537.35   80.17  117.73 ]
    [  731.54   324.25   179.28   281.17   39.15   56.80 ]
    [ 1175.50   537.35   281.17   474.98   63.73   94.85 ]
    [  162.68    80.17    39.15    63.73    9.95   13.88 ]
    [  238.37   117.73    56.80    94.85   13.88   21.26 ]

(a) Obtain the large sample 95% simultaneous confidence intervals for the six population mean body measurements.
(b) Obtain the large sample 95% simultaneous confidence ellipse for mean weight and mean girth.
(c) Obtain the 95% Bonferroni confidence intervals for the six means in Part a.
(d) Refer to Part b. Construct the 95% Bonferroni confidence rectangle for the mean weight and mean girth using m = 6. Compare this rectangle with the confidence ellipse in Part b.
(e) Obtain the 95% Bonferroni confidence interval for mean head width minus mean head length, using m = 6 + 1 = 7 to allow for this statement as well as statements about each individual mean.

5.10. Refer to the bear growth data in Example 1.10 (see Table 1.4). Restrict your attention to the measurements of length.
(a) Obtain the 95% T² simultaneous confidence intervals for the four population means for length.
(b) Refer to Part a. Obtain the 95% T² simultaneous confidence intervals for the three successive yearly increases in mean length.
(c) Obtain the 95% T² confidence ellipse for the mean increase in length from 2 to 3 years and the mean increase in length from 4 to 5 years.
(d) Refer to Parts a and b. Construct the 95% Bonferroni confidence intervals for the set consisting of four mean lengths and three successive yearly increases in mean length.
(e) Refer to Parts c and d. Compare the 95% Bonferroni confidence rectangle for the mean increase in length from 2 to 3 years and the mean increase in length from 4 to 5 years with the confidence ellipse produced by the T²-procedure.

5.11. Create a table similar to Table 5.4 using the entries (length of one-at-a-time t-interval)/(length of Bonferroni t-interval).

5.12. A physical anthropologist performed a mineral analysis of nine ancient Peruvian hairs. The results for the chromium (x₁) and strontium (x₂) levels, in parts per million (ppm), were as follows:

x₁ (Cr)   .48   40.53   2.19    .55    .74    .66    .93    .37    .22
x₂ (St)  12.57  73.68  11.13  20.03  20.29   .78   4.64    .43   1.08

Source: Benfer and others, "Mineral Analysis of Ancient Peruvian Hair," American Journal of Physical Anthropology, 48, no. 3 (1978), 277-282.

It is known that low levels (less than or equal to .100 ppm) of chromium suggest the presence of diabetes, while strontium is an indication of animal protein intake.
(a) Construct and plot a 90% joint confidence ellipse for the population mean vector μ' = [μ₁, μ₂], assuming that these nine Peruvian hairs represent a random sample from individuals belonging to a particular ancient Peruvian culture.
(b) Obtain the individual simultaneous 90% confidence intervals for μ₁ and μ₂ by "projecting" the ellipse constructed in Part a on each coordinate axis. (Alternatively, we could use Result 5.3.) Does it appear as if this Peruvian culture has a mean strontium level of 10? That is, are any of the points (μ₁ arbitrary, 10) in the confidence regions? Is [.30, 10]' a plausible value for μ? Discuss.
(c) Do these data appear to be bivariate normal? Discuss their status with reference to Q-Q plots and a scatter diagram. If the data are not bivariate normal, what implications does this have for the results in Parts a and b?
(d) Repeat the analysis with the obvious "outlying" observation removed. Do the inferences change? Comment.

5.13. Given the data with missing components, use the prediction-estimation algorithm of Section 5.7 to estimate μ and Σ. Determine the initial estimates, and iterate to find the first revised estimates.

5.14. Determine the approximate distribution of −n ln(|Σ̂|/|Σ̂₀|) for the sweat data in Table 5.1. (See Result 5.2.)

Exercises 5.15, 5.16, and 5.17 refer to the following information:

Frequently, some or all of the population characteristics of interest are in the form of attributes. Each individual in the population may then be described in terms of the attributes it possesses. For convenience, attributes are usually numerically coded with respect to their presence or absence. If we let the variable X pertain to a specific attribute, then we can distinguish between the presence or absence of this attribute by defining

X = 1 if attribute present,   X = 0 if attribute absent

In this way, we can assign numerical values to qualitative characteristics.

When attributes are numerically coded as 0-1 variables, a random sample from the population of interest results in statistics that consist of the counts of the number of sample items that have each distinct set of characteristics. We consider the situation where an individual with a particular combination of attributes can be classified into one of q + 1 mutually exclusive and exhaustive categories. The corresponding probabilities are denoted by p₁, p₂, ..., p_{q+1}. Since the categories include all possibilities, we take p_{q+1} = 1 − (p₁ + p₂ + ... + p_q). An individual from category k will be assigned the ((q + 1) × 1) vector value [0, ..., 0, 1, 0, ..., 0]', with 1 in the kth position.

The probability distribution for an observation from the population of individuals in q + 1 mutually exclusive and exhaustive categories is known as the multinomial distribution. It has the following structure:

Category                1      2     ...    k    ...    q     q + 1
Outcome (value)       [1]    [0]          [0]          [0]    [0]
                      [0]    [1]          [0]          [0]    [0]
                      [⋮]    [⋮]          [1]          [⋮]    [⋮]
                      [0]    [0]          [0]          [1]    [0]
                      [0]    [0]          [0]          [0]    [1]
Probability
(proportion)          p₁     p₂          p_k          p_q    p_{q+1} = 1 − Σᵢ₌₁^q pᵢ

Let X_j, j = 1, 2, ..., n, be a random sample of size n from the multinomial distribution. The kth component, X_{jk}, of X_j is 1 if the observation (individual) is from category k and is 0 otherwise. The random sample X₁, X₂, ..., Xₙ can be converted to a sample proportion vector, obtained as

p̂ = (1/n) Σⱼ₌₁ⁿ X_j   with   E(p̂) = p   and   Cov(p̂) = (1/n) Cov(X_j) = (1/n) Σ

where Σ = {σᵢₖ} is a (q + 1) × (q + 1) matrix with σₖₖ = pₖ(1 − pₖ) and σᵢₖ = −pᵢpₖ, i ≠ k. Since each individual must belong to exactly one category, X_{j,q+1} = 1 − (X_{j1} + X_{j2} + ... + X_{jq}), so p_{q+1} = 1 − (p̂₁ + p̂₂ + ... + p̂_q), and as a result, Σ̂ has rank q. The usual inverse of Σ̂ does not exist, but it is still possible to develop simultaneous 100(1 − α)% confidence intervals for all linear combinations a'p.

If the sample counts are large, the approximate sampling distribution of p̂ is provided by the central limit theorem:

√n (p̂ − p) is approximately N(0, Σ)

The normal approximation remains valid when σₖₖ is estimated by σ̂ₖₖ = p̂ₖ(1 − p̂ₖ) and σᵢₖ is estimated by σ̂ᵢₖ = −p̂ᵢp̂ₖ, i ≠ k.

Result. Approximate simultaneous 100(1 − α)% confidence regions for all linear combinations a'p = a₁p₁ + a₂p₂ + ... + a_{q+1}p_{q+1} are given by the observed values of

a'p̂ ± √(χ²_q(α)) √(a'Σ̂a / n)

provided that n − q is large. Here χ²_q(α) is the upper (100α)th percentile of the chi-square distribution with q d.f. The requirement that n − q is large is interpreted to mean np_k is about 20 or more for each category.

We have only touched on the possibilities for the analysis of categorical data. Complete discussions of categorical data analysis are available in [1] and [4].
5.15. Let X_{ji} and X_{jk} be the ith and kth components, respectively, of X_j.
(a) Show that μᵢ = E(X_{ji}) = pᵢ and σᵢᵢ = Var(X_{ji}) = pᵢ(1 − pᵢ), i = 1, 2, ..., q + 1.
(b) Show that σᵢₖ = Cov(X_{ji}, X_{jk}) = −pᵢpₖ, i ≠ k. Why must this covariance necessarily be negative?

5.16. As part of a larger marketing research project, a consultant for the Bank of Shorewood wants to know the proportion of savers that uses the bank's facilities as their primary vehicle for saving. The consultant would also like to know the proportions of savers who use the three major competitors: Bank B, Bank C, and Bank D. Each individual contacted in a survey responded to the following question:

Which bank is your primary savings bank?
Response: Bank of Shorewood / Bank B / Bank C / Bank D / Another bank / No savings

For this survey, verify that p̂ = (1/n) Σⱼ₌₁ⁿ X_j has E(p̂) = p and Cov(p̂) = (1/n)Σ, given the nature of the preceding observations.

5.17. A sample of n = 355 people with savings accounts produced the following counts when asked to indicate their primary savings banks (the people with no savings will be ignored in the comparison of savers, so there are five categories):

Bank (category)        Bank of Shorewood   Bank B   Bank C   Bank D   Another bank
Observed number              105             119       56       25        50
Observed sample
proportion                   .30             .33      .16      .07       .14

Let the population proportions be
p₁ = proportion of savers at Bank of Shorewood
p₂ = proportion of savers at Bank B
p₃ = proportion of savers at Bank C
p₄ = proportion of savers at Bank D
1 − (p₁ + p₂ + p₃ + p₄) = proportion of savers at other banks

(a) Construct simultaneous 95% confidence intervals for p₁, p₂, ..., p₅.
(b) Construct a simultaneous 95% confidence interval that allows a comparison of the Bank of Shorewood with its major competitor, Bank B. Interpret this interval.

The following exercises may require a computer.

5.18. Use the college test data in Table 5.2. (See Example 5.5.)
(a) Test the null hypothesis H₀: μ' = [500, 50, 30] versus H₁: μ' ≠ [500, 50, 30] at the α = .05 level of significance. Suppose [500, 50, 30]' represent average scores for thousands of college students over the last 10 years. Is there reason to believe that the group of students represented by the scores in Table 5.2 is scoring differently? Explain.
(b) Determine the lengths and directions for the axes of the 95% confidence ellipsoid for μ.
(c) Construct Q-Q plots from the marginal distributions of social science and history, verbal, and science scores. Also, construct the three possible scatter diagrams from the pairs of observations on different variables. Do these data appear to be normally distributed? Discuss.

5.19. Measurements of x₁ = stiffness and x₂ = bending strength for a sample of n = 30 pieces of a particular grade of lumber are given in Table 5.11. The units are pounds/(inches)². Using the data in the table:
(a) Construct and sketch a 95% confidence ellipse for the pair [μ₁, μ₂]', where μ₁ = E(X₁) and μ₂ = E(X₂).
(b) Suppose μ₁₀ = 2000 and μ₂₀ = 10,000 represent "typical" values for stiffness and bending strength. Given the result in (a), are the data in Table 5.11 consistent with these values? Explain.
(c) Is the bivariate normal distribution a viable population model? Explain with reference to Q-Q plots and a scatter diagram.

Table 5.11 Lumber Data

x₁ (Stiffness:         x₂ (Bending    x₁ (Stiffness:         x₂ (Bending
modulus of elasticity)  strength)     modulus of elasticity)  strength)
1232                    4175          1712                    7749
1115                    6652          1932                    6818
2205                    7612          1820                    9307
1897                   10,914         1900                    6457
1932                   10,850         2426                   10,102
1612                    7627          1558                    7414
1598                    6954          1470                    7556
1804                    8365          1858                    7833
1752                    9469          1587                    8309
2067                    6410          2208                    9559
2365                   10,327         1487                    6255
1646                    7320          2206                   10,723
1579                    8196          2332                    5430
1880                    9709          2540                   12,090
1773                   10,370         2322                   10,072

Source: Data courtesy of U.S. Forest Products Laboratory.

5.20. In order to assess the prevalence of a drug problem among high school students in a particular city, a random sample of 200 students from the city's five high schools were surveyed. One of the survey questions and the corresponding responses are as follows:

What is your typical weekly marijuana usage?

Category            None    Moderate (1-3 joints)    Heavy (4 or more joints)
Number of
responses           117     62                       21

Construct 95% simultaneous confidence intervals for the three proportions p₁, p₂, and p₃ = 1 − (p₁ + p₂).

5.21. A wildlife ecologist measured x₁ = tail length (in millimeters) and x₂ = wing length (in millimeters) for a sample of n = 45 female hook-billed kites. These data are displayed in Table 5.12.
(a) Find and sketch the 95% confidence ellipse for the population means μ₁ and μ₂. Suppose it is known that μ₁ = 190 mm and μ₂ = 275 mm for male hook-billed kites. Are these plausible values for the mean tail length and mean wing length for the female birds? Explain.
(b) Construct the simultaneous 95% T²-intervals for μ₁ and μ₂ and the 95% Bonferroni intervals for μ₁ and μ₂. Compare the two sets of intervals. What advantage, if any, do the T²-intervals have over the Bonferroni intervals?
(c) Is the bivariate normal distribution a viable population model? Explain with reference to Q-Q plots and a scatter diagram.

Table 5.12 Bird Data

x₁ (Tail    x₂ (Wing    x₁ (Tail    x₂ (Wing    x₁ (Tail    x₂ (Wing
length)     length)     length)     length)     length)     length)
186         266         173         271         191         284
197         285         194         280         197         285
201         295         198         300         208         288
190         282         180         272         180         273
209         305         190         292         180         275
187         285         191         286         188         280
207         297         196         285         210         283
178         268         207         286         196         288
202         271         209         303         191         271
205         285         179         261         179         257
190         280         186         262         208         289
189         277         174         245         202         285
211         310         181         250         200         272
216         305         189         262         192         282
189         274         188         258         199         280

Source: Data courtesy of S. Temple.

5.22. A portion of the data contained in Table 6.10 in Chapter 6 is reproduced in Table 5.13. These data represent various costs associated with transporting milk from farms to dairy plants for gasoline trucks. Only the first 25 multivariate observations for gasoline trucks are given. Observations 9 and 21 have been identified as outliers from the full data set of 36 observations. (See [2].)
(a) Construct Q-Q plots of the marginal distributions of fuel, repair, and capital costs. Also, construct the three possible scatter diagrams from the pairs of observations on different variables. Are the outliers evident? Repeat the Q-Q plots and the scatter diagrams with the apparent outliers removed. Do the data now appear to be normally distributed? Discuss.
(b) Construct 95% Bonferroni intervals for the individual cost means. Also, find the 95% T² intervals. Compare the two sets of intervals.

Table 5.13 Milk Transportation-Cost Data (first 25 gasoline-truck observations from Table 6.10: Fuel (x₁), Repair (x₂), and Capital (x₃) costs)

5.23. Using the data on bone mineral content in Table 1.8, construct the 95% Bonferroni intervals for the individual means. Also, find the 95% simultaneous T² intervals. Compare the two sets of intervals.

5.24. Consider the 30 observations on male Egyptian skulls for the first time period given in Table 6.13 on page 349.
(a) Construct Q-Q plots of the marginal distributions of the maxbreath, basheight, baslength, and nasheight variables. Also, construct a chi-square plot of the multivariate observations. Do these data appear to be normally distributed? Explain.
(b) Construct 95% Bonferroni intervals for the individual skull dimension variables. Also, find the 95% T² intervals. Compare the two sets of intervals.
5.25. Using the Madison, Wisconsin, Police Department data in Table 5.8, construct individual X̄ charts for x₃ = holdover hours and x₄ = COA hours. Do these individual process characteristics seem to be in control? (That is, are they stable?) Comment.

5.26. Refer to Exercise 5.25. Construct a T²-chart using the data on x₁ = legal appearances overtime hours, x₂ = extraordinary event overtime hours, and x₃ = holdover overtime hours from Table 5.8. Compare this chart with the chart in Figure 5.8 of Example 5.10. Does the T² with an additional characteristic change your conclusion about process stability? Explain. Do you detect something from the multivariate control chart that was not apparent in the individual X̄ charts?

5.27. Using the data on x₃ = holdover hours and x₄ = COA hours from Table 5.8, construct a prediction ellipse for a future observation x' = (x₃, x₄). Remember, a prediction ellipse should be calculated from a stable process. Interpret the result.

5.28. As part of a study of its sheet metal assembly process, a major automobile manufacturer uses sensors that record the deviation from the nominal thickness (millimeters) at six locations on a car. The first four are measured when the car body is complete and the last two are measured on the underbody at an earlier stage of assembly. Data on 50 cars are given in Table 5.14.
(a) The process seems stable for the first 30 cases. Use these cases to estimate S and x̄. Then construct a T²-chart using all of the variables. Include all 50 cases.
(b) Which individual locations seem to show a cause for concern?
Using the first 30 cases.37 0.31 0.78 0.34 0.18 0.20 0.20 0.85 0.37 0.16 0.18 0.14 0.18 0.61 0.35 0.05 0.11 0.65 0.30 (a) Obtain the large sample 95% Bonferroni confidence intervals for the mean con· sumptio n of each of the four types.75 0.14.15 0.05 0. 5.20 0.21 0. and the difference.28 0. Include all 50 cases. Compar e this chart with the chart in Figure 5. .10 0.75 0.06 0.37 0.10 k" Source: Data Courtesy of Darek Ceglarek.26.01 0. Use these cases to estimate Sand i.84 0.10 0.26 0.15 0..35 0. (a) Find a 95% confidence region for the mean vector after taking an appropri ate trans formation.05 0.11 0. Then construc t a r 2chart using all of the variables.34 0045 0.61 0.25 0.56 0.36 0.60 0.20.13 0.58 0. (b) Obtain the large sample 95% simultaneous intervals for the mean consump of each of the four types.25 0.16 0.75 0.15 0.10 0..32 0.12 0040 0.
'. "Topics in Statistical Dependence. t' I Methods for Quality Improvement (2nd ed. oc mg. an . Distributions." Sprmger an 00 Springer Berlin. I 299 313 M thematical Statistics Monograph. Eds. .or : 0 n Iey. and R.(a 'th Journal of the Royal Statistical Society Data via the EM Algont m Wl (B) 39 no. .ahlr . . et a . 0. P. . The numerical examples we present will help cement the concepts. Tiku. d nd D B Rubin. 1 (1977).. (1971). "Maximum Likelihood from Incomplete 5. 6. P. 14 6. Cambridge. we take the opportunity to discuss some of the tenets of good experimental practice. New .2 Paired Comparisons and a Repeated Measures Design . MA: The MIt Press. ling a Mean." Applied Statistics. Ed. and K. . O. . tafts Ica ' .. Ryan. New York: John WHey. A. 11. This partitioning is known as the multivariate analysis o/variance (MANOVA). L.. A repeated measures design. 2. H. 2 Multiple m mvana e (1987). 0 sum. WK F "A New Graphical Method for Detectmg Smgle . Paired Comparisons Measurements are often recorded under different sets of experimental conditions to see whether the responses differ significantly over these sets. 3. . U· : and Multivariate Data. ' . 27 ' R H k' "The Analysis of Incomplete Data. 0 k Mathematical Statistics: Basic Ideas and Selected Topics. T. . P. is explicitly considered. . Hartley. Iohnson.138.). 7. R. " B' . no. J:. . For example. BIOmetriCS." CommunIcatIOns In COMPARISONS OF SEVERAL MULTIVARIATEMEANS 6.). Block. We begin by considering pairs of mean vectors. 9851001. along with modifications required to analyze growth curves. .. I (2nd ed. . .' I H db k of Engineering Statistics (2006). 2000. Categorical Data Analysis (2nd ed. . In later sections. . no.. Johnson.783808. . L'k rhood Estimation from Incomplete Data. useful in behavioral studies. "MaXimum I e 1 (1958) 174194. . relation in MultIvanate amp es. . N. band P. M. The corresponding test statistics depend upon a partitioning of the total variation into pieces of variation attributable to the treatment sources and error.272 Chapter 5 Inferences about a Mean Vector References 1 A sti A. NI: PrentIce a. Discrete Multlvanate AnalysIS.1 'f: ant Introduction The ideas developed in Chapter 5 can be extended to handle problems involving the comparison of several mean vectors. 36. (1991) Institute of . .153162. gre . H.. we discuss several comparisons among mean vectors arranged according to treatment levels. S . Hartley. H. mg. IOmetrrcs. A. J. the efficacy of a new drug or of a saturation advertising campaign may be determined by comparing measurements before the "treatment" (drug or advertising) with those 273 .. Pham. .A. IC L l d "A Linear CombmatlOns Test for Detectmg ena or8. Similarly. . 9 (1982). To circumvent these problems. H 11 2000 Vo!. the notation becomes a bit cumbersome. we shall often review univariate procedures for comparing several means and then generalize to the corresponding multivariate cases by analogy. Because comparisons of means frequently (and should) emanate from designed experiments. Bickel. StatisticsTheory and Methods. v k J h WI 's . BaconSone . R. . ". A. a d R L' "Multivariate Statistical Process Control Schemes for Control9. and .. t M S' h "Robust Statistics for Testing Mean Vectors 0 f M uI' tlvana e 11.W Holland B' h Y M M S E Fem erg. H. . The theory is a little more complicated and rests on an assumption of multivariate normal distributions or large sample sizes. 1977.. Theory 4. 10.). Upper Saddle River. M L .
In other situations, two or more treatments can be administered to the same or similar experimental units, and responses to the treatments can be compared.

One rational approach to comparing two treatments, or the presence and absence of a single treatment, is to assign both treatments to the same or identical units (individuals, stores, plots of land, and so forth). The paired responses may then be analyzed by computing their differences, thereby eliminating much of the influence of extraneous unit-to-unit variation.

In the single response (univariate) case, let X_j1 denote the response to treatment 1 (or the response before treatment), and let X_j2 denote the response to treatment 2 (or the response after treatment) for the jth trial. That is, (X_j1, X_j2) are measurements recorded on the jth unit or jth pair of like units. By design, the n differences

    D_j = X_j1 - X_j2,   j = 1, 2, ..., n    (6-1)

should reflect only the differential effects of the treatments.

Given that the differences D_j in (6-1) represent independent observations from an N(delta, sigma_d^2) distribution, the variable

    t = (D-bar - delta) / (s_d / sqrt(n))    (6-2)

where

    D-bar = (1/n) sum_{j=1}^{n} D_j   and   s_d^2 = (1/(n-1)) sum_{j=1}^{n} (D_j - D-bar)^2    (6-3)

has a t-distribution with n - 1 d.f. Consequently, an alpha-level test of H0: delta = 0 versus H1: delta != 0 may be conducted by comparing |t| with t_{n-1}(alpha/2), the upper 100(alpha/2)th percentile of a t-distribution with n - 1 d.f. A 100(1 - alpha)% confidence interval for the mean difference delta = E(X_j1 - X_j2) is provided by the statement

    d-bar - t_{n-1}(alpha/2) s_d/sqrt(n) <= delta <= d-bar + t_{n-1}(alpha/2) s_d/sqrt(n)    (6-4)

(For example, see [11].)

Additional notation is required for the multivariate extension of the paired-comparison procedure. It is necessary to distinguish between p responses, two treatments, and n experimental units. We label the p responses within the jth unit as

    X_{1j1} = variable 1 under treatment 1
    X_{1j2} = variable 2 under treatment 1
    ...
    X_{1jp} = variable p under treatment 1
    X_{2j1} = variable 1 under treatment 2
    X_{2j2} = variable 2 under treatment 2
    ...
    X_{2jp} = variable p under treatment 2

and the p paired-difference random variables become

    D_{j1} = X_{1j1} - X_{2j1}
    D_{j2} = X_{1j2} - X_{2j2}
    ...
    D_{jp} = X_{1jp} - X_{2jp}    (6-5)

Let D_j' = [D_{j1}, D_{j2}, ..., D_{jp}] and assume, for j = 1, 2, ..., n, that

    E(D_j) = delta = [delta_1, delta_2, ..., delta_p]'   and   Cov(D_j) = Sigma_d    (6-6)

If, in addition, D_1, D_2, ..., D_n are independent N_p(delta, Sigma_d) random vectors, inferences about the vector of mean differences delta can be based upon a T2-statistic. Specifically,

    T2 = n (D-bar - delta)' S_d^{-1} (D-bar - delta)    (6-7)

where

    D-bar = (1/n) sum_{j=1}^{n} D_j   and   S_d = (1/(n-1)) sum_{j=1}^{n} (D_j - D-bar)(D_j - D-bar)'    (6-8)

Result 6.1. Let the differences D_1, D_2, ..., D_n be a random sample from an N_p(delta, Sigma_d) population. Then

    T2 = n (D-bar - delta)' S_d^{-1} (D-bar - delta)

is distributed as an [(n - 1)p/(n - p)] F_{p,n-p} random variable, whatever the true delta and Sigma_d. If n and n - p are both large, T2 is approximately distributed as a chi-square random variable with p d.f., regardless of the form of the underlying population of differences.

Proof. The exact distribution of T2 is a restatement of the summary in (5-6), with vectors of differences for the observation vectors. The approximate distribution of T2, for n and n - p large, follows from (4-28).

The condition delta = 0 is equivalent to "no average difference between the two treatments." For the ith variable, delta_i > 0 implies that treatment 1 is larger, on average, than treatment 2. In general, inferences about delta can be made using Result 6.1.

Given the observed differences d_j' = [d_{j1}, d_{j2}, ..., d_{jp}], j = 1, 2, ..., n, corresponding to the random variables in (6-5), an alpha-level test of H0: delta = 0 versus H1: delta != 0 for an N_p(delta, Sigma_d) population rejects H0 if the observed

    T2 = n d-bar' S_d^{-1} d-bar > ((n - 1)p/(n - p)) F_{p,n-p}(alpha)

where F_{p,n-p}(alpha) is the upper (100 alpha)th percentile of an F-distribution with p and n - p d.f. Here d-bar and S_d are given by (6-8).
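The test based on (6-7) and (6-8) is straightforward to compute. The following minimal sketch, which is our own illustration and not part of the original text, implements it with NumPy and SciPy; the function name paired_T2 and the argument layout are our own choices.

```python
import numpy as np
from scipy import stats

def paired_T2(x1, x2, alpha=0.05):
    """Paired-comparison Hotelling T^2 test of H0: delta = 0.

    x1, x2 : (n, p) arrays of responses under treatments 1 and 2.
    Returns the observed T^2, the critical value from Result 6.1,
    and whether H0 is rejected at level alpha.
    """
    d = np.asarray(x1, float) - np.asarray(x2, float)   # differences, as in (6-5)
    n, p = d.shape
    dbar = d.mean(axis=0)                               # sample mean difference
    S_d = np.cov(d, rowvar=False)                       # S_d with divisor n - 1, as in (6-8)
    T2 = n * dbar @ np.linalg.solve(S_d, dbar)          # n dbar' S_d^{-1} dbar
    crit = (n - 1) * p / (n - p) * stats.f.ppf(1 - alpha, p, n - p)
    return T2, crit, T2 > crit
```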
A 100(1 - alpha)% confidence region for delta consists of all delta such that

    n (d-bar - delta)' S_d^{-1} (d-bar - delta) <= ((n - 1)p/(n - p)) F_{p,n-p}(alpha)    (6-9)

Also, 100(1 - alpha)% simultaneous confidence intervals for the individual mean differences delta_i are given by

    delta_i:  d-bar_i +/- sqrt( ((n - 1)p/(n - p)) F_{p,n-p}(alpha) ) sqrt( s_{d_i}^2 / n )    (6-10)

where d-bar_i is the ith element of d-bar and s_{d_i}^2 is the ith diagonal element of S_d.

For n - p large, [(n - 1)p/(n - p)] F_{p,n-p}(alpha) is approximately chi-square_p(alpha), and normality need not be assumed.

The Bonferroni 100(1 - alpha)% simultaneous confidence intervals for the individual mean differences are

    delta_i:  d-bar_i +/- t_{n-1}(alpha/2p) sqrt( s_{d_i}^2 / n )    (6-10a)

where t_{n-1}(alpha/2p) is the upper 100(alpha/2p)th percentile of a t-distribution with n - 1 d.f.

Example 6.1 (Checking for a mean difference with paired observations) Municipal wastewater treatment plants are required by law to monitor their discharges into rivers and streams on a regular basis. Concern about the reliability of data from one of these self-monitoring programs led to a study in which samples of effluent were divided and sent to two laboratories for testing. One-half of each sample was sent to the Wisconsin State Laboratory of Hygiene, and one-half was sent to a private commercial laboratory routinely used in the monitoring program. Measurements of biochemical oxygen demand (BOD) and suspended solids (SS) were obtained, for n = 11 sample splits, from the two laboratories. The data are displayed in Table 6.1.

Table 6.1 Effluent Data

Sample j | Commercial lab: x_{1j1} (BOD), x_{1j2} (SS) | State lab of hygiene: x_{2j1} (BOD), x_{2j2} (SS)
1  |  6,  27 | 25, 15
2  |  6,  23 | 28, 13
3  | 18,  64 | 36, 22
4  |  8,  44 | 35, 29
5  | 11,  30 | 15, 31
6  | 34,  75 | 44, 64
7  | 28,  26 | 42, 30
8  | 71, 124 | 54, 64
9  | 43,  54 | 34, 56
10 | 33,  30 | 29, 20
11 | 20,  14 | 39, 21

Source: Data courtesy of S. Weber.

Do the two laboratories' chemical analyses agree? If differences exist, what is their nature?

The T2-statistic for testing H0: delta' = [delta_1, delta_2] = [0, 0] is constructed from the differences of paired observations:

    d_{j1} = x_{1j1} - x_{2j1}:  -19  -22  -18  -27   -4  -10  -14   17    9    4  -19
    d_{j2} = x_{1j2} - x_{2j2}:   12   10   42   15   -1   11   -4   60   -2   10   -7

Here

    d-bar = [-9.36, 13.27]'   and   S_d = [199.26  88.38; 88.38  418.61]

and

    T2 = 11 [-9.36, 13.27] [199.26  88.38; 88.38  418.61]^{-1} [-9.36; 13.27] = 13.6

Taking alpha = .05, we find that [p(n - 1)/(n - p)] F_{p,n-p}(.05) = [2(10)/9] F_{2,9}(.05) = 9.47. Since T2 = 13.6 > 9.47, we reject H0 and conclude that there is a nonzero mean difference between the measurements of the two laboratories. It appears, from inspection of the data, that the commercial lab tends to produce lower BOD measurements and higher SS measurements than the State Lab of Hygiene.

The 95% simultaneous confidence intervals for the mean differences delta_1 and delta_2 can be computed using (6-10). These intervals are

    delta_1:  -9.36 +/- sqrt(9.47) sqrt(199.26/11)   or   (-22.46, 3.74)
    delta_2:  13.27 +/- sqrt(9.47) sqrt(418.61/11)   or   (-5.71, 32.25)

The 95% simultaneous confidence intervals include zero, yet the hypothesis H0: delta = 0 was rejected at the 5% level. What are we to conclude?

The evidence points toward real differences. The point delta = 0 falls outside the 95% confidence region for delta (see Exercise 6.1), and this result is consistent with the T2-test. The 95% simultaneous confidence coefficient applies to the entire set of intervals that could be constructed for all possible linear combinations of the form a_1 delta_1 + a_2 delta_2. The particular intervals corresponding to the choices (a_1 = 1, a_2 = 0) and (a_1 = 0, a_2 = 1) contain zero. Other choices of a_1 and a_2 will produce simultaneous intervals that do not contain zero. (If the hypothesis H0: delta = 0 were not rejected, then all simultaneous intervals would include zero.) The Bonferroni simultaneous intervals also cover zero. (See Exercise 6.2.)

Our analysis assumed a normal distribution for the D_j. In fact, the situation is further complicated by the presence of one or, possibly, two outliers. (See Exercise 6.3.) These data can be transformed to data more nearly normal, but with such a small sample, it is difficult to remove the effects of the outlier(s). (See Exercise 6.4.) The numerical results of this example illustrate an unusual circumstance that can occur when making inferences.

The experimenter in Example 6.1 actually divided a sample by first shaking it and then pouring it rapidly back and forth into two bottles for chemical analysis. This was prudent because a simple division of the sample into two pieces obtained by pouring the top half into one bottle and the remainder into another bottle might result in more suspended solids in the lower half due to settling. The two laboratories would then not be working with the same, or even like, experimental units, and the conclusions would not pertain to laboratory competence, measuring techniques, and so forth.
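As a check on the arithmetic in Example 6.1, the short script below, which is our own illustration and not from the text, reproduces T2 = 13.6 and the simultaneous intervals from the Table 6.1 data.

```python
import numpy as np
from scipy import stats

# Table 6.1: (BOD, SS) for the commercial lab and the state lab, n = 11 splits
commercial = np.array([[6, 27], [6, 23], [18, 64], [8, 44], [11, 30], [34, 75],
                       [28, 26], [71, 124], [43, 54], [33, 30], [20, 14]], float)
state = np.array([[25, 15], [28, 13], [36, 22], [35, 29], [15, 31], [44, 64],
                  [42, 30], [54, 64], [34, 56], [29, 20], [39, 21]], float)

d = commercial - state                 # paired differences, as in (6-5)
n, p = d.shape
dbar = d.mean(axis=0)                  # approx. [-9.36, 13.27]
S_d = np.cov(d, rowvar=False)          # approx. [[199.26, 88.38], [88.38, 418.61]]
T2 = n * dbar @ np.linalg.solve(S_d, dbar)
c2 = (n - 1) * p / (n - p) * stats.f.ppf(0.95, p, n - p)   # critical value, 9.47
half = np.sqrt(c2 * np.diag(S_d) / n)  # half-widths of the intervals (6-10)
print(T2, c2)                          # ~13.6 > 9.47, so reject H0
print(dbar - half, dbar + half)        # both intervals cover zero
```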
Whenever an investigator can control the assignment of treatments to experimental units, an appropriate pairing of units and a randomized assignment of treatments can enhance the statistical analysis. Differences, if any, between supposedly identical units must be identified and the most-alike units paired. Further, a random assignment of treatment 1 to one unit and treatment 2 to the other unit will help eliminate the systematic effects of uncontrolled sources of variation. Randomization can be implemented by flipping a coin to determine whether the first unit in a pair receives treatment 1 (heads) or treatment 2 (tails). The remaining treatment is then assigned to the other unit. A separate independent randomization is conducted for each pair. One can conceive of the process as follows:

Experimental Design for Paired Comparisons
[Diagram: n like pairs of experimental units; within each pair, treatments 1 and 2 are assigned at random.]

We conclude our discussion of paired comparisons by noting that d-bar and S_d, and hence T2, may be calculated from the full-sample quantities x-bar and S. Here x-bar is the 2p x 1 vector of sample averages for the p variables on the two treatments given by

    x-bar' = [x-bar_11, x-bar_12, ..., x-bar_1p, x-bar_21, x-bar_22, ..., x-bar_2p]    (6-11)

and S is the 2p x 2p matrix of sample variances and covariances arranged as

    S = [ S_11  S_12
          S_21  S_22 ]    (6-12)

where each block is p x p. The matrix S_11 contains the sample variances and covariances for the p variables on treatment 1; similarly, S_22 contains those computed for the p variables on treatment 2. Finally, S_12 = S_21' are the matrices of sample covariances computed from observations on pairs of treatment 1 and treatment 2 variables.

Defining the p x 2p matrix

    C = [ 1 0 ... 0  -1  0 ...  0
          0 1 ... 0   0 -1 ...  0
          ...
          0 0 ... 1   0  0 ... -1 ]    (6-13)

with the -1's beginning in the (p + 1)st column, we can verify (see Exercise 6.9) that

    d_j = C x_j,   j = 1, 2, ..., n

and

    d-bar = C x-bar   and   S_d = C S C'    (6-14)

Thus, it is not necessary first to calculate the differences d_1, d_2, ..., d_n in order to compute d-bar and S_d. On the other hand, it is wise to calculate these differences in order to check normality and the assumption of a random sample.

Each row c_i' of the matrix C in (6-13) is a contrast vector, because its elements sum to zero. Attention is usually centered on contrasts when comparing treatments. Each contrast is perpendicular to the vector 1' = [1, 1, ..., 1], since c_i' 1 = 0. The component 1' x_j, representing the overall treatment sum, is ignored by the test statistic T2 presented in this section.
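The identities d-bar = C x-bar and S_d = C S C' in (6-14) are easy to confirm numerically. The sketch below is our own illustration; it checks the identities on randomly generated data rather than on any data set from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 11, 2
x = rng.normal(size=(n, 2 * p))          # columns: p treatment-1 variables, then p treatment-2 variables
C = np.hstack([np.eye(p), -np.eye(p)])   # contrast matrix (6-13); each row sums to zero

xbar = x.mean(axis=0)                    # 2p-vector of sample averages, (6-11)
S = np.cov(x, rowvar=False)              # 2p x 2p sample covariance matrix, (6-12)

d = x @ C.T                              # differences d_j = C x_j
assert np.allclose(d.mean(axis=0), C @ xbar)              # d-bar = C x-bar
assert np.allclose(np.cov(d, rowvar=False), C @ S @ C.T)  # S_d = C S C'
```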
A Repeated Measures Design for Comparing Treatments

Another generalization of the univariate paired t-statistic arises in situations where q treatments are compared with respect to a single response variable. Each subject or experimental unit receives each treatment once over successive periods of time. The jth observation is

    X_j = [X_{j1}, X_{j2}, ..., X_{jq}]',   j = 1, 2, ..., n    (6-15)

where X_{ji} is the response to the ith treatment on the jth unit. The name repeated measures stems from the fact that all treatments are administered to each unit. The nature of the design eliminates much of the influence of unit-to-unit variation on treatment comparisons.

For comparative purposes, we consider contrasts of the components of mu = E(X_j). These could be

    [ mu_1 - mu_2 ]   [ 1 -1  0 ...  0 ]
    [ mu_1 - mu_3 ] = [ 1  0 -1 ...  0 ] mu = C_1 mu
    [     ...     ]   [       ...      ]
    [ mu_1 - mu_q ]   [ 1  0  0 ... -1 ]

or

    [ mu_2 - mu_1     ]   [ -1  1  0 ... 0 ]
    [ mu_3 - mu_2     ] = [  0 -1  1 ... 0 ] mu = C_2 mu
    [     ...         ]   [      ...       ]
    [ mu_q - mu_{q-1} ]   [  0 ... 0 -1  1 ]

Both C_1 and C_2 are called contrast matrices, because their q - 1 rows are linearly independent and each is a contrast vector. Given a contrast matrix C, the hypothesis that there are no differences in treatments (equal treatment means) becomes C mu = 0 for any choice of C, and we test C mu = 0 using the T2-statistic based on the contrasts C x_j in the observations:

    T2 = n (C x-bar)' (C S C')^{-1} (C x-bar)

It can be shown that T2 does not depend on the particular choice of C.(1)

(1) Any pair of contrast matrices C_1 and C_2 must be related by C_1 = B C_2, with B nonsingular. This follows because each C has the largest possible number, q - 1, of linearly independent rows, all perpendicular to the vector 1. Then (B C_2 x-bar)' [(B C_2) S (B C_2)']^{-1} (B C_2 x-bar) = (C_2 x-bar)' (C_2 S C_2')^{-1} (C_2 x-bar), so T2 computed with C_2 or C_1 = B C_2 gives the same result.

Test for Equality of Treatments in a Repeated Measures Design

Consider an N_q(mu, Sigma) population, and let C be a contrast matrix. An alpha-level test of H0: C mu = 0 (equal treatment means) versus H1: C mu != 0 is as follows: Reject H0 if

    T2 = n (C x-bar)' (C S C')^{-1} (C x-bar) > ((n - 1)(q - 1)/(n - q + 1)) F_{q-1,n-q+1}(alpha)    (6-16)

where F_{q-1,n-q+1}(alpha) is the upper (100 alpha)th percentile of an F-distribution with q - 1 and n - q + 1 d.f. Here x-bar and S are the sample mean vector and covariance matrix computed from the n observations X_1, X_2, ..., X_n.

A confidence region for contrasts C mu, with mu the mean of a normal population, is determined by the set of all C mu such that

    n (C x-bar - C mu)' (C S C')^{-1} (C x-bar - C mu) <= ((n - 1)(q - 1)/(n - q + 1)) F_{q-1,n-q+1}(alpha)    (6-17)

where x-bar and S are as defined in (6-16). Consequently, simultaneous 100(1 - alpha)% confidence intervals for single contrasts c' mu, for any contrast vectors of interest, are given by (see Result 5A.1)

    c' mu:  c' x-bar +/- sqrt( ((n - 1)(q - 1)/(n - q + 1)) F_{q-1,n-q+1}(alpha) ) sqrt( c' S c / n )    (6-18)

Example 6.2 (Testing for equal treatments in a repeated measures design) Improved anesthetics are often developed by first studying their effects on animals. In one study, 19 dogs were initially given the drug pentobarbital. Each dog was then administered carbon dioxide (CO2) at each of two pressure levels. Next, halothane (H) was added, and the administration of CO2 was repeated. The response, milliseconds between heartbeats, was measured for the four treatment combinations (halothane present or absent crossed with high or low CO2 pressure):

    Treatment 1 = high CO2 pressure without H
    Treatment 2 = low CO2 pressure without H
    Treatment 3 = high CO2 pressure with H
    Treatment 4 = low CO2 pressure with H

Table 6.2 contains the four measurements for each of the 19 dogs.

Table 6.2 Sleeping-Dog Data

Dog | Treatment 1 | Treatment 2 | Treatment 3 | Treatment 4
 1 | 426 | 609 | 556 | 600
 2 | 253 | 236 | 392 | 395
 3 | 359 | 433 | 349 | 357
 4 | 432 | 431 | 522 | 600
 5 | 405 | 426 | 513 | 513
 6 | 324 | 438 | 507 | 539
 7 | 310 | 312 | 410 | 456
 8 | 326 | 326 | 350 | 504
 9 | 375 | 447 | 547 | 548
10 | 286 | 286 | 403 | 422
11 | 349 | 382 | 473 | 497
12 | 429 | 410 | 488 | 547
13 | 348 | 377 | 447 | 514
14 | 412 | 473 | 472 | 446
15 | 347 | 326 | 455 | 468
16 | 434 | 458 | 637 | 524
17 | 364 | 367 | 432 | 469
18 | 420 | 395 | 508 | 531
19 | 397 | 556 | 645 | 625

Source: Data courtesy of Dr. J. Atlee.

We shall analyze the anesthetizing effects of CO2 pressure and halothane from this repeated-measures design.

There are three treatment contrasts that might be of interest in the experiment. Let mu_1, mu_2, mu_3, and mu_4 correspond to the mean responses for treatments 1, 2, 3, and 4, respectively. Then

    (mu_3 + mu_4) - (mu_1 + mu_2) = (halothane contrast representing the difference between the presence and absence of halothane)
    (mu_1 + mu_3) - (mu_2 + mu_4) = (CO2 contrast representing the difference between high and low CO2 pressure)
    (mu_1 + mu_4) - (mu_2 + mu_3) = (contrast representing the influence of halothane on CO2 pressure differences; the H-CO2 pressure "interaction")

With mu' = [mu_1, mu_2, mu_3, mu_4], the contrast matrix C is

    C = [ -1 -1  1  1
           1 -1  1 -1
           1 -1 -1  1 ]

The data (see Table 6.2) give

    x-bar = [368.21, 404.63, 479.26, 502.89]'

and

    S = [ 2819.29  3568.42  2943.49  2295.35
          3568.42  7963.14  5303.98  4065.44
          2943.49  5303.98  6851.32  4499.63
          2295.35  4065.44  4499.63  4878.99 ]

It can be verified that

    C x-bar = [209.31, -60.05, -12.79]'

and

    C S C' = [ 9432.32  1098.92   927.62
               1098.92  5195.84   914.54
                927.62   914.54  7557.44 ]

From (6-16),

    T2 = n (C x-bar)' (C S C')^{-1} (C x-bar) = 19(6.11) = 116

With alpha = .05,

    ((n - 1)(q - 1)/(n - q + 1)) F_{q-1,n-q+1}(alpha) = (18(3)/16) F_{3,16}(.05) = (18(3)/16)(3.24) = 10.94

Since T2 = 116 > 10.94, we reject H0: C mu = 0 (no treatment effects). To see which of the contrasts are responsible for the rejection of H0, we construct 95% simultaneous confidence intervals for these contrasts. From (6-18), the contrast

    c_1' mu = (mu_3 + mu_4) - (mu_1 + mu_2) = halothane influence

is estimated by the interval

    (x-bar_3 + x-bar_4) - (x-bar_1 + x-bar_2) +/- sqrt( (18(3)/16) F_{3,16}(.05) ) sqrt( c_1' S c_1 / 19 ) = 209.31 +/- sqrt(10.94) sqrt(9432.32/19) = 209.31 +/- 73.70

where c_1' is the first row of C. Similarly, the remaining contrasts are estimated by

    CO2 pressure influence = (mu_1 + mu_3) - (mu_2 + mu_4):   -60.05 +/- sqrt(10.94) sqrt(5195.84/19) = -60.05 +/- 54.70
    H-CO2 pressure "interaction" = (mu_1 + mu_4) - (mu_2 + mu_3):   -12.79 +/- sqrt(10.94) sqrt(7557.44/19) = -12.79 +/- 65.97

The first confidence interval implies that there is a halothane effect. The presence of halothane produces longer times between heartbeats. This occurs at both levels of CO2 pressure, since the H-CO2 pressure interaction contrast, (mu_1 + mu_4) - (mu_2 + mu_3), is not significantly different from zero. (See the third confidence interval.) The second confidence interval indicates that there is an effect due to CO2 pressure: The lower CO2 pressure produces longer times between heartbeats.

Some caution must be exercised in our interpretation of the results because the trials with halothane must follow those without halothane. The apparent H-effect may be due to a time trend. (Ideally, the time order of all treatments should be determined at random.)

The test in (6-16) is appropriate when the covariance matrix, Cov(X_j) = Sigma, cannot be assumed to have any special structure. If it is reasonable to assume that Sigma has a particular structure, tests designed with this structure in mind have higher power than the one in (6-16). (For Sigma with the equal correlation structure (8-14), see a discussion of the "randomized block" design in [17] or [22].)
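A brief computational sketch of Example 6.2 follows; it is our own illustration, not the authors', and it reproduces T2 of about 116 and the three simultaneous intervals from the Table 6.2 data.

```python
import numpy as np
from scipy import stats

# Table 6.2: milliseconds between heartbeats for treatments 1-4, 19 dogs
x = np.array([
    [426, 609, 556, 600], [253, 236, 392, 395], [359, 433, 349, 357],
    [432, 431, 522, 600], [405, 426, 513, 513], [324, 438, 507, 539],
    [310, 312, 410, 456], [326, 326, 350, 504], [375, 447, 547, 548],
    [286, 286, 403, 422], [349, 382, 473, 497], [429, 410, 488, 547],
    [348, 377, 447, 514], [412, 473, 472, 446], [347, 326, 455, 468],
    [434, 458, 637, 524], [364, 367, 432, 469], [420, 395, 508, 531],
    [397, 556, 645, 625]], float)

C = np.array([[-1, -1, 1, 1],          # halothane contrast
              [1, -1, 1, -1],          # CO2-pressure contrast
              [1, -1, -1, 1]], float)  # interaction contrast
n, q = x.shape
xbar = x.mean(axis=0)
S = np.cov(x, rowvar=False)
cx = C @ xbar
CSC = C @ S @ C.T
T2 = n * cx @ np.linalg.solve(CSC, cx)                                        # ~116
crit = (n - 1) * (q - 1) / (n - q + 1) * stats.f.ppf(0.95, q - 1, n - q + 1)  # ~10.94
half = np.sqrt(crit * np.diag(CSC) / n)        # interval half-widths, as in (6-18)
print(T2 > crit, np.c_[cx - half, cx + half])
```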
6.3 Comparing Mean Vectors from Two Populations
A T2-statistic for testing the equality of vector means from two multivariate populations can be developed by analogy with the univariate procedure. (See [11] for a discussion of the univariate case.) This T2-statistic is appropriate for comparing responses from one set of experimental settings (population 1) with independent responses from another set of experimental settings (population 2). The comparison can be made without explicitly controlling for unit-to-unit variability, as in the paired-comparison case.

If possible, the experimental units should be randomly assigned to the sets of experimental conditions. Randomization will, to some extent, mitigate the effects of unit-to-unit variability in a subsequent comparison of treatments. Although some precision is lost relative to paired comparisons, the inferences in the two-population case are, ordinarily, applicable to a more general collection of experimental units simply because unit homogeneity is not required.

Consider a random sample of size n1 from population 1 and a sample of size n2 from population 2. The observations on p variables can be arranged as follows:

    Sample                                     Summary statistics
    (Population 1) x_11, x_12, ..., x_1n1       x-bar_1, S_1
    (Population 2) x_21, x_22, ..., x_2n2       x-bar_2, S_2

In this notation, the first subscript, 1 or 2, denotes the population. We want to make inferences about

    (mean vector of population 1) - (mean vector of population 2) = mu_1 - mu_2

For instance, we shall want to answer the question, Is mu_1 = mu_2 (or, equivalently, is mu_1 - mu_2 = 0)? Also, if mu_1 - mu_2 != 0, which component means are different? With a few tentative assumptions, we are able to provide answers to these questions.

Assumptions Concerning the Structure of the Data

1. The sample X_11, X_12, ..., X_1n1 is a random sample of size n1 from a p-variate population with mean vector mu_1 and covariance matrix Sigma_1.
2. The sample X_21, X_22, ..., X_2n2 is a random sample of size n2 from a p-variate population with mean vector mu_2 and covariance matrix Sigma_2.    (6-19)
3. Also, X_11, X_12, ..., X_1n1 are independent of X_21, X_22, ..., X_2n2.

We shall see later that, for large samples, this structure is sufficient for making inferences about the p x 1 vector mu_1 - mu_2. However, when the sample sizes n1 and n2 are small, more assumptions are needed.

Further Assumptions When n1 and n2 Are Small

1. Both populations are multivariate normal.
2. Also, Sigma_1 = Sigma_2 (same covariance matrix).    (6-20)

The second assumption, that Sigma_1 = Sigma_2, is much stronger than its univariate counterpart. Here we are assuming that several pairs of variances and covariances are nearly equal.

When Sigma_1 = Sigma_2 = Sigma,

    sum_{j=1}^{n1} (x_1j - x-bar_1)(x_1j - x-bar_1)' is an estimate of (n1 - 1) Sigma

and

    sum_{j=1}^{n2} (x_2j - x-bar_2)(x_2j - x-bar_2)' is an estimate of (n2 - 1) Sigma

Consequently, we can pool the information in both samples in order to estimate the common covariance Sigma. We set

    S_pooled = [ sum_{j=1}^{n1} (x_1j - x-bar_1)(x_1j - x-bar_1)' + sum_{j=1}^{n2} (x_2j - x-bar_2)(x_2j - x-bar_2)' ] / (n1 + n2 - 2)
             = ((n1 - 1)/(n1 + n2 - 2)) S_1 + ((n2 - 1)/(n1 + n2 - 2)) S_2    (6-21)

Since the first sum of cross products has n1 - 1 d.f. and the second has n2 - 1 d.f., the divisor (n1 - 1) + (n2 - 1) in (6-21) is obtained by combining the two component degrees of freedom. [See (4-24).] Additional support for the pooling procedure comes from consideration of the multivariate normal likelihood. (See Exercise 6.11.)

To test the hypothesis that mu_1 - mu_2 = delta_0, a specified vector, we consider the squared statistical distance from x-bar_1 - x-bar_2 to delta_0. Now,

    E(X-bar_1 - X-bar_2) = E(X-bar_1) - E(X-bar_2) = mu_1 - mu_2

Since the independence assumption in (6-19) implies that X-bar_1 and X-bar_2 are independent and thus Cov(X-bar_1, X-bar_2) = 0 (see Result 4.5), by (3-9), it follows that

    Cov(X-bar_1 - X-bar_2) = Cov(X-bar_1) + Cov(X-bar_2) = (1/n1) Sigma + (1/n2) Sigma = (1/n1 + 1/n2) Sigma    (6-22)

Because S_pooled estimates Sigma, we see that (1/n1 + 1/n2) S_pooled is an estimator of Cov(X-bar_1 - X-bar_2).

The likelihood ratio test of

    H0: mu_1 - mu_2 = delta_0

is based on the square of the statistical distance, T2, and is given by (see [1]): Reject H0 if

    T2 = (x-bar_1 - x-bar_2 - delta_0)' [ (1/n1 + 1/n2) S_pooled ]^{-1} (x-bar_1 - x-bar_2 - delta_0) > c^2    (6-23)

where the critical distance c^2 is determined from the distribution of the two-sample T2-statistic.
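Before stating that sampling distribution, here is a minimal computational sketch of the statistic in (6-23), assuming equal covariance matrices. It is our own illustration, and the helper name two_sample_T2 is our own choice.

```python
import numpy as np

def two_sample_T2(x1, x2, delta0=None):
    """Two-sample Hotelling T^2 of H0: mu1 - mu2 = delta0, per (6-21)-(6-23)."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n1, p = x1.shape
    n2 = x2.shape[0]
    if delta0 is None:
        delta0 = np.zeros(p)
    S_pooled = ((n1 - 1) * np.cov(x1, rowvar=False) +
                (n2 - 1) * np.cov(x2, rowvar=False)) / (n1 + n2 - 2)   # (6-21)
    diff = x1.mean(axis=0) - x2.mean(axis=0) - delta0
    return diff @ np.linalg.solve((1 / n1 + 1 / n2) * S_pooled, diff)  # (6-23)
```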
Result 6.2. If X_11, X_12, ..., X_1n1 is a random sample of size n1 from N_p(mu_1, Sigma) and X_21, X_22, ..., X_2n2 is an independent random sample of size n2 from N_p(mu_2, Sigma), then

    T2 = [X-bar_1 - X-bar_2 - (mu_1 - mu_2)]' [ (1/n1 + 1/n2) S_pooled ]^{-1} [X-bar_1 - X-bar_2 - (mu_1 - mu_2)]

is distributed as

    ((n1 + n2 - 2)p / (n1 + n2 - p - 1)) F_{p, n1+n2-p-1}

Consequently,

    P[ (X-bar_1 - X-bar_2 - (mu_1 - mu_2))' [ (1/n1 + 1/n2) S_pooled ]^{-1} (X-bar_1 - X-bar_2 - (mu_1 - mu_2)) <= c^2 ] = 1 - alpha    (6-24)

where

    c^2 = ((n1 + n2 - 2)p / (n1 + n2 - p - 1)) F_{p, n1+n2-p-1}(alpha)

Proof. We first note that

    X-bar_1 - X-bar_2 = (1/n1) X_11 + (1/n1) X_12 + ... + (1/n1) X_1n1 - (1/n2) X_21 - (1/n2) X_22 - ... - (1/n2) X_2n2

is distributed as N_p( mu_1 - mu_2, (1/n1 + 1/n2) Sigma ) by Result 4.8, with c_1 = c_2 = ... = c_n1 = 1/n1 and c_{n1+1} = c_{n1+2} = ... = c_{n1+n2} = -1/n2. According to (4-23),

    (n1 - 1) S_1 is distributed as W_{n1-1}(Sigma)   and   (n2 - 1) S_2 as W_{n2-1}(Sigma)

By assumption, the X_1j's and the X_2j's are independent, so (n1 - 1)S_1 and (n2 - 1)S_2 are also independent. From (4-24), (n1 - 1)S_1 + (n2 - 1)S_2 is then distributed as W_{n1+n2-2}(Sigma). Therefore,

    T2 = ( (1/n1 + 1/n2)^{-1/2} (X-bar_1 - X-bar_2 - (mu_1 - mu_2)) )' S_pooled^{-1} ( (1/n1 + 1/n2)^{-1/2} (X-bar_1 - X-bar_2 - (mu_1 - mu_2)) )
       = (multivariate normal random vector)' (Wishart random matrix / d.f.)^{-1} (multivariate normal random vector)
       = N_p(0, Sigma)' [ W_{n1+n2-2}(Sigma) / (n1 + n2 - 2) ]^{-1} N_p(0, Sigma)

which is the T2-distribution specified in (5-8), with n replaced by n1 + n2 - 1. [See (5-5) for the relation to F.]

We are primarily interested in confidence regions for mu_1 - mu_2. From (6-24), we conclude that all mu_1 - mu_2 within squared statistical distance c^2 of x-bar_1 - x-bar_2 constitute the confidence region. This region is an ellipsoid centered at the observed difference x-bar_1 - x-bar_2, whose axes are determined by the eigenvalues and eigenvectors of S_pooled (or S_pooled^{-1}).

Example 6.3 (Constructing a confidence region for the difference of two mean vectors) Fifty bars of soap are manufactured in each of two ways. Two characteristics, X1 = lather and X2 = mildness, are measured. The summary statistics for bars produced by methods 1 and 2 are

    x-bar_1 = [8.3, 4.1]',   S_1 = [2 1; 1 6]
    x-bar_2 = [10.2, 3.9]',  S_2 = [2 1; 1 4]

Obtain a 95% confidence region for mu_1 - mu_2.

We first note that S_1 and S_2 are approximately equal, so that it is reasonable to pool them. Hence, from (6-21),

    S_pooled = (49/98) S_1 + (49/98) S_2 = [2 1; 1 5]

Also,

    x-bar_1 - x-bar_2 = [-1.9, .2]'

so the confidence ellipse is centered at [-1.9, .2]'. The eigenvalues and eigenvectors of S_pooled are obtained from the equation

    0 = |S_pooled - lambda I| = | 2 - lambda, 1; 1, 5 - lambda | = lambda^2 - 7 lambda + 9

so lambda = (7 +/- sqrt(49 - 36))/2. Consequently, lambda_1 = 5.303 and lambda_2 = 1.697, and the corresponding eigenvectors, e_1 and e_2, determined from

    S_pooled e_i = lambda_i e_i,   i = 1, 2

are

    e_1 = [.290, .957]'   and   e_2 = [.957, -.290]'

By Result 6.2,

    (1/n1 + 1/n2) c^2 = (1/50 + 1/50) ((98)(2)/97) F_{2,97}(.05) = .25

since F_{2,97}(.05) = 3.1. The confidence ellipse extends

    sqrt(lambda_i) sqrt( (1/n1 + 1/n2) c^2 ) = sqrt(lambda_i) sqrt(.25)

units along the eigenvector e_i, or 1.15 units in the e_1 direction and .65 units in the e_2 direction. The 95% confidence ellipse is shown in Figure 6.1. Clearly, mu_1 - mu_2 = 0 is not in the ellipse, and we conclude that the two methods of manufacturing soap produce different results. It appears as if the two processes produce bars of soap with about the same mildness (X2), but those from the second process have more lather (X1).

[Figure 6.1 95% confidence ellipse for mu_1 - mu_2.]
..
288 Chapter 6 Comparisons of Several Multivariate Means
Comparing Mean Vectors from lWo Populations 289 are both estimators of a'1:a, the common popUlation variance of the linear combinations a'XI and a'Xz' Pooling these estimators, we obtain
2.0
pooled
== (nl + 112  2)
== a' [111 ';
(111  I)Sf,a
+ (I1Z 
2 SI + 111 '; 2 S2 J a
[a'(X I  X 2  (ILl ILz»]z a ,( 1 + 1 ) Spooleda 111 I1Z
(625)
== a'Spooled a
To test Ho: a' (ILl  ILz) == a' 00, on the basis of the a'X lj and a'X Zj , we can form the square of the univariate twosample 'statistic
1.0
Figure 6.1 95% confidence ellipse forlLl  IL2'
(626)
units along the eigenvector ei, or 1.15 units in the el direction and .65 units in the ez direction. The 95% confidence ellipse is shown in Figure 6.1. Clearly, ILl  ILz == 0 is not in the ellipse, and we conclude that the two methods of manufacturing soap produce different results. It appears as if the two processes produce bars of soap with about the same mildness (Xz), but lhose from the second process have more lather (Xd. •
According to the maximization lemma with d = (XI  X 2 B == (1/111 + 1/11z)Spooled in (250),

(ILl  IL2»
and
z (XI   ta:s: X z  (ILl  ILz» , [(1
== T Z for all a # O. Thus,
(1  a) == P[Tz:s: c Z] = P[t;:s: cZ,
11.1
+ 1 ) Spooled
I1.z
JI (XI 
Simultaneous Confidence Intervals
It is possible to derive simultaneous confidence intervals for the components of the vector ILl  ILz· These confidence intervals are developed from a consideration of all possible linear combinations of the differences in the mean vectors. It is assumed that the parent multivariate populations are normal with a common covariance 1:.
for all a]
==p[la'(X I
Xz) 
a'(ILI  ILz)1 :s: c
a ,( 1 + 1 ) Spooleda nl I1Z
for all
a]
•
where cZ is selected according to Result 6,2.
Result 6.3. Let cZ == probability 1  a.
[(111
+
I1Z 
2)p/(nl +
I1Z 
P  1)]Fp.l1l+n2pI(a). With
Remark. For testing Ho: ILl  ILz == 0, the linear combination a'(X1  xz), with coefficient vector a ex  xz), quantifies the largest popUlation difference, That is, if T Z rejects Ho, then a'(xI  Xz) will have a nonzero mean. Frequently, we try to interpret the components of this linear combination for both subject matter and statistical importance. Example 6.4 (Calculating simultaneous confidence intervals for the differences in mean components) Samples of sizes 111 == 45 and I1Z == 55 were taken of Wisconsin homeowners with and without air conditioning, respectively, (Data courtesy of Statistical Laboratory, University of Wisconsin,) Two measurements of electrical usage (in kilowatt hours) were considered, The first is a measure of total onpeak consumption (XI) during July, and the second is a measure of total offpeak consumption (Xz) during July. The resulting summary statistics are XI
will cover a'(ILI  ILz) for all a. In particular ILli

ILZi
will be covered by
+ Sii,pooled
111 112
for i == 1,2, ... , p
Proof. Consider univariate linear combinations of the observations
XII,XIZ,,,,,X1nl
and X21,X22"",XZn2
given by a'X lj == alXljl + a ZX lj2 + ., . + apXljp and a'X Zj == alXZjl '+ azXZjz + ... + ap X 2jp ' These linear combinations and covariances a'X1 , a'Sla and a'Xz, a'S2a, respectively, where Xl> SI, and X 2 , Sz are the mean and covariance statistics for the two original samples, (See Result 3.5.) When both parent populations have the same covariance matrix, sf.a == a'Sla and == a'Sza
=
[204.4J 556.6'
. [13825.3 SI == 23823.4
23823.4J 73107.4 '
[130.0J Xz == 355.0'
Sz ==
[8632,0 19616.7J 19616.7 55964.5 '
nz == 55
(The off-peak consumption is higher than the on-peak consumption because there are more off-peak hours in a month.)

Let us find 95% simultaneous confidence intervals for the differences in the mean components. Although there appears to be somewhat of a discrepancy in the sample variances, for illustrative purposes we proceed to a calculation of the pooled sample covariance matrix. Here

    S_pooled = ((n1 - 1)/(n1 + n2 - 2)) S_1 + ((n2 - 1)/(n1 + n2 - 2)) S_2 = [10963.7  21505.5; 21505.5  63661.3]

and

    c^2 = ((n1 + n2 - 2)p / (n1 + n2 - p - 1)) F_{p,n1+n2-p-1}(.05) = (98(2)/97) F_{2,97}(.05) = (2.02)(3.1) = 6.26

With mu_1 - mu_2 = [mu_11 - mu_21, mu_12 - mu_22]', the 95% simultaneous confidence intervals for the population differences are

    mu_11 - mu_21:  (204.4 - 130.0) +/- sqrt(6.26) sqrt( (1/45 + 1/55) 10963.7 )   or   21.7 <= mu_11 - mu_21 <= 127.1   (on-peak)
    mu_12 - mu_22:  (556.6 - 355.0) +/- sqrt(6.26) sqrt( (1/45 + 1/55) 63661.3 )   or   74.7 <= mu_12 - mu_22 <= 328.5   (off-peak)

We conclude that there is a difference in electrical consumption between those with air-conditioning and those without. This difference is evident in both on-peak and off-peak consumption.

The 95% confidence ellipse for mu_1 - mu_2 is determined from the eigenvalue-eigenvector pairs lambda_1 = 71323.5, e_1' = [.336, .942] and lambda_2 = 3301.5, e_2' = [.942, -.336]. Since

    sqrt(lambda_1) sqrt( (1/n1 + 1/n2) c^2 ) = sqrt(71323.5) sqrt( (1/45 + 1/55) 6.26 ) = 134.3

and

    sqrt(lambda_2) sqrt( (1/n1 + 1/n2) c^2 ) = sqrt(3301.5) sqrt( (1/45 + 1/55) 6.26 ) = 28.9

we obtain the 95% confidence ellipse for mu_1 - mu_2 sketched in Figure 6.2. Because the confidence ellipse for the difference in means does not cover 0' = [0, 0], the T2-statistic will reject H0: mu_1 - mu_2 = 0 at the 5% level.

[Figure 6.2 95% confidence ellipse for mu_1 - mu_2 = (mu_11 - mu_21, mu_12 - mu_22).]

The coefficient vector for the linear combination most responsible for rejection is proportional to S_pooled^{-1}(x-bar_1 - x-bar_2). (See Exercise 6.7.)

The Bonferroni 100(1 - alpha)% simultaneous confidence intervals for the p population mean differences are

    mu_{1i} - mu_{2i}:  (x-bar_{1i} - x-bar_{2i}) +/- t_{n1+n2-2}(alpha/2p) sqrt( (1/n1 + 1/n2) s_{ii,pooled} )

where t_{n1+n2-2}(alpha/2p) is the upper 100(alpha/2p)th percentile of a t-distribution with n1 + n2 - 2 d.f.
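The following sketch, which is ours and not from the text, reproduces the pooled-covariance T2 intervals of Example 6.4 and the Bonferroni alternative just described.

```python
import numpy as np
from scipy import stats

n1, n2, p = 45, 55, 2
xbar1, xbar2 = np.array([204.4, 556.6]), np.array([130.0, 355.0])
S1 = np.array([[13825.3, 23823.4], [23823.4, 73107.4]])
S2 = np.array([[8632.0, 19616.7], [19616.7, 55964.5]])

S_pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
c2 = (n1 + n2 - 2) * p / (n1 + n2 - p - 1) * stats.f.ppf(0.95, p, n1 + n2 - p - 1)  # 6.26
diff = xbar1 - xbar2
se = np.sqrt((1 / n1 + 1 / n2) * np.diag(S_pooled))
half = np.sqrt(c2) * se
print(np.c_[diff - half, diff + half])     # ~(21.7, 127.1) and (74.7, 328.5)

# Bonferroni alternative: t-based half-widths with alpha/(2p)
t = stats.t.ppf(1 - 0.05 / (2 * p), n1 + n2 - 2)
print(np.c_[diff - t * se, diff + t * se])
```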
The Two-Sample Situation When Sigma_1 != Sigma_2

When Sigma_1 != Sigma_2, we are unable to find a "distance" measure like T2 whose distribution does not depend on the unknowns Sigma_1 and Sigma_2. Bartlett's test [3] is used to test the equality of Sigma_1 and Sigma_2 in terms of generalized variances. Unfortunately, the conclusions can be seriously misleading when the populations are nonnormal. Nonnormality and unequal covariances cannot be separated with Bartlett's test. (See also Section 6.6.) A method of testing the equality of two covariance matrices that is less sensitive to the assumption of multivariate normality has been proposed by Tiku and Balakrishnan [23]. However, more practical experience is needed with this test before we can recommend it unconditionally.

We suggest, without much factual support, that any discrepancy of the order sigma_{1,ii} = 4 sigma_{2,ii}, or vice versa, is probably serious. This is true in the univariate case. The size of the discrepancies that are critical in the multivariate situation probably depends, to a large extent, on the number of variables p.

A transformation may improve things when the marginal variances are quite different. However, for n1 and n2 large, we can avoid the complexities due to unequal covariance matrices.

Result 6.4. Let the sample sizes be such that n1 - p and n2 - p are large. Then, an approximate 100(1 - alpha)% confidence ellipsoid for mu_1 - mu_2 is given by all mu_1 - mu_2 satisfying

    [x-bar_1 - x-bar_2 - (mu_1 - mu_2)]' [ (1/n1) S_1 + (1/n2) S_2 ]^{-1} [x-bar_1 - x-bar_2 - (mu_1 - mu_2)] <= chi-square_p(alpha)

where chi-square_p(alpha) is the upper (100 alpha)th percentile of a chi-square distribution with p d.f. Also, 100(1 - alpha)% simultaneous confidence intervals for all linear combinations a'(mu_1 - mu_2) are provided by

    a'(mu_1 - mu_2) belongs to a'(x-bar_1 - x-bar_2) +/- sqrt( chi-square_p(alpha) ) sqrt( a' ( (1/n1) S_1 + (1/n2) S_2 ) a )

Proof. From (6-22) and (3-9),

    E(X-bar_1 - X-bar_2) = mu_1 - mu_2

and

    Cov(X-bar_1 - X-bar_2) = Cov(X-bar_1) + Cov(X-bar_2) = (1/n1) Sigma_1 + (1/n2) Sigma_2

By the central limit theorem, X-bar_1 - X-bar_2 is nearly N_p[ mu_1 - mu_2, (1/n1) Sigma_1 + (1/n2) Sigma_2 ]. If Sigma_1 and Sigma_2 were known, the square of the statistical distance from X-bar_1 - X-bar_2 to mu_1 - mu_2 would be

    [X-bar_1 - X-bar_2 - (mu_1 - mu_2)]' [ (1/n1) Sigma_1 + (1/n2) Sigma_2 ]^{-1} [X-bar_1 - X-bar_2 - (mu_1 - mu_2)]

This squared distance has an approximate chi-square_p distribution, by Result 4.7. When n1 and n2 are large, with high probability, S_1 will be close to Sigma_1 and S_2 will be close to Sigma_2. Consequently, the approximation holds with S_1 and S_2 in place of Sigma_1 and Sigma_2, respectively. The results concerning the simultaneous confidence intervals follow from Result 5A.1.

Remark. If n1 = n2 = n, then (n - 1)/(n + n - 2) = 1/2, so

    (1/n1) S_1 + (1/n2) S_2 = (1/n)(S_1 + S_2) = [ ((n - 1) S_1 + (n - 1) S_2 ) / (n + n - 2) ] (1/n + 1/n) = S_pooled (1/n + 1/n)

With equal sample sizes, the large sample procedure is essentially the same as the procedure based on the pooled covariance matrix. (See Result 6.2.) In one dimension, it is well known that the effect of unequal variances is least when n1 = n2 and greatest when n1 is much less than n2, or vice versa.

Example 6.5 (Large sample procedures for inferences about the difference in means) We shall analyze the electrical-consumption data discussed in Example 6.4 using the large sample approach. We first calculate

    (1/n1) S_1 + (1/n2) S_2 = (1/45) [13825.3  23823.4; 23823.4  73107.4] + (1/55) [8632.0  19616.7; 19616.7  55964.5]
                            = [464.17  886.08; 886.08  2642.15]

The 95% simultaneous confidence intervals for the linear combinations

    a'(mu_1 - mu_2) = [1, 0] [mu_11 - mu_21; mu_12 - mu_22] = mu_11 - mu_21

and

    a'(mu_1 - mu_2) = [0, 1] [mu_11 - mu_21; mu_12 - mu_22] = mu_12 - mu_22

are (see Result 6.4)

    mu_11 - mu_21:   74.4 +/- sqrt(5.99) sqrt(464.17)    or   (21.7, 127.1)
    mu_12 - mu_22:  201.6 +/- sqrt(5.99) sqrt(2642.15)   or   (75.8, 327.4)

Notice that these intervals differ negligibly from the intervals in Example 6.4, where the pooling procedure was employed. The T2-statistic for testing H0: mu_1 - mu_2 = 0 is

    T2 = [x-bar_1 - x-bar_2]' [ (1/n1) S_1 + (1/n2) S_2 ]^{-1} [x-bar_1 - x-bar_2]
       = [74.4, 201.6] [464.17  886.08; 886.08  2642.15]^{-1} [74.4; 201.6]
       = [74.4, 201.6] (10^{-4}) [59.874  -20.080; -20.080  10.519] [74.4; 201.6] = 15.66

For alpha = .05, the critical value is chi-square_2(.05) = 5.99 and, since T2 = 15.66 > 5.99, we reject H0.

The most critical linear combination leading to the rejection of H0 has coefficient vector

    a proportional to ( (1/n1) S_1 + (1/n2) S_2 )^{-1} (x-bar_1 - x-bar_2) = (10^{-4}) [59.874  -20.080; -20.080  10.519] [74.4; 201.6] = [.041; .063]

The difference in off-peak electrical consumption between those with air conditioning and those without contributes more than the corresponding difference in on-peak consumption to the rejection of H0: mu_1 - mu_2 = 0.
Example 6.6 (The approximate T2 distribution when l:. #= l:2) Although the sample sizes are rather large for the electrical consumption data in Example 6.4, we use these data and the calculations in Example 6.5 to illustrate the computations leading to the approximate distribution of T Z when the population covariance matrices are unequal. We first calculate
 [13825.2 23823.4J
nl
I 
= [307.227
45 23823.4
73107.4
529.409J 529.409 1624.609
1
nz S2 =
1 [8632.0 19616.7] 55 19616.7 55964.5
=
[156.945 356.667] 356.667 1017.536
which is identical to the large sample statistic in Result 6.4. However, instead of using the chisquare approximation to obtain the critical value for testing Ho the recommended approximation for smaller samples (see [15] and [19]) is given by
and using a result from Example 6.5,
nl
Consequently,
+
n2
= (104) [ 59.874 20.080]
20.080 10.519
vp F T  vp + 1 P.vp+1
2 _
where the d!,!grees of freedom v are estimated from the sample covariance matrices using the relation
(629) [
where min(nJ> n2) =:; v =:; nl + n2' This approximation reduces to the usual Welch solution to the BehrensFisher problem in the univariate (p = 1) case. With moderate sample sizes and two normal populations, the approximate level a test for equality of means rejects Ho: IL I  ""2 = 0 if
(XI  Xz  (ILl  IL2»' [ SI + S2 nl n2
307.227 529.409
529.409] (104) [ 59.874 1624.609 20.080
20.080] = [ .776 10.519 .092
.060J .646
and
+ = [ .776 nl nl nz .092
Further,
.060][ .776 .646 .092
.060] .646
= [ .608 .085]
.131 .423
1
1
J (Xl  Xz I 
(ILl  ILz»
>
v
_ vp + 1 Fp.v_p+l(a)
p
where the degrees of freedom v are given by (629). This procedure is consistent with the large samples procedure in Result 6.4 except that the critical value is vp replaced by the larger constant v _ p + 1 Fp.v_p+l(a). Similarly, the approximate 100(1  a)% confidence region is given by all #LI  ILz such that
(XI  X2  (PI 
[
and
156.945 356.667](104)[ 59.874 356.667 1017.536 20.080
20.080] = [.224 10.519 .092
 .060] .354
]1 (Xl _  Xz _IL2»' nl SI + n2 Sz
1 1 [
(""1  ""2»
vp
=:; v _ p
+ 1 Fp, vp+l(a)
(630)
= n2 nl + l...sz]I)Z n2
[
.224 .060][ .224 .060] [.055 .092 .354 .092 .354 = .053
.035] .131

296
Chapter 6 Comparisons of Several Multivariate Means
Comparing Several Multivariate Population Means (Oneway MANOVA) 297
Then
2. AIl populations have a common covariance matrix I. 3. Each population is multivariate normal. Condition 3 can be relaxed by appealing to the central limit theorem (Result 4.13) when the sample sizes ne are large. A review of the univariate analysis of variance (ANOVA) will facilitate our discussion of the multivariate assumptions and solution methods.
= 55 {(.055
1
+ .131) + (.224 + .354f} =
A Summary of Univariate ANOVA
In the univariate situation, the are that XCI, Xez, ... , XCne is a random sample from an N(/Le, a 2 ) population, e = 1,2, ... , g, and that the random samples are independent. Although the nuIl hypothesis of equality of means could be formulated as /L1 = /L2 = ... = /Lg, it is customary to regard /Lc as the sum of an overalI mean component, such as /L, and a component due to the specific population. For instance, we can write /Le = /L + (/Le  IL) or /Lc = /L + TC where Te = /Le  /L. Populations usually correspond to different sets of experimental conditions, and therefore, it is convenient to investigate the deviations Te associated with the eth population (treatment). The reparameterization
Using (629), the estimated degrees of freedom v is
v
and the a
vp
=
=
.0678 + .0095
2 + 2z
= 77.6 = 6 76. 3.12 = 6.32
155.2
.05 critical value is
77.6 X 2 0' ,·_p+I(.05) = 77. 6  2 + 1 F?776,+l05) v  p + . . 1 .
From Example 6.5, the observed value of the test statistic is rZ = 15.66 so hypothesis Ho: ILl  ILz = 0 is rejected at the. 5% level. This is the same cOUlclu:sioIi reached with the large sample procedure described in Example 6.5. As was the case in Example 6.6, the Fp • v  p + 1 distribution can be defined noninteger degrees of freedom. A slightly more conservative approach is to use integer part of v.
ILe
( eth pOPUlation) mean
+
OVerall) ( mean
Te
eth population ) ( ( treatment) effect
(632)
leads to a restatement of the hypothesis of equality of means. The null hypothesis becomes Ho: Tt = T2 = ... = Tg = 0 The response Xc;, distributed as N(JL form
6.4 Comparing Several Multivariate Population Means (OneWay MANOVA)
Often, more than two populations need to be compared. Random samples, "V'.n..",,,u.,,,,,,, from each of g populations, are arranged as Population 1:
Xll,XI2, ... ,Xlnl
+ Te, a 2 ), can be expressed in the suggestive +
(
Te
XC;
=
/L
+
ec;
(overall mean)
treatment) effect
(random) (633) error
Population 2: X ZI , X zz , ... , X2", Population g: X gI , Xgb ... , Xgn g MANOVA is used first to investigate whether the population mean vectors are the same and, if not, which mean components differ significantly.
where the et; are independent N(O, a 2 ) random variables. To define uniquely the model parameters and their least squares estimates, it is customary to impose the constraint
t=1
±
nfTf
= O.
Motivated by the decomposition in (633), the analysis of variance is based upon an analogous decomposition of the observations,
XCj
x
overall ) ( sample mean
+
(XC  x)
estimated ) ( treatment effect
+ (xe;  xc)
(634)
Assumptions about the Structure of the Data for OneWay
L XCI, X C2 ,"" Xcne,is a random sample of size ne from a population with mean e = 1, 2, ... , g. The random samples from different populations are
(observation)
(residual)
where x is an estimate of /L, Te = (xc  x) is an estimate of TC, and (xCi  xc) is an estimate of the error eej.
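The decomposition (6-34) is easy to carry out numerically. The sketch below is our own illustration; it splits a small data set (the one used in Example 6.7 below) into mean, treatment, and residual pieces and verifies that the corresponding sums of squares add up.

```python
import numpy as np

# three small samples (treatments) of unequal sizes, as in Example 6.7
groups = [np.array([9.0, 6.0, 9.0]), np.array([0.0, 2.0]), np.array([3.0, 1.0, 2.0])]

x_all = np.concatenate(groups)
grand = x_all.mean()                                            # overall sample mean
ss_mean = len(x_all) * grand**2
ss_tr = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)   # treatment sum of squares
ss_res = sum(((g - g.mean()) ** 2).sum() for g in groups)       # residual sum of squares
ss_obs = (x_all ** 2).sum()
assert np.isclose(ss_obs, ss_mean + ss_tr + ss_res)             # the decomposition (6-36)
print(ss_mean, ss_tr, ss_res)                                   # 128, 78, 10 for these values
```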
treatmen t. or 216 = 128 + 78 + 10. .t xd + 2(xt  x)(xej . should be approxImately equal. + (Xg .xe)2 (635) SS } + (Wifuin + g S S) 2 (x) (:1 j:1 g 2: x7i "i' (n] + n2 + .x)2 e we get + (XCj .1. ·..xc) (XCi . + (x. the mean vector xl = [x" .7 is so basic that the algebrai c equivale nt will now be develop ed. conside red as vectors. and obtain 2. without having to calculate' the individual residuals . An analysis of variance proceeds by comparing the relative SIzes of and SSres· If Ho is true.j:] (Xti .n ....X3) = 4 + (2 . XI. Its squared length is Similarly.Xe estimate of zero. Subtrac ting x from both sides of (634) and squaring gives (XCi .x) = n(xc .2 Population 3: 3.!. summing both sides over (XCi . The observatIOn vector y can he anywhe re m n = nl + n2 + . The breakup into sums of apportio ns variability in the combined samples into mean. The size of an array is quantified by stringing the of the array out mto a vector and calculating its squared length. Ho should. called the sum of squares (SS). Population 1: 9. That is. : Xl "l' Xz j.SSlr' Howeve r. Xgll ].. we could obtain SSre. this is false econom y because plots of the residuals provide checks on the assumpt ions of the model.x) + .(XCj . and (error) compon ents. for example.1 (The sum of squares decomposition for univaria te ANOVA ) Consider Comparing Several Multivariate Population Means (Oneway MANOV A) 199 " the following independent samples. = Y". /=1 (XCi .x)2 = ' :)07'("::' t ru:)fu' _ ') 3 1 2 observation (xCi) 4 4 4 mean + 2 2 2 treatment effect (xe . . = 12 + (_2)2 + 12 + (If + 12 + 12 + (1)2 + 02 = 10 The sums of squares satisfy the same decomposition..4) + (3 .x/ Z + 2. XgngJ. and the treatment effect vector 1 1 0 0 0 0 (SSobs) j:1 2. treatme nt effects. are perpend icular whateve r the observa tion vector y' = [XlI. This quantity IS. these arrays. and residuals are orthogonal.9 population 2: 0.. + n climensIOns.X)2 = (xc . For an ar set of let [XII. SSobs = SSmean + SSlr + SSre. x]' must lie along the equiang ular line I.9.6. .. The vector represen tations of the arrays involved in the (634) also have geometr ic interpre tations that provide the degrees of freedom . . Consequently. be rejected . because SS res = SSobS .XC) or ± ± ±i.' . For the observations.  x) 1 = (Xl . X2I'···' xz Il2 ' . + (X2 .3.SSme. we have verified that the arrays represen ting the mean.x of Te always satisfy neTe = O.xc) and the residual sum of squares is SSre.) If the treatment contribution is large.. as the observat ions. by subtract ion. .2) = 4 + (2) + 1 The sum of squares decomp osition illustrat ed numerically in Exampl e 6. 2 Since.x) } n2 + . + n g )x2 + Th uestion of equality of means is answered by assessing whether the 'be relative to the residuals. I.. ... (634). each Tc is an ma t es Te ..x)u g }n. variances computed from SSlr and SSre..xel = 0.  }n.•. X3 = (3 + 1 + 2)/3 = 2 and x = (9 + 6 + 9 +0 +2 3 + 1 + 2)/8 = 4..x) 0 0 1 1 0 0 0 0 0 0 1 (XI .xel z Next.0 .198 Chapter 6 Comparisons of Several Multivariate Means Example 6. .X)UI + (X2 .n) + (SSres) (636) + In the course of establishing (636). Consequently. SS c:] 2: nc(xc  g x)2 {:I Ir = 42 + 42 + 42 + (_3)2 + (3f + (2)2 + (_2)2 + (_2)2 = 3(4 2) + 2(3f + 3(2j2 = 78 = 42 + 42 + 42 + 42 + 4 2 + 4 2 + 42 + 4 2 = 8(4 2) = 128 (SSme.x/ + (xCj  We can sum both sides over j. wefind that 3 = X31 = + (X3 . . · . (Our esticont n u IOn 0 f the treatment array is large g . Under Ho. X21l2' . we construct the vector y = [9.6.x)uz + .x) + 1 1 0 residual (xCi . note that j:1 . 
(:"we<n ncCxc . 2J.2..
f.g) = 78/2 = 195 10/5 71 . U2.g = (3 + 2 + 3) .f. XCj where F 1 :2:n _g(O') is the upper (I00O')th percentile of the Fdistribution with g _ 1 c .27.. (See Exercise 6.es SS. First we note that the product (XCj x)(XCj  x)' ..10." + (xe . The calculations of the sums of squares and the associated degrees of freedom are conveniently summarized by an ANOVA table.g '" (nl + n2 + . to SSt" and n .) Thus. we specify the MANOVA model: f=l j=1 2: 2: (XCj  ne C=1 Lne .1 = n that is perpendicular to their hyperplane. . by appealing to the univariate distribution theory. + ng) . .g . to SSmean.2..g dJ.(g ..." + Te + eCj."" u g • Since 1 = Ul + U2 + .es + SSt.g (638) the eCj are independent Np(O.1) ./SS.1) According to the model in (638). and TC represents the eth treatment effect with O. ANOVA Table for Comparing Univariate Population Means Source of variation neatments Residual (error) Total (corrected for the mean) Sum of squares (SS) SSt. + n g • Alternatively./(g ..es or for large values of 1 + SSt.f.g C=1 ± MANOVA Model For Comparing g Population Mean Vectors X Cj IS g ne. but the covariance matrix l.) Since F = 19. is the same for all populations..es· The statistic appropriate for a multivariate generalization rejects Ho for small values of the reciprocal x ( overall mean. we find that these are the degrees of freedom for the chisquare distributions' associated with the corresponding sums of squares.7..8 CA univariate ANOVA table and Ftest for treatment effects) Using the information in Example 6..nc and e = 1. . . SS./(g . The usual Ftest rejects Ho: 71 = 72 = .X)Ul + . .1 d. to SS.. Thus.[(Xl . Degrees of freedom (d. we reject Ho: effect) at the 1% level of significance.s(. The residual vector. Here the parameter vector. ] A vector of observations may be decomposed as suggested by the model. To summarize. we attribute 1 d.3 = 5 88 C=1 L nc .es = Comparing Several Multivariate Population Means (Oneway MANOVA) 301 Example 6.1 = 7 F = SSt. l.01) = 13.1) SSres/(l.2.. j = 1.x)u g ] is perpendicular to both the mean vector and the treatment effect vector and has the freedom to lie anywhere in the subspace of dimension n . Sum of squares SStr = Degrees of freedom 78 g1=31=2 (=1 SS.x) estimated) treatment ( effectTc + (XCj  Xe) (639) '2:ri (observation) _ eCj 1 1 + SSt.5 > F2 . /SS.1 =. we have thefoIlowingANOVA table: Source of variation neatments Residual Total (corrected) Consequently.es = 10 SScor = ± g ne .(Xl) . and it is always perpendicular to the treatment vector.nc ./S5. The errors for the components of Xc' are lated. the mean vector has the freedom to lie anywhere along the onedimensional equiangular line and the treatment vector has the freedom to lie anywhere in the other g . This is equivalent to rejecting Ho for large values of SSt." + ug . + (x g .1 mensions. (637) The decomposition in leads to the muItivariate analog of the univariate sum of squares breakup in (635). each component of the observation vector XC' satisfies the univariate model (633)..300 Chapter 6 Comparisons of Several Multivariate Means lies in the hyperplane of linear combinations of the g vectors 1I1.e = y .g degrees of freedom. .es SS. = 72 = 73 = 0 (no treatment _ C=1 = 2: ne(xc g g x)2 XC)2 g1 g Multivariate Analysis of Variance (MANOVA) Paralleling the univariate reparameterization. = 7 g = 0 at level a if L neTc = C=1 SSt.es' The total number of degrees of freedom is n = + n2 + .. the mean vector also lies in this hyperplane." an overall mean (level).) variables...
and Roy' s largest root statistic can also be written as particular functions ofthe eigenvalues ofW.xc)' 2: ne C=I g g g 2Wilks' lambda can also be expressed as a function of the eigenvalues of Ab A 2 .) . It e plays a dominant role in testing poo for the presence of effects. all of these statistics are. Hence.1).x)(XCj .xc)' is too small.x)'. com pon ent by com pon ent.x)' Seve ral MuIt ivari ate Popu latio n Mea ns (One way MAN OVA ) 303 = [(x!. Table 6.3. the rank of B.x)(XCj .. Iwl IB+wl tota l (correcte sum d») of squares an d cross ( products / (residual (Wi thin ) sum ) of squares and cross prod ucts I± C=I j=1 .Z ('I:. Formally.i )' + (Xt.p p 1) VA VA * *) 1) 2) e VA VA * *) A* e FgI. (xc .x){xc .ic)(xCj . . As of W1B as B + W = (xc. . A* = Iwl/lB + wl No.n erl) Fp. Z('I:.2) 2: ne(xe (=1 g x) (ic .. the LawleyHotelling statistic. For othe r cases and larg e sam ple sizes.P P (Ln e . (xc. "f g ic) (XCj . For exam ple.302 Chap ter 6 Com paris ons of Several Multivariate Means xc) + (Xt .s(X t.. a modification of A* due to Bart lett (see [4]) can be used to test Ho. we may conS.x)(XCj  x)'1 (642) (640) The within sum of squares and cross prod ucts matrix can be expressed as W= C=I j=1 2: L (xej . essentially equivalent. (See the addit ional discussion on page 336. The exac t distributIon of A * can be deri ved for the special cases liste d in Table 6. g . . MANOVA Table for Comparing Popu lation Source of variation Treatment Residual (Error) Total (corrected for the mean) B= (Ln c g.x)' g1 W= t=1 j=1 L 2: (xc. Othe r statistics for checking the equality of multivariate means.i)' The sum over j of the middle two expressions is the zero matrix.X)(Xtj .3 Dist ribu tion ofW ilks ' Lam bda. We reject Ho if the ratio of gene raliz ed variances C=1 /=1 (x. The quan tity A * = IWill B + w I.: :1 g.ic) + (xc x)J' = (XCj . of variables grou ps Sam plin g distr ibut ion for mul tiva riate norm al data p = = (n) + (n2  + .xc)' . of No. we summarize the calculations leading to the test statistic in a MAN OVA table. (See [1]. prop osed orig inally by Wilks (see [25]). + (ng  I)Sg (641) where Se is the sample covariance matr ix for the fth This matr ix is a gener.xc)' + (Xe .. 'I:..1 g) e A* ) A* p= 2 p. .) One test of Ho: TI = TZ = '" = Tg = 0 involves generalized variances.ne g FZ(g I)..1B. For large samp les.::2 g= 2 g= 3 (Ln c . Equivalen tly. c . The degr ees of free dom corr espo nd to the univ ariat e geom etry and also to som e mul tiva riate distr ibut ion theo ry involving Wis hart densities.'I :.g g.j  This tabl e is exactly the sam e form .it) = O.1 where s = min (p. corr espo nds to the equi vale nt form (637) of the Ftest of Ho: no trea tmen t effects in the univ ariat e case . summing the cross product over e and j yields can be written as (XCj . 1 g. such as Pillai's statistic.x)] [(XCj .•• .1)SI g "I Xe)(Xfj . f the (n + n2 .Ider the re I" atlve SlZes 0 fth e residual and total (corrected) sum of squares and cross products. n.x? beco mes (xc .p. as the ANO VA table.  A* = xc) (XCj .1 (Ln e . t ted by considering the relative sizes of the treatment and residual Ise s sums of squares and crosS products. =Tg = 0 .x)' + c=) treatment <_Between») sum of squares and ( cross products ± . Analogous to the univariate result.x)' + (Xt . "'' (/ x) (xc' ..:: 1 F Zp. Wilk s' lamb da has the virtu e of bein g conv enie nt and rela ted to the likel ihoo d ratio z criterion. 
the hypotheSIS of no trea tme nt effects.nep 1 Mean Vectors Matrix of sum of squares and cross products (SSP) Deg rees of free dom (dJ. exce pt that squa res of scal ars are repl aced by thei r vect or coun terp arts. .: :2 Ho: T) = T2 = ..X)(Xc . (=1 /=1 1: (xc.x)(x c .i)' = / nc(xc .2) S ) d matrix enco a}Izat) on 0 ) untered III the twosample case.x)' (=1 j=1 g nl e=1 ne.) p.xc) (xc .
we reject Ho at significance level a if .(11? 6215 .0385 88(72) ... Proceeding row by row in the arrays for the two variables.SSmean = 272 .160 = 11 WithXl = andx = [!l x2 = [:J X3 = Thus.. + 2(7) = 149 Total (corrected) cross product = total cross product .SSmean = 216 ..1) dJ.128 = 88 Repeating this operation for the obs.(1)2 239 = .ervations on the second variable.7. and residual in our discussion of univariate ANOVA.200 = 72 (p + g») 2 In ( ) IB Iwl + wl > x7.(n . and (n1(P+g»)lnA*=(n1(P+g»)ln( IWI) IB+ WI (643) SSobs = SSmean + SStr Total SS (corrected) + SSres 272 = 200 + 48 + 24 has approximately a chisquare distribution with peg .. we get 10 . n2 = 2. and n3 = 3. Consequently.(gl)(a) is the upper (l00a)th percentile of a chisquare distribution with peg .(gl)(a) (644) where x. + 0(1) = 1 Total: 9(3) + 6(2) + 9(7) + 0(4) + . treatment effect.= . the MANOVA table takes the following form: Source of variation Treatment Residual Matrix of sum of squares and cross products Degrees of freedom We have already expressed the observations on the first variable as the sum of an overall mean. Arranging the observation pairs Xij in rows. we obtain These two singlecomponent analyses must be augmented with the sum of entrybyentry cross products in order to complete the entries in the MANOVA table. for Lne = n large. we obtain the cross product contributions: Mean: 4(5) + 4(5) + '" + 4(5) = 8(4)(5) = 160 Treatment: 3(4)(1) + 2(3)(3) + 3(2)(3) = 12 Residual: 1(1) + (2)(2) + 1(3) + (1)(2) + .1) dJ.304 Chapter 6 Comparisons of Several Multivariate Means Comparing Several Multivariate Population Means (Oneway MANOVA) 305 Bartlett (see [4]) has shown that if Ho is true and 2 2 Ln( = n is large. Example 6.1  = SSobs . we have (! 7) 8 9 7 (observation) 5) 5 5 5 (mean) + 1) 3 3 3 ( treatment) effect + 3) 0 11 (residual) Using (642).9 CA MANOVA table and Wilks' lambda for testing the equality of three mean vectors) Suppose an additional variable is observed along with the variable introduced in Example 6.mean cross product = 149 . The sample sizes are nl = 3. We found that [ 78 12 10 12J 48 2!J l1J 72 3 . A* IWI 11 = IB + WI = 88 11 24 11 72 I 111 10(24) ..1= 2 (P:) (observation) and G::) (mean) + J treatment) ( effect + (: [ [ 1 3+2+33=5 :) Total (corrected) (residual) 88 11 7 SSobs = SSmean + SStr + SSres Equation (640) is verified by noting that 216 = 128 + 78 + 10 Total SS (corrected) = SSobs .
003 .596 x2 = .7 of Chapter 4. we can use the result in Table 6.J oJ S = 2 lS61 .3 they were pooled [see (641)] to obtain .7714 :2:: ne = 516 3However. intermediate care facility. computed on a perpatientday basis and measured in hours per patient day.006 with a percentage point of an Fdistribution having Vi = 2(g .g .Xf Source: Data courtesy of State of Wisconsin Department of Health and SociatServices.428 l Also.x Group Number of observations .383 A* = IB IWI + WI = . and government).017 .l)SI + (n2 .3 for g = 3.g1)'.581 2. . the residual vectors eej == Xej . based on factors such as level of care.408 8.962 4.01. and B  £. 1.025 .19 > F4.610 . were selected for analysis: XI == cost of nursing labor.1) = 8 19 v'A 3.) X (Xc .360 l2.007 .125 .002 oJ should be examined for normality and the presence of outhers using the techniques discussed in Sections 4.018 .01) = 7.695 .1) == 4 == 2( Lne .004 .001 When the number of variables.8(.0385) (8 3 . or a combination of the two).005 .037 .011 .235 . nonprofit organization. the MANOVA table is usually not constructed.001 .167] _ . no difference in average costs among the three types of ownersprivate. Example 6.306 Chapter 6 Comparisons of Several Multivariate Means Comparing Several Multivariate Population Means (Oneway MANOVA) Sample covariance matrices 307 Since p = 2 and g = 3.011 . p. we compare the test statistic * ( 1 v'*A*) (Lne(g . and average wage rate in the state.001 . mean wage rate.X2 = cost of dietary labor. . and X 4 = cost of housekeeping and laundry labor.200 1.004 . V2 S3 = . To carry out the test.10 CA multivariate analysis of Wisconsin nursing home data) The Wisconsin Department of Health and Social Services reimburses nursing homes in the state for the services provided. Since the Se's seem to be reasonably compatible. it is good practice to have the computer print the matrices Band W so that especially large entries can be located.1 . is large.538 ] 182.418 _ X3 = l2.0385 \f. nonprofit. and 4. C=1 )' = nc(Xe . Also. A total of n = 516 observations on each of the p == 4 cost variables were initially separated according to ownership. a normaltheory test of Ho: I1 = I2 = I3 would reject Ho at any reasonable significance level because ofthe large sample sizes (see Example 6.003 . One purpose of a recent study was to investigate the effects of ownership Or certification (or both) on costs.01 level and conclude that tI:eatment differences exist. we reject Ho at a = . SI = l·291 . Still.273] .480 .000 .6.124.225 . Summary statistics for each of the g == 3 groups are given in the following table.1)S2 + (n3 . The department develops a set of formulas for rates for each facility. Since 8.066] .002 . Computerbased calculations give e = 1 (private) e = 2 (nonprofit) e = 3 (government) e=1 3 n2 = 138 _ XI = 2.082.010 .000 .001 ..821 .584 1.521 .1) == 8 dJ.030 .633 9. Table 6.I)S3 . equivalently. .1) = (1 V..453 . Four costs.000 .X3 = cost of plant operation and maintenance labor. Nursing homes can be classified on the basis of ownership (private party. and government) and certification (skilled nursing facility.12).394 6.230 Sample mean vectors To test Ho: 1'1 = 1'2 = 1'3 (no ownership effects or.3 indicates that an exact test (assuming normal_ ity and equal group covariance matrices) of Ho: 1'1 = 1'2 = 1'3 = 0 (no treatment effects) versus HI: at least one Te 0 is available. W = (ni .484 . l .
2:. depending on type of ownership.XCi is the difference between two independent sample means.g = 271 1 .01) == 2. n e .76 belongs to xki  Xc.. Var (Xki .428 .7714 v:77I4) Relation (528) still applies.(1 1)  Xe.01) == /s(.Te (or ILk . + 1.10 that average costs for nursing homes differ.484 T13 and n = 271 733 = .039 . For the model in (638). and Ho can be tested at the a = . A comparison of the variable X 3 . ± t n .S. costs of plant operation and maintenance labor. It is informative to compare the results based on this exact test With those obtained using the largesample procedure summarized in (643) and (644).g ( pg(g .020 .) nk ne with = X§(·01) =: 20. Let n = (646) Let a = . The twosample (based confidence interval is valid with an appropriately modified a.023 = . + ne Uii 182. so that W33 where Wji is the ith diagonal element of Wand n = n l + . + n g • J( 1 + n3 1) n . we have 6. where m = pg(g .5 to estimate the magnitudes of the differences.6? > F8•1020( .1)/2 pairwise = 17. . Example 6...7714) = 132..a).Te.2) (1 4 v.11 (Simultaneous intervals for treatment differencesnursing homes) We saw in Example 6.10. • _ _ Let Tki be the ith component of Tk· Smce Tk IS estimated by Tk = Xk . For the present example. p and all differences ith diagonal element of W.X ei) is by dividing the corresponding element of W by its degrees of freedom. we reject . Here Wii is the • We shall illustrate the construction of simultaneous interval estimates for the pairwise differences in treatment means using the nursinghome data introduced in Example 6.1)/2 is the number of simultaneous confidence statements.00614 1) 1. Var(Xki . Using (639) and the information in Example 6.3 = . and they require critical values only for the univariate tstatistic. e< k == 1. .09 .10.01 level by comparing (l .Ho at the 1 % level.p ( Simultaneous Confidence Intervals for Treatment Effects 309 It remains to apportion the error rate over the numerous confidence state 2) (1  p v'*A*) v'A = (516 . where U·· is the ith diagonal element of:t.408 8.003 _ _ (1 1) nk.) =  nk + ne n .1) a ) n .394 Consequently.67 differences.633 1.g Wii + 138 + 107 n1 = 516. .308 Chapter 6 Comparisons of Several Multivariate Means and 2:. we reject Ho at the 1% level and conclude that average costs differ. so that F2(4). .002 [ . . We can use Result 6.020 ..200 [ 1.581 2." " . [ . so each twosample tinterval will employ the critical value tn.D70j . those effects that led to the rejection of the hypothesis are of interest.g ( a/2m). with confidence at least en .51.01)/8 = 2. . Bonferroni approach (see Section 5.695 .. k=I f nk.S Simultaneous Confidence Intervals for Treatment Effects When the hypothesis of equal treatment effects is rejected. As suggested by (641).lLe)· These mtervals are shorter than those obtained for all contrasts.. This result IS consistent With the result based on the foregomg Fstatistic.484 9. For pairwise. for all components i = 1.09. Since 17. Result 6.01.J + 107 516 .137j .Tfi = XA.020 73 • = (X3 . Since > X§(·Ol) = 20. between privately owned nursing homes and governmentowned nursing homes can be made by estimating T13 .. .X)= • _ _ . depending on the type of ownership.023 .4) can be used to construct sImultaneous intervals for the components of the differences Tk . That is. Notice that Var(Tki .043 ___ _ .Xli) = 71=(X1.i(51O)(.962 W = 4.T33.X (645) and Tki . g. There are p variables and g(g . comparisons.5 In (.x) = _ _ .4 .) = Var(Xki .. .1  (p + g)/2) = 511.g (1.nr = n = 516 is large.51.
2 In A (see Result 5.061.u)M = (1 . where IS(l) I and IS(g) I are the minimu m andmore maximu m determi nant values. Table 1. g = ' i .026) 'T13 • 23 and 7"23 _ • 33 belongs to the interval (. .u){[ l)Jtn ISpooled I . == 'i..1)]ln I Spooled I .l)Sg} (649) Example 6. hen compari ng mean vectors.1) t {(nl _ l)SI + (nz . consequently.. test .87(.1)S2 + ... m' Chapter 11 when we discuss discrimina(Th' umptlon wIll appear agam d IS ass'fi f n) Before pooling the variatio n across samples to a tlOn an clas. Box's Test for Equality of Covariance Matrices 6. we can say a owne d nursmg 0 .sl ca 10 . Setting 21n A = M (Box's M statistic ) gives M = J( 1+ 1) nl n3 W33 n . the null hypothesIs IS Ho: 'i. If the null hypothesis is false. I Spooled I. th t . the product of the ratios in (644) will get closer to O. Then C = (1 .018.10. the presumed common covanance ma trix.00208) • _ == 3 for 95% simultan eous confidence we require Box's test is based on his X 2 approxi mation to the samplin g distribu tion of . .310 Chapter 6 Comparisons of Several Multivariate Means Testing for Equality of Covarian ce Matrices 311 == 2:87. To illustrat e. h 'ff' 's observed between nonprof it and government nursmg omes.. 2.019) . do not differ too much from the pooled covarian ce matrix.2:[(ne . we test the hypothe sis HO::I1 = :I2 = :I3 = 'i. r" ance matrix for the eth population. will lie somewh ere near the "middle " of the determi nants ISe I's of the individual group covarian ce matrices. As the latter quantiti es become more disparat e.021. l)ln I Se IJ}(652) + 1)(g . With g populations.043 ± . note that the determi nant of the pooled covarian ce matrix.2).1) _ 1) J[ 2p2 + 3p . ( 6 .00614) == . One common ly employed test the equa I y 0 ([8] [9]) test for equal covariance matrices is M . A will be near 1 and Box's M statistic will be small.) The 95% SImultaneous confidence statement is belongs to.061 hour per patient day than for privately nursmg homes IS Ig er y. e 1. but no dI erence 1 If the null hypothe sis is true. e matrices are not equal.. .. .g [2:(n e . the sample covarian ce matrices can differ more and the differen ces in their determi nants will be more pronoun ced. At significance level reject Ho if C > A= n e (I I Se I Spooled )(n CI)12 (648) I Box's approxi mation works well if each ne exceeds 20 and if p and g do not exceed 5..1) (651) where p is the number of variable s and g is the number of groups.12 (Testing equality of covariance matrice snursin g homes) We introduced the Wisconsin nursing home data in Exampl e 6.. T13 .4 7 ) 1 Set u  [2: 1 1 e (ne .. [8]) has provide d a more precise F approxi mation to the samplin g distribu tion of M. In that example the sample covarian ce matrices for p = 4 cost variables associat ed with g = 3 groups of nursing homes are displayed.6 Testing for Equality of Covariance Matrices . the ratio of the determi nants in (648) will all be close to 1. _ belongs to the interval (.Zp(p + 1) = Zp(p (1'. d when compari ng two or more multivar iate mean vecOne of the ma et' of the potentia lly different populati ons are the tors is that the ma nces .2 = .l)ln ISell e e (650) == . ing (&47) is given by (see [1]) 1 1 1 + 1) . + (ng .. In situations where these conditions do not hold.025) mainten ance and labor cost for governm entown ed We to . h mes With the same 95% confIden ce.Se is the sample covariance . K Here ne is the sample size for the eth group. as the ISf I's increase in spread. In this case A will be small and M will be relatively large. g. 
Box ([7J.043 ± 2.T33 ± t513(. = 'i. (653) degrees of freedom .1 ] 6(p + l)(g . matnx an d Spooled 'IS the pooled sample covanan ce matnx given by Spooled == 1 . . . . . it can be worthwhile to pooled covariance matrices. (See Appendix. In fact. the individual sample covarian ce matrices are not expecte d to differ too much and. .058. In this case.. or ( . Thus a difference m I . The alternative hypothesis is that at least . and I is where Ie IS the cova 1 . IS(1) I1I Spooled I reduces the product proporti onally than IS(g) I1I Spooled I increases it. Assumi ng multiva riate normal data. two of the f ons a likelihood ratio statistic for testAssuming multlvanate normaI popu Ia I.1) has an approxi mate X2 distribu tion with v = gzp(p . respectively. th's cost exists between private and nonprofit nursing homes. .
We conclude that the covariance matrices of the cost variables associated with the three populations of nursing homes are not the same.2. Let the two sets of experimental conditions be the levels of.5. The particular experimental design employed will not concern us in this book. . these experimental conditions represent levels of a single treatment arranged within several blocks. some differences in covariance matrices have little effect on the MANOVA tests. for instance.g k = 1. the MANOVA tests of means or treatment effects are rather robust to nonnormality.0133 g r = 1.1 s31 = 14. we shall briefly review the analysis for a univariate twoway fixedeffects model and then simply generalize to the multivariate case by analogy..4 Suppose there are g levels of factor 1 and b levels of factor 2.564. assume that observations at different combinations of experimental conditions are independent of one another..741) J = 289..926) + 106( 15. it is clear that Ho is rejected at any reasonable level of significance. we specify the univariate twoway model as n3 = 107 and 1 SI 1= 2. n2 == 138.2. . (See (10) and (17) for discussions of experimental design. In 1 s31 = 15. (T2) random variables. and 1Spooled 1 = 17. .n where = 20 degrees of freedom. .741 and In 1 Spooled 1= 15. . _ = [270 +137 + 106)(15. Level of factor 2 Figure 6. b The presence of interaction.. k = 1. respectively. .1 s21 = 89. f3 k represents the fixed effect of factor 2. . Figures 6. 2 (a) 3 4 Level of factor 2 Univariate TwoWay FixedEffects Model with Interaction We assume that measurements are recorded at various levels of two factors.0133)289.270 + 137 + 106 1 ][2W) + 3(4)  f3k + 'Yek 1. The factors discussed here should not be confused with the unobservable factors considered in Chapter 9 in the context of factor analysis.g. b e= + Te + + eekr (654) 1] 6(4 + 1)(3 _ 1) = . Taking the natural logarithms of the determinants gives In 1 SI 1= 17.926. with reasonably large samples.3 = 285. factor 1 and factor 2. and 'Ye k is the interaction between factor 1 and factor 2.) We shall.579 X 108 . . More broadly.. Denoting the rth observation at level of factor 1 and level k of factor 2 by X fkr .2. The expected response at the eth level of factor 1 and the kth level of factor 2 is thus JL 2: Te = k=1 2: f3k = e=1 2: 'Yek = k=1 2: 'Yek = 0 e=1 b g b and the elkr are independent + + Tt + + f3k + 'Yek Box's Mtest is routinely calculated in many statistical computer packages that do MANOVA and other procedures requiring equal covariance matrices.7 TwoWay Multivariate Analysis of Variance Following our approach to tile oneway MANOVA..783 e X ekr = JL u = [ 270 If + 137 + 106 1 . we may decide to continue with the usual MANOVA tests even though the Mtest leads to rejection of Ho. However.[270(17.. It is known that the Mtest is sensitive to some forms of nonnormality. and that n independent observations can be observed at each of the gb combi2 Level 3 of factor I Level I offactor I Level 2 offactor I 3 (b) 4 4The use of the tenn "factor" to indicate an experimental condition is convenient. To summarize. Here JL represents an overall level. with equal sample sizes. we have nl = 271. Thus the Mtest may reject Ho in some nonnormal cases where it is not damaging to the MANOVA tests. In 1 Sz 1= 13.397) + 137(13. 'Yek> implies that the factor effects are not additive and complicates the interpretation of the results...564) . We calculate lWoWay Mu/tivariate Analysis of Variance 313 .398 X 108.. however.397. 
Referring C to a i table with v = 4(4 + 1)(3 1)12 M N(O. Moreover. in the presence of nonnormality. normal theory tests on covariances are influenced by the kurtosis of the parent populations (see [16]).nations of levels. .3(a) and (b) show Level I offactor I Level 3 offactor I Level 2 offactor I 6. In some cases.312 Chapter 6 Comparisons of Several Multivariate Means Using the information in Example 6.2. Te represents the fixed effect of factor 1.3 and C = (1.539 X 10.10. 8 8 X 10.3 Curves for expected responses (a) with interaction and (b) without interaction...2.. . . mean) ( response ( overall) level ( effect Of) factor 1 ( effect Of) factor 2 2) + InteractIOn (655) e=I.
1) (662) C=1 k=! .1 = (g .i' k + i)' (661) = SSres = Total (corrected) SScor ±2: 2: ±2: 2: = b " f=1 C=I 2: 2: n(xCk k=1 k=l r=1 b n + X)2 (g .x)2 with the corresponding matrix (i e· .::£) random vectors. we specify the twoway fixedeffects model for a vector response consisting ofp components [see (654)] 2: 2: 2: (Xtkr (=1 k=1 . . respectively. = SSfacl + SSfac2 + SSint + SSres and the eCkr are independent Np(O. .n where 2: 'T C = k=1 2: Ih = C=I 2: 'Y Ck = k=1 2: 'Ye k = Q g b O. x'k is the average for the kth level of factor 2.and k. . SS.1» can be used to test for the effects of factor 1. respectively.1) (b ...2.1). Following (656).x)2 gbn . Xf· is the average for the eth level of factor 1. ic.1) + (X'k .. .f.b (659) Xc.1) (658) TheANOVA table takes the following form: ANOVA Table for Comparing Effects of Two Factors and Their Interaction Source of variation Factor 1 Factor 2 Interaction Residual (Error) SSfac1 = SSfac2 = SSint Sum of squares (SS) Degrees of freedom (d.xc·  i' k + i) + (XCkr  XCk) (660) + (b .1). In a manner analogous to (655). . . or SSco.i)(xc. .1 g b where i is the overall average of the observation vectors.=1 (Xek' . SSfact/(g .Xt· . Thus. Straightforward generalizations of (657) and (658) give the breakups of the sum of squares and cross products and degrees of freedom: 2: 2: 2: (XCkr (=1 k=1 r=1 n i)(XCk' . and ie k is the average of the observation vectors at the eth level of factor 1 and the kth level of factor 2.2. e= 1.x) gives Multivariate TwoWay FixedEffects Model with Interaction Proceeding by analogy.1) + gb(n .i' k + i) (iek .1)( b .1) (XCkr .i)'. we can decompose the observation vectors xtk..) g1 b . Squaring and summing the deviations (XCkr . each observation can be decomposed as e The Fratios of the mean squares. as XCkr = X + (xe· . and Xlk is the average for the eth level factor 1 and the kth level of factor 2. X'k + X)2 g r = 1. tbe responses consist of p measurements replicated n times at each of the possible combinations of levels of factors 1 and 2. and factor Ifactor 2 interaction.1) to the mean square.1) (g .es I (gb( n .314 Chapter 6 Comparisons of Several Multivariate Means TwoWay Mu/tivariate Analysis of Variance 315 expected responses as a function of the factor levels with and without interaction. and SSintl (g . .1 = (g .1) + (b . (See [11] for a discussion of univariate twoway analysis of variance. the generalization from the univariate to the multivariate analysis consists simply of replacing a scalar such as (xe. The absense of interaction means 'Yek = 0 for all .) where x is the overall average.1)(b . The vectors are all of order p X 1. .X'k + 2: gn(i' k k=l g i)(i' k  i)' k=1 g + 2: 2: n(itk t=1 k=l + b Xc· .x) + (XCk .. factor 2. SSfaczl(b .1)(b .x) The corresponding degrees of freedom associated with the sums of squares in the breakup in (657) are gbn .1) + (g .2.x)' = 2: bn(ic· C=I b g i)(xe· .=1 g b n x)2 = 2: bn(xf· f=1 g X)2 + 2: gn(x'k k=1 b X)2 X ekr = po + 'Te + Ih + 'Ytk + eCk.1) gb(n .1) + gb(n .1 Again.i)' (=1 b 2: bn(xe. i' k is the average of the observation vectors at the kth level of factor 2.fed gbn .. 2: gn(x'k b g x)2 x)2 Xc· .g + f=1 k=1 2: 2: n(Xfk  g b k = 1. is the average of the observation vectors at the etb level of factor 1.
gl *" x) (X'k .[ gb(n .P + 1 .. Let MANOVA Table for Source of variation Factor 1 Factor 2 Interaction Residual (Error) Total (corrected) SSPint = Factors and Their Interaction Matrix of sum of squares and cross products (SSP) SSPtacl = SSPtac2 = e=1 b 2: bn(xe· 2: gri(X'k  g x) (I.ISSPint + SSPres I A* = :"'""=. respectively.1).'Tm of the factor 1 effects and the components of Pk .=. the likelihood ratio test is as follows: Reject Ho: 'Tl SSPres = SSPcor = (=] 1: ±:± k=1 r=1 (XCkr  XCk)(XCkr  xcd gbn 1 = 'T2 = .f.. are Tti . reject Ho: I'll = 1'12 = '" = l' go = 0 at the a level if . belongs to (Xt. can be referred. that p 5The likelihood test procedures reqwre (with probability 1).l!p d. In any event.X'k + x) (Xlk .. interaction plots similar to Figure 6.it· .'Tm.x)' b. variatetests.. Small values of *" HI: Atleast one 1't k *" 0 ISSPresl .3. = 1'gb = 0 (no interaction effects) where A * is given by (666) and Xtgl)p(a) is the upper (l00a)th percentile of a Chisquare distribution with (g .i is the ith component of I. In the multivariate model. A *.e· .(g2 1 )(b l)JInA* > xTgI)(bl)p(a) i[.f. . . = Pb = 0 and HI: at least one Pk O.1)(? . Simultaneous confidence intervals for contrasts in the model parameters can provide insights into the nature of the·factor effects. These hypotheses specify no factor 1 effects and some factor 1 effects.p umvanate twoway analyses 0 f variance .Xm.1) . it is not advisable to proceed WIth the addltich0n .xm •• I L . /SSPfac2 /SSPres / + SSPres / (668) is conducted by rejecting Ho for small values of the ratio A* are consistent with HI' Once again..Tm. . so that SSPres will be positive where v = gb(n ..''"''.  ± tv Cg(ga_ l»))..x)' A test (the likelihood ratio test)5 of Ho: 1'11 versus = 1'12 = . = Pb = 0 (no factor 2 effects) at level a if For large samples..5 are available for the twoway model. we test for factor 1 and factor 2 main effects as follows. effects. :5 where A * is given by (668) and XTbI)p( a) is the upper (100a)th percentile of a chisquare distribution witlt (b . The Bonferroni approach applies to the components of the differences 'Tt . respectively.e· .e a clear in. When interaction effects are negligible.l)J 2 In A* > XtbI)p(a) (669) [gb(n . for large samples and using Bartlett's correction: Reject Ho: PI = P2 = .1 A* = '':':0.a)% simultaneous confidence intervals for 'Tei .Pq of the factor 2 effects. factor 2 effects are tested by considering Ho: PI = P 2 = .l)p d. consider the hypotheses Ho: 'Tl = 'T2 = . the factor effects do not hav.. where A * is given by (664) and xfgI)(bl)p(a) is the upper (lOOa)th percentile chisquare distribution with (g . we may concentrate on contrasts in the factor 1 and factor 2 main .17 others.. to a .. Ordinarily the test for interactIOn IS earned out before the tests for fects. best clarify the relative magnitudes of the main and interaction effects.t4.I. First. n Using Bartlett's multiplier (see [6]) to improve chIsquare approxlmatto .1)  p + 1 . is the ith diagonal element of E = SSPres . (one e for res eanses are often conducted to see whether the interaction appears m som po .erpallret8Itl( From a practical standpoint. = 'Tg = 0 (no factor 1 effects) at level a if P+1(g1)] 2 InA*>xfg_l)p(a) (667) (=1 k=1 2: 2: r=1 2: g b n gb(n1)[ (Xtkr  X)(Xfkr .(b .. Results comparable to Result 6. Ei.:. Wilks' lambda.x)' . The 100(1 . .1) P degrees of freedom.1). In a similar manner. Instead.. = 'Tg = 0 and HI: at least one 'Tt O. 
but with treatinent sample means replacing expected values._ _ ±± e=1 k=1 k=l /SSPresl I SSPtacl + SSPres I (666) n(Xtk .. If interadtion effects exist. provided that the latter effects exist. Those responses without interaction may be interpreted in terms of additive factor 1 and 2 effects.e. and xe.X'k + x)' so that small values of A * are consistent with HI' Using Bartlett's correction.i (670) go(n .316 Chapter 6 Comparisons of Several Multivariate Means The MANOVA table is the following: '!WoWay Multivariate Analysis of Variance 3.
3005 .2 1.dat'.79 6.8] 1.26550000 C.50150000 1.5445 .9240 16 Total (corrected) [42655 2395] 1.6 9.2 Low (10)% [5.9605 1 Interaction [ 0445] 1 Residual 3.1 5.9005 1.78500000 F Value 15.7 [6.x. We have considered the multivariate twoway model with replications.13 USING PROC GLM Low (1.4685 3.1 SAS ANALYSIS FOR EXAMPLE 6. data film. proc glm data =film.0855 .9095 74.) Example 6. (See [9]. High (10%) 9.13 (A twoway multivariate analysis of variance of plastic film data) The optimum conditions for extruding plastic film have been examined using a technique called Evolutionary Operation.f3 qi are f3ki .a) percent simultaneous confidence intervals for f3ki .6] 3. the 100(1 . manova h =factorl factor2 factorl *factor2/printe.8] X3 [6.11025000 Root MSE 0. That is.8] 4.5 9.4 5.0165 . the model allows for n replications of the responses at each combination of factor levels.2 9.0%) [6. X2 = gloss.8 High (1.1 9.9 3. and X3 = opacitywere measured at two levels of the factors.) In the course of the study that was done.i·qiis the ith component ofx·k .9 9. The measurements were repeated n = 5 times at each combination of the factor levels.f.6 9.586449 Source OF Sum of Squares 2. infile 'T64.9] [6.5%) title 'MANOVA'.7 6.0 8.7395 .4] [7.7] [7.4 Plastic Film Data Xl = tear resistance.13.9] X2 X3 X2 X3 General linear Models Procedure Class Level Information Class Levels FACTOR 1 2 FACTOR2 2 Number of observations in OF 3 16 19 RSquare 0.4205 .00050000 Pr> F 0.0011 0.8555 ] d.1 2. This enables us to examine the "interaction" of the factors.8 [6. the twoway model does not allow for the possibility oca general interaction term 'Yek· The corresponding MANOVA table includes only factor 1.9] [6. If only one observation vector available at each combination of factor levels. model xl x2 x3 =factorl factor2 factorl *factor2/ss3.5 [6.0700] . 1 where jJ and Eiiare as just defined and i·ki .1 [6.i·qi) a) fE::2 ± tv ( pb(b 1) (671) 1 change in rate ractor : of extrusion [1. The data are displayed in Table 6.5 10.D200 2.5 1.7405 SSP 1.5 [6. three responsesXl = tear resistance.9305] 1.2 [7.4] 6.3 8.6 9. Xz = gloss.9 9.00 Pr> F 0.2055 19 PANEL 6.9471 (continues on next page) .V.0] 4. (See Exercise 6.5045 1.6825 .76050000 0.83383333 0.7855 5.1 9.332039 Mean Square 1.9] [7.4] 3.2 Xz 4.f3 qi Source of variation n belongsto (i·ki .2 8. leading to the following MANOVA table: 6Additional SAS programs for MANOVA and other procedures discussed in this chapter are available in [13]. 4. and residual sources of variation as components of the total variation.4] L I Source Model Error Corrected Total Values 0 1 0 1 data set =20 F Value 7.0023 The matrices of the appropriate sum of squares and cross products were calcu6 lated (see the SAS statistical software output in Panel 6.1] 3.1] 0.6 [7.893724 Xl Mean 6.5520 64.318 Chapter 6 Comparisons of Several Multivariate Means TwoWay Multivariate Analysis of Variance 319 Similarly.1 9.6280 .4.9 9.6125 .56 OUTPUT Mean Square 0.7] [7.0183 0.3 8.0] [6.90 0.8 5. rate of extrusion and amount of an additive. Table 6.74050000 0. means factorl factor2. q • Comment.7] [7.5 2. and X3 = opacity Factor 2: Amount of additive n 2 amountof ractor : additive 1.2 10. factor 2.76400000 4. input xl x2 x3 factorl factor2.7325 4.1 ).5 [6.0 2.2] [7. class factorl factor2.4 8. PROGRAM COMMANDS Factor 1: Change in rate of extrusion 9.3 9.
764 0.43000000 X3SO 1.85379491 2. Numb!' 3 3 3 3 DenDF 14 14 14 14 Pr> F 0.47696510 0.79000000 4.92 3.57580861 I.81916667 0.3018 0.73 3.5543 To test for interaction.07 X2 0.49000000 SO 0.54450000 F Value 7.22289424 0.42018514 0.98000000 X2Mean 9.06000000 SO .54450000 Mean Square 0.14000000 9.77710.56015871 0. of no Overall fACTOR1 Effect 1 H = Type'" SS&CP Matrix for FACTORl S= 1 M =0.98 0.05775000 F Value 0.7771 .59000000 6. 0.08000000 1.960SOOOO Mean Square 3. we compute 3 3 A* = /SSPres / /SSPint + SSPres / 275.44000000 4.5 o Level of FACTOR2 N 10 10 Mean 3.62800000 5.08000000 XlSO 0.125078 Sum of Squares 9.30050000 0.5543 7.5315 Source Model Error Corrected Total H = Type III SS&CP Matrix for FACTOR 1*FACTOR2 S = ·1 M = 0.61814162 1.30123155 Pillai's Trace HotellingLawley Trace ROy's Greatest Root 0.576 Pillai's Trace HotellingLawley Trace Roy's Greatest Root 0.5 N=6 Value 0.2556 3 3 3 14 14 14 0.76 Pr> F 0. 51.3379 o Level of FACTOR 1 N 10 10 Mean ·6.350807 Type /11 SS 1.3018 RootMSE 2.16425000 \\ source Root M5E ·0.02 2.45750000 2.7517 0.21 0.08550000 C.014386 Mean Square 0.91191832 4.61877188 1.3385 1.29832868 0.7098 354.V.07 0.61877188 7.42804465 1HYpOthi!sis.3385 1.320 Chapter 6 Comparisons of Several Multivariate Means Two·Way Multivariate Analysis of Variance 321 PANEL 6.96050000 Source OF F Value 0.19151 Type /11 SS 0A20SOOOO 4.552 X3 o 1 Level of FACTOR 1 3.57000000 9.3018 0.91191832 0.42050000 4.405278 Mean Square 1.32 OF pillai's Trace HotellingLawley Trace Roy's Greatest Root 0.483237 Sum of Squares 2.28682614 0.F 1.0247 0.0247 0.92400000 74.V.924 N 10 10 X3Mean SO 3.09383333 4. 4.28150000 64.18214981 SO 0.7906 = .99 I Hypothesis of no Effect I i source Model Error corrected Total OF 3 16 19 R·Square 0.3385 1.3385 .40674863 0.552 64.32249031 X2Mean 9.3018 0. Xl X2 X3 E= Error SS&CP M'!trix Xl 1.5543 7.28682614 .90050000 3.2556 4.2881 0.612soOOo 0.49000000 7.0247 Manova Test Criteria and Exact F Statistics for the Hypothl!sis of no Qverall Effect E = Error SS&CP Matrix [ X3.1 OF 3 16 19 R·Square 0.1 (continued) (continued) Manova Test Criteria and Exact F Statistics for the F Value 4.55077042 2.90050000 3.628 0.61250000 0.47328638 o Manova Test Criteria and Exact F Statistics for the Level of FACTOR2 N 10 10 XlMean 6.300$0000 0.20550000 C.02 3.2556 4.10 1.
is usually of greater interest.3819 = ( (11. from (665).14('OS) = 3.."" JLlp] and 1'2 = [JLz!> JL22.. (See [1]. Further. (gb(n . In that exercise..1) = 1..p + 1.26 > F3. = 11 .3 + 1) = 14 and F3 ._ _L . The nature of the effects of factors 1 and 2 on the responses is explored in Exercise 6. whatever their nature.. we reach the same conclusion as provided by the exact Ftest.34.7771) = 3. pi + 1 From before.) In our case.P + 1)/2 1.1S. gb (n .31 + 1)/2 34 VI = (11(1) . _ F = (1 .55 L. the question of equality of mean vectors is divided into several specific possibilities.pi + 1.1bis brokenline graph is the profile for population 1. V2 = gb(n . Profiles can be constructed for each population (group). are the profiles linear?" is considered in Exercise 6.3819 A.5230 (11 .34 < F3.8 Profile Analysis Profile analysis pertains to situations in which a battery of p treatments (tests.. Assuming that the profiles are parallel."" JL2p] be the mean responses to p treatments for populations 1 and 2. .3 + 1)/2 . We shall concentrate on two groups. we reject Ho: 'TI = 'T2 = 0 (no factor 1 effects) at the S% level. connected by straight lines.(/Lli2 + iL2i2).. in practice the question of whether two parallel profIles are the same (coincident).14('OS) = 3. A.)Forourexample.5230 ISSPfacZ + SSP.p. .3819) (16 .JLlil (I (g  1)  pi + 1)/2 and _ (1 Fz A.P + 1 and VI = I (b .1) . simultaneous confidence intervals for contrasts in the components of 'T e and Pk are considered. are the population mean vectors the same? In profile analysis. we might pose the question.2.3 + 1)/2 = 1 . acceptable? V2 = (2(2)(4) .pi + 1. i = 1.) (gb(n .1) .1 For both g .1(1»/2] In(.05) = 7. we can formulate the question of equality in a stepwise fashion.5230) (16 .1) .OS) = 3..31 + 1)/2 = 4.OS) = 3. To test for factor 1 and factor 2 effects (see page 317). and they do so in an additive manner.4 The population profile p = 4. Fz = 4.34.(/Llil + /L2H) = (/Llil + iL2H) . = (1 . F = 1 A*) (I (g (A* (gb(n 1) .JLzil. are the profiles coincident? 7 Equivalently: Is H 02 : JLli = JLZi. The null hypothesis of parallel linear profIles can be written Ho: (/Lli + iL2i) .31+ 1)/2 2 3 4 l F2 and VI 1 . Are the profiles parallel? Equivalently: Is H01 :JLli .34.1347 . Similarly. JLI2 ._ . Although this hypothesis may be of interest in a particular situation. "Assuming that the profiles are parallel. it is assumed that the responses for the different groups are independent of one another. respectively.7771 (11(1) . In terms of the population profiles.p + 1)/2 (i (b .31 +1 = 3 V2 = (16 .0212 . The hypothesis Ho: 1'1 = 1'2 implies that the treatments have the same (average) effect on the two populations.1)(b . . Since F = 1. = ISSPres I = 275.3 + 1)/2 = 7. and we reject Ho: PI = pz = 0 (no factor 2 effects) at the S% level..1) gb(n 1) . acceptable? 2. p.34.l)(b .3. = JLzi .34. Ordinarily. All responses must be expressed in similar units. we calculate and = ISSPres I = 27S.12.322 Chapter 6 Comparisons of Several Multivariate Means For Profile Analysis 323 (g .l_ _l_ _l.1) . p.1) .(See[1]. i = 3.f. A plot of these means. and therefore.31 + 1) = 3 6.14( .pi + 1)/2 has an exact Fdistribution with VI = I(g . .p + 1)/2 l)(b . and so forth) are administered to two or more groups of subjects.pi + 1)/2 Mean response have Fdistributions with degrees of freedom VI = I(g . .7771) (2(2)(4) .26 '·J···· . Note that the approximate chisquare statistic for this test is (3 + 1 . we do not reject hypothesis Ho: 'Y11 = 'YIZ = 'Y21 = 'Y22 = 0 (no interaction effects).1) . 
.66.5S > F3. tively.. JLI3 .1) . We conclude that both the change in rate of extrusion and the amount of additive affect the responses.3 + 1) = 14 7The question. questions.14('OS) = 3.es I 527.14( . JL14] representing the average responses to four treatments for the first group..p + 1d. We have FI = 7..7098 = ..81. is shown in Figure 6. Since x1(.7098 = ISSPfac1 + SSPres I 722.1) . . Consider the population means /L 1= [JLII. JLl2._ _+ Variable FI Figure 6. i = 2. .4. Let 1'1 = [JLll.1 = 1 and b _(1 Pi = 1. F3 .
.. Test for Parallel Profiles for Two Normal Populations Reject HoI : C#'l Test for level Profiles. Consequently... Under this condition.X2)'C{ where + J l C(Xl  X2) > c 2 When the profiles are parallel. and l'X2. using the 8point scale in the figure below. or vice versa. the null hypothesis at stage 2 can be written in the equivalent form Example 6. When HOI and Hoz are tenable. Cl:C:) and NpI(CiL2. Assuming that the profiles are coincident.. X12.2.. Given That Profiles Are Parallel where C is the contrast matrix 1 C ((pI)Xp) = 1 0 0 1 1 0 0 (672) For coincident profiles. . Xl' ) . . CIC') distributions.1) ( ) (nl + n2 .324 Chapter 6 Comparisons of Several Multivariate Means Profile Analysis 325 3. the common mean vector #' is estimated.l)(p .'·" Xl nl and XZI> xzz. the first is either above the second (iLli > JL2j.. surveyed adults with respect to their marriage "contributions" and "outcomes" and their levels of "passionate" and "companionate" love. . Therefore. . using all nl + n2 observations.2.. .. an application of Result 6.2.. = iLp' and the null hypothesis at stage 3 can be written as H03: C#' = 0 where C is given by (672). and pooled covariance matrix CSpooledC" Since the two sets of transformed observations have Np1(C#'1. . j=l £...( "+ "I " "2) = nl + nz . nI. Hatfield. j = 1. n2' 4 5 6 7 8 .2..... E. for all i). then iLl = iL2 = . a sociologist. + iL2p = 1'1'"2 are equal. H02 : I' #'1 = I' #'2 2 3 We can then test H02 with the usual twosample tstatistic based on the univariate observations i'xli' j = 1.n2 and CX2j.. _= . Given That Profiles Are Coincident For two normal populations: Reject H03: C#' = 0 (profiles level) at level a if I (nl + n2)x'C'[CSCT Cx > c 2 (675) (673) where S is the sample covariance matrix based on all nl + n2 observations and c 2 = (nl + n2 . the profiles will be coincident only if the total heights iLl 1 + iL12 + ... so that the common profile is level. are the profiles level? That is.. Receqtly married males and females were asked to respond to the following questions. we have the following test. respectively. .. j=1.nl+nzP+l et = C#'2 (parallel profiles) at level a if T2 = (Xl . .1 .nl j = 1. xu. respectively. If the common profile is level. + iLlp = l' #'1 and IL21 + iL22 + . are all the means equal to the same constant? Equivalently: Is H03: iLl I = iL12 = . = iL2p acceptable? The null hypothesis in stage 1 can be written Test for Coincident Profiles... by x [ o 0 0 For independent samples of sizes nl and n2 from the two popu]ations.14 CA profile analysis of love and marriage data) As part of a larger study of love and marriage. xZ n2 are all observations from the same normal popUlation? The next step is to see whether all variables have the same mean.P + 1) Fpcl. the null hypothesis can be tested by constructing the transformed observations CXI.=1 £.2 provides a test for parallel profiles. . X2' nl (nl ) + n2) Xl _ + (nl nz_ X2 + n2) These have sample mean vectors CXI and CX2. = iLlp = JL21 = JL22 = . ..
125 .5 on page 327.207 .X2) = l' (Xl .8) = 8. this finding is not surprising.X2) = . using the 5point scale shown. X2 = X3 = X4 an 8point scale response to Question 2 a 5point scale response to Question 3 CSpOoJedC' = a 5point scale response to Question 4 = married men = married women and [ 1 = and the two populations be defined as Population 1 Population 2 1 1 0 0 1 1 .143 . All things considered.637 . how would you describe your outcomes from themarriage? SubjeGts were also asked to respond to the following questions. What is the level of companionate love that you feel for your partner? 6 4 None at all Very little Some A great deal Tremendous amount Key: X d  x . it is of interest to see whether the profiles of males and females are the same. we compute l ·606 .719 .058 .000 4.S Sample profiles for marriagelove responses.. (females) T2 = [.101 . even though the data.751 125] Xl = r.4)JF3.005 ktl [ .:n 4.167. .56(.161j . are clearly nonnormal.751 . To test for parallelism (HOl: CILl =CIL2).11(2. .533 Thus. with a= ._ L _ _ _L _ _ _L_ _ _L_ _+_ Let Xl Variable = an 8point scale response to Question 1 2 3 4 Figure 6.262 .262 .7.167] .066.751 .101 .oFemales I I 2 4 5 2 L .066 1.751 1.066 . A sample of nl = 30 males and n2 = 30 females gave the sample mean vectors .306 Moreover.067) = 1.005 < 8. To test H 02 : l'ILl = l' IL2 (profiles coincident). All things considered.367 Sum of elements in Spooled = I'Spooled1 = 4. which are integers.200J (k + = 15(.268 The population means are the average responses to the p = 4 questions for the populations of males and females.173 .810 . Since T2 = 1.. Assuming a common covariance matrix I.326 Chapter 6 Comparisons of Several Multivariate Means Profile Analysis 327 Sample mean response 'i (i 1.05.058 0 1 1 0 fj [ = . Assuming that the profiles are parallel.719 .066 . What is the level of passionate love that you feel for your partner? 4.125 . Since the sample sizes are reasonably large.268 1. c 2 = [(30+302)(41)/(30+30.700J (males) X2 = _ l 6. Given the plot in Figure 6.. we can test for coincident profiles.143 .000 4.200 and pooled covariance matrix SpooJed = The sample mean vectors are plotted as sample profiles in Figure 6.029 .7.029 . how would you describe your contributions to the marriage? 2.268 1.05) = 3.x Males 0. we shall use the normal theory methodology. we need Sum of elements in (Xl .125]1 [.268 ..5. 3.173 . we conclude that the hypothesis of parallel profiles for men and women is tenable.161 .633j 7.
328 Chapter 6 Comparisons of Several Multivariate Means Using (674).0 72. the PotthoffRoy model for quadratic growth becomes To illustrate the growth curve model introduced by Potthoff and Roy [21).2 73.8 66.6 67.8 68. the responses of men and women to the four questions posed appear to be the same.7 70.5 65.0 67.58(.1 64.29 2 year 86. it does not make sense to carry out this test for our example.3 82.0 86.9 78.3 59.2 where the time between heartbeats was measured under the 2 X 2 treatment combinations applied to each dog. When the p measurements on all subjects are taken at times tl> t2. (See [13). F1. In fact.5 gives readings after one year. When a study involves several treatment groups.367 )2 = .4 86. g.. When some subjects receive one treatment and others another treatment.1 71.8 57.6 54.1 55. In this context.2 79.4 72. since Que'stions 1 and i were measured on a scale of 18.501 < F1. For instance.0 76. That is.7 72.0 75. (b) A single treatment may be applied to each subject and a single characteristic observed over a period of time.) 11 12 13 14 15 Mean Source: Data courtesy of Everett Smith.2 69. and three years for the control group. however. this model does not require the four measurements to have equal variances. The model assumes that the same covariance matrix 1: holds for each subject. Under the quadratic growth model. X2. Readings obtained by photon absorptiometry from the same subject are correlated but those from different subjects should be independent. the general measures of comparison are analogous to those just discussed. .3 66.2 67. the treatment group.7 61. _ When the sample sizes are small. summarizes the growth which here is a loss of calcium over time.6 73.3 72.2 73. Unlike univariate approaches.7 60. .2 76. and T2 = .3 70.38 1 year 86. the term "repeated measures" refers to situations where the same characteristic is observed. [18).2 81. the mean vectors are .3 68.6 69.9 66.9 60.6 74.4 67.3 72.9 Repeated Measures Designs and Growth Curves As we said earlier. a profile analysis will depend on the normality assumption.0.1 75. Besides an initial reading. we cannot reject the hypothesis that the profiles are coincident.501 Subject 1 2 3 4 5 6 7 8 9 10 Initial 87. 6. at different times or locations. we refer to the curve as a growth curve. It is the curve traced by a typical dog that must be modeled. two years."" Xene be ne vectors of measurements on the ne subjects in group e.2 79. The treatments need to be compared when the responses on the same subject are correlated.0. X{2. the growth curves for the treatments need to be compared."" tp. (a) The observations on a subject may correspond to different treatments as in Example 6. The incompatibility of these scales makes the test for level profiles meaningless and illustrates the need for similar measurements in order to carry out a complete profIle analysis. X4).3 67.A profile. using methods discussed in Chapter 4.9 56.05. that received special help with diet and a regular exercise program.3 74.. an extra subscript is needed as in the oneway MANOVA model.8 68.0 82.5 76. we consider calcium measurements of the dominant ulna bone in older women.0 77.8 85..6 57.7 84. We could now test for level profiles.4 69. All of the X ej are independent and have the same covariance matrix 1:. with the original observations Xej or the contrast observations CXej' The analysis of profiles for several populations proceeds in much the same fashion as that for two populations. Table 6. constructed from the four sample means (Xl. 
we could measure the weight of a puppy at birth and then once a month.47 3 year 75.3 49. while Questions 3 and 4 were measured on a scale of 15. on the same subject. Table 6.9 65. Let X{1. X3. we obtain Repeated Measures Designs and Growth Curves 329 Table 6_S Calcium Measurements on the Dominant Ulna.8(.1 57.05) = 4.79 With er = . Control Group T2 = ( + .8 75. This assumption can be checked.6 gives the calcium measurements for a second set of women. for e = 1. Assumptions.4 60.5 53. Can the growth pattern be adequately represented by a polynomial in time? where the ith mean ILi is the quadratic expression evaluated at ti • Usually groups need to be compared.05) = 4.
9 53.9 79.0274 70. The k. Pe and Ph are independent.3 74.I)Sg) =N 1 _ gW and. .3 59. Treatment Repeated Measures Designs and Growth Curves 331 with N = Group Subject 1 2 3 4 5 6 7 8 9 10 .66 Cov(Pe) = . f3eo f3n and [Pr.8 65.p + q)(N .03(2 (2.g .q .3 69.9 48.4 55.8534 B= 1 tp tp Pe = so the estimated growth curves are Control group: 73. the error sum of squares and cross products matrix is just the within groups W that has N .7 58.2 67.0701 3. the error sum of squares and cross products Wq = e=1 L g j=l (X ej .7 51.. Also. so their covariance is O.¥) (N .6 Calcium Measurements on the Dominant Ulna. Thus there are (p . Under a qthorder polynomial. For large sample sizes.(B SpooledB) ne A 1 1 for f = 1.5 64.0240 1. then tl t2 t'{ t5.8368 [ 0.6 75.6444 [ 2.2.. by (679).5 and 6.6 74.7 74. q f3eq where Under the assumption of multivariate normality.2184] 3.9 65.0 64.85t2 (.27) 0.3 76.3 57. We can formally test that a qthorder polynomial is adequate.0 77.3 79.6.3 70.2. the null hypothesis that the polynomial is adequate is rejected if where l 1 tz B = f 1 1 tl t1] and [ f3eo ] Pe = ( N  .1387] 4.g + p .7 53.1051 (B'Sp601edBr1 = 1 Spooled = (N _ g) «nl .9 79.1 degrees of freedom. .3 72.2 75.7 71.g .1 70. the standard errors given below the parameter estimates were obtained by dividing the diagonal elements by ne and taking the square root.83) (.0 84.0 70.5 66.2 68.I)SI + .58) (. The model is fit without restrictions.1.IS (Fitting a quadratic growth curve to calcium loss) Refer to the data in Tables 6. Fit the model for quadratic growth.2 67.q+ g») In A * > xrpql)g(a) (682) (676) Example 6.5 66.3 76.5 68.53 estimated covariances of the maximum likelihood estimators are 11 12 13 14 15 16 Mean 85. pzJ (677) = 73. is the pooled estimator of the common covariance matrix l:. g (679) where k =IN .2 66.6 71.5699 3.0 57.g .6 80. there are q + 1 terms instead of the p means for each of the groups..q .3 67.9 77.7 77.7 74.0 65.3 50.28) . Under the polynomial growth model.4 57.2 60.0 74. L e=l g ne.6 64.7 56. Initial 83.2184 + 4.8 68. A computer calculation gives If a qthorder polynomial is fit to the growth data.g degrees of freedom.7 54.09t .14 (2.8368 9.0 76.BPe) (Xej .5 76.29 1 year 2 year 86.5 74.0 51.9 70. the maximum likelihood estimators of the Pe are (678) where Treatment group: 70.6 57.330 Chapter 6 Comparisons of Several Multivariate Means Table 6.80) 5.Bpe)' A A (680) has ng .18 3 year 81.0900 1.1 70.1744 5. + (ng .l)j(N .3 81..07 + 3. The likelihood ratio test of the null hypothesis that the qorder polynomial is adequate can be based on Wilks' lambda A* = IWI IWql (681) Source: Data courtesy of Everett Smith. .0 70.4 70.l)g fewer parameters.5 72.64t .p + q + 1).0240 (. for f # h.50) 93.
5.12J be the populat ion mean vector for the first group.996 2362. xt42l)2( .5 1 4. Example 6.l.0 7.2 + 2») In .0 2 5. The scatter plots suggest that there is fairly strong positive correlat ion between the two variables for each group.9 2 4.6 6. MA . growth.0 4.2 1 3.544 2335.749 2660.l.18(.2 6.01) .2l. Again.2 1 6.0 2 It is clear from the horizontal marginal dot diagram that there is conside rable overlap in the Xl values for the two groups.6 on page 334.589 2363.714 2098. /L22J be the populat ion mean vector for the second group..8 2 6.235 2381.160 2363.0 1 5.8 5.46 with III = 1 and 112 = 18 degrees of freedom .86 < We could. 3.5 1 7.16 (Comparing multivariate and univariate tests for the differences in means) Suppose we collect measure ments on two variables Xl and X 2 for ten randomly selected experimental units from each of two groups.6 4. Further. it is to the probability of making any incorrect decision. I .2l at any reasonable significance level (F1.912 23?7.l.. the group 1 measurements are generally to the southea st of the group 2 measurements.996 2314.961 2098. as the next example demonstrates. over Ime.7 2 7.12 = J. Similarly.009 2343.2 6. test for par dent calcium loss using profile analYSIS. We refer the sion for Its covanance matnx ecomes m reader to [14] for treated here.68 with III = 1 and 112 = 18 degrees of freedom.1I = J. A single multivariate test. with its associated. Let PI = [PlI.1 2 5. a univariate analysis of variance gives F = 2.749 2756. The hypothetical data are noted here and displayed as scatter plots and marginal dot diagrams in Figure 6.l. single pvalue.0 6.1 6.308 2343. The outcom e tells us whether or not it is worthwhile to look closer on a variable by variable and group by group analysis.7627 _ = 7. we cannot reject Ho: J.485 2369308 2335.0 1 3. d e .430 2331.al f such as equally correlated (b) Restricting the covariance matriX to a specl onn responses on the same individual. is preferab le to performing a large number of univariate tests. we cannot reject Ho: J.21 _ .5 6.9.228 2698.O17 X2 Group Since. univariate tests ignore importa nt informa tion ·and can give misleading results. This IS testing for the equality of two or more treatme nts as the exarnp es In .514 2301.6 2 5. . with a _( N _ (p  = .1 2 3.6 1 5. say.01).228 2331.160 2089.8 7. .91] indicate.8 5. Consequently.l.253 2381. b (6 78) and the expresNOVA Howeve r the fJ( are no onger gIven y oneway. Use nonline ar parametric models or even nonpara metnc sphnes.4 2 6.235 2303. without restr!cti ng to growth.3 4.544· 2277.0 6. owth curve model holds for more general designs than The Potthoff and Roy gr . on the same IndIVIdual.514 2369. th sis that the quadratic growth model IS Wilks' lambda for testIng e nu1I hypothe adequate becomes 2660. They include the There are many 0 following: (a) Dropping the restriction to. there o s not seem to be any substantial difference between the two .332 Chapter6 Comparisons of Several Multivar iate Means Perspect ives and a Strategy for Analyzing Multivar iate Models 333 Examination of the estimates and the standard errors reveals that the (2 terms are needed.3 6.9 2 4. the vertical margina l dot diagram shows there is considerable overlap in the X2 values for the two groups.22 at any reasonable significa nce level.10) = 3.687 2089.6 1 5.452 = . a univariate analysis of variance gives F = 2. Using the X2 observa tions.!___ _ 4.9 5. . Loss of calcium is predicte d after 3 years for both groups. and let /Lz = [J. J.9 6. and that. 
Using the Xl observations.01.3 1 ___?} ___________________________ _ ________________ _____________ . b' ore complic ated than (679). ' .0 4. q + g»)tn A * = (31  i (4 .7627 2698. l'781. .589 2832. A single multivariate test is recomm ended over.10 Perspectives and a Strategy for Analyzing Multivariate Models We emphasize that with several characteristics.l.8 8. . although there is some overlap.p univariate tests because. bl f (c) Observing more than one vana e. This results in a multivariate verSIOn of the growth curve model.
331 41.0 62.5 62.700 10.Rn° .342 4.666 4.0 56.14539J 0. • Example 6.513 5.665 6.35305 1 0.0 115.367 3.5 66.5 91.8 In(SVL) 3. Bonine.5 55.867 11.832 15.0 108.0 56.5 79. The data for the lizards from two genera.0 81.0 60.3 4. collected in 1997 and 1999 are given in Table 6.450 14.7.230 5.0 69.350 7.0 96.530 7.0 95. The multivariate test takes into account the positive correlation between the two measurements for each groupinforma2 tion that is unfortunately ignored by the univariate tests. After taking natural logarithms.334 Chapter 6 Comparisons of Several Multivariate Means Perspectives and a Strategy for Analyzing Multivariate 335 Table 6.0 63. T2 = 17.94 and we reject Ho: 111 = 112 at the 1% level.0 61.200 13.092 5. Cnemidophorus (C) and Sceloporus (S).368J K2 = [ 4.811 6. Figure 6.690 6. .0 94.. nl = 20 K 1 = [2. we find SVL = snoutvent length.0. he measured mass (in grams) and the snoutvent length (in millimeters).0 84. The univariate tests suggest there is no difference between the component means for the two groups.0 69.0 73.526 4.5 74.5 109.5 94.7 Lizard Data for Two Genera C Mass 7.04255 !".0 80.493 7.369 7.6 4.0 111. This T test is equivalent to the MANOVA test (642).4 4. Among other variables.240J 4.5 4.0 68.0 64.0 Mass 13.308 S2 = [ 0.0 62. after taking natural logarithms.P 2 0 .0 77.747 S SVL 77.5 91.5 72.7.911 5.0 60.plot of IS.0 64.525 22. S: nz = 40 2.0 4.0 fjgure 6.020 5.0 64.9 measurements for S lizards.0 82.5 75.610 18.831 9.1 4.032 5.6 Scatter plots and marginal dot diagrams for the data from two groups. and hence we cannot discredit 111 = 112' On the other hand.120 21.0 84.394 s = [0.5 91.5 52.0 90.900 9.216 9.7 Scatter plot of In(Mass) versus In(SVL) for the lizard data in Table 6. the snoutvent length is a more representative measure of length.02595 0.2 Qi0cY tit ° • 4.0 90.14539 In (Mass ): In(SVL): ILll .220 5.5 70.17('01) = 2. shown (Mass) versus snoutvent length (SVL). Because the tails sometimes break off in the wild.09417J 0.IL21: IL12 .0.995 3.476.183) .048 4.5 97.077 42.0 71.080 14.5 87.910 17.977 8.0 69.103 13. Notice that there are nl = 20 measurements for C lizards and n2 = 40 3 00 of.118 X 6.236 37.035 16.962 4.5 99.630 13.989 27.220) (0.520 13.902 SVL 74.5 70.838 6.787 S SVL 80.0 96.7 4. the summary statistics are Figure 6. 1 ?f o •• # 4.7.IL22: ( 0.0 69.0 75.09417 0.901 19.201 38.50684 0.5 67.29 > c 2 = (18)(2) F2.790 5.264 16.247 16. C '.11 = 12.0 64.0 79.980 16.0 106. '•• ° ° ° 800 ° S <e. Source: Data courtesy of Kevin E.011.419 13.5 Mass 14.11 (Data on lizards that require a bivariate test to establish a difference in means) A zoologist collected lizards in the southwestern United States.088 2. The large sample individual 95% confidence intervals for the difference m In(Mass) means and the difference in In(SVL) means both cover O. if we use Hotelling's T2 to test for the equality of the mean vectors.0 67.763 9.610 13.109 12.685 11.781 31.
Is this result consistent with the test of Ho: I) = 0 considered in Example 6. Note that the point I) = 0 falls outside the 95% contour.4 with a pvalue less than . while the last. We summarize the procedure developed in this chapter for comparing treatments. Remove sample 8. the multivariate tests signals a difference. For this example. More discussion is given in terms of the multivariate regression model in Chapter 7. the differences hold for only a few treatment combinations.08. In the context of random samples from several populations (recall the oneway MANOVA in Section 6. A Strategy for the Multivariate Comparison of Treatments 1. Clearly. We encountered exactly this SituatIOn with the efflllent data in Example 6. all comparisons are based on what is necessarily a limited number of cases studied by simulation. which is equivalent to Wilks' lambda test.0001. That is. with the current data. or from the null hypothesis.4 (also see Example 6.xe)(xcj . The data corresponding to sample 8 in Thble 6. LawleyHotelling trace = tr[BWI ] Pillai trace = tr[B(B + W)IJ = Roy's largest root maximum eigenvalue of W (B + W)I All four of these tests appear to be nearly equivalent for extremely large samples. 3.16 and 6. Roy's test. T2 = 225.7. the T2statistic has an approximate distribution.4). Perform a multivariate test of hypothesis. Are the results consistent with a test of Ho: I) = O? Discuss. Using ReSUlt 6. The simultaneous confidence statements determined from the shadows of the confidence ellipse are. a bivariate analysis strongly supports a difference in size between the two groups of lizards.336 Chapter 6 Comparisons of Several Multivariate Means Exercises 337 The corresponding univariate Student's ttest statistics for test.x)' e=1 Throughout this chapter. we have used Wilks'lambdastatisticA* = which is equivalent to the likelihood ratio test. consistent with the scatter diagram in Figure 6.1. If no differences are significant. Using the information in Example 6. . If the multivariate test reveals a difference. we cannot detect a in mass means or a difference in snoutvent length means for the two genera of lizards. From the simulations reported to date the first three tests have similar power. Exercises Construct and sketch a joint 95% confidence region for the mean difference vector I) using the effluent data and results in Example 6.1.1. The oneatatime intervals may be suggestive of differences that We must issue one caution concerning the proposed strategy. at the same time. then proceed to calculate the Bonferroni confidence intervals for all pairs of groups or treatments. 6. further. these few active differences may become lost among all the inactive ones. However.17 demonstrate the efficacy of test relative to its univariate counterparts. A multivariate method is essential in this case.1 seem unusually large. Be aware of any outliers so calculations can be performed with and without them. the power is large. All four statistics apply in the twoway setting and in even more complicated MANOVA. too large.1? Explain. Construct a joint 95% confidence region for the mean difference vector I) and the 95% Bonferroni simultaneous intervals for the components of the mean difference vector. Three other multivariate test statistics are regularly included in the output of statistical packages. behaves dif power is best only when there is a single nonzero eigenvalue and. The best preventative is a good experimental design. Our choice is the likelihood ratio test.2. • Examples 6.5). When.x)(xe . 
This may approximate situations where a large difference exists in just one characteristic and it is between one group and all of the others. we suggest trying transformations on the original data when the residuals are nonnormal.ing for no difference in the individual means have pvalues of . do we probe deeper. and all characteristics. For moderate sample sizes. respectlvely. typically. cannot be taken as conclusive evidence for the existence of differences.1. 2. e=1 ± j=! (xej . Then. try looking at Bonferroni intervals for the larger set of responses that includes the differences and sums of pairs of responses. Does the "outlier" make a difference in the analysis of these data? 6.xe)' and B = ± ne(xe . and only when. Check the data group by group for outliers. Calculate the Bonferroni simultaneous confidence intervals. To design an effective experiment when one specific variable is expected to produce differences. Also check the collection of residual vectors from any fitted model for outliers. We recommend calculatmg the Bonferonni intervals for all pairs of groups and all characteristics. There is also some suggestion that Pillai's trace is slightly more robust against nonnormality. Try to identify outliers.3. multivariate tests are based on the matrices W = merit further study but. It may be the case that differences would appear in only one of the many characteristics and. 6. However. The first step is to check the data for outliers using visual displays and other calculations. the overall test may not show significance whereas a univariate test restricted to the specific active variable would detect the difference. construct the 95% Bonferroni simultaneous intervals for the components of the mean difference vector I). Compare the lengths of these intervals with those of the simultaneous intervals constructed in the example.46 and . from a univariate perspective. do not include too many other variables that are not expected to show differences among the treatments.
Set a = . Assume that the profiles given by the two mean vectors are parallel. .0 55. . Consider the univariate oneway decomposition of the observation xc' given by (634).1 I.t22. as in (639).1) SI + (n2 .tliZ + J. Construct the corresponding arrays for each variable. Use the data for treatments 2 and 3 in Exercise 6. Also. and 6.tzid .8. verify the relationships d· = Cx·.01.0 80.8. Show that this likelihood is maximized by the choices ill = XI.J. .tI1.4· = [101.3 to test for treatment effects. = 0 1 1 (a) All three indices are evaluated for each patient.X)U2 + .. (a) Redo the analysis in Example 6. Set a = . . 6.[(nl . .J.t2i)(J.4 = 1 0 }n. (b) Construct the 95% Bonferroni simultaneous intervals for the components of the mean vector B of transformed variables. [See (618)..t3i.J.t li + J. determine the linear combination of mean components most responsible for the rejection of Ho. L(ILI. (c) Construct 99% simultaneous confidence intervals for the differences J. and residual components. (Test for . and use Table 6.] Compare the conclusions. we reject Ho: C (1'1 + 1'2) = 0 at level a if (a) Break up the observations into mean.) (b) Using the information in Part a.) Let ILl [J.0 71.1) S2] = (nl + n2 nl n2 nl + n2 Hint: Use (416) and the maximization Result 4.6. Treatment 2: Treatment 3: [!J DJ DJ DJ UJ GJ (a) ShowthatthehypofuesisthattheprofilesarelinearcanbewrittenasHo:(J.. Sd = CSC' in (614).l.2 55.J. Test for the equality of mean indices using (616) with a = .6 71.tIZ. + (Xg .2) X P matrix 2 1 2 0 o o 1 o 0 0J 000 (b) Following an argument similar to the one leading to (673). 6. assuming that 11 = 1 2. Using the summary statistics for the electricitydemand data given in Example 6.1.9. ) ) 6.3 63.0] 63. i = 1.IL3 = 0 employing a twosample approach with a = .12. 6.Ol.] 6. 1 2) Spooled are Treatmentl: linear prOfiles. Observations on two responses are collected for three treatments. where the (p .tZi .tlp] and 1'2 = [J. il2 = X2 and I = +.4. (c) Discuss any possible violation of the assumption of a bivariate normal distribution for the difference vectors of transformed observations.t2 = 0.J. d = Cx.10..X)UI + (xz .tliI + J..··· . (b) Judge the differences in pairs of mean indices using 95% simultaneous confidence intervals. A likelihood argument provides additional support for pooling the two independent sample covariance matrices to estimate a common covariance matrix in the case of two normal populations.tZI. 6.2. (a) Calculate Spooled' (b) Test Ho: ILz ..S. respectively. 6.1 after transforming the pairs of observations to In(BOD) and In (SS).tliI + J. [See (643).9.·· . Give the likelihood function. Using the contrast matrix C in (613). A *.tzid = (J. treatment.Dg x= 46. The values of these indices for n = 40 heartattack patients arriving at a hospital emergency room produced the summary statistics . for two independent samples of sizes nl and n2 from Np(ILI' I) and N p(IL2' I) populations. compute T Z and test the hypothesis Ho: J.3 and S [ 50. A researcher considered three indices measuring the severity of heart attacks. i = 3.6 97. Repeat the test using the chisquare approximation with Bartlett's correction.05.(J. T = (XI + X2)'C[ where Z + + X2) > c Z .J.4.1] 57.10.tZiZ).tl .tz p ] be the mean responses to p treatments for populations 1 and 2. (See Example 6. (c) Evaluate Wilks' lambda. Show that the mean vector x 1 is always perpendicular to the effect vector (XI .. construct the oneway MAN OVA table. The observation vectors 0 }n.05. 
P or as Ho: C(I'I + 1'2) =0.x)u g where 1 1 0 UI = 0 0 0 }n. given that the profiles are parallel. Refer to Example 6. IL2' I). respectively.°2 0 0 1 0 0 0 .338 Chapter 6 Comparisons of Several Multivariate Means Exercises JJ9 6.
this decomposition will result in several 3 X 4 matrices. SSPre .05 level.X.1). is the average for the lth level of factor 1.k + x) where x is the overall average. 6.64 . Consequently. g . SSPfac2 .26 .14.x) + (X'k . Repeat these calculations for factor 2 effect parameters. Here x is the overall average. 6.xe. will be positive definite (with probability 1). A replicate of the experiment in Exercise 6.k + x) similar to the arrays in Example 6.17 . b .161 .13 and carry out the necessary calculations to complete the general twoway MANOVA table. and compute the sums of squares SStot = SSme. Compare these results with the results for the exact Ftests given in the example.0]. is the average for the lth level of factor 1. (c) Summarize the calculations in Part b in a MANOVA table.n and sums of cross products SCPtot = SCPmean + SCPt• cl + SCPf•c2 + SSfac I + SSfac2 + SSre.13 yields the following data: Factor 2 Level 3 Level 4 Level 1 Factor 1 Level 2 Level 3 Level 1 Level 2 Level 3 Level 4 [ [=:] g f=1 [:] J b [1:J [ DJ [ [ [ [ With no replications.1].14 . xc. For each response. and SSPre. (a) Decompose the observations for each of the two variables as Xek = X + (xc. with n = 1.9.13. (d) If main effects. Set a =.5. 6. . and SpooJed = Test for linear profiles.1). Refer to Example 6.340 Chapter 6 Comparisons of Several Multivariate Means Let nl Exercises 341 = 30. in the general twoway MANOVA table is a zero matrix with zero degrees of freedom. The matrix of interaction sum of squares and cross products now becomes the residual sum of squares and cross products matrix.03 . and X'k is the average for the kth level of factor 2.26 .1)(b .4. + SCPre. construct simultaneous 95% confidence intervals for differences in the factor 1 effect parameters for pairs of the three responses. xi = [6.16 . test for factor 1 and factor 2 main effects.03 .1 s.13.07 . (d) Given the summary in Part c. exist. (b) Combine the preceding data with the data in Exercise 6. Use a l ·61 . Use the likelihood ratio test with a = . (b) Using (670).1)(b . and (g .81 .05. Note that. respectively.9. and if the interactions do not exist. test for factor 1 and factor 2 main effects at the a = . . .x) + (XCk .1. Hint: Use the results in (667) and (669) with gb(n .1. n2 = 30. displayed in the form of the following twoway table (note that there is a single observation vector at each combination of factor levels): Factor 2 Level 1 Level 1 Factor 1 Level 2 Level 3 Level 2 Hint: This MANOVA table is consistent with the twoway MANOVA table for comparing factors and their interactions where n = 1. where the eek are independent Np(O.4.1) so that SSPre . the twoway MANOVA model is 2: 'rf = 2: Ih = k=1 (a) Use these data to decompose each of the two measurements in the observation vector as 0 xek = x + (xe. Note: The tests require that p :5 (g .) Consider the observations on two responses.i2 = [4.6. examine the of the main effects by constructing Bonferroni simultaneous 95% confidence intervals for differences of the components of the factor effect parameters. but no interactions. test for interactions.07 .3. Explain any differences. (c) Given the results in Part b.7.1) replaced by (g . 7. XI and X2. .x.1. assuming that the profiles are parallel.k . Form the corresponding arrays for each of the two responses.05.17 . (b) Regard the rows of the matrices in Part a as strung out in a single "long" vector. xe. (a) Carry out approximate chisquare (likelihood ratio) tests for the factor 1 and factor 2 effects.x) + (X.31 = . 
and X'k is the average for the kth level of factor 2. with degrees of freedom gb . Interpret these intervals.1) (b .x) + (Xfk .5.14 .xe· .8.3.05.!) random vectors. (Twoway MANOVA without replications. obtain the matrices SSPcop SSPf•cl .3.
Cost data on X I == fuel.0 731.5 954.5 926.5 1289.0 901.0 657. and half of the subjects received the blocks of trials in the reverse order. cognition.0 711.5 1140.0 674. The subjects were asked to respond "same" if the two numbers had the same numerical parity (both even or both odd) and "different" if the two numbers had a different parity (one even.0 751.5 1311.0 659.0 683.0 915. are repeated in sense they were made one after another.0 625. 983. Note in particular that observations 9 and 21 for gasoline trucks have been identified as multivariate outIiers. (a) Test for equality of the two population mean vectors using a = .5 Source: Data courtesy of J.0 797.0 657.0 839.0 751. (c) Construct 99% simultaneous confidence intervals for the pairs of mean components. find the linear combination of mean components most responsible for the rejection.0 941. find the linear combination of mean components most responsible for rejecting Ho.0 909.0 958.342 Chapter 6 Comparisons of Several Multivariate Means The following exercises may require the use of a computer. (b) If the hypothesis of equal cost vectors is rejected in Part a.1.5 789.0 1044.0 (a) Test for treatment effects using a repeated measures design. the order of "same" and "different" parity trials was randomized for each subject.0 691.0 612.5 814. 6.0 553. (b) Construct 95% (simultaneous) confidence intervals for the contrasts representing the number format effect.0 843.0 863.0 909.0 887. Set a = . Four measures of the response stiffness on .0 925.8 were collected to test two psychological models of numerical .14).5 833.0 663.8 Number Parity Data (Median Times in Milliseconds) ArabicSame ArabicDiff WordSame WordDiff (Xl) 869.16. Assuming that the measures of stiffness anse from four treatments test for the equality of treatments in a repeated measures design context.22 and [2].10 on page 345 for nl = 36 gasoline and n2 = 23 diesel trucks.0 1201.) Repeat Part a with these observations deleted. (c) Find simultaneous confidence intervals for the component mean differences.0 931.0 662.0 728.0 986.5 896.5 796.7.0 1059. (See Exercise 5. In the first phase of a study of the cost of transporting milk from fanns to dairy plants.0 930. Construct a 95% (simultaneous) confidence interval for a in the mean levels representing a comparison of the dynamic measurements WIth the static measurements.0 872. Set a = .0 867.0 766. construct three difference scores corresponding to the number format contrast.5 1076.0 1011.3 (see ' Example 4. (a) Test for differences in the mean cost vectors.0 754.0 1126.5 601.0 583. The data in Table 6. .0 746.5 1028. one odd).0 1004.0 1172.0 724. if any.5 624.5 777.0 833.0 856. Table 6. (c) The absence of interaction supports the M model of numerical cognition.0 854.0 662.0 813.5 673. Interpret the resulting intervals.0 1114. Exercises 343 quick numerical judgments about two numbers presented as either two number words ("two.0 708. and X3 = capital.0 904.19.0 (X2) (X3) (X4) 860. on a given board.0 774.0 671.0 572.0 767.0 894.18.5 ' 659.0 761.5 1130. Set a = .01. Does the processfng oLnumbers on the the numbers presented (words.5 875. appear to be quite different? (d) Comment on the validity of the assumptions used in your analysis. the parity type contrast. Compare with the Bonferroni intervals.0 735.05.05.5 945.5 811. the median reaction times for correct responses were recorded for each subject. all measured on a permile basis.0 831.0 669. Which costs.05.0 735.0 640.0 678.0 982. 
a survey was taken of finns engaged in milk transportation.0 888.5 1408.0 1009. while the presence of interaction supports the C and C model of numerical cognition.5 1083.0 611.0 995. followed by a block of number word trials.0 826.0 657. the parity type effect and the interaction effect.0 1081. Half of the subjects were assigned a block of Arabic digit trials.0 735. and the interaction contrast.0 1225. Hint: You may wish to consider logarithmic transformations of the observations. 6. 10licoeur and Mosimann [12] studied the relationship of size and shape for painted turtles. For each of the four combinations of parity and format.0 747. Here ' Xl = median reaction time for word formatdifferent parity combination X z = median reaction time for word formatsame parity combination X3 == median reaction time for Arabic formatdifferent parity combination X 4 = median reaction time for Arabic formatsame parity combination 6.0 918. Carr.0 752.0 735. Comment on the results.0 925.0 865.0 1046.0 737.0 905.5 539.0 975." "four") or two single Arabic digits ("2.0 1056.0 1037.each of 30 boards are listed in Table 4.0 656.0 752.0 726.0 1179. Is a multivariate normal distribution a reasonable population model for these data? Explain. Which model is supported in this experiment? (d) For each subject. 6.0 785.5 919.0 687.0 639.5 1096.5 750. are presented in Table 6.9 contains their measurements on the carapaces of 24 female and 24 male turtles. Arabic digits)? Thirtytwo subjects were requued to make a senes of Table 6. Within each block. The measures.0 744. X 2 = repair. (b) If the hypothesis in Part a is rejected.0 700." "4").0 572.
88 12.67 6.39 6.21 15.47 11.14 12.05 2.88 12.22 12.32 8.32 29.24 13.20.18 4.52 13.13 10.23 6. (You may want to eliminate any out/iers found in Part a for the male hookbilled kite data before conducting this test.77 11.50 7.92 4. observation 31 with Xl = 284.70 10.75 7.23 3.06 9.43 10.65 21.88 10. Similar measurements for female hookbilled kites were given in Table 5.26 6. and (visually) check for outliers.43 2.00 19.86 X3 11.22 X3 9.28 10.32 8. samples of 20 Aa (middlehigh quality) corporate bonds and 20 Baa (topmedium quality) corporate bonds were selected.98 9. Using Moody's bond ratings.51 26.37 10.15 9.18 12.95 11.14 6.24 11.44 8.70 7.73 14.ILz = 0 is rejected.17 13. Does it make any difference in this case how observation 31 for the male hookbilled kite data is treated?) (c) Determine the 95% confidence region for ILl .94 9.70 9.72 8.22 13.23 8.92 9.78 5. Set a = .78 10.70 1.61 14.09 7.45 16.72 4.25 13.09 Width Width Height (X2) 81 84 86 86 88 92 95 99 102 102 100 102 98 99 105 108 107 107 115 117 115 118 124 132 (X3) 38 38 42 42 44 50 46 51 51 51 48 49 51 51 53 57 55 56 63 60 62 63 61 67 (X2) 74 78 80 84 85 81 83 83 82 89 88 86 90 90 91 93 89 93 95 93 96 95 95 106 (X3) 37 35 35 39 38 37 39 39 38 40 40 40 43 41 41 41 40 44 42 45 45 45 46 47 98 103 103 105 109 123 123 133 133 133 134 136 138 138 141 147 149 153 155 155 158 159 162 177 93 94 96 101 102 103 104 106 107 112 113 114 116 117 117 119 120 120 121 125 127 128 131 135 6.26 2.86 9. 6.18 17.23 5.59 6.68 12. The tail lengths in millimeters (xll and wing lengths in rniIlimeters (X2) for 45 male hookbilled kites are given in Table 6.38 19.20 14.20 23.10 Milk TransportationCost Data Gasoline trucks Xl Diesel trucks Male Height Length (Xl) X2 12.00 4.IL2' (d) Are male or female birds generally larger? 16.9 Carapace Measurements (in Millimeters) for Painted Thrtles Female Length (Xl)  Table 6.IL2 and 95% simultaneous confidence intervals for the components of ILl .72 4.87 7.59 6. KeatoD.25 11.95 2.19 9.63 2.83 5.45 3.60 9.15 14.16 4.13 9. For each of the corresponding companies.90 11.66 17.34 8.66 28.54 10.21. the ratios Xl = current ratio (a measure of shortterm liquidity) X 2 = longterm interest rate (a measure of interest coverage) X3 = debttoequity ratio (a measure of financial risk or leverage) X 4 = rate of return on equity (a measure of profitability) .93 14.42 10.98 14.16 12.11 12.11 17.26 5.94 4.15 11.63 5.53 8.90 10.80 3.72 9.27 15.09 14.59 14.70 8.79 9.77 22.49 8.89 9.90 5.42 9.60 6.84 35.18 8.11 12.59 8.50 13. If Ho: ILl .91 8.68 7.77 17.09 12.28 10.89 7. (Note.344 Chapter 6 Comparisons of Several Multivariate Means Exercises 345 Table 6.18 17.16 7.35 5.06 17. in particular.05.00 14.70 12.22 9.99 29.24 10. you may want to interpret XJ = 284 for observation 31 as it misprint and conduct the test with XI = 184 for this observation.47 19.53 13.12.85 11.05 5.44 21.02 17.68 20. Alternatively.66 10.75 13.49 11. find the linear combination most responsible for the rejection of Ho.67 9.28 11.44 7.78 10.22 12. (a) Plot the male hookbilled kite data as a scatter diagram.32 14.44 Xl X2 12.69 16.18 9.) (b) Test for equality of mean vectors for the populations of male and female hookbilled kites.35 9.17 10.49 11.58 17.13 3.03 Source: Data courtesy of M.29 15.51 9.13 11.01 16.17 12.09 8.25 10.49 17.61 9.86 11.61 5.23 11.95 16.00 20.11 on page 346.16 12.78 5.17 7.07 6.94 5.
589 .854 Baa bond companies: n2 = 20. (You may want to consider transformations of the data to make them more closely conform to the usual MANOVA assumptions. Researchers interested in assessing pulmonary function in nonpathological populations asked subjects to run on a treadmill until exhaustion.083 . The data are shown in Thble 6. Four measurements were made of male Egyptian skulls for three different time periods: period 1 is 4000 B. and period 3 is 1850 B. were recorded. (628) and (629».949 .1'2 = 0 in Part b.24. Comment on the possible implications of this infonnation.719 1 .026 . Construct 95% simultaneous confidence intervals to detennine which mean components differ among the populations. Compare with the corresponding Bonferroni intervals. 944 .12 on page 348.388 .089 16.c. Construct a oneway MANOVA of the crudeoil data listed in Table 11.701 Spooled = 481 .012 9.1 1 Male HookBilled Kite Data Xl Xl Xl X2 Xl x2 (Tail length) (Wing length) 278 277 308 290 273 284 267 281 287 271 302 254 297 281 284 (Tail length) 185 195 183 202 177 177 170 186 177 178 192 204 191 178 177 (Wing length) 282 285 276 308 254 268 260 274 272 266 281 276 290 265 275 (Tail length) 284 176 185 191 177 197 199 190 180 189 194 186 191 187 186 (Wing length) 277 281 287 295 267 310 299 273 278 280 290 287 286 288 275 ISO 186 206 184 177 177 176 200 191 193 212 181 195 187 190 6. find the linear combination most responsible.155.22. If you reject Ho: 1'1 .089 . Are the usual MANOVA assumptions realistic for these data? Explain. Comment on the validity of the assumption that I. Researchers have suggested that a change in skull size over time is evidence of the interbreeding of a resident population with immigrant populations.432 .719 19.2441 .481 9.2 = I. The measured variables are XI .354 .400 .044 S2 .05. (d) Bond rating companies are interested in a company's ability to satisfy its outstanding debt obligations as they mature.027 . Use a = . (c) The data in Thble 6.C. Construct a oneway MANOVA using the width measurements from the iris data in Thble 11.. . Does it appear as if one or more of the foregoing financial ratios might be useful in helping to classify a bond as "high" or "medium" quality? Explain. The summary statistics are as follows: Aa bond companies: nl = 20. Construct a oneway MANOVA of the Egyptian data.3' 6.4.002 .05.094 61.23.404.com/statistics).) 0.254 27. mean Baa bonds? Using the pooled covanance matnx.7.094 [ .004 34. _ .244 .494 .083 21. and (a) Look for gender differences by testing for equality of group means.102 6.25.102 [ .JL2i.l = I.254 . and thus they do not represent a random sample. 6.026 .2.. Set a = .589 .. Use a = . period 2 is 3300 B.024 .346 Chapter 6 Comparisons of Several Multivariate Means Exercises 347 (c) Calculate the linear combinations of mean components most responsible for rejecting Ho: 1'1 .12 were collected from graduatestudent volunteers.400 19. test for the equa Ity 0 vectors.830J.0041 .494 .267 SI = . (e) Repeat part (b) assuming normal populations with unequal covariance matices (see (627). Construct 95 %' simultaneous confidence intervals to determine which mean components differ among the populations represented by the three time periods.044 .600..287.. Does your conclusion change? Table 6. (b) Construct the 95% simultaneous confidence intervals for each JLli .002 .030 . 14. f th e with (b) Are the financial characteristics of with bonds different rof.459 .1'2 = 0.3.388 X 4 = nasalheightofskujl(mm) (a) Does pooling appear reasonable here? 
Comment on the pooling procedure in this case.c.267 . Construct 95% simultaneous confidence intervals for differences in mean components for the two responses for each pair of populations. 6.12. x.5.347.7 on page 662.012 . 12.524. xi = [2. = [2. The results on 4 measures of oxygen consumption for 25 males and 25 females are given in Table 6.05. .840J.465 . Samples of air were collected at definite intervals and the gas contents analyzed. The variables were XI = resting volume 0 1 (L/min) X 2 = resting volume O 2 (mL/kg/min) X3 = maximum volume O 2 (L/min) X 4 = maximum volume O 2 (mL/kg/min) Source: Data courtesy of S. Temple.prenhall.854 _ = maximum breadth of skull (mm) Xl = basibregmatic height of skull (mm) X3 = basialveolar length of skull (mm) and [. i = 1.13 on page 349 (see the skull data on the website www. .
c:1 :g 0000000000000000000000000 6. for both the test group and the control group with baseline consumptIOn measured on a similar day before the experimental began.10g(baseJine consumption) 348 . Jackson. would to an electncal tImeofuse pricing scheme. Wisconsin.26.Exercises 349 Table 6.13 Egyptian Skull Data MaxBreath (xd BasHeight (X2) BasLength (X3) NasHeight (X4) Tlffie Period 131 125 131 119 136 138 139 125 131 134 124 133 138 148 126 135 132 133 131 133 : 138 131 132 132 143 137 130 136 134 134 138 134 134 129 124 136 145 130 134 125 : 89 92 99 96 100 89 108 93 102 99 101 97 98 104 95 98 100 102 96 94 91 100 94 99 91 95 101 96 100 91 49 48 50 54 56 48 48 51 51 48 48 45 51 45 52 54 48 50 46 52 50 51 45 49 52 54 49 55 46 44 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 0000000000000000000000000 132 133 138 130 136 134 136 133 138 138 130 131 137 127 133 123 137 131 133 133 Source: Data courtesy of 1. Hourly consumptIon (m kIlowatthours) was measured on a hot summer day m and compared. The responses. The cost of electricity during peak penods for some customers eight times the cost of electricity during hours. A project was to investigate how consumers in Green Bay. log( current consumption) .
257] .27.722 .228 .) 6. x3 nz = 58. (a) Plot the mean vectors for husbands and wives as sample profiles.28..test for level profiles with a = . for the first 24 subjects in Table 1. investiga te equality between the dominan t and nondominant bones.239 .14.233 . 1 p. 6.prenhall.i\ = [. Compar e the data from both tables to determm e whether there has been bone loss.153. What is the level of passionate love that your partner feels for you? 3. 1 year after particIpa tion in an experim ental program .8. if any? Commen t. If the hypothes es of equal mean vectors is rejected. Hatfield. What is the level of companionate love that you feel for your partner? 4. where XI = a Spointscale response to Question 1. torrens? '!est for the equality of the two population mean vectors using a = .30. . What is the level of passionate love you feel for your partner? 2. carteri and L. Biological differenc es such as sex ratios of emerging flies and biting habits were found to exist.and 3 the following summary statistics: P. University of Wisconsin. .comlstatistics indicate any difference in the two species L.151.199 .232] . Using the data on bone mineral content in Table 1.233 . What is the level of companionate love that your partner feels for you? The responses were recorded on the following Spoint scale.32 2. and compare these with the mtervals m Part b.14.OS.29.the bone mineral contents .479 x2 X3 X4 Spooled Source: Data courtesy of Statistical Laboratory. (Use a significance level of a = .OS. None at all I Very little I Some I 3 A great deal I 4 Tremendous amount 5 Thirty husbands and 30 wives gave the response s in Table 6.592 . that for many years they were thought to be the same. (a peak hour). (b) Is the husband rating wife profile parallel to the wife rating husband profile? Test for parallel profiles with a = .1S on page 3S2 and on the website www.14 Spouse Data Husban d rating wife Wife rating husband X4 XI Xl Test group: Control group: and nl = 28.OS. ii = [.OS.OS. As part of the study of love and marriage in Example 6. (b) Constru ct 9S% simultan eous confiden ce intervals for the mean differenc es. and compare these with the mtervals In Part b. X3 = a Spointscale response to Question 3.OS for any statistical tests. 6. 2 5 4 4 3 3 3 4 4 4 4 5 4 4 4 3 4 5 5 4 4 4 3 5 5 3 4 3 4 4 3 5 5 3 3 3 4 4 5 4 4 5 4 3 4 3 5 5 5 4 4 4 4 3 5 3 4 3 4 4 5 4 5 4 5 4 4 5 5 3 5 4 4 5 5 4 4 5 4 4 4 4 5 5 3 4 4 5 3 5 5 4 5 4 5 5 4 5 5 3 5 ·4 4 5 5 5 4 5 4 4 4 4 5 5 3 4 4 5 3 5 4 4 4 4 4 3 4 3 4 3 4 5 4 4 4 3 5 4 3 5 5 4 2 3 4 4 4 3 4 4 4 5 4 5 4 3 3 4 4 4 5 5 4 4 4 4 5 5 4 3 3 5 5 4 3 4 4 4 4 4 5 5 5 5 5 4 5 5 5 4 5 5 5 4 5 4 5 4 4 4 4 4 5 5 5 4 5 4 5 5 5 5 5 5 5 4 4 5 4 4 5 5 5 4 5 4 5 4 4 4 4 4 5 5 5 4 5 4 4 5 S()urce: Data courtesy of E.M. Justify your use of normalt heory methods for these data. (b) Construc t 9S% simultan eous confiden ce intervals for the mean differenc es.M.350 Chapter 6 Comparisons of Several Multivariate Means for the hours ending 9 A. Do the taxonom ic data listed in part in Table 6. Table 6.256.8. (a) Test using a = . if the profiles are coincident. Exercises (a peak: hour) produced 351 Table 6.199 . If the profiles appear to be parallel.339] Xz . 1\vo species of biting flies (genus Leptoconops) are so similar morphol ogically. Finally. and X 4 == a Spoints cale response to Question 4. Perform a profile analysis.804 355 = [ 228 . X = a Spoints cale response to Questio 2 n 2. test for coincident profiles at the same level of significance. 231.239 . (a) Test using a = .180. 
Does timeofuse pricing seem to make a differenc e in electrical consumption? What is the nature of this difference. What conclusi on(s) can be drawn from this analysis? 6.232 355 . (c) the Bonferro ni 9S% simultan eous intervals .16 on page 3S3 .M. . (c) the Bonferr oni 9S% simultan eous intervals .M.. a sample of husband s and wives were asked to respond to these questions: 1.ll A. determin e the mean compone nts (or linear combina tions of mean compone nts) most responsible for rejecting Ho.
352 1.643 1. Use a = .106 1.734 .875 . (a) Perform a twofactor MANQVA using the data in Table 6.693 .789 . (b) Analyze the residuals from Part a.761 .656 .159 1.825 .756 1.515 . crop scientists routinely compare varieties with respect to several variables.860 1.051 .809 1.540 .698 .842 1.857 . a variety effect.041 1.846 1.463 .718 1.698 .811 . 46 47 47 43 50 47 13 14 15 17 14 12 16 17 14 11 15 14 15 14 13 15 14 15 12 14 11 14 14 12 15 15 14 18 15 16 14 17 16 14 15 14 15 14 16 14 9 13 8 9 10 9 9 9 9 9 : 8 13 9 9 10 9 9 9 9 10 : 106 105 103 100 109 104 95 104 90 104 86 94 103 82 103 101 103 100 99 100 L.980 1.826 .634 .906 .537 .869 .785 .802 .776 2.640 41 38 44 43 43 44 42 43 41 38 47 46 44 41 44 45 40 44 40 46 19 40 48 41 43 43 45 43 41 44 42 45 44 43.506 .420 1.673 . torrens : palp length 31 32 36 32 35 36 36 36 36 35 38 34 34 35 36 36 35 34 37 37 37 38 39 35 42 40 44 40 42 43 38 41 35 38 36 38 40 37 40 39 (Thl'd) (FO_) palp width palp length 25 22 27· 28 26 24 26 26 23 24 : X4 Xs X6 X7 Subject number 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 ( Longtb of ) ( Length of antennal antennal segment 12 segment 13 Dominant radius 1.726 .931 1.411 Dominant ulna . There were two replications of the experiment.31.352 Chapter 6 Comparisons of Several Multivariate Means Table 6.027 .743 Dominant humerus 2.764 .228 1.923 . Three varieties (5. and 8) were grown at two geographical locations (1.886 . of 100 seeds) Source: Data courtesy of William Atchley.912 .949 .932 .991 .804 .787 . the three variables representing yield and the two important gradegrain characteristics were measured. can we conclude that the location and/or variety effects are additive? If not.823 .765 .753 .577 .17. The three variables are Xl X z X3 99 110 99 103 95 101 103 99 105 99 33 36 31 32 31 37 32 23 33 34 = Yield (plot weight) = Sound mature kernels (weight in gramsmaximum of 250 grams) = Seed size (weight.727 .756 .769 .627 .900 .869 .730 .637 .815 1.740 . The data for one twofactor experiment are given in Table 6.883 .779 .579 1.526 . (c) Using the results in Part a. 6.661 1.670 .470 1.654 .689 .933 1.692 .941 1.715 .573 2.668 1.443 1.947 .442 Humerus 2.640 .710 1.17 on page 354. .190 1.651 1.875 .923 2.05.602 .687 .851 .773 .999 1.580 .6.585 Ulna .265 1.851 1.879 .964 .813 .905 .782 . Peanuts are an important crop in parts of the southern United States.619 .925 . carteri : 26 31 23 24 27 30 23 29 22 30 25 31 33 25 32 25 29 31 31 34 : 10 10 10 10 11 10 9 9 9 10 9 6 10 9 9 9 11 11 10 10 9 9 10 10 8 11 11 11 12 7 10 11 10 10 10 10 10 10 10 10 9 7 10 8 9 9 11 10 10 10 9 10 10 10 8 11 11 10 11 7 Source: Data courtesy of Everett Smith.16 Mineral Content in Bones (After 1 Year) Xl X2 X3 Exercises 353 (Wing) (Wing) length width 85 87 94 92 96 91 90 92 91 87 L.2) and. Test for a location effect. In an effort to develop improved plants. and a locationvariety interaction.953 1. but not for others? Check by running three separate univariate twofactor ANQVAs.873 .570 .729 .765 .498 . does the interaction effect show up for some variables.826 .844 .708 .770 .765 .686 1.656 .865 .997 1.746 .742 1.268 1.242 2.977 .330 2.843 .776 Radius 1.246 1.609 2. in grams.747 1.914 .164 1.551 .880 .130 1.396 1. Do the usual MANQVA assumptions appear to be satisfied? Discuss.817 .378 1.738 . in this case.806 .
(b) Construct a twoway ANOVA for the 560CM observations and another ANOVA for the nOCM observations.0 67. and Julian day 320 [3]) during the growing season.37 25.33 8.87 22.4 54. Are these results consistent MANOVA results in Part a? If not.3 194.89 36.62 15.22 17. can you explain any differences? JL JL JL JL JL JL JL JL JL JL JL JL LP LP LP LP LP LP LP LP LP LP LP LP Source: Data courtesy of Mairtin Mac Siurtain.03 12.12 15. Test for a species effect.5 187.6 65.79 8. Table 6.33.74 67.15 24.l5 77.67 37.354 Chapter 6 Comparisons of Several Multivariate Means Exercises 355 6.l 161.55 19. 560CM 10.05.4 49.93 37. The data in Table 6.86 8.8 173.78 26.31 33.24 12.42 10.44 37.67 40.51 13.1\vO of the variables sured were Xl = percent spectral reflectance at wavelength 560 nrn (green) X 2 = percent spectral reflectance at wavelength 720 nrn (near infrared) The cell means (CM) for Julian day 235 for each combination of species and level are as follows.16 21.40 17.5 121. Use a = .00 23.18.31 8.14 19. Refer to Exercise 6.33 40. and a suboptimal level. The species of seedlings used were spruce (SS).l3 10.8 45.7 180.74 9. a time effect and speciestime interaction.25 16.6 193.4 53.57 71.78 10.93 38.18 Spectral Reflectance Data 560 run 9.37 7. 6.22 10.33 12.1 166.18 are measurements on the variables Xl = percent spectral reflectance at wavelength 560 nm (green) X 2 = percent spectral reflectance at wavelength no nm (near infrared) for three species (sitka spruce [SS]. (d) Larger numbers correspond to better yield and gradegrain characteristics. the spectral reflectance of three lyearold seedlings was measured at various wavelengths during the growing The seedlings were grown with two different levels of nutrient: the optimal coded +.8 166.0 166.27 10.7 55.8 164.32 22.34 7.05.77 720nm 19.7 l39. Using cation 2.87 44.00 25.77 12.63 25. These averages are based on four replications. using 95% Bonferroni simultaneous intervals pairs of varieties.48 12.04 13.17 Peanut Data Factor 1 Location 1 1 2 2 1 1 2 2 1 1 2 2 Factor 2 Variety 5 5 5 5 6 6 6 6 8 8 8 8 Xl X2 X3 Yield 195.3 189.8 58.94 8.38 14.45 6.5 200. (a) Perform a twofactor MANOVA using the data in Table 6.21 9.2 Table 6.32. and lodgepole pine [LP]) of lyearold seedlings taken at three different times (Julian day 150 [1].8 SeedSize 51.90 40.21 8. Japanese larch [JL).0 195.12 26.00 40.32. Use a = . coded . The seedlings were all grown with the optimal level of nutrient. Julian day 235 [2].50 33.28 38.71 44.35 38.5 44.45 29.1 167. and 10dgepoJe pine (LP).32 27.48 33.54 14. perform a twoway test for a species effect and a nutrient effect.37 31.87 Species SS SS SS TIme 1 1 1 1 2 2 2 2 3 3 3 3 1 1 1 1 2 2 2 2 3 3 3 3 1 1 1 1 2 2 2 2 3 3 3 3 Replication 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 Source: Data courtesy of Yolanda Lopez.0 201.24 16. In one experiment involving remote sensing.69 14.07 11.5 165.03 32.l 57.67 24. Japanese larch (JL).40 nOCM SS SS SS SS SS SS SS SS SS Species SS JL LP Nutrient 25.74 36.14 78.0 SdMatKer 153.78 10.4 203.9 202.41 7.73 7.93 60.27 20.7 197.1 156.20 + + + SS JL LP (a) 'freating the cell means as individual observations.73 26.25 41.8 60.35 13. . can we conclude that one variety is better than the other two for each acteristic? Discuss your answer.43 45.
5 247. (c) Find the 95 % Boneferroni confidence intervals for the mean differenc es males and females on both length and weight.0 228. The maximum hood estimate of the (q + 1) X 1 P is Table 6. 6.5 307.0 229. Given the summary information on electrical usage in this pie.07 7.00 9.9 using the results for the WIth the female national track neat the data as a random sample of siz 64 f h' m.34.0 244.2 290. e 0 t e twelve recordSOOm values.3 336.9 page 344 contains the carapace measurements for 24 female and 24 male ties.53 5.5 258.0 261.1 .5 268. large sample statistic.50 Gender Snout vent length 176.7 438.25 9.2 225.4. (a) Test for equality of means between males and females using a = .7 315.p + q)n (B'SIB r J Fit a quadrati c growth curve to this single group and comment on the fit.75 6.98 24. Comment on the comparison.00 7.00 3.3 466.50 6. as ml a y c assified as low or high severity.75 10. where is the populatio n matrix for carapace measurements for female turtles.00 53.9 331.' may be appropriate to analyze differences.49 6.75 7. use Box's Mtest to test the hypothesis Ho: IJ = =' I.5 238.5 223.75 9. and is the electrical usage covariance matrix for the of Wisconsin homeowners without air conditioning.00 25. A [lISt step ment .3 384.05.35. Refer to Exampl e 6. Refer to Exampl e 6. amounts of money so it is importa nt to be wrreless can lose great toward understanding the problem s' I d ' IX problems expedItiously. records in Table 1.25 7. 6. Use Box's Mtest to test Ho: = = I. Compare the male national track records in 1: b .00 9.00 Gender P= (B'SlB rIB'Sl x  where S is the sample covariance matrix.05.7 259.0 226.' g was rated as relatively new (novice) or . The estimate d covariances of the maximum likelihood estimators are CoV(P) =' (n . Set a = .97 56.4 280.75 23. Table 11. and 1500m races. 6.3 319. Here Il is the ance matrix for the two measures of usage for the population of Wisconsi n with air conditioning. the components of Xl versus time and those of X2 versuS the same graph.50 63.0 417.51 19.75 3.8 233. (b) Is it reasonable to pool variances in this case? Explain.50 57.4 223.00 10. F F F F F F F F F F F F F F F F F F F F F F F F F F F F M M M M M M M M M M M M M M M M M M M M M M M M M M M M Source: Data Courtesy of Jesus Ravis.50 69.7 235. When cell phone relay towers are not worki .41.s: groups.involving three factors. teraction show up for one variable but not for the other? Check by running· variate twofactor ANOVA for each of the two responses.40 33.7 365. I s usmg a . A from a designed experisimple or complex and the en ineer .65 7.0 223.7 page 662 contains the values of three trace elements and two measures of drocarbons for crude oil samples taken from three groupS (zones) of Box's Mtest to test equality of population covariance matrices for the sandstone.75 44.25 30.50 23.4 246. Take a = . . Table 6.0 306.25 21. (a) Plot the profiles.00 34.P (n . Anacondas are some of the largest snakes in the world. (a) Test for equality of means between males and fema e . Jesus Ravis and his searchers capture a snake and measure its (i) snout vent length (cm) or the length the snout of the snake to its vent where it evacuates waste and (ii) weight sample of these measurements in shown in Table 6.36.45 6.00 61.0 236. and I2 is the populatio n ance matrix for carapace measurements for male turtles.05.2) + q) (n .15.50 82.25 9.7 312. (c) Foresters are particularly interested in the interaction of species and time.0 440. 
(d) Can you think of another method of analyzing these data (or a different tal design) that would allow for a potential time trend in the spectral numbers? 6.3 365. Set a = .75 5.l)(n .7 347. expert (guru ).0 477.8 257.00 54. Explam why It (b) Find the 95% Bonferroni confidence in male and females on all of the races. three. 4OOm.38.00 15.0 303.50 5.85 10.8 360. Refer to Example 6.15 29.19. .75 9.5 186.00 9.01.00 39.00 24.3 441.7 235.0 237.8 326. 6. 6.3 Weight 3.84 7.356 Chapter 6 Comparisons of Several Multivariate Means (b) Do you think the usual MAN OVA assumptions are satisfied for the these data? cuss with reference to a residual analysis.3 222..7 212.7 231.6 172. Set a '" .9 236.0 215.00 30.39.31.75 5.15 but treat all 31 subjects as a single group.7 435. (b) Test that linear growth is adequate. and the possibility of correlated tions over time.19 Anacon da Data Exercises :357 Snout vent Length 271. Here there are p = 5 variables and you may wish to conSIder formations of the measurements on these variables to make them more nearly 6.7 224.1 Weight 18.05. .00 15.75 7.75 5.05.00 70.6 221.6 377.37 15.40. tervals for the mean differences between 6.
66 (2004).339354.A. Source: Data courtesy of Dan Porter.. .317346. "A Solution to the Multivariate BehrensFisher Problem. . 10. 25. Hunter.. Growth Curves.9 5. E.K . T. 1959. " . R.8 9.. Bartlett.3 19. S. F. Applied Multivariate Statistics with SAS® Software (2nd ed. New York: John Wiley. and W. M.. D. CA: Brooks/Cole Thomson Learning.153162. G. .0 7.4 10.6 23. Scheffe.0 15. S. R." Biometrika. 24 (1960). Pearson. 2003. "The Effect of Nonnormality on some Multivariate Tests and Robustnes to Nonnormality in the Linear Model. 15 (1986).1 5. (1950). 14. "A Generalized Multivariate Analysis of Variance Model Useful Especially for Growth Curve Problems.3 7. 24. 16 (1954).7 7. Naik.5 10.. New York: John Wiley. 166 (1937). . Bhattacharyya. Mosimann. and C. L. The data are given Table 6. and 1. vol. and G.296298.358 Chapter 6 Comparisons of Several Multivariate Means References 359 Tho times were observed. "Problems in the Analysis of Growth and Wear Curves. W. Fung "A New Graphical Method for Detectmg Smgle and J and W.). .4 9. Van der Merwe. M..5 4. M. Box. N. New York: John Wiley. . NC: SAS Institute Inc. 17. "Size and Shape Variation in the Painted ThrtJe: A Principal Component Analysis. and N.4 15. K. New York: John Wiley. Problem Severity Level Low Low Low Low Low Low Low Low High High High High High High High High Problem Complexity Level Simple Simple Simple Simple Complex Complex Complex Complex Simple Simple Simple Simple Complex Complex Complex Complex Engineer Experience Level Novice Novice Guru Guru Novice Novice Guru Guru Novice Novice Guru Guru Novice Novice Guru Guru Problem Assessment Tune 3. and S. Wilks." Communications in StatisticsTheory and Methods. and 1. "Certain Generalizations in the Analysis of Variance. Bartlett. D. 1969.6 3.).• Note on the Multlplymg Journal of the Royal Statistical Society (B). References 1..3 Problem Implementation Time 6.). H.7 1." Statistics & Probability Letters. 1999. . 21. P." Biometrics..4 8." Journal of the Royal Statistical Society Supplement (B). L. S. Tiku. Biometrika. 1972. R. 161169.9 6.8 15. 30333051. F. "Testing the Equality of VarianceCovariance Matrices the Robust Way. Tiku.4 13. E. V. eds.0 5.2 9. P. and N. Kshirsagar. . M." 7.6 12. K.7 12. Singh. Multiple Outliers in Univariate and Multivariate Data.5 4. Anderson.. G. 176197. 18. 14. Cambridge.. HUnter. Jolicoeur. Box. E. Roy. A.1 3. C.1 1. Balakrishnan. Box. New York: John Wiley. Bartlett. 1995. If· rta t Perform a MANOVA including appropriate confidence mterva s or Impo n I" 9. Krishnamoorthy. S.3 1.. 16.6 15. E. 34 (1938).7 14. 985100l.8 8.9 14. ." Communications in StatisticsTheory and Methods. 23. S. H. 11. W. Box.3 5.6 4. M. Design and Analysis of Experiments (6th ed. 5. 2 B acon. 2005. Draper. Johnson. G. 13.). " r d S . and M.5 21.6 12. P. M. "Properties of Sufficiency and Statistical Tests. Proceedings f the Cambridge Philosophical Society. N.8 19. App le (1987). Mardia. 471494.. 0 4. D." Biometrika. 37193735.6 14.362389. 12 (1985). 268282. 2005.4 Total Resolution Time 9. An Introduction to Multivariate Statistical Analysis (3rd ed. England: Cambridge University Press.. 51 (1964). P. no." Communications in StatisticsTheory and Methods. Montgomery.313326. E.). and 1. S. h R I 3. E." Proceedmgs of t e oya Society of London (A). A. The Analysis of Variance. 15. M. 105121. 2005. Cary. Multivariate Statistical Methods (4th ed. 6. B.". P. . Nel. Yu.3340. Biometrika Tables for Statisticians. Smith. Statistics for Experimenters (2nd ed. 58 (1971). 
Statistics: Principles and Methods (5th ed. G.2 6.). Hartley. "Multivariate Analysis. 36 no 2 tatlstrcs. Morrison. 22. 36 (1949).. .0 2. Potthoff. R. K. "Modified Nel and Van der Merwe Test for the Multivariate BehrensFisher Problem.7 3. "Further Aspects of the Theory of Multiple RegressIOn. . . and D. "A General Distribution Theory for a Class of Likelihood Cntena. . F:ac torsor f Various X2 ApprOXimations.3 2. S. .8 2. Evolutionary Operation:A Statistical Method for Process Improvement. "Robust Statistics for Testing Mean Vectors of Multivariate Distributions.9 10. . S." Growth. . 24 (1932). 9 (1982). New York: Marcel Dekker. Belmont.Shone. . 20. 19.. 6 8. Bartlett.7 6. 2005. and H. 11.20. New York: John Wiley..3 5. O. 9 (1947)." Biometrika. G. Khattree. G. The time to assess the pr?blem and plan an the time to implement the solution were each measured In hours. no.
Specifically. Zz. Seber and Goldberger [16]. . . COV(8j.. The regression model states that Y is composed of a mean.. error (and hence the response) is viewed as a vanable whose behavlOr IS characterized by a set of distributional assumptIons. With n independent observations on Yand the associated values of z· the comI' plete model becomes Yl = = 130 130 + 13lZ 11 + 132Z12 + ..· The predictor variables mayor may not enter the model as fIrstorder terms. Var(8j) = a2 (constant).13. Galton [15]. Our treatment must be somewhat terse. which depends m a contmuous manner on the z. Neter. which accounts for measurement error and the effects of other variables not explicitly considered in the values of the predictor variables recorded from the experiment or set by the mvestigator treated as fixed . the linear regression model with a single response takes the form Y = 13o + 13lZl + . This model is then generalized to handle the prediction of several dependent variables.. In this chapter.1 Introduction Regression analysis is the statistical methodology for predicting values of one or more response (dependent) variables from a collection of predictor (independent) variable values. Cov(e) 1. . be r predictor variables thought to be related to a response variable Y. Z2r 131 Zlr] [130] : : [8 + : 1 82 ] Znl Znr 13r 8n Y = (nX(r+l» ((r+l)xl) Z fJ + e (nxl) Y = current market value of home 360 and the specifications in (72) become 2. + 13rZ2r + 82 + 13lZnl (71) Yn = 1.8k) = O. and a random error 8.Z.2 The Classical linear Regression Model Let Zl.z. and Nachtsheim [20].. z. E(e) = 0..)] [Response] = [mean (depending on + [error] The term "linear" refers to the fact that the mean is a linear function of the unknown 13o. Bowerman and O'Connell [6]..) Our abbreviated treatment highlights the regressIOn assumptions and their consequences. E(8j) = 0.Chapter and Zl Z2 = The Classical Linear Regression Model 361 Z3 Z4 == square feet ofliving area location (indicator for zone of city) = appraised value last year = quality of construction (price per square foot) MULTIVARIATE LINEAR REGRESSION MODELS 7. Unfortunately. Kutner. or breadth of application of this methodology. + 8 Zl. we first discuss the multiple regression model for the predic· tion of a single response.Z2.. (71) becomes Zll ZZl Z12 Z22 ::: 1. Draper and Smith [13].'s. as a vast literature exists on the subject. 130 + 132Zn2 + . see the following books. + 13... (If you are interested in pursuing regression analysis. and = E(ee') = a2I. the name regression.. with r = 4. . Cook and Weisberg [11]. (72) In matrix notation. + 13rzl r + 81 + 13lZ21 + 132Z22 + .. + 13rZnr + 8 n where the error terms are assumed to have the following properties: 2. Wasserman.. It can also be used for assessing the effects of the predictor variables· on the responses.. in ascending order of difficulty: Abraham and Ledolter [1]. and the general applicability of regression techniques to seemingly different situations. 131>···. For example. in no way reflects either the importance . we might have or (nXl) ... alternative formulations of the regression model.. culled from the title of the first paper on the sUbject by F. .j * k. . and 3.
we shall later need to add the assumption of joint normality for making confidence statements and testing hypotheses. we obtain the observed response vector and design matrix 9 6 9 0 2 3 1 2 Before the responses Y' = [Yi.132 = 7"2. (nXl) Note that we can handle a quadratic expression for the mean response by introducing the term 132z2.2 (The design matrix for oneway ANOVA as a regression model) Determine the design matrix if the linear regression model is applied to the oneway ANOVA situation in Example 6. (nXn) lj = • and (1"2 are unknown parameters and the design matrix Z has jth row Example 7. the errors [el.6.. Zjb . Thus. as in Example 7. . es] are random. I I Each columnof Z consists of the n values of the corresponding predictor variable· while the jth row of Z contains the values for all predictor variables on the jth trial: linear Regression Model (nXl) y= (nX(r+I» ((r+I)XI) Z P+E. . JL2 = JL + 7"2.•• .. allows the whole of analysis of variance to be treated within the multiple linear regression framework. if the observation is from population 1 otherwise if the observation is from population 3 otherwise and 130 = JL. .... . Ys] are observed.. Z = .. . j=1..2. so The Classical Linear Regression Model 363 The data for this model are contained in the observed response vector y and the design matrix Z.. .1 (Fitting a straightline regression model) Determine the linear regression model for fitting a straight liiie Mean response to the data = E(Y) = f30 + f3l zl o y 1 4 1 2 3 3 8 4 9 = 7"1. We now provide some examples of the linear regression model. e2. We create socalled dummy variables to handle the three population means: JLI = JL + 7"1. + {3r Zj. and we can write Y = E' = ZP + e (8XI) Y = (8X4) Z = where Y = Yl] [1' 5 . . Zjr]' Although the errorterm assumptions in (72) are very modest. The linear regression model for the jth trial in this latter case is lj = or 130 130 + 131Zjl + 132zj2 + Sj + 13lzjl + 132zJI + Sj E(E) where 13 = 0 (nXl) and Cov(e) = (1"2 I. constant term 130' It is customary to introduce the artificial variable ZjO = 1.[1 T ZIl] 'P 1 ZSl = E = [SI] Ss 1 1 0 0 1 1 0 0 1 1 0 0 1 0 1 0 1 0 1 0 1 0 0 1 1 0 0 1 1 0 0 1 • The construction of dummy variables..2.8 where we arrange the observations from the three populations in sequence. with Z2 = zy.131 I Z2 = { o if the observation is from population 2 otherwise Example 7. and JL3 = JL + 7"3' We set [ZjO. + 13rzjr = {3oZjO + {3I Zjl + .133 = 7"3' Then lj = 130 + 131 Zjl + 132Zj2 + 133Zj3 + Sj.362 Chapter 7 MuItivariate Linear Regression Models Note that a one in the first column of the design matrix Z is the multiplier of the. Yi. where I 130 + 131Zjl + .
]. Consider the difference Yj . + sum squares of the differences from the observed Yj is as small as possIble.3 (Calculating the least squares estimates.ZP)'(y .. Note that (Z'Z)l exists since Z'Z has rank r + 1 :5 n.Z(Z'ZfIZ'][I .b) + 2(y .Z(Z'Z)lZ'Jy == O. Let Z have full rank r + 1 (73) is given by :5 Zp since (y . (If Z'Z is not of full rank.2.] l 7.'"  {3rZjr =E ".b) = (y . Let = (Z'ZfIZ'y as asserted.  . (Z'Z)l is replaced by (Z'Zr.Zb) = (y . where H Let y = ZfJ "hat" matrix. we must determine the values for regression coefficients fJ and the error variance (}"2 consistent with the available Let b be trial values for fJ. . .2Z(Z'Zf z.Z (Z'Zflz.b) so S(b) = (y .Z' = O. Then £ [I . + + . phasize their role as of fJ.y) = Z'[I .b1zj1 .Zb = y . with the data In the sense that they estimated (fitted) mean responses.. + brzjr that be expected if b were the ·"true" parameter vector.Z(Z'ZfIZ']y. The coefficients fJ are consistent. 1)rpicaJly. The de\IlatlloriJ:i Sj + (P .ZP) = Yj  . [I . BecauseZhasfullrank.Z'i = Z'(y ..ZP) = (y . .  j = 1.bo .Zb = Y .H)y satisfy Z' e = 0 and Y' e = O. To verify the expression for fJ..brzjr will not be zero.b1zj1 .) .y'ZfJ fit to the data IIf Z is not full rank.Z(p . Then the residuals = Z (Z'Z)I Z ' is called P i = y .) Result 7.. + Z(Z'Z)IZ'Z(Z'Z)IZ' (idempotent)..Z(Z'Z)IZ'J[I = y'[1 _ Z (Z'Z)lZ']Y = y'y .1 shows how the least squares estimates and the residuals £ can be obtained from the design matrix Z and responses y by simple matrix operations.b) '# 0 if fJ '# b. method of least squares selects b so as to miI). and the residual the residuals i. = y'[1 .b). Also. the = j=l n 2: (Yj  residual sum of squares  {3IZjl . ZI o 1 Y 1 4 2 3 3 8 4 9 .Z(Z'Zf1z.b).ZP + Z(P . They will henceforth be denoted by fJ to .2. so the minimum sum of squares is unique and Occurs for b = = (Z'Zf1Z'y. Zjr' That is. The vector of residuals i == y contains the information about the remaining unknown (See Result 7. [I .6.ZP + ZP . Additionally.y = [I . . Z'[I . . (76) 3. l The least squares estimate of fJ P P= (Z'ZfIZ'y = Hy denote the fitted values of y. n..Zb)'(y .) Result 7..'" E Example 7. we write !'e 2 bo b1zj1 '" brz jr ) j=l y .Z(Z'Z)IZ'] 2.ZP)'Z(P .ZP)'(y . and the resIdual sum of squares for a straightline model of squares) Calculate the least square estimates P.b)'Z'Z(P .ZP)'Z = £'Z = 0'.Zb) The coefficients b chosen by the least squares criterion are called least squqres mates of the regression parameters fJ.Z(Z'Z)IZ'] = I . Exercise 7. = y'[1 _ Z(Z'ZrIZ']Y = y'y . a generalized inverse of Z'Z. the residuals.y'ZfJ. Consequently. . but then a'Z'Za = 0 or Za = 0 which con.3 least Squares Estimation One of the objectives of regression analysis is to develop an equation that will the investigator to predict the response for given values of the predictor Thus it is necessary to "fit" the model in (73) to the observed Yj cOlTes.Z(Z'ZfIZ'] satisfies 1..b) + (P .pollldill2:Jf8: the known values 1.bo .. Z'Za = 0 for some a '# 0.Zb)'(y . because the response fluctuates manner characterized by the error term assumptions) about its expected value..] = Z' .364 Chapter 7 Multivariate Linear Regression Models Least Squares Estimation . the Yj . Zjl> . = [I ..imize the sum of the squares of differences: S(b) = n 2: (Yj  P =y  y=y _ Zp = (symmetric). = [I .'" between the observed response Yj and the value bo + b1zj1 + . The matrix [I .Z(Z'Zflz. The first term in S(b) does not depend on b and the' .n are called residuals. 
• tradicts Z having full rank r + 1.1.soY'e = P'Z'£ = O.Z'Z(P .365 Proof.Z(Z'ZrIZ']Y = (I .
L j=l n n n j=l Yj' or y = y. Zr have no influence on the response. or attributable to. because of the random error E. R2 is 0 if (3o = Y and f31 = f32 = . E(Y) is a linear combination of the columns of Z.2] . the condition Z'e = 0 includes the requirement Z' y z'z (Z'Zr l '£L o= l'e = j=l 2: ej = 2: Yj . j=l n n (78) .1 2: (Yj  n y)2 = j=l 2: (Yj .. that Sj = 0 for all j.366 Chapter 7 Multivariate Linear Regression Models We have Least Squares Estimation 367 Since the first column of Z is 1.1. p= = (Z'ZrlZ'y = = ( about mean ) = + (error)) squares sum 0 squares and the fitted equation is The preceding sum of squares decomposition suggests that the quality of the models fit can be measured by the coefficient of determination Y= 1 + 2z The vector of fitted (predicted) values is R2 = 1 _ j=! L e1 n 2: (Yj j=l 11 y)2 (79) ± j=! (Yj .y)2 j=l ± (Yj _ y/ The quantity R2 gives the proportion of the total variation in the y/s "explained" by. the observation vector y will not lie in the model plane. . . the predictor variables Zl.' . Z2. so the total response sum of squares y'y = satisfies y'y = Thus. y is not (exactly) a linear combination of the columns of Z. As P varies..6 . Consequently.2 . Z2. the predictor variables Zl. ZP spans the model plane of all linear combinations. According to the classical linear regression model... we obtain the basic decomposition of the sum of squares about the mean: or j=l 1 1 1 1 2 3 m 10J 30 [ . y'i = 0.Y/ + 2: e.Zr' Here R2 (or the multiple correlation coefficient R = + VJi2) equals 1 if the fitted equation passes through all tpe da!a points. At the other extreme.. Usually. + Przr f31 1 Znl ZIIr SumofSquares Decomposition According to Result 7. In this case. The residual sum of squares is Mean response vector = E(Y) = ZP = f30 ll [Zlll [Zlrl [ + + . . Subtracting n). so Geometry of least Squares A geometrical interpretation of the least squares technique highlights the nature of the concept.2 = n(W from both sides of the decomposition in (77). = f3r = O. that is... Recall that Y + E (y + Y _ y)'(y + Y _ y) = (y + e)'(y + e) = y'y + e'e response) ( vector error) ( vector .
Geometry of Least Squares

A geometrical interpretation of the least squares technique highlights the nature of the concept. According to the classical linear regression model,

Mean response vector = E(Y) = Zβ = β₀[1, ..., 1]' + β₁[z₁₁, ..., zₙ₁]' + ... + βᵣ[z₁ᵣ, ..., zₙᵣ]'

Thus, E(Y) is a linear combination of the columns of Z. As β varies, Zβ spans the model plane of all linear combinations. Usually, the observation vector y will not lie in the model plane, because of the random error ε; that is, y is not (exactly) a linear combination of the columns of Z. Recall that Y = Zβ + ε.

Once the observations become available, the least squares solution is derived from the deviation vector

y − Zb = (observation vector) − (vector in model plane)

The squared length (y − Zb)'(y − Zb) is the sum of squares S(b). As illustrated in Figure 7.1, S(b) is as small as possible when b is selected such that Zb is the point in the model plane closest to y. This point occurs at the tip of the perpendicular projection of y on the plane; that is, for the choice b = β̂, ŷ = Zβ̂ is the projection of y on the plane consisting of all linear combinations of the columns of Z, and the residual vector ε̂ = y − ŷ is perpendicular to that plane. This geometry holds even when Z is not of full rank.

[Figure 7.1: Least squares as a projection for n = 3.]

When Z has full rank, the projection operation is expressed analytically as multiplication by the matrix Z(Z'Z)⁻¹Z'. To see this, we use the spectral decomposition (2-16) to write

Z'Z = λ₁e₁e₁' + λ₂e₂e₂' + ... + λᵣ₊₁eᵣ₊₁e'ᵣ₊₁

where λ₁ ≥ λ₂ ≥ ... ≥ λᵣ₊₁ > 0 are the eigenvalues of Z'Z and e₁, e₂, ..., eᵣ₊₁ are the corresponding eigenvectors. If Z is of full rank,

(Z'Z)⁻¹ = λ₁⁻¹e₁e₁' + λ₂⁻¹e₂e₂' + ... + λᵣ₊₁⁻¹eᵣ₊₁e'ᵣ₊₁

Consider qᵢ = λᵢ^(−1/2) Z eᵢ, which is a linear combination of the columns of Z. Then qᵢ'qₖ = λᵢ^(−1/2)λₖ^(−1/2) eᵢ'Z'Zeₖ = λᵢ^(−1/2)λₖ^(−1/2)λₖ eᵢ'eₖ = 0 if i ≠ k, or 1 if i = k. That is, the r + 1 vectors qᵢ are mutually perpendicular and have unit length. Their linear combinations span the space of all linear combinations of the columns of Z. Moreover,

Z(Z'Z)⁻¹Z' = Σᵢ₌₁^(r+1) λᵢ⁻¹ Zeᵢeᵢ'Z' = Σᵢ₌₁^(r+1) qᵢqᵢ'

According to Result 2A.2 and Definition 2A.12, the projection of y on a linear combination of {q₁, q₂, ..., qᵣ₊₁} is Σ(qᵢ'y)qᵢ = (Σ qᵢqᵢ')y = Z(Z'Z)⁻¹Z'y = Zβ̂. Thus, multiplication by Z(Z'Z)⁻¹Z' projects a vector onto the space spanned by the columns of Z.² Similarly, [I − Z(Z'Z)⁻¹Z'] is the matrix for the projection of y on the plane perpendicular to the plane spanned by the columns of Z.

² If Z is not of full rank, we can use the generalized inverse (Z'Z)⁻ = Σᵢ₌₁^(r₁+1) λᵢ⁻¹eᵢeᵢ', where λ₁ ≥ λ₂ ≥ ... ≥ λᵣ₁₊₁ > 0 = λᵣ₁₊₂ = ... = λᵣ₊₁, as described in Exercise 7.6. Then Z(Z'Z)⁻Z' = Σᵢ₌₁^(r₁+1) qᵢqᵢ' has rank r₁ + 1 and generates the unique projection of y on the space spanned by the linearly independent columns of Z. This is true for any choice of the generalized inverse. (See [23].)

Sampling Properties of Classical Least Squares Estimators

The least squares estimator β̂ and the residuals ε̂ have the sampling properties detailed in the next result.

Result 7.2. Under the general linear regression model in (7-3), the least squares estimator β̂ = (Z'Z)⁻¹Z'Y has

E(β̂) = β  and  Cov(β̂) = σ²(Z'Z)⁻¹

The residuals ε̂ have the properties

E(ε̂) = 0  and  Cov(ε̂) = σ²[I − Z(Z'Z)⁻¹Z'] = σ²[I − H]

Also, E(ε̂'ε̂) = (n − r − 1)σ², so defining

s² = ε̂'ε̂/(n − (r + 1)) = Y'[I − Z(Z'Z)⁻¹Z']Y/(n − r − 1) = Y'[I − H]Y/(n − r − 1)

we have E(s²) = σ². Moreover, β̂ and ε̂ are uncorrelated.

Proof. (See webpage: www.prenhall.com/statistics) •

The errors ε influence the estimators β̂ and ε̂ only through the responses Y. The least squares estimator β̂ possesses a minimum variance property that was first established by Gauss.

Result 7.3 (Gauss'³ least squares theorem). Let Y = Zβ + ε, where E(ε) = 0, Cov(ε) = σ²I, and Z has full rank r + 1. For any c, the estimator

c'β̂ = c₀β̂₀ + c₁β̂₁ + ... + cᵣβ̂ᵣ

of c'β has the smallest possible variance among all linear estimators of the form

a'Y = a₁Y₁ + ... + aₙYₙ

that are unbiased for c'β.

³ Much later, Markov proved a less general result, which misled many writers into attaching his name to this theorem.
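Both the projection interpretation and the unbiasedness claims of Result 7.2 can be checked numerically. The sketch below uses a randomly generated design matrix (purely illustrative, not data from the text): it verifies that H = Z(Z'Z)⁻¹Z' is a symmetric, idempotent projection, and that s² averages to σ² across simulated samples.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, sigma2 = 20, 2, 4.0
Z = np.column_stack([np.ones(n), rng.normal(size=(n, r))])   # fixed full-rank design
H = Z @ np.linalg.inv(Z.T @ Z) @ Z.T                         # projection onto col(Z)

print(np.allclose(H @ H, H), np.allclose(H, H.T))            # idempotent and symmetric

# Monte Carlo check that E(s^2) = sigma^2, as stated in Result 7.2
beta = np.array([1.0, 2.0, -1.0])
s2_values = []
for _ in range(2000):
    y = Z @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    resid = y - H @ y                                        # residuals (I - H) y
    s2_values.append(resid @ resid / (n - r - 1))
print(np.mean(s2_values))                                    # close to 4.0
```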
Proof. For any fixed c, let a'Y be any unbiased estimator of c'β. Then E(a'Y) = E(a'Zβ + a'ε) = a'Zβ, whatever the value of β. Also, by assumption, E(a'Y) = c'β. Equating the two values yields a'Zβ = c'β, or (c' − a'Z)β = 0 for all β, including the choice β = (c' − a'Z)'. This implies that c' = a'Z for any unbiased estimator.

Now, c'β̂ = c'(Z'Z)⁻¹Z'Y = a*'Y with a* = Z(Z'Z)⁻¹c. Moreover, from Result 7.2, E(β̂) = β, so c'β̂ = a*'Y is an unbiased estimator of c'β. Thus, for any a satisfying the unbiased requirement c' = a'Z,

Var(a'Y) = Var(a'Zβ + a'ε) = Var(a'ε) = σ²a'a = σ²(a − a* + a*)'(a − a* + a*) = σ²[(a − a*)'(a − a*) + a*'a*]

since (a − a*)'a* = (a − a*)'Z(Z'Z)⁻¹c = 0 from the condition (a − a*)'Z = a'Z − a*'Z = c' − c' = 0'. Because a* is fixed and (a − a*)'(a − a*) is positive unless a = a*, Var(a'Y) is minimized by the choice a*'Y = c'(Z'Z)⁻¹Z'Y = c'β̂. •

This powerful result states that substitution of β̂ for β leads to the best estimator of c'β for any c of interest. In statistical terminology, the estimator c'β̂ is called the best (minimum-variance) linear unbiased estimator (BLUE) of c'β.

7.4 Inferences About the Regression Model

We describe inferential procedures based on the classical linear regression model in (7-3) with the additional (tentative) assumption that the errors ε have a normal distribution. Methods for checking the general adequacy of the model are considered in Section 7.6.

Inferences Concerning the Regression Parameters

Before we can assess the importance of particular variables in the regression function

E(Y) = β₀ + β₁z₁ + ... + βᵣzᵣ   (7-10)

we must determine the sampling distributions of β̂ and the residual sum of squares, ε̂'ε̂. To do so, we shall assume that the errors ε have a normal distribution.

Result 7.4. Let Y = Zβ + ε, where Z has full rank r + 1 and ε is distributed as Nₙ(0, σ²I). Then the maximum likelihood estimator of β is the same as the least squares estimator β̂. Moreover,

β̂ = (Z'Z)⁻¹Z'Y is distributed as Nᵣ₊₁(β, σ²(Z'Z)⁻¹)

and is distributed independently of the residuals ε̂ = Y − Zβ̂. Further,

nσ̂² = ε̂'ε̂ is distributed as σ²·χ²ₙ₋ᵣ₋₁

where σ̂² is the maximum likelihood estimator of σ².

Proof. (See webpage: www.prenhall.com/statistics) •

A confidence ellipsoid for β is easily constructed. It is expressed in terms of the estimated covariance matrix s²(Z'Z)⁻¹, where s² = ε̂'ε̂/(n − r − 1).

Result 7.5. Let Y = Zβ + ε, where Z has full rank r + 1 and ε is Nₙ(0, σ²I). Then a 100(1 − α)% confidence region for β is given by

(β − β̂)'Z'Z(β − β̂) ≤ (r + 1)s²Fᵣ₊₁,ₙ₋ᵣ₋₁(α)

where Fᵣ₊₁,ₙ₋ᵣ₋₁(α) is the upper (100α)th percentile of an F-distribution with r + 1 and n − r − 1 d.f. Also, simultaneous 100(1 − α)% confidence intervals for the βᵢ are given by

β̂ᵢ ± √(Var̂(βᵢ)) √((r + 1)Fᵣ₊₁,ₙ₋ᵣ₋₁(α)),  i = 0, 1, ..., r

where Var̂(βᵢ) is the diagonal element of s²(Z'Z)⁻¹ corresponding to β̂ᵢ.

Proof. Consider the symmetric square-root matrix (Z'Z)^(1/2). [See (2-22).] Set V = (Z'Z)^(1/2)(β̂ − β) and note that E(V) = 0 and

Cov(V) = (Z'Z)^(1/2)Cov(β̂)(Z'Z)^(1/2) = σ²(Z'Z)^(1/2)(Z'Z)⁻¹(Z'Z)^(1/2) = σ²I

V is normally distributed, since it consists of linear combinations of the β̂ᵢ's. Therefore, V'V = (β̂ − β)'(Z'Z)^(1/2)(Z'Z)^(1/2)(β̂ − β) = (β̂ − β)'(Z'Z)(β̂ − β) is distributed as σ²χ²ᵣ₊₁. By Result 7.4, (n − r − 1)s² = ε̂'ε̂ is distributed as σ²χ²ₙ₋ᵣ₋₁, independently of β̂ and, hence, independently of V. Consequently, [χ²ᵣ₊₁/(r + 1)]/[χ²ₙ₋ᵣ₋₁/(n − r − 1)] = [V'V/(r + 1)]/s² has an Fᵣ₊₁,ₙ₋ᵣ₋₁ distribution, and the confidence ellipsoid for β follows. Projecting this ellipsoid for (β̂ − β) using Result 5A.1 with A⁻¹ = Z'Z/s², c² = (r + 1)Fᵣ₊₁,ₙ₋ᵣ₋₁(α), and u' = [0, ..., 0, 1, 0, ..., 0] yields |βᵢ − β̂ᵢ| ≤ √((r + 1)Fᵣ₊₁,ₙ₋ᵣ₋₁(α)) √(Var̂(βᵢ)). •

The confidence ellipsoid is centered at the maximum likelihood estimate β̂, and its orientation and size are determined by the eigenvalues and eigenvectors of Z'Z. If an eigenvalue is nearly zero, the confidence ellipsoid will be very long in the direction of the corresponding eigenvector.

Practitioners often ignore the "simultaneous" confidence property of the interval estimates in Result 7.5. Instead, they replace (r + 1)Fᵣ₊₁,ₙ₋ᵣ₋₁(α) with the one-at-a-time t value tₙ₋ᵣ₋₁(α/2) and use the intervals

β̂ᵢ ± tₙ₋ᵣ₋₁(α/2) √(Var̂(βᵢ))   (7-14)

when searching for important predictor variables.
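Result 7.5 translates directly into a few lines of code. The sketch below (an assumed implementation, not from the text) computes the simultaneous intervals; replacing the F-based multiplier with the t percentile gives the one-at-a-time intervals of (7-14).

```python
import numpy as np
from scipy import stats

def simultaneous_cis(Z, y, alpha=0.05):
    """Simultaneous 100(1-alpha)% confidence intervals for the beta_i (Result 7.5)."""
    n, rp1 = Z.shape                           # rp1 = r + 1 columns, including intercept
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    beta_hat = ZtZ_inv @ Z.T @ y
    resid = y - Z @ beta_hat
    s2 = resid @ resid / (n - rp1)             # unbiased estimate of sigma^2
    se = np.sqrt(s2 * np.diag(ZtZ_inv))        # estimated std. dev. of each beta_i
    half = np.sqrt(rp1 * stats.f.ppf(1 - alpha, rp1, n - rp1)) * se
    return np.column_stack([beta_hat - half, beta_hat + half])
```

For the one-at-a-time version, the half-width would instead be stats.t.ppf(1 - alpha/2, n - rp1) * se, which is shorter but lacks the simultaneous coverage guarantee.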
Example 7.4 (Fitting a regression model to real-estate data) The assessment data in Table 7.1 were gathered from 20 homes in a Milwaukee, Wisconsin, neighborhood.

Table 7.1 Real-Estate Data

z₁ = Total dwelling size (100 ft²), z₂ = Assessed value ($1000), y = Selling price ($1000)

[The 20 rows of (z₁, z₂, y) measurements are read from the data file T7-1.dat in the SAS program of Panel 7.1.]

Fit the regression model

yⱼ = β₀ + β₁zⱼ₁ + β₂zⱼ₂ + εⱼ,  j = 1, 2, ..., 20

where z₁ = total dwelling size (in hundreds of square feet), z₂ = assessed value (in thousands of dollars), and Y = selling price (in thousands of dollars), to these data using the method of least squares. A computer calculation yields

(Z'Z)⁻¹ = ⎡ 5.1523
           ⎢  .2544  .0512
           ⎣ −.1463 −.0172  .0067 ⎦

and

β̂ = (Z'Z)⁻¹Z'y = [30.967, 2.634, .045]'

Thus, the fitted equation is

ŷ = 30.967 + 2.634z₁ + .045z₂
     (7.88)   (.785)    (.285)

with s = 3.473. The numbers in parentheses are the estimated standard deviations of the least squares coefficients. Also, R² = .834, indicating that the data exhibit a strong regression relationship. (See Panel 7.1, which contains the regression analysis of these data using the SAS statistical software package.) If the residuals ε̂ pass the diagnostic checks described in Section 7.6, the fitted equation could be used to predict the selling price of another house in the neighborhood from its size and assessed value. We note that a 95% confidence interval for β₂ [see (7-14)] is given by

β̂₂ ± t₁₇(.025) √(Var̂(β₂)) = .045 ± 2.110(.285)  or  (−.556, .647)

Since the confidence interval includes β₂ = 0, the variable z₂ might be dropped from the regression model and the analysis repeated with the single predictor variable z₁. Given dwelling size, assessed value seems to add little to the prediction of selling price. •

PANEL 7.1 SAS ANALYSIS FOR EXAMPLE 7.4 USING PROC REG.

PROGRAM COMMANDS:
   title 'Regression Analysis';
   data estate;
     infile 'T7-1.dat';
     input z1 z2 y;
   proc reg data = estate;
     model y = z1 z2;

OUTPUT:
   Model: MODEL1
   Dependent Variable: Y

   Analysis of Variance
   Source      DF   Sum of Squares   Mean Square   F Value   Prob > F
   Model        2       1032.87506     516.43753    42.828     0.0001
   Error       17        204.99494      12.05853
   C Total     19       1237.87000

   Root MSE   3.47254     R-square   0.8344
   Dep Mean  76.55000     Adj R-sq   0.8149
   C.V.       4.53630

   Parameter Estimates
   Variable   DF   Parameter Estimate   Standard Error   T for H0: Parameter = 0   Prob > |T|
   INTERCEP    1            30.966566       7.88220844                     3.929       0.0011
   z1          1             2.634400       0.78559872                     3.353       0.0038
   z2          1             0.045184       0.28518271                     0.158       0.8760
Likelihood Ratio Tests for the Regression Parameters

Part of regression analysis is concerned with assessing the effects of particular predictor variables on the response variable. One null hypothesis of interest states that certain of the zᵢ's do not influence the response Y. These predictors will be labeled z_{q+1}, z_{q+2}, ..., zᵣ. The statement that z_{q+1}, z_{q+2}, ..., zᵣ do not influence Y translates into the statistical hypothesis

H₀: β_{q+1} = β_{q+2} = ... = βᵣ = 0  or  H₀: β₍₂₎ = 0   (7-12)

where β₍₂₎ = [β_{q+1}, β_{q+2}, ..., βᵣ]'. Setting

Z = [Z₁ (n×(q+1)) ⋮ Z₂ (n×(r−q))],  β = [β₍₁₎ ((q+1)×1) ; β₍₂₎ ((r−q)×1)]

we can express the general linear model as

Y = Zβ + ε = [Z₁ ⋮ Z₂][β₍₁₎ ; β₍₂₎] + ε = Z₁β₍₁₎ + Z₂β₍₂₎ + ε

Under the null hypothesis H₀: β₍₂₎ = 0, Y = Z₁β₍₁₎ + ε. The likelihood ratio test of H₀ is based on the

Extra sum of squares = SS_res(Z₁) − SS_res(Z)
  = (y − Z₁β̂₍₁₎)'(y − Z₁β̂₍₁₎) − (y − Zβ̂)'(y − Zβ̂)   (7-13)

where β̂₍₁₎ = (Z₁'Z₁)⁻¹Z₁'y.

Result 7.6. Let Z have full rank r + 1 and ε be distributed as Nₙ(0, σ²I). The likelihood ratio test of H₀: β₍₂₎ = 0 is equivalent to a test of H₀ based on the extra sum of squares in (7-13) and s² = (y − Zβ̂)'(y − Zβ̂)/(n − r − 1). In particular, the likelihood ratio test rejects H₀ if

[(SS_res(Z₁) − SS_res(Z))/(r − q)] / s² > F_{r−q,n−r−1}(α)

where F_{r−q,n−r−1}(α) is the upper (100α)th percentile of an F-distribution with r − q and n − r − 1 d.f.

Proof. Given the data and the normal assumption, the likelihood associated with the parameters β and σ² is

L(β, σ²) = (2π)^(−n/2) σ^(−n) e^(−(y−Zβ)'(y−Zβ)/2σ²) ≤ (2π)^(−n/2) σ̂^(−n) e^(−n/2)

with the maximum occurring at β̂ = (Z'Z)⁻¹Z'y and σ̂² = (y − Zβ̂)'(y − Zβ̂)/n. Under the restriction of the null hypothesis, Y = Z₁β₍₁₎ + ε and

max L(β₍₁₎, σ²) = (2π)^(−n/2) σ̂₁^(−n) e^(−n/2)

where the maximum occurs at β̂₍₁₎ = (Z₁'Z₁)⁻¹Z₁'y. Moreover,

σ̂₁² = (y − Z₁β̂₍₁₎)'(y − Z₁β̂₍₁₎)/n

Rejecting H₀: β₍₂₎ = 0 for small values of the likelihood ratio Λ = (σ̂²/σ̂₁²)^(n/2) is equivalent to rejecting H₀ for large values of (σ̂₁² − σ̂²)/σ̂² or its scaled version,

F = [n(σ̂₁² − σ̂²)/(r − q)] / [nσ̂²/(n − r − 1)] = [(SS_res(Z₁) − SS_res(Z))/(r − q)] / s²

The preceding F-ratio has an F-distribution with r − q and n − r − 1 d.f. (See [22] or Result 7.11 with m = 1.) •

Comment. The likelihood ratio test is implemented as follows. To test whether all coefficients in a subset are zero, fit the model with and without the terms corresponding to these coefficients. The improvement in the residual sum of squares (the extra sum of squares) is compared to the residual sum of squares for the full model via the F-ratio. The same procedure applies even in analysis of variance situations where Z is not of full rank.⁴

⁴ In situations where Z is not of full rank, rank(Z) replaces r + 1 and rank(Z₁) replaces q + 1 in Result 7.6.

More generally, it is possible to formulate null hypotheses concerning r − q linear combinations of β of the form H₀: Cβ = A₀. Let the (r − q) × (r + 1) matrix C have full rank, let A₀ = 0, and consider

H₀: Cβ = 0

(This null hypothesis reduces to the previous choice when C = [0 ⋮ I_{(r−q)×(r−q)}].) Under the full model, Cβ̂ is distributed as N_{r−q}(Cβ, σ²C(Z'Z)⁻¹C'). We reject H₀: Cβ = 0 at level α if 0 does not lie in the 100(1 − α)% confidence ellipsoid for Cβ. Equivalently, we reject H₀: Cβ = 0 if

[(Cβ̂)'(C(Z'Z)⁻¹C')⁻¹(Cβ̂)] / [(r − q)s²] > F_{r−q,n−r−1}(α)

where s² = (y − Zβ̂)'(y − Zβ̂)/(n − r − 1) and F_{r−q,n−r−1}(α) is the upper (100α)th percentile of an F-distribution with r − q and n − r − 1 d.f. This test is the likelihood ratio test, and the numerator in the F-ratio is the extra sum of squares incurred by fitting the model subject to the restriction that Cβ = 0. (See [23].)

The next example illustrates how unbalanced experimental designs are handled by the general theory just described.

Example 7.5 (Testing the importance of additional predictors using the extra sum-of-squares approach) Male and female patrons rated the service in three establishments (locations) of a large restaurant chain. The service ratings were converted into an index. Table 7.2 contains the data for n = 18 customers. Each data point in the table is categorized according to location (1, 2, or 3) and gender (male = 0 and female = 1). This categorization has the format of a two-way table with unequal numbers of observations per cell. For instance, the combination of location 1 and male has 5 responses, while the combination of location 2 and female has 2 responses.

Table 7.2 Restaurant-Service Data

Location   Gender   Service (Y)
   1         0         15.2
   1         0         21.2
   1         0         27.3
   1         0         21.2
   1         0         21.2
   1         1         36.4
   1         1         92.4
   2         0         27.3
   2         0         15.2
   2         0          9.1
   2         0         18.2
   2         0         50.0
   2         1         44.0
   2         1         63.6
   3         0         15.2
   3         0         30.3
   3         1         36.4
   3         1         40.9

Introducing three dummy variables to account for location and two dummy variables to account for gender, we can develop a regression model linking the service index Y to location, gender, and their "interaction" using the design matrix

        constant   location   gender   interaction
Z row:  [1 | 1 0 0 | 1 0 | 1 0 0 0 0 0]   } 5 responses (location 1, male)
        [1 | 1 0 0 | 0 1 | 0 1 0 0 0 0]   } 2 responses (location 1, female)
        [1 | 0 1 0 | 1 0 | 0 0 1 0 0 0]   } 5 responses (location 2, male)
        [1 | 0 1 0 | 0 1 | 0 0 0 1 0 0]   } 2 responses (location 2, female)
        [1 | 0 0 1 | 1 0 | 0 0 0 0 1 0]   } 2 responses (location 3, male)
        [1 | 0 0 1 | 0 1 | 0 0 0 0 0 1]   } 2 responses (location 3, female)

The coefficient vector can be set out as

β' = [β₀, β₁, β₂, β₃, τ₁, τ₂, γ₁₁, γ₁₂, γ₂₁, γ₂₂, γ₃₁, γ₃₂]

where the βᵢ (i > 0) represent the effects of the locations on the determination of service, the τᵢ represent the gender effects, and the γᵢₖ represent the location-gender interaction effects.

The design matrix Z is not of full rank. (For instance, column 1 equals the sum of columns 2-4 or columns 5-6.) In fact, rank(Z) = 6. For the complete model, results from a computer program give

SS_res(Z) = 2977.4

and n − rank(Z) = 18 − 6 = 12.

The model without the interaction terms has the design matrix Z₁ consisting of the first six columns of Z. We find that

SS_res(Z₁) = 3419.1

with n − rank(Z₁) = 18 − 4 = 14 d.f. To test H₀: γ₁₁ = γ₁₂ = γ₂₁ = γ₂₂ = γ₃₁ = γ₃₂ = 0 (no location-gender interaction), we compute

F = [(SS_res(Z₁) − SS_res(Z))/(6 − 4)] / [SS_res(Z)/12] = [(3419.1 − 2977.4)/2] / (2977.4/12) = .89

The F-ratio may be compared with an appropriate percentage point of an F-distribution with 2 and 12 d.f. This F-ratio is not significant for any reasonable significance level α. Consequently, we conclude that the service index does not depend upon any location-gender interaction, and these terms can be dropped from the model.

Using the extra sum-of-squares approach, we may verify that there is no difference between locations (no location effect), but that gender is significant; that is, males and females do not give the same ratings to service.

In analysis-of-variance situations where the cell counts are unequal, the variation in the response attributable to different predictor variables and their interactions cannot usually be separated into independent amounts. To evaluate the relative influences of the predictors on the response in this case, it is necessary to fit the model with and without the terms in question and compute the appropriate F-test statistics. •
that is ence between location s (no location effect). unequal are counts cell the where s situation nce ofvaria analysis In their interac_ tion in the respons e attributable to different predictor variables and evaluate the To .f. The f orecast error IS E(zofJ) = zofJ and Var(zof then 00" . distributed as N(O 2) ap d"IS Illdependent of E and hence of a is BO where and S2 .O . + PrZor Estimating the Regression Function at Zo s have values Let Yo denote the value of the response when the predictor variable 000 is value d expecte the (73). Given the linear regression model of (7 3 ) . zofJ)s just a combination of the f3. zoP has . zo(Z'Z rlz ).f. UIlder the further that s2/0'2. Consequently. 1). problem on predicti solve two funcon regressi the estimate to (1) used predicto r variables. Proof.' W IC estImates E(Yo I zo).r ..zoP) Z zo) • Forecasting a New Observation at Zo . For a fixed Zo. .r . = E(BO) + zoP) E(Yo Thus.' y. it can for the values selected be ZOr] ZOl. we may verify that there is nt.
The prediction interval for Y₀ is wider than the confidence interval for estimating the value of the regression function E(Y₀ | z₀) = z₀'β. The additional uncertainty in forecasting Y₀, which is represented by the extra term s² in the expression s²(1 + z₀'(Z'Z)⁻¹z₀), comes from the presence of the unknown error term ε₀.

Example 7.6 (Interval estimates for a mean response and a future response) Companies considering the purchase of a computer must first assess their future needs in order to determine the proper equipment. A computer scientist collected data from seven similar company sites so that a forecast equation of computer-hardware requirements for inventory management could be developed. The data are given in Table 7.3 for

z₁ = customer orders (in thousands)
z₂ = add-delete item count (in thousands)
Y = CPU (central processing unit) time (in hours)

Table 7.3 Computer Data

z₁ (Orders)   z₂ (Add-delete items)   Y (CPU time)
  123.5              2.108               141.5
  146.1              9.213               168.9
  133.9              1.905               154.8
  128.5              0.815               146.5
  151.5              1.061               172.8
  136.2              8.603               160.1
   92.0              1.125               108.5

Source: Data taken from H. P. Artis, Forecasting Computer Requirements: A Forecaster's Dilemma (Piscataway, NJ: Bell Laboratories, 1979).

Construct a 95% confidence interval for the mean CPU time, E(Y₀ | z₀) = β₀ + β₁z₀₁ + β₂z₀₂, at z₀' = [1, 130, 7.5]. Also, find a 95% prediction interval for a new facility's CPU requirement corresponding to the same z₀.

A computer program provides the estimated regression function

ŷ = 8.42 + 1.08z₁ + .42z₂

(Z'Z)⁻¹ = ⎡ 8.17969
           ⎢ −.06411   .00052
           ⎣  .08831  −.00107   .01440 ⎦

and s = 1.204. Consequently,

z₀'β̂ = 8.42 + 1.08(130) + .42(7.5) = 151.97

and s√(z₀'(Z'Z)⁻¹z₀) = (1.204)(.58928) = .71. We have t₄(.025) = 2.776, so the 95% confidence interval for the mean CPU time at z₀ is

z₀'β̂ ± t₄(.025) s√(z₀'(Z'Z)⁻¹z₀) = 151.97 ± 2.776(.71)  or  (150.00, 153.94)

Since s√(1 + z₀'(Z'Z)⁻¹z₀) = (1.204)(1.16071) = 1.40, a 95% prediction interval for the CPU time at a new facility with conditions z₀ is

z₀'β̂ ± t₄(.025) s√(1 + z₀'(Z'Z)⁻¹z₀) = 151.97 ± 2.776(1.40)  or  (148.08, 155.86)

Comparing the results, we see that the prediction interval for the actual value of the response is wider than the corresponding interval for the expected value. It is the prediction interval that is relevant to the determination of computer requirements for a particular site with the given z₀. •

7.6 Model Checking and Other Aspects of Regression

Does the Model Fit?

Assuming that the model is "correct," we have used the estimated regression function to make inferences. Of course, it is imperative to examine the adequacy of the model before the estimated function becomes a permanent part of the decision-making apparatus.

All the sample information on lack of fit is contained in the residuals

ε̂₁ = y₁ − β̂₀ − β̂₁z₁₁ − ... − β̂ᵣz₁ᵣ
ε̂₂ = y₂ − β̂₀ − β̂₁z₂₁ − ... − β̂ᵣz₂ᵣ
  ⋮
ε̂ₙ = yₙ − β̂₀ − β̂₁zₙ₁ − ... − β̂ᵣzₙᵣ

or

ε̂ = [I − Z(Z'Z)⁻¹Z']y = [I − H]y   (7-16)

If the model is valid, each residual ε̂ⱼ is an estimate of the error εⱼ, which is assumed to be a normal random variable with mean zero and variance σ². Although the residuals ε̂ have expected value 0, their covariance matrix σ²[I − Z(Z'Z)⁻¹Z'] = σ²[I − H] is not diagonal. Residuals have unequal variances and nonzero correlations. Fortunately, the correlations are often small and the variances are nearly equal.

Because the residuals ε̂ have covariance matrix σ²[I − H], the variances of the ε̂ⱼ can vary greatly if the diagonal elements of H, the leverages hⱼⱼ, are substantially different. Consequently, many statisticians prefer graphical diagnostics based on studentized residuals. Using the residual mean square s² as an estimate of σ², we have

Var̂(ε̂ⱼ) = s²(1 − hⱼⱼ),  j = 1, 2, ..., n   (7-17)

and the studentized residuals are

ε̂ⱼ* = ε̂ⱼ / √(s²(1 − hⱼⱼ)),  j = 1, 2, ..., n   (7-18)

We expect the studentized residuals to look, approximately, like independent drawings from an N(0, 1) distribution. Some software packages go one step further and studentize ε̂ⱼ using the delete-one estimated variance s²(j), which is the residual mean square when the jth observation is dropped from the analysis.
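The studentized residuals in (7-18) require only the hat matrix H. A minimal sketch (an assumed implementation, not from the text):

```python
import numpy as np

def studentized_residuals(Z, y):
    """Internally studentized residuals, as in (7-18)."""
    n, rp1 = Z.shape
    H = Z @ np.linalg.inv(Z.T @ Z) @ Z.T     # hat matrix; diag(H) holds the leverages
    resid = y - H @ y                        # ordinary residuals (I - H) y
    s2 = resid @ resid / (n - rp1)           # residual mean square
    return resid / np.sqrt(s2 * (1 - np.diag(H)))
```

Under a valid model, a plot or histogram of these values should resemble a standard normal sample.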
Residuals should be plotted in various ways to detect possible anomalies. For general diagnostic purposes, the following are useful graphs:

1. Plot the residuals ε̂ⱼ against the predicted values ŷⱼ = β̂₀ + β̂₁zⱼ₁ + ... + β̂ᵣzⱼᵣ. Departures from the assumptions of the model are typically indicated by two types of phenomena: (a) A dependence of the residuals on the predicted value, as illustrated in Figure 7.2(a). Then the numerical calculations are incorrect, or a β₀ term has been omitted from the model. (b) The variance is not constant. The pattern of residuals may be funnel shaped, as in Figure 7.2(b), so that there is large variability for large ŷ and small variability for small ŷ. If this is the case, the variance of the error is not constant, and transformations or a weighted least squares approach (or both) are required. In Figure 7.2(d), the residuals form a horizontal band. This is ideal and indicates equal variances and no dependence on ŷ.

2. Plot the residuals ε̂ⱼ against a predictor variable, such as z₁, or products of predictor variables, such as z₁² or z₁z₂. A systematic pattern in these plots suggests the need for more terms in the model. This situation is illustrated in Figure 7.2(c).

3. Q-Q plots and histograms. Do the errors appear to be normally distributed? To answer this question, the residuals ε̂ⱼ or ε̂ⱼ* can be examined using the techniques discussed in Section 4.6. The Q-Q plots, histograms, and dot diagrams help to detect the presence of unusual observations or severe departures from normality that may require special attention in the analysis. If n is large, minor departures from normality will not greatly affect inferences about β.

4. Plot the residuals versus time. The assumption of independence is crucial, but hard to check. If the data are naturally chronological, a plot of the residuals versus time may reveal a systematic pattern. (A plot of the positions of the residuals in space may also reveal associations among the errors.) For instance, residuals that increase over time indicate a strong positive dependence. A statistical test of independence can be constructed from the first autocorrelation r₁(ε̂) of residuals from adjacent periods. A popular test, computed as sketched after this list, is based on the statistic

Σⱼ₌₂ⁿ (ε̂ⱼ − ε̂ⱼ₋₁)² / Σⱼ₌₁ⁿ ε̂ⱼ² ≈ 2(1 − r₁(ε̂))   (7-19)

and is called the Durbin-Watson test. (See [14] for a description of this test and tables of critical values.)

[Figure 7.2: Residual plots, panels (a)-(d).]

Example 7.7 (Residual plots) Three residual plots for the computer data discussed in Example 7.6 are shown in Figure 7.3. The sample size n = 7 is really too small to allow definitive judgments; however, it appears as if the regression assumptions are tenable. •

[Figure 7.3: Residual plots for the computer data of Example 7.6.]

If several observations of the response are available for the same values of the predictor variables, then a formal test for lack of fit can be carried out. (See [13] for a discussion of the pure-error lack-of-fit test.)
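The Durbin-Watson statistic in (7-19) is a one-liner; values near 2 are consistent with uncorrelated errors, while values well below 2 suggest positive serial dependence. A minimal sketch (an assumed implementation, not from the text):

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic (7-19) for residuals ordered in time."""
    return np.sum(np.diff(resid)**2) / np.sum(resid**2)
```

Formal critical values depend on n, r, and the design; see the tables referenced in [14].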
Leverage and Influence

Although a residual analysis is useful in assessing the fit of a model, departures from the regression model are often hidden by the fitting process. For example, there may be "outliers" in either the response or explanatory variables that can have a considerable effect on the analysis yet are not easily detected from an examination of residual plots. In fact, these outliers may determine the fit.

The leverage hⱼⱼ, the (j, j) diagonal element of H = Z(Z'Z)⁻¹Z', can be interpreted in two related ways. First, the leverage is associated with the jth data point and measures, in the space of the explanatory variables, how far the jth observation is from the other n − 1 observations. For simple linear regression with one explanatory variable z,

hⱼⱼ = 1/n + (zⱼ − z̄)² / Σᵢ₌₁ⁿ (zᵢ − z̄)²

The average leverage is (r + 1)/n. (See Exercise 7.8.)

Second, the leverage hⱼⱼ is a measure of pull that a single case exerts on the fit. The vector of predicted values is

ŷ = Zβ̂ = Z(Z'Z)⁻¹Z'y = Hy

where the jth row expresses the fitted value ŷⱼ in terms of the observations as

ŷⱼ = hⱼⱼyⱼ + Σ_{k≠j} hⱼₖyₖ

Provided that all other y values are held fixed,

(change in ŷⱼ) = hⱼⱼ(change in yⱼ)

If the leverage is large relative to the other hⱼₖ, then yⱼ will be a major contributor to the predicted value ŷⱼ.

Observations that significantly affect inferences drawn from the data are said to be influential. Methods for assessing influence are typically based on the change in the vector of parameter estimates, or predicted values, when observations are deleted. Plots based upon leverage and influence statistics and their use in diagnostic checking of regression models are described in [3], [5], and [10]. These references are recommended for anyone involved in an analysis of regression models.

Additional Problems in Linear Regression

We shall briefly discuss several important aspects of regression that deserve and receive extensive treatments in texts devoted to regression analysis. (See [10], [11], [13], and [23].)

Selecting predictor variables from a large set. In practice, it is often difficult to formulate an appropriate regression function immediately. Which predictor variables should be included? What form should the regression function take?

When the list of possible predictor variables is very large, not all of the variables can be included in the regression function. Techniques and computer programs designed to select the "best" subset of predictors are now readily available. The good ones try all subsets: z₁ alone, z₂ alone, z₁ and z₂, and so forth. The best choice is decided by examining some criterion quantity like R². [See (7-9).] However, R² always increases with the inclusion of additional predictor variables. Although this problem can be circumvented by using the adjusted R²,

R̄² = 1 − (1 − R²)(n − 1)/(n − r − 1)

a better statistic for selecting variables seems to be Mallow's C_p statistic (see [12]),

C_p = (residual sum of squares for subset model with p parameters, including an intercept) / (residual variance for full model) − (n − 2p)

A plot of the pairs (p, C_p), one for each subset of predictors, will indicate models that forecast the observed responses well. Good models typically have (p, C_p) coordinates near the 45° line. In Figure 7.4, we have circled the point corresponding to the "best" subset of predictor variables.

[Figure 7.4: C_p plot for computer data from Example 7.6 with three predictor variables (z₁ = orders, z₂ = add-delete count, z₃ = number of items; see the example and original source).]

If the list of predictor variables is very long, cost considerations limit the number of models that can be examined. Another approach, called stepwise regression (see [13]), attempts to select important predictors without considering all the possibilities. The procedure can be described by listing the basic steps (algorithm) involved in the computations:

Step 1. All possible simple linear regressions are considered. The predictor variable that explains the largest significant proportion of the variation in Y (the variable that has the largest correlation with the response) is the first variable to enter the regression function.

Step 2. The next variable to enter is the one (out of those not yet included) that makes the largest significant contribution to the regression sum of squares. The significance of the contribution is determined by an F-test. (See Result 7.6.) The value of the F-statistic that must be exceeded before the contribution of a variable is deemed significant is often called the F to enter.

Step 3. Once an additional variable has been included in the equation, the individual contributions to the regression sum of squares of the other variables already in the equation are checked for significance using F-tests. If the F-statistic is less than the one (called the F to remove) corresponding to a prescribed significance level, the variable is deleted from the regression function.

Step 4. Steps 2 and 3 are repeated until all possible additions are nonsignificant and all possible deletions are significant. At this point the selection stops.

Because of the step-by-step procedure, there is no guarantee that this approach will select, for example, the best three variables for prediction. A second drawback is that the (automatic) selection methods are not capable of indicating when transformations of variables are useful.

Another popular criterion for selecting an appropriate model, called an information criterion, also balances the size of the residual sum of squares with the number of parameters in the model. Akaike's information criterion (AIC) is

AIC = n ln((residual sum of squares for subset model with p parameters)/n) + 2p

It is desirable that the residual sum of squares be small, but the second term penalizes for too many parameters. Overall, we want to select models from those having the smaller values of AIC.

Colinearity. If Z is not of full rank, some linear combination, such as Za, must equal 0. In this situation, the columns are said to be colinear. This implies that Z'Z does not have an inverse. For most regression analyses, it is unlikely that Za = 0 exactly. Yet, if linear combinations of the columns of Z exist that are nearly 0, the calculation of (Z'Z)⁻¹ is numerically unstable. Typically, the diagonal entries of (Z'Z)⁻¹ will be large. This yields large estimated variances for the β̂ᵢ's, and it is then difficult to detect the "significant" regression coefficients βᵢ. The problems caused by colinearity can be overcome somewhat by (1) deleting one of a pair of predictor variables that are strongly correlated or (2) relating the response Y to the principal components of the predictor variables; that is, the rows zⱼ' of Z are treated as a sample, and the first few principal components are calculated as is subsequently described in Section 8.3. The response Y is then regressed on these new predictor variables.

Bias caused by a misspecified model. Suppose some important predictor variables are omitted from the proposed regression model. That is, suppose the true model has Z = [Z₁ ⋮ Z₂] with rank r + 1 and

Y = [Z₁ ⋮ Z₂][β₍₁₎ ; β₍₂₎] + ε = Z₁β₍₁₎ + Z₂β₍₂₎ + ε   (7-20)

where E(ε) = 0 and Var(ε) = σ²I. However, the investigator unknowingly fits a model using only the first q predictors by minimizing the error sum of squares (Y − Z₁β₍₁₎)'(Y − Z₁β₍₁₎). The least squares estimator of β₍₁₎ is β̂₍₁₎ = (Z₁'Z₁)⁻¹Z₁'Y. Then, unlike the situation when the model is correct,

E(β̂₍₁₎) = (Z₁'Z₁)⁻¹Z₁'E(Y) = (Z₁'Z₁)⁻¹Z₁'(Z₁β₍₁₎ + Z₂β₍₂₎ + E(ε)) = β₍₁₎ + (Z₁'Z₁)⁻¹Z₁'Z₂β₍₂₎   (7-21)

That is, β̂₍₁₎ is a biased estimator of β₍₁₎ unless the columns of Z₁ are perpendicular to those of Z₂ (that is, Z₁'Z₂ = 0). If important variables are missing from the model, the least squares estimates β̂₍₁₎ may be misleading.

7.7 Multivariate Multiple Regression

In this section, we consider the problem of modeling the relationship between m responses Y₁, Y₂, ..., Yₘ and a single set of predictor variables z₁, z₂, ..., zᵣ. Each response is assumed to follow its own regression model, so that

Y₁ = β₀₁ + β₁₁z₁ + ... + βᵣ₁zᵣ + ε₁
Y₂ = β₀₂ + β₁₂z₁ + ... + βᵣ₂zᵣ + ε₂
  ⋮
Yₘ = β₀ₘ + β₁ₘz₁ + ... + βᵣₘzᵣ + εₘ   (7-22)

The error term ε' = [ε₁, ε₂, ..., εₘ] has E(ε) = 0 and Var(ε) = Σ. Thus, the error terms associated with different responses may be correlated.

To establish notation conforming to the classical linear regression model, let [zⱼ₀, zⱼ₁, ..., zⱼᵣ] denote the values of the predictor variables for the jth trial, let Yⱼ' = [Yⱼ₁, Yⱼ₂, ..., Yⱼₘ] be the responses, and let εⱼ' = [εⱼ₁, εⱼ₂, ..., εⱼₘ] be the errors. In matrix notation, the design matrix

Z (n×(r+1)) with jth row [zⱼ₀, zⱼ₁, ..., zⱼᵣ]

is the same as that for the single-response regression model. The other matrix quantities have multivariate counterparts. Set

Y (n×m) = [Y₍₁₎ ⋮ Y₍₂₎ ⋮ ... ⋮ Y₍ₘ₎],  β ((r+1)×m) = [β₍₁₎ ⋮ β₍₂₎ ⋮ ... ⋮ β₍ₘ₎],  ε (n×m) = [ε₍₁₎ ⋮ ε₍₂₎ ⋮ ... ⋮ ε₍ₘ₎]

The multivariate linear regression model is

Y (n×m) = Z (n×(r+1)) β ((r+1)×m) + ε (n×m)   (7-23)

with

E(ε₍ᵢ₎) = 0  and  Cov(ε₍ᵢ₎, ε₍ₖ₎) = σᵢₖI,  i, k = 1, 2, ..., m

The m observations on the jth trial have covariance matrix Σ = {σᵢₖ}, but observations from different trials are uncorrelated. Here β and σᵢₖ are unknown parameters.

Simply stated, the ith response Y₍ᵢ₎ follows the linear regression model

Y₍ᵢ₎ = Zβ₍ᵢ₎ + ε₍ᵢ₎,  i = 1, 2, ..., m   (7-24)

with Cov(ε₍ᵢ₎) = σᵢᵢI. However, the errors for different responses on the same trial can be correlated.

Given the outcomes Y and the values of the predictor variables Z with full column rank, we determine the least squares estimates β̂₍ᵢ₎ exclusively from the observations Y₍ᵢ₎ on the ith response. In conformity with the single-response solution, we take

β̂₍ᵢ₎ = (Z'Z)⁻¹Z'Y₍ᵢ₎   (7-25)

Collecting these univariate least squares estimates, we obtain

β̂ = [β̂₍₁₎ ⋮ β̂₍₂₎ ⋮ ... ⋮ β̂₍ₘ₎] = (Z'Z)⁻¹Z'[Y₍₁₎ ⋮ Y₍₂₎ ⋮ ... ⋮ Y₍ₘ₎]

or

β̂ = (Z'Z)⁻¹Z'Y   (7-26)

For any choice of parameters B = [b₍₁₎ ⋮ b₍₂₎ ⋮ ... ⋮ b₍ₘ₎], the matrix of errors is Y − ZB. The error sum of squares and cross products matrix is

(Y − ZB)'(Y − ZB)   (7-27)

The selection b₍ᵢ₎ = β̂₍ᵢ₎ minimizes the ith diagonal sum of squares (Y₍ᵢ₎ − Zb₍ᵢ₎)'(Y₍ᵢ₎ − Zb₍ᵢ₎). Consequently, tr[(Y − ZB)'(Y − ZB)] is minimized by the choice B = β̂. Also, the generalized variance |(Y − ZB)'(Y − ZB)| is minimized by the least squares estimates β̂. (See Exercise 7.11 for an additional generalized sum of squares property.)

Using the least squares estimates β̂, we can form the matrices of

Predicted values: Ŷ = Zβ̂ = Z(Z'Z)⁻¹Z'Y
Residuals: ε̂ = Y − Ŷ = [I − Z(Z'Z)⁻¹Z']Y   (7-28)

The orthogonality conditions among the residuals, predicted values, and columns of Z, which hold in classical linear regression, hold in multivariate multiple regression. They follow from Z'[I − Z(Z'Z)⁻¹Z'] = Z' − Z' = 0. Specifically,

Z'ε̂ = Z'[I − Z(Z'Z)⁻¹Z']Y = 0   (7-29)

so the residuals ε̂₍ᵢ₎ are perpendicular to the columns of Z. Also,

Ŷ'ε̂ = β̂'Z'[I − Z(Z'Z)⁻¹Z']Y = 0   (7-30)

confirming that the predicted values Ŷ₍ᵢ₎ are perpendicular to all residual vectors ε̂₍ₖ₎. Because Y = Ŷ + ε̂,

Y'Y = (Ŷ + ε̂)'(Ŷ + ε̂) = Ŷ'Ŷ + ε̂'ε̂ + 0 + 0'

or

Y'Y = Ŷ'Ŷ + ε̂'ε̂   (7-31)

(total sum of squares and cross products) = (predicted sum of squares and cross products) + (residual error sum of squares and cross products)

The residual sum of squares and cross products can also be written as

ε̂'ε̂ = Y'Y − Ŷ'Ŷ = Y'Y − β̂'Z'Zβ̂
</gr-replace>
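Because (7-26) is just the univariate least squares formula applied column by column, the whole fit is one matrix solve. The sketch below (an assumed implementation, not from the text) fits the two-response straight-line data used in Example 7.8 that follows and returns the unbiased estimator of Σ.

```python
import numpy as np

def multivariate_ls(Z, Y):
    """Least squares fit of the multivariate model Y = Z beta + eps in (7-23)."""
    n, rp1 = Z.shape
    beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ Y)   # (r+1) x m; one column per response
    resid = Y - Z @ beta_hat                       # n x m matrix of residuals
    Sigma_unbiased = resid.T @ resid / (n - rp1)   # E'E / (n - r - 1)
    return beta_hat, resid, Sigma_unbiased

# Two-response data of Example 7.8: z1 = 0,...,4 with responses y1 and y2
Z = np.column_stack([np.ones(5), np.arange(5.)])
Y = np.column_stack([[1., 4., 3., 8., 9.], [-1., -1., 2., 3., 2.]])
beta_hat, resid, Sigma_hat = multivariate_ls(Z, Y)
print(beta_hat)          # columns [1, 2] and [-1, 1]: y1-hat = 1 + 2z, y2-hat = -1 + z
print(resid.T @ resid)   # residual SS and cross products [[6, -2], [-2, 4]]
```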
Example 7.8 (Fitting a multivariate straight-line regression model) To illustrate the calculations of β̂, Σ̂, and ε̂, we fit a straight-line regression model (see Panel 7.2),

Yⱼ₁ = β₀₁ + β₁₁zⱼ₁ + εⱼ₁
Yⱼ₂ = β₀₂ + β₁₂zⱼ₁ + εⱼ₂,  j = 1, 2, ..., 5

to two responses Y₁ and Y₂ using the data in Example 7.3, augmented by observations on an additional response:

z₁ |  0   1   2   3   4
y₁ |  1   4   3   8   9
y₂ | −1  −1   2   3   2

The design matrix Z remains unchanged from the single-response problem. We find that

Z' = [1 1 1 1 1 ; 0 1 2 3 4],  (Z'Z)⁻¹ = [.6 −.2 ; −.2 .1]

and

β̂₍₂₎ = (Z'Z)⁻¹Z'y₍₂₎ = [.6 −.2 ; −.2 .1][5 ; 20] = [−1 ; 1]

so the fitted model for the second response is ŷ₂ = −1 + z₁. Collecting this with the Example 7.3 fit ŷ₁ = 1 + 2z₁,

β̂ = [β̂₍₁₎ ⋮ β̂₍₂₎] = [1 −1 ; 2 1]

Ŷ = Zβ̂ = [1 −1 ; 3 0 ; 5 1 ; 7 2 ; 9 3],  ε̂ = Y − Ŷ = [0 0 ; 1 −1 ; −2 1 ; 1 1 ; 0 −1]

Note that ε̂'Ŷ = 0, and

ε̂'ε̂ = [6 −2 ; −2 4],  Y'Y = [171 43 ; 43 19] = Ŷ'Ŷ + ε̂'ε̂ = [165 45 ; 45 15] + [6 −2 ; −2 4]

consistent with the orthogonality and decomposition results (7-30) and (7-31). •

PANEL 7.2 SAS ANALYSIS FOR EXAMPLE 7.8 USING PROC GLM.

PROGRAM COMMANDS:
   title 'Multivariate Regression Analysis';
   data mra;
     infile 'Example 7-8 data';
     input y1 y2 z1;
   proc glm data = mra;
     model y1 y2 = z1/ss3;
     manova h = z1/printe;

OUTPUT:
   General Linear Models Procedure

   Dependent Variable: Y1
   Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
   Model              1      40.00000000   40.00000000     20.00   0.0208
   Error              3       6.00000000    2.00000000
   Corrected Total    4      46.00000000

   R-Square 0.869565   C.V. 28.28427   Root MSE 1.414214   Y1 Mean 5.0000

   Parameter   Estimate      T for H0: Parameter = 0   Pr > |T|   Std Error of Estimate
   INTERCEPT   1.000000000                      0.91     0.4286              1.09544512
   Z1          2.000000000                      4.47     0.0208              0.44721360

   Dependent Variable: Y2
   Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
   Model              1      10.00000000   10.00000000      7.50   0.0714
   Error              3       4.00000000    1.33333333
   Corrected Total    4      14.00000000

   R-Square 0.714286   C.V. 115.4701   Root MSE 1.154701   Y2 Mean 1.0000

   Parameter   Estimate      T for H0: Parameter = 0   Pr > |T|   Std Error of Estimate
   INTERCEPT  −1.000000000                     −1.12     0.3450              0.89442719
   Z1          1.000000000                      2.74     0.0714              0.36514837

   E = Error SS & CP Matrix
        Y1   Y2
   Y1    6   −2
   Y2   −2    4

   Manova Test Criteria and Exact F Statistics for the Hypothesis of no Overall Z1 Effect
   H = Type III SS&CP Matrix for Z1   S = 1   M = 0   N = 0
   Statistic                  Value         F      Num DF   Den DF   Pr > F
   Wilks' Lambda              0.06250000   15.00      2        2     0.0625
   Pillai's Trace             0.93750000   15.00      2        2     0.0625
   Hotelling-Lawley Trace    15.00000000   15.00      2        2     0.0625
   Roy's Greatest Root       15.00000000   15.00      2        2     0.0625

Result 7.9. For the least squares estimator β̂ = [β̂₍₁₎ ⋮ β̂₍₂₎ ⋮ ... ⋮ β̂₍ₘ₎] determined under the multivariate multiple regression model (7-23) with full rank(Z) = r + 1 < n,

E(β̂₍ᵢ₎) = β₍ᵢ₎  and  Cov(β̂₍ᵢ₎, β̂₍ₖ₎) = σᵢₖ(Z'Z)⁻¹,  i, k = 1, 2, ..., m

The residuals ε̂ = [ε̂₍₁₎ ⋮ ... ⋮ ε̂₍ₘ₎] = Y − Zβ̂ satisfy E(ε̂₍ᵢ₎) = 0 and E(ε̂₍ᵢ₎'ε̂₍ₖ₎) = (n − r − 1)σᵢₖ, so

E(ε̂) = 0  and  E(ε̂'ε̂/(n − r − 1)) = Σ

Also, ε̂ and β̂ are uncorrelated.

Proof. The argument parallels that for Result 7.2. Since β̂₍ᵢ₎ − β₍ᵢ₎ = (Z'Z)⁻¹Z'ε₍ᵢ₎ and ε̂₍ᵢ₎ = [I − Z(Z'Z)⁻¹Z']ε₍ᵢ₎, the stated means and covariances follow from E(ε₍ᵢ₎ε₍ₖ₎') = σᵢₖI. For example, Cov(β̂₍ᵢ₎, β̂₍ₖ₎) = (Z'Z)⁻¹Z'E(ε₍ᵢ₎ε₍ₖ₎')Z(Z'Z)⁻¹ = σᵢₖ(Z'Z)⁻¹, and E(ε̂₍ᵢ₎'ε̂₍ₖ₎) = σᵢₖ tr[I − Z(Z'Z)⁻¹Z'] = σᵢₖ(n − r − 1). Finally, Cov(β̂₍ᵢ₎, ε̂₍ₖ₎) = (Z'Z)⁻¹Z'E(ε₍ᵢ₎ε₍ₖ₎')[I − Z(Z'Z)⁻¹Z'] = σᵢₖ(Z'Z)⁻¹(Z' − Z') = 0, so each element of β̂ is uncorrelated with each element of ε̂. •

Dividing each entry ε̂₍ᵢ₎'ε̂₍ₖ₎ of ε̂'ε̂ by n − r − 1, we obtain the unbiased estimator of Σ. Maximum likelihood estimators and their distributions can be obtained when the errors ε have a normal distribution.

Result 7.10. Let the multivariate multiple regression model in (7-23) hold with full rank(Z) = r + 1, n ≥ (r + 1) + m, and let the errors ε have a normal distribution. Then

β̂ = (Z'Z)⁻¹Z'Y

is the maximum likelihood estimator of β, and β̂ has a normal distribution with E(β̂) = β and Cov(β̂₍ᵢ₎, β̂₍ₖ₎) = σᵢₖ(Z'Z)⁻¹. Also, β̂ is independent of the maximum likelihood estimator of the positive definite Σ given by

Σ̂ = (1/n)ε̂'ε̂ = (1/n)(Y − Zβ̂)'(Y − Zβ̂)

and nΣ̂ is distributed as W_{m,n−r−1}(Σ), a Wishart distribution with n − r − 1 d.f. The maximized likelihood is L(μ̂, Σ̂) = (2π)^(−mn/2)|Σ̂|^(−n/2)e^(−mn/2).

Proof. (See webpage: www.prenhall.com/statistics) •

Result 7.10 provides additional support for using least squares estimates. When the errors are normally distributed, β̂ and n⁻¹ε̂'ε̂ are the maximum likelihood estimators of β and Σ, respectively. Therefore, for large samples, they have nearly the smallest possible variances.

Comment. The multivariate multiple regression model poses no new computational problems. Least squares (maximum likelihood) estimates, β̂₍ᵢ₎ = (Z'Z)⁻¹Z'y₍ᵢ₎, are computed individually for each response variable. Note, however, that the model requires that the same predictor variables be used for all responses.

Once a multivariate multiple regression model has been fit to the data, it should be subjected to the diagnostic checks described in Section 7.6 for the single-response model. The residual vectors [ε̂ⱼ₁, ε̂ⱼ₂, ..., ε̂ⱼₘ] can be examined for normality or outliers using the techniques in Section 4.6.

The remainder of this section is devoted to brief discussions of inference for the normal theory multivariate multiple regression model. Extended accounts of these procedures appear in [2] and [18].

The mean vectors and covariance matrices determined in Result 7.10 enable us to obtain the sampling properties of the least squares predictors. We first consider the problem of estimating the mean vector when the predictor variables have the values z₀' = [1, z₀₁, ..., z₀ᵣ]. The mean of the ith response variable is z₀'β₍ᵢ₎, and this is estimated by z₀'β̂₍ᵢ₎, the ith component of the fitted regression relationship. Collectively,

z₀'β̂ = [z₀'β̂₍₁₎ ⋮ z₀'β̂₍₂₎ ⋮ ... ⋮ z₀'β̂₍ₘ₎]

is an unbiased estimator of z₀'β, since E(z₀'β̂₍ᵢ₎) = z₀'E(β̂₍ᵢ₎) = z₀'β₍ᵢ₎ for each component. From the covariance matrix for β̂₍ᵢ₎ and β̂₍ₖ₎, the estimation errors z₀'β̂₍ᵢ₎ − z₀'β₍ᵢ₎ have covariances

E[z₀'(β₍ᵢ₎ − β̂₍ᵢ₎)(β₍ₖ₎ − β̂₍ₖ₎)'z₀] = z₀'(E(β₍ᵢ₎ − β̂₍ᵢ₎)(β₍ₖ₎ − β̂₍ₖ₎)')z₀ = σᵢₖz₀'(Z'Z)⁻¹z₀   (7-35)

The related problem is that of forecasting a new observation vector Y₀' = [Y₀₁, Y₀₂, ..., Y₀ₘ] at z₀. According to the regression model, Y₀ᵢ = z₀'β₍ᵢ₎ + ε₀ᵢ, where the "new" error ε₀' = [ε₀₁, ε₀₂, ..., ε₀ₘ] is independent of the errors ε and satisfies E(ε₀ᵢ) = 0 and E(ε₀ᵢε₀ₖ) = σᵢₖ. The forecast error for the ith component of Y₀ is

Y₀ᵢ − z₀'β̂₍ᵢ₎ = Y₀ᵢ − z₀'β₍ᵢ₎ + z₀'β₍ᵢ₎ − z₀'β̂₍ᵢ₎ = ε₀ᵢ − z₀'(β̂₍ᵢ₎ − β₍ᵢ₎)

so E(Y₀ᵢ − z₀'β̂₍ᵢ₎) = E(ε₀ᵢ) − z₀'E(β̂₍ᵢ₎ − β₍ᵢ₎) = 0, indicating that z₀'β̂₍ᵢ₎ is an unbiased predictor of Y₀ᵢ. The forecast errors have covariances

E(Y₀ᵢ − z₀'β̂₍ᵢ₎)(Y₀ₖ − z₀'β̂₍ₖ₎) = E(ε₀ᵢε₀ₖ) + z₀'E(β̂₍ᵢ₎ − β₍ᵢ₎)(β̂₍ₖ₎ − β₍ₖ₎)'z₀ = σᵢₖ(1 + z₀'(Z'Z)⁻¹z₀)

Note that E((β̂₍ᵢ₎ − β₍ᵢ₎)ε₀ₖ) = 0, since β̂₍ᵢ₎ = (Z'Z)⁻¹Z'ε₍ᵢ₎ + β₍ᵢ₎ is independent of ε₀. A similar result holds for E(ε₀ᵢ(β̂₍ₖ₎ − β₍ₖ₎)').

Likelihood Ratio Tests for Regression Parameters

The multiresponse analog of (7-12), the hypothesis that the responses do not depend on z_{q+1}, z_{q+2}, ..., zᵣ, becomes

H₀: β₍₂₎ = 0  where  β = [β₍₁₎ ((q+1)×m) ; β₍₂₎ ((r−q)×m)]   (7-37)

Setting Z = [Z₁ (n×(q+1)) ⋮ Z₂ (n×(r−q))], we can write the general model as

E(Y) = Zβ = [Z₁ ⋮ Z₂][β₍₁₎ ; β₍₂₎] = Z₁β₍₁₎ + Z₂β₍₂₎

Under H₀: β₍₂₎ = 0, Y = Z₁β₍₁₎ + ε, and the likelihood ratio test of H₀ is based on the quantities involved in the

extra sum of squares and cross products = (Y − Z₁β̂₍₁₎)'(Y − Z₁β̂₍₁₎) − (Y − Zβ̂)'(Y − Zβ̂) = n(Σ̂₁ − Σ̂)

where β̂₍₁₎ = (Z₁'Z₁)⁻¹Z₁'Y and Σ̂₁ = n⁻¹(Y − Z₁β̂₍₁₎)'(Y − Z₁β̂₍₁₎).

Result 7.11. Let the multivariate multiple regression model of (7-23) hold with Z of full rank r + 1 and (r + 1) + m ≤ n. Let the errors ε be normally distributed. Under H₀: β₍₂₎ = 0, nΣ̂ is distributed as W_{m,n−r−1}(Σ) independently of n(Σ̂₁ − Σ̂), which, in turn, is distributed as W_{m,r−q}(Σ). The likelihood ratio test of H₀ is equivalent to rejecting H₀ for large values of

−2 ln Λ = −n ln(|Σ̂|/|Σ̂₁|) = −n ln(|nΣ̂| / |nΣ̂ + n(Σ̂₁ − Σ̂)|)

For n large,⁵ the modified statistic

−[n − r − 1 − ½(m − r + q + 1)] ln(|Σ̂|/|Σ̂₁|)

has, to a close approximation, a chi-square distribution with m(r − q) d.f. Equivalently, Wilks' lambda statistic Λ^(2/n) = |Σ̂|/|Σ̂₁| can be used. (See Supplement 7A.) •

⁵ Technically, both n − r and n − m should also be large to obtain a good chi-square approximation.

If Z is not of full rank, but has rank r₁ + 1, then β̂ = (Z'Z)⁻Z'Y, where (Z'Z)⁻ is the generalized inverse discussed in [22]. The distributional conclusions stated in Result 7.11 remain the same, provided that r is replaced by r₁ and q + 1 by rank(Z₁). However, not all hypotheses concerning β can be tested, due to the lack of uniqueness in the identification of β caused by linear dependencies among the columns of Z. Nevertheless, the generalized inverse allows all of the important MANOVA models to be analyzed as special cases of the multivariate multiple regression model.

Example 7.9 (Testing the importance of additional predictors with a multivariate response) The service in three locations of a large restaurant chain was rated according to two measures of quality by male and female patrons. The first service-quality index was introduced in Example 7.5. Suppose we consider a regression model that allows for the effects of location, gender, and the location-gender interaction on both service-quality indices. The design matrix (see Example 7.5) remains the same for the two-response situation. We shall illustrate the test of no location-gender interaction in either response using Result 7.11. A computer program provides

(residual sum of squares and cross products) = nΣ̂ = [2977.39 1021.72 ; 1021.72 2050.95]

(extra sum of squares and cross products) = n(Σ̂₁ − Σ̂) = [441.76 246.16 ; 246.16 366.12]

Here n = 18, r = 5, q = 3, and the number of responses is m = 2. Although the sample size n = 18 is not large, we shall illustrate the calculations involved in the test of H₀: β₍₂₎ = 0 given in Result 7.11. Setting α = .05, we test H₀ by referring

−[n − r − 1 − ½(m − r + q + 1)] ln(|nΣ̂| / |nΣ̂ + n(Σ̂₁ − Σ̂)|) = −[18 − 5 − 1 − ½(2 − 5 + 3 + 1)] ln(.7605) = 3.15

to a chi-square percentage point with m(r − q) = 2(2) = 4 d.f. Since 3.15 < χ²₄(.05) = 9.49, we do not reject H₀ at the 5% level. The interaction terms are not needed. •

Information criteria are also available to aid in the selection of a simple but adequate multivariate multiple regression model. For a model that includes d predictor variables counting the intercept, the multivariate multiple regression version of Akaike's information criterion is

AIC = n ln(|Σ̂|) + 2p × d

This criterion attempts to balance the generalized variance with the number of parameters. Models with smaller AICs are preferable.
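The calculation in Example 7.9 is easy to script. The sketch below (an assumed implementation, not from the text) applies the large-sample test of Result 7.11 to the restaurant matrices.

```python
import numpy as np
from scipy import stats

def lr_test_extra_terms(E_full, E_extra, n, r, q, m, alpha=0.05):
    """Large-sample likelihood ratio test of H0: beta_(2) = 0 (Result 7.11).

    E_full  = n * Sigma_hat (residual SSCP, full model)
    E_extra = n * (Sigma_hat_1 - Sigma_hat) (extra SSCP from dropping terms)"""
    wilks = np.linalg.det(E_full) / np.linalg.det(E_full + E_extra)
    stat = -(n - r - 1 - 0.5 * (m - r + q + 1)) * np.log(wilks)
    df = m * (r - q)
    return stat, stats.chi2.ppf(1 - alpha, df)

# Matrices from Example 7.9 (two responses, interaction terms dropped)
E_full = np.array([[2977.39, 1021.72], [1021.72, 2050.95]])
E_extra = np.array([[441.76, 246.16], [246.16, 366.12]])
print(lr_test_extra_terms(E_full, E_extra, n=18, r=5, q=3, m=2))
# statistic near 3.15, well below the chi-square(4) critical value 9.49
```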
Other Multivariate Test Statistics

Tests other than the likelihood ratio test have been proposed for testing H₀: β₍₂₎ = 0 in the multivariate multiple regression model. Popular computer-package programs routinely calculate four multivariate test statistics. To connect with their output, we introduce some alternative notation. Let E be the p × p error, or residual, sum of squares and cross products matrix

E = nΣ̂

that results from fitting the full model, and let the p × p hypothesis, or extra, sum of squares and cross-products matrix be

H = n(Σ̂₁ − Σ̂)

The statistics can be defined in terms of E and H directly, or in terms of the nonzero eigenvalues η₁ ≥ η₂ ≥ ... ≥ η_s of HE⁻¹, where s = min(p, r − q). Equivalently, they are the roots of |(Σ̂₁ − Σ̂) − ηΣ̂| = 0. The definitions are

Wilks' lambda = ∏ᵢ₌₁ˢ 1/(1 + ηᵢ) = |E|/|E + H|
Pillai's trace = Σᵢ₌₁ˢ ηᵢ/(1 + ηᵢ) = tr[H(H + E)⁻¹]
Hotelling-Lawley trace = Σᵢ₌₁ˢ ηᵢ = tr[HE⁻¹]
Roy's greatest root = η₁/(1 + η₁)

Roy's test selects the coefficient vector a so that the univariate F-statistic based on a'Yⱼ has its maximum possible value. When several of the eigenvalues ηᵢ are moderately large, Roy's test will perform poorly relative to the other three. Simulation studies suggest that its power will be best when there is only one large eigenvalue. Charts and tables of critical values are available for Roy's test. (See [21] and [17].) Wilks' lambda, Roy's greatest root, and the Hotelling-Lawley trace test are nearly equivalent for large sample sizes. If there is a large discrepancy in the reported P-values for the four tests, the eigenvalues and vectors may lead to an interpretation. In this text, we report Wilks' lambda, which is the likelihood ratio test.

This distribution theory can also be employed to develop a test of the more general hypothesis H₀: Cβ = Γ₀, where C is an (r − q) × (r + 1) matrix of full rank. It can be shown that the extra sum of squares and cross products matrix generated by the hypothesis is

H = (Cβ̂ − Γ₀)'(C(Z'Z)⁻¹C')⁻¹(Cβ̂ − Γ₀)

For the choices C = [0 ⋮ I_{(r−q)×(r−q)}] and Γ₀ = 0, this null hypothesis reduces to the case H₀: β₍₂₎ = 0 considered earlier.

Predictions from Multivariate Multiple Regressions

Suppose the model Y = Zβ + ε, with normal errors ε, has been fit and checked for any inadequacies. If the model is adequate, it can be employed for predictive purposes.

One problem is to predict the mean responses corresponding to fixed values z₀ of the predictor variables. Inferences about the mean responses can be made using the distribution theory in Result 7.10. From this result, we determine that

β̂'z₀ is distributed as Nₘ(β'z₀, z₀'(Z'Z)⁻¹z₀ Σ)

and

nΣ̂ is independently distributed as W_{m,n−r−1}(Σ)

The unknown value of the regression function at z₀ is β'z₀. So, from the discussion of the T² statistic in Section 5.2, we can write

T² = ((β̂'z₀ − β'z₀)/√(z₀'(Z'Z)⁻¹z₀))' ((n/(n − r − 1))Σ̂)⁻¹ ((β̂'z₀ − β'z₀)/√(z₀'(Z'Z)⁻¹z₀))   (7-39)

and the 100(1 − α)% confidence ellipsoid for β'z₀ is provided by the inequality

(β'z₀ − β̂'z₀)' ((n/(n − r − 1))Σ̂)⁻¹ (β'z₀ − β̂'z₀) ≤ z₀'(Z'Z)⁻¹z₀ [m(n − r − 1)/(n − r − m)] F_{m,n−r−m}(α)   (7-40)

where F_{m,n−r−m}(α) is the upper (100α)th percentile of an F-distribution with m and n − r − m d.f.

The 100(1 − α)% simultaneous confidence intervals for E(Yᵢ) = z₀'β₍ᵢ₎ are

z₀'β̂₍ᵢ₎ ± √([m(n − r − 1)/(n − r − m)] F_{m,n−r−m}(α)) √(z₀'(Z'Z)⁻¹z₀ (n/(n − r − 1))σ̂ᵢᵢ),  i = 1, 2, ..., m   (7-41)

where β̂₍ᵢ₎ is the ith column of β̂ and σ̂ᵢᵢ is the ith diagonal element of Σ̂.

The second prediction problem is concerned with forecasting new responses Y₀ = β'z₀ + ε₀ at z₀. Here ε₀ is independent of ε. Now,

Y₀ − β̂'z₀ = (β − β̂)'z₀ + ε₀ is distributed as Nₘ(0, (1 + z₀'(Z'Z)⁻¹z₀)Σ)

independently of nΣ̂, so the 100(1 − α)% prediction ellipsoid for Y₀ becomes

(Y₀ − β̂'z₀)' ((n/(n − r − 1))Σ̂)⁻¹ (Y₀ − β̂'z₀) ≤ (1 + z₀'(Z'Z)⁻¹z₀)[m(n − r − 1)/(n − r − m)] F_{m,n−r−m}(α)   (7-42)

The 100(1 − α)% simultaneous prediction intervals for the individual responses Y₀ᵢ are

z₀'β̂₍ᵢ₎ ± √([m(n − r − 1)/(n − r − m)] F_{m,n−r−m}(α)) √((1 + z₀'(Z'Z)⁻¹z₀)(n/(n − r − 1))σ̂ᵢᵢ),  i = 1, 2, ..., m   (7-43)

where β̂₍ᵢ₎, σ̂ᵢᵢ, and F_{m,n−r−m}(α) are the same quantities appearing in (7-41). Comparing (7-40) with (7-42) and (7-41) with (7-43), we see that the prediction ellipsoid and intervals for the actual values of the response variables are wider than the corresponding regions and intervals for the expected values. The extra width reflects the presence of the random error ε₀.
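The ingredients of (7-42) come directly from the fitted model. The sketch below (an assumed implementation, not from the text) returns the center, shape matrix, and bound of the prediction ellipsoid; replacing (1 + h0) by h0 gives the confidence ellipsoid (7-40) instead.

```python
import numpy as np
from scipy import stats

def prediction_ellipsoid(Z, Y, z0, alpha=0.05):
    """100(1-alpha)% prediction ellipsoid (7-42):
    (y0 - center)' S^{-1} (y0 - center) <= c2."""
    n, rp1 = Z.shape
    m = Y.shape[1]
    beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ Y)
    resid = Y - Z @ beta_hat
    S = resid.T @ resid / (n - rp1)            # (n/(n-r-1)) * Sigma_hat
    h0 = z0 @ np.linalg.solve(Z.T @ Z, z0)     # z0'(Z'Z)^{-1} z0
    df2 = n - rp1 - m + 1                      # = n - r - m
    c2 = (1 + h0) * (m * (n - rp1) / df2) * stats.f.ppf(1 - alpha, m, df2)
    return z0 @ beta_hat, S, c2
```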
307.396. It is the prediction ellipse that is relevant to the determination of computer • requirements for a particular site with the given Zo.. 42J.67J. Computer calculations provide the fitted equation o Response I confidence and prediction ellipses for the computer data with two responses.97.34725.14. one variablefor example.349.IS.1] Figure 7.l7 n = 7..25z1 + 5. but is larger than the confidence ellipse. ZI. The extra width reflects the presence of the random error eo. corresponding to the ZI and Z2 values in that example were 340 ellipse yz = [301..97. . From Example 7.. and Fm. Measurements on the response Y z.2. we see that the only change required for the calculation of the 95% prediction ellipse is to replace zb(Z'Zrlzo = . the 95% prediction ellipse for Yb = [YOb YozJ is also centered at (151. 400 Chapter 7 Multivariate Linear Regression Models The Concept of Linear Regression 40 I Response 2 where Pc. with s = 1.08.151.328. we see that the prediction intervals for the actual values the response variables are wider than the corresponding intervals for the values.3(. and zb(Z'Zrlzo = .···.L = [zofJ(1) . zbP(2) = 14.25.34725 with = bo + bt Z l + .zbfJ(2) .151. Thus. + brZr = bo + b'Z (745) 6If l:zz is not of full rank.7.8. Its orientation and the lengths of the and minor axes can be determined from the eigenvalues and eigenvectors of Comparing (740) and (742). Yoz ] for a site with the configuration Zo = [1.5.369. Zkean be written lis a linear combination of the other Z. Partitioning J.13 zbfJ(2) . Z2.utt!rI'eQluirlemerit 360 dPrediction ellipse problem discussed in Example 7.362.812.4. Suppose all the variables Y.349..1.130.5]. 5.uyzJ (.30Jl [zofJ(1) .5) = 349. .). Z2.17](4) 5. Thus. and m = 2. . Consider the problem of predicting Yusing the linear predictor with F2.5. a 95% confIdence ellIpse .1. aii.17).349.14 + 2. ..34725 = [8. not necessarily I . f rom zofJ(2) (740). with mean vector J.· 380 Example 7. input/output capacity.97J 13.5 95% Obtain the 95% confidence ellipse for 13' Zo and the 95% prediction ellipse 'for Yb = [YOl . paring (741) and (743).34725) (744) Izz can be taken to have full rank. the set J. zbP(l) = 151.6."" Zr' The regression model that we have considered treats Y as a random variable whose mean depends uponjixed values of the z. for p a' Zo = [zbfJ(1)J' r = 2.2.10 (Constructing a confidence ellipse and a prediction ellipse for responses) A second response variable was measured for the cOlmp. f3J. Both ellipses are sketched in Figure 7.17 and 7.97J 349.67(7. .8 The Concept of Linear Regression The classical linear regression model is concerned with the association between a single dependent variable Yand a collection of predictor variables ZI.97. Z may be replaced by any subset of components whose covariance matrix has the same rank as l:zz· ..4. P(2) p(1) h = 14.55. f3r.25(130) + 5. we write Since P' Zo = a' Zo = z' a 1"(2) 01"(2) = [151.nrm(a) are the same quantities appearing in (741).42. We find that 1 + zb(Z'Z)I Z0 = 1.L and covariance matrix (r+l)Xl (r+l)X(r+l) and in an obvious fashion.17 $ (rXl) 6 and I = [ :'] Uyy : UZy (IXl) : (1Xr) with UZy = [uYZ"uYZz.'s.L normal.05) = 9. Zr are random and have a joint distribution.97.s and thus is redundant in forming the linear regression function Z' p_ That is.229. .14 + 2. This mean is assumed to be a linear function of the regression coefficients f30.67zz = [14.6. The linear regression model also arises in a different setting. This ellipse is centered at (151.349.17).
For a given predictor of the form of (7-45), the error in the prediction of Y is

prediction error = Y - b0 - b1 Z1 - ... - br Zr = Y - b0 - b'Z     (7-46)

Because this error is random, it is customary to select b0 and b to minimize the

mean square error = E(Y - b0 - b'Z)^2     (7-47)

Now the mean square error depends on the joint distribution of Y and Z only through the parameters mu and Sigma. It is possible to express the "optimal" linear predictor in terms of these latter quantities.

Result 7.12. The linear predictor B0 + B'Z with coefficients

B = Sigma_ZZ^{-1} sigma_ZY,     B0 = mu_Y - B' mu_Z

has minimum mean square among all linear predictors of the response Y. Its mean square error is

E(Y - B0 - B'Z)^2 = E(Y - mu_Y - sigma_ZY' Sigma_ZZ^{-1}(Z - mu_Z))^2 = sigma_YY - sigma_ZY' Sigma_ZZ^{-1} sigma_ZY

Also, B0 + B'Z = mu_Y + sigma_ZY' Sigma_ZZ^{-1}(Z - mu_Z) is the linear predictor having maximum correlation with Y; that is,

Corr(Y, B0 + B'Z) = max over b0, b of Corr(Y, b0 + b'Z) = sqrt(B' Sigma_ZZ B / sigma_YY) = sqrt(sigma_ZY' Sigma_ZZ^{-1} sigma_ZY / sigma_YY)

Proof. Writing b0 + b'Z = b0 + b'Z + (mu_Y - b'mu_Z) - (mu_Y - b'mu_Z), we get

E(Y - b0 - b'Z)^2 = E[Y - mu_Y - (b'Z - b'mu_Z) + (mu_Y - b0 - b'mu_Z)]^2
                  = sigma_YY + b' Sigma_ZZ b + (mu_Y - b0 - b'mu_Z)^2 - 2 b' sigma_ZY

Adding and subtracting sigma_ZY' Sigma_ZZ^{-1} sigma_ZY, we obtain

E(Y - b0 - b'Z)^2 = sigma_YY - sigma_ZY' Sigma_ZZ^{-1} sigma_ZY + (mu_Y - b0 - b'mu_Z)^2
                    + (b - Sigma_ZZ^{-1} sigma_ZY)' Sigma_ZZ (b - Sigma_ZZ^{-1} sigma_ZY)

The mean square error is minimized by taking b = Sigma_ZZ^{-1} sigma_ZY = B, making the last term zero, and then choosing b0 = mu_Y - (Sigma_ZZ^{-1} sigma_ZY)' mu_Z = B0 to make the third term zero. The minimum mean square error is thus sigma_YY - sigma_ZY' Sigma_ZZ^{-1} sigma_ZY.

Next, we note that Cov(b0 + b'Z, Y) = Cov(b'Z, Y) = b' sigma_ZY, so

[Corr(b0 + b'Z, Y)]^2 = (b' sigma_ZY)^2 / (sigma_YY b' Sigma_ZZ b),   for all b0, b

Employing the extended Cauchy-Schwarz inequality of (2-49) with B = Sigma_ZZ, we obtain

(b' sigma_ZY)^2 <= (b' Sigma_ZZ b)(sigma_ZY' Sigma_ZZ^{-1} sigma_ZY)

so [Corr(b0 + b'Z, Y)]^2 <= sigma_ZY' Sigma_ZZ^{-1} sigma_ZY / sigma_YY, with equality for b = Sigma_ZZ^{-1} sigma_ZY = B. The alternative expression for the maximum correlation follows from the equation sigma_ZY' Sigma_ZZ^{-1} sigma_ZY = sigma_ZY' B = B' Sigma_ZZ B.

The correlation between Y and its best linear predictor is called the population multiple correlation coefficient

rho_Y(Z) = + sqrt(sigma_ZY' Sigma_ZZ^{-1} sigma_ZY / sigma_YY)     (7-48)

The square of the population multiple correlation coefficient, rho_Y(Z)^2, is called the population coefficient of determination. Note that, unlike other correlation coefficients, the multiple correlation coefficient is a positive square root.

The population coefficient of determination has an important interpretation. From Result 7.12, the mean square error in using B0 + B'Z to forecast Y is

sigma_YY - sigma_ZY' Sigma_ZZ^{-1} sigma_ZY = sigma_YY (1 - rho_Y(Z)^2)     (7-49)

If rho_Y(Z)^2 = 0, there is no predictive power in Z. At the other extreme, rho_Y(Z)^2 = 1 implies that Y can be predicted with no error.

Example 7.11 (Determining the best linear predictor, its mean square error, and the multiple correlation coefficient) Given the mean vector and covariance matrix of Y, Z1, Z2,

mu = | mu_Y |  = | 5 |        Sigma = | 10   1  -1 |
     | mu_Z |    | 2 |                |  1   7   3 |
                 | 0 |                | -1   3   2 |

determine (a) the best linear predictor B0 + B1 Z1 + B2 Z2, (b) its mean square error, and (c) the multiple correlation coefficient. Also, verify that the mean square error equals sigma_YY (1 - rho_Y(Z)^2).

First,

B = Sigma_ZZ^{-1} sigma_ZY = | 7  3 |^{-1} |  1 | = |  .4  -.6 | |  1 | = |  1 |
                             | 3  2 |      | -1 |   | -.6  1.4 | | -1 |   | -2 |

B0 = mu_Y - B' mu_Z = 5 - [1, -2] | 2 | = 3
                                  | 0 |

so the best linear predictor is B0 + B'Z = 3 + Z1 - 2Z2. The mean square error is

sigma_YY - sigma_ZY' Sigma_ZZ^{-1} sigma_ZY = 10 - [1, -1] |  1 | = 10 - 3 = 7
                                                           | -2 |

and the multiple correlation coefficient is

rho_Y(Z) = sqrt(sigma_ZY' Sigma_ZZ^{-1} sigma_ZY / sigma_YY) = sqrt(3/10) = .548

Note that sigma_YY (1 - rho_Y(Z)^2) = 10(1 - 3/10) = 7 is the mean square error.

It is possible to show (see Exercise 7.5) that

1 - rho_Y(Z)^2 = 1 / rho^{YY}     (7-50)

where rho^{YY} is the upper-left-hand corner of the inverse of the correlation matrix determined from Sigma.

The restriction to linear predictors is closely connected to the assumption of normality. Specifically, if we take [Y, Z1, Z2, ..., Zr]' to be distributed as N_{r+1}(mu, Sigma), then the conditional distribution of Y with z1, z2, ..., zr fixed (see Result 4.6) is

N(mu_Y + sigma_ZY' Sigma_ZZ^{-1}(z - mu_Z),  sigma_YY - sigma_ZY' Sigma_ZZ^{-1} sigma_ZY)

The mean of this conditional distribution is the linear predictor in Result 7.12. That is,

E(Y | z1, z2, ..., zr) = mu_Y + sigma_ZY' Sigma_ZZ^{-1}(z - mu_Z) = B0 + B'z     (7-51)

and we conclude that E(Y | z1, z2, ..., zr) is the best linear predictor of Y when the population is N_{r+1}(mu, Sigma). The conditional expectation of Y in (7-51) is called the regression function. For normal populations, it is linear.

When the population is not normal, the regression function E(Y | z1, ..., zr) need not be of the form B0 + B'z. Nevertheless, it can be shown (see [22]) that E(Y | z1, ..., zr), whatever its form, predicts Y with the smallest mean square error. Fortunately, this wider optimality among all estimators is possessed by the linear predictor when the population is normal.

Result 7.13. Suppose the joint distribution of Y and Z is N_{r+1}(mu, Sigma). Let

| y-bar |        and        S = | s_YY     s_ZY' |
| z-bar |                       | s_ZY     S_ZZ  |

be the sample mean vector and sample covariance matrix for a random sample of size n from this population. Then the maximum likelihood estimators of the coefficients in the linear predictor are

B-hat = S_ZZ^{-1} s_ZY,     B0-hat = y-bar - s_ZY' S_ZZ^{-1} z-bar = y-bar - B-hat' z-bar

Consequently, the maximum likelihood estimator of the linear regression function is

B0-hat + B-hat'z = y-bar + s_ZY' S_ZZ^{-1}(z - z-bar)

and the maximum likelihood estimator of the mean square error E[Y - B0 - B'Z]^2 is

sigma-hat_YY.Z = ((n-1)/n)(s_YY - s_ZY' S_ZZ^{-1} s_ZY)

Proof. We use Result 4.11 and the invariance property of maximum likelihood estimators. [See (4-20).] Since, from Result 7.12, B0 = mu_Y - (Sigma_ZZ^{-1} sigma_ZY)' mu_Z, B0 + B'z = mu_Y + sigma_ZY' Sigma_ZZ^{-1}(z - mu_Z), and mean square error = sigma_YY.Z = sigma_YY - sigma_ZY' Sigma_ZZ^{-1} sigma_ZY, the conclusions follow upon substitution of the maximum likelihood estimators of mu and Sigma.

It is customary to change the divisor from n to n - (r + 1) in the estimator of the mean square error, in order to obtain the unbiased estimator

((n-1)/(n-r-1))(s_YY - s_ZY' S_ZZ^{-1} s_ZY) = (1/(n-r-1)) SUM over j of (Y_j - B0-hat - B-hat'Z_j)^2     (7-52)
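The arithmetic of Example 7.11 can be checked with a short numerical sketch. This is not part of the book's own presentation; it is written here in Python with the numpy library, and all array names are ours.

  import numpy as np

  # Population moments from Example 7.11.
  mu_Y, mu_Z = 5.0, np.array([2.0, 0.0])
  sigma_YY = 10.0
  sigma_ZY = np.array([1.0, -1.0])
  Sigma_ZZ = np.array([[7.0, 3.0],
                       [3.0, 2.0]])

  beta = np.linalg.solve(Sigma_ZZ, sigma_ZY)    # beta = Sigma_ZZ^{-1} sigma_ZY
  beta0 = mu_Y - beta @ mu_Z                    # beta0 = mu_Y - beta' mu_Z
  mse = sigma_YY - sigma_ZY @ beta              # minimum mean square error
  rho = np.sqrt(sigma_ZY @ beta / sigma_YY)     # multiple correlation (7-48)

  print(beta0, beta)   # 3.0 [ 1. -2.]  ->  predictor 3 + Z1 - 2Z2
  print(mse, rho)      # 7.0 0.5477...

The same few lines, with sample moments y-bar, z-bar, s_ZY, and S_ZZ in place of the population quantities, produce the maximum likelihood estimates of Result 7.13.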
Example 7.12 (Maximum likelihood estimate of the regression function, single response) For the computer data of Example 7.6, the n = 7 observations on Y (CPU time), Z1 (orders), and Z2 (add-delete items) give the sample mean vector and sample covariance matrix:

mu-hat = | y-bar | = | 150.44 |        S = | 467.913   418.763   35.983 |
         | z-bar |   | 130.24 |            | 418.763   377.200   28.034 |
                     |  3.547 |            |  35.983    28.034   13.657 |

Assuming that Y, Z1, and Z2 are jointly normal, obtain the estimated regression function and the estimated mean square error. Result 7.13 gives the maximum likelihood estimates

B-hat = S_ZZ^{-1} s_ZY = |  .003128  -.006422 | | 418.763 | = | 1.079 |
                         | -.006422   .086404 | |  35.983 |   |  .420 |

B0-hat = y-bar - B-hat' z-bar = 150.44 - [1.079, .420] | 130.24 | = 150.44 - 142.019 = 8.421
                                                       |  3.547 |

and the estimated regression function

B0-hat + B-hat'z = 8.42 + 1.08 z1 + .42 z2

The maximum likelihood estimate of the mean square error arising from the prediction of Y with this regression function is

((n-1)/n)(s_YY - s_ZY' S_ZZ^{-1} s_ZY)
    = (6/7)(467.913 - [418.763, 35.983] |  .003128  -.006422 | | 418.763 |) = .894
                                        | -.006422   .086404 | |  35.983 |

Prediction of Several Variables

The extension of the previous results to the prediction of several responses Y1, Y2, ..., Ym is almost immediate. We present this extension for normal populations. Suppose

| Y |  (m x 1)
| Z |  (r x 1)

is distributed as N_{m+r}(mu, Sigma) with

mu = | mu_Y |        Sigma = | Sigma_YY   Sigma_YZ |
     | mu_Z |                | Sigma_ZY   Sigma_ZZ |

By Result 4.6, the conditional expectation of [Y1, Y2, ..., Ym]', given the fixed values z1, z2, ..., zr of the predictor variables, is

E(Y | z1, z2, ..., zr) = mu_Y + Sigma_YZ Sigma_ZZ^{-1}(z - mu_Z)     (7-53)

This conditional expected value, considered as a function of z1, z2, ..., zr, is called the multivariate regression of the vector Y on Z. It is composed of m univariate regressions. For instance, the first component of the conditional mean vector is mu_Y1 + Sigma_Y1Z Sigma_ZZ^{-1}(z - mu_Z) = E(Y1 | z1, z2, ..., zr), which minimizes the mean square error for the prediction of Y1. The m x r matrix B = Sigma_YZ Sigma_ZZ^{-1} is called the matrix of regression coefficients.

The error of prediction vector Y - mu_Y - Sigma_YZ Sigma_ZZ^{-1}(Z - mu_Z) has the expected squares and cross-products matrix

Sigma_YY.Z = E[Y - mu_Y - Sigma_YZ Sigma_ZZ^{-1}(Z - mu_Z)][Y - mu_Y - Sigma_YZ Sigma_ZZ^{-1}(Z - mu_Z)]'
           = Sigma_YY - Sigma_YZ Sigma_ZZ^{-1} Sigma_ZY     (7-54)

Because mu and Sigma are typically unknown, they must be estimated from a random sample in order to construct the multivariate linear predictor and determine expected prediction errors.

Result 7.14. Suppose Y and Z are jointly distributed as N_{m+r}(mu, Sigma). Then the regression of the vector Y on Z is

B0 + Bz = mu_Y - Sigma_YZ Sigma_ZZ^{-1} mu_Z + Sigma_YZ Sigma_ZZ^{-1} z = mu_Y + Sigma_YZ Sigma_ZZ^{-1}(z - mu_Z)

The expected squares and cross-products matrix for the errors is

E(Y - B0 - BZ)(Y - B0 - BZ)' = Sigma_YY.Z = Sigma_YY - Sigma_YZ Sigma_ZZ^{-1} Sigma_ZY

Based on a random sample of size n, the maximum likelihood estimator of the regression function is

B0-hat + B-hat z = y-bar + S_YZ S_ZZ^{-1}(z - z-bar)

and the maximum likelihood estimator of Sigma_YY.Z is

Sigma-hat_YY.Z = ((n-1)/n)(S_YY - S_YZ S_ZZ^{-1} S_ZY)

Proof. The regression function and the covariance matrix for the prediction errors follow from Result 4.6. Using the relationships

B0 = mu_Y - Sigma_YZ Sigma_ZZ^{-1} mu_Z,    B = Sigma_YZ Sigma_ZZ^{-1}
Sigma_YY.Z = Sigma_YY - Sigma_YZ Sigma_ZZ^{-1} Sigma_ZY = Sigma_YY - B Sigma_ZZ B'

we deduce the maximum likelihood statements from the invariance property [see (4-20)] of maximum likelihood estimators upon substitution of the maximum likelihood estimators of mu and Sigma.

It can be shown that an unbiased estimator of Sigma_YY.Z is

((n-1)/(n-r-1))(S_YY - S_YZ S_ZZ^{-1} S_ZY)
    = (1/(n-r-1)) SUM over j of (Y_j - B0-hat - B-hat z_j)(Y_j - B0-hat - B-hat z_j)'     (7-55)

Example 7.13 (Maximum likelihood estimates of the regression functions, two responses) We return to the computer data given in Examples 7.6 and 7.10. For Y1 = CPU time, Y2 = disk I/O capacity, Z1 = orders, and Z2 = add-delete items, we have

mu-hat = | y-bar | = | 150.44 |
         | z-bar |   | 327.79 |
                     | 130.24 |
                     |  3.547 |

and

S = | S_YY   S_YZ | = |  467.913   1148.536    418.763    35.983 |
    | S_ZY   S_ZZ |   | 1148.536   3072.491   1008.976   140.558 |
                      |  418.763   1008.976    377.200    28.034 |
                      |   35.983    140.558     28.034    13.657 |

Assuming normality, we find that the estimated regression function is

B0-hat + B-hat z = y-bar + S_YZ S_ZZ^{-1}(z - z-bar)

  = | 150.44 | + |  418.763    35.983 | |  .003128  -.006422 | | z1 - 130.24 |
    | 327.79 |   | 1008.976   140.558 | | -.006422   .086404 | | z2 -  3.547 |

  = | 150.44 | + | 1.079    .420 | | z1 - 130.24 |
    | 327.79 |   | 2.254   5.665 | | z2 -  3.547 |

Thus, the minimum mean square error predictor of Y1 is

150.44 + 1.079(z1 - 130.24) + .420(z2 - 3.547) = 8.42 + 1.08 z1 + .42 z2

Similarly, the best predictor of Y2 is 14.14 + 2.25 z1 + 5.67 z2.

The maximum likelihood estimate of the expected squared errors and cross-products matrix Sigma_YY.Z is given by

((n-1)/n)(S_YY - S_YZ S_ZZ^{-1} S_ZY) = (6/7) | 1.043   1.042 | = | .894    .893 |
                                              | 1.042   2.572 |   | .893   2.205 |

The first estimated regression function, 8.42 + 1.08 z1 + .42 z2, and the associated mean square error, .894, are the same as those in Example 7.12 for the single-response case. Similarly, the second estimated regression function, 14.14 + 2.25 z1 + 5.67 z2, is the same as that given in Example 7.10.

We see that the data enable us to predict the first response, Y1, with smaller error than the second response, Y2. The positive covariance .893 indicates that overprediction (underprediction) of CPU time tends to be accompanied by overprediction (underprediction) of disk capacity.

Comment. Result 7.14 states that the assumption of a joint normal distribution for the whole collection Y1, Y2, ..., Ym, Z1, Z2, ..., Zr leads to the prediction equations

y1-hat = B-hat_01 + B-hat_11 z1 + ... + B-hat_r1 zr
...
ym-hat = B-hat_0m + B-hat_1m z1 + ... + B-hat_rm zr

We note the following:

1. The same values, z1, z2, ..., zr, are used to predict each Yi.
2. The B-hat_ik are estimates of the (i, k)th entry of the regression coefficient matrix B = Sigma_YZ Sigma_ZZ^{-1} for i, k >= 1.
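The matrix arithmetic of Example 7.13 can also be reproduced numerically. The following is a minimal Python/numpy sketch, not part of the book's presentation; the sample moments are taken directly from the text and the array names are ours.

  import numpy as np

  ybar = np.array([150.44, 327.79])           # means of Y1 (CPU), Y2 (disk I/O)
  zbar = np.array([130.24, 3.547])            # means of Z1 (orders), Z2 (add-delete)
  S_YY = np.array([[467.913, 1148.536],
                   [1148.536, 3072.491]])
  S_YZ = np.array([[418.763, 35.983],
                   [1008.976, 140.558]])
  S_ZZ = np.array([[377.200, 28.034],
                   [28.034, 13.657]])
  n = 7

  B_hat = S_YZ @ np.linalg.inv(S_ZZ)          # matrix of regression coefficients
  B0_hat = ybar - B_hat @ zbar                # intercepts
  S_YY_Z = S_YY - B_hat @ S_YZ.T              # S_YY - S_YZ S_ZZ^{-1} S_ZY
  Sigma_hat = (n - 1) / n * S_YY_Z            # ML estimate of Sigma_{YY.Z}

  print(np.round(B_hat, 3))       # [[1.079, .420], [2.254, 5.665]]
  print(np.round(B0_hat, 2))      # [8.42, 14.14]
  print(np.round(Sigma_hat, 3))   # [[.894, .893], [.893, 2.205]]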
We conclude this discussion of the regression problem by introducing one further correlation coefficient.

Partial Correlation Coefficient

Consider the pair of errors

Y1 - mu_Y1 - Sigma_Y1Z Sigma_ZZ^{-1}(Z - mu_Z)
Y2 - mu_Y2 - Sigma_Y2Z Sigma_ZZ^{-1}(Z - mu_Z)

obtained from using the best linear predictors to predict Y1 and Y2. Their correlation, determined from the error covariance matrix Sigma_YY.Z = Sigma_YY - Sigma_YZ Sigma_ZZ^{-1} Sigma_ZY, measures the association between Y1 and Y2 after eliminating the effects of Z1, Z2, ..., Zr.

We define the partial correlation coefficient between Y1 and Y2, eliminating Z1, Z2, ..., Zr, by

rho_{Y1Y2.Z} = sigma_{Y1Y2.Z} / (sqrt(sigma_{Y1Y1.Z}) sqrt(sigma_{Y2Y2.Z}))     (7-56)

where sigma_{YiYk.Z} is the (i, k)th entry of the matrix Sigma_YY.Z = Sigma_YY - Sigma_YZ Sigma_ZZ^{-1} Sigma_ZY. The corresponding sample partial correlation coefficient is

r_{Y1Y2.Z} = s_{Y1Y2.Z} / (sqrt(s_{Y1Y1.Z}) sqrt(s_{Y2Y2.Z}))     (7-57)

with s_{YiYk.Z} the (i, k)th element of S_YY - S_YZ S_ZZ^{-1} S_ZY. Assuming that Y and Z have a joint multivariate normal distribution, we find that the sample partial correlation coefficient in (7-57) is the maximum likelihood estimator of the partial correlation coefficient in (7-56).

Example 7.14 (Calculating a partial correlation) From the computer data of Example 7.13,

S_YY - S_YZ S_ZZ^{-1} S_ZY = | 1.043   1.042 |
                             | 1.042   2.572 |

Therefore,

r_{Y1Y2.Z} = s_{Y1Y2.Z} / (sqrt(s_{Y1Y1.Z}) sqrt(s_{Y2Y2.Z})) = 1.042 / (sqrt(1.043) sqrt(2.572)) = .64

Calculating the ordinary correlation coefficient, we obtain r_{Y1Y2} = .96. Comparing the two correlation coefficients, we see that the association between Y1 and Y2 has been sharply reduced after eliminating the effects of the variables Z on both responses.
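The computation in Example 7.14 amounts to one division once the conditional covariance matrix is in hand. A minimal Python/numpy sketch (ours, not the book's) using the matrix quoted in the example:

  import numpy as np

  # S_YY - S_YZ S_ZZ^{-1} S_ZY from Example 7.14.
  S_YY_Z = np.array([[1.043, 1.042],
                     [1.042, 2.572]])

  r_partial = S_YY_Z[0, 1] / np.sqrt(S_YY_Z[0, 0] * S_YY_Z[1, 1])
  print(round(r_partial, 2))    # 0.64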
7.9 Comparing the Two Formulations of the Regression Model

In Sections 7.2 and 7.7, we presented the multiple regression models for one and several response variables, respectively. In these treatments, the predictor variables had fixed values z_j at the jth trial. Alternatively, we can start, as in Section 7.8, with a set of variables that have a joint normal distribution. The process of conditioning on one subset of variables in order to predict values of the other set leads to a conditional expectation that is a multiple regression model. The two approaches to multiple regression are related. To show this relationship explicitly, we introduce two minor variants of the regression model formulation.

Mean Corrected Form of the Regression Model

For any response variable Y, the multiple regression model asserts that

Y_j = B0 + B1 z_j1 + ... + Br z_jr + e_j

The predictor variables can be "centered" by subtracting their means. For instance, B1 z_j1 = B1(z_j1 - z1-bar) + B1 z1-bar, and we can write

Y_j = (B0 + B1 z1-bar + ... + Br zr-bar) + B1(z_j1 - z1-bar) + ... + Br(z_jr - zr-bar) + e_j
    = B* + B1(z_j1 - z1-bar) + ... + Br(z_jr - zr-bar) + e_j     (7-59)

with B* = B0 + B1 z1-bar + ... + Br zr-bar. The mean corrected design matrix corresponding to the reparameterization in (7-59) is

Zc = | 1   z_11 - z1-bar   ...   z_1r - zr-bar |
     | 1   z_21 - z1-bar   ...   z_2r - zr-bar |
     |               ...                       |
     | 1   z_n1 - z1-bar   ...   z_nr - zr-bar |

where the last r columns are each perpendicular to the first column, since

SUM over j of 1 (z_ji - zi-bar) = 0,   i = 1, 2, ..., r

Further, setting Zc = [1 | Zc2] with Zc2' 1 = 0, we obtain

Zc' Zc = | 1'1      1'Zc2    | = | n    0'        |
         | Zc2'1    Zc2'Zc2  |   | 0    Zc2'Zc2   |

so

| B*-hat  |  = (Zc'Zc)^{-1} Zc'y = | n^{-1}   0'             | | 1'y   | = | y-bar                  |     (7-60)
| Bc-hat  |                        | 0        (Zc2'Zc2)^{-1} | | Zc2'y |   | (Zc2'Zc2)^{-1} Zc2'y   |

That is, the regression coefficients [B1, B2, ..., Br]' are unbiasedly estimated by (Zc2'Zc2)^{-1} Zc2'y, and B* is estimated by y-bar. Because the definitions of B1, B2, ..., Br remain unchanged by the reparameterization in (7-59), their best estimates computed from the design matrix Zc are exactly the same as the best estimates computed from the design matrix Z. Thus, setting Bc-hat' = [B1-hat, B2-hat, ..., Br-hat], the linear predictor of Y can be written as

y-hat = B*-hat + Bc-hat'(z - z-bar) = y-bar + Bc-hat'(z - z-bar)     (7-61)

with (z - z-bar)' = [z1 - z1-bar, z2 - z2-bar, ..., zr - zr-bar]. Finally,

| Var(B*-hat)           Cov(B*-hat, Bc-hat)' | = sigma^2 (Zc'Zc)^{-1} = | sigma^2 / n    0'                        |     (7-62)
| Cov(Bc-hat, B*-hat)   Var(Bc-hat)          |                          | 0              sigma^2 (Zc2'Zc2)^{-1}   |
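A small numerical check of (7-59) through (7-62) is easy to set up. The sketch below (Python/numpy, ours; the data are hypothetical, generated only for the check) verifies that centering makes Zc'Zc block diagonal, that the intercept estimate is exactly y-bar, and that the slope estimates are unchanged.

  import numpy as np

  rng = np.random.default_rng(0)
  n, r = 20, 2
  Z2 = rng.normal(size=(n, r))                         # hypothetical predictors
  y = 1.0 + Z2 @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=n)

  Z = np.column_stack([np.ones(n), Z2])                # usual design matrix
  Zc = np.column_stack([np.ones(n), Z2 - Z2.mean(axis=0)])   # mean corrected

  b = np.linalg.lstsq(Z, y, rcond=None)[0]             # [b0, b1, ..., br]
  bc = np.linalg.lstsq(Zc, y, rcond=None)[0]           # [b*, b1, ..., br]

  print(np.round((Zc.T @ Zc)[0], 10))                  # [n, 0, ..., 0]: first row of (7-60)
  print(np.allclose(bc[0], y.mean()))                  # True: B* is estimated by y-bar
  print(np.allclose(b[1:], bc[1:]))                    # True: slopes are unchanged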
Comment. The multivariate multiple regression model yields the same mean corrected design matrix for each response. The least squares estimates of the coefficient vectors for the ith response are given by

| B*-hat_(i)  | = | y(i)-bar                   |,   i = 1, 2, ..., m
| Bc-hat_(i)  |   | (Zc2'Zc2)^{-1} Zc2' y(i)   |

Sometimes, for even further numerical stability, "standardized" input variables

(z_ji - zi-bar) / sqrt( SUM over j of (z_ji - zi-bar)^2 ) = (z_ji - zi-bar) / sqrt((n-1) s_{zi zi})

are used. In this case, the slope coefficients Bi in the regression model are replaced by Bi-tilde = Bi sqrt((n-1) s_{zi zi}). The least squares estimates of the beta coefficients Bi-tilde are Bi-tilde-hat = Bi-hat sqrt((n-1) s_{zi zi}), i = 1, 2, ..., r. These relationships hold for each response in the multivariate multiple regression situation as well.

Relating the Formulations

When the variables Y, Z1, Z2, ..., Zr are jointly normal, the estimated predictor of Y (see Result 7.13) is

y-hat = B0-hat + B-hat'z = y-bar + s_ZY' S_ZZ^{-1}(z - z-bar)     (7-64)

where the estimation procedure leads naturally to the introduction of centered zi's. Recall from the mean corrected form of the regression model that the best linear predictor of Y [see (7-61)] is

y-hat = B*-hat + Bc-hat'(z - z-bar)

with B*-hat = y-bar and Bc-hat' = y'Zc2 (Zc2'Zc2)^{-1}. Comparing (7-61) and (7-64), we see that B*-hat = y-bar = B0-hat and Bc-hat = B-hat, since(7)

s_ZY' S_ZZ^{-1} = y'Zc2 (Zc2'Zc2)^{-1}     (7-65)

Therefore, both the normal theory conditional mean and the classical regression model approaches yield exactly the same linear predictors. A similar argument indicates that the best linear predictors of the responses in the two multivariate multiple regression setups are also exactly the same.

Example 7.15 (Two approaches yield the same linear predictor) The computer data with the single response Y1 = CPU time were analyzed in Example 7.6 using the classical linear regression model. The same data were analyzed again in Example 7.12, assuming that the variables Y1, Z1, and Z2 were jointly normal so that the best predictor of Y1 is the conditional mean of Y1 given z1 and z2. Both approaches yielded the same predictor,

y-hat = 8.42 + 1.08 z1 + .42 z2

Although the two formulations of the linear prediction problem yield the same predictor equations, conceptually they are quite different. For the model in (7-3) or (7-23), the values of the input variables are assumed to be set by the experimenter. In the conditional mean model of (7-51) or (7-53), the values of the predictor variables are random variables that are observed along with the values of the response variable(s). The assumptions underlying the second approach are more stringent, but they yield an optimal predictor among all choices, rather than merely among linear predictors.

We close by noting that the multivariate regression calculations in either case can be couched in terms of the sample mean vectors y-bar and z-bar and the sample sums of squares and cross-products

SUM over j of (y_j - y-bar)(y_j - y-bar)',   SUM over j of (y_j - y-bar)(z_j - z-bar)',   SUM over j of (z_j - z-bar)(z_j - z-bar)'

This is the only information necessary to compute the estimated regression coefficients and their estimated covariances. Of course, an important part of regression analysis is model checking. This requires the residuals (errors), which must be calculated using all the original data.

(7) The identity in (7-65) is established by writing y = (y - y-bar 1) + y-bar 1, so that

y'Zc2 = (y - y-bar 1)'Zc2 + y-bar 1'Zc2 = (y - y-bar 1)'Zc2 + 0' = (y - y-bar 1)'Zc2

Consequently,

y'Zc2 (Zc2'Zc2)^{-1} = (y - y-bar 1)'Zc2 (Zc2'Zc2)^{-1} = (n-1) s_ZY' [(n-1) S_ZZ]^{-1} = s_ZY' S_ZZ^{-1}
_ A , ' .
7.10 Multiple Regression Models with Time Dependent Errors
For data collected over time, observations in different time periods are often related, or autocorrelated. Consequently, in a regression context, the observations on the dependent variable or, equivalently, the errors, cannot be independent. As indicated in our discussion of dependence in Section 5.8, time dependence in the observations can invalidate inferences made using the usual independence assumption. Similarly, inferences in regression can be misleading when regression models are fit to time ordered data and the standard regression assumptions are used. This issue is important so, in the example that follows, we not only show how to detect the presence of time dependence, but also how to incorporate this dependence into the multiple regression model.
Example 7.16 (Incorporating time dependent errors in a regression model) power companies must have enough natural gas to heat all of their customers' homes and businesses, particularly during the cold est days of the year. A major component of the planning process is a forecasting exercise based on a model relating the sendouts of natural gas to factors, like temperature, that clearly have some relationship to the amount of gas consumed. More gas is required on cold days. Rather than use the daily average temperature, it is customary to nse degree heating days
Therefore, both the normal theory conditional and the classical regression model approaches yield exactly the same linear predIctors. . A similar argument indicates that the best linear predictors of the responses m the two multivariate multiple regression setups are also exactly the same.
predictor) The . I e V CPU tinIe were analyzed m ExanIple 7.6 USIng the classlcallinthe smg e respons 'I . . 12' ear regression model. The same data were analyzed agam In Example 7.. ' assuIIUD? . bl es Y1> Z I, and Z2 were J' oindy normal so that the best predIctor edict of Y1 IS tha t the vana . the conditional mean of Yi given ZI and Z2' Both approaches YIelded the same pr or,
Example 7./5 (Two approaches yield the same
y=
y'Zc2
8.42
+ l.08z1 + .42Z2
•
7The identify in (7·65) is established by writing y = (y  jil)
= (y 
jil)'Zc2
+ jil so that + jil'Zc2 = (y  jil)'Zc2 + 0' = (y  jil)'Zc2
Consequently,
= (y 
jil)'ZdZ;2Zd'
= (n 
l)s'zy[(n  l) S
zzr' = SZySZ'Z
416 Chapter 7 Multivariate Linear Regression Models
Multiple Regression Models with Time Dependent Errors 417
When modeling relationships using time ordered data, regression models with noise structures that allow for the time dependence are often useful. Modern software packages, like SAS, allow the analyst to easily fit these expanded models.
Autocorrelation Plot of Residuals Lag 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Covariance 228.894 18.194945 2.763255 5.038727 44.059835 29.118892 36.904291 33.008858 15.424015 25.379057 12.890888 12.777280 24.825623 2.970197 24.150168 31.407314 Correlation 1.00000 0.07949 0.01207 0.02201 0.19249 0.12722 0.16123 0.14421 0.06738 0.11088 0.05632 0.05582 0.10846 0.01298 0.10551 0.13721 1 9 8 7 6 543 2
I I
o1
1** I I
234 5 6 7 891
1*******************1
I
I I I I I
PANEL 7.3
SAS ANALYSIS FOR EXAMPLE 7.16 USING PROC ARIMA
1**** .
data a; infile 'T7 4.d at'; time =_n...; input obsend dhd dhdlag wind xweekend; proc arima data = a; identify var = obsend crosscor dhd dhdlag wind xweekend ); estimate p = (1 7) method = ml input = ( dhd dhdlag wind xweekend ) plot; estimate p = (1 7) noconstant method = ml input = ( dhd dhdlag wind xweekend ) plot;
I
I I I I I I I
=(
PROGRAM COMMANDS
*** I 1*** 1*** *1 **1 *1 *1 **1 I 1** . *** I
I I I I I I I I I I I I I I I
" ." marks two standard errors
ARIMA Procedure Maximum Likelihood Estimation Approx. Std Error 13.12340 0.11779 0.11528 0.24047 0.24932 0.44681 6.03445 Lag 0 7 0 0 0 0 OUTPUT
Parameter MU AR1,l AR1,2 NUMl NUM2 NUM3 NUM4 Constant Estimate
EstimatEl! 2.12957 . 0.4700/,1 0.23986 5.80976 1.42632 1.20740 10.10890 0.61770069 228.89402.8\ 15.1292441 528.490321 543.492264 63
T Ratio 0.16 3.99 2.08 24.16 5.72 2.70 1.68
Variable OBSENO OBSENO OBSEND DHO OHDLAG WIND XWEEKEND
Shift 0 0 0 0 0 0 0
I
Variance Estimate
Std Error Estimate AIC SBC Number of Residuals
=
Autocorrelation Check of Residuals To Lag 6 12 18 24 Chi Square 6.04 10.27 15.92 23.44 Autocorrelations OF 4 10 16 22
0:1:961 0;4#"
Probe 0.079 0.144 0.013 0.018 0.012 0.067 0.106 0.004 0.022 0.111 0.137 0.250 0.192 0.056 0.170 0.080 0.127 0.056 0.079 0.069
0.161 0.108 0.018 0.051
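The residual autocorrelations and the portmanteau check in Panel 7.3 can be approximated by hand. The following Python/numpy sketch is ours; the function names are illustrative, and SAS's exact computations may differ in detail from this Ljung-Box style statistic.

  import numpy as np

  def residual_acf(e, max_lag=15):
      """Lag-k autocovariances and autocorrelations of a residual series
      (mean removed, divisor n, the usual time series convention)."""
      e = np.asarray(e, dtype=float)
      e = e - e.mean()
      n = len(e)
      cov = np.array([e[k:] @ e[:n - k] / n for k in range(max_lag + 1)])
      return cov, cov / cov[0]

  def portmanteau(e, K):
      """Ljung-Box type chi-square statistic on the first K autocorrelations."""
      n = len(e)
      _, r = residual_acf(e, K)
      return n * (n + 2) * np.sum(r[1:] ** 2 / (n - np.arange(1, K + 1)))

With n = 63 residuals and the lag 1 through 6 correlations shown in the panel, this statistic reproduces the tabled chi-square value of about 6.04 on 4 degrees of freedom (two degrees of freedom are absorbed by the two estimated AR parameters).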
Supplement 7A

THE DISTRIBUTION OF THE LIKELIHOOD RATIO FOR THE MULTIVARIATE MULTIPLE REGRESSION MODEL

The development in this supplement establishes Result 7.11.

We know that n Sigma-hat = Y'(I - Z(Z'Z)^{-1}Z')Y and, under H0, n Sigma-hat_1 = Y'[I - Z1(Z1'Z1)^{-1}Z1']Y with Y = Z1 B(1) + E. Set P = I - Z(Z'Z)^{-1}Z'. Since

0 = [I - Z(Z'Z)^{-1}Z']Z = [I - Z(Z'Z)^{-1}Z'][Z1 | Z2] = [PZ1 | PZ2]

the columns of Z are perpendicular to P. Thus, we can write

n Sigma-hat = (ZB + E)'P(ZB + E) = E'PE
n Sigma-hat_1 = (Z1 B(1) + E)'P1(Z1 B(1) + E) = E'P1E

where P1 = I - Z1(Z1'Z1)^{-1}Z1'. We then use the Gram-Schmidt process (see Result 2A.3) to construct the orthonormal vectors [g1, g2, ..., g_{q+1}] = G from the columns of Z1. Then we continue, obtaining the orthonormal set from [G, Z2], and finally complete the set to n dimensions by constructing an arbitrary orthonormal set of n - r - 1 vectors orthogonal to the previous vectors. Consequently, we have

g1, g2, ..., g_{q+1}          from columns of Z1
g_{q+2}, g_{q+3}, ..., g_{r+1}   from columns of Z2 but perpendicular to columns of Z1
g_{r+2}, g_{r+3}, ..., g_n       an arbitrary set of orthonormal vectors orthogonal to columns of Z

Let (lambda, e) be an eigenvalue-eigenvector pair of Z1(Z1'Z1)^{-1}Z1'. Then, since [Z1(Z1'Z1)^{-1}Z1'][Z1(Z1'Z1)^{-1}Z1'] = Z1(Z1'Z1)^{-1}Z1', it follows that

lambda e = Z1(Z1'Z1)^{-1}Z1' e = (Z1(Z1'Z1)^{-1}Z1')^2 e = lambda (Z1(Z1'Z1)^{-1}Z1') e = lambda^2 e

and the eigenvalues of Z1(Z1'Z1)^{-1}Z1' are 0 or 1. Moreover,

tr(Z1(Z1'Z1)^{-1}Z1') = tr((Z1'Z1)^{-1}Z1'Z1) = tr( I ) = q + 1 = lambda_1 + lambda_2 + ... + lambda_{q+1}
                                 ((q+1) x (q+1))

where lambda_1 >= lambda_2 >= ... >= lambda_{q+1} > 0 are the nonzero eigenvalues of Z1(Z1'Z1)^{-1}Z1'. This shows that Z1(Z1'Z1)^{-1}Z1' has q + 1 eigenvalues equal to 1. Now, (Z1(Z1'Z1)^{-1}Z1')Z1 = Z1, so any linear combination Z1 b_l of unit length is an eigenvector corresponding to the eigenvalue 1. The orthonormal vectors g_l, l = 1, 2, ..., q + 1, are therefore eigenvectors of Z1(Z1'Z1)^{-1}Z1', since they are formed by taking particular linear combinations of the columns of Z1. By the spectral decomposition (2-16), we have

Z1(Z1'Z1)^{-1}Z1' = SUM from l = 1 to q+1 of g_l g_l'

Similarly, by writing (Z(Z'Z)^{-1}Z')Z = Z, we readily see that the linear combination Z b_l = g_l, for example, is an eigenvector of Z(Z'Z)^{-1}Z' with eigenvalue lambda = 1, so that

Z(Z'Z)^{-1}Z' = SUM from l = 1 to r+1 of g_l g_l'

Continuing, we have PZ = [I - Z(Z'Z)^{-1}Z']Z = Z - Z = 0, so g_l = Z b_l, l <= r + 1, are eigenvectors of P with eigenvalues lambda = 0. Also, from the way the g_l, l > r + 1, were constructed, Z'g_l = 0, so that Pg_l = g_l. Consequently, these g_l's are eigenvectors of P corresponding to the n - r - 1 unit eigenvalues. By the spectral decomposition (2-16),

P = SUM from l = r+2 to n of g_l g_l'

and

n Sigma-hat = E'PE = SUM from l = r+2 to n of (E'g_l)(E'g_l)' = SUM from l = r+2 to n of V_l V_l'

where, because Cov(V_{li}, V_{jk}) = E(g_l' E(i) E(k)' g_j) = sigma_{ik} g_l' g_j = 0 for l not equal to j, the E'g_l = V_l' = [V_{l1}, ..., V_{li}, ..., V_{lm}] are independently distributed as N_m(0, Sigma). Consequently, by (4-22), n Sigma-hat is distributed as W_{p, n-r-1}(Sigma). In the same manner,

P1 g_l = g_l   for l > q + 1,      P1 g_l = 0   for l <= q + 1

so P1 = SUM from l = q+2 to n of g_l g_l'. We can write the extra sum of squares and cross products as

n(Sigma-hat_1 - Sigma-hat) = E'(P1 - P)E = SUM from l = q+2 to r+1 of (E'g_l)(E'g_l)' = SUM from l = q+2 to r+1 of V_l V_l'

where the V_l are independently distributed as N_m(0, Sigma). By (4-22), n(Sigma-hat_1 - Sigma-hat) is distributed as W_{p, r-q}(Sigma) independently of n Sigma-hat, since n(Sigma-hat_1 - Sigma-hat) involves a different set of independent V_l's.

The large sample distribution for

-[n - r - 1 - (1/2)(m - r + q + 1)] ln(|Sigma-hat| / |Sigma-hat_1|)

follows from Result 5.2, with

nu - nu_0 = m(m+1)/2 + m(r+1) - m(m+1)/2 - m(q+1) = m(r - q) d.f.

The use of (n - r - 1 - (1/2)(m - r + q + 1)) instead of n in the statistic is due to Bartlett [4], following Box [7], and it improves the chi-square approximation.
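The projection matrix facts used in the supplement are easy to confirm numerically. The following Python/numpy sketch is ours, with hypothetical design matrices generated only for the illustration.

  import numpy as np

  rng = np.random.default_rng(2)
  n, q, r = 30, 1, 4                                   # Z has r+1 columns, Z1 has q+1
  Z = np.column_stack([np.ones(n), rng.normal(size=(n, r))])
  Z1 = Z[:, :q + 1]

  P = np.eye(n) - Z @ np.linalg.solve(Z.T @ Z, Z.T)    # residual projector
  H1 = Z1 @ np.linalg.solve(Z1.T @ Z1, Z1.T)           # Z1(Z1'Z1)^{-1}Z1'

  print(np.allclose(P @ P, P))                         # True: P is idempotent
  print(round(np.trace(P)))                            # n - r - 1 = 25
  print(np.round(np.linalg.eigvalsh(H1)))              # only 0's and exactly q+1 1's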
Exercises

7.1. Given the data

  z1 | 10   5   7  19  11   8
  y  | 15   9   3  25   7  13

fit the linear regression model Y_j = B0 + B1 z_j1 + e_j, j = 1, 2, ..., 6. Specifically, calculate the least squares estimates B-hat, the fitted values y-hat, the residuals e-hat, and the residual sum of squares, e-hat' e-hat.

7.2. Given the data

  z1 | 10   5   7  19  11  18
  z2 |  2   3   3   6   7   9
  y  | 15   9   3  25   7  13

fit the regression model

Y_j = B1 z_j1 + B2 z_j2 + e_j,   j = 1, 2, ..., 6

to the standardized form (see page 412) of the variables y, z1, and z2. From this fit, deduce the corresponding fitted regression equation for the original (not standardized) variables.
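The least squares computations asked for in Exercise 7.1 can be carried out with a few matrix operations; the Python/numpy sketch below (ours, shown only to illustrate the mechanics) uses the data of that exercise.

  import numpy as np

  z1 = np.array([10., 5., 7., 19., 11., 8.])
  y = np.array([15., 9., 3., 25., 7., 13.])

  Z = np.column_stack([np.ones_like(z1), z1])          # design matrix [1, z1]
  beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)         # least squares estimates
  y_hat = Z @ beta_hat                                 # fitted values
  resid = y - y_hat                                    # residuals
  rss = resid @ resid                                  # residual sum of squares
  print(beta_hat, rss)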
7.3. (Weighted least squares estimators.) Let

y = Z B + e
(n x 1) = (n x (r+1)) ((r+1) x 1) + (n x 1)

where E(e) = 0 but E(ee') = sigma^2 V, with V (n x n) known and positive definite. For V of full rank, show that the weighted least squares estimator is

B-hat_W = (Z'V^{-1}Z)^{-1} Z'V^{-1} y

If sigma^2 is unknown, it may be estimated, unbiasedly, by

(n - r - 1)^{-1} (y - Z B-hat_W)' V^{-1} (y - Z B-hat_W)

Hint: V^{-1/2}y = (V^{-1/2}Z)B + V^{-1/2}e is of the classical linear regression form y* = Z*B + e*, with E(e*) = 0 and E(e*e*') = sigma^2 I. Thus, B-hat_W = B-hat* = (Z*'Z*)^{-1}Z*'y*.

7.4. Use the weighted least squares estimator in Exercise 7.3 to derive an expression for the estimate of the slope B in the model Y_j = B z_j + e_j, j = 1, 2, ..., n, when (a) Var(e_j) = sigma^2, (b) Var(e_j) = sigma^2 z_j, and (c) Var(e_j) = sigma^2 z_j^2. Comment on the manner in which the unequal variances for the errors influence the optimal choice of B-hat_W.

7.5. Establish (7-50): rho_Y(Z)^2 = 1 - 1/rho^{YY}.
Hint: From (7-49) and Exercise 4.11,

1 - rho_Y(Z)^2 = (sigma_YY - sigma_ZY' Sigma_ZZ^{-1} sigma_ZY) / sigma_YY = |Sigma| / (|Sigma_ZZ| sigma_YY)

From Result 2A.8(c), sigma^{YY} = |Sigma_ZZ| / |Sigma|, where sigma^{YY} is the entry of Sigma^{-1} in the first row and first column. Since (see Exercise 2.23) rho = V^{-1/2} Sigma V^{-1/2} and rho^{-1} = (V^{-1/2} Sigma V^{-1/2})^{-1} = V^{1/2} Sigma^{-1} V^{1/2}, the entry in the (1, 1) position of rho^{-1} is rho^{YY} = sigma^{YY} sigma_YY.

7.6. (Generalized inverse of Z'Z) A matrix (Z'Z)^- is called a generalized inverse of Z'Z if Z'Z (Z'Z)^- Z'Z = Z'Z. Let r1 + 1 = rank(Z) and suppose lambda_1 >= lambda_2 >= ... >= lambda_{r1+1} > 0 are the nonzero eigenvalues of Z'Z with corresponding eigenvectors e1, e2, ..., e_{r1+1}.
(a) Show that

(Z'Z)^- = SUM from i = 1 to r1+1 of lambda_i^{-1} e_i e_i'

is a generalized inverse of Z'Z.
(b) The coefficients B-hat that minimize the sum of squared errors (y - ZB)'(y - ZB) satisfy the normal equations (Z'Z)B-hat = Z'y. Show that these equations are satisfied for any B-hat such that Z B-hat is the projection of y on the columns of Z.
(c) Show that Z B-hat = Z(Z'Z)^- Z'y is the projection of y on the columns of Z. (See Footnote 2 in this chapter.)
(d) Show directly that B-hat = (Z'Z)^- Z'y is a solution to the normal equations (Z'Z)[(Z'Z)^- Z'y] = Z'y.
Hint: (b) If Z B-hat is the projection, then y - Z B-hat is perpendicular to the columns of Z. (d) The eigenvalue-eigenvector requirement implies that (Z'Z)(lambda_i^{-1} e_i) = e_i for i <= r1 + 1 and 0 = e_i'(Z'Z)e_i for i > r1 + 1. Therefore, (Z'Z)(lambda_i^{-1} e_i) e_i' Z' = e_i e_i' Z'. Summing over i gives

(Z'Z)(Z'Z)^- Z' = Z'Z ( SUM from i = 1 to r1+1 of lambda_i^{-1} e_i e_i' ) Z'
                = ( SUM from i = 1 to r1+1 of e_i e_i' ) Z' = ( SUM from i = 1 to r+1 of e_i e_i' ) Z' = I Z' = Z'

since e_i' Z' = 0 for i > r1 + 1.

7.7. Suppose the classical regression model is, with rank(Z) = r + 1, written as

y = Z1 B(1) + Z2 B(2) + e
(n x 1) = (n x (q+1)) ((q+1) x 1) + (n x (r-q)) ((r-q) x 1) + (n x 1)

where rank(Z1) = q + 1 and rank(Z2) = r - q. If the parameters B(2) are identified beforehand as being of primary interest, show that a 100(1 - alpha)% confidence region for B(2) is given by

(B-hat(2) - B(2))' [Z2'Z2 - Z2'Z1(Z1'Z1)^{-1}Z1'Z2] (B-hat(2) - B(2)) <= s^2 (r - q) F_{r-q, n-r-1}(alpha)

Hint: By Exercise 4.12, with 1's and 2's interchanged,

C^{22} = [Z2'Z2 - Z2'Z1(Z1'Z1)^{-1}Z1'Z2]^{-1},   where (Z'Z)^{-1} = | C^{11}   C^{12} |
                                                                     | C^{21}   C^{22} |

Multiply by the square-root matrix (C^{22})^{-1/2}, and conclude that (C^{22})^{-1/2}(B-hat(2) - B(2)) / sigma is N(0, I), so that (B-hat(2) - B(2))'(C^{22})^{-1}(B-hat(2) - B(2)) is sigma^2 chi-square with r - q d.f.

7.8. Recall that the hat matrix is defined by H = Z(Z'Z)^{-1}Z' with diagonal elements h_jj.
(a) Show that H is an idempotent matrix. [See Result 7.1 and (7-6).]
(b) Show that 0 < h_jj < 1, j = 1, 2, ..., n, and that SUM from j = 1 to n of h_jj = r + 1, where r is the number of independent variables in the regression model. (In fact, (1/n) <= h_jj < 1.)
(c) Verify, for the simple linear regression model with one independent variable z, that the leverage, h_jj, is given by

h_jj = 1/n + (z_j - z-bar)^2 / SUM from k = 1 to n of (z_k - z-bar)^2
7.9. Consider the following data on one predictor variable z1 and two responses Y1 and Y2:

  z1 | -2  -1   0   1   2
  y1 |  5   3   4   2   1
  y2 | -3  -1  -1   2   3

Determine the least squares estimates of the parameters in the bivariate straight-line regression model

Y_j1 = B01 + B11 z_j1 + e_j1
Y_j2 = B02 + B12 z_j1 + e_j2,   j = 1, 2, 3, 4, 5

Also, calculate the matrices of fitted values Y-hat and residuals e-hat with Y = [y1 | y2]. Verify the sum of squares and cross-products decomposition

Y'Y = Y-hat'Y-hat + e-hat'e-hat

7.10. Using the results from Exercise 7.9, calculate each of the following.
(a) A 95% confidence interval for the mean response E(Y01) = B01 + B11 z01 corresponding to z01 = 0.5
(b) A 95% prediction interval for the response Y01 corresponding to z01 = 0.5
(c) A 95% prediction region for the responses Y01 and Y02 corresponding to z01 = 0.5

7.11. (Generalized least squares for multivariate multiple regression.) Let A be a positive definite matrix, so that d_j^2(B) = (y_j - B'z_j)'A(y_j - B'z_j) is a squared statistical distance from the jth observation y_j to its regression B'z_j. Show that the choice B = B-hat = (Z'Z)^{-1}Z'Y minimizes the sum of squared statistical distances, SUM over j of d_j^2(B), for any choice of positive definite A. Choices for A include Sigma^{-1} and I.
Hint: Repeat the steps in the proof of Result 7.10 with Sigma^{-1} replaced by A.

7.12. Given the mean vector and covariance matrix of Y, Z1, and Z2, determine each of the following.
(a) The best linear predictor B0 + B1 Z1 + B2 Z2 of Y
(b) The mean square error of the best linear predictor
(c) The population multiple correlation coefficient
(d) The partial correlation coefficient rho_{YZ1.Z2}

7.13. The test scores for college students described in Example 5.5 have

z-bar = | z1-bar | = | 527.74 |        S = | 5691.34   600.51   217.25 |
        | z2-bar |   |  54.69 |            |  600.51   126.05    23.37 |
        | z3-bar |   |  25.13 |            |  217.25    23.37    23.11 |

Assume joint normality.
(a) Obtain the maximum likelihood estimates of the parameters for predicting Z1 from Z2 and Z3.
(b) Evaluate the estimated multiple correlation coefficient R_{Z1(Z2,Z3)}.
(c) Determine the estimated partial correlation coefficient R_{Z1,Z2.Z3}.

7.14. Twenty-five portfolio managers were evaluated in terms of their performance. Suppose Y represents the rate of return achieved over a period of time, Z1 is the manager's attitude toward risk measured on a five-point scale from "very conservative" to "very risky," and Z2 is years of experience in the investment business. The observed correlation coefficients between pairs of variables are

          Y      Z1     Z2
  Y     1.0    -.35    .82
  Z1   -.35    1.0    -.60
  Z2    .82   -.60    1.0

(a) Interpret the sample correlation coefficients r_{YZ1} = -.35 and r_{YZ2} = .82.
(b) Calculate the partial correlation coefficient r_{YZ1.Z2} and interpret this quantity with respect to the interpretation provided for r_{YZ1} in Part a.

The following exercises may require the use of a computer.

7.15. Use the real-estate data in Table 7.1 and the linear regression model in Example 7.4.
(a) Verify the results in Example 7.4.
(b) Analyze the residuals to check the adequacy of the model. (See Section 7.6.)
(c) Generate a 95% prediction interval for the selling price (Y0) corresponding to total dwelling size z1 = 17 and assessed value z2 = 46.
(d) Carry out a likelihood ratio test of H0: B2 = 0 with a significance level of alpha = .05. Should the original model be modified? Discuss.

7.16. Calculate a Cp plot corresponding to the possible linear regressions involving the real-estate data in Table 7.1.

7.17. Consider the Forbes data in Exercise 1.4.
(a) Fit a linear regression model to these data using profits as the dependent variable and sales and assets as the independent variables.
(b) Analyze the residuals to check the adequacy of the model. Compute the leverages associated with the data points. Does one (or more) of these companies stand out as an outlier in the set of independent variable data points?
(c) Generate a 95% prediction interval for profits corresponding to sales of 100 (billions of dollars) and assets of 500 (billions of dollars).
(d) Carry out a likelihood ratio test of H0: B2 = 0 with a significance level of alpha = .05. Should the original model be modified? Discuss.
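For exercises of the kind above that start from a sample mean vector and covariance matrix, the multiple and partial correlation calculations reduce to a few matrix operations. The Python/numpy sketch below is ours, shown only as a starter, and uses the covariance matrix quoted in Exercise 7.13 with Z1 as the response.

  import numpy as np

  S = np.array([[5691.34, 600.51, 217.25],
                [600.51, 126.05, 23.37],
                [217.25, 23.37, 23.11]])

  s_zy, S_22 = S[1:, 0], S[1:, 1:]
  beta_hat = np.linalg.solve(S_22, s_zy)          # predict Z1 from Z2 and Z3
  R2 = s_zy @ beta_hat / S[0, 0]                  # squared multiple correlation
  print(np.sqrt(R2))                              # estimated R_{Z1(Z2,Z3)}

  # Partial correlation of Z1 and Z2, eliminating Z3, from the conditional
  # covariance matrix of (Z1, Z2) given Z3.
  S_cond = S[:2, :2] - np.outer(S[:2, 2], S[2, :2]) / S[2, 2]
  print(S_cond[0, 1] / np.sqrt(S_cond[0, 0] * S_cond[1, 1]))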
424 Chapter 7 Multivariate Linear Regression Models
7.18. Calculate (a) a C plot corresponding to the possible regressions involving the Forbes data p Exercise 1.4. (b) the AIC for each possible regression. 7.19. Satellite applications motivated the development of a silverzinc battery. contains failure data collected to characterize the performance of the battery dunng Its , life cycle. Use these d a t a . ' (a) Find the estimated linear regression of In (Y) on an appropriate ("best") subset of predictor variables. ' (b) Plot the residuals from the fitted model chosen in Part a to check the assumption.
Exercises 425
7.21. Consider the airpollution data in Table 1.5. Let Yi = N02 and Y2 = 03 be the two responses (pollutants) corresponding to the predictor variables Zt = wind and Z2 = solar radiation. (a) Perform a regression analysis using only the first response Yi, (i) Suggest and fit appropriate linear regression models. (ii) Analyze the residuals. (iii) Construct a 95% prediction interval for N02 corresponding to Zj = 10 and Z2 = 80. (b) Perform a muItivariate mUltiple regression analysis using both responses Yj and 12· (i) Suggest and fit appropriate linear regression models. (H) Analyze the residuals. (Hi) Construct a 95% prediction ellipse for both N02 and 0 3 for Zt = 10 and Z2 = 80. Compare this ellipse with the prediction interval in Part a (iii). Comment. 7.22. Using the data on bone mineral content in Table 1.8: (a) Perform a regression analysis by fitting the response for the dominant radius bone to the measurements on the last four bones. (i) Suggest and fit appropriate linear regression models. (ii) Analyze the residuals. (b) Perform a multivariate multiple regression analysis by fitting the responses from both radius bones. (c) Calculate the AIC for the model you chose in (b) and for the full model. 7.23. Using the data on the characteristics of bulls sold at auction in Table 1.10: (a) Perform a regression analysis using the response Yi = SalePr and the predictor variables Breed, YrHgt, FtFrBody, PrctFFB, Frame, BkFat, SaleHt, and SaleWt. (i) Determine the "best" regression equation by retaining only those predictor variables that are individually significant. (ii) Using the best fitting model, construct a 95% prediction interval for selling price for the set of predictor variable values (in the order listed above) 5,48.7, 990,74.0,7, .18,54.2 and 1450. (Hi) Examine the residuals from the best fitting model. (b) Repeat the analysis in Part a, using the natural logarithm of the sales price as the response. That is, set Yj = Ln (SalePr). Which analysis do you prefer? Why? 7.24. Using the data on the characteristics of bulls sold at auction in Table 1.10: (a) Perform a regression analysis, using only the response Yi = SaleHt and the predictor variables Zt = YrHgt and Zz = FtFrBody. (i) Fit an appropriate model and analyze the residuals. (ii) Construct a 95% prediction interval for SaleHt corresponding to Zj = 50.5 and Z2 = 970. (b) Perform a multivariate regression analysis with the responses Y j = SaleHt and Y2 = SaleWt and the predictors Zj = YrHgt and Z2 = FtFrBody. (i) Fit an appropriate multivariate model and analyze the residuals. (ii) Construct a 95% prediction ellipse for both SaleHt and SaleWt for Zl = 50.5 and Z2 = 970. Compare this eilipse with the prediction interval in Part a (H). Comment.
Data
Zt
Charge rate (amps) .375 1.000 1.000 1.000 1.625 1.625 1.625 .375 1.000 1.000 1.000 1.625 .375 1.000 1.000 1.000 1.625 1.625 .375 .375
Discharge rate (amps) 3.13 3.13 3.13 3.13 3.13 3.13 3.13 5.00 5.00 5.00 5.00 5.00 1.25 1.25 1.25 1.25 1.25 1.25 3.13 3.13
Depth of discharge (% ofrated amperehours) 60.0 76.8 60.0 60.0 43.2 60.0 60.0 76.8 43.2 43.2 100.0 76.8 76.8 43.2 76.8 60.0 43.2 60.0 76.8 60.0
Z3
Z4
Temperature
(QC)
Zs End of charge voltage (volts)
2.00 1.99 2.00 1.98 2.01 2.00 2.02 2.01 1.99 2.01 2.00 1.99 2.01 1.99 2.00 2.00 1.99 2.00 1.99 2.00
Y
Cycles to failure 101 141 96 125 43 16 188 10 3 386 45 2 76 78 160 3 216 73 314 170
esearc
40 30 20 20 10 20 20
10 10
30 20 10 10 10 30 0 30 20 30 20
te d from, S Sidik ,H Source' S eI ec . Leibecki "and J Bozek , Failure of Si/verZinc Cells with Competing Le . R
Failure ModesPreliminary Dala Analysis, NASA Technical Memorandum 81556 (Cleveland:
WIS
h
Center, 1980),
7.20. Using the batteryfailure data in Table 7.5, regress on the first nent of the predictor variables Zb Z2,"" Zs· (See SectIOn 8.3.) Compare the the fitted model obtained in Exercise 7.19(a).
991 : Z2 FFF 36.934 Z3 ZST 1. Table 7.7 Pulp and Paper Properites Data BL 21.513 577 . experience level was coded as Novice or Guru. (iii) Construct a 95% prediction ellipse for both Total TCAD and Amount of amitriptyline for Zl = 1. (a) Perform a regression analysis using each of the response variables Y1. Refer to the data on fixing breakdowns in cell phone relay towers in Table 6.515 2. Z2 = 1200.859 Y4 BS Zl AFL . • (b) Repeat Part a using the second response Yz.27.330. Zl = arithmetic fiber length.315 6.030 .855 28.i = 1. heartbeat.239 .601 6.prenhall.400 . In the initial design. Check for outliers. Y3 = stress at failure.294 20. and Z5 = 85. Comment.070 Source: See Lee [19]. 13 and Y4 • (i) Suggest and fit appropriate linear regression models. ' Table 7.500. Y4 = burst strength.064 1.997 3.326 5.015 .6.163 20. Z2 = long fiber fraction. Z3 = fine fiber fraction. There are n = 62 observations of the pulp fiber characteristics. 7. Y2 = elastic modulus.932 .010.26.786 5. yz.2.572 7.com/statistics).20.988 8. (ii) Analyze the residuals. (b) Perform a muItivariate multiple regression analysis using all four response variables. (a) Perform a regression analysis using only the response Y1 • (i) Suggest and fit appropriate linear regressIOn models.719 7.Z3 and Z4' (i) Suggest and fit an appropriate linear regression model. Z4 = 70..709 19.25. Z2 = 45.479 4. Guru and Experienced. Source: See [24]. (iii) Construct a 95% prediction interval for SF (13) for Zl = . Z4 = 70.206 20. Yl = breaking length.866 3.441 16.220 39. in the original data set.289 Z2 = Amount of antidepressants taken at time of overdose (AMT) Z3 = PR wave measurement (PR) Z4 = Diastolic blood pressure (DIAP) Z5 = QRS wave measurement (QRS) Y1 Y2 EM 7.470 3389 1101 1131 596 896 1767 807 1111 645 628 1360 652 860 500 781 1070 1754 3149 653 810 448 844 1450 493 941 547 392 1283 458 722 384 501 405 1520 .057 1. Specify the matrix of estimated coefficients /J and estimated error covariance matrix (ii) Analyze the residuals. Some additional runs for an experienced engineer are given below. and the paper properties.713 39.375.025 . Y1 .324 2.7 (c) Perform a multivariate multiple regression analysis using both responses Yi and yz.845 1.437 Y3 SF 5.6 Amitriptyline Data Yl TOT Y2 AMI Zl Z2 Z3 GEN 1 1 0 1 1 1 1 0 1 1 1 1 1 0 0 0 1 AMT 7500 1975 3600 675 750 2500 350 1500 375 1050 3000 450 1750 2000 4500 1500 3000 PR 220 200 205 160 185 180 154 200 137 167 180 160 135 160 180 170 180 DIAP 0 0 60 60 70 60 80 70 60 60 60 64 90 60 0 90 0 Z4 Z5 QRS 140 100 111 120 83 80 98 93 105 74 80 60 79 80 100 120 129 .049 : Z4 Table 7. Also. Zl = 20. and Z5 = 85. Z4 = 1.033 1.and the four independent variables. ZZ.086 7.779 6.586 21.998 1.289 17. Measurements of properties of pulp fibers and the paper made from them are contained in Table 7. The two response variables after an amitrIptyhne overdose are given ID are Y I = Total TCAD plasma (TOT) yz = Amount of amitriptyline present in TCAD plasma level (AMI) The five predictor variables are ZI = Gender: liffemale.081 1.542 20.851 30.050 1.018 17.7 (see also [19] and website: www. IS ff Yts that seem to be related to ttie use of the drug: irregular hit d' are also conjecture SI e e ec I bl d ssures. (ii) Analyze the residuals.030 .694 .236 . (ii) Analyze the residuals. _ Zl  = 1200 ' 1.039 6. i. Z2 7.072 36570 84554 81.559 .for the same settings of the independent variables given in part a (iii) above. (iii) Construct a 95% prediction interval for Total TCAD for Z3 = 140.Q70 : LFF 35. 
D on 17 patients who were admitted to the hospital among other a a ga .396 4. Comment. Check for outliers or observations with high leverage.017 4.639 1.605 .756 32. Z3 = 140. Zl. . However.312 21. 3. (i) Suggest and fit appropriate linear regression models. Z4 = zero span tensile. 13 and Y4 .912 2.053 1.742 .449 16. . and irregular waves on tee ec rocar wgram. Compare the simultaneous prediction interval for Y03 with the prediction interval in part a (iii).991 36.478 .979 6. there 7.054 3.415 .871 .' 'bed b some physicians as an antidepressant. Compare this ellipse with the prediction intervals in Parts a and b.237 5.Oifmale (GEN) Exercises 42. Yz.239 35. (iii) Construct simultaneous 95% prediction intervals for the individual responses Yoi.060 4.426 Chapter 7 Multivariate Linear Regression Models . Now consider three levels of experience: Novice.795 6.4.008 . reclassify Guru in run 3 as ' .
Irwin. and R. 31 (1960). London: Chapman and Hall. and H. 19. C. E.2 10. G. 10.A. W. Econometric Theory.). 1996. S. 1992. S. Naik. 2006. 189193. Nachtsheim. Kutner. 6.). and problem severity. problem complexity and experience level as the predictor variables. 2003. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity (Paperback). Transformations and Regression: An Introduction to Graphical Methods of Diagnostic Regression Analysis. Annals of Mathematical Statistics. Ne: SAS Institute Inc.428 Chapter 7 Multivariate Linear Regression Models Experienced and Novice in run 14 as Experienced. ' 15. Keep all the other numbers for these two engineers the same. A. A. G. Khattree. Applied Multivariate Statistics with SAS® Software (2nd ed. New York' John WIley. 1986. G. and R. 23... 1. Englewood Cliffs.3 5. 4. T. 12. New York: WiJeyInterscience. and C. New York: WileyInterscience. CA: Thompson Brooks/Cole. Applied Linear Regression Models (3rd ed. 0. and 1. D. 20. Reinsel. and D. R. B. "Upper Percentage Points of the Largest Root of a Matrix in Multivariate AnalYSIS. T. Atkinson. 38 (1951).317346. New York: WIIeyIntersclence.9 12. 22. Box. S. 2000. "A General Distribution Theory for a Class of Likelihood Criteria. Seber. 1964. Abraham.. Cook. . "Testing for Serial Correlation in Least Squares Regression H. 1. and S. and Plasma Drug Levels after Amitriptyline Overdose. 625":'642. With these changes and the new data below. 6771. "R. A. O'Connell. and F.f Some Upper Percentage Points of the Distribution of the Largest Charactenstlc Root. Lee. E. New York: John Wiley." Journal of the Royal Statistical Society (B). New York: WileYInterscience. P. ChIcago: RIchard D. England: Oxford University Press. New York: John Wiley. Weisberg. Toward Mediocrity in Heredity Stature..36 (1949). 2. Anderson. A. Hadi. so the analysis is best handled with regression· methods. Bowerman. and G. Bartlett. E. Bels!ey. Weisberg. F.9 Problem implementation time 9. S. R. New York: John Wiley. D. 1999.5 15. L.296298." BLOmetnka. Chatterjee.) Problem severity level Low Low High High High Problem complexity level Complex Complex Simple Simple Complex Engineer experience level Experienced Experienced Experienced Experienced Experienced Problem· assessment time 5. Neter. 17.). Linear Regression Analy. 2002. 16 (1954). NJ: Prentice Hall. ':Charts ?.is. Welsh. 159178. Daniel. Time Series Analysis: Forecasting and Control (3rd ed.. F." Journal of the AnthropologlcalInstltute. 1. G.. 11.. and S. 19 (1982).). 9. 24. S. CA: Thompson Brooks/Cole. Heck.246263: 16.n. "A Note on Multiplying Factors for Various ChiSquared Approximations. 54 (1967)." BLOmetrika. 14. G. S. D. 18. Consider regression models with the predictor variables and two factor interaction terms as inputs. Plots. Kuh. 21. 2004. . 1999. P. Durbi. Residuals and Influence in Regression.0 4:5 6. Price. Oxford. W. Cook. Rao. perform a multivariate multiple regression analysis with assessment and implementation times as the responses. C. Rudorfer.7 14. Applied Regression Analysis (3rd ed. 5. Belmont. M.0 4. Ledolter. Box. Umverslty of Toronto. 1982.6 8. 3. An Introduction to Multivariate Statistical Analysis (3rd ed. Watson. Fitting Equations to Data (2nd ed. R. Journal o/ToxlcologyClznical Toxicology.). (Note: The two changes in the original data set and the additional." Unpublished doctoral theSIS. 1998. New York: John Wiley. Belmont.9 8. C.6 13. C. Wood.9 Total resolution time 14. Faculty of Forestry. .N. 1994.) (paperback).).. 2006.) Cary. 
data below unbalances the design. and B. Linear Statistical Models: An Applied Approach (2nd ed. B. R. References 1. M.. Smith. Goldberger." Biometrika. Applied Regression Including Computing and Graphics. RegreSSion Analysis by Example (4th ed. N.) (paperback). Introduction to Regression Modeling. C.2 21. 15 (1885).8 References 429 13. 1977. Jenkins.. V. E. 7. Linear Inference and Its Applications (2nd ed.elati0!lshil?s Between Properties of PulpFibre and Paper. 8.1999.
Chapter 8

PRINCIPAL COMPONENTS

8.1 Introduction

A principal component analysis is concerned with explaining the variance-covariance structure of a set of variables through a few linear combinations of these variables. Its general objectives are (1) data reduction and (2) interpretation.

Although p components are required to reproduce the total system variability, often much of this variability can be accounted for by a small number k of the principal components. If so, there is (almost) as much information in the k components as there is in the original p variables. The k principal components can then replace the initial p variables, and the original data set, consisting of n measurements on p variables, is reduced to a data set consisting of n measurements on k principal components.

An analysis of principal components often reveals relationships that were not previously suspected and thereby allows interpretations that would not ordinarily result. A good example of this is provided by the stock market data discussed in Example 8.5.

Analyses of principal components are more of a means to an end rather than an end in themselves, because they frequently serve as intermediate steps in much larger investigations. For example, principal components may be inputs to a multiple regression (see Chapter 7) or cluster analysis (see Chapter 12). Moreover, (scaled) principal components are one "factoring" of the covariance matrix for the factor analysis model considered in Chapter 9.

8.2 Population Principal Components

Algebraically, principal components are particular linear combinations of the p random variables X1, X2, ..., Xp. Geometrically, these linear combinations represent the selection of a new coordinate system obtained by rotating the original system with X1, X2, ..., Xp as the coordinate axes. The new axes represent the directions with maximum variability and provide a simpler and more parsimonious description of the covariance structure.

As we shall see, principal components depend solely on the covariance matrix Sigma (or the correlation matrix rho) of X1, X2, ..., Xp. Their development does not require a multivariate normal assumption. On the other hand, principal components derived for multivariate normal populations have useful interpretations in terms of the constant density ellipsoids. Further, inferences can be made from the sample components when the population is multivariate normal. (See Section 8.5.)

Let the random vector X' = [X1, X2, ..., Xp] have the covariance matrix Sigma with eigenvalues lambda_1 >= lambda_2 >= ... >= lambda_p >= 0.

Consider the linear combinations

Y1 = a1'X = a11 X1 + a12 X2 + ... + a1p Xp
Y2 = a2'X = a21 X1 + a22 X2 + ... + a2p Xp
...
Yp = ap'X = ap1 X1 + ap2 X2 + ... + app Xp     (8-1)

Then, using (2-45), we obtain

Var(Yi) = ai' Sigma ai,       i = 1, 2, ..., p     (8-2)
Cov(Yi, Yk) = ai' Sigma ak,   i, k = 1, 2, ..., p  (8-3)

The principal components are those uncorrelated linear combinations Y1, Y2, ..., Yp whose variances in (8-2) are as large as possible.

The first principal component is the linear combination with maximum variance. That is, it maximizes Var(Y1) = a1' Sigma a1. It is clear that Var(Y1) = a1' Sigma a1 can be increased by multiplying any a1 by some constant. To eliminate this indeterminacy, it is convenient to restrict attention to coefficient vectors of unit length. We therefore define

First principal component = linear combination a1'X that maximizes Var(a1'X) subject to a1'a1 = 1

Second principal component = linear combination a2'X that maximizes Var(a2'X) subject to a2'a2 = 1 and Cov(a1'X, a2'X) = 0

At the ith step,

ith principal component = linear combination ai'X that maximizes Var(ai'X) subject to ai'ai = 1 and Cov(ai'X, ak'X) = 0 for k < i

Result 8.1. Let Sigma be the covariance matrix associated with the random vector X' = [X1, X2, ..., Xp]. Let Sigma have the eigenvalue-eigenvector pairs (lambda_1, e1), (lambda_2, e2), ..., (lambda_p, ep) where lambda_1 >= lambda_2 >= ... >= lambda_p >= 0. Then the ith principal component is given by

Yi = ei'X = ei1 X1 + ei2 X2 + ... + eip Xp,   i = 1, 2, ..., p     (8-4)

With these choices,

Var(Yi) = ei' Sigma ei = lambda_i,   i = 1, 2, ..., p
Cov(Yi, Yk) = ei' Sigma ek = 0,      i not equal to k     (8-5)

If some lambda_i are equal, the choices of the corresponding coefficient vectors, ei, and hence Yi, are not unique.

Proof. We know from (2-51), with B = Sigma, that

max over a not 0 of a' Sigma a / a'a = lambda_1   (attained when a = e1)

But e1'e1 = 1 since the eigenvectors are normalized. Thus,

max over a not 0 of a' Sigma a / a'a = lambda_1 = e1' Sigma e1 / e1'e1 = e1' Sigma e1 = Var(Y1)

Similarly, using (2-52), we get

max over a perpendicular to e1, ..., ek of a' Sigma a / a'a = lambda_{k+1},   k = 1, 2, ..., p - 1

For the choice a = e_{k+1}, with e_{k+1}'ei = 0 for i = 1, 2, ..., k and k = 1, 2, ..., p - 1,

e_{k+1}' Sigma e_{k+1} / e_{k+1}'e_{k+1} = e_{k+1}' Sigma e_{k+1} = Var(Y_{k+1})

But e_{k+1}'(Sigma e_{k+1}) = lambda_{k+1} e_{k+1}'e_{k+1} = lambda_{k+1}, so Var(Y_{k+1}) = lambda_{k+1}. It remains to show that ei perpendicular to ek (that is, ei'ek = 0, i not equal to k) gives Cov(Yi, Yk) = 0. Now, the eigenvectors of Sigma are orthogonal if all the eigenvalues lambda_1, lambda_2, ..., lambda_p are distinct. If the eigenvalues are not all distinct, the eigenvectors corresponding to common eigenvalues may be chosen to be orthogonal. Therefore, for any two eigenvectors ei and ek, ei'ek = 0, i not equal to k. Since Sigma ek = lambda_k ek, premultiplication by ei' gives

Cov(Yi, Yk) = ei' Sigma ek = lambda_k ei'ek = 0

for any i not equal to k, and the proof is complete.

From Result 8.1, the principal components are uncorrelated and have variances equal to the eigenvalues of Sigma.

Result 8.2. Let X' = [X1, X2, ..., Xp] have covariance matrix Sigma, with eigenvalue-eigenvector pairs (lambda_1, e1), (lambda_2, e2), ..., (lambda_p, ep) where lambda_1 >= lambda_2 >= ... >= lambda_p >= 0. Let Y1 = e1'X, Y2 = e2'X, ..., Yp = ep'X be the principal components. Then

sigma_11 + sigma_22 + ... + sigma_pp = SUM of Var(Xi) = lambda_1 + lambda_2 + ... + lambda_p = SUM of Var(Yi)

Proof. From Definition 2A.28, sigma_11 + sigma_22 + ... + sigma_pp = tr(Sigma). From (2-20) with A = Sigma, we can write Sigma = P Lambda P', where Lambda is the diagonal matrix of eigenvalues and P = [e1, e2, ..., ep] so that PP' = P'P = I. Using Result 2A.12(c), we have

tr(Sigma) = tr(P Lambda P') = tr(Lambda P'P) = tr(Lambda) = lambda_1 + lambda_2 + ... + lambda_p

Thus, SUM of Var(Xi) = tr(Sigma) = tr(Lambda) = SUM of Var(Yi).

Result 8.2 says that

Total population variance = sigma_11 + sigma_22 + ... + sigma_pp = lambda_1 + lambda_2 + ... + lambda_p     (8-6)

and consequently, the proportion of total variance due to (explained by) the kth principal component is

(Proportion of total population variance due to kth principal component) = lambda_k / (lambda_1 + lambda_2 + ... + lambda_p),   k = 1, 2, ..., p     (8-7)

If most (for instance, 80 to 90%) of the total population variance, for large p, can be attributed to the first one, two, or three components, then these components can "replace" the original p variables without much loss of information.

Each component of the coefficient vector ei' = [ei1, ..., eik, ..., eip] also merits inspection. The magnitude of eik measures the importance of the kth variable to the ith principal component, irrespective of the other variables. In particular, eik is proportional to the correlation coefficient between Yi and Xk.

Result 8.3. If Y1 = e1'X, Y2 = e2'X, ..., Yp = ep'X are the principal components obtained from the covariance matrix Sigma, then

rho_{Yi,Xk} = eik sqrt(lambda_i) / sqrt(sigma_kk),   i, k = 1, 2, ..., p     (8-8)

are the correlation coefficients between the components Yi and the variables Xk. Here (lambda_1, e1), (lambda_2, e2), ..., (lambda_p, ep) are the eigenvalue-eigenvector pairs for Sigma.

Proof. Set ak' = [0, ..., 0, 1, 0, ..., 0] so that Xk = ak'X and Cov(Xk, Yi) = Cov(ak'X, ei'X) = ak' Sigma ei, according to (2-45). Since Sigma ei = lambda_i ei, Cov(Xk, Yi) = ak' lambda_i ei = lambda_i eik. Then Var(Yi) = lambda_i [see (8-5)] and Var(Xk) = sigma_kk yield

rho_{Yi,Xk} = Cov(Yi, Xk) / (sqrt(Var(Yi)) sqrt(Var(Xk))) = lambda_i eik / (sqrt(lambda_i) sqrt(sigma_kk)) = eik sqrt(lambda_i) / sqrt(sigma_kk)

for i, k = 1, 2, ..., p.

Although the correlations of the variables with the principal components often help to interpret the components, they measure only the univariate contribution of an individual X to a component Y. That is, they do not indicate the importance of an X to a component Y in the presence of the other X's. For this reason, some statisticians (see, for example, Rencher [16]) recommend that only the coefficients eik, and not the correlations, be used to interpret the components. Although the coefficients and the correlations can lead to different rankings as measures of the importance of the variables to a given component, it is our experience that these rankings are often not appreciably different. In practice, variables with relatively large coefficients (in absolute value) tend to have relatively large correlations, so the two measures of importance, the first multivariate and the second univariate, frequently give similar results. We recommend that both the coefficients and the correlations be examined to help interpret the principal components.

The following hypothetical example illustrates the contents of Results 8.1, 8.2, and 8.3.

Example 8.1 (Calculating the population principal components) Suppose the random variables X1, X2, and X3 have the covariance matrix

Sigma = |  1  -2   0 |
        | -2   5   0 |
        |  0   0   2 |

It may be verified that the eigenvalue-eigenvector pairs are

lambda_1 = 5.83,   e1' = [.383, -.924, 0]
lambda_2 = 2.00,   e2' = [0, 0, 1]
lambda_3 = 0.17,   e3' = [.924, .383, 0]

Therefore, the principal components become

Y1 = e1'X = .383 X1 - .924 X2
Y2 = e2'X = X3
Y3 = e3'X = .924 X1 + .383 X2

The variable X3 is one of the principal components, because it is uncorrelated with the other two variables.

Equation (8-5) can be demonstrated from first principles. For example,

Var(Y1) = Var(.383 X1 - .924 X2)
        = (.383)^2 Var(X1) + (-.924)^2 Var(X2) + 2(.383)(-.924) Cov(X1, X2)
        = .147(1) + .854(5) - .708(-2) = 5.83 = lambda_1

Cov(Y1, Y2) = Cov(.383 X1 - .924 X2, X3) = .383 Cov(X1, X3) - .924 Cov(X2, X3) = .383(0) - .924(0) = 0

It is also readily apparent that

sigma_11 + sigma_22 + sigma_33 = 1 + 5 + 2 = lambda_1 + lambda_2 + lambda_3 = 5.83 + 2.00 + .17

validating Equation (8-6) for this example. The proportion of total variance accounted for by the first principal component is lambda_1 / (lambda_1 + lambda_2 + lambda_3) = 5.83/8 = .73. Further, the first two components account for a proportion (5.83 + 2)/8 = .98 of the population variance. In this case, the components Y1 and Y2 could replace the original three variables with little loss of information.

Next, using (8-8), we obtain

rho_{Y1,X1} = e11 sqrt(lambda_1) / sqrt(sigma_11) = .383 sqrt(5.83) / sqrt(1) = .925
rho_{Y1,X2} = e12 sqrt(lambda_1) / sqrt(sigma_22) = -.924 sqrt(5.83) / sqrt(5) = -.998

Notice here that the variable X2, with coefficient -.924, receives the greatest weight in the component Y1. It also has the largest correlation (in absolute value) with Y1. The correlation of X1 with Y1, .925, is almost as large as that for X2, indicating that the variables are about equally important to the first principal component. The relative sizes of the coefficients of X1 and X2 suggest, however, that X2 contributes more to the determination of Y1 than does X1. Since, in this case, both coefficients are reasonably large and they have opposite signs, we would argue that both variables aid in the interpretation of Y1.

Finally, rho_{Y2,X1} = rho_{Y2,X2} = 0 and rho_{Y2,X3} = sqrt(lambda_2) / sqrt(sigma_33) = sqrt(2)/sqrt(2) = 1 (as it should). The remaining correlations can be neglected, since the third component is unimportant.
It is informative to consider principal components derived from multivariate normal random variables. Suppose X is distributed as Np(mu, Sigma). We know from (4-7) that the density of X is constant on the mu-centered ellipsoids

(x - mu)' Sigma^{-1} (x - mu) = c^2

which have axes +/- c sqrt(lambda_i) ei, i = 1, 2, ..., p, where the (lambda_i, ei) are the eigenvalue-eigenvector pairs of Sigma. A point lying on the ith axis of the ellipsoid will have coordinates proportional to ei' = [ei1, ei2, ..., eip] in the coordinate system that has origin mu and axes that are parallel to the original axes x1, x2, ..., xp. It will be convenient to set mu = 0 in the argument that follows.(1)

From our discussion in Section 2.3 with A = Sigma^{-1}, we can write

c^2 = x' Sigma^{-1} x = (1/lambda_1)(e1'x)^2 + (1/lambda_2)(e2'x)^2 + ... + (1/lambda_p)(ep'x)^2

where e1'x, e2'x, ..., ep'x are recognized as the principal components of x. Setting y1 = e1'x, y2 = e2'x, ..., yp = ep'x, we have

c^2 = (1/lambda_1) y1^2 + (1/lambda_2) y2^2 + ... + (1/lambda_p) yp^2

and this equation defines an ellipsoid (since lambda_1, lambda_2, ..., lambda_p are positive) in a coordinate system with axes y1, y2, ..., yp lying in the directions of e1, e2, ..., ep, respectively. If lambda_1 is the largest eigenvalue, then the major axis lies in the direction e1. The remaining minor axes lie in the directions defined by e2, ..., ep.

To summarize, the principal components y1 = e1'x, y2 = e2'x, ..., yp = ep'x lie in the directions of the axes of a constant density ellipsoid. Therefore, any point on the ith ellipsoid axis has x coordinates proportional to ei' = [ei1, ei2, ..., eip] and, necessarily, principal component coordinates of the form [0, ..., 0, yi, 0, ..., 0]. When mu is not 0, it is the mean-centered principal component yi = ei'(x - mu) that has mean 0 and lies in the direction ei.

A constant density ellipse and the principal components for a bivariate normal random vector with mu = 0 and rho = .75 are shown in Figure 8.1. We see that the principal components are obtained by rotating the original coordinate axes through an angle theta until they coincide with the axes of the constant density ellipse. This result holds for p > 2 dimensions as well.

[Figure 8.1: The constant density ellipse x' Sigma^{-1} x = c^2 and the principal components y1, y2 for a bivariate normal random vector X having mean 0.]

(1) This can be done without loss of generality because the normal random vector X can always be translated to the normal random vector W = X - mu, and E(W) = 0. However, Cov(X) = Cov(W).

Principal Components Obtained from Standardized Variables

Principal components may also be obtained for the standardized variables

Z1 = (X1 - mu_1)/sqrt(sigma_11),  Z2 = (X2 - mu_2)/sqrt(sigma_22),  ...,  Zp = (Xp - mu_p)/sqrt(sigma_pp)     (8-9)

In matrix notation,

Z = (V^{1/2})^{-1}(X - mu)     (8-10)

where the diagonal standard deviation matrix V^{1/2} is defined in (2-35). Clearly, E(Z) = 0 and

Cov(Z) = (V^{1/2})^{-1} Sigma (V^{1/2})^{-1} = rho

by (2-37). The principal components of Z may be obtained from the eigenvectors of the correlation matrix rho of X. All our previous results apply, with some simplifications, since the variance of each Zi is unity. We shall continue to use the notation Yi to refer to the ith principal component and (lambda_i, ei) for the eigenvalue-eigenvector pair from either rho or Sigma. However, the (lambda_i, ei) derived from Sigma are, in general, not the same as the ones derived from rho.

Result 8.4. The ith principal component of the standardized variables Z' = [Z1, Z2, ..., Zp] with Cov(Z) = rho, is given by

Yi = ei'Z = ei'(V^{1/2})^{-1}(X - mu),   i = 1, 2, ..., p

Moreover,

SUM of Var(Yi) = SUM of Var(Zi) = p     (8-11)

and

rho_{Yi,Zk} = eik sqrt(lambda_i),   i, k = 1, 2, ..., p

In this case, (lambda_1, e1), (lambda_2, e2), ..., (lambda_p, ep) are the eigenvalue-eigenvector pairs for rho, with lambda_1 >= lambda_2 >= ... >= lambda_p >= 0.

Proof. Result 8.4 follows from Results 8.1, 8.2, and 8.3, with Z1, Z2, ..., Zp in place of X1, X2, ..., Xp and rho in place of Sigma.

We see from (8-11) that the total (standardized variables) population variance is simply p, the sum of the diagonal elements of the matrix rho. Using (8-7) with Z in place of X, we find that the proportion of total variance explained by the kth principal component of Z is

(Proportion of (standardized) population variance due to kth principal component) = lambda_k / p,   k = 1, 2, ..., p     (8-12)

where the lambda_k's are the eigenvalues of rho.

Example 8.2 (Principal components obtained from covariance and correlation matrices are different) Consider the covariance matrix

Sigma = | 1     4 |
        | 4   100 |

and the derived correlation matrix

rho = | 1    .4 |
      | .4    1 |

The eigenvalue-eigenvector pairs from Sigma are

lambda_1 = 100.16,   e1' = [.040, .999]
lambda_2 = .84,      e2' = [.999, -.040]

Similarly, the eigenvalue-eigenvector pairs from rho are

lambda_1 = 1 + rho = 1.4,   e1' = [.707, .707]
lambda_2 = 1 - rho = .6,    e2' = [.707, -.707]

The respective principal components become

Sigma:  Y1 = .040 X1 + .999 X2
        Y2 = .999 X1 - .040 X2

rho:    Y1 = .707 Z1 + .707 Z2 = .707 (X1 - mu_1) + .0707 (X2 - mu_2)
        Y2 = .707 Z1 - .707 Z2 = .707 (X1 - mu_1) - .0707 (X2 - mu_2)

Because of its large variance, X2 completely dominates the first principal component determined from Sigma. Moreover, this first principal component explains a proportion

lambda_1 / (lambda_1 + lambda_2) = 100.16 / 101 = .992

of the total population variance. When the variables X1 and X2 are standardized, however, the resulting variables contribute equally to the principal components determined from rho. Using Result 8.4, we obtain

rho_{Y1,Z1} = e11 sqrt(lambda_1) = .707 sqrt(1.4) = .837
rho_{Y1,Z2} = e21 sqrt(lambda_1) = .707 sqrt(1.4) = .837

In this case, the first principal component explains a proportion lambda_1 / p = 1.4/2 = .7 of the total (standardized) population variance.

Most strikingly, we see that the relative importance of the variables to, for instance, the first principal component is greatly affected by the standardization. When the first principal component obtained from rho is expressed in terms of X1 and X2, the relative magnitudes of the weights .707 and .0707 are in direct opposition to those of the weights .040 and .999 attached to these variables in the principal component obtained from Sigma.

The preceding example demonstrates that the principal components derived from Sigma are different from those derived from rho. Furthermore, one set of principal components is not a simple function of the other. This suggests that the standardization is not inconsequential.

Variables should probably be standardized if they are measured on scales with widely differing ranges or if the units of measurement are not commensurate. For example, if X1 represents annual sales in the $10,000 to $350,000 range and X2 is the ratio (net annual income)/(total assets) that falls in the .01 to .60 range, then the total variation will be due almost exclusively to dollar sales. In this case, we would expect a single (important) principal component with a heavy weighting of X1. Alternatively, if both variables are standardized, their subsequent magnitudes will be of the same order, and X2 (or Z2) will play a larger role in the construction of the principal components. This behavior was observed in Example 8.2.

Principal Components for Covariance Matrices with Special Structures

There are certain patterned covariance and correlation matrices whose principal components can be expressed in simple forms. Suppose Sigma is the diagonal matrix

Sigma = | sigma_11      0      ...      0     |     (8-13)
        |    0       sigma_22  ...      0     |
        |                      ...            |
        |    0          0      ...   sigma_pp |

Setting ei' = [0, ..., 0, 1, 0, ..., 0], with 1 in the ith position, we observe that Sigma ei = sigma_ii ei, and we conclude that (sigma_ii, ei) is the ith eigenvalue-eigenvector pair. Since the linear combination ei'X = Xi, the set of principal components is just the original set of uncorrelated random variables.

For a covariance matrix with the pattern of (8-13), nothing is gained by extracting the principal components. From another point of view, if X is distributed as Np(mu, Sigma), the contours of constant density are ellipsoids whose axes already lie in the directions of maximum variation. Consequently, there is no need to rotate the coordinate system.
..8ZXJ.X and sample variance 81S81' Also. p = I. the largest is Al = 1 + (p .. Zp have a multivariate normal distribution with a covariance matrix given by (815). This principal component explains a proportion = Al 1 + (p .. .0.. Recall that the n values of any linear combination j = 1. then the ellipsoids of constant density are "cigar shaped. If the standardized variables Zl. so the eigenvalue 1 has multiplicity p and e. = [0. Consequently. Vp. Suppose the data Xl.0]. It is not difficult to show (see Exercise 8... the sample covariance matrix S. When p is near 1.. = Ap = 1 .2. In this special case.1.0. This principal component is the projection ofZ on the equiangular line I' = [1. •• . The matrix in (815) implies that the variables Xl' X 2 .1) . the p X P identity matrix. the last p . Z2.1) ] 1 1 = [ V(p _ l)p"'" V(p .0 v'(il)i J (p . 1. ..oJ 1 [ = I VU  1 {i .l)p have sample mean 8J. Our objective in this section will be to construct uncorrelated linear combinations of the measured characteristics that account for much of the variation in the sample.. When p is positive.... the principal components determined from p are also the original variables Zlo"" Zp.1. . pe. the first component explains 84 % of the total variance.. = le.80 and p = 5.1)/ V(p . ."" Xn represent n ipdependent drawings from sOme pdimensional popUlation with mean vector p.n The remaining p . It might be regarded as an "index" with equal weights.1].440 Chapter 8 Principal Components Standardization does not substantially alter the situation for the 1: in (813).l)p of the total population variation. (817) 8.5) that the p eigenvalues of the correlation matrix (815) can be divided into two groups. i = 1. .1] X. The minor axes (andremainingprincipal components) occur in spherically symmetric directions perpendicular to the major axis (and first principal component). . are convenient choices for the eigenvectors.' . has the general form Summarizing Sample Variation by Principal Components 441 The first principal component l] = el Z =  2: Z. .. In that case. We see that Adp == p for p close to 1 or p large.oJ e3 = . .. which often describes the correspondence among certain biological variables such as the sizes of living things. a measure of total size. Clearly. have sample covariance 8jS8z [see (336)]. . . .1] Z. These data yield the sample mean vector x. p.1.Xj.3 Summarizing Sample Variation by Principal Components We now have the framework necessary to study the problem of summarizing the variation in n measurements on p variables with a few judiciously chosen linear combinations.. retaining only the first principal component Yj = (l/vP) [1.l)p p p =p+ 1.. .." with the major axis proportional to the first principal component Y1 = (I/Vp) (1. ... and covariance matrix 1:.2.. . with associated eigenvector ej = . still explains the same proportion (818) of total variance.... . . .. the multivariate normal ellipsoids of constant density are spheroids. . Xp are equally correlated. The uncorrelated combinations with the largest variances will be called the sample principal components. X2.=l 1 p is proportional to the sum of the p standarized variables. . . . in this case of equal eigenvalues. Another patterned covariance matrix..0. Moreover.1 eigenvalues are A2 = A3 = . the pairs of values (8J.p p (818) The resulting correlation matrix (815) is also the covariance matrix of the standardized variables. if p = . For example.P and one choice for their eigenvectors is ez = 2· 0. for two linear combinations. 
and the sample correlation matrix R.1 components collectively contribute very little to the total variance and can often be neglected.
P 0. (. ej 0 = 0 n (823) The first principal component maximizes a\Sa J or.957 0. Specifically. . p i=l p " ' L Sii = Al + A2 + .11. we have ith sample principal component . the choice of S or I is irrelevant.027 where Al .937 89. First sample linear combination aixj that maximizes principal component = the sample variance of a. )lk) In addition. by tract.96. but it will be clear from the context which matrix is being used.2. if the Xj are nonnally distributed.067 0.x).937 0..1)/n]A.n (822) generated by substituting each observation Xj for the arbitrary x in (821). . because the assumption of nonnality is not required in this section.1d I give the same sample correlation matrix R.nIVl'IJue··ei!>emlectod"·· That is. as in the proofs of :ni 8.000) and [ 1. as in (820). both S and I give the same sample principal components eix [see (820)] and the same proportion of explained variance + A2 + .91.319 Oill7] Can the sample variation be summarized by one or two principal components? 2Sample principal components also can be obtained from I = Sn. . Ap 0 and x is any observation on the )(1. . .42. ei) are the eigenvalueeigenvector pairs for S. and c.953 1. (S!!e [1]. . the ith sample principal component is by i = 1. + Ap = Ab = k = 1. area.2. a2Xj) At the ith step. the sample m!?an of each principal component is zero.) We shall not consider J.)(p·A1so.···. employed age over 16 (percent) 4. It is also convenient to label the component coefficient vectors ei and the component variances Ai for both situations.306 2..953 28.xj..".18. we obtain the following results conceming sample principal cornDCln€ If S = {sid is the p X P sample covariance matrix with ·P'". where (A. These data produced the following summary statistics: X' = pairs (AI' ed. Example 8. the maximum is the largest eigenvalue Al attained for the al = eigenvectqr el of S. The data from 61 tracts are listed in Table 8.52. k = 1.1.203 0.ei x n Xj j=l ») x = lA.'s.044 0.044 26. )lp. The observations Xj are often "centered" by subtracting x. government employment (percent) 2. Sample variance(Yk) Sample covariance()li. irrespective of whether they are obtained from S or R. i = 1. in general. .?rresponding eigenvectors e.673 1.1..' . Thus.vi = ei(x .64] median home value ($100. This has nO effect on the sample covariance matrix S and gives the ith principal component I . Summarizing Sample Variation by Principal Components 443 We shall denote the sample principal components by )11."" (Ap.078 10.. Finally.xj subject to a1al = 1 Second sample linear combination a2Xj that maximizes the sample principal component = variance of a2Xj subject to a2a2 = 1 and zero cOvariance for the pairs (a.uumanr which have maximum sample variance.957 1. .. professional degree (percent) 71.102 9. ljli I1 11 I. then = variance of aixj subject to aiai = 1 and zero sample covariance for all pairs (aixj. Total sample variance = and i...p e [4. strict the coefficient vectors ai to satisfy aiai = 1. . a1 Sa l a1 a l By (251). The sample variances are still given by the A. Successive choices of ai maximize (819) subject o = aiSek = aiAkek> or ai perpendicular Jo ek' Thus.5l3 10.203 1. provided that the eigenvalues of I are distinct.. .)(2.) In this case. If we consider the values of the ith component j = 1. total population (thousands) 3. on five socioeconomic variables for the Madison. e2). k < i linear combination aixj that maximizes the sample . so if the variables are standardized. both S a!.306 1.102 S = 4. (See Result 4.3. + Ap). 
.2 The components constructed from Sand R are not the same. As with the population quantities.=  ei Yi n j=l Xj  _) = .VJ.2.078 0.2.. .47.. Wisconsin.. the sample principal components can be viewed as the likelihood estimates of the corresponding population counterparts.442 Chapter 8 Principal Components The sample principal components are defined as those linear .. I has eigenvalues [( n .. .2 . and the single notation Yi is convenient..5 in the exercises at the end of this chapter.3 (Summarizing sample variability with two sample principal components) A census provided information...5l3 55. Also.626 28.p (821) for any observation vector x. i #' k 1... p). the maximum likelihood estimate of the covariance matrix I.2. a"xj). equivalently. 2..
hence. collectively. the component coefficients eik and the correlations ryi. The number of components is taken to be the point at which the remaining eigenvalues are relatively small and all about the same size.26) 0.16) 107. The correlations allow for differences m. Jolicoeur and Mosimann [11] measured carapace length. That is..35) 0. A useful visual aid to determining an appropriate number of principal components is a scree plot. that the correlation coefficients displayed in the table confirm the interpretation provided by the component coefficients.95) 0. is the rock debris at the bottom of a cliff.4 (Summarizing sample variability with one sample principal component) In a study of size and shape relationships for painted turtles.3.125 8. and height.492( .977 0. however. Things to consider include the amount of total variance explained. but only measure the importance of an indJVldual X Without regard to the other X's making up the component.9.17) 39. a scree plot is a plot of Ai versus ithe magnitude of an eigenvalue versus its number.Xk should both be exami?ed to inte.961 0.02 e2 0.130(. may indicate an unsuspected linear in the data.9 The first principal component explains 67.73) 0. The first two principal components. There is no defin.030 0.009(. width.0.67 e3 0.18.24) 0.039( .082 2. Consequently.2 shows a scree plot for a situation with six principal components. and the subjectmatter interpretations of the components.188 0.xk) . Table 6.22) 0. We notice in Example 8.480(.864(.8% of the total sample ance.071(.2 A scree plot.37 e4 e5 I 0. An elbow occurs in the plot in Figure 8. explain 92. .105(.153 0. deemed unimportant.863(.8 98. Example 8.87 67.091 0. sample variation is summarized very well by two principal ponents and a reduction in the data from 61 observations on 5 observations to observations on 2 principal components is reasonable. the original variables. 3 Scree The Number of Principal Components There is always the question of how many components to retain. To determine the appropriate number of components.444 Chapter 8 Principal Components We find the following: Summarizing Sample Variation by Principal Components 445 I! !\ I Coefficients for the Principal Coefficients in Variable Total population Profession Employment (%) Government employment (%) Medium home value Variance (Ai): Cumulative percentage of total variance I el (rh. The second principal cOIloponelllr' appears to be a weighted sum of the two. as we discuss later. reproduced in Exercise 6. the relative sizes of the eigenvalues (the variances of the pIe components). Given the foregoing component coefficients. itive answer to this question.046 0. it appears. Figure 8. suggest an analysis in terms of logarithms.2 at about i = 3..) Perform a principal component analysis..68) 0.7 92. In this case. Figure 8.7% of the total sample variance. (Jolicoeur [10] generally suggests a logarithmic transformation in studies of sizeandshape relationships. a component associated with an eigenvalue near and. As we said in our discussion of the population components.one:nl appears to be essentially a weighted difference between the percent employed government and the percent total employment. In dition. Their data. 3 With the eigenvalues ordered from largest to smallest.1 99. the eigenvalues after A2 are all relatively small and about the same size.rpret the principal components. the first principal cOlnp. without any other evidence. we look for an elbow (bend) in the scree plot. 
that two (or perhaps three) sample principal components effectively summarize the total sample variance.171 0.32) 0.015(.
159 .105223590 X3 3.324401 Figure 8.725443647 0.00000 20 PRINl PRIN2 PRIN3 Eigenvalue 0.62.000238 Proportion 0.159479 .0081596480 1 0.478. input length width height.97) .4 USING PROC PRINCOMP. .080104466 OUTPUT el{ryj.00677275851 YI = . proc princomp coy data = turtle out = result.2 .594 .622 . 0.713 . I r 11 iI I 11 S = 103 11. has an interesting subjectmatter interpretation.683102 0.0080191419 0.0060052707 A scree plot is shown ih Figure 8.082296771 103 I Xl Xl X2 X3 Covariance Matrix X2 X3 0.523) X 10 3 Eigenvalues of the Covariance Matrix 1 Cumulative 0. data turtle.712697 0.000360 Difference 0.005 6.014832 Eigenvectors 10 .3 A scree plot for the 3 turtle data. > PRIN.99) .960508 0.594012 PRIN3 .160] 8. 0:572539. infile 'E84.023303 0.1 on page 447 for the output from the SAS statistical software package) yields the following summary: Coefficients for the Principal Components (Correlation Coefficients in Parentheses) Variable In (length) In (width) In (height) Variance (A. .417 6.160 6.005 [ 8.97) 23.1953 .3 98.30 X e2 .. The first principal component.Xk) .523 (.60 x'1O.510 In (width) + . PRINl '0." Xl X2 X3 .4. 1 PROGRAM COMMANDS Principal Components Analysis 24 Observations 3 Variables Simple Statistics X2 4.024261488 = In [(iength)·683(width). Since I 0.3.0064167255 0. var xl x2 x3.477573765 0.0110720040 0.1 SAS ANALYSIS FOR EXAMPLE 8.019 6.0080191419 0.36 X 103 Mean StD Xl 4.703) and covariance matrix Summarizing Sample Variation by Principal Components 447 PANEL 8.dat'.000598 0.5 e3 . The very distinct elbow in this plot occurs at i = 2.510 (.725.1 100 0.022705 0. x3 =Iog(height).019 8.788 . There is clearly one dominant principal component.024661 0.98517 1.773 11 illI! 11 A principal component analysis (see Panel 8. xl = log(length).3.): Cumulative percentage of total variance title 'Principal Component Analysis'. x2 =Iog(width).51O(height).324 .072 8. which explains 96% of the total variance.683 (.683 In (iength) + .510220..446 Chapter 8 Principal Components The natural logarithms of the dimensions of 24 male turtles have sample mean vector i' = [4.0060052707 0.703185794 0..523 In (height) Total Variance = 0.96051 0.0081596480 96.
gives the length of the projection of the vector (x . .3 and Result 4.. Figure 8.) are the eigenvalueeigenvector pairs of I. the sample principal components can lie in any two perpendicular directions. For instance.2. the last few sample principal components can often be ignored.4. If AI = Az.I' which have an Np(O. . with S in place of I. = 2. 0. in some sense. . (x . not invariant with respect to changes in scale. where Al .. The sample principal components are well determmed. = e.a hyperellipsoid that is centered at x and whose axes are given by the eigenvectors of SI or. when the eigenvalues of S are nearly equal. [See (28) and (29). the absolute value of the ith principal component. The diagonal matrix A has entries AI. If the last few eigenvalues Aj are sufficiently small such that the variation in the corresponding ej directions is negligible. Yj = e. equivalently..4(b) shows a constant distance ellipse.. . The approximate contours can be drawn on the scatter plot to indicate the normal distribution that generated the data. Even when the normal assumption is suspect and the scatter plot may depart somewhat from an elliptical pattern. at x. A) distribution..x). and the data can be adequately approximated by their representations in the space of the retained components. Also.7.(X . z Because ej has length 1. equivalently. with Al > Az . First...5Z3.xl'S' (x ... e. but it is not required for the development of the properties of the sample principal components summarized in (820).· " . (824) defines . Ai == Az . Geometrically. suppose the underlying distribution of X is nearly Ni 1'. (See Exercises 8.4 Sample principal components and ellipses of constant distance. .:: . . Ap and (A j .1') = c of the underlying normal density.] Thus.1. variables measured on different scales or on a common scale with widely differing ranges are often standardized.II(X . n XjZ Zj = nI/2(Xj  x) = VS.6 and 8. the sample principal components Yj = e. (See Section 8. lie along the axes of the hyperellipsoid.) Finally. which coincide with the axes of the contour of (824). which . Then the sample principal components.x) I. When the contours of constant distance are nearly circular or. The geometrical interpretation of the sample principal components is illustrated in Figure for E. the contour consisting of all p X 1 vectors x satisfying (xx)'S'(xx)=c2 \i 11 Figure 8. Fjgure 8. The normality assumption is useful for the inference procedures discussed in Section 8. the sample variation is homogeneous in all directions.:: 0 are the eigenvalues of S. in general.x) on the unit vector ej. p. • Summarizing Sample Variation by Principal Components 449 "2' (x . Standardizing the Sample Principal Components Sample principal components are. including directions of the original coordinate axes. i = 1.(x .4(a) shows an ellipse of constant centered at x.:: Ap . . standardization is accomplished by constructing Xjl  XI Xz j = 1.(x . Now.) The lengths of these hyperellipsoid axes are proportional to i = 1. including those of the original coordinate axes..x) = c2 !I It 11 Interpretation of the Sample Principal Components The sample principal components have several interpretations. .x) = cZ 2 estimates the constant density contour (x . Similarly. the adjusted height is (height).x) are realizations of population principal components Y. It is then not possible to represent the data well in fewer than p dimensions.:: A . 
the axes of the ellipse (circle) of constant distance are uniquely determined and can lie in any two perpendicular directions. for the rounded shape of the carapace. . the sample principal components can be viewed as the result of translating the origin of the original coordinate system to x and then rotating the coordinate axes until they pass through the scatter in the directions of maximum variance.).5. If S positive definite. p. (See Section 2. 1 yd = 1e.. Supplement 8A gives a further result concerning the role of the sample principal components when directly approximating the meancentered data Xj  x. .) As we mentioned in the treatment of population components.2. I). . and their absolute values are the lengths of the projections of x .x in the directions of the axes ej. They lie along the axes of the ellipse in the perpendicular directions of variaflce. For the sample.2.448 Chaptet 8 Principal Components the first principal component may be viewed as the In (volume) of a box with adjusted dimensions. the data may be plotted as n points in pspace.(x . The data can then be expressed in the new coordinates. Consequently.p.X)'SI(X . of S. from the sample values Xj' we can approximate I' by xand I by S. Az. we can still extract the eigenvalues from S and obtain the sample principal components.
. p Znl Xl Zn2 Xl2  Xli  where (Ai. The observations in 103 successive weeks appear to be indepen dently distributed. Xz. Ap O. (n .l)S12 (n . Then Example =_l_ Z 'Z n..l)s2p (n . X22 ..4 in the Exercises. + Ap yields the sample mean vector [see (324)] 1 ' Z' 1 Z' 1=1 z=(I ) = n n n =0 (827) Using (829). . . given br.0007. . i = 1.l)Slp (n . As we have mention ed. .. . . but the rates of return across stocks are correlat ed. Citibank. Xn2 . with the matrix R in place of S. .. Xs denote observe d weekly rates of return for JP Morgan . because as one 6xpects. and ExxonMobil. Citibank . the observatlO?S are already centered by construction.2...0011. The data are listed in Table 8. p (829) vs.. vs. Z? znp X2 Xl p . . Sample variance (Yi) = Ai Sample covariance (Yi.. Also.p (830) p and sample covariance matrix [see (327)] S = _l_(Z z n1 !n'z) '(z . . . 8. Xl VS.2. the ith sample principal compon ent is = = ZI] [ZlI Z12 [zn . .l)s22 sZ2 n1 (n . equivalently.VS.. The weekly rates of return are defined as (current week closing pricep revious week closing price )/(previ ous week closing price). Zn are standard ized observations with covariance matrix R..Xz VS.observations ar:..li')'(Z .. .l)Slp =R (828) VS.!n'z) n n = _l_(Z .. This rule does not have a great deal of theoreti cal support . Wells Fargo.. = tr(R) = p = Al + A z + .Xp In addition. ZIP] '.. adjusted for stock splits and dividends.1)spp spp The sample principal components of standardized . Total (standar dized) sample variance and i.0040) . individu ally.0040.. . Yk) 0 (826) X21  vS. Royal Dutch Shell. (820).2. x' = [... we see that the proport ion of the total sample variance explaine d by the ith sample principal compon ent is Proport ion of (standar diZed») sample variance due to ith ( sample principa l compon ent i = l. Xnp .Xp i = 1.450 Chapter 8 Principal Components The n X p data matrix of standardized observations Summarizing Sample Variation by Principal Components 451 If Zl.. X2p .Xz . however..l)SI1 Sl1 (n . Let xl. and it should not be applied blindly. only those compon ents which..1 A rule of thumb suggests retaining only those compon ents whose variance s Ai are greater than unity or. .1 (n . a scree plot is also useful for selecting the appropr iate number of components. Royal Dutch Shell. e..2.. VS.l)szp (n .k = 1.) is the ith eigenvalueeigenvector pair of R with Al Az . and ExxonMobil) listed on the New York Stock Exchang e were determi ned for the period January 2004 through Decemb er 2005. Wells Fargo. respectiv ely. Z2. there is no need to wnte the components In the form of (821). .0016. explain at least a proport ion 1/p of the total variance.l)S12 (n . stocks tend to move togethe r in respons e to general economic conditions.lz') n.p Xnl  Xl VS.Xp i = 1.' .S (Sample principal components from standardized data) The weekly rates of return for five stocks (JP Morgan . ..
the smallest eigenvalueeigenvector pair should provide a clue to its existence.469.315.574 . e) = [..000 .437. there is some evidence that the corresponding population correlation matrix p may be of the "equalcorrelation" form of (815). .6925 1.000 The eigenvalues of this matrix are = 2. This component might be called a general stockmarket component.) Thus.48.574 1. xz. have interesting interpretations. Body weight (in grams) for n = 150 female mice were obtained immediately after the birth of their first four litters.000 . = 1. .000 = Xl . e4 = [ e5 = [ We note that the first eigenvalue is nearly equal to 1 + (p . = . 1.511 . = . = .400.7386 .469z 1 + . An unusually small value for the last eigenvalue from either the sample covariance or correlation matrix can indicate an unnoticed linear dependency in the data set. This interpretation of stock price behavior also has been suggested by King [12).772.115 .085. .585.342. In any event. The second component represents a contrast between the banking stocks (JP Morgan.496.95] We note that R is the covariance matrix of the standardized observations Zl 1. pages 131133.4.236z2 .629.384..501.387z4 + .452 Chapter 8 Principal Components and Summarizing Sample Variation by Principal Components 453 Example 8.h = elz = . .000 .1.7501 [ .9.Xs R = The eigenvalues and corresponding normalized eigenvectors of R. the variation in weights is fairly well explained by the first principal component with (nearly) equal coefficients.595.183 . The are small and about equal.136.071.45. . The first principal component . we see that most of the variation in these stock returns is due to market activity and uncorrelated industry activity. .368. Thus." of the five stocks.000 .322 .585z4 + .. Thus.381. determined by a computer.493) = 3. .6854) . The remaining components are not easy to interpret and.255. Citibank. ExxonMobil).109) . .606) Al = 3.7386 . R = . Rutledge.49. 4 The sample mean vector and sample correlation matrix were. although "large" eigenvalues and the corresponding eigenvectors are important in a principal component analysis.407) 100% = 77% of the total (standardized) sample variance.50z4 accounts for loo(AJ/p) % = 100(3. (See the discussion in Section 3. rounding error in the computation of eigenvalues may lead to a small nonzero value. If the linear expression relating X4 to (Xl> XZ. This notion is explored further in Example 8. A3 = . .11. It might be called an industry component.683 . The eigenvectors associated with these latter eigenvalues may point out linear dependencies in the data set that can cause interpretive and computational problems in a subsequent analysis. Although the average postbirth weights increase over time. and A4 = .6 (Components from a correlation matrix with a special structure) Geneticists are often concerned with the inheritance of characteristics that can be measured several times during an animal's lifetime.532.683 m] LOoo and x' = [39.6363] . . . although A4 is somewhat smaller than Az and A3 .093..000 . . are AI Az A3 A4 As . .49z3 + . where I' is the arithmetic average of the offdiagonal elements of R. eigenvalues very close to zero should not be routinely ignored.532z2 + . _ Comment.146 .407. .058/4)% = 76% of the total variance.213 .632 1. A2 = .6625 .511 .437 . .289.. represent variation that is probably specific to each stock.236.183 1.361z s Yz = ezz = .213 . collectively.632 .606zs These components. simply.X4 is always zero. ej = [ .08. . which account for Cl ..115 .056.322 ..1) (. 
If this occurs.363.217 e2 = [.49z1 + . and X3 are subtest scores and the total score X4 is the sum Xl + Xz + X3' Then.6329 .155 .Xz Xs . A2) 100% = C. or "index.6363 . respectively. The first component is a roughly equally weighted sum. . a market component. Wells Fargo) and the oil stocks (Royal Dutch Shell. . .465z3 + .Zz = VS.88.146 . one (or more) of the variables is redundant and should be deleted.604. Consider a situation where Xl..X3) was initially overlooked. . ..52zz + .7501 1.382. we obtain the first two sample principal components: 'vI = elz = .6625 1. I)x = Xl + X2 + X3 .465.361) . or. although the linear combination e'x = [1.315z3 + .Zs = Xz . 4Data courtesy of 1. .6329 .368z1 . .387.6925 .1)1' = 1 + (4 . 1.XI . they do not explain • much of the total sample variance.498) Using the standardized variables.1.
make QQ plots from the sample values generated by each principal component. Principal components.L _ . Also. .454 Chapter 8 Principal Components Graphing the Principal Components 455 8.51O(x2 .4 Graphing the Principal Components Plots of the principal components can reveal suspect observations. Thus. Figure 8. for the multivariate linear model. £ . ep of S. Apart from the first turtle.3..7 (Plotting the principal components for the turtle data) the plotting of principal components for the data on male turtles discussed m Example 8. Figure 8. 2. .3 The diagnostics involving principal components apply equally well to the checking of assumptions for a multivariate multiple regression modeL In fact.e· . Each observation can be expressed as a linear combination Xj = .6 Scatter plot of the principal components . The observation for the first turtle is circled and lies 10 the l0:ver nght corner of the scatter plot and in the upper right corner of the QQ plot.478) + .4.4.. Since the principal components are combinations of the original variables.P j=l J J J J (832) can be scrutinized in the same manner as those determined from a random sample.4. It may be suspect. J e· ....4. " n (831) + .2. ./ ." ••• . + will oftednlbe SUhCh t.703) 52 = 5'3 = . . (See Supplement 8A for more general approximation results.3.. These help identify suspect observations. To help check the normal assumption.4. it is often necessary to verify that the first few principal components are approximately normally distributed when they are to be used as the input for additional analyses. J 2 \ 0 2 QQ plot for the second principal component Yz from the data on male turtles.. .4.. the square of whose length is YJq + ".e·).. The three sample principal components are Yl = .703) where Xl = In (length).3 = Yjle. .725) . Yj p contnbutmg to this square engt Wl e arge.159(XI . ez.h and Yz of the data on male turtles.. • •• • •• • •• •• . 1. This point should have been checked for recording errors. . 52). Construct scatter diagrams and QQ plots for the last few principal components.478) . the plot appears to be reasonably elliptical. so the last eigenvalues will be zero._ . . derived from the covariance matrix of the residuals.1 •• • • • • • :. :V2 Figure 8. it is prudent to consider the Residual vector or = Example 8.5 shows the QQ plot for Yz and Figure 8. The plots for the other sets of principal ponents do not indicate any substantial departures from normality.622(X2 .523(X3 + ... respectively.I • :V.w fit the That is.hllabt atlleast one of the coordinates Yjq' . the magnitudes of the principal components determine how well the fe. construct scatter diagrams for pairs of the first few principal components. n . and X3 = In (height).324(X3 .04 L'_'_ _i . + Yj2 e2 + .4.3..: .qle ql differs from Xj by Yjqe q + '" + Yjpe p.478) + . + (xjep)e p .. The last principal components can help pinpoint suspect observations.:)(A . YiJeJ + Yj2 e2 + .788(X3 .. + Yj. as well as provide checks on the assumption of normality. e· .683(XI ..6 the plot of (Yl. it is not unreasonable to expect them to nearly normal. having fit any model by any method of estimation.) The following statements summarize these ideas.04 o.703) . within rounding error.713(xI .. X2 = In (width).725) Ej = Yj (pXI) (pXI) (pXI) P\ j = 1. You should be aware that there are linear dependencies among the residuals from a linear regression analysis. 
+ Yipe p of the complete set of eigenvectors el .594(X2 .725) (observation vector) _ (v(ect?r of pr)edicted) esttmated values + .S A (xjedel + (xje2)e2 + . or the turtle have been examined for structural anomalies..1 ..
. The eigenvectors determine the directions of maximum variability. A2. then is approximately Np(O.0019 • Whenever an eigenvalue is large. Since n = 103 is large. for n large.S large Sample Inferences We have seen that the eigenvalues and eigenvectors of the covariance (correlation) matrix are the essence of a principal component analysis.10.8 (Constructing a confidence interval for '\1) We shall obtain a 95% confidence interval for AI.96. we obtainP[lAi .. Approximate standard errors for the coeffis. The elements of each ei are correlated. In practice. In general. . . X k ) = p.=:. [2].. Let A be the diagonal matrix of eigenvalues Ab···' Ap of:t. all i k. Ci) extracted from S or R.'s. such as 100 or even 1000. and the correlation ?epends to a large extent on the separation of the eigenvalues AI.:t) population.0014 and in addition. we can (833) with i = 1 to construct a 95% confidence interval for Al.'s for the A.96 V ) or .). unless there is a strong reason to believe that :t has a special structure that yields equal eigenvalues. .<: Ho: P = po = and (pxp)· ..'s for the e.. decisions regarding the quality of the principal component approximation must be made on the basis of the eigenvalueeigenvector pairs (Ai. you can find some of these derivations for multivariate normal populations in [1]. E.0014 (1 + 1. Example 8. p p 1 A. Because of sainpling variation..!2 (1 . 2A 2). let then Vii (ei .456 Chapter 8 Principal Components Large Sample Inferences 457 where z(a/2) is the upper 100(a/2)th percentile of a standard normal distribution.a. and [5]. Each Ai is distributed independently of the elements of the associated ei· Result 1 implies that. Assume that the stock rates of return represent independent drawings from an N5(P.. Al = . and the eigenvalues specify the variances.ients eik are given by the square rools of the diagonal elements of (l/n) Ei where Ei is derived from Ei by substituting A. A large sample 100(1 .Ad:5 z(a/2)Ai V271i] = 1 . (2) :5 Al :5 . Using this normal distribution.· (1 + z(a/2)V271i)  A. From Exercise 8. most of the total variance can be "explained" in fewer than p dimensions. the confidence interval gets wider at the same rate that Ai gets larger.1.'s and e. Usually the conclusions for distinct eigenvalues are applied.. Even when the normal assumption is violated the confidence intervals obtained in this manner still provide some indication of the uncertainty in Ai and Ci· Anderson [2] and Girshick [5] have established the following large sample distribution theory for the eigenvalues A' = [Ab. We shall simply summarize the pertinent large sample results.a)% confidence interval for Ai is thus provided by _ _ _!. . for reasonable confidence levels. 2Ar/n) distribution.ei) is approximately Np(O. ... the intervals generated by (833) can be quite wide.) Result 2 implies that the e/s are normally distributed about the corresponding e/s for large samples. Bonferronitype simultaneous 100(1 . It must also be assumed that the (unknown) eigenvalues of :t are distinct and positive. the Ai are independently distributed. but Lawley [14] has demonstrated that an equivalent test procedure can be constructed from the offdiagonal elements of R..4. Moreover. where :t is positive definite with distinct eigenvalues Al > A2 > .4 in the Exercises. <: A·I 1 (1 . Let Vii (A  A).96 V 103 . Xn are a random sample from a normal population.z(a/2)v2fn) A test of Ho versus HI may be based on a likelihood ratio statistic. 
> Ap > o.a)% intervals for m A/s are obtained by replacing z(a/2) with z(a/2m). even though n is fairly large.0011:5 Al :5 . or Corr (Xi. Ai has an approximate N(Aj. these eigenvalues and eigenvectors will differ from their underlying population counterparts. c 1.. To test for this structure. Consequently. some care must be exercised in dropping or retaining principal components based on an examination of the A/s. z(. p of S: . The sampling distributions of Ai and Ci are difficult to derive and beyond the scope of this book. Testing for the Equal Correlation Structure The special correlation structure Cov(Xj . the variance of the first population principal component. > A5 > O. X k ) = Yajjakk p. If you are interested. When the first few eigenvalues are much larger than the rest..0014 . The one exception is the case where the number of equal eigenvalues is known. so that Al > A2 > . is one important structure in which the eigenvalues of :t are not distinct and the previous results do not apply. (See Section 5. with 95% 8. 3.··' Ap] and eigenvectors Cl.025) = 1. Ap (which IS unknown) and the sample size n. Large Sample Properties of Ai and ei Currently available results concerning large sample confidence intervals for Ai and ei assume that the observations XI' X 2. 2. Therefore.···. using the stock price data listed in Table 8.
r= p (P  2 k=I l)2:2:rik i<k Y = (4 .6 Monitoring Quality with Principal Components In Section 5. Consequently. including temperature..2. Clearly.r)2 = (.r) :)2 [2:2: (rik i<k 7')2  r k=l ± (rk .7501 . A 3 ..6625 1.. with A4 being somewhat smaller than the other two.4 T = (n (1 .0 . Example 8.p HJ:p 7: p.P . YiI = el(xi .0 .00245 1 i=I i. If a process is stable over time.(1 .6625) = .1329)(. the first two explain tlIe largest cumulative proportion of the total sample variance.01277 .2)/2 = 5(2)/2 = 5. We consider the first two sample principal components. Using (834) and (835).x).1f[1 .6363 . so the evidence against Ho (equal correlations) is strong.9. Xn be a random sample from a multivariate normal distribution with mean p.6625 63 . We shall use this correlation matrix to illustrate the large sample test in (835).6625 . and we set H.1329 = . As we saw in Example 8. .6329 + .r)2] 2 P . Additional principal components could be considered. X 2.6925 1.2)(1 .00245)] = 11. concentration..0 l 8. Today. pressure. .6925 1.. .6329 + . the smallest eigenvalues A 2.6925 + .6.7386 + . Even witlI 10 variables to monitor.6855)2] 4 .6329 .(1 . Yi2) for j = 1. To monitor quality using principal components.7501 ..x) and Yi2 = eZ(xi .. The large sample approximate 'alevel test is to reject Ho in favor of HI if = (1 (150 . of any two components.1 A 2: (rk . the sample correlation matrix constructed from the n = 150 postbirth weights of female mice is _ R  ['0 . at various positions along the production process.f.. and covariance matrix l:. Here p = 4. .. r4 = .7501 .. + (.6626.458 Chapter 8 Principal Components Monitoring Quality with Principal Components 459 4 Lawley's procedure requires the quantities rk = . there are 45 pairs for which to create quality ellipses..07.6329 .6363 + . a large sample test that all variables are independent (all the offdiagonal elements of l: are zero) is contained in Exercise 8.mal elements.6855f y = (p 1f[1 ... Since (p + 1) (p . p '* Po p 1 p p 1 :l Checking a Given Set of Measurements for Stability Let Xl.6.6731 . 2. Conversely.6855 2:2: (rik .6855f = + . the 5% critical value for the test in (835) is = 11. The first part of the procedure is to construct an ellipse format chart for the pairs of values (Yjl.7501 + .r)2] > XtP+I)(p2)/2(a) where XtP+I)(p2)/2(a) is the upper (100a)th percentile of a chisquare distribution with (p + 1)(p . we obtain rI = '1 3 (.6855f + . so that the measured characteristics are influenced only by variations in common causes. . with the large sample size in this problem. another approach is required to both visually display important quantities and still have the sensitivity to detect special causes of variation. and weight. The value of our test statistic is approximately equal to the large sample 5% critical point.6855)2 .k 2: 'ik p k = 1.6. we consider a twopart procedure.6731..r) and T It is evident that rk is the average of the offdiagonal elements in the kth column (or row) of Rand r is the overall average of the offdiag<.6855)2 [.6791 4(3) r2 = .7386 .2)(1 . + (.7501 + .6791 . and A4 are slightly different.(2.(4 . n. r3 = . Major chemical and drug companies report measuring over 100 process variables.. it is not uncommon for data to be collected on 10 or 20 process variables.7271. including the quality ellipse and the T2 chart.01277 . small differences from the equal correlation structure show up as statistically significant.9 (Testing for equicorrelation structure) From Example 8. we introduced multivariate control charts. 
witlI electronic and other automated methods of data collection.. if tlIe principal components remain stable over time. tlIe common effects that influence tlIe process are likely to remain constant. _ Assuming a multivariate normal population..7386 @ .p.. r = _2_ (. then the values of the first two principal components should be stable.6363) = .(p .1) .2)/2 d.6855)2 = 2. but not overwhelming. but two are easier to inspect visually and.6329 .6855)2 i<k + (. .7')2 = (.
206 '628.048 .582 .7 521.2 61.6 563./L)'elel + (X .046 .7 1045.2 686.7 782.5 135. we see that this is the value 3936.1.151 .629 . Although n = 16 is not large.9 2143. + Y2 .10 (An ellipse format chart based on the first two principal components) Refer to the police department overtime data given in Table 5.5 165.8 1142.9 170.734 .1 contains the five normalized eigenvectors and eigenvalues of the sample covariance matrix S.9. M § • 0 • The first two sample components explain 82 % of the total variance./L.2 This ellipse centered at (0.3 2366.9 89.2 340.2 878.129 Let us construct a 95% ellipse format chart using the first two sample principal components and plot the 16 pairs of component values in Table 8.7 296. we use = 5./L)'e3e3 + .9 2153.397 .7 213.2 1917.586 221.069 .5 38./L)'eZ e 2 + (X .9 for period 11. and the sample variance of the second principal component is the secondlargest eigenvalue '\2' The two sample components are uncorrelated..5 148.985 . Scanning Table 8.).8 2787. This chart is created from the information in the principal components not involved in the ellipse format chart.Al A2 Example 8. is shown in Figure 8. because the second principal component for this point has a large value.643 . • In the event that special causes are likely to produce shocks to the system.3 270. • • • • • • • • Variable Appearances overtime Extraordinary event Holdover hours COA hours Meeting hours e) e2 .8 68.5 184. a second chartis required..0).9 256.658 .4 620. Xj .3 7.8 883.3 283. Table 8. • so the quality ellipse for n large (see Section 5.039 .7 177. Table 8.5 142.7.2.99 Al A Period 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Yjl Yj2 Yj3 Yj4 YjS 2044.1 209.4 761.7 281.2.4 276.6 196. and the ellipse becomes Table 8. Aj 1.6 1437.8 441.0 244. Even without the normal assumption.5 82.6 28.99.7 The 95% control ellipse based on the first two principal components of overtime hours. X . the second part of our twopart procedurethat is.2 418./L can be expressed as the sum of its projections on the eigenvectors of I.2.4 3936.6 6115 202. I.1 Eigenvectors and Eigenvalues of Sample Covariance Matrix for Police Department Data 8 S I . + (X  /L)'epe p .3 437.6 450.3 443. According to the entries of e2 in Table 8.077 .8 84..429.4 66. The principal component approach has led us to the same conclusion we came to in Example 5.8 425.2 403.107 e3 .9 250.1 565..8 563. '2 '2 • .824 § '" I 2000 (x) (xz) (X3) (X4) (xs) .5 545.081 e4 . The sample values for all five components are displayed in Table 8. the sample variance of the first principal component YI is given by the largest eigenvalue AI. Consider the deviation vector X . . and assume that X is distributed as Np(/L.155 2.2 325.3 125. along with the data.8 2186.9 132.8 2187.250 .7 588.2 Values of the Principal Components for the Police Department Data :1 + :2z 'z 'z :5 5.0 115.6 707.503 .6 344..3 159.8 0. • +. One point is out of control.213 .2 464.1 189..460 Chapter 8 Principal Components Monitoring Quality with Principal Components 461 By (820).1 1988.rz) such that YI < X2 2( a ) .0 373.6) reduces to the collection of pairs of.9 736.8 801.7 1193.770.138 es .2 169..007 .8.392 .6 966.432 .2 182./L = (X .226 o 2000 4000 F·igure 8. the second principal component is essentially extraordinary event overtime hours.6 498.107 .784 99.· possible values CYI.
+ . Recall that Var (Y. the principal components do not have a normal even when the tion is normal. Y3j is large in period 12. equivalently.T .'. Consequently. base it on these last prmclpal components.8.. we considered checking whether a given series of multivariate observations was stable by considering separately the first two principal components and then the last p .. chart on the approxImatIOn errors... epJ be the orthogonal matrix whose columns are the eigenvectors The orthogonal transformation of the unexplained part. . + Ypep !I . and COA hours..Y1ej . no further modifications are necessary for monitoring future values. when p = 20 variables are measured..2. holdover. A1/2y. . For example. + _ _ A3 A4 Ap (838) o  This is just the sum of the squares of p .2 population principal components. the principal components and eigenval ueS must be estimated. .81. To illustrate the second step of the twostep monitoring procedure.2 degrees of freedom. T} = . the quality ellipse based on the first two principal components..) = Ai for i = 1. The approximation to X .8 A T chart based on the last three principal components of overtime hours. and so has a chisquare distribution with p . this T2 based on the unexplamed variation in the original observations is reported as highly effective in picking up special causes of variation. and from Table 8.A3 Yj3 '2 + . The T 2chart is shown in Figure 8. Since points 12 and 13 exceed or are near the upper control limit. we can.p.. Using the eigenvalues and the values of the principal componentl'.891 and so on. it is usual to appeal to the large sample approximation described by (838) and set the upper control 2 limit of the T 2chart as UCL = c = a).. given in Example 8.462 Chapter 8 Principal Components or Monitoring Quality with Principal Components 463 Example 8. . Consider the quality control analysis of the police department overtime hours in Example 8.. This T 2statistic is based on highdimensional data.p. Yk ) = 0 for i k. something has happened during these periods. . We note that they are just beyond the period in which the extraordinary event overtime hours peaked. Further. was shown in Figure 8. we plot the time sequence of values E'(X  m [!].1.10.by the first two pnnclpal components has the form Y1el + Y2e2' This leaves an unexplained component of !i I: 1\ X . the statistic based on the last p .2.11 (A T 2 chart for the unexplained [orthogonal] overtime hours) x11 p = Yjel + Y2e2 + Y3e3 + . Since p = 5. Was there some adjust_ ing of these other categories following the period extraordinary hours peaked? Controlling Future Values Previously.2 = 3 dimensions. From Table 8.2.7. Because the chisquare distribution was used to approximate the UCL of the T 2chart and the critical distance for the ellipse format chart. P and Cov(Yi.p) I ei is the population ith principal centered to have mean O.2 independent standard normal variables.I 11 11 where Yi = (X . it is customary to create a T chart based on the statistic o 2 5 10 15 Period Figure 8. it uses the information in the 18dimensional space per?endicular to the first two eigenvectors el and e2' Still.10. However.. the large coefficients in e3 belong to legal appearances. becomes f. Rather than base the. The first part of the quality monitoring procedure... k Ink of the sample data.[1] m UJ Y.A4 Ap Yj4 '2 Yjp '2 which involves the estimated eigenvalues and vectors.. and the upper control limit is = 7. 4 "* 2 +_ + . 
6 J = '2 Yj3 ' + '2 Yj4 A + '2 YjS A A3 A4 As where the first value is T2 = .Y 2e2 Let E = [el.. we create the chart for the other principal components. this chart is based on 5 . so the last p . + . e2.2 principal components are obtained as 2an orthogonal of the approximation errors. Because the coefficients of the linear combinations ej are also estimates.
X .Yjlel . In particular. at least 50 or more observations are needed. Typically.05 or . These include numerous process variables as well as quality variables. We drop this point and recalculate eigenvalues and eigenvectors based on the covariance of the remaining 15 tions. that avoids the difficulty caused by dividing a small squared principal by a very small eigenvalue.Yj2 2) e Note that. • te o •• • • The upper control limit is then cx.Yjlel . An alternative approach (see [13]) to constructing a control chart. in order to set future limits. .9 A 99% ellipse . For each stable observation.. the space orthogonal to the first few principal components has a dimension greater than 100 and some of the eigenvalues are very small. we proceed as follows. The component consisting primarily extraordinary event overtime is now the third principal component and is not inclUded in the chart of the first two. we set du 2" 2 = 1 "'' d u j = C IJ n j=l 8 0 '" • S <. where a = . take the sum of squares of its unexplained component Appearances overtime Extraordinary event Holdover hours COA hours Meeting hours (Xl) (X2) (X3) (X4) (xs) db j = (Xj . we also have The principal components have changed.X . the constant c and degrees of freedom IJ are chosen to match the sample mean and variance of the db j. j = 1. Because our initial sample size is only 16. ..3. The results are shown in Table 8. more than 100 variables are monitored simultaneously.Yj2 e 2) . . To implement thIS approach.464 Chapter 8 Principal Components In Example 8.2. the dbj are plotted versus j to create a control chart. Figure 8.12 (Control ellipse for future principal components) Monitoring Quality with Principal Components 465 . Using either form. The lower limit of the chart is 0 and the upper limit is set by approximating the distribution of db j as the distribution of a constant c times a chisquare random variable with IJ degrees of freedom. Usually.01.9 gives the 99% prediction (836) ellipse for future pairs of values for the new first two principal components of overtime. n. The 15 stable pairs of principal components are also shown. 5000 2000 2000 4000 Figure 8. determined that case 11 was out of control.J:: 0 0 0 and detennine • •• • • § I • . from stable operation of the process. which is just the sum of squares of the neglected principal components. by inserting EE! = I.(a). In applications of multivariate control in the chemical and pharmaceutical mdustnes. Example 8. (Xj . .. For the chisquare approximation. has been successfully applied. format chart for the first two principal components of future values of overtime.10. dropping a single case can make a substantial difference.
.=1 L (Xj .i .X.. . .i) so the error sum of squares is + UU'(Xj . Xn .. e2(Xj .i)] tr[UU'(xj .X)'UIUI + (Xj ..i)'(1 ...UV')(Xj ..x) IS the projectIOn of x.UU') U = U . Yj2.x) (SA2) This follows because.i) + U(U'(Xj .UU'(Xj .i)'UU'(x. or proo:: (Xj . Moreover. .' .x) l' of the first r sample principal components for the jth unit. .Ur] [ UHXj ...UU'(x..b j )' (U' (Xj . + }ljrer j=1 = (n .d . we shall present interpretations for approximations to the data based on the first r sample principal components. = so the jth column of its transpose A' is 8j = hlel + 466 t ± ' 2: (Xj .i) .UU') (Xj . Ur (see Result 2A.x) (SA3) . n)..XI·S bes t approxlma I S projectIon on the space spanned by U(.. The ..i) .1) (A +1 r + . Further..12).x) : =UU'(xji) u..U_ == 0... + A) p where Ar+1 . _. Let Er = .. for an arbitrary vector b.i . Ap are the smallest eigenvalues of S.i)'u r U r = [UbU2. ' Un so that u2. an]' (nXp) to the mean corrected data matrix [Xl . Xj .i)'UU'(x . .UV') (x . .=1 2: (Xj  n " i)' (I . YjrJ' = [el(Xj .. chosen so that b. X2 .[u(. u r] satIsfies U'U = I.x) .(Xj .i .x  UU'(Xj . ..x) . We consider approximations of the form A = [ab a2.=1 Yj2 2 n i)'UU'(xj . .=1 L (Xj .X.b j) where the cross product vanishes because (I .i . .= U'(x.x» = [eb e2.1) tr[U'SU] (SA4) .3).b j ) (Xj .set of r perpendicular vectors UI. ..x) = = ± L n tr[(x. .x) (Xj .i)'] ' j=I" e + . .i) (Xj .x). .···. The interpretations of both the pdimensional scatter plot and the ndimensional representation rely on the algebraic result that follows.Ubj) = (Xj . The error of approximation sum of squares in (8Al) is minimized by the choice . t e db y .t . b j IS ..x).i . .UU'U = U . er].x  8j)' (Xj  x 8j) = (n .1) tr[UU'S] = (n . h last term· (SA 3) B h YmaxlmlZmg t e m . ..I j=1 ' We are now in a position to minimize the error over choices of U b . are the .X)l uz(Xj . THE GEOMETRY OF THE SAMPLE PRINCIPAL COMPONENT ApPROXIMATION In this supplement. I Let A be any matrix with rank(A) (nXp) r< min (p. .i)(x. where ei is the ith eigenvector of S.x) +0 + (V' (Xj .Ubj = Xj .Ubj = (I ... .i on the plane' . y t e properties of trace (see Result 2A. + (x..The Geometry of the Sample Principal Component Approximation 467 Supplement where [Yjl.. last t IS and UbI UU (Xj . .. .X)'U2U2 + .. . u2. For fixed U ' x. .. with the choice a= Ub= UV"( xx) (SA 1) b· n ' " ' ecomes !r:n Result SA.i)' (Xj .x) ...X]' The error of approximation is quantified as the sum of the np squared errors (SAI) C?nsider first any A whose transpose A' has columns a· that are a linear matlOn of a flXe..i»' (x.Ubj)'(xj ..
x. gives 01 = el' For perpendicular to by e2.. From (S19)..• .468 Chapter 8 Principal Components The Geometry of the Sample Principal Component Approximation 469 3 That is. so the plane passes through the sample mean.L (Xji j=1 n Xi  ci bj )2 Considering ( An to be of rank one we conclude from Result SA . by columns. Also. The plane through the origin.(xj  " A x» by Result SA.i) (Xj ." . n Fr':. = j=1 n [ n L (Xj . Now.b) = a' + Ubi.l)S.=1 ) J " L j=1 x Ubj ) + n(x . determined by uJ. translated to pass through a..X.. bIll.X. v = 0 and the sum a/the squared lengths a/the projection d ev la t/O I1S J1 2. becomes a + Ub for some b. e2. r."" u" consists of all points x with for some b This plane. With this choice the ith diagonal element of U'SU is e:Sei = e:p'iei) = Ai so e].i on the plane Ub is Vj : l!U (Xj . = (a + Ub) + U(bj . Let U = U in (SA3). U2. 1 that xp)' i=l b· = nb <F 0. The investigator places a plane through xand moves it about to obtain the largest spread among the shadows of the 5 If LT = .i)'UU'(xj 0.v)' = " L j=1 j=1 L L n J J (Xj  a . e2"'" er' The coefficients of ek are ek(xj . use a + Ubj.a .x) = tr L (Xj ."" er] = Er and A' = ErE. .Xi.x) = Jjb the kth sample principal component evaluated at the jth observation.2 the projection of the deviation x.a) The nDimensional Geometrical Interpretation Let now consider. ith column [Xli . " squared distances L dJ between the observations Xj and the plane.1.Xi' X2i . L It v)(Vj . If Xj is approxij=1 .) = ( 11 _ 1 1) tr [" Vjvj ] = j=1 (11  1 1) tr [" v'v· ] .x). An alternative interpretation can be given.Ubi) = = and this plane also maximizes the total variance L j=1 n (Xj (Xj  xx Ubj + ia)' (Xj Ubj)' (Xj " A x Ubj + x a) tr(S.m (8A.. .] = A' has rank (A) :. d" A . [See (252). Also. the first diagonal element of U' SU. since [Ublo'''' Ub. L (v. The approximating plane interpretation of sample principal components IS illustrated in Figure S.i)' j=1 (n .i»'(xj  x ErE. . + error bound follows.as Il asserted.1) tr(S) = (n .] Continuing.l)(Al + A2 + .1) tr[U'SU] is maximized by U E.? vjVj = = 11 J=1 (Xj . A/?). J...10.' . since v = = j=1 " mated by a + Ubj with L b j = j=1 0.. .. . For: = 1. We want to select the rdimensional plane a + Ub that minimizes the sum of j=J 2: dY.Ubj)'(xj . n x) = (n . the approximation of the meancentered data matnx by A. + A r .. The square of the length of the error of approximation is 2: L n (Xj  j=1 x ErE.i)' (Xj .[Xl . selecting DJ to maximize D1SD].a)' (x ... X x]. the best choice for U maximizes the sum of the diagonal elements of U'SU.(xj . ... This plane is determined bye]. . " tr [U'siJ] = AJ + A2 + . b2 . The lower bound is reached by taking a = x.]' is approximated by a multIple cib of a fixed vector b' = [b].. X"i .. X2 . we find that U = [e].5 then (n .____+_ 2 Figure 8_10 The r = 2dimensional plane that approximates the scatter plot by minimizing The pDimensional Geometrical Interpretation The geometrical interpretations involve the determination of best approximating planes to the pdimensional scatter plot. and the • r .
Exercises

8.1. Determine the population principal components Y₁ and Y₂ for the covariance matrix

  Σ = [5 2; 2 2]

Also, calculate the proportion of the total population variance explained by the first principal component.

8.2. Convert the covariance matrix in Exercise 8.1 to a correlation matrix ρ.
(a) Determine the principal components Y₁ and Y₂ from ρ and compute the proportion of total population variance explained by Y₁.
(b) Compare the components calculated in Part a with those obtained in Exercise 8.1. Are they the same? Should they be?
(c) Compute the correlations ρ(Y₁, Z₁), ρ(Y₁, Z₂), and ρ(Y₂, Z₁).

8.3. Let

  Σ = [2 0 0; 0 4 0; 0 0 4]

Determine the principal components Y₁, Y₂, and Y₃. What can you say about the eigenvectors (and principal components) associated with eigenvalues that are not distinct?

8.4. Find the principal components and the proportion of the total population variance explained by each when the covariance matrix is

  Σ = [σ² σ²ρ 0; σ²ρ σ² σ²ρ; 0 σ²ρ σ²],  −1/√2 < ρ < 1/√2

8.5. (a) Find the eigenvalues of the correlation matrix

  ρ = [1 ρ ρ; ρ 1 ρ; ρ ρ 1]

Are your results consistent with (8-16) and (8-17)?
(b) Verify the eigenvalue-eigenvector pairs for the p × p matrix ρ given in (8-15).

8.6. Data on x₁ = sales and x₂ = profits for the 10 largest companies in the world were listed in Exercise 1.4 of Chapter 1. From Example 4.12,

  x̄ = [155.60, 14.70]′,  S = [7476.45 303.62; 303.62 26.19]

(a) Determine the sample principal components and their variances for these data. (You may need the quadratic formula to solve for the eigenvalues of S.)
(b) Find the proportion of the total sample variance explained by ŷ₁.
(c) Sketch the constant density ellipse (x − x̄)′S⁻¹(x − x̄) = 1.4, and indicate the principal components ŷ₁ and ŷ₂ on your graph.
(d) Compute the correlation coefficients r(ŷ₁, xₖ), k = 1, 2. What interpretation, if any, can you give to the first principal component?

8.7. Convert the covariance matrix S in Exercise 8.6 to a sample correlation matrix R.
(a) Find the sample principal components ŷ₁, ŷ₂ and their variances.
(b) Compute the proportion of the total sample variance explained by ŷ₁.
(c) Compute the correlation coefficients r(ŷ₁, zₖ), k = 1, 2. Interpret ŷ₁.
(d) Compare the components obtained in Part a with those obtained in Exercise 8.6(a). Are they the same? Given the original data displayed in Exercise 1.4, do you feel that it is better to determine principal components from the sample covariance matrix or the sample correlation matrix? Explain.
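A numerical check of Exercise 8.6(a), (b), and (d) takes only a few lines. This is a minimal sketch, assuming the values of S as transcribed in the exercise above.

import numpy as np

S = np.array([[7476.45, 303.62],
              [ 303.62,  26.19]])

lam, E = np.linalg.eigh(S)
lam, E = lam[::-1], E[:, ::-1]                 # sort descending

print("variances of y1, y2:", lam)
print("proportion explained by y1:", lam[0] / lam.sum())

# Correlations between the first component and each variable, r(y1, xk)
r_y1_x = E[:, 0] * np.sqrt(lam[0]) / np.sqrt(np.diag(S))
print("r(y1, xk):", r_y1_x)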
8.8. Use the results in Example 8.5.
(a) Compute the correlations r(ŷᵢ, zₖ) for i = 1, 2 and k = 1, 2, ..., 5. Do these correlations reinforce the interpretations given to the first two components? Explain.
(b) Test the hypothesis

  H₀: ρ = ρ₀ = [1 ρ ρ ρ ρ; ρ 1 ρ ρ ρ; ρ ρ 1 ρ ρ; ρ ρ ρ 1 ρ; ρ ρ ρ ρ 1]

versus H₁: ρ ≠ ρ₀ at the 5% level of significance. List any assumptions required in carrying out this test.

8.9. (A test that all variables are independent.)
(a) Consider that the normal theory likelihood ratio test of H₀: Σ is the diagonal matrix

  Σ₀ = diag(σ₁₁, σ₂₂, ..., σₚₚ),  σᵢᵢ > 0

Show that the test is as follows: Reject H₀ if

  Λ = |S|^(n/2) / Πᵢ₌₁ᵖ sᵢᵢ^(n/2) = |R|^(n/2) < c

For a large sample size, −2 ln Λ is approximately χ² with p(p − 1)/2 degrees of freedom. Bartlett [3] suggests that the test statistic −2[1 − (2p + 11)/6n] ln Λ be used in place of −2 ln Λ. This results in an improved chi-square approximation. The large sample α critical point is χ²(α) with p(p − 1)/2 degrees of freedom. Note that testing Σ = Σ₀ is the same as testing ρ = I.
(b) Show that the likelihood ratio test of H₀: Σ = σ²I rejects H₀ if

  Λ = |S|^(n/2) / (tr(S)/p)^(np/2) = [Πᵢ₌₁ᵖ λ̂ᵢ / ((1/p) Σᵢ₌₁ᵖ λ̂ᵢ)ᵖ]^(n/2)
    = [geometric mean λ̂ᵢ / arithmetic mean λ̂ᵢ]^(np/2) < c

For a large sample size, Bartlett [3] suggests that

  −2[1 − (2p² + p + 2)/6pn] ln Λ

is approximately χ² with (p + 2)(p − 1)/2 degrees of freedom. Thus, the large sample α critical point is χ²(α) with (p + 2)(p − 1)/2 degrees of freedom. This test is called a sphericity test, because the constant density contours are spheres when Σ = σ²I.

Hint:
(a) max L(μ, Σ) is given by (5-10), and max L(μ, Σ₀) is the product of the univariate likelihoods,

  max Πᵢ₌₁ᵖ (2π)^(−n/2) σᵢᵢ^(−n/2) exp[−Σⱼ₌₁ⁿ (xⱼᵢ − μᵢ)²/2σᵢᵢ]

Hence μ̂ᵢ = (1/n) Σⱼ₌₁ⁿ xⱼᵢ and σ̂ᵢᵢ = (1/n) Σⱼ₌₁ⁿ (xⱼᵢ − x̄ᵢ)². The divisors n cancel in Λ, so S may be used.
(b) Verify σ̂² = [Σⱼ₌₁ⁿ (xⱼ₁ − x̄₁)² + ... + Σⱼ₌₁ⁿ (xⱼₚ − x̄ₚ)²]/np under H₀. Again, the divisors n cancel in the statistic, so S may be used. Use Result 5.2 to calculate the chi-square degrees of freedom.
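Both statistics in Exercise 8.9 are straightforward to compute. The sketch below is ours, using NumPy and SciPy; it implements the two Bartlett-corrected statistics exactly as written above.

import numpy as np
from scipy.stats import chi2

def independence_test(R, n):
    """Bartlett-corrected test of H0: rho = I from Exercise 8.9(a)."""
    p = R.shape[0]
    stat = -(n - (2 * p + 11) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return stat, df, chi2.ppf(0.95, df)

def sphericity_test(S, n):
    """Likelihood ratio test of H0: Sigma = sigma^2 I from Exercise 8.9(b)."""
    p = S.shape[0]
    lnLambda = (n / 2) * (np.log(np.linalg.det(S)) - p * np.log(np.trace(S) / p))
    stat = -2 * (1 - (2 * p**2 + p + 2) / (6 * p * n)) * lnLambda
    df = (p + 2) * (p - 1) / 2
    return stat, df, chi2.ppf(0.95, df)

In each case H₀ is rejected at the 5% level when the statistic exceeds the returned critical point.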
The following exercises require the use of a computer.

8.10. The weekly rates of return for five stocks listed on the New York Stock Exchange are given in Table 8.4. (See the stock-price data on the following website: www.prenhall.com/statistics.)
(a) Construct the sample covariance matrix S, and find the sample principal components in (8-20). (Note that the sample mean vector x̄ is displayed in Example 8.5.)
(b) Determine the proportion of the total sample variance explained by the first three principal components. Interpret these components.
(c) Construct Bonferroni simultaneous 90% confidence intervals for the variances λ₁, λ₂, and λ₃ of the first three population components Y₁, Y₂, and Y₃.
(d) Given the results in Parts a-c, do you feel that the stock rates-of-return data can be summarized in fewer than five dimensions? Explain.

Table 8.4 Stock-Price Data (Weekly Rate of Return)
Week | JP Morgan | Citibank | Wells Fargo | Royal Dutch Shell | Exxon Mobil
[Returns for weeks 1-10 and 94-103 of the n = 103 weeks; the complete data set is available at www.prenhall.com/statistics.]
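For Exercise 8.10(c), the large-sample Bonferroni intervals of Section 8.5 can be computed directly from the sample eigenvalues. A sketch under that assumption, with the interval form λ̂ᵢ/(1 + z(α/2m)√(2/n)) ≤ λᵢ ≤ λ̂ᵢ/(1 − z(α/2m)√(2/n)); the function name is ours.

import numpy as np
from scipy.stats import norm

def bonferroni_ci(lam_hat, n, m, alpha=0.10):
    """Simultaneous large-sample CIs for the first m eigenvalues."""
    z = norm.ppf(1 - alpha / (2 * m))       # Bonferroni-adjusted normal point
    half = z * np.sqrt(2.0 / n)
    lam = np.asarray(lam_hat)[:m]
    return lam / (1 + half), lam / (1 - half)

# e.g., lower, upper = bonferroni_ci(lam_hat, n=103, m=3) for part (c)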
8.11. Consider the census-tract data listed in Table 8.5. Suppose the observations on X₅ = median value home were recorded in ten thousands, rather than hundred thousands, of dollars; that is, multiply all the numbers listed in the sixth column of the table by 10.
(a) Construct the sample covariance matrix S for the census-tract data when X₅ = median value home is recorded in ten thousands of dollars. (Note that this covariance matrix can be obtained from the covariance matrix given in Example 8.3 by multiplying the off-diagonal elements in the fifth column and row by 10 and the diagonal element s₅₅ by 100. Why? See the sketch following this exercise group.)
(b) Obtain the eigenvalue-eigenvector pairs and the first two sample principal components for the covariance matrix in Part a.
(c) Compute the proportion of total variance explained by the first two principal components obtained in Part b. Calculate the correlation coefficients r(ŷᵢ, xₖ), and interpret these components if possible. Compare your results with the results in Example 8.3. What can you say about the effects of this change in scale on the principal components?

Table 8.5 Census-Tract Data
Tract | Total population (thousands) | Professional degree (percent) | Employed age over 16 (percent) | Government employment (percent) | Median home value ($100,000)
[Values for tracts 1-10 and 52-61 of the 61 tracts; observations from adjacent census tracts are likely to be correlated, so these 61 observations may not constitute a random sample.]

8.12. Consider the air-pollution data listed in Table 1.5. Your job is to summarize these data in fewer than p = 7 dimensions if possible. Conduct a principal component analysis of the data using both the covariance matrix S and the correlation matrix R. What have you learned? Does it make any difference which matrix is chosen for analysis? Can the data be summarized in three or fewer dimensions? Can you interpret the principal components?

8.13. In the radiotherapy data listed in Table 1.7 (see also the radiotherapy data on the website www.prenhall.com/statistics), the n = 98 observations on p = 6 variables represent patients' reactions to radiotherapy.
(a) Obtain the covariance and correlation matrices S and R for these data.
(b) Pick one of the matrices S or R (justify your choice), and determine the eigenvalues and eigenvectors. Prepare a table showing, in decreasing order of size, the percent that each eigenvalue contributes to the total sample variance.
(c) Given the results in Part b, decide on the number of important sample principal components. Is it possible to summarize the radiotherapy data with a single reaction-index component? Explain.
(d) Prepare a table of the correlation coefficients between each principal component you decide to retain and the original variables. If possible, interpret the components.
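The "Why?" in Exercise 8.11(a) is the identity Cov(DX) = D S D′ for a diagonal scaling matrix D. A minimal sketch of this fact; the 61 × 5 array below is a random stand-in, not the census-tract values.

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(61, 5))              # placeholder for the census-tract data
S = np.cov(X, rowvar=False)

# Rescale the fifth variable by 10: the fifth row and column of S are
# multiplied by 10, and s55 by 100, without recomputing the covariances.
D = np.diag([1.0, 1.0, 1.0, 1.0, 10.0])
S_scaled = D @ S @ D

X_scaled = X.copy()
X_scaled[:, 4] *= 10.0
print(np.allclose(np.cov(X_scaled, rowvar=False), S_scaled))   # True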
8.14. Perform a principal component analysis using the sample covariance matrix of the sweat data given in Example 5.2. Construct a Q-Q plot for each of the important principal components. Are there any suspect observations? Explain.

8.15. The four sample standard deviations for the postbirth weights discussed in Example 8.6 are

  √s₁₁ = 32.9909, √s₂₂ = 33.5918, √s₃₃ = 36.5534, and √s₄₄ = 37.3517

Use these and the correlations given in Example 8.6 to construct the sample covariance matrix S. Perform a principal component analysis using S.

8.16. Over a period of five years in the 1990s, yearly samples of fishermen on 28 lakes in Wisconsin were asked to report the time they spent fishing and how many of each type of game fish they caught. Their responses were then converted to a catch rate per hour for

  x₁ = Bluegill  x₂ = Black crappie  x₃ = Smallmouth bass
  x₄ = Largemouth bass  x₅ = Walleye  x₆ = Northern pike

The estimated correlation matrix (courtesy of Jodi Barnet)

  R = [1 .4919 .2636 .4653 −.2277 .0652;
       .4919 1 .3127 .3506 −.1917 .2045;
       .2636 .3127 1 .4108 .0647 .2493;
       .4653 .3506 .4108 1 −.2249 .2293;
       −.2277 −.1917 .0647 −.2249 1 −.2144;
       .0652 .2045 .2493 .2293 −.2144 1]

is based on a sample of about 120. (There were a few missing values.)
Fish caught by the same fisherman live alongside of each other, so the data should provide some evidence on how the fish group. The first four fish belong to the centrarchids, the most plentiful family. The walleye is the most popular fish to eat.
(a) Comment on the pattern of correlation within the centrarchid family x₁ through x₄. Does the walleye appear to group with the other fish?
(b) Perform a principal component analysis using only x₁ through x₄. Interpret your results.
(c) Perform a principal component analysis using all six variables. Interpret your results. (A sketch of such an analysis follows.)
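The analyses in Exercise 8.16(b) and (c) can be run directly on the correlation matrix. A minimal sketch, assuming the entries of R as transcribed above.

import numpy as np

R = np.array([[1.0000,  .4919,  .2636,  .4653, -.2277,  .0652],
              [ .4919, 1.0000,  .3127,  .3506, -.1917,  .2045],
              [ .2636,  .3127, 1.0000,  .4108,  .0647,  .2493],
              [ .4653,  .3506,  .4108, 1.0000, -.2249,  .2293],
              [-.2277, -.1917,  .0647, -.2249, 1.0000, -.2144],
              [ .0652,  .2045,  .2493,  .2293, -.2144, 1.0000]])

for Rsub in (R[:4, :4], R):                 # part (b), then part (c)
    lam, E = np.linalg.eigh(Rsub)
    lam, E = lam[::-1], E[:, ::-1]
    print(np.round(lam / lam.sum(), 2))     # proportion of variance per component
    print(np.round(E[:, 0], 2))             # first component coefficients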
8.17. Using the data on bone mineral content in Table 1.8, perform a principal component analysis of S.

8.18. The data on national track records for women are listed in Table 1.9.
(a) Obtain the sample correlation matrix R for these data, and determine its eigenvalues and eigenvectors.
(b) Determine the first two principal components for the standardized variables. Prepare a table showing the correlations of the standardized variables with the components, and the cumulative percentage of the total (standardized) sample variance explained by the two components.
(c) Interpret the two principal components obtained in Part b. (Note that the first component is essentially a normalized unit vector and might measure the athletic excellence of a given nation. The second component might measure the relative strength of a nation at the various running distances.)
(d) Rank the nations based on their score on the first principal component. Does this ranking correspond with your intuitive notion of athletic excellence for the various countries?

8.19. Refer to Exercise 8.18. Convert the national track records for women in Table 1.9 to speeds measured in meters per second. Notice that the records for 800 m, 1500 m, 3000 m, and the marathon are given in minutes. The marathon is 26.2 miles, or 42,195 meters, long. Perform a principal component analysis using the covariance matrix S of the speed data. Compare the results with the results in Exercise 8.18. Do your interpretations of the components differ? If the nations are ranked on the basis of their score on the first principal component, does the subsequent ranking differ from that in Exercise 8.18? Which analysis do you prefer? Why?

8.20. The data on national track records for men are listed in Table 8.6. (See also the data on national track records for men on the website www.prenhall.com/statistics.) Repeat the principal component analysis outlined in Exercise 8.18 for the men. Are the results consistent with those obtained from the women's data?

8.21. Refer to Exercise 8.20. Convert the national track records for men in Table 8.6 to speeds measured in meters per second. Notice that the records for 800 m, 1500 m, 5000 m, 10,000 m, and the marathon are given in minutes. The marathon is 26.2 miles, or 42,195 meters, long. Perform a principal component analysis using the covariance matrix S of the speed data. Compare the results with the results in Exercise 8.20. Which analysis do you prefer? Why? (A sketch of the conversion follows Exercise 8.22.)

8.22. Consider the data on bulls in Table 1.10. Utilizing the seven variables YrHgt, FtFrBody, PrctFFB, Frame, BkFat, SaleHt, and SaleWt, perform a principal component analysis using the covariance matrix S and the correlation matrix R. Your analysis should include the following:
(a) Determine the appropriate number of components to effectively summarize the sample variability. Construct a scree plot to aid your determination.
(b) Interpret the sample principal components.
(c) Do you think it is possible to develop a "body size" or "body configuration" index from the data on the seven variables above? Explain.
(d) Using the values for the first two principal components, plot the data in a two-dimensional space with ŷ₁ along the vertical axis and ŷ₂ along the horizontal axis. Can you distinguish groups representing the three breeds of cattle? Are there any outliers?
(e) Construct a Q-Q plot using the first principal component. Interpret the plot.

Table 8.6 National Track Records for Men
Country | 100 m (s) | 200 m (s) | 400 m (s) | 800 m (min) | 1500 m (min) | 5000 m (min) | 10,000 m (min) | Marathon (min)
[Records for the 54 nations listed, from Argentina to the USA; the complete data are on the website www.prenhall.com/statistics.]
Source: IAAF/ATFS Track and Field Statistics Handbook for the Helsinki 2005 Olympics. Courtesy of Ottavio Castellini.
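For Exercises 8.19 and 8.21, the records must first be converted to speeds. A minimal sketch of the conversion for the men's data; the helper name and the stand-in array T are ours.

import numpy as np

# 100 m - 400 m are recorded in seconds; 800 m through the marathon in
# minutes; the marathon distance is 42,195 m.
distances = np.array([100, 200, 400, 800, 1500, 5000, 10000, 42195], dtype=float)
in_seconds = np.array([True, True, True, False, False, False, False, False])

def to_speeds(times):
    """Speeds in meters per second for one nation's row of record times."""
    t = np.where(in_seconds, times, times * 60.0)   # minutes -> seconds
    return distances / t

# With the full 54 x 8 array of record times T, the analysis is then
#   speeds = np.apply_along_axis(to_speeds, 1, T)
#   lam, E = np.linalg.eigh(np.cov(speeds, rowvar=False))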
8.23. A naturalist for the Alaska Fish and Game Department studies grizzly bears with the goal of maintaining a healthy population. Measurements on n = 61 bears provided the following summary statistics:

Variable: Weight (kg) | Body length (cm) | Neck (cm) | Girth (cm) | Head length (cm) | Head width (cm)
Sample mean x̄: 95.52 | 164.38 | 55.69 | 93.39 | 17.98 | 31.13

Covariance matrix

  S = [3266.46 1343.97 731.54 1175.50 162.68 238.37;
       1343.97 721.91 324.25 537.35 80.17 117.73;
       731.54 324.25 179.28 281.17 39.15 56.80;
       1175.50 537.35 281.17 474.98 63.73 94.85;
       162.68 80.17 39.15 63.73 9.95 13.88;
       238.37 117.73 56.80 94.85 13.88 21.26]

(a) Perform a principal component analysis using the covariance matrix. Can the data be effectively summarized in fewer than six dimensions?
(b) Perform a principal component analysis using the correlation matrix.
(c) Comment on the similarities and differences between the two analyses.

8.24. Refer to Example 8.10 and the data in Table 5.8, page 240. Add the variable x₆ = regular overtime hours whose values are (read across)

  6187 7679 7336 8259 6988 10954 6964 9353
  8425 6291 6778 4969 5922 4825 7307 6019

and redo Example 8.10.

8.25. Refer to the police overtime hours data in Example 8.10. Construct an alternative control chart, based on the sum of squares d²ⱼ, to monitor the unexplained variation in the original observations summarized by the additional principal components.
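One common form of the d²ⱼ chart asked for in Exercise 8.25 (and again in Exercise 8.29) scales each neglected component score by its eigenvalue and compares the sum to an approximate chi-square limit. The sketch below is ours, written under that assumption.

import numpy as np
from scipy.stats import chi2

def d2_chart(X, m, alpha=0.05):
    """Monitor variation not captured by the first m principal components.

    For each observation, d2 sums the squared scores of the neglected
    components, each scaled by its eigenvalue; d2 is then approximately
    chi-square with p - m degrees of freedom.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    lam, E = np.linalg.eigh(np.cov(Xc, rowvar=False))
    lam, E = lam[::-1], E[:, ::-1]
    scores = Xc @ E                          # principal component scores
    d2 = (scores[:, m:] ** 2 / lam[m:]).sum(axis=1)
    UCL = chi2.ppf(1 - alpha, p - m)         # upper control limit
    return d2, UCL

Points with d²ⱼ above the UCL indicate observations whose variation is not explained by the retained components.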
8.26. Consider the psychological profile data in Table 4.6. Using the five variables Indep, Supp, Benev, Conform, and Leader, perform a principal component analysis using the covariance matrix S and the correlation matrix R. Your analysis should include the following:
(a) Determine the appropriate number of components to effectively summarize the variability. Construct a scree plot to aid in your determination.
(b) Interpret the sample principal components.
(c) Using the values for the first two principal components, plot the data in a two-dimensional space with ŷ₁ along the vertical axis and ŷ₂ along the horizontal axis. Can you distinguish groups representing the two socioeconomic levels and/or the two genders? Are there any outliers?
(d) Construct a 95% confidence interval for λ₁, the variance of the first population principal component from the covariance matrix.

8.27. The pulp and paper properties data are given in Table 7.7. Using the four paper variables, BL (breaking length), EM (elastic modulus), SF (stress at failure), and BS (burst strength), perform a principal component analysis using the covariance matrix S and the correlation matrix R. Your analysis should include the following:
(a) Determine the appropriate number of components to effectively summarize the variability. Construct a scree plot to aid in your determination.
(b) Interpret the sample principal components.
(c) Do you think it is possible to develop a "paper strength" index that effectively contains the information in the four paper variables? Explain.

8.28. Survey data were collected as part of a study to assess options for enhancing food security through the sustainable use of natural resources in the Sikasso region of Mali (West Africa). A total of n = 76 farmers were surveyed, and observations on the nine variables

  x₁ = Family (total number of individuals in household)
  x₂ = DistRd (distance in kilometers to nearest passable road)
  x₃ = Cotton (hectares of cotton planted in year 2000)
  x₄ = Maize (hectares of maize planted in year 2000)
  x₅ = Sorg (hectares of sorghum planted in year 2000)
  x₆ = Millet (hectares of millet planted in year 2000)
  x₇ = Bull (total number of bullocks or draft animals)
  x₈ = Cattle (total)
  x₉ = Goats (total)

were recorded. The data are listed in Table 8.7 and on the website www.prenhall.com/statistics.
(a) Construct two-dimensional scatterplots of Family versus DistRd, and DistRd versus Cattle. Remove any obvious outliers from the data set.
(b) Perform a principal component analysis using the correlation matrix R. Determine the number of components to effectively summarize the variability. Use the proportion of variation explained and a scree plot to aid in your determination.
(c) Interpret the first five principal components. Can you identify, for example, a "farm size" component? A "goats and distance to road" component?

Table 8.7 Mali Family Farm Data
Family | DistRd | Cotton | Maize | Sorg | Millet | Bull | Cattle | Goats
[Observations for the 76 surveyed farms; the complete data are on the website www.prenhall.com/statistics.]
Source: Data courtesy of Jay Angerer.
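For Exercise 8.23 above, both analyses can be run directly from the summary covariance matrix. A minimal sketch, assuming the entries of S as transcribed in that exercise.

import numpy as np

S = np.array([[3266.46, 1343.97,  731.54, 1175.50, 162.68, 238.37],
              [1343.97,  721.91,  324.25,  537.35,  80.17, 117.73],
              [ 731.54,  324.25,  179.28,  281.17,  39.15,  56.80],
              [1175.50,  537.35,  281.17,  474.98,  63.73,  94.85],
              [ 162.68,   80.17,   39.15,   63.73,   9.95,  13.88],
              [ 238.37,  117.73,   56.80,   94.85,  13.88,  21.26]])

d = np.sqrt(np.diag(S))
R = S / np.outer(d, d)                      # implied correlation matrix

for M in (S, R):                            # parts (a) and (b)
    lam = np.linalg.eigvalsh(M)[::-1]
    print(np.round(np.cumsum(lam) / lam.sum(), 3))   # cumulative proportions

Because Weight dominates the total variance in S, the two sets of cumulative proportions typically differ noticeably, which is the point of part (c).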
Hotelling. might correspond to another factor.321377. . 24 (1960). and 1. B. if available. Mosimann. "Analysis of a Complex of Statistical Variables into Principal Components. "On Testing a Set of Correlation Coefficients for Equality. Because of this early aSSOCIatIOn With constructs such as intelligence.2oo2. Basically. W. mathematics. Charles others to define and measure intelligence. 39 (1966). Hotelling. (c) Interpret the first five principal components. the factor model is motivated by the follOWIng argument: Suppose variables can be grouped by their correlations. B." Journal of Educational Psychology. 3. C. S. Interpret this chart.ed by Spearman suggested an underlying "intelligence" factor.2735. 481 .217225. P. 110115. H. Kourti. "Asymptotic Theory for Principal Components Analysis. 5. 2." The American Statistician.480 Chapter 8 Principal Components (b) Perform a principal component analysis using the correlation matrix R.). English. T.1 Introduction Factor rather turbulent controversy throughout its history. 34 (1963).43 (1989).28. 4." Psychometrika. 139142. FACTOR ANALYSIS AND INFERENCE FOR STRUCTURED COVARIANCE MATRICES 9. quantities called factors. Lawley. Can you identify. McGregor.417441. Girschick. 149151. "Size and Shape Variation in the Painted Turtle: A Principal Component Analysis. P. "A Note on Multiplying Factors for Various ChiSquared Approximations. 13. C. 8. 12. Jolicoeur. 15. Arguments over the psychological interpretations of several early studies and the lack of powerful computing facilities impeded its initial development as a statistical method. R." Journal of Quality Technology.). however. perhaps. New John Wiley. 296298. 122148. 14." Journal of the Royal Statistical Society (B). H. (b) Construct an alternative control chart. Anderson. Identify the car locations that appear to be out of control. that IS responsible for the observed correlations." Growth. "The Multivariate Generalization of the Allometry Equation. W. For example. French. the covariance relatIOnshIps many variables in terms of a few underlying. 24 (1933). M. H. Its modern begInnIngs he m the early20thcentury attempts of Karl Pearson. Then It IS concelvabl. 7. to the variation in the original observations summarized by the remaining four princi" pal components.." Biometrika.498520.409428. "Market and Industry Factors in Stock Price Behavior. that each application of the technique must be examined on its own merits to determine its success. 10 (1939). 46 (1992). physicalfitness scores. 34 (1963). Bartlett. 11." The American Statistician. 1 (1936). 16 (1954). T. (a) Construct a 95% ellipse format chart using the first two principal components. factor analysis was nurtured and developed primarily by scientists interested in psychometrics.29." Journal of Educationaf Psychology. "Multivariate SPC Methods for Process and Product Monitoring. Using the covariance matrix S for the first 30 cases of car assembly data. 16. Canonical Variates and Principal Components. "The Most Predictable Criterion. Rencher. Jolicoeur. and music colIect. It IS std I true. "On the Sampling Theory of Roots of Determinantal Equations. "Relations between Two Sets ofVariates." Annals of Mathe/1zatical Statistics. Dawkins. N. 19 (1963).vl Yz. Rao. Most of the original techniques have been and early controversies resolved in the wake of recent developments. correlations from the group of test scores in classics.203224. A. 2003.. Linear Statistical Inference and Its Applications (2nd ed. References 1.339354. Anderson. 
"Simplified Calculation of Principal Components. Refer to Exercise 5. 28 (1936). It IS thiS type of structure that factor analysis seeks to confirm. "Multivariate Analysis of National Track Records." Annals of Mathematical Statistics." Biometrics. New York: WileyInterscience. D. 10. if possible. H. 28 (1996). T.497499. Use the propor" tion of variation explained and a scree plot to aid in your determination. E. An Introduction to Muftivariate Statistical Analysis (3rd ed. Determine the number of components to effectively summarize the variability. suppose all variables within a particular group are highly correlated among them have relatively small correlations with variables in a different group. purpose of factor analysis is to describe. Hotelling. or factor. "goats and distance to road" component? 8.e that each group of variables represents a single underlying construct. That is. 139190. but un observable. obtain the sample principal components. 6. and 1. Hotelling." Annals of Mathematical Statistics. M. a size" component? A. 26 (1935). A. for example. King. A second group of variables." Journal of Business. "Interpretation of Canonical Discriminant Functions. based on the sum of squares db j. The advent of highspeed computers has generated a renewed interest in the theoretical and computational aspects of factor analysis. 9.
9.2 The Orthogonal Factor Model

The observable random vector X, with p components, has mean μ and covariance matrix Σ. The factor model postulates that X is linearly dependent upon a few unobservable random variables F₁, F₂, ..., Fₘ, called common factors, and p additional sources of variation ε₁, ε₂, ..., εₚ, called errors or, sometimes, specific factors.¹ In particular, the factor model is

  X₁ − μ₁ = ℓ₁₁F₁ + ℓ₁₂F₂ + ... + ℓ₁ₘFₘ + ε₁
  X₂ − μ₂ = ℓ₂₁F₁ + ℓ₂₂F₂ + ... + ℓ₂ₘFₘ + ε₂
  ⋮
  Xₚ − μₚ = ℓₚ₁F₁ + ℓₚ₂F₂ + ... + ℓₚₘFₘ + εₚ   (9-1)

or, in matrix notation,

  X − μ = L F + ε   (9-2)
  (p×1)  (p×m)(m×1) (p×1)

The coefficient ℓᵢⱼ is called the loading of the ith variable on the jth factor, so the matrix L is the matrix of factor loadings. Note that the ith specific factor εᵢ is associated only with the ith response Xᵢ. The p deviations X₁ − μ₁, X₂ − μ₂, ..., Xₚ − μₚ are expressed in terms of p + m random variables F₁, F₂, ..., Fₘ, ε₁, ε₂, ..., εₚ, which are unobservable. This distinguishes the factor model of (9-2) from the multivariate regression model in (7-23), in which the independent variables [whose position is occupied by F in (9-2)] can be observed.

With so many unobservable quantities, a direct verification of the factor model from observations on X₁, X₂, ..., Xₚ is hopeless. However, with some additional assumptions about the random vectors F and ε, the model in (9-2) implies certain covariance relationships, which can be checked.

We assume that

  E(F) = 0 (m×1),  Cov(F) = E[FF′] = I (m×m)
  E(ε) = 0 (p×1),  Cov(ε) = E[εε′] = Ψ = diag(ψ₁, ψ₂, ..., ψₚ) (p×p)

and that F and ε are independent, so

  Cov(ε, F) = E(εF′) = 0 (p×m)   (9-3)

These assumptions and the relation in (9-2) constitute the orthogonal factor model.²

Orthogonal Factor Model with m Common Factors

  X = μ + L F + ε   (9-4)
  μᵢ = mean of variable i
  εᵢ = ith specific factor
  Fⱼ = jth common factor
  ℓᵢⱼ = loading of the ith variable on the jth factor

The unobservable random vectors F and ε satisfy the following conditions: F and ε are independent, E(F) = 0, Cov(F) = I, E(ε) = 0, and Cov(ε) = Ψ, where Ψ is a diagonal matrix.

The orthogonal factor model implies a covariance structure for X. From the model in (9-4),

  (X − μ)(X − μ)′ = (LF + ε)(LF + ε)′ = LF(LF)′ + ε(LF)′ + LFε′ + εε′

so that

  Σ = Cov(X) = E(X − μ)(X − μ)′ = L E(FF′) L′ + E(εF′) L′ + L E(Fε′) + E(εε′) = LL′ + Ψ

according to the assumptions above. Also, by independence, Cov(ε, F) = E(εF′) = 0, and, since

  (X − μ)F′ = (LF + ε)F′ = LFF′ + εF′

we have Cov(X, F) = E(X − μ)F′ = L E(FF′) + E(εF′) = L.

¹ As Maxwell [12] points out, in many investigations the εᵢ tend to be combinations of measurement error and factors that are uniquely associated with the individual variables.
² Allowing the factors F to be correlated so that Cov(F) is not diagonal gives the oblique factor model. The oblique model presents some additional estimation difficulties and will not be discussed in this book. (See [10].)
Covariance Structure for the Orthogonal Factor Model

1. Cov(X) = LL′ + Ψ, or
   Var(Xᵢ) = ℓ²ᵢ₁ + ... + ℓ²ᵢₘ + ψᵢ
   Cov(Xᵢ, Xₖ) = ℓᵢ₁ℓₖ₁ + ... + ℓᵢₘℓₖₘ   (9-5)
2. Cov(X, F) = L, or Cov(Xᵢ, Fⱼ) = ℓᵢⱼ

The model X − μ = LF + ε is linear in the common factors. If the p responses X are, in fact, related to underlying factors, but the relationship is nonlinear, such as in X₁ − μ₁ = ℓ₁₁F₁F₃ + ε₁, X₂ − μ₂ = ℓ₂₁F₂F₃ + ε₂, and so forth, then the covariance structure LL′ + Ψ given by (9-5) may not be adequate. The very important assumption of linearity is inherent in the formulation of the traditional factor model.

That portion of the variance of the ith variable contributed by the m common factors is called the ith communality. That portion of Var(Xᵢ) = σᵢᵢ due to the specific factor is often called the uniqueness, or specific variance. Denoting the ith communality by hᵢ², we see from (9-5) that

  σᵢᵢ = (ℓ²ᵢ₁ + ℓ²ᵢ₂ + ... + ℓ²ᵢₘ) + ψᵢ = communality + specific variance

or

  hᵢ² = ℓ²ᵢ₁ + ℓ²ᵢ₂ + ... + ℓ²ᵢₘ  and  σᵢᵢ = hᵢ² + ψᵢ,  i = 1, 2, ..., p   (9-6)

The ith communality is the sum of squares of the loadings of the ith variable on the m common factors.

Example 9.1 (Verifying the relation Σ = LL′ + Ψ for two factors) Consider the covariance matrix

  Σ = [19 30 2 12; 30 57 5 23; 2 5 38 47; 12 23 47 68]

The equality

  Σ = [4 1; 7 2; −1 6; 1 8][4 7 −1 1; 1 2 6 8] + [2 0 0 0; 0 4 0 0; 0 0 1 0; 0 0 0 3]

or Σ = LL′ + Ψ may be verified by matrix algebra. Therefore, Σ has the structure produced by an m = 2 orthogonal factor model. Since

  L = [ℓ₁₁ ℓ₁₂; ℓ₂₁ ℓ₂₂; ℓ₃₁ ℓ₃₂; ℓ₄₁ ℓ₄₂] = [4 1; 7 2; −1 6; 1 8],  Ψ = diag(2, 4, 1, 3)

the communality of X₁ is, from (9-6),

  h₁² = ℓ²₁₁ + ℓ²₁₂ = 4² + 1² = 17

and the variance of X₁ can be decomposed as

  σ₁₁ = (ℓ²₁₁ + ℓ²₁₂) + ψ₁ = h₁² + ψ₁ = 17 + 2 = 19

A similar breakdown occurs for the other variables.

The factor model assumes that the p + p(p − 1)/2 = p(p + 1)/2 variances and covariances for X can be reproduced from the pm factor loadings ℓᵢⱼ and the p specific variances ψᵢ. When m = p, any covariance matrix Σ can be reproduced exactly as LL′ [see (9-11)], so Ψ can be the zero matrix. However, it is when m is small relative to p that factor analysis is most useful. In this case, the factor model provides a "simple" explanation of the covariation in X with fewer parameters than the p(p + 1)/2 parameters in Σ. For example, if X contains p = 12 variables, and the factor model in (9-4) with m = 2 is appropriate, then the p(p + 1)/2 = 12(13)/2 = 78 elements of Σ are described in terms of the mp + p = 12(2) + 12 = 36 parameters of the factor model.

Unfortunately for the factor analyst, most covariance matrices cannot be factored as LL′ + Ψ, where the number of factors m is much less than p. The following example demonstrates one of the problems that can arise when attempting to determine the parameters ℓᵢⱼ and ψᵢ from the variances and covariances of the observable variables.

Example 9.2 (Nonexistence of a proper solution) Let p = 3 and m = 1, and suppose the random variables X₁, X₂, and X₃ have the positive definite covariance matrix

  Σ = [1 .9 .7; .9 1 .4; .7 .4 1]

Using the factor model in (9-4), we obtain

  X₁ − μ₁ = ℓ₁₁F₁ + ε₁
  X₂ − μ₂ = ℓ₂₁F₁ + ε₂
  X₃ − μ₃ = ℓ₃₁F₁ + ε₃

The covariance structure in (9-5) implies that Σ = LL′ + Ψ, or

  1 = ℓ²₁₁ + ψ₁  .90 = ℓ₁₁ℓ₂₁  .70 = ℓ₁₁ℓ₃₁
  1 = ℓ²₂₁ + ψ₂  .40 = ℓ₂₁ℓ₃₁
  1 = ℓ²₃₁ + ψ₃

The pair of equations

  .70 = ℓ₁₁ℓ₃₁ and .40 = ℓ₂₁ℓ₃₁

implies that ℓ₂₁ = (.40/.70)ℓ₁₁. Substituting this result for ℓ₂₁ in the equation .90 = ℓ₁₁ℓ₂₁ yields ℓ²₁₁ = 1.575, or ℓ₁₁ = ±1.255. Since Var(F₁) = 1 (by assumption) and Var(X₁) = 1, ℓ₁₁ = Cov(X₁, F₁) = Corr(X₁, F₁). Now, a correlation coefficient cannot be greater than unity (in absolute value), so, from this point of view, |ℓ₁₁| = 1.255 is too large. Also, the equation

  1 = ℓ²₁₁ + ψ₁, or ψ₁ = 1 − ℓ²₁₁

gives

  ψ₁ = 1 − 1.575 = −.575

which is unsatisfactory, since it gives a negative value for Var(ε₁) = ψ₁.

Thus, for this example with m = 1, it is possible to get a unique numerical solution to the equations Σ = LL′ + Ψ. However, the solution is not consistent with the statistical interpretation of the coefficients, so it is not a proper solution.

When m > 1, there is always some inherent ambiguity associated with the factor model. To see this, let T be any m × m orthogonal matrix, so that TT′ = T′T = I. Then the expression in (9-2) can be written

  X − μ = LF + ε = LTT′F + ε = L*F* + ε   (9-7)

where L* = LT and F* = T′F. Since E(F*) = T′E(F) = 0 and

  Cov(F*) = T′ Cov(F) T = T′T = I (m×m)

it is impossible, on the basis of observations on X, to distinguish the loadings L from the loadings L*. That is, the factors F and F* = T′F have the same statistical properties, and even though the loadings L* are, in general, different from the loadings L, they both generate the same covariance matrix Σ. That is,

  Σ = LL′ + Ψ = LTT′L′ + Ψ = (L*)(L*)′ + Ψ   (9-8)

This ambiguity provides the rationale for "factor rotation," since orthogonal matrices correspond to rotations (and reflections) of the coordinate system for X.

Factor loadings L are determined only up to an orthogonal matrix T. Thus, the loadings L* = LT and L both give the same representation. The communalities, given by the diagonal elements of LL′ = (L*)(L*)′, are also unaffected by the choice of T.   (9-9)

The analysis of the factor model proceeds by imposing conditions that allow one to uniquely estimate L and Ψ. The loading matrix is then rotated (multiplied by an orthogonal matrix), where the rotation is determined by some "ease-of-interpretation" criterion. Once the loadings and specific variances are obtained, factors are identified, and estimated values for the factors themselves (called factor scores) are frequently constructed.
the specific factors play the .:=: ••• . we find that the approximation becomes I==LL'+'IJt _  = : : \IX..] Xj  x: Xjp  = : = : j = 1. the factor loadings on the jth factor are the coefficients for the jth principal component of the population. It is always prudent to try more than one method of solution.:: + r'" '1'1 0 o The Principal Component (and Principal Factor) Method The spectral decomposition of (216) provides us with one factoring of the covariance matrix 1:.+lem+l e :r. xn on p generally correlated variables. and a factor analysis will not prove useful.2. their variances may be taken to be the diagonal elements of 1: .m eigenvalues are small. .. .. .•••• ::::::c:::.
Example 9.3 (Factor analysis of consumer-preference data) In a consumer-preference study, a random sample of customers were asked to rate several attributes of a new product. The responses, on a 7-point semantic differential scale, were tabulated and the attribute correlation matrix constructed. The correlation matrix is presented next:

Attribute (Variable)            1     2     3     4     5
1 Taste                       1.00   .02   .96   .42   .01
2 Good buy for money           .02  1.00   .13   .71   .85
3 Flavor                       .96   .13  1.00   .50   .11
4 Suitable for snack           .42   .71   .50  1.00   .79
5 Provides lots of energy      .01   .85   .11   .79  1.00

It is clear from the large entries r₁₃ = .96 and r₂₅ = .85 that variables 1 and 3 and variables 2 and 5 form groups. Variable 4 is "closer" to the (2, 5) group than the (1, 3) group. Given these results and the small number of variables, we might expect that the apparent linear relationships between the variables can be explained in terms of, at most, two or three common factors.

The first two eigenvalues, λ̂₁ = 2.85 and λ̂₂ = 1.81, of R are the only eigenvalues greater than unity. Moreover, m = 2 common factors will account for a cumulative proportion

  (λ̂₁ + λ̂₂)/p = (2.85 + 1.81)/5 = .93

of the total (standardized) sample variance. The estimated factor loadings, communalities, and specific variances, obtained using (9-15), (9-16), and (9-17), are given in Table 9.1.

Table 9.1
Variable | Estimated factor loadings F₁, F₂ | Communalities h̃ᵢ² | Specific variances ψ̃ᵢ = 1 − h̃ᵢ²
1. Taste | .56, .82 | .98 | .02
2. Good buy for money | .78, −.53 | .88 | .12
3. Flavor | .65, .75 | .98 | .02
4. Suitable for snack | .94, −.10 | .89 | .11
5. Provides lots of energy | .80, −.54 | .93 | .07
Eigenvalues: 2.85, 1.81
Cumulative proportion of total (standardized) sample variance: .571, .932

Now,

  L̃L̃′ + Ψ̃ = [.56 .82; .78 −.53; .65 .75; .94 −.10; .80 −.54][.56 .78 .65 .94 .80; .82 −.53 .75 −.10 −.54] + diag(.02, .12, .02, .11, .07)
   = [1.00 .00 .98 .44 .01; .00 1.00 .11 .79 .91; .98 .11 1.00 .54 .12; .44 .79 .54 1.00 .81; .01 .91 .12 .81 1.00]

nearly reproduces the correlation matrix R. We shall not interpret the factors at this point. As we noted in Section 9.2, the factors (and loadings) are unique up to an orthogonal rotation. A rotation of the factors often reveals a simple structure and aids interpretation. We shall consider this example again (see Example 9.9 and Panel 9.1) after factor rotation has been discussed.

Example 9.4 (Factor analysis of stock-price data) Stock-price data consisting of n = 103 weekly rates of return on p = 5 stocks were introduced in Example 8.5. In that example, the first two sample principal components were obtained from R. Taking m = 1 and m = 2, we can easily obtain principal component solutions to the orthogonal factor model. Specifically, the estimated factor loadings are the sample principal component coefficients (eigenvectors of R), scaled by the square root of the corresponding eigenvalues. The estimated factor loadings, communalities, specific variances, and proportion of total (standardized) sample variance explained by each factor for the m = 1 and m = 2 factor solutions are available in Table 9.2. The communalities are given by (9-17). So, for example, with m = 2,

  h̃₁² = ℓ̃²₁₁ + ℓ̃²₁₂ = (.732)² + (−.437)² = .73

Table 9.2
                        One-factor solution          Two-factor solution
Variable                F₁      ψ̃ᵢ = 1 − h̃ᵢ²        F₁, F₂          ψ̃ᵢ = 1 − h̃ᵢ²
1. JP Morgan            .732    .46                  .732, −.437     .27
2. Citibank             .831    .31                  .831, −.280     .23
3. Wells Fargo          .726    .47                  .726, −.374     .33
4. Royal Dutch Shell    .605    .63                  .605, .694      .15
5. ExxonMobil           .563    .68                  .563, .719      .17
Cumulative proportion of total
(standardized) sample variance explained: .487 | .487, .769

The residual matrix corresponding to the solution for m = 2 factors is

  R − L̃L̃′ − Ψ̃ = [0 −.099 −.185 −.025 .056;
                 −.099 0 −.134 .014 −.054;
                 −.185 −.134 0 .003 .006;
                 −.025 .014 .003 0 −.156;
                 .056 −.054 .006 −.156 0]

The proportion of the total variance explained by the two-factor solution is appreciably larger than that for the one-factor solution. However, for m = 2, L̃L̃′ produces numbers that are, in general, larger than the sample correlations. This is particularly true for r₁₃.

It seems fairly clear that the first factor, F₁, represents general economic conditions and might be called a market factor. All of the stocks load highly on this factor, and the loadings are about equal. The second factor contrasts the banking stocks with the oil stocks. (The banks have relatively large negative loadings, and the oils have large positive loadings, on the factor.) Thus, F₂ seems to differentiate stocks in different industries and might be called an industry factor. To summarize, rates of return appear to be determined by general market conditions and activities that are unique to the different industries, as well as a residual or firm-specific factor. This is essentially the conclusion reached by an examination of the sample principal components in Example 8.5.
although the procedure is also appropriate for S. where rii is the ith diagonal element of R. banktng stocks with the oil stocks.j} are the estimated loadings. e7). and the loadings are about equal. To rates of return appear to be determined by general market co?ditions a?d that are unique to the different industries. the most popular choice. is "'. Further discussion of these and other initial estimates is contained in [6].1(xji)(Xji)' X • "'i =1 j=1 "" e*2 £. .5.. Rr is factored as (921) where = {e. when one is working with a correlation matrix. All of the stocks load highly on this tor.. ..1 variables. F2 seems to differentiate stocks in different industries and might be called an industry factor. m i'" i (nl)p (nl) = tr II {. to our minds. = 1/rU . we should take the number of common factors equal to the rank of the reduced popUlation matrix. . the initial specific variance estimates use Sii. = (21T)  i.) In practice. (See [6]. that initial estimates of speCIfIC . with the communality estimates of (923) becoming the initial estimates for the next stage. Unfortunately. This is essentially the conclusion reached by an exammatton of the sample principal components in Example 8. If the factor model p = LV + 'I' is correctly specified.e correlation matrix Rr should be accounted for by the m common factors.J ij (21T) . which we discuss next. (The banks have relatively large negattve loadtngs.) Thus. the communalities would then be (re)estimated by }. When Fj and Ej are jointly normal. The principal factor method of factor analysIs employs the esttmates The Maximum likelihood Method If the common factors F and the specific factors E can be assumed to be normally distributed.!e (iIL) . since. We describe the reasoning in terms of a factor analysis of R. consideration of the estimated eigenvalues Ai. equivalently. as well as a or ftrm speC:lfic · factor.*2 = £. Suppose. In the spirit of the principal component solution. helps determine the number of common factors to retain. )] (925) L. however. Although the principal component method for R can be regarded as a principal factor method with initial communality estimates of unity. m = 2: numbers that are. the two are philosophically and geometrically different. and the oils have large positive loadings. the observations Xj ..e. "'i = 1 = IzT + "'i "'i "'i . An added complication is that now some of the eigenvalues may be negative.. the two frequently produce comparable factor loadings if the number of variables is large and the number of common factors is small.I . In partIcular. larger than the sample correlatIons. In turn. Although there are many choices for initial estimates of specific variances.494 Chapter 9 Factor Analysis and Inference for Structured Covariance Matrices Methods of Estimation 495 The proportion of the total variance explained by the twofactor solution larger than that for the onefactor solution.. or specific variances equal to zero. A. = i vA. . this rank is not always well determined from R" and some judgment is necessary. on the factor. Now. the diagonal elements of SI.J I) l j=1 (923) A Modified Approachthe Principal Factor Solution A modification of the principal component approach is sometimes considered. A. apart from sampling variation. For factoring S. • where (A. The second factor the. the likelihood is L.are Then replacing the ith diagonal element of R by hi = 1 we obtam a reduced sample correlation matrix hr./L = LFj + Ej are then normal. the m common factors should account for the offdiagonal elements of p. for. 
The Maximum Likelihood Method

If the common factors F and the specific factors ε can be assumed to be normally distributed, then maximum likelihood estimates of the factor loadings and specific variances may be obtained. When Fⱼ and εⱼ are jointly normal, the observations Xⱼ − μ = LFⱼ + εⱼ are then normal, and from (4-16), the likelihood is

  L(μ, Σ) = (2π)^(−np/2) |Σ|^(−n/2) exp[−(1/2) tr(Σ⁻¹(Σⱼ₌₁ⁿ (xⱼ − x̄)(xⱼ − x̄)′ + n(x̄ − μ)(x̄ − μ)′))]   (9-25)

which depends on L and Ψ through Σ = LL′ + Ψ. This model is still not well defined, because of the multiplicity of choices for L made possible by orthogonal transformations. It is desirable to make L well defined by imposing the computationally convenient uniqueness condition

  L′Ψ⁻¹L = Δ, a diagonal matrix   (9-26)

The maximum likelihood estimates L̂ and Ψ̂ must be obtained by numerical maximization of (9-25). Fortunately, efficient computer programs now exist that enable one to get these estimates rather easily.

We summarize some facts about maximum likelihood estimators and, for now, rely on a computer to perform the numerical details.

Result 9.1. Let X₁, X₂, ..., Xₙ be a random sample from Nₚ(μ, Σ), where Σ = LL′ + Ψ is the covariance matrix for the m common factor model of (9-4). The maximum likelihood estimators L̂, Ψ̂, and μ̂ = x̄ maximize (9-25) subject to L̂′Ψ̂⁻¹L̂ being diagonal.

The maximum likelihood estimates of the communalities are

  ĥᵢ² = ℓ̂²ᵢ₁ + ℓ̂²ᵢ₂ + ... + ℓ̂²ᵢₘ  for i = 1, 2, ..., p   (9-27)

so

  (Proportion of total sample variance due to jth factor) = (ℓ̂²₁ⱼ + ℓ̂²₂ⱼ + ... + ℓ̂²ₚⱼ)/(s₁₁ + s₂₂ + ... + sₚₚ)   (9-28)

Proof. By the invariance property of maximum likelihood estimators, functions of L and Ψ are estimated by the same functions of L̂ and Ψ̂. In particular, the communalities hᵢ² = ℓ²ᵢ₁ + ... + ℓ²ᵢₘ have maximum likelihood estimates ĥᵢ² = ℓ̂²ᵢ₁ + ... + ℓ̂²ᵢₘ.
Thus. able one to get these estimates rather easily. communalities. the maximum likelihood estimator of p IS p = Vl/2l:VI/2 = (VI/2L) (Vl/2L). . setting i.ity of choices for L made by orthogonal transformations. surprisingly. the preceding ei/s denote the elements of i •. The communalities corresponding to the maximum likelihood factoring of R are of the form [see (931)] h. as in (810)... + +e '2 '2 fpj (932) p To avoid more tedious notations.l /2. The software program obtains a feasible solution by slightly adjusting the loadings so that all specific variance estimates are nonnegative.1 )/n j S are i = yl/2i. and proportion of total (standardized) sample variance explained by each factor are in Table 9. V 1/2. . Here V. _ The equivalency between factoring Sand R has apparently been confused in many published discussions of factor analysis.. and we evaluate the importance of the factors on the basis of . and .jrli being diagonal. where l: = LL' + 'I' is the covariance matrix for the m common factor model of (94). The maximum likelihood estimates I.' tively..jr. whenever the maximum hkelihood analysis pertains to the correlation matrix.2 = + So. + Vlf2 '1'Vl/2 jJ = CVl/21. hi d'109. By the pro.jr. The maximum likelihood estimates of the communalities are for i = 1. .. efficient computer programs now exist that en: . for example.4 were reanalyzed assuming functions of L and 'I' are estimated by the same functIOns of L and '1'.jr. or e·+e·+···+e I} 2} P} Sll + S22 + . must be obtained by numerical maximization of (925). and a sample correlation matrix is factor analyzed. and the maximum likelihood estimates i.5 and 9.jrV..)' + yI/2q. By the invariance property of maximum (se!? Section where Uii is the sample variance computed with divisor n. because of the multiplic./2(X .yl/2 = an m = 2 factor model and using the maximum likelihood method. + '2 e im · • I If. Fortunately..(proportion of total (standardiZed») = sample variance due to jth factor '2 f lj 2j + .l:). For this example. We summarize some facts about maximum likelihood estimators and.1. " + . A Heywood case is suggested here by the .JL). = yl/2i and . the variables are standardized so that Z = V. the observations are standardized.00 value for the specific variance of Royal Dutch Shell. The distinction between divisors can be ignored with principal component solutions.I/2. where V. given the estimated loadings i. In particular. (See Supplement 9A.jr based on the sample covariance matrix S. P (927) so Proportion oftotal sample = ( variance due to jth factor ) Comment.Xn be a random sample from Np(JL. . Result 9. and q.l/2. + erm have maximum likelihood estimates '2 '2 hi = ea + . not R. we find that the resulting maximum likelihood estimates for a factor analysis of the co variance matrix [(n . The sample correlation matrix R is inserted for [en . . 3 The corresponding figures for the m = 2 factor solution obtained by the principal component method (see Example 9. + spp '2 '2 '2 (928) Proof. Ordinarily. (See Supplement 9A. and .I/2 is the diagonal matrix with the reciprocal of the sample standard deviations (computed with the divisor vn) on the main diagonaL Going In the other direction.58 3 The maximum likelihood solution leads to a Heywood case.. It is desirable to make L well defined by lffiposlOg the computationally convenient uniqueness condition a diagonal matrix (926) the maximum likelihood estimates of the communalities. specific variances.) As a consequence of the factorization of (930). . 
Although the likelihood in (925) is appropriate for S.p .2.. = V.4) are also provided. and specific variances '1'.jr = yl/2.l)/njS in the likelihood function of (925)..V. p has a factorization analogous to (95 ) Wit specific variance matrix '1'. The maximum likelihood estimators 1" q" and jL = x maximize (925) subject to l:. = V. . for now. then the covariance matrix p of Z has the representation (929) I L . (930) l hI = (.S (Factor analysis of stockprice data using the maximum likelihood method) The stockprice data of Examples 8.2.mat' fiX • . the communalities hr = erl + .perty of maXimum likelihood estimators.l/2'1'V.T. ./2Land . are obtained using a computer.115)2 + (. The estimated factor loadings..l /2 and i are the maximum likelihood estimators of V.. we call i = 1. obtained from R. this practice is equivalent to obtaining the maximum likelihood estimates i and .) Example 9./2 and L. rely on a computer to perform the numerical details.) CVI/2I. 'r.. respec.765f = . .3. Let X l . This is still not well defined. the solution of the likelihood equations give estimated loadings such that a specific variance is negative.X 2.496 Chapter 9 Factor Analysis and Inference for Structured Covariance Matrices Methods of Estimation 497 which depends on L and 'I' through l: = LV + qr.
as we dId m the principal component solution.4036 .hi Fl .1012 . the fourth factor is not easily identified.0000 .3520 .500meter run have large loadings.2553 . it may have something to do with jumping ability or leg strength.4752 .11 after a discussion of factor rotation. 4. much like Linden's original analysis.loadings Variable 1. The interpretation of the second factor is not as clear as it appeared to be in the principal component solution.002 0 = .3227 .500meter run have large positive loading on the first factor. we prefer the maximum likelihood approach and typically feature it in subsequent examples. The recorded values for each event were standardized and the signs of the timed events changed so that large scores are good for all events.726 .000 .4682 .LL' . 1. 3.4008 .p are much smaller than those of the residual matrix corresponding to the principal component factoring of R presented in Example 9.001 [ 0 . Focusing attention on the maximum likelihood solution.0000 .4125 . the first four eigenvalues.5167 .2812 . Following his approach we examine the n = 280 complete starts from 1960 through 2004.3503 .002 .4728 .2539 .694 .0000 .000 .4163 .i = 1 h? I .788 .3449 .000 R  fi' . The bank stocks have large positive loadings and the oil stocks have negligible loadings on the second factor F2 .000 .563 .4752 .732 . The remaining factors cannot be easily interpreted to our minds. The third factor is running endurance since the 400meter run and 1.322 Citibank . The cumulative proportion of the total sample variance explained by the factors is larger for principal component factoring than for maximum likelihood factoring.000 .280 374 . 5.3520 .000 0 . .755 .4125 .2395 . For the maximum likelihood method.5520 . In this case. 4.182 Wells Fargo Royal Dutch Shell 1.4213 .3998 . of R suggest a factor solution with m = 3 or m = 4.0000 .4357 .3509 .54 .605 .4). From a principal component factor analysis perspective.0000 .(.6040 . a variance optimizing property.647 .4.6040 .7926 .2395 .2812 . useful factor patterns are often not revealed until the factors are rotated (see Section 9.0109 .0002 .2344 .000 .0000 . This suggests that some events might require unique or specific attributes not required for the other events..3509 . For the principal component factorization.1821 . This factor might be labeled general athletic ability.498 Chapter 9 Factor Analysis and Inference for Structured Covariance Matrices Methods of Estimation 499 3 Maximum likelihood Estimated factor .033 .000 .4036 1. which loads heavily on the 400meter run and 1. the second factor might be simply called a banking factor.2539 .00 .0000 . analyze the correlation matrix. • Example 9. We.323 .3151 .6386 1.0000 The elements of R .06.2553 1. is R= .4179 . reinforces the choice m = 4.831 .3503 1.2116 . We shall return to an interpretation of the factors in Example 9.500meter run might be called a running endurance factor.002 .0983 1.27 .42 .4706 .3262 . From this perspective.5167 .0983 . The fourfactor maximum likelihood solution accounts for less of the total sample .3227 . the two solution methods produced very different results.23 . Factor 2.5668 . 1.000 0 ffi2] 1.33 . all events except the 1. although the estimated specific variances are large in some cases (for example.plL be a diagonal matrix.487 .4179 .3102 .5668 . 
On this basis.53 F2 .15 .1712 .769 The residual matrix is .1546 .17 Cumulative proportion of total (standardized) sample variance explained ducted a factor analytic study of Olympic decathlon results for all 160 complete starts from the end of World War 11 until the midseventies.652 .4953 1.] .001 .3998 .4706 .3449 .2380 .001 .115 J PMorgan .437 .2100 ..21.032 "'i = ' 1 .3262 . we see that all abIes have positive loadings on FI .052 .7926 . The fourfactor principal component solution accounts for much of the total (standardized) sample variance.4953 . 2. Therefore.002 0 . the first factor appears to be a general athletic ability factor but the loading pattern is not as strong as with principal component factor solution.1712 .27 . [See the discussion preceding (819). which based on all 280 cases.4213 .39.719 '.3657 .2344 .0000 .92. Again. by design.0352 . which have.4728 .1546 1.033 .2116 . too.0002 .4163 . We call this factor the market factor.683 Texaco .1012 .6386 .0109 .001 .000 .4682 .0120 . Specific variances Principal components Estimated factor loadings '2 Specific variances The patterns of the initial factor loadings for the maximum likelihood solution are constrained by the uniqueness condition that L'. The second factor is primarily a strength factor because shot put and discus load highly on this factor.3151 1.3657 .1821 .2100 .0352 .4008 .5520 . Alternatively. It is not surprising that this criterion typically favors principal component Loadings obtained by a principal component factor analysis are related to the prmcipal components.6 (Factor analysis of Olympic decathlon data) Linden [11] originally con FI F2 . the second factor differentiaties the bank stocks from the oil stocks and might be called an industry factor. the javelin).4357 1.2380 .000 .0120 .3102 . A subsequent interpretation.p .
022 .062 ..003 ..082 ..085 .000 .039 .030 .086 .107 ...000 .000 .059 .000 o .033 .046 .069 .086 .021 ..000 .078 .006 .004 .006 .006 .210 .015 . When l: does not have any special form.006 .11 with i = ((n l)/n)S = SnJisproportionalto 500 (934) .001 .001 .033.000 .023 ..9 t<l t<l 4< C..078 .048 ..015 .014 .000 .124 ..IO\ I .:: o S 8 B S '2.010 .001 ..151 .001 000 .084 2 .029 ..064 .119 .001 ...002 .068 ..019 '..000 . .000 .204 .016 .074 ..'If = o .078 ..009 ...047 .001 . "g 0.003 .000 0 .031 ..010 if.151 .022 .001 . .000 .030 .005 .006 .030 .004 ..042 .015 o .000 . J: bJJ tIl ('10\000\".001 .003 0 .046 0 .. .024 .030 0 .055 ..026 .023 .064 Maximum likelihood: R  LL' .029 ..068 .096 . rOOM.085 .042 ..064 .021 .078 ..038 0 .124 .011 ..Methods of Estimation 50 I variance..002 .026 0 . '0 .119 ..003 .074 .017 .003 '" • A Large Sample Test for the Number of Common Factors The assumption of a normal population leads directly to a test of the adequacy of the model. the maximum likelihood estimates and 't do a better job of reproducing R than the principal component estimates L and "It. bpt.042 ...025 .064 .082 0 .107 .005 .001 .002 0 .006 .000 .000 .. In this case l: = LV + "It.078 .000 .055 .000 .000 .069 .002 .010 .004 o .006 0 ..038 .084 ..204 . the maximum of the likelihood function [see (418) and Result 4. .024 .000 .029 ..002 0 .078 0 o . Suppose the m common factor model holds.000 .021 0 .004 ..003 .001 .025 .030 .. Principal component: R  I .. and testing the adequacy of the m common factor model is equivalent to testing Ho: (pXp) l: = (pXm) (mxp) L L' + "It (pXp) (933) versus Hi: l: any other positive definite matrix.031 .003 .017 .000 .019 .016 . .001 .021 .039 .000 .042 .000 .003 .015 .062 ..013 .006 .002 ..096 0 .210 .000 . as t}1e following residual matrices indicate.030 ..011 .013 .010 .000..001 .006 .000 o .. = o .047 .000 .001 .048 ..029 .014 .009 .006 .000 ..001 .
2)2 . where r" [ (pX2) i.11 provide a nearly ideal pattern.(2p ILl: + q.248 . we know that an orthogonal transformation corresponds to a rigid rotation (or reflection) of the coordinate axes. since (943) Equation (943) indicates that the residual matrix.288 .740 . = Sn . the observed significance level. as well as the implied orthogonal transformation of the factors.2.351 1.q. orientations are not easIly and the.0 R= English . maximum likelihood.392 . and hence the communalitie. the 5% critical value Xy( . Example 9.410 . In thIS c!usters of variables are often apparent by eye.15 implies that Ho would not be rejected at any reasonable level.595 1. in general. we evaluate the test statistic in (939): [n .84 is not exceeded. the specific variances !J!i. all factor loadings obtained from the initialloadings by an orthogonal transformation have the same ability to reproduce the covariance (or correlation) matrix.372 Variable 1. although the rotated loadings for the decathlon data discussed in Example 9.S Estimated factor loadings FI F2 .553 .504 Chapter 9 Factor Analysis and Inference for Structured Covariance Matrices Factor Rotation 505 Using Bartlett's correction.0 Algebra .568 . it is not always possible to get this simple structure. = (pX2)(2X2) i T (944) clockwise rotation counterclockwise rotation sm</> sin </> ] cos</> sin</> ] cos </> T=[COS</> sin </> 9. each pomt corresponding to a variable.0 n . it is usual practice to rotate them until a "simpler structure" is achieved.132 Communalities hi . P[Xy > 2. or Pvalue. If L is the p X m matrix of estimated factor loadings obtained by any method (principal component. 2.406 .0 History .211 .4 Factor Rotation As we indicated in Section 9. I + 4m + 5)/6) In I Sn I = [ 103 .. In fact. and we fail to reject Ho. For this reason. Table 9.. The uncorrelated common factors are regarded as unit hr. The rationale is very much akin to sharpening the focus of a microscope in order to see the detail more clearly. Since the originalloadings may not be readily interpretable. !J!i have been derived when these estimates have been determined from the sample U:variance matrix S.m)2 .450 .181 . [See (98).490 . we should like to see a pattern of loadings such that each variable loads highly on a single factor and has small to moderate loadings on the remaining factors. When m = 2. for m > 2. The coordinate axes (C (10 + 8 + 5)] 6 In (1. • !arge sample variances and covariances for the maximum likelihood estimates £.10 Since . where TT' (944) is rarely implemented in a twodimensional graphical analysIs. and so forth) then L* = LT. Moreover.439 1.5. LL' . the transformation to a simple structure can frequently be determined graphically.8 (A look factor rotation) Lawley and Maxwell [10] present the correlatIOn matrIX of examination scores in p = 6 subject areas for Gaelic 1.2) = 1. Cd p pomts. quite complicated. and these enable one to the common factors without having to inspect the mag of rotated loadmgs. The correlation matrix is = T'T = I (942) is a p X m matrix of "rotated" loadings.5 . Moreover. 3. or the common factors are considered two at a time. However.356 . magnitudes of the rotated loadings must be inspected to find a mterpretatIOn of the original data. On the other hand. is called factor rotation. a maximum likelihood solution for m = 2 common factors yields the estimates m Table 9.595 :429 .464 1. We conclude that the data do not contradict a twofactor model.724 .10) == .623 .!' a re unaltered.190 .0216) = 2.320 .220 male students.) 
From matrix algebra. Gaelic English History Arithmetic Algebra Geometry .569 .0 Arithmetic .05) = 3. 5. (See [10).1 .q" unchanged. the estimated covariance (or correlation) matrix remains unchanged. We shall concentrate on graphical and analytical methods for determining an orthogonal rotation to a simple structure.. 6..288 . an orthogonal transformation of the factor loadings.273 .p .0 Geometry . A plot of the pairs of factor loadings ij are determined from the relationships mgs C tp*en be vIsually rotated through an anglecall it </>and the new rotated load il .354 . from a mathematical viewpoint. it is immaterial whether L or L * is obtained.m) = ![(5 .329 . The choice of an orthogonal matrix T that satisfies an analytical measure of simple structure will be considered shortly.470 . Thus.164 1. Ideally.1  perpe?dicular axes.329 .) The expressions are. 4.L*L*' .
1.5 I I I I I I I F1 3 _I 2 Figure 9. Individuals with aboveaverage scores on the mathematIcal tests belowaverage scores on the factor. Scaling the rotated coefficients C.558 . it has a simple interpretation. V <X j=I (variance of squares of (scaled) loadings for) jth factor (946) F2 .(p 2: 2: 2: £ij . 6.356 .3} and the through the {4. This angle was chosen so that one of the new axes passes (C41 • ( 42 )· this is done.568 . BMDP. The communality. Then the (normal) varimax procedure selects the orthogonal transformation T that makes varimax (or normal varimax) criterion. The mathematical test variables load highly on and have negligible loadings on F. is arbitrary. In words. 2 Kaiser [9] has suggested an analytical measure of simple structure known as the 'l7j = f7/hi to be the rotated coefficients scaled by the square root of the communalities. the statistical software packages SAS. 6} group. 2. Half the loadings ' : positive and half are negative on the second factor.789 .2 • l!J . estimates are unchanged by the orthogonal rotatIOn. and the communalities are the diagonal elements of these matrices. and the two distinct clusters of variables are more clearly revealed.nt& are labeled with the numbers of the corresponding variables. all the points fall in the first quadrant (the factor loadmgs are all pOSltlve). As might be expected. 5. SPSS.2.001 . The poi.6 Estimated rotated factor loadings F. A with this loadings is called a bipolar factor. Computing algorithms exist for maximizing V. however.406 . Similarly.=1 P ] (945) as large as possible. One new axis would pass through the cluster {1. maximum likelihood. (The assignment of negatIve and posltlve. maximizing V corresponds to "spreading out" the squares of the loadings on each factor as much as possible. ° p £ij .6. it can always be held fixed and the remaining factors rotated. .372 It is apparent. We point out that Figure 9.nonmath" factor. After the transformation T is determined. . smce ii: = iTT'i' = i*i*'. Gaelic English History Arithmetic Algebra Geometry • j. Oblique rotations are so named because they to a non rigid rotation of coordinate axes leading to new axes that are not perpendIcular. The factor loading pairs (fil' f i2 ) are plotted as points in Figure 9. If a dominant single factor exists.054 .=1 .j has the effect of giving variables with small communalities relatively more weight in the determination of simple structure. Although (945) looks rather forbidding. that the interpretation of the oblique factors for this example would be much the same as that given previously for an orthogonal rotation.752 . the three verbal test variables have high loadmgs on F and moderate to small The second factor might be a factor. By contrast... .1. because the signs of the loadings on a factor can be reversed wIthout " affecting the analysis. coincide. Also shown is a clockwise orthogonal rotation of the coordinate axes through an of c/J == 20°. varimax rotations of factor loadings obtained by different solution methods (principal components. Fr • Fr. we hope to find groups of large and negligible coefficients in any column of the rotated loadings matrix L*.) This factor is not easily identified. the pattern of rotated loadings may change considerably if additional common factors are included in the rotation. it will generally be obscured by any orthogonal rotation. .490 . F. 3. The magnitudes of the rotated factor loadings reinforce the interpretatIOn of the factors suggested by Figure 9. and so forth) will not. 
.1 Factor rotation for test scores.467 . The first factor might be called a factor. loadings on The generalintelligence factor identified initially IS submerged m the factors F I A Factor Rotation 507 Table 9.604 [ aID . Therefore.433 Communali ties = Variable 1.623 . . the loadings 'l7j are multiplied by hi so that the original communalities are preserved. Lawley Maxwell suggest that this factor reflects the overall response of the students to instruction and might be labeled a general intelligence factor. Define 1 m V = P J=I and The rotated factor loadings obtained from (944) wIth c/J = 20 and the corresponding communality estimates are shown in.1 suggests an oblique rotation of the coordinates. '. Effectively.083 .506 Chapter 9 Factor Analysis and Inference for Structured Covariance Matrices All the variables have positive loadings on the first factor. in general. and MINITAB) provide varimax rotations. Table 9.but is such that individuals who get aboveaverage scores on the verbal tests get Scores the factor. Perhaps this factor can be classified as "math. 4. Also.369 . and most popular factor analysis computer programs (for example. 5.
52420 0. the marketing data discussed in Example 9.0205 0.571 .0409 0.55986 0. The original factor loadings by the principal component method)..2 Factor rotation for hypothetical marketing data.82 .. The factor loadings for the variables are pictured with respect to the original and (varimax) rotated factor· axes in Figure 9.878920 FLAVOR SNACK ENERGY .0..80 .102081 0.98 .54323 0. small or negligible loadings on factor 1).9728 4 0.932 Eigenvalue Difference Proportion Cumulative Eigenvalues of the Correlation Matrix: Total = 5 Average = 1 1 2. Figure 9.3. and 5 define factor 1 (high loadings on factor 1. although it has aspects of the trait represented by factor 2.96 . (continues on next page) .00 money 1. .601842 5 0.5 .204490 0.. . I F2 . 3.0 4 F. 2. • F* 2 1 . . TASTE MONEY FLAVOR SNACK ENERGY 0.853090 1.507 .71 energy .5 / 0 / / / / / / / /1 • 3 I FAcrORi ..9 (Rotated loadings for the consumerpreference data) Let us return to . 1.102409 0.79821 :0. title 'Factor Analysis'.932 .54 .00 Communalities hr proc factor res data=consumer method=prin nfact=2rotate=varimax preplot plot.50 ... 0.56 .00 ..01 .0067 1..65 . We might call factor 1 a nutritional factor and factor 2 a taste factor.02 flavor ..78 .o:1o/m 0. var taste money flavor snack energy.98 . 5... and the (varimax) factor loadings are shown in Table 9. .81610 ... _type_='CORR'..0000 2 factors will be retained by the NFACTOR criterion. ! Factor Pattern..5706 0. 5 • 1.1.. .9 USING PROC FACTOR.64534' 074795 0. while variables 1 and 3 define factor 2 (high loadings on factor 2.. .00 .77726 .52 .42 .75 .....89 .88 . data consumer(type = corr). input _name_$ taste money cards..5 .2.13 snack .046758 0.806332 1.9933 OUTPUT Cumulative proportion of total (standardized) sample variance explained Prior Communality Estimates: ONE .94 . 4. 2· . the communalities..7.. small or negligible loadings on factor 2). Taste Good buy for money Flavor Suitable for snack Provides lots of energy Estimated factor loadings Fl F2 Rqtated estimated factor loadings F.·93911:. 4..93 !Initial Factor Method: Principal Components I 3 0.. (See the SAS statistical software output Panel 9.79 1.508 Chapter 9 Factor Analysis and Inference for Structured Covariance Matrices PANEL 9.1 SAS ANALYSIS FOR EXAMPLE 9. Variable 4 is most closely aligned with factor 1.00 .5706 2 1.068732 0. Factor Rotation 509 Example 9.) Variable 1.033677 It is clear that variables 2.85 flavor snack energy...10 .11 PROGRAM COMMANDS 1. TASTE MONEY 0. taste 1.
The varimax rotated loadings for the m = 4 factor solutions are displayed in Table 9. and. Linden called this factor running endurance. along with the specific variances.000 .. It is difficult for us to label these factors intelligently. This factor could be called running speed. As we have noted. this factor might be caUed explosive arm strength.122027 specific variances and cumulative proportions of the total (standardized) sample variance explained by each factor are also given. sjpce.669 .00 .5 and 9. .647 .322 .510 Chapter 9 Factor Analysis and Inference for Structured Covariance Matrices PANEL 9.54 . one on which all the variables load highly) tends to be "destroyed after rotation.10 (Rotated loadings for the stockprice data) Ta?le 9. andagain to some extentthe long jump load highly on a third factor. The rotated loadings indicate that the bank stocks (JP Morgan.  . in cases where a general factor is evident.. The cumulative proportion of the total sample variance explained for all factors does not change. discus.) The two rotated factors. Factor 2 appears to represent economic conditions affecting oil stocks. and javelin load highly on a factor. An m = 2 factor model is assumed. using both principal component and maximum likelihood solution methods. the same phenomenon is observed for them. As he notes.537396 FACTOR2 2. 400meter run. An interpretation of the factors suggested by the unrotated loadings was presented in Example 9.1 Factor Rotation 5II (continued) I Rotation Method: Varimax I Rotated Factor Pattern TASTE MONEY FlAVOR SNACK ENERGY FACTOR 1 0.01123 0.993J . although factors 1 and 2 are not in the same order.42 . These quantities were derived for an m = 4 factor model.53 . and Wells Fargo) load highly on the first factor.8 Maximum likelihood estimates of facfOfloadings Variable JPMorgan Citibank Wells Fargo Royal Dutch Shell ExxonMobil Cumulative proportion of total sample variance explained FI F2 Rotated estimated factor loadings Fj Fi Specific variances = 1 . Example 9. the I5DOmeter run loads heavily and the 400meter run loads heavily on the fourth factor. We identified market and industry factors. the values are to. Finally.5.652 .227 .42805 0." .024 .12856 0. while the oil stocks (Royal Dutch Shell and ExxonMobil) load highly on the second factor. differentiate the industries. (Although the rotated loadings obtained from the principal component solution are not displayed. rotation will affect only the distribution of the proportions of the total sample variance explained by each factor. This condition. the uniqueness condition that L''I'IL be a diagonal matnx.032 .8 shows the init. an orthogonal rotation is sometimes performed with the general factor loadings fixed.788 .6. together. IS convenient for computational purposes.27 . The rotated factor loadings for both methods of solution point to the same underlying attributes.9. Similarly. Citibank.821 .683 . for obtained by maximum likelihooq.118 . Factor 1 represents those unique economic forces that cause bank stocks to move together. The estimated Table 9. Linden labeled this factor explosive leg strength.. 5 _ Rotation of factor loadings is recommended particularly .93744 0.182 1.115 . Apart from the estimated loadings.5. high jump.ial and rotated maximum likelihood estimates of the factor loadmgs for the data of Examples 8. We see that shot put.675 Example 9. The lOOmeter run. following Linden [11]. 
"The basic functions indicated in this study are mainly consistent with the traditional classification of track and field athletics.01563 Variance explained by each factor FACTOR 1 2.hf . a general factor (that is. A varimax rotation [see (945)] was performed to see whether the rotated factor loadings would provide additional insights. llDmeter hurdles. but may not lead to factors that can easily be interpreted.755 .346 . andto some extentlong jump load highly on another factor.84244 0.97947 0.96539 FACTOR2 0.323 .647 5Some generalpurpose factor analysis programs allow one to fix loadings associated with certain factors and to rotate the remaining factors..01970 0.113 .11 (Rotated loadings for the Olympic decathlon data) The estimated factor loadings and specific variances for the Olympic decathlon data were presented in Example 9. The interpretation of all the underlying factors was not immediately evident." For this reason.98948 0. pole vault.000 .104 (.
an oblique rotation is frequently a useful aid in factor analysis.3 Rotated maximum likelihood loadings for factor pairs (1.20 . [6] or [10]) and will not be pursued in this book . An oblique rotation to a simple structure corresponds to a nonrigid rotation of the coordinate system such that the rotated axes (no longer perpendicular) pass (nearly) through the clusters.155 0. (e e e Plots of rotated maximum likelihood loadings for factors pairs (1.2 • 6 .22 . • 9.4 r .278 .5541 1.7391 .17 . (The numbers in the figures correspond to variables. F.09 .139 .291 .43 .29 .33 .019 . Nevertheless. j2 . .37 .0 I Maximum likelihood Estimated rotated factor loadings.057 . factor scores fj = estimate of the values fj attained by Fj (jth case) Oblique Rotations Orthogonal rotations are appropriate for a factor model in which the common tors are assumed to be independent.221 . f7j A 1.252 . The points are generally grouped along the factor axes. the estimated values of the common factors.8 I Variable lOOm run Long jump Shot put High jump 400m run F. However.097 .3) are displayed in Figure 9.001 .6 J 0. F: rpi = 1.254 1.39 .267 . F: Specific variances rpi = 1 .045 .8831 . Rather. .76 .01 .6 I0. interest is usually centered on the parameters in the factor model. . the point with the m coordinates i 1. An oblique rotation seeks to express each variable in terms of a minimum number of factorspreferably. an orthogonal rotation to a simple structure corresponds to a rigid rotation of the coordi.) .9 Principal component Estimated rotated factor loadings. •.2. for example.8 B 0..205 .142 .4 8 • 0.182 1.23 run Cumulative proportion of total sample variance explained .8851 . j = 1. These quantities are often used for diagnostic purposes.0 • 4 • 0.6 9 () 0.15 . as well as inputs to a subsequent analysis..204 .110 .8 . after rotation.005 . they are estimates of values for the unobserved random factor vectors Fj . F.8 J 0. 3)decathlon data. Oblique rotations are discussed in several sources (see.'" 5 12 Chapter 9 Factor Analysis and Inference for Structured Covariance Matrices Factor Scores 513 9. .296 .. jl1l ) represents the position of the ith variable in the factor space.3 on page 513. . 2) and (1. n.002 . as well as orthogonal rotations. If we regard the m common factors as coordinate axes. Many investigators in social sciences conSIder oblique (nonorthogonal) rotations..075 .62 often suggested after one views the estimated factor loadings and do not follow from our postulated model.. a single factor.151 .17 .0 0..28 hi Fi F.0 0 • f f 0.62 .6 Factor I 9 () 0. called factor scores. .055 .293 .070 .51 .0  0. Assuming that the variables are grouped into nonoverlapping clusters. may also be required.2) and (1. . Factor scores are not estimates of unknown parameters in the usual sense.4 Factor f Figure 9.. 0. The former are .228 .33 N Il< i 0.2 CV 0 L 0.. nate axes such that the axes.242 0.4 • 2 0. That is. e7j 1. Plots of rotated principal component loadings are very similar.12 . pass as closely to the clusters as possible.S Factor Scores In factor analysis.302 .2 0..280 1.
is (947) If the factor loadings are estimated by the principal component method.. i = 1. Both of the factor score approaches have two elements in common: 1. do not change when rotated loadings are substituted for unrotated loadings.then  or C j = for standardized data. = 12 . regard the specific factors E' = [Bb B2' .n = 2. = . The factor scores are. some rather tic. }} J' f. .i). perhaps centered or standardized. so we will not differentiate between them.) If rotated loadings L* = are used in place of the originalloadings in (950). The sum of the squares of the errors. . we take the estimates L. ' are related to C· by C* = T'C. Bartlett [2] has suggested that weighted least squares be used to estimate the common factor values.. these estimates must satisfy the uniqueness condition.. The computational formulas. but reasoned.2.n = nlj2 (Xj . = 1" 2 . ]. (See Exercise 9. . n . the subsequent factor scores. .e estimated rotated loadings. .·r. and the specific variance 'Ware known for the factor model (pXl) The factor scores generated by (950) have sample mean vector 0 and zero sample covariances. . need not be equal. . or.J Z T Z Zj j = 1. approaches to the problem of estimating factor values have been advanced.16. a diagonal matrix. as in (825). Since Var( Si) = I/Ii. if the correlation matrix is factored (950) C· ) = where Zj z Z z )lL.. and Ej outnumber the observed Xj. . To overcome this difficulty. and jJ = + The Weighted Least Squares Method Suppose first that the mean vector p" the factor loadings L. (sample covariance) . this amounts to assuming that the I/Ii are equal or nearly equal. weighted by the reciprocal of their variances... rather than the . Implicitly. The solution (see Exercise 7. We then have the following: 1 n ' n . as given in this section.•• . and jL = i as the true values and obtain the factor scores for the jth case as (949) For these factor scores. We describe two of these approaches. original estimated loadings. are used to compute factor scores. 2. "TYpically.514 Chapter 9 Factor Analysis and Inference for Structured Covariance Matrices Factor Scores 5 15 The estimation situation is complicated by the fact that the unobserved quantities f. (pXJ) (pXm)(mXJ) L F+E (pxJ) Further. They involve linear transformations of the original data. th. They treat the estimated factor loadings were the true values.1.1 j=J . p. Factor Scores Obtained by Weighted Least Squares from the Maximum Likelihood Estimates e and specific variances as if they ij c= j _ jL) . (sample mean) and = I When L and are determined by the maximum likelihood method. Bp] as errors.· Xp. .i).... Since we have L= i e2 Bartlett proposed choosing the estimates C of f to minimize (947).. It IS customary to generate factor scores using an unweighted (ordinary) least squares procedure.3) is (948) (951) Motivated by (948).
.·(See Result 4. A maximum likelihood solution from R gave the estimated rotated loadings and specific variances This identity allows us to compare the factor scores in (955). meanS and covariances given by (93). Of the methods presented. n + 'l'r1L The quantities L'(LL' + 'l'rl in (953) are the coefficients in a (multivariate) regression of the factors On the variables.2.li)'l = AI and if the elements of this diagonal matrix are close to zero..821 .2. none is recommended as uniformly superior. evaluated at Xj' The Regression Method Starting again with the original factor model X . .5 16 Chapter 9 Factor Analysis and Inference for Structured Covariance Matrices Factor Scores 5 17 Comparing (951) with (821).12 (Computing factor scores) We shall illustrate the computation of factor scores by the least squares and regression methods using the stockprice data discussed in Example 9.. the regression and generalized least squares methods will give nearly the same factor scores. I*). fj = i:II(xj .= LF + E. II = (pxp) LV + 'I' i Ll i (pXm) = . We then have the following: Factor Scores Obtained by Regression f· } = L'SI(X' ..L'(LL' (953) (954) Again. we see that the fj are nothing more than the first m (scaled) principal components. [. . we see that the jth factor score vector is given by A numerical measure of agreement between the factor scores generated from two different calculation methods is provided by the sample correlation coefficient between scores On the same factor.IL)ff = (I + (L'. we obtain Li = .plifl L' .J. Estimates of these coefficients produce factor scores that are analogous to the estimates of the conditional mean values in multivariate regression analysis. . j = 1. .27 o .J. In an attempt to reduce the effects of a (possibly) incorrect determination of the number of factors. .10.n (958) (952) where.JL = LF + E has an Np(O...993 . where .. and 0 is an (m + p) X 1.50. 1.. Z' = For maximum likelihood estimates (L'.J. generated by the regression argument.113 .r1 = (I + L' .. . When the common factors F and the specific factors (or errors) E are jointly normally distributed with .L'I1L = I .. with those generated by the weighted least squares procedure .plLrl)ff The vector of standardized observations.118 . we initially treat the loadings matrix L and specific variance matrix 'I' as known.. practitioners tend to calculate the factor scores in (955) by using S (the original sample covariance matrix) instead of I = LL' + ...763 .x) = L' (Li.2. Using Result 4.fl(Xj .x) J' j = 1. using (956). any vector of observations xi' and taking the maximum likelihood estimates L and 'I' as the true values.20. if rotated loadings L* = LT are used in place of the original loadings in (958). the linear combination X .JL) and F is Nm+p(O. Temporarily.3.2. if a correlation matrix is factored.42 = 0 0 0 0 o . 1. j = 1..J. LV + '1') distribution.11) = L'(LL' + 'l'fl(X . .227 . n (955) The calculation of f) in (955) can be simplified by using the matrix identity (see Exercise 9. . .11) and covariance = Cov(Flx) = I . we find that the conditional distribution of Fix is multivariate normal with mean = E(Flx) = L'II(x .00 o 1] fjS = (L'.40] yields the following scores On factors 1 and 2: .x). the joint distribution of (X .) Moreover. we denote the former by ff and the latter by fj • Then.40.pl (mXp) (pXp) (mxm) (mXp) (pxp) Example 9. 'LS [see (950)].024] and.669 [ .70. see (825).) Consequen!ly. (See Chapter 7.11.n or..54 o o o o o o o o o . vector of zeros.J.z [..6.104 . .J. + . 
.6) (956) L' (LL' + .ILrl(1 + L'.675 .J.. the subsequent factor scores fj are related to fj by j = 1..
Factor scores with a rather pleasing intuitive property can structed very simply.563 . take the loadings with largest absolute value in L as equal in magnitude.4 Factor scores using (958) for factors 1 and 2 of the stockprice (maximum likelihood estimates of the factor loadings). The factor scores for sums of the standardized observations corresponding to variables with !I = Xl + X4 X2 +  X3 + X4 + Xs fz = + Xs Xl as a summary. As we pointed out in Chapter 4.374 .026 . was set to . the test will most assuredly reject the model for small rn if the number of variables and observations is large. greater than absolute value) loadings on a factor.40 .280 .61 _ . Group the variables with high (say. Probably the most important decision is the choice of m. Plots of factor scores should be examined prior to using these scores in other analyses. we would standardize these new variables.214 .911 . The simple factor scores are frequently highly correlated with the factor scores obtained by the more complex least squares and regression methods. Bivariate scatter plots of factor scores can produce all sorts of nonelliptical shapes. For each factor.526 .4. marginal transformations may help.694 .00 in the fourth "'. Similarly. They can reveal outlying values and the extent of the (possible) nonnormality.605 .023 . g 0 • ) • • •• o Factor) 2 • \ •• • Perspectives and a Strategy for Factor Analysis There are many decisions that must be made in any factor analytic study. _ Linear compounds that make subjectmatter sense are preferable.719 and L* = LT . Yet this is the situation when factor analysis provides a useful approximation. Data reduction is accomplished by replacing the standardized data by these simple factor scores.13 (Creating simple summary scores from factor analysis groupings) The principal component factor analysis of the stock price data in Example 9. Thus. '" B tI.j.831 = . Although multivariate normality is often assumed for the variables in a factor analysis.726 [ . the number of common factors. and neglect the smaller loadings. it is suitable only for data that are approximately normally distributed. we create the linear combinations Comment.20[ .l = z z z z z z [. the simple factor scores would be il = Xl + X2 + X3 o 2 !2 = X4 + Xs • • • • •• •• • • 2 • • •• • • • •• • • • • • The identification of high loadings and negligible loadings is really quite subjective. Although a large sample test of the adequacy of a model is available for a given rn. If.851 = .133 .437] . .50] L .813 [ . it is very difficult to justify the assumption for a large number of variables .079 .852 . instead of L.137· . obtained using (958). the factor scores mayor may not be normally distributed.084 .70 1.001 . In practice. Most often. All of the factor scores. . 6 In order to calculate the weighted least squares factor scores.732 .030] .063 . The scores for factor 1 are then summing the (standardized) observed values of the variables in the bined according to the sign of the loadings.518 Chapter 9 Factor Analysis and Inference for Structured Covariance Matrices Weighted least squares (950):6 Perspectives and a Strategy for Factor Analysis 519 on factor 2. we start with the varimax rotated loadings L*.011J 1.01 so that this matrix could be inverted. the final choice of m is based on some combination of Figure 9.221 . Example 9.909 In this case. Moreover. and so forth. are plotted in Figure 9.4 produced the estimated loadings f= Regression (958): Ci:'wli*)li*'.61J . the two methods produce very similar results.
264 Cumulative proportion of total (standardized) sample variance explained Maximum Likelihood Variable .128 . The results are given in Table 9. calculate standardized scores for each observation and squared distances as described in Section 4.084 .950 Estimated factor loadings F2 F3 Fl . who started his analysis from a different correlation matrix than the one we use.336 .289 . including a varimax rotation.10 Factor Analysis of ChickenBone Data Principal Component Variable 1. 4. 6. Compare the two results with each other and with that obtained from the complete data set to check the stability of the solution.831 (506 ) .14 (Factor analysis of chickenbone data) We present the results of several factor analyses on bone and skull measurements of white leghorn fowl.720 .894 .950 . 3. ) (a) Look for suspicious observations by plotting the factor scores.604 .218 .233 .050 .057 .073 . split them in half and perform a factor analysis on each part.878 .ggests that maximizing the likelihood function may produce a Heywood case. the most satisfactory factor analyses are those in which rotations are tried with more than one method and all the results substantially confirm the same Perspectives and a Strategy for Factor Analysis 521 The sample correlation matrix 1.000 .) .482 .33 .569 .350 . The full data set consists of n = 276 measurements on bone dimensions: Head: Xl = skull length { X 2 = skull breadth X3 = femurlength { X 4 = tibia length X5 = humerus length { X6 = ulna length 1.937 1. 7 Notice the estimated specific variance of .211 . Perform a maximum likelihood factor analysis.894 1. and no single strategy should yet be "chiseled into stone. Compare the solutions obtained from the two factor analyses.189 .763 .602 . Fi . 7 Table 9.874 . Readers attempting to replicate our results should try the Hey(wood) option if SAS or similar software is used.000 . Perform a principal component factor analysis.228 . 2. The choice of the solution method and type of rotation is a less crucial decision.067 .450 . Fi .000 . This would reveal changes over time.000 .467 .422 1.000 .450 .283 . Skull length Skull breadth Femur length Tibia length Humerus length Ulna length .505 ." We suggest and illustrate one reasonable option: 1.51 .874 . . 3. 4.075 .164 .904 .877 .888 the maximum likelihood analysis.000 .286 . 5.467 If.145 .012 .894 .00 . 4. (The data might be divided by placing the first half of the cases in one group and the second half of the cases in the other group.362 .738 .937 .047 Rotated estimated loadings F. was factor analyzed by the principal component and maximum likelihood methods for an m = 3 factor model.00 for tibia length in the maximum likelihood solution.000 .325 .926 1.652 .10. 2.603 .741 . Do extra factors necessarily contribute to the understanding and interpretation of the data? 5.520 Chapter 9 Factor Analysis and Inference for Structured Covariance Matrices (1) the proportion of the sample variance explained. .929 .463 .823 Leg: Wing: .921 .00 .926 1.143 .945 .890 .340 .949) .192 (. (b) Try a varimax rotation.948 .576 .743 .252 .12 .603 .936 .569 1. The original data were taken from Dunn [5]. 5. Factor analysis of Dunn's data was originally considered by Wright [15]. For large data sets. and (3) the "reasonableness" of the results. This method is particularly appropriate for a first pass through the data.902) . TIlls su.09 Example 9.878 .212 .000 R= factor structure.6.779 . 3.621 .573 .000 .175 .396 . 6.422 .873 .08 .02 .823 .667 .039 Rotated estimated loadings F. 
Skull length Skull breadth Femur length Tibia length Humeruslength Ulna length Estimated factor loadings F2 F3 Fl . In fact.602 .505 .877 .244 (.792 .926 .467 .467 .943 . Also.00 .482 . factor analysis still maintains the flavor of an art.559 . Repeat the first three steps for other numbers of common factors m.355 2. .345 .621 . (8) Do the loadings group in the same manner? (b) Plot factor scores obtained for principal components against scores from .602 .177 . At the present time.045 . (It is not required that R or S be nonsingular.08 .214 .08 .874 .07 . (2) subjectmatter knowledge.272 Cumulative proportion of total (standardized) sample variance explained .
660 Rr= 1..10.39. test .3. . Plots of this kind allow us to identify observations that. I   I I All of the entries in this matrix are very small. An m = 2 factor model is considered in Exercise 9.$ $ •• $.73. • l$ • • • . [39.927 .000 . for one reason or another. was removed and the analysis repeate{l. $ $ . That is.540 1.366 1. If the loadings on a particular factor agree.587 .000 . but not for factors 2 . it should be divided into two (roughly) equal sets.386 .000 . the last factors are not meaningful. as indicated by Plot (c) in Figure 9.000 .909 . and 3.. are not consistent with the remaining observations.000 . Factor scores for factors 1 and 2 produced from (958) with the rotated maximum likelihood estimates are plotted in Figure 9.606 . 1.the latter occurs.844 ..639 . • • • •• • •• .000 .. • . If the sets of loadings for a factor tend to agree.003 .522 Chapter 9 Factor Analysis and Inference for Structured Covariance Matrices After rotation.000 . The rotated maximum likelihood factor loadings are consistent with those generated by the principal component method for the first factor. It is also of interest to plot pairs of factor scores obtained using the principal component and maximum likelihood estimates of factor loadings. represent skull dimensions and might be given the same names as the variables.115. The meaning of the third factor is unclear.$ • •• •  • • • ..587 . If . the loadings were not altered appreciably.4. collectively.598 . For the maximum likelihood method.000 .000 . When this point.894 R2 = 1. · ••• ••••• • $ $ $• $ •• $ $ • $ $ • $$ $• •• $$ •• $ • •• $ ••• $ • •• $$ • • • .406 . respectively.1]. Focusing our attention on the principal component method and the cumula_ tive proportion of the total sample variance explained.950 .000 . and a factor analysis should be performed on each half.001 . ' The . ••• •  11 . Sets of loadings that do not agree will produce factor scores that deviate from this pattern. It has an unusually large Fzscore. respectively. When the data set is large..000 .chickenbone data were divided into two sets of nr = 137 and n2 = 139 observatIOns. The results of these analyses can be compared with each other and with the analysis for the full data set to 2 o I 2 3 Figure 9.000 . The second and third factors.000 .. The first factor appears to be a bodysize factor dominated by wing and leg dimensions.000 and 1. .572 .940 1.$ •• • •• • $ $ • ( ••• $ $$ $..588 .69..75. This seems to be the case with the third factor in the chickenbone data..000 3 . . If the results are consistent with one another confIdence In the solution is increased.000 .. The resulting sample correlation matrices were .the sta.420 .bility of the solution. and it is probably not needed. the second factor appears to represent head size. We shall pursue the m = 3 factor model in this example.. skull breadth and skull length.000 .352 1.866 1..001 .6 on pages 524526.000 . Plots of pairs of factor scores using estimated loadings from two solution methods are also good tools for detecting outliers.004 3 I I Perspectives and a Strategy for Factor Analysis 523 I I 2 • • • • .584 .901 .863 .6 that one of the 276 observations is not consistent with the others. • •• •• •. it is usually associated with the last factors and may suggest that the number of factors is too large.835 1.000 • 2 l .001 . It is clear from Plot (b) in Figure 9. For the chickenbone data.694 .$ $ $ . • • • • • • .000 . ••• • . the pairs of scores should cluster tightly about the 45° line through the origin. $ ••• . 
we see that a threefactor solution appears to be warranted.. The third factor explains a "significant" amount of additional sample variation.931 1.000 .001 . Further support for retaining three or fewer factors is provided by the matrix obtained from the maximum likelihood estimates: .1.7. . the two methods of solution appear to give somewhat different results.5. Potential outliers are circled in the figure. • • · • ..696 .000 . plots of pairs of factor scores are given in Figure 9. outliers will appear as points in the neighborhood of the 45° line.575 .6.• •• • 0 • I z z z = .. · • .000 .000 .S Factor scores for the first two factors of chickenbone data.000 .911 1. but far from the origin and the cluster of the remaining points. •• •• • $ .000 ..000 .
925 .247 .05 . and we can be fairly confident that the large loadings are "real.i . 5. specific variances.853 ) . .272 F.303 .5 .332 .06 .238 .968) .06 .5 3. and proportion of the total (standardized) sample variance explained for a principal component solution of an m = 3 factor model are given in Table 9.0 1.6 Pairs of factor scores for the chickenbone data.10 .5 I Maximum likelihood 3.593 .912 .0 .) • First set (n} = 137 observations) Rotated estimated factor loadings Variable 1.252 . if.  6. skull length and skull breadth. A one.270 .5 2.11 .11 (b) Second factor Figure 9. The solution is remarkably stable.8 I .0 1.239 . (Loadings are estimated by principal component and maximum likelihood methods.962 .743 .248 .145 . 6.7 o 1.524 Chapter 9 Factor Analysis and Inference for Structured Covariance Matrices Perspectives and a Strategy for Factor Analysis 525 11 2." As we have pointed out however.0 Figure 9.50 .10.9 1. (See Exercise 9. (.168 t/!i .930 .5   I 1.395 . The first factor.0 2.914 F.7 I I 1 I 11 I I I I Principal component I I I I T I T 10 I 9.312 .5   2. interchange with respect to their labels.231 F.5 o ..75 0  1.9 _ _____________________ 11 I I I 32 1121 12 I I 21 1I 1 rI2321 1 I 4 26311 I 21 24 3 1 I 33112 2 11 21 17 2 1 .00 .130 .361 (.360 .25 1.5 3.830 .242 .921 ) .0 . 3.0 2.00 I I I I I I I I 300 UO (a) First factor ( continued) Table 9.5 2.8 11 1 III I 1 1 11 43 1224331 13 223152 11 1411221 I 12121 115251 11133 1 1 2 1 3 2 1 Maximum likelihood 3.877 . Skull breadth Femur length Tibia length Humerus length Ulna length .203 .352 .940 .5 1. three factors are probably too many.167 (.0   4.00 . The results for the two halves of the chickenbone measurements are very similar. again appears to be a bodysize factor dominated by leg and wing dimensions.01 .6 I 3.06 2.5 1. .11 on page 525.or twofactor model is surely sufficient for the chickenbone data. 4.208 .1 IJ 1 1 I 2 12111 1 23221 12346231 224331 21 4 46C61 1 213625 572 121A3837 31 11525111 223 31 I 21 1 III 1 I II 1 I1 1 I I 2.00 .780 .08 Cumulative proportion of total (standardized) sample variance' explained .871 F. (. Factors F. and you are encouraged to repeat the analyses here with fewer factors and alternative solution methods.187 .899) .. o 3 7.914 . but they collectively see<m to represent head size. and F.175 .546 .) The rotated estimated loadings.08 Fi . These are the same interpretations we gave to the results from a principal component factor analysis of the entire set of data. Skull length Second set (n2 = 139 observations) Rotated estimated factor loadings Fi .6 I 3.
but still use the title maximu m likelihoo d to refer to resulting estimates.q.00 Principal component 2..1 2: (Xj . . al sciences In these areas.4 1. Result 9A. This modific ation. amounts to employi ng the likeliho od obtained from the Wishart It distribu tion of (Xi .2 . 2: 3. analysis provide s a way of explamm g t he 0 bserved variability ID behavlOr ID erms of these traits. Some i=1 factor analysts employ the usual sample covarian ce S. they can be shown to satisfy certain equation s.50 2. since they both produce the same correlat ion matrix. referenc ed in Footnot e 4 of this chapter..i)' = nt(n j=l n l)S and '&1 '&2 . . the mvestIga tor can s hout "Wow.. If ral and social Factor analysis has a tremendous mtUltive appea or the behavio .&m 521' .50 1 111 1 1 1 1 1 2 1 11 I III . of course.e· h f t pies. Rather. that quality seems to depend on a WOW criterion . factor analysis remains very su b' Jec IV.25 2 1. I I I 1 11 1111 21 32 1 I 21 22 121 I I 1 2 1 3141 I I I 2 I 111 11 I 11 11 I 1 1 2 II 2 I 11 3 I I 111 I 111 2 I 1 1 . II1 22 1 I I I I I 1 2 1 2 I I 21 2 1 211 11 11 21 1 1 I 1 2 1 11 III 1 2 I I1 3 11 I 112 I II 1 2 I 1 I 1 I I· 1 I I 1 11 I 2 I 1 2 SOME COMPUTATIONAL DETAILS FOR MAXIMUM LIKELIHOOD ESTIMATION Althoug h a simple aqalyticaJ expressi on cannot be obtaine d for the maximu m likeliho od estimato rs L and 'It.2 (c) Third factor Figure 9.q.X)' of an unstruc tured covarian ce matrix. In practice the vast majority of attempted factor analyses do not Ylel suc l cut results.25 . Let x I.q. . Xz.8 1.'..75 I Maximum likelihood 1 o .6 0 . The maximu m likeliho od estimate s i and . the conditio ns are stated in terms of the maximu m likeliho od estimato r n S" = (l/n) (Xi .0 2. consist of situations whlch analysis model provides reasonable explanations in terms of a few ah e a tors. t . .X)' and ignoring the minor contribu tion due to i=1 the normal density for X. unaffec ted by the choice of Sn or S. it is natural to regard multivariate observatlOns.6 (continued) 2: .' If. Not surprisingly.1/2 correspo nding to eigenva lue 1 + Here Sn = n. are obtaine d by maximiz ing (925) subject to the uniquen ess conditio n in (926). in dommon with most published sources. f Our exam Still when all is said and done.i)(xj .526 Chapter 9 Factor Analysis and Inference for Structured Covariance Matrices 3. unfortun ately.X) (Xi .I.6 1.X) (Xi .75 1. I understand these factors. .q. They satisfy (9A1) so the jth column of .. •. the criterion for judging the quality of any factor an YSIS has not been well quantified.I/2i is thAe (nonnor malized ) eigenve ctor of .I/2Sn . apmmt . .00 3. The factor analysis of R is. Xn be a random sample from a normal populat ion. b ac or and human processe s as manifestations of underlym g uno able "traits ." the application is deemed successful. while scrutinizing the factor analYSIs.
respect to the maximization. = V1/ 2L.Comment. We minimize (9A3) subject to L'qt1L = a. Compute initial estimates of the specific variances 1/11. as in (9A3). 1/2 I · the diagonal matrix V . Al > A2 > . Equivalently..1) S has. since Sn and p are constant with. in an iterative fashion.InlSIp Under these conditions. and the likelihood equations are solved. Lawley and Maxwell [10]. For testing the factor model [see (939)]. a Wishart distribution.em. then maximizing the reduced likelihood over L and 'I' is equivalent to maximizing the Wishart likelihood Likelihood ex I 1(n1)/2 e[(n1)/2]lr[:£'S] e over L and '1'.] If we ignore the contribution to the likelihood in (925) from the second term involving (IL . would be similar.) 2 P sI! IVI/211L L' + 'I' IIV1/21) • z • + tr [(L L' + 'I' )lVI/2V1/2RVI/2V1/2) _ p IVl/211RIIV1/21 '" I (V 1/2L. and I ii' + .. 1nl I + . Substitute i obtained in (9A6) into the likelihood function (9A3).1/12.. A . until the differences between successive values of ij and are negligible. A = I + a and E = qt1/2LA1/2.528 Chapter 9 Factor Analysis and Inference for Structured Covariance Matrices Some Computational Details for Maximum Likelihood Estimation 529 Also.f. ILL' + '1'1 should be compared with ISn I if the actual likelihood of (925) is employed. . It happens that the objective in'(9A3) has a relative Maximum likelihood Estimators of p = l z l'z + '\{I z matrix for the standardized variables is L. •...whose ith diagonal element is the square root of the lth dIagonal element of Sn.flR)  p (9A7) Recommended Computational Scheme For m > 1. for large n. compute the first m distinct eigenvalues.1/12.)' + V 112 '1'. I) IRI + + qt. From (9A1). + V1/2qtV. For most packaged computer negative I/Ii.IS n) _ where Sii is the ith diagonal element of Sl. .d" covariance matnx (9A5) Let = [e1 i i e!?'] be the p X m matrix of normalized eigenvectors and A = diaglA lo Am] m ::< diagonal matrix of eigenvalues.. the condition L'qt. .fl should be compared with IS I if the foregoing Wishart likelihood is used to derive i and .1L = a effectively imposes m(m ...) (V1/2L )' + VI /2qt V1/21) = In ( • z I Sn I (9A4) + tr[ «VI/2Lz)(Vl/2L.. p can be factored as p = = (V1/2L) (V1/2L). we obtain the estimates (9A6) 3."" I/Ip. and the corresponding maximum likelihood "estimates. a diagonal matrix.1)j2constraints on the elements of Land '1'. use the unbiased estimate S of the covariance matrix instead of the maxi. and minimize the result with to . along with many others who do factor analysis. Now. The values 1/11.1•.LL' and (9A2) 2. [See (421) and (423). • Comment. = (1 _1. the investigator minimizes In ( I When has the factor analysis structure = LL' + '1'. or a Heywood case. are changed to small pOSltlve numbers before proceeding with the next step. The loading + '1'. One procedure is the following: In ( 1. Also. Equivalently. Result (9A 1) holds with S in place of S". Given. (n . S and S11".. .. We avoid the details of the proof. correspondmg to negative values for some I/Ii' This solution is clearly . madm1sslble and is said to improper. .f.are almost identical.V1/2). e2. at convergence. for normal data.I/2 = + '1' •. of the "uniquenessrescal.. if they occur on a particular iteration. . m) (1. J6reskog [8] suggests setting '1'1 .'/11:. = V1/2 qtV1/2._ mum likelihood estimate Sn.1/1 p obtained from this minimization are employed at Step (2) to create a new L Steps (2) and (3) are repeated until convergencethat is. . • L and '1'. If R is substituted for S" in the objective function of (9A3). Thus. 
> A > 1. it is evident that jL = xand a consideration of the loglikelihood leads to the maximization of (nj2) [1nl I + over L and '1'. = ithdiagonalelementofSn . el. '(I ii' + i I) ISnl p +tr[(LL'+qtfSn)p 1 (9A8) .'/12.. .x). However.'/1p' A numerical search routine must be used. where V1/2 is the diagonal matrix with ith diagonal element O'i/f2. and the corresponding specific variance matrix is '1'. subject to these contraints. we can minimize 1nl I + tr(rIS) or. and correspon?ing eigenvectors. we can write the objective function in (9A7) as .
Exercises

9.1. Show that the covariance matrix

    ρ = [ 1.0  .63  .45 ]
        [ .63  1.0  .35 ]
        [ .45  .35  1.0 ]

for the p = 3 standardized random variables Z₁, Z₂, Z₃ can be generated by the m = 1 factor model

    Z₁ = .9F₁ + ε₁
    Z₂ = .7F₁ + ε₂
    Z₃ = .5F₁ + ε₃

where Var(F₁) = 1, Cov(ε, F₁) = 0, and

    Ψ = Cov(ε) = [ .19   0    0  ]
                 [  0   .51   0  ]
                 [  0    0   .75 ]

That is, write ρ in the form ρ = LL' + Ψ.

9.2. Use the information in Exercise 9.1.
(a) Calculate communalities hᵢ², i = 1, 2, 3, and interpret these quantities.
(b) Calculate Corr(Zᵢ, F₁) for i = 1, 2, 3. Which variable might carry the greatest weight in "naming" the common factor? Why?

9.3. The eigenvalues and eigenvectors of the correlation matrix ρ in Exercise 9.1 are

    λ₁ = 1.96,   e₁' = [.625, .593, .507]
    λ₂ = .68,    e₂' = [−.219, −.491, .843]
    λ₃ = .36,    e₃' = [.749, −.638, −.177]

(a) Assuming an m = 1 factor model, calculate the loading matrix L and matrix of specific variances Ψ using the principal component solution method. Compare the results with those in Exercise 9.1.
(b) What proportion of the total population variance is explained by the first common factor?

9.4. Given ρ and Ψ in Exercise 9.1 and an m = 1 factor model, calculate the reduced correlation matrix ρ̃ = ρ − Ψ and the principal factor solution for the loading matrix L. Is the result consistent with the information in Exercise 9.1? Should it be?

9.5. Establish the inequality (9-19).
Hint: Since S − L̂L̂' − Ψ̂ has zeros on the diagonal,

    (sum of squared entries of S − L̂L̂' − Ψ̂) ≤ (sum of squared entries of S − L̂L̂')

Now, S − L̂L̂' = λ̂ₘ₊₁êₘ₊₁êₘ₊₁' + ··· + λ̂ₚêₚêₚ' = P̂₍₂₎Λ̂₍₂₎P̂₍₂₎', where P̂₍₂₎ = [êₘ₊₁ ⋮ ··· ⋮ êₚ] and Λ̂₍₂₎ is the diagonal matrix with elements λ̂ₘ₊₁, ..., λ̂ₚ. Use (sum of squared entries of A) = tr AA' and tr[P̂₍₂₎Λ̂₍₂₎Λ̂₍₂₎P̂₍₂₎'] = tr[Λ̂₍₂₎Λ̂₍₂₎].

9.6. Verify the following matrix identities.
(a) (I + L'Ψ⁻¹L)⁻¹L'Ψ⁻¹L = I − (I + L'Ψ⁻¹L)⁻¹
Hint: Premultiply both sides by (I + L'Ψ⁻¹L).
(b) (LL' + Ψ)⁻¹ = Ψ⁻¹ − Ψ⁻¹L(I + L'Ψ⁻¹L)⁻¹L'Ψ⁻¹
Hint: Postmultiply both sides by (LL' + Ψ) and use (a).
(c) L'(LL' + Ψ)⁻¹ = (I + L'Ψ⁻¹L)⁻¹L'Ψ⁻¹
Hint: Postmultiply the result in (b) by L, use (a), and take the transpose, noting that (LL' + Ψ)⁻¹, Ψ⁻¹, and (I + L'Ψ⁻¹L)⁻¹ are symmetric matrices.

9.7. (The factor model parameterization need not be unique.) Let the factor model with p = 2 and m = 1 prevail. Show that

    σ₁₁ = ℓ₁₁² + ψ₁    σ₁₂ = σ₂₁ = ℓ₁₁ℓ₂₁    σ₂₂ = ℓ₂₁² + ψ₂

and, for given σ₁₁, σ₂₂, and σ₁₂, there is an infinity of choices for L and Ψ.

9.8. (Unique but improper solution: Heywood case.) Consider an m = 1 factor model for the population with covariance matrix

    Σ = [ 1   .4  .9 ]
        [ .4  1   .7 ]
        [ .9  .7  1  ]

Show that there is a unique choice of L and Ψ with Σ = LL' + Ψ, but that ψ₃ < 0, so the choice is not admissible.
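The matrix computations in Exercises 9.1-9.3 can be checked numerically. A minimal sketch, assuming NumPy: it verifies the factorization ρ = LL' + Ψ and then forms the m = 1 principal component solution, whose loading vector is √λ₁ e₁, with communalities ℓᵢ² and proportion of total standardized variance explained λ₁/p.

```python
import numpy as np

rho = np.array([[1.00, .63, .45],
                [ .63, 1.00, .35],
                [ .45, .35, 1.00]])
L = np.array([[.9], [.7], [.5]])            # loadings given in Exercise 9.1
Psi = np.diag([.19, .51, .75])              # specific variances
print(np.allclose(rho, L @ L.T + Psi))      # True: rho = LL' + Psi

# Principal component solution (Exercise 9.3)
lam, e = np.linalg.eigh(rho)                # eigenvalues in ascending order
lam, e = lam[::-1], e[:, ::-1]              # re-sort in descending order
L1 = np.sqrt(lam[0]) * e[:, [0]]            # m = 1 loading vector
L1 = -L1 if L1[0, 0] < 0 else L1            # fix the arbitrary sign
print(L1.ravel())                           # approx. [.875, .830, .710]
print((L1**2).ravel())                      # communalities h_i^2
print(lam[0] / 3)                           # proportion explained, approx. .65
```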
9.9. In a study of liquor preference in France, Stoetzel [14] collected preference rankings of p = 9 liquor types from n = 1442 individuals. A factor analysis of the 9 × 9 sample correlation matrix of rank orderings gave the following estimated loadings:

    Estimated factor loadings
    Variable (X₁)     F₁    F₂    F₃
    Liquors, Kirsch, Mirabelle, Rum, Marc, Whiskey, Calvados, Cognac, Armagnac
    [loading values illegible in the source; one entry, .97*, is flagged in the footnote]

*This figure is too high. It exceeds the maximum value of .64, as a result of an approximation method for obtaining the estimated factor loadings used by Stoetzel.

Given these results, Stoetzel concluded the following: The major principle of liquor preference in France is the distinction between sweet and strong liquors. The second motivating element is price, which can be understood by remembering that liquor is both an expensive commodity and an item of conspicuous consumption. Except in the case of the two most popular and least expensive items (rum and marc), this second factor plays a much smaller role in producing preference judgments. The third factor concerns the sociological and primarily the regional variability of the judgments. (See [14].)
(a) Given what you know about the various liquors involved, does Stoetzel's interpretation seem reasonable?
(b) Plot the loading pairs for the first two factors. Conduct a graphical orthogonal rotation of the factor axes. Generate approximate rotated loadings. Interpret the rotated loadings for the first two factors. Does your interpretation agree with Stoetzel's interpretation of these factors from the unrotated loadings? Explain.

9.10. The correlation matrix for chicken-bone measurements (see Example 9.14) is

    1. Skull length     1.000
    2. Skull breadth     .505  1.000
    3. Femur length      .569   .422  1.000
    4. Tibia length      .602   .467   .926  1.000
    5. Humerus length    .621   .482   .877   .874  1.000
    6. Ulna length       .603   .450   .878   .894   .937  1.000

The following estimated factor loadings were extracted by the maximum likelihood procedure:

    [Table: unrotated estimated factor loadings F₁, F₂ and varimax rotated estimated factor loadings F₁*, F₂* for the six bone variables; values illegible in the source.]

Using the unrotated estimated factor loadings, obtain the maximum likelihood estimates of the following.
(a) The specific variances.
(b) The communalities.
(c) The proportion of variance explained by each factor.
(d) The residual matrix R − L̂L̂' − Ψ̂.

9.11. Refer to Exercise 9.10. Compute the value of the varimax criterion using both unrotated and rotated estimated factor loadings. Comment on the results.

9.12. The covariance matrix for the logarithms of turtle measurements (see Example 8.4) is

    S = 10⁻³ [ 11.072               ]
             [  8.019  6.417        ]
             [  8.160  6.005  6.773 ]

The following maximum likelihood estimates of the factor loadings for an m = 1 model were obtained:

    Variable         Estimated factor loadings F₁
    1. ln(length)    .1022
    2. ln(width)     .0752
    3. ln(height)    .0765

Using the estimated factor loadings, obtain the maximum likelihood estimates of each of the following.
(a) Specific variances.
(b) Communalities.
(c) Proportion of variance explained by the factor.
(d) The residual matrix Sₙ − L̂L̂' − Ψ̂.
Hint: Convert S to Sₙ.

9.13. Refer to Exercise 9.12. Compute the test statistic in (9-39). Indicate why a test of H₀: Σ = LL' + Ψ (with m = 1) versus H₁: Σ unrestricted cannot be carried out for this example. [See (9-40).]

9.14. The maximum likelihood factor loading estimates are given in (9A-6) by L̂ = Ψ̂^(1/2)ÊΔ̂^(1/2). Verify, for this choice, that L̂'Ψ̂⁻¹L̂ = Δ̂, where Δ̂ = Λ̂ − I is a diagonal matrix.

9.15. Hirschey and Wichern [7] investigate the consistency, determinants, and uses of accounting and market-value measures of profitability. As part of their study, a factor analysis of accounting and market estimates of economic profits was conducted. The correlation matrix of accounting historical, accounting replacement, and market-value measures of profitability for a sample of firms operating in 1977 is as follows:

    Variable
    Historical return on assets, HRA
    Historical return on equity, HRE
    Historical return on sales, HRS
    Replacement return on assets, RRA
    Replacement return on equity, RRE
    Replacement return on sales, RRS
    Market Q ratio, Q
    Market relative excess value, REV
    [8 × 8 correlation matrix; entries illegible in the source]

The following rotated principal component estimates of factor loadings for an m = 3 factor model were obtained:

    [Table: rotated loadings F₁, F₂, F₃ for the eight profitability measures, with a final row giving the cumulative proportion of total variance explained; values illegible in the source.]

(a) Using the estimated factor loadings, determine the specific variances and communalities.
(b) Determine the residual matrix R − L̂L̂' − Ψ̂. Given this information and the cumulative proportion of total variance explained in the preceding table, does an m = 3 factor model appear appropriate for these data?
(c) Assuming that estimated loadings less than .4 are small, interpret the three factors. Does it appear, for example, that market-value measures provide evidence of profitability distinct from that provided by accounting measures? Can you separate accounting historical measures of profitability from accounting replacement measures?
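For an m = 1 model such as the one in Exercise 9.12, the requested maximum likelihood estimates follow directly from the loadings: ĥᵢ² = ℓ̂ᵢ², ψ̂ᵢ = sᵢᵢ − ℓ̂ᵢ², and the residual matrix is Sₙ − L̂L̂' − Ψ̂. A minimal sketch, assuming NumPy (for brevity it applies the formulas to the displayed S rather than first converting to Sₙ = ((n − 1)/n)S as the hint directs):

```python
import numpy as np

S = 1e-3 * np.array([[11.072, 8.019, 8.160],
                     [ 8.019, 6.417, 6.005],
                     [ 8.160, 6.005, 6.773]])
L = np.array([[.1022], [.0752], [.0765]])    # m = 1 ML loadings
h2 = (L**2).ravel()                          # communalities
psi = np.diag(S) - h2                        # specific variances
residual = S - L @ L.T - np.diag(psi)        # zero diagonal by construction
prop = h2.sum() / np.trace(S)                # proportion of variance explained
print(h2, psi, prop, residual, sep="\n")
```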
9.16. Verify that factor scores constructed according to (9-50) have sample mean vector 0 and zero sample covariances.

9.17. Refer to Example 9.12. Using the information in this example, evaluate (L̂z'Ψ̂z⁻¹L̂z)⁻¹.
Note: Set the fourth diagonal element of Ψ̂z to .01 so that Ψ̂z⁻¹ can be determined.
Will the regression and generalized least squares methods for constructing factor scores for standardized stock price observations give nearly the same results?
Hint: See equation (9-57) and the discussion following it.

The following exercises require the use of a computer.

9.18. Refer to Exercise 8.16 concerning the numbers of fish caught.
(a) Using only the measurements X₁−X₄, obtain the principal component solution for factor models with m = 1 and m = 2.
(b) Using only the measurements X₁−X₄, obtain the maximum likelihood solution for factor models with m = 1 and m = 2.
(c) Rotate your solutions in Parts (a) and (b). Compare the solutions and comment on them. Interpret each factor.
(d) Perform a factor analysis using the measurements X₁−X₆. Determine a reasonable number of factors m, and compare the principal component and maximum likelihood solutions after rotation. Interpret the factors.

9.19. A firm is attempting to evaluate the quality of its sales staff and is trying to find an examination or series of tests that may reveal the potential for good performance in sales. The firm has selected a random sample of 50 salespeople and has evaluated each on 3 measures of performance: growth of sales, profitability of sales, and new-account sales. These measures have been converted to a scale, on which 100 indicates "average" performance. Each of the 50 individuals took each of 4 tests, which purported to measure creativity, mechanical reasoning, abstract reasoning, and mathematical ability, respectively. The n = 50 observations on p = 7 variables are listed in Table 9.12 on page 536.
(a) Assume an orthogonal factor model for the standardized variables Zᵢ = (Xᵢ − μᵢ)/√σᵢᵢ, i = 1, 2, ..., 7. Obtain either the principal component solution or the maximum likelihood solution for m = 2 and m = 3 common factors.
(b) Given your solution in (a), obtain the rotated loadings for m = 2 and m = 3. Interpret the m = 2 and m = 3 factor solutions.
(c) List the estimated communalities, specific variances, and L̂L̂' + Ψ̂ for the m = 2 and m = 3 solutions. Compare the results. Which choice of m do you prefer at this point? Why?
(d) Conduct a test of H₀: Σ = LL' + Ψ versus H₁: Σ ≠ LL' + Ψ for both m = 2 and m = 3 at the α = .01 level. With these results and those in Parts b and c, which choice of m appears to be the best?
(e) Suppose a new salesperson, selected at random, obtains the test scores x' = [x₁, x₂, ..., x₇] = [110, 98, 105, 15, 18, 12, 35]. Calculate the salesperson's factor score using the weighted least squares method and the regression method.
Note: The components of x must be standardized using the sample means and variances calculated from the original data.

9.20. Using the air-pollution variables X₁, X₂, X₅, and X₆ given in Table 1.5, generate the sample covariance matrix.
(a) Obtain the principal component solution to a factor model with m = 1 and m = 2.
(b) Find the maximum likelihood estimates of L and Ψ for m = 1 and m = 2.
(c) Compare the factorization obtained by the principal component and maximum likelihood methods.

9.21. Perform a varimax rotation of both m = 2 solutions in Exercise 9.20. Interpret the results. Are the principal component and maximum likelihood solutions consistent with each other?

9.22. Refer to Exercise 9.20.
(a) Calculate the factor scores from the m = 2 maximum likelihood estimates by (i) weighted least squares in (9-50) and (ii) the regression approach of (9-58).
(b) Find the factor scores from the principal component solution, using (9-51).
(c) Compare the three sets of factor scores.

9.23. Repeat Exercise 9.20, starting from the sample correlation matrix. Interpret the factors for the m = 1 and m = 2 solutions. Does it make a difference if R, rather than S, is factored? Explain.

9.24. Perform a factor analysis of the census-tract data in Table 8.5. Start with R and obtain both the maximum likelihood and principal component solutions. Comment on your choice of m. Your analysis should include factor rotation and the computation of factor scores.

9.25. Perform a factor analysis of the "stiffness" measurements given in Table 4.3 and discussed in Example 4.14. Compute factor scores, and check for outliers in the data. Use the sample covariance matrix S.
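Several of the preceding exercises ask for factor scores by both constructions. For a standardized observation z, the weighted least squares scores of (9-50) are f̂ = (L̂'Ψ̂⁻¹L̂)⁻¹L̂'Ψ̂⁻¹z, and the regression scores of (9-58) are f̂ = L̂'R⁻¹z. A minimal sketch of the two, assuming NumPy (function names are illustrative):

```python
import numpy as np

def wls_scores(z, L, psi):
    """Weighted least squares factor scores, (9-50), for standardized z."""
    LtP = L.T / psi                          # L' Psi^{-1}; psi is a 1-D array
    return np.linalg.solve(LtP @ L, LtP @ z)

def regression_scores(z, L, R):
    """Regression-method factor scores, (9-58): f = L' R^{-1} z."""
    return L.T @ np.linalg.solve(R, z)
```

For Exercise 9.19(e), z would be the new salesperson's seven scores standardized by the sample means and standard deviations computed from the original data.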
8 103. Consider the miceweight data in Example 8.8 107.0 115.5 92.5 (X3) 97. Also. Note: Be aware that a maximum likelihood solution may result in a Heywood case.5 105. Compare your results with the results in Exercise 9.5 113.3 106.) Perform a factor analysis of the speed data. Compare the results obtained from S with the results from R.) (a) Obtain the principal component solution to the factor model with m = 1 and m = 2. Refer to Exercise 9.8. and check for outliers in the data. obtain the factor scores using the maximum likelihood estimates of the loadings and Equation (958).5 92.0 106.3 95.8 102.8 104. are the interpretations of the factors roughly the same? If the models are different. (See Exercise 8.0 97.3 104.8 102. .0 100. Can the information in these data be summarized by a single factor? If so.8 103. rather than S. Repeat the analysis with the sample correlation matrix R.5 110.12 Salespeople Data Index of: Sales growth Sales profitability Newaccount sales Score on: Mechanical Abstract MatheCreativity reasoning reasoning matics test test test test 9. (b) Find the maximum likelihood estimates of the loadings and specific variances for m = 1 and m = 2.0 96.0 106.5 103. Perform a factor analysis using observations on the four paper property variables.0 95. Perform a factor analysis of the psychological profile data in Table 4. Which analysis do you prefer? Why? 9.0 99. is factored? Explain.30.8 112.3 99. and check for out/iers in the data. Perform a factor analysis of the national track records for women given in Table 1.0 (X4) 09 07 08 13 10 10 18 10 14 12 10 16 08 13 07 11 11 05 17 10 05 09 12 16 10 14 10 08 09 18 13 07 10 18 08 18 13 15 14 09 13 17 01 07 18 07 08 14 12 (X5) 12 10 . and check for outIiers in the data. Convert the national track records for women to speeds measured in meters per second.8 102.3 108.0 102.5 114. Perform a factor analysis of the national track records for men given in Table 8. is factored? Explain.8 106.8 95.0 104.8 103.0 102. .5 88. Repeat this analysis with the sample covariance matrix S.3 87.8 109.5 107. rather than S.6.7.3 94.15 for VS. Use the sample correlation matrix R constructed from measurements on the five variables.3 95.8 107.3 107.0 109.0 121. rather than S.8 103. 1. Compute factor scores. SF.5 110.8 95. and BS and the sample correlation matrix R. Compare your results with the results in Exercise 9.3 106. and check for outliers.5 103.8 121..0 93.21. PrctFFB.0 102. (See Exercise 8.3 106.8 99.3 96.3 102.5 118.5 99.5 105.3 105. Repeat the steps given in Exercise 9.8 100. Factor the sample covariance matrix S and interpret the factors. Which analysis do you prefer? Why? 09 15 19 15 16 16 10 17 15 11 11 15 12 17 13 ' 16 12 19 20 17 15 20 05 16 13 15 08 12 16 11 9.3 102.26. .10. Repeat Exercise 9.30.28. Is the appropriate factor model for the men's data different from the one for the women's data? If not.5 105.8 104.5 106. 9.5 102. Convert the national track records for men to speeds measured in meters per second.3 102. 9.3 101.5 84.3 106. BL.3 109.3 110. Does it make a difference if R.32.2.3 99.5 115. for the mouse with standardized weights [. SaleHt.29.5 101. and Sale Wt.0 108.8 105. can you interpret the factor? Try both the principal component and maximum likelihood solution methods.0 101.8 97.0 90.3 94.5 107. Salesperson (xl) 1 2 3 4 5 6 7 8 9 10 11 12 13 '14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 93. 09 9.28. Benev. Indep.8 111.8 103.31.0 103.28.3 115.8 97.. 
(See Exercise 8.6. Compute factor scores. Start with the sample co variance matrix.) Perform a factor analysis of the speed data.8 122.5 105.33.26 by factoring R instead of the sample covariance matrix S.5].34. 9.3 94.0 118.5 106. BkFat. Use the sample covariance matrix S and interpret the factors.0 88.3 103. 9. FtFrBody.5 109.5 99.8 100.5 122.5 100.0 89.5 99.0 93.3 106. Repeat the analysis with the sample correlation matrix R. Refer to Exercise 9. is fadored? Explain. is fadored? Explain.3 103.5 96.8 112. rather than S.0 92.5 92.8 83.8 104.536 Chapter 9 Factor Analysis and Inference for Structured Covariance Matrices Exercises 537 Table 9. The pulp and paper properties data are given in Table 7.8 96. explain the differences.0 108.8 103.28. Use the seven variables YrHgt. Perform a factor analysis of the data on bulIs given in Table 1. (c) Perform a varimax rotation of the solutions in Parts a and b.3 120.3 104.9.3 106. Frame.8 92.27. Does it make a difference if R.3 106.8 96.3 103. Compute factor scores.8 99.0 91. 12 14 15 14 12 20 17 18 17 18 17 10 10 09 12 14 14 17 12 11 (X6) 09 10 09 12 12 11 09 15 13 11 12 08 11 11 08 05 11 11 13 11 07 11 07 12 12 07 12 11 13 10 08 11 10 14 08 14 12 12 13 (x7) 20 15 26 29 32 21 25 51 31 39 32 31 34 34 34 16 32 35 30 27 15 42 16 37 39 23 39 49 17 44 43 10 27 19 42 47 18 28 41 37 32 23 32 15 24 37 14 9.0 88.3 95. Supp.8 87. Does your interpretation of the factor(s) change if S rather than R is factored? 06 10 09 11 12 11 08 12 11 09 36 39 .0 120.0 110.3 95.19. Can you interpret the factors? Your analysis should include factor rotation and the computation of factor scores.5 121. Conform and Leader. Use the sample covariance matrix S and interpret the factors.6.5 89.0 94.3 107.8 99.0 119. Obtain both the principal component and maximum likelihood solutions for m = 2 and m = 3 factors. EM.30.5 99. .5 99.8 106.3 106. Compute factor scores.8 110.8 95.0 102.5 99.3 119.3 99. Repeat the analysis with the sample correlation matrix R. Does it make a difference if R. Use the sample covariance matrix S and interpret the factors.3 99. Does it make a difference if R.0 81. .0 (X2) 96.5 103. Repeat the analYSis with the sample correlation matrix R.6.
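Nearly all of these computer exercises call for a varimax rotation; any statistics package provides one. For completeness, here is a compact sketch, assuming NumPy, of the standard iterative algorithm based on the varimax criterion (Kaiser [9]); it is an illustration, not a transcription of any package's routine.

```python
import numpy as np

def varimax(L, tol=1e-6, max_iter=100):
    """Varimax rotation: returns the rotated loadings L @ T and the rotation T."""
    p, m = L.shape
    T = np.eye(m)
    crit = 0.0
    for _ in range(max_iter):
        Lr = L @ T
        # Gradient of the varimax criterion with respect to the rotation
        G = L.T @ (Lr**3 - Lr * (np.sum(Lr**2, axis=0) / p))
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt                     # nearest orthogonal matrix to G
        if crit > 0 and s.sum() < crit * (1 + tol):
            break
        crit = s.sum()
    return L @ T, T
```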
9.35. Repeat Exercise 9.34 using observations on the pulp fiber characteristic variables AFL, LFF, FFF, and ZST. Can these data be summarized by a single factor? Explain.

9.36. Factor analyze the Mali family farm data in Table 8.7. Use the sample correlation matrix R. Try both the principal component and maximum likelihood solution methods for m = 3, 4, and 5 factors. Can you interpret the factors? Justify your choice of m. Your analysis should include factor rotation and the computation of factor scores. Can you identify any outliers in these data?

References

1. Anderson, T. W. An Introduction to Multivariate Statistical Analysis (3rd ed.). New York: John Wiley, 2003.
2. Bartlett, M. S. "The Statistical Conception of Mental Factors." British Journal of Psychology, 28 (1937), 97-104.
3. Bartlett, M. S. "A Note on Multiplying Factors for Various Chi-Squared Approximations." Journal of the Royal Statistical Society (B), 16 (1954), 296-298.
4. Dixon, W. S. Statistical Software Manual to Accompany BMDP Release 7/Version 7.0 (paperback). Berkeley, CA: University of California Press, 1992.
5. Dunn, L. C. "The Effect of Inbreeding on the Bones of the Fowl." Storrs Agricultural Experimental Station Bulletin, 52 (1928), 1-112.
6. Harmon, H. H. Modern Factor Analysis (3rd ed.). Chicago: The University of Chicago Press, 1976.
7. Hirschey, M., and D. W. Wichern. "Accounting and Market-Value Measures of Profitability: Consistency, Determinants and Uses." Journal of Business and Economic Statistics, 2, no. 4 (1984), 375-383.
8. Joreskog, K. G. "Factor Analysis by Least Squares and Maximum Likelihood." In Statistical Methods for Digital Computers, edited by K. Enslein, A. Ralston, and H. S. Wilf. New York: John Wiley, 1975.
9. Kaiser, H. F. "The Varimax Criterion for Analytic Rotation in Factor Analysis." Psychometrika, 23 (1958), 187-200.
10. Lawley, D. N., and A. E. Maxwell. Factor Analysis as a Statistical Method (2nd ed.). New York: American Elsevier Publishing Co., 1971.
11. Linden, M. "A Factor Analytic Study of Olympic Decathlon Data." Research Quarterly, 48, no. 3 (1977), 562-568.
12. Maxwell, A. E. Multivariate Analysis in Behavioural Research. London: Chapman and Hall, 1977.
13. Morrison, D. F. Multivariate Statistical Methods (4th ed.). Belmont, CA: Brooks/Cole Thomson Learning, 2005.
14. Stoetzel, J. "A Factor Analysis of Liquor Preference." Journal of Advertising Research, 1 (1960), 7-11.
15. Wright, S. "The Interpretation of Multivariate Systems." In Statistics and Mathematics in Biology, edited by O. Kempthorne and others. Ames, IA: Iowa State University Press, 1954, 11-33.

10
CANONICAL CORRELATION ANALYSIS

10.1 Introduction

Canonical correlation analysis seeks to identify and quantify the associations between two sets of variables. Hotelling ([5], [6]), who initially developed the technique, provided the example of relating arithmetic speed and arithmetic power to reading speed and reading power. (See Exercise 10.9.) Other examples include relating governmental policy variables with economic goal variables and relating college "performance" variables with precollege "achievement" variables.

Canonical correlation analysis focuses on the correlation between a linear combination of the variables in one set and a linear combination of the variables in another set. The idea is first to determine the pair of linear combinations having the largest correlation. Next, we determine the pair of linear combinations having the largest correlation among all pairs uncorrelated with the initially selected pair, and so on. The pairs of linear combinations are called the canonical variables, and their correlations are called canonical correlations. The canonical correlations measure the strength of association between the two sets of variables. The maximization aspect of the technique represents an attempt to concentrate a high-dimensional relationship between two sets of variables into a few pairs of canonical variables.

10.2 Canonical Variates and Canonical Correlations

We shall be interested in measures of association between two groups of variables. The first group, of p variables, is represented by the (p × 1) random vector X⁽¹⁾. The second group, of q variables, is represented by the (q × 1) random vector X⁽²⁾. We assume, in the theoretical development, that X⁽¹⁾ represents the smaller set, so that p ≤ q.
For the random vectors X⁽¹⁾ and X⁽²⁾, let

    E(X⁽¹⁾) = μ⁽¹⁾;   Cov(X⁽¹⁾) = Σ₁₁
    E(X⁽²⁾) = μ⁽²⁾;   Cov(X⁽²⁾) = Σ₂₂      (10-1)
    Cov(X⁽¹⁾, X⁽²⁾) = Σ₁₂ = Σ₂₁'

It will be convenient to consider X⁽¹⁾ and X⁽²⁾ jointly, so the ((p + q) × 1) random vector

    X = [ X⁽¹⁾ ]
        [ X⁽²⁾ ]      (10-2)

has mean vector

    μ = E(X) = [ μ⁽¹⁾ ]
               [ μ⁽²⁾ ]      (10-3)

and covariance matrix

    Σ = E(X − μ)(X − μ)' = [ Σ₁₁  Σ₁₂ ]
                           [ Σ₂₁  Σ₂₂ ]      (10-4)

where Σ₂₁ is (q × p). The covariances between pairs of variables from different sets, one variable from X⁽¹⁾ and one variable from X⁽²⁾, are contained in Σ₁₂. That is, the pq elements of Σ₁₂ measure the association between the two sets. When p and q are relatively large, interpreting the elements of Σ₁₂ collectively is ordinarily hopeless. Moreover, it is often linear combinations of variables that are interesting and useful for predictive or comparative purposes. The main task of canonical correlation analysis is to summarize the associations between the X⁽¹⁾ and X⁽²⁾ sets in terms of a few carefully chosen covariances (or correlations) rather than the pq covariances in Σ₁₂.

Linear combinations provide simple summary measures of a set of variables. Set

    U = a'X⁽¹⁾      V = b'X⁽²⁾      (10-5)

for some pair of coefficient vectors a and b. Then, using (10-5) and (2-45), we find that

    Var(U) = a' Cov(X⁽¹⁾) a = a'Σ₁₁a
    Var(V) = b' Cov(X⁽²⁾) b = b'Σ₂₂b      (10-6)
    Cov(U, V) = a' Cov(X⁽¹⁾, X⁽²⁾) b = a'Σ₁₂b

We shall seek coefficient vectors a and b such that

    Corr(U, V) = a'Σ₁₂b / [√(a'Σ₁₁a) √(b'Σ₂₂b)]      (10-7)

is as large as possible.

We define the following:

The first pair of canonical variables, or first canonical variate pair, is the pair of linear combinations U₁, V₁ having unit variances, which maximize the correlation (10-7);

The second pair of canonical variables, or second canonical variate pair, is the pair of linear combinations U₂, V₂ having unit variances, which maximize the correlation (10-7) among all choices that are uncorrelated with the first pair of canonical variables.

At the kth step, the kth pair of canonical variables, or kth canonical variate pair, is the pair of linear combinations Uₖ, Vₖ having unit variances, which maximize the correlation (10-7) among all choices uncorrelated with the previous k − 1 canonical variable pairs.

The correlation between the kth pair of canonical variables is called the kth canonical correlation.

The following result gives the necessary details for obtaining the canonical variables and their correlations.

Result 10.1. Suppose p ≤ q and let the random vectors X⁽¹⁾ (p × 1) and X⁽²⁾ (q × 1) have Cov(X⁽¹⁾) = Σ₁₁, Cov(X⁽²⁾) = Σ₂₂, and Cov(X⁽¹⁾, X⁽²⁾) = Σ₁₂. For coefficient vectors a (p × 1) and b (q × 1), form the linear combinations U = a'X⁽¹⁾ and V = b'X⁽²⁾. Then

    max over a, b of Corr(U, V) = ρ₁*

attained by the linear combinations (first canonical variate pair)

    U₁ = e₁'Σ₁₁^(−1/2) X⁽¹⁾   and   V₁ = f₁'Σ₂₂^(−1/2) X⁽²⁾
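In Result 10.1, ρ₁* is the largest canonical correlation; the ρₖ*² are the ordered eigenvalues of Σ₁₁^(−1/2)Σ₁₂Σ₂₂⁻¹Σ₂₁Σ₁₁^(−1/2), with e₁, e₂, ..., eₚ the associated eigenvectors (the fₖ are the corresponding eigenvectors for the second set). A minimal computational sketch, assuming NumPy, with sample covariance blocks standing in for the population quantities and positive definite S₁₁ and S₂₂:

```python
import numpy as np

def inv_sqrt(A):
    """Symmetric inverse square root of a positive definite matrix."""
    lam, E = np.linalg.eigh(A)
    return E @ np.diag(lam**-0.5) @ E.T

def canonical(S11, S12, S22):
    S11h = inv_sqrt(S11)
    M = S11h @ S12 @ np.linalg.solve(S22, S12.T) @ S11h
    rho2, E = np.linalg.eigh(M)
    order = np.argsort(rho2)[::-1]
    rho = np.sqrt(np.clip(rho2[order], 0.0, None))  # rho_1* >= rho_2* >= ...
    A = E[:, order].T @ S11h       # row k holds a_k' = e_k' S11^{-1/2}
    return rho, A                  # U_k = a_k' x(1); the b_k follow analogously
```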