STATISTICS FOR PSYCHOLOGISTS
An Intermediate Course

Brian S. Everitt
Institute of Psychiatry, King's College, University of London

To Connie and Vera
Copyright © 2001 by Lawrence Erlbaum Associates, Inc.
All rights reserved. No part of this book may be reproduced in any form, by photostat, microfilm, retrieval system, or any other means, without prior written permission of the publisher.

Lawrence Erlbaum Associates, Inc., Publishers
10 Industrial Avenue
Mahwah, NJ 07430
Cover design by Kathryn Houghtaling Lacey
Library of Congress Cataloging-in-Publication Data

Everitt, Brian.
    Statistics for psychologists : an intermediate course / Brian S. Everitt.
        p. cm.
    Includes bibliographical references and index.
    ISBN 0-8058-3836-8 (alk. paper)
    1. Psychology -- Statistical methods. I. Title.
    BF39 .E933 2001
    519.5'02415 -- dc21                                00-065400

Books published by Lawrence Erlbaum Associates are printed on acid-free paper, and their bindings are chosen for strength and durability.

Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
Contents
Preface

1. Statistics in Psychology: Data, Models, and a Little History
2. Graphical Methods of Displaying Data
3. Analysis of Variance I: The One-Way Design
4. Analysis of Variance II: Factorial Designs
5. Analysis of Repeated Measure Designs
6. Simple Linear Regression and Multiple Regression Analysis
7. Analysis of Longitudinal Data
8. Distribution-Free and Computationally Intensive Methods
9. Analysis of Categorical Data I: Contingency Tables and the Chi-Square Test
10. Analysis of Categorical Data II: Log-Linear Models and Logistic Regression

Appendix A: Statistical Glossary
Appendix B: Answers to Selected Exercises
References
Index
Preface
Psychologists (and even those training to be psychologists) need little persuading that a knowledge of statistics is important in both the design of psychological studies and in the analysis of data from such studies. As a result, almost all undergraduate psychology students are exposed to an introductory statistics course during their first year of study at college or university. The details of the topics covered in such a course will no doubt vary from place to place, but they will almost certainly include a discussion of descriptive statistics, correlation, simple regression, basic inference and significance tests, p values, and confidence intervals. In addition, in an introductory statistics course, some nonparametric tests may be described and the analysis of variance may begin to be explored. Laudable and necessary as such courses are, they represent only the first step in equipping students with enough knowledge of statistics to ensure that they have a reasonable chance of progressing to become proficient, creative, influential, and even, perhaps, useful psychologists. Consequently, in their second or third years (and possibly in later postgraduate courses), many psychology students will be encouraged to learn more about statistics and, equally important, how to apply statistical methods in a sensible fashion. It is to these students that this book is aimed, and it is hoped that the following features of the text will help it reach its target.
1. The central theme is that statistics is about solving problems; data relevant to these problems are collected and analyzed to provide useful answers. To this end, the book contains a large number of data sets arising from real problems. Numerical examples of the type that involve the skiing activities of belly dancers and politicians are avoided as far as possible.

2. Mathematical details of methods are largely confined to the displays. For the mathematically challenged, the most difficult of these displays can be regarded as "black boxes" and conveniently ignored. Attempting to understand the "input" to and "output" from the black box will be more important than understanding the minutiae of what takes place within the box. In contrast, for those students with less anxiety about mathematics, study of the relevant mathematical material (which on occasion will include the use of vectors and matrices) will undoubtedly help in their appreciation of the corresponding technique.

3. Although many statistical methods require considerable amounts of arithmetic for their application, the burden of actually performing the necessary calculations has been almost entirely removed by the development and wide availability of powerful and relatively cheap personal computers and associated statistical software packages. It is assumed, therefore, that most students will be using such tools when undertaking their own analyses. Consequently, arithmetic details are noticeable largely by their absence, although where a little arithmetic is considered helpful in explaining a technique, then it is included, usually in a table or a display.

4. There are many challenging data sets both in the text and in the exercises provided at the end of each chapter. (Answers, or hints to providing answers, to many of the exercises are given in an appendix.)

5. A (it is hoped) useful glossary is provided both to remind students about the most important statistical topics they will have covered in their introductory course and to give further details of some of the techniques covered in this text.

6. Before the exercises in each chapter, a short section entitled Computer hints is included. These sections are intended to help point the way to how to undertake the analyses reported in the chapter by using a statistical package. They are not intended to provide detailed computer instructions for the analyses. The book concentrates largely on two packages, SPSS and S-PLUS. (Details are to be found on www.spss.com and www.mathsoft.com. A useful reference for SPSS is Morgan and Griego, 1998, and for S-PLUS, Krause and Olson, 2000.) The main reason for choosing the former should be obvious: the package is widely used by working psychologists and psychology students and is an extremely useful tool for undertaking a variety of analyses. But why S-PLUS, because, at present at least, the package is not a favorite of psychologists? Well, there are two reasons: the first is that the author is an enthusiastic (although hardly an expert) user of the software; the second is that he sees S-PLUS as the software of the future. Admittedly, S-PLUS requires more initial effort to learn, but the reward for such investment is the ability to undertake both standard and nonstandard analyses routinely, with the added benefit of superb graphics. This is beginning to sound like a commercial, so
perhaps I will stop here! (Incidentally, for those readers who do not have access to the S-PLUS package but would like to try it out, there is a free cousin, R, details of which are given on www.cran.us.r-project.org.)

It is hoped that the text will be useful in a number of different ways, including the following.
1. As the main part of a formal statistics course for advanced undergraduate and postgraduate psychology students, and also for students in other areas of the behavioral sciences. For example, Chapters 2-7 could form the basis of a course on linear models, with, perhaps, Chapter 10 being used to show how such models can be generalized.

2. As a supplement to existing courses; Chapters 8 and 9, for example, could be used to make students more aware of the wider aspects of nonparametric methods and the analysis of categorical data.

3. For self-study.

4. For quick reference.
I would like to thank Maria Konsolaki for running some SPSS examples for me, and my secretary, Harriet Meteyard, for her usual superb typing and general helpful advice in preparing the book.
Brian S. Everitt
London, June 2000
1

Statistics in Psychology: Data, Models, and a Little History

1.1. INTRODUCTION

Psychology is a uniquely diverse discipline, ranging from biological aspects of behaviour on the one hand to social psychology on the other, and from basic research to the applied professions of clinical, counselling, educational, industrial, organizational and forensic psychology. (Andrew M. Colman, Companion Encyclopedia of Psychology, 1994)

As Dr. Colman states, psychology is indeed a diverse and fascinating subject, and many students are attracted to it because of its exciting promise of delving into many aspects of human behavior: sensation, perception, cognition, emotion, and personality, to name but a few. It probably comes as a disagreeable shock to many such students that they are often called on, early in their studies, to learn about statistics, because the subject (and, sadly, its practitioners, statisticians) are mostly seen as anything but exciting, and the general opinion seems to be that both should be avoided as far as possible.

However, this "head in the sand" attitude toward statistics taken by many a psychology student is mistaken. Both statistics (and statisticians) can be very exciting (!), and anyway, a knowledge and understanding of the subject is essential at each stage in the often long, and frequently far from smooth, road from planning an investigation to publishing its results in a prestigious psychology journal. Statistical principles will be needed to guide the design of a study and the collection of data. The initial examination of data and their description will involve a variety of informal statistical techniques. More formal methods of estimation and significance testing may also be needed in building a model for the data and assessing its fit.

Nevertheless, most psychologists are not, and have no desire to be, statisticians (I can't think why). Consequently, questions have to be asked about what and how much statistics the average psychologist needs to know. It is generally agreed that psychologists should have some knowledge of statistics, at least, and this is reflected in the almost universal exposure of psychology students to an introductory course in the subject, covering topics such as descriptive statistics (histograms, means, variances, and standard deviations), elementary probability, the normal distribution, inference (t tests, chi-squared tests, confidence intervals, and so on), correlation and regression, and simple analyses of variance. Although such a course often provides essential grounding in statistics, it frequently gives the wrong impression of what is and what is not of greatest importance in tackling real life problems. For many psychology students (and their teachers), for example, a p value is still regarded as the holy grail and almost the raison d'être of statistics (and providing one, the chief role of statisticians). Despite the numerous caveats issued in the literature, many psychologists still seem determined to experience joy on finding a p value of .049 and despair on finding one of .051. Again, psychologists may, on their introductory course, learn how to perform a t test (they may, poor things, even be made to carry out the arithmetic themselves), but they may be ill equipped to answer the question, How can I summarize and understand the main features of this set of data?

The aim of this text is essentially twofold. First, it will introduce the reader to a variety of statistical techniques not usually encountered in an introductory course; second, and equally, if not more importantly, it will attempt to transform the knowledge gained in such a course into a more suitable form for dealing with the complexities of real data. Readers will, for example, be encouraged to replace the formal use of significance tests and the rigid interpretation of p values with an approach that regards such tests as giving informal guidance on possible evidence of an interesting effect. Readers may even be asked to abandon the ubiquitous significance test altogether in favor of, for example, a graphical display that makes the structure in the data apparent without any formal analyses. By building on and reshaping the statistical knowledge gained in their first-level course,
students will be better equipped (it is hoped) to overcome the criticisms of much current statistical practice implied in the following quotations from two British statisticians:

    Most real-life statistical problems have one or more nonstandard features. There are no routine statistical questions; only questionable statistical routines. (Sir David Cox)

    Many statistical pitfalls lie in wait for the unwary. Indeed statistics is perhaps more open to misuse than any other subject, particularly by the nonspecialist. The misleading average, the graph with "fiddled axes," the inappropriate p value and the linear regression fitted to nonlinear data are just four examples of horror stories which are part of statistical folklore. (Christopher Chatfield)

1.2. STATISTICS DESCRIPTIVE, STATISTICS INFERENTIAL, AND STATISTICAL MODELS

This text will be primarily concerned with the following overlapping components of the statistical analysis of data.

1. The initial examination of the data, with the aim of making any interesting patterns in the data more visible.
2. The estimation of parameters of interest.
3. The testing of hypotheses about parameters.
4. Model formulation, building, and assessment.

Most investigations will involve little clear separation between each of these four components, but for now it will be helpful to try to indicate, in general terms, the unique parts of each.

1.2.1. The Initial Examination of Data

The initial examination of data is a valuable stage of most statistical investigations, not only for scrutinizing and summarizing data, but often also as an aid to later model formulation. The aim is to clarify the general structure of the data, obtain simple descriptive summaries, and perhaps get ideas for a more sophisticated analysis. At this stage, distributional assumptions might be examined (e.g., whether the data are normal), possible outliers identified (i.e., observations very different from the bulk of the data that may be the result of, for example, a recording error), relationships between variables examined, and so on. In some cases the results from this stage may contain such an obvious message that more detailed analysis becomes largely superfluous. Many of the methods used in this preliminary analysis of the data will be graphical, and it is some of these that are described in Chapter 2.
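A small sketch may make this preliminary screening concrete. The data are invented, and the use of the median, the median absolute deviation (MAD), and a cutoff of 5 MADs are my own illustrative choices, not anything prescribed in this chapter:

```python
# A minimal sketch of an initial examination of data: simple descriptive
# summaries plus a crude, robust screen for possible outliers
# (observations far from the bulk of the data, e.g., a recording error).
import statistics

# Hypothetical scores; 41.2 mimics a misplaced decimal point.
scores = [4.1, 3.8, 4.5, 4.0, 3.9, 4.2, 41.2, 4.3, 3.7, 4.4]

med = statistics.median(scores)
# Median absolute deviation: unlike the standard deviation, it is not
# itself inflated by the very outlier we are trying to detect.
mad = statistics.median([abs(x - med) for x in scores])

# Flag observations more than 5 MADs from the median (arbitrary cutoff).
outliers = [x for x in scores if abs(x - med) > 5 * mad]

print(f"median = {med}, MAD = {mad}, possible outliers: {outliers}")
```

A mean-and-standard-deviation rule (such as flagging points more than 3 standard deviations from the mean) would actually miss this outlier, because the single wild observation inflates the standard deviation itself; that is one reason robust and graphical summaries are emphasized at this stage.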
1.2.2. Estimation and Significance Testing

Although in some cases an initial examination of the data will be all that is necessary, most investigations will proceed to a more formal stage of analysis that involves the estimation of population values of interest and/or testing hypotheses about particular values for these parameters. It is at this point that the beloved significance test (in some form or other) enters the arena. Despite numerous attempts by statisticians to wean psychologists away from such tests (see, e.g., Gardner and Altman, 1986), the p value retains a powerful hold over the average psychology researcher and psychology student. There are a number of reasons why it should not.

First, the p value is poorly understood. Although p values appear in almost every account of psychological research findings, there is evidence that the general degree of understanding of the true meaning of the term is very low. Oakes (1986), for example, put the following test to 70 academic psychologists:

    Suppose you have a treatment which you suspect may alter performance on a certain task. You compare the means of your control and experimental groups (say 20 subjects in each sample). Further suppose you use a simple independent means t test and your result is t = 2.7, df = 18, P = 0.01. Please mark each of the statements below as true or false.

    1. You have absolutely disproved the null hypothesis that there is no difference between the population means.
    2. You have found the probability of the null hypothesis being true.
    3. You have absolutely proved your experimental hypothesis.
    4. You can deduce the probability of the experimental hypothesis being true.
    5. You know, if you decided to reject the null hypothesis, the probability that you are making the wrong decision.
    6. You have a reliable experiment in the sense that if, hypothetically, the experiment were repeated a great number of times, you would obtain a significant result on 99% of occasions.

The subjects were all university lecturers, research fellows, or postgraduate students. The results presented in Table 1.1 are illuminating. Under a relative frequency view of probability, all six statements are in fact false. Only 3 out of the 70 subjects came to this conclusion. The correct interpretation of the probability associated with the observed t value is the probability of obtaining the observed data (or data that represent a more extreme departure from the null hypothesis) if the null hypothesis is true.
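This interpretation can be illustrated by simulation: when the null hypothesis really is true, the long-run proportion of experiments giving |t| >= 2.7 on 18 degrees of freedom is approximately the quoted p value (roughly 0.015). The sketch below uses 10 subjects per group so that df = 18, as in the example; the seed and number of replications are my own choices:

```python
# Sketch: estimate P(|t| >= 2.7) when the null hypothesis is true,
# i.e., both groups are drawn from the same normal population.
import random
import statistics

random.seed(1)

def two_sample_t(x, y):
    """Pooled two-sample t statistic."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x) +
           (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / (sp2 * (1 / nx + 1 / ny)) ** 0.5

n_reps, n_extreme = 10000, 0
for _ in range(n_reps):
    x = [random.gauss(0, 1) for _ in range(10)]  # 10 per group: df = 18
    y = [random.gauss(0, 1) for _ in range(10)]
    if abs(two_sample_t(x, y)) >= 2.7:
        n_extreme += 1

p_hat = n_extreme / n_reps
print(f"estimated P(|t| >= 2.7 under H0) = {p_hat:.4f}")
```

Note what the simulation does not give: the probability that the null hypothesis is true, which is statement 2 in the quiz above.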
TABLE 1.1
Frequencies and Percentages of "True" Responses in a Test of Knowledge of p Values

Statement                                                      f      %
1. The null hypothesis is absolutely disproved.                1    1.4
2. The probability of the null hypothesis has been found.     25   35.7
3. The experimental hypothesis is absolutely proved.           4    5.7
4. The probability of the experimental hypothesis
   can be deduced.                                            46   65.7
5. The probability that the decision taken is wrong
   is known.                                                  60   85.7
6. A replication has a .99 probability of being significant.  42   60.0

Clearly the number of false statements described as true in this experiment would have been reduced if the true interpretation of a p value had been included with the six others. Nevertheless, the exercise is extremely interesting in highlighting the misguided appreciation of p values held by a group of research psychologists.

Second, a p value represents only limited information about the results from a study. Gardner and Altman (1986) make the point that the excessive use of p values in hypothesis testing, simply as a means of rejecting or accepting a particular hypothesis, at the expense of other ways of assessing results, has reached such a degree that levels of significance are often quoted alone in the main text and in abstracts of papers, with no mention of other more relevant and important quantities. The implication of hypothesis testing, that there can always be a simple yes or no answer as the fundamental result from a psychological study, is clearly false, and used in this way hypothesis testing is of limited value.

The most common alternative to presenting results in terms of p values, in relation to a statistical null hypothesis, is to estimate the magnitude of some parameter of interest along with some interval that includes the population value of the parameter with some specified probability. Such confidence intervals can be found relatively simply for many quantities of interest (see Gardner and Altman, 1986, for details), and although the underlying logic of interval estimation is essentially similar to that of significance tests, they do not carry with them the pseudoscientific hypothesis testing language of such tests. Instead they give a plausible range of values for the unknown parameter. As Oakes (1986) rightly comments, the significance test relates to what the population parameter is not; the confidence interval gives a plausible range for what the parameter is.
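The contrast can be made concrete with a small computation: the same ingredients that produce a t statistic also produce a 95% confidence interval for a mean difference, and the interval is the more informative summary. The data below are invented, and the critical value 2.101 (the upper 2.5% point of t on 18 df) is simply hard-coded from tables:

```python
# Sketch: 95% confidence interval for a difference in means, using the
# pooled standard error. Data are invented for illustration.
import statistics

control = [52, 48, 55, 50, 47, 53, 49, 51, 54, 46]
treated = [58, 54, 60, 55, 53, 59, 56, 57, 61, 52]

diff = statistics.mean(treated) - statistics.mean(control)
n1, n2 = len(control), len(treated)
sp2 = ((n1 - 1) * statistics.variance(control) +
       (n2 - 1) * statistics.variance(treated)) / (n1 + n2 - 2)
se = (sp2 * (1 / n1 + 1 / n2)) ** 0.5

t_crit = 2.101  # upper 2.5% point of t with 18 df, from tables
ci = (diff - t_crit * se, diff + t_crit * se)
print(f"mean difference = {diff:.1f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval reports a plausible range for the population difference; a bare p value from the same data would report only how surprising the data are under the null hypothesis.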
So should the p value be abandoned completely? Many statisticians would answer yes, but I think a more sensible response, at least for psychologists, would be a resounding "maybe." Such values should rarely be used in a purely confirmatory way, but in an exploratory fashion they can be useful in giving some informal guidance on the possible existence of an interesting effect, even when the required assumptions of whatever test is being used are known to be invalid. It is often possible to assess whether a p value is likely to be an underestimate or overestimate and whether the result is clear one way or the other. Finally, purely from a pragmatic point of view, p values are needed by psychology students because they remain of central importance in the bulk of the psychological literature. In this text, both p values and confidence intervals will be used.

1.2.3. The Role of Models in the Analysis of Data

Models imitate the properties of real objects in a simpler or more convenient form. A road map, for example, models part of the Earth's surface, attempting to reproduce the relative positions of towns, roads, and other features. Chemists use models of molecules to mimic their theoretical properties, which in turn can be used to predict the behavior of real objects. A good model follows as accurately as possible the relevant properties of the real object, while being convenient to use.

Statistical models allow inferences to be made about an object, activity, or process by modeling some associated observable data. Suppose, for example, a child has scored 20 points on a test of verbal ability, and after studying a dictionary for some time, scores 24 points on a similar test. If it is believed that studying the dictionary has caused the improvement, then a possible model of what is happening is

    20 = {person's initial score},                         (1.1)
    24 = {person's initial score} + {improvement}.         (1.2)

The improvement can be found by simply subtracting the first score from the second. Such a model is, of course, very naive, because it assumes that verbal ability can be measured exactly. A more realistic representation of the two scores, which allows for a possible measurement error, is

    x1 = γ + ε1,                                           (1.3)
    x2 = γ + δ + ε2,                                       (1.4)

where x1 and x2 represent the two observed verbal ability scores, γ represents the true initial measure of verbal ability, and δ represents the improvement score. The ε1 and ε2 terms represent measurement errors. Here the improvement score can be estimated as x2 − x1.
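A short simulation may help show what the error terms do in this additive model: any single value of x2 − x1 is a noisy estimate of the improvement δ, but over many replications it is correct on average. The true values, error spread, and seed below are all invented for illustration:

```python
# Sketch of the additive model x1 = gamma + e1, x2 = gamma + delta + e2.
import random
import statistics

random.seed(42)
gamma, delta = 20.0, 4.0  # invented "true" initial score and improvement

estimates = []
for _ in range(10000):
    x1 = gamma + random.gauss(0, 2)          # e1: measurement error
    x2 = gamma + delta + random.gauss(0, 2)  # e2: measurement error
    estimates.append(x2 - x1)                # the model's estimate of delta

# The average of the estimates is close to the true improvement.
print(f"mean of x2 - x1 over 10000 replications = "
      f"{statistics.mean(estimates):.2f}")
```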
A model gives a precise description of what the investigator assumes is occurring in a particular situation; in the case above it says that the improvement δ is considered to be independent of γ and is simply added to it. (An important point that should be noted here is that if you do not believe in a model, you should not perform operations and analyses on the data that assume it is true.)

Suppose now that it is believed that studying the dictionary does more good if a child already has a fair degree of verbal ability and that the various random influences that affect the test scores are also dependent on the true scores. Then an appropriate model would be

    x1 = γε1,                                              (1.5)
    x2 = γδε2.                                             (1.6)

Now the parameters are multiplied rather than added to give the observed scores. Here δ might be estimated by dividing x2 by x1.

A further possibility is that there is a limit, λ say, to improvement, and studying the dictionary improves performance on the verbal ability test by some proportion of the child's possible improvement, λ − γ. A suitable model would be

    x1 = γ + ε1,                                           (1.7)
    x2 = γ + (λ − γ)δ + ε2.                                (1.8)

With this model there is no way to estimate δ from the data unless a value of λ is given or assumed.

One of the principal uses of statistical models is to attempt to explain variation in measurements. This variation may be the result of a variety of factors, including variation from the measurement system, variation caused by environmental conditions that change over the course of a study, variation from individual to individual (or experimental unit to experimental unit), and so on.

The decision about an appropriate model should be largely based on the investigator's prior knowledge of an area. In many situations, however, additive, linear models, such as those given in Eqs. (1.1) and (1.2), are invoked by default, because such models allow many powerful and informative statistical techniques to be applied to the data. Analysis of variance techniques (Chapters 3-5) and regression analysis (Chapter 6), for example, use such linear models, and in recent years generalized linear models (Chapter 10) have evolved that allow analogous models to be applied to a wide variety of data types.

Formulating an appropriate model can be a difficult problem. The general principles of model formulation are covered in detail in books on scientific method, but they include the need to collaborate with appropriate experts and to incorporate as much background theory as possible.
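Under the multiplicative model described above, it is the ratio x2/x1, not the difference, that estimates the improvement parameter. A brief sketch, with all numbers invented (the lognormal errors are my own choice of a positive error term near 1):

```python
# Sketch of the multiplicative model x1 = gamma*e1, x2 = gamma*delta*e2,
# where the natural estimate of delta is the ratio x2 / x1.
import random
import statistics

random.seed(7)
gamma, delta = 20.0, 1.2  # true score and multiplicative improvement

ratios = []
for _ in range(10000):
    x1 = gamma * random.lognormvariate(0, 0.05)          # e1, close to 1
    x2 = gamma * delta * random.lognormvariate(0, 0.05)  # e2, close to 1
    ratios.append(x2 / x1)

# The average ratio is close to the true delta of 1.2.
print(f"mean of x2 / x1 = {statistics.mean(ratios):.3f}")
```

No such shortcut exists for the ceiling model of Eqs. (1.7) and (1.8): as the text notes, δ cannot be estimated there at all unless a value for the limit λ is given or assumed.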
Apart from those formulated entirely on a priori theoretical grounds, most models are, to some extent at least, based on an initial examination of the data, although completely empirical models are rare. The more usual intermediate case arises when a class of models is entertained a priori, but the initial data analysis is crucial in selecting a subset of models from the class. In a regression analysis, for example, the general approach is determined a priori, but a scatterdiagram and a histogram (see Chapter 2) will be of crucial importance in indicating the "shape" of the relationship and in checking assumptions such as normality.

The formulation of a preliminary model from an initial examination of the data is the first step in the iterative formulation-criticism cycle of model building. This can produce some problems, because formulating a model and testing it on the same data is not generally considered good science. It is always preferable to confirm whether a derived model is sensible by testing it on new data. When data are difficult or expensive to obtain, however, then some model modification and assessment of fit on the original data is almost inevitable. Investigators need to be aware of the possible dangers of such a process.

The most important principle to have in mind when testing models on data is that of parsimony; that is, the "best" model is one that provides an adequate fit to the data with the fewest number of parameters. This is often known as Occam's razor, which for those with a classical education is reproduced here in its original form: Entia non sunt multiplicanda praeter necessitatem.

1.3. TYPES OF STUDY

It is said that when Gertrude Stein lay dying, she roused briefly and asked her assembled friends, "Well, what's the answer?" They remained uncomfortably quiet, at which she sighed, "In that case, what's the question?"

Research in psychology, and in science in general, is about searching for the answers to particular questions of interest. Do politicians have higher IQs than university lecturers? Do men have faster reaction times than women? Should phobic patients be treated by psychotherapy or by a behavioral treatment such as flooding? Do children who are abused have more problems later in life than children who are not abused? Do children of divorced parents suffer more marital breakdowns themselves than children from more stable family backgrounds?

In more general terms, scientific research involves a sequence of asking and answering questions about the nature of relationships among variables (e.g., How does A affect B? Do A and B vary together? Is A significantly different from B?, and so on). Scientific research is carried out at many levels that differ in the types of question asked and, therefore, in the procedures used to answer them. Thus, the choice of which methods to use in research is largely determined by the kinds of questions that are asked.
Of the many types of investigation used in psychological research, the most common are perhaps the following: surveys, observational studies, quasi-experiments, and experiments. Some brief comments about each of these four types are given below; a more detailed account is available in the papers by Stretch, by Raulin and Graziano, and by Dane, which all appear in the second volume of the excellent Companion Encyclopedia of Psychology; see Colman (1994).

1.3.1. Surveys

Survey methods are based on the simple discovery "that asking questions is a remarkably efficient way to obtain information from and about people" (Schuman and Kalton, 1985, p. 635). Surveys involve an exchange of information between researcher and respondent: the researcher identifies topics of interest, and the respondent provides knowledge or opinion about these topics. Depending upon the length and content of the survey as well as the facilities available, this exchange can be accomplished by means of written questionnaires, in-person interviews, or telephone conversations. Surveys conducted by psychologists are usually designed to elicit information about the respondents' opinions, beliefs, attitudes, and values. Some examples of data collected in surveys and their analysis are given in several later chapters.

1.3.2. Observational Studies

Many observational studies involve recording data on the members of naturally occurring groups, generally over a period of time, and comparing the rate at which a particular event of interest occurs in the different groups (such studies are often referred to as prospective). If, for example, an investigator was interested in the health effects of a natural disaster such as an earthquake, those who experienced the earthquake could be compared with a group of people who did not. Another commonly used type of observational study is the case-control investigation. Here a group of people (the cases) that all have a particular characteristic (a certain disease, perhaps) are compared with a group of people who do not have the characteristic (the controls), in terms of their past exposure to some event or risk factor. A recent example involved women that gave birth to very low birthweight infants (less than 1500 g), compared with women that had a child of normal birthweight, with respect to their past caffeine consumption.

The types of analyses suitable for observational studies are often the same as those used for experimental studies (see Subsection 1.3.4).
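The prospective comparison described above reduces, in its simplest form, to comparing the rate of the event of interest between the groups. A sketch with invented counts (these numbers are not from any study mentioned here):

```python
# Sketch: event rates in two naturally occurring groups in a
# prospective observational study. All counts are invented.
exposed = {"n": 250, "events": 30}    # e.g., experienced the earthquake
unexposed = {"n": 400, "events": 20}  # comparison group

rate_exposed = exposed["events"] / exposed["n"]
rate_unexposed = unexposed["events"] / unexposed["n"]
relative_rate = rate_exposed / rate_unexposed

print(f"rates: {rate_exposed:.3f} vs {rate_unexposed:.3f}; "
      f"relative rate = {relative_rate:.1f}")
```

As the surrounding text stresses, a relative rate from groups that were not randomly formed remains open to several interpretations; the arithmetic alone cannot rule out confounding.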
instead. The an essential feature of experiment is the large degree of control in the hands of th experimentersin designed experiments. for example. the systolic blood pressure of naturally occurring groups of individuals who smoke. Some unidentified factors play a part determining both the level of blood in pressure and whether not a person smokes. In such a study any difference found between the blood be pressure of the two groups would open to three possible explanations. the experimenter deliberately changes the levels of the experimental factors to induce in the measured quantities. And. The level blood pressure has tendency to encourage or discourage smokof a ing. variation to leadto a better understanding of the relationship between experimental factors and the response variable. the researcher (as cannot allocate subjects to be smokers and nonsmokers would be required in an experimental approach). investigation In an 15 year of the effectiveness of three different methods of teaching mathematics to be given olds. and those who do not.3. to In a comparison of a new treatment with one used previously. experimenters control the manner in which subjects are allocated the different levels of the experimental factors. Smoking causes a change in systolic blood pressure. are compared. The classes that receive the different teaching methods would be selected to be similar to each other on most relevant variables. the researcher would have control over the scheme for allocating subjects to treat if the The manner in which this control is exercised is of vital importance results of . a method might to all the members of a particular class three in a school. in particular. 1. for example. for example. the lack of control over the groups to compared in an observational be in study makes the interpretation of any difference between the groups detected the study open to a variety of interpretations. particular (and like the observaIn tional study).10 CHAPTER l however. 
investigation into the relationIn an ship between smoking and systolic blood pressure. QuasiExperiments Quasiexperimental designs resemble experiments proper (see next section). the ability to manipulate the be compared is not under the into groups vestigator’s control. 2. but they are weak on some of the characteristics. for obvious ethical reasons. Experiments Experimental studies include most of the work psychologists carry out on anim and many of the investigations performed on human subjects in laboratories. Unlike the observational study. 3.3. 1. and the methods would be assigned to classes on a chance basis. the quasiexperimen involves the intervention of the investigator in the sense that he or she applies a variety of different “treatments” to naturally occurring groups. or 1. however.3.4.
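One standard way of exercising this control is random allocation, in which something like a coin toss decides each subject's treatment. A sketch (subject labels and seed are my own):

```python
# Sketch: random allocation of subjects to treatments by coin toss.
import random

random.seed(2025)
subjects = [f"subject_{i:02d}" for i in range(1, 21)]

# "Heads" -> new treatment, "tails" -> old treatment.
allocation = {s: "new" if random.random() < 0.5 else "old" for s in subjects}

n_new = sum(1 for t in allocation.values() if t == "new")
print(f"{n_new} subjects on the new treatment, "
      f"{len(subjects) - n_new} on the old")
```

Note that the two arms need not come out exactly equal in size; as the text goes on to explain, randomization promises impartial assignment of extraneous influences, not identical groups.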
it is not as precise as the . equality would be achieved in the long run.double blinding. the statistical methodsmost useful for the analysis ae of data derived from experimental studiesr the analysisof variance procedures . not all measureon a ment is the same. Observed treatment differences would be confounded with differences produced by the allocation procedure. Measuring an individual‘s weightqualitatively different from is as measuring his or her response to some treatment on a twocategory scale. for example. The requirement that one’s data be of high quality is at least as important a component of a proper study design as the requirement for randomization. biased estimates and even biased samples are some of the untowardconsequences of unreliable measurements that can be demonstrated. andoffers this control over such influences it ae r whether or not theyknown by the experimenter to exist. and reproducible. Measurement scales are differentiated according to the degree of precision involved. whereas other methodsof assignment may not. of is its cause is very likely to be the different treatments or conditions received by the two groups. Clearly. The primary benefit that randomization of has is the chance (and therefore impartial) assignment of extraneous influences among the groups to be compared. subjects who first to volunteer I are are all allocated to thenew treatment.STATISTICS I PSYCHOLOGY N 11 the experimentare to be valid. such “improved” or “not improved:’ for example.4.reasons nicely summarized be for (1986): in the following quotation from Fleiss The most elegant design a study will not of overcomethe damage causedby unreliable imprecise measurement. Whatever measurements are made. Larger sample sizesthan otherwise necessary. If it is said that an individual has a highIQ. precise. This randomization ensures a lackof bias. the interpretation an observed group differencelargely unambiguous. 
Whether a subject receives or the old treatment new the is decided.controlling where necessary for prognostic factors. and so on. WPES OF DATA The basic material is the foundation of all psychological investigationsthe that is measurements and observations made set of subjects. In the majority of cases. f. the same procedure were applied in repeated samplings. then the two groups may differ level of in motivation and so subsequently in performance. described in Chapters 3 5 1. they have to objective. The method most often used to overcome such problems random allocation is of subjects to treatments. In a properly conducted experiment (and is the main advantage of such a this study). by the toss a coin. for example. if however. Note that randomization does not claim to render the two samples equal with regard to these influences.
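The random allocation of subjects to treatments described above is easy to simulate. The sketch below is an illustration added here (the function name is ours, not taken from any statistical package): rather than a literal coin toss per subject, which can produce unequal group sizes, it shuffles the list of subjects and splits it in half, keeping the two treatment groups balanced while preserving chance assignment.

```python
import random

def randomize(subjects, seed=None):
    """Randomly allocate subjects to two treatment groups of equal size
    (the "new" and "old" treatments, say) by shuffling and splitting."""
    rng = random.Random(seed)  # a seed makes the allocation reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Allocate 20 numbered subjects to the two treatments.
new_group, old_group = randomize(range(1, 21), seed=2001)
print(sorted(new_group), sorted(old_group))
```

Because the assignment depends only on the shuffle, any extraneous influence (motivation, say) is distributed between the groups by chance alone, which is exactly the property randomization is meant to deliver.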
Measurement scales are differentiated according to the degree of precision involved. To say that an individual has a high IQ is not as precise as the statement that the individual has an IQ of 151. The comment that a woman is tall is not as accurate as specifying that her height is 1.88 m. Certain characteristics of interest are more amenable to precise measurement than others. With the use of an accurate thermometer, a subject's temperature can be measured very precisely. Quantifying the level of anxiety or depression of a psychiatric patient, or assessing the degree of pain of a migraine sufferer, is, however, a far more difficult task. Measurement scales may be classified into a hierarchy ranging from categorical, through ordinal to interval, and finally to ratio scales. Each of these will now be considered in more detail.

1.4.1. Nominal or Categorical Measurements
Nominal or categorical measurements allow patients to be classified with respect to some characteristic. Examples of such measurements are marital status, sex, and blood group. A nominal scale classifies without the categories being ordered. The properties of a nominal scale are as follows.
1. Data categories are mutually exclusive (an individual can belong to only one category).
2. Data categories have no logical order; numbers may be assigned to categories, but merely as convenient labels.
Techniques particularly suitable for analyzing this type of data are described in Chapters 9 and 10.

1.4.2. Ordinal Scales
The next level in the measurement hierarchy is the ordinal scale. This has one additional property over those of a nominal scale: a logical ordering of the categories. With such measurements, the numbers assigned to the categories indicate the amount of a characteristic possessed. A psychiatrist may, for example, grade patients on an anxiety scale as not anxious, mildly anxious, moderately anxious, or severely anxious, and he or she may use the numbers 0, 1, 2, and 3 to label the categories, with lower numbers indicating less anxiety. The psychiatrist cannot infer, however, that the difference in anxiety between patients with scores of, say, 0 and 1 is the same as that between patients assigned scores 2 and 3. The scores on an ordinal scale do, however, allow patients to be ranked with respect to the characteristic being assessed. Frequently, however, measurements on an ordinal scale are described in terms of their mean and standard deviation. This is not appropriate if the steps on the scale are not known to be of equal length; Andersen (1990), Chapter 15, gives a nice illustration of why this is so.

The following are the properties of an ordinal scale.
1. Data categories are mutually exclusive.
2. Data categories have some logical order.
3. Data categories are scaled according to the amount of a particular characteristic they possess.
Chapter 8 covers methods of analysis particularly suitable for ordinal data.

1.4.3. Interval Scales
The third level in the measurement scale hierarchy is the interval scale. Such scales possess all the properties of an ordinal scale, plus the additional property that equal differences between category levels, on any part of the scale, reflect equal differences in the characteristic being measured. An example of such a scale is temperature on the Celsius or Fahrenheit scale; the difference between temperatures of 80°F and 90°F is the same as that between temperatures of 30°F and 40°F. An important point to make about interval scales is that the zero point is simply another point on the scale; it does not represent the starting point of the scale, nor the total absence of the characteristic being measured. The properties of an interval scale are as follows.
1. Data categories are mutually exclusive.
2. Data categories have a logical order.
3. Data categories are scaled according to the amount of the characteristic they possess.
4. Equal differences in the characteristic are represented by equal differences in the numbers assigned to the categories.
5. The zero point is completely arbitrary.

1.4.4. Ratio Scales
The highest level in the hierarchy of measurement scales is the ratio scale. This type of scale has one property in addition to those listed for interval scales, namely the possession of a true zero point that represents the absence of the characteristic being measured. Consequently, statements can be made both about differences on the scale and about the ratio of points on the scale. An example is weight, where not only is the difference between 100 kg and 50 kg the same as that between 75 kg and 25 kg, but an object weighing 100 kg can be said to be twice as heavy as one weighing 50 kg. This is not true of, say, temperature on the Celsius or Fahrenheit scales, where a reading of 100° does not represent twice the warmth of a temperature of 50°. If, however, the temperatures were measured on the Kelvin scale, which does have a true zero point, the statement about the ratio could be made. The properties of a ratio scale are as follows.
1. Data categories are mutually exclusive.
2. Data categories have a logical order.
3. Data categories are scaled according to the amount of the characteristic they possess.
4. Equal differences in the characteristic are represented by equal differences in the numbers assigned to the categories.
5. The zero point represents an absence of the characteristic being measured.
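The point about the Kelvin scale can be made concrete with a small numerical check. The sketch below is an illustration added here (not part of the original text): it compares the ratio of two temperatures on the Celsius scale with the ratio of the same two temperatures after conversion to Kelvin.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to the Kelvin scale,
    which has a true zero point (absolute zero)."""
    return celsius + 273.15

t_low, t_high = 50.0, 100.0

# On the Celsius scale the ratio 100/50 is 2, but because the zero
# point is arbitrary this does not mean "twice the warmth."
celsius_ratio = t_high / t_low

# On the Kelvin scale, where ratios are meaningful, the same two
# temperatures stand in a ratio of only about 1.15.
kelvin_ratio = celsius_to_kelvin(t_high) / celsius_to_kelvin(t_low)

print(celsius_ratio, round(kelvin_ratio, 3))
```

The ratio of 2 on the Celsius scale is an artifact of where zero happens to sit; only the Kelvin ratio of roughly 1.15 describes the actual quantity of heat involved.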
An awareness of the different types of measurement that may be encountered in psychological studies is important, because the appropriate method of statistical analysis will often depend on the type of variable involved; this point is taken up again in later chapters.

A further classification of variable types is into response or dependent variables (also often referred to as outcome variables), and independent or explanatory variables (also occasionally called predictor variables). Essentially, the former are the variables measured by the investigator, which appear on the left-hand side of the equation defining the proposed model for the data; the latter are variables thought possibly to affect the response variable, and they appear on the right-hand side of the model. It is the relationship between the dependent variable and the so-called independent variables (independent variables are often related; see comments and examples in later chapters) with which most studies in psychology are concerned. One further point: in some contexts, the independent variables are also often known as factor variables, or simply factors.

1.5. A LITTLE HISTORY
The relationship between psychology and statistics is a necessary one (honest!). A widely quoted remark by Galton is "that until the phenomena of any branch of knowledge have been submitted to measurement and number, it cannot assume the dignity of a science." And Galton was not alone in demanding measurement and numbers as a sine qua non for attaining the dignity of a science. Lord Kelvin is quoted as saying that one cannot understand a phenomenon until it is subjected to measurement, and Thorndike has said that whatever exists, exists in some amount, and could therefore eventually be subjected to measurement and counting. Psychology has long striven to attain "the dignity of science" by submitting its observations to measurement and quantification. According to Singer (1979), David Hartley (1705-1757), in his major work, Observations on Man (1749), discussed the relevance of probability theory to the collection of scientific evidence, and argued for the use of mathematical and statistical ideas in the study of psychological processes.
A longstanding tradition in scientific psychology is the application of John Stuart Mill's experimental "method of difference" to the study of psychological problems. Groups of subjects are compared who differ with respect to the experimental treatment but are otherwise the same in all respects. Any difference in outcome can therefore be attributed to the treatment. Control procedures such as randomization, or matching on potentially confounding variables, help bolster the assumption that the groups are the same in every way except the treatment conditions.

The experimental tradition in psychology has long been wedded to a particular statistical technique, namely the analysis of variance (ANOVA). The principles of experimental design and the analysis of variance were developed primarily by Fisher in the 1920s, but they took time to be fully appreciated by psychologists, who continued to analyze their experimental data with a mixture of graphical and simple statistical methods until well into the 1930s. According to Lovie (1979), the earliest paper that had ANOVA in its title was by Gaskill and Cox (1937). Other early uses of the technique are reported in Crutchfield (1938) and in Crutchfield and Tolman (1940). Several of these early psychological papers, although paying lip service to the use of Fisher's analysis of variance techniques, relied heavily on more informal strategies of inference in interpreting experimental results.

The year 1940, however, saw a dramatic increase in the use of analysis of variance in the psychological literature, and by 1943 the review paper of Garrett and Zubin was able to cite over 40 studies using an analysis of variance or covariance in psychology and education. Since then, the analysis of variance in all its guises has become the main technique used in experimental psychology. An examination of 2 years of issues of the British Journal of Psychology, for example, showed that over 50% of the papers contain one or another application of the analysis of variance. Analysis of variance techniques are covered in Chapters 3, 4, and 5, and they are then shown to be equivalent to the multiple regression model introduced in Chapter 6.

1.6. WHY CAN'T A PSYCHOLOGIST BE MORE LIKE A STATISTICIAN (AND VICE VERSA)?
Over the course of the past 50 years, the psychologist has become a voracious consumer of statistical methods, but the relationship between psychologist and statistician is not always an easy, happy, or fruitful one. Statisticians complain that psychologists put undue faith in significance tests, often use complex methods of analysis when the data merit only a relatively simple approach, and, in general, abuse many statistical techniques.
Additionally, many statisticians feel that psychologists have become too easily seduced by user-friendly statistical software. These statisticians are upset (and perhaps even made to feel a little insecure) when their advice to plot a few graphs is ignored in favor of a multivariate analysis of covariance or similar statistical extravagance.

But if statisticians are at times horrified by the way in which psychologists apply statistical techniques, psychologists are no less horrified by many statisticians' apparent lack of awareness of what stresses psychological research can place on an investigator. A statistician may demand a balanced design with 30 subjects in each cell, so as to achieve some appropriate power for the analysis. But it is not the statistician who is faced with the frustration caused by a last-minute phone call from a subject who cannot take part in an experiment that has taken several hours to arrange. The statistician advising on a longitudinal study may call for more effort in carrying out follow-up interviews, so that no subjects are missed. It is, however, the psychologist who must continue to persuade people to talk about potentially distressing aspects of their lives, who must confront possibly dangerous respondents, or who arrives at a given (and often remote) address to conduct an interview, only to find that the person is not at home. In general, statisticians do not appear to appreciate the complex stories behind each data point in a psychological study. In addition, it is not unknown for statisticians to perform analyses that are statistically sound but psychologically naive or even misleading. An accomplished statistician, for example, once proposed an interpretation of findings regarding the benefits of nursery education in which all subsequent positive effects could be accounted for in terms of the parents' choice of primary school. For once it was psychologists who had to suppress a knowing smile; in the country for which the results were obtained, parents typically did not have any opportunity to choose the schools their children attended!

One way of examining the possible communication problems between psychologist and statistician is for each to know more about the language of the other. It is hoped that this text will help in this process and enable young (and not-so-young) psychologists to learn more about the way statisticians approach the difficulties of data analysis, thus making their future consultations with statisticians more productive and less traumatic. (What is missing in this equation is, of course, a suitable Psychology for Statisticians text.)

1.7. COMPUTERS AND STATISTICAL SOFTWARE
The development of computing and advances in statistical methodology have gone hand in hand since the early 1960s, and the increasing availability, power, and low cost of today's personal computer has further revolutionized the way users of statistics work. It is probably hard for fresh-faced students of today, busily emailing everybody, exploring the delights of the internet, and in every other way displaying their computer literacy on the current generation of PCs, to imagine just what life was like for the statistician and the user of statistics in the days of simple electronic or mechanical calculators, or even earlier, when large volumes of numerical tables were the only arithmetical aids available. It is a salutary exercise (of the "young people today hardly know they are born" variety) to illustrate with a little anecdote what things were like in the past.

Professor A. E. Maxwell (personal communication) tells the story of Dr. Godfrey Thomson's approach to the arithmetical problems faced when performing a factor analysis by hand. Godfrey Thomson was an educational psychologist during the 1930s and 1940s. According to Maxwell, Dr. Thomson and his wife would, early in the evening, place themselves on either side of their sitting room fire, Dr. Thomson equipped with several pencils and much paper, and Mrs. Thomson with a copy of Barlow's Multiplication Tables. For several hours the conversation would consist of little more than "What's 613.23 multiplied by 714.62?," "438134.44"; "What's 904.72 divided by 986.31?," "0.91728"; and so on.

Nowadays, increasingly sophisticated and comprehensive statistics packages are available that allow investigators easy access to an enormous variety of statistical techniques. This is not without considerable potential for performing very poor and often misleading analyses (a potential that many psychologists have grasped with apparent enthusiasm), but it would be foolish to underestimate the advantages of statistical software to users of statistics such as psychologists. In this text it will be assumed that readers will be carrying out their own analyses by using one or other of the many statistical packages now available, and we shall even end each chapter with limited computer hints for carrying out the reported analyses by using SPSS or S-PLUS. One major benefit of assuming that readers will undertake analyses by using a suitable package is that details of the arithmetic behind the methods will only rarely have to be given; consequently, descriptions of arithmetical calculations will be noticeable largely by their absence, although some tables will contain a little arithmetic where this is deemed helpful in making a particular point. Some tables will also contain a little mathematical nomenclature and, occasionally, even some equations, which in a number of places will use vectors and matrices. Readers who find this too upsetting to even contemplate should pass speedily by the offending material, regarding it merely as a black box and taking to heart that understanding its input and output will, in most cases, be sufficient to undertake the corresponding analyses.

1.8. SAMPLE SIZE DETERMINATION
One of the most frequent questions asked when first planning a study in psychology is "how many subjects do I need?" Answering the question requires consideration of a number of factors, for example, the amount of time available, the difficulty of
finding the type of subjects needed, and the cost of recruiting subjects. However, when the question is addressed statistically, a particular type of approach is generally used, in which these factors are largely ignored, at least in the first instance. Statistically, sample size determination involves identifying the response variable of most interest, specifying the appropriate statistical test to be used, setting a significance level, estimating the likely variability in the chosen response, choosing the power the researcher would like to achieve (for those readers who have forgotten, power is simply the probability of rejecting the null hypothesis when it is false), and committing to a magnitude of effect that the researcher would like to investigate. Typically, the concept of a psychologically relevant difference is used, that is, the difference one would not like to miss declaring to be statistically significant.

These pieces of information then provide the basis of a statistical calculation of the required sample size. For example, for a study intending to compare the mean values of two treatments by means of a z test at the α significance level, and where the standard deviation of the response variable is known to be σ, the formula for the number of subjects required in each group is

    n = 2 × (z_{α/2} + z_β)² × σ² / Δ²,                          (1.9)

where Δ is the treatment difference deemed psychologically relevant, β = 1 − power, z_{α/2} is the value of the normal distribution that cuts off an upper tail probability of α/2 (that is, if α = 0.05, then z_{α/2} = 1.96), and z_β is the value of the normal distribution that cuts off an upper tail probability of β (that is, if power = 0.8, then β = 0.2 and z_β = 0.84). The formula in Eq. (1.9) is appropriate for a two-sided test of size α.

So, for Δ = 1, σ = 1, testing at the 5% level and requiring a power of 80%, the sample size needed in each group is

    n = 2 × (1.96 + 0.84)² × 1 / 1 = 15.68,                      (1.10)

so that, essentially, 16 observations would be needed in each group.

Readers are encouraged to put some other numerical values into Eq. (1.9) to get a feel for how the formula works, but its general characteristics are that n is an increasing function of σ and a decreasing function of Δ, both of which are intuitively understandable: if the variability of our response increases, then, other things being equal, we ought to need more subjects to come to a reasonable conclusion; and if we seek a bigger difference, we ought to be able to find it with fewer subjects.
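Eq. (1.9) is simple enough to check directly. The sketch below is our own illustration in Python (not one of the SPSS or S-PLUS hints given elsewhere in this book, and the function name is invented for the purpose); it uses only the standard library to evaluate the formula.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Subjects per group for a two-sided z test comparing two means,
    by Eq. (1.9): n = 2 * (z_{alpha/2} + z_beta)^2 * sigma^2 / delta^2."""
    z_alpha2 = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 when alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # 0.84 when power = 0.80
    n = 2 * (z_alpha2 + z_beta) ** 2 * sigma**2 / delta**2
    return ceil(n)  # round up to a whole number of subjects

# The worked example in the text: delta = 1, sigma = 1, 5% level, 80% power.
print(sample_size_per_group(delta=1, sigma=1))  # 16 subjects per group
```

The general characteristics noted in the text are easy to confirm this way: doubling σ roughly quadruples the requirement, while doubling Δ cuts it to a quarter.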
In practice there are, of course, many different formulas for sample size determination, and nowadays software is widely available for sample size determination in many situations (e.g., nQuery Advisor; Statistical Solutions Ltd., www.statsolusa.com). However, one note of caution might be appropriate here. A sample size calculation should be viewed in the light of what the researcher knows is a "practical" sample size with respect to time, money, and so on, and in practice there is often a tendency to "adjust" the difference sought and the power. Thus reported sample size calculations should frequently be taken with a large pinch of salt, and it is arguable that such calculations are often little more than "a guess masquerading as mathematics" (see Senn, 1997).

1.9. SUMMARY
1. Statistical principles are central to most aspects of a psychological investigation.
2. Data and their associated statistical analyses form the evidential parts of psychological arguments.
3. Significance testing is far from the be-all and end-all of statistical analyses, but it does still matter, because evidence that can be discounted as an artifact of sampling will not be particularly persuasive. However, p values should not be taken too seriously; confidence intervals are often more informative.
4. A good statistical analysis should highlight those aspects of the data that are relevant to the psychological arguments, do so clearly and fairly, and be resistant to criticisms.
5. Experiments lead to the clearest conclusions about causal relationships.
6. Variable type often determines the most appropriate method of analysis.

EXERCISES
1.1. A well-known professor of experimental psychology once told the author, "If psychologists carry out their experiments properly, they rarely need statistics or statisticians." Guess the professor! Comment on his or her remark.

1.2. The Pepsi-Cola Company carried out research to determine whether people tended to prefer Pepsi Cola to Coca Cola. Participants were asked to taste two glasses of cola and then state which they preferred. The two glasses were not labeled as Pepsi or Coke, for obvious reasons. Instead, the Coke glass was labeled Q and the Pepsi glass was labeled M. The results showed that "more than half choose Pepsi over Coke" (Huck and Sandler, 1979, p. 11). Are there any alternative explanations for the observed difference, other than the taste of the two drinks? Explain how you would carry out a study to assess any alternative explanation you think possible.

1.3. Suppose you develop a headache while working for hours at your computer (this is probably a purely hypothetical possibility, but use your imagination). You stop, go into another room, and take two aspirins. After about 15 minutes your headache has gone and you return to work. Can you infer a definite causal relationship between taking the aspirin and curing the headache?

1.4. Attribute the following quotations about statistics and/or statisticians.
1. To understand God's thoughts we must study statistics, for these are a measure of his purpose.
2. You cannot feed the hungry on statistics.
3. A single death is a tragedy; a million deaths is a statistic.
4. I am not a number. I am a free man.
5. Thou shalt not sit with statisticians nor commit a Social Science.
6. Facts speak louder than statistics.
7. I am one of the unpraised, unrewarded millions without whom statistics would be a bankrupt science. It is we who are born, who marry, and who die in constant ratios.
2
Graphical Methods of Displaying Data

    A good graph is quiet and lets the data tell their story clearly and completely.
    (H. Wainer, 1997)

2.1. INTRODUCTION
According to Chambers, Cleveland, Kleiner, and Tukey (1983), "there is no statistical tool that is as powerful as a well chosen graph," and although this may be a trifle exaggerated, there is considerable evidence that there are patterns in data, and relationships between variables, that are easier to identify and understand from graphical displays than from possible alternatives such as tables. For this reason, researchers who collect data are constantly encouraged by their statistical colleagues both to make a preliminary graphical examination of their data and to use a variety of plots and diagrams to aid in the interpretation of the results from more formal analyses.

But just what is a graphical display? A concise description is given by Tufte (1983): "Data graphics visually display measured quantities by means of the combined use of points, lines, a coordinate system, numbers, symbols, words, shading and colour." Tufte estimates that between 900 billion (9 × 10^11) and 2 trillion (2 × 10^12) images of statistical graphics are printed each year. The prime objective of this graphical approach is to communicate, both to ourselves and to others.

Some of the advantages of graphical methods have been listed by Schmid (1954).
1. In comparison with other types of presentation, well-designed charts are more effective in creating interest and in appealing to the attention of the reader.
2. Visual relationships as portrayed by charts and graphs are more easily grasped and more easily remembered.
3. The use of charts and graphs saves time, because the essential meaning of large measures of statistical data can be visualized at a glance (like Chambers and his colleagues, Schmid may perhaps be accused of being prone to a little exaggeration here).
4. Charts and graphs provide a comprehensive picture of a problem that makes for a more complete and better balanced understanding than could be derived from tabular or textual forms of presentation.
5. Charts and graphs can bring out hidden facts and relationships and can stimulate, as well as aid, analytical thinking and investigation.

Schmid's last point is reiterated by the late Tukey in his observation that "the greatest value of a picture is when it forces us to notice what we never expected to see."

During the past two decades, a wide variety of new methods for displaying data have been developed with the aim of making this particular aspect of the examination of data as informative as possible. Graphical techniques have evolved that will provide an overview, hunt for special effects in data, indicate outliers, identify patterns, diagnose (and criticize) models, and generally search for novel and unexpected phenomena. Large numbers of graphs may be required, and computers are generally needed to draw them for the same reasons that they are used for numerical analysis, namely that they are fast and accurate.

This chapter largely concerns the graphical methods most relevant in the initial phase of data analysis. Graphical techniques useful for diagnosing models and interpreting results will be dealt with in later chapters.

2.2. POP CHARTS
Newspapers, television, and the media in general are very fond of two very simple graphical displays, namely the pie chart and the bar chart. Both can be illustrated by the use of the data shown in Table 2.1, which show the percentages of people convicted of
five different types of crime, for two groups: drinkers and abstainers.

TABLE 2.1
Crime Rates for Drinkers and Abstainers (Percentages)

Crime        Drinkers    Abstainers
Arson           6.6          6.4
Rape           11.7          9.2
Violence       20.6         16.3
Stealing       50.3         44.6
Fraud          10.8         23.5

In the pie charts for drinkers and abstainers (see Figure 2.1), the sections of the circle have areas proportional to the observed percentages. In the corresponding bar charts (see Figure 2.2), percentages are represented by rectangles of appropriate size placed along a horizontal axis.

Despite their widespread popularity, both the general and the scientific use of pie charts has been severely criticized. For example, Tufte (1983) comments that "tables are preferable to graphics for many small data sets. A table is nearly always better than a dumb pie chart; the only worse design than a pie chart is several of them. ... pie charts should never be used." A similar lack of affection is shown by Bertin (1981), who declares that "pie charts are completely useless," and, more recently, by Wainer (1997), who claims that "pie charts are the least useful of all graphical forms."

An alternative display that is always more useful than the pie chart (and often preferable to a bar chart) is the dot plot. To illustrate, we first use an example from Cleveland (1994). Figure 2.3 shows a pie chart of 10 percentages.

FIG. 2.1. Pie charts for drinker and abstainer crime percentages.
FIG. 2.2. Bar charts for drinker and abstainer crime percentages.
FIG. 2.3. Pie chart for 10 percentages. (Reproduced with permission from Cleveland, 1994.)
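The geometry behind a pie chart is elementary: each category's sector angle is 360 times its percentage divided by 100. The short sketch below is our own illustration (the function is invented for the purpose, and the figures used are the drinker percentages of Table 2.1).

```python
def pie_angles(percentages):
    """Sector angle in degrees for each category of a pie chart,
    proportional to the category's percentage of the whole."""
    return {crime: 360 * pct / 100 for crime, pct in percentages.items()}

drinkers = {"Arson": 6.6, "Rape": 11.7, "Violence": 20.6,
            "Stealing": 50.3, "Fraud": 10.8}
angles = pie_angles(drinkers)
print(round(angles["Stealing"], 2))  # stealing takes just over half the circle
```

Note that the reader of the finished chart must mentally invert this calculation, judging angles and areas by eye, which is precisely the perceptual task that Cleveland's work shows people do badly and that motivates the dot plot described next.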
the heightsof wives. HISTOGRAMS.3. AND BOX PLOTS The data given i Table 2. of Also note the thoughtprovoking title.1. Labels and dots are grouped into small units of five to reduce the chances of matching error. For example.First the percentages have a distribution with two modes (a bimodal distribution). but a more exciting application of the dot plot is provided i Carr (1998). Dot plots for the crime data in Table 2.2 shown the heights and ages of both couples a sample n in of 100 married couples.6.GRAPHICAL METHODS OF DISPLAYING DATA 25 the alternative dot plot representation the same values. simple histograms the heightsof husbands. In the former it is far easier all to see a number of properties of the data that are either not apparent at in the pie chart or only barely noticeable.1 (see Figure 2. In addition. the shapeof the pattern for the odd values as the band number increases is the same as the shape for the even values. A number of graphical displaysof the data can help. shown in Figure 2. STEMANDLEAF PLOTS. gives a n particular contrastof brain mass and body mass for 62 species of animal. and of . The diagram. Furthermore. oddnumbered bands lie around the value 8% and evennumbered bands around 12%. Pattern perception is far of more efficient for the dot plot than for the pie chart. and identifying any potentially interesting patterns is virtually impossible.5) are also more informative than the pie charts in Figure 2. each even value shifted with respect to the preceding odd by is value approximately 4%.the grouping encourages informative interpretationthe graph. 2. Assessing general features of the data is difficult with the data tabulated inthis way.
FIG. 2.5. Dot plots for (a) drinker and (b) abstainer crime percentages.
FIG. 2.6. Dot plot with positional linking, contrasting Log10(Brain Mass) against 2/3 Log10(Body Mass) for 62 species of animal, under the title "Intelligence?" (Taken with permission from Carr, 1998.)
TABLE 2.2
Heights and Ages of Married Couples

[Rows for the first 40 of the 100 couples: husband's age (years), husband's height (mm), wife's age (years), wife's height (mm), and husband's age at first marriage.]

(Continued)
TABLE 2.2 (Continued)

Columns: Husband's Age (years), Husband's Height (mm), Wife's Age (years), Wife's Height (mm), Husband's Age at First Marriage.

[Rows for couples 41-80.]

(Continued)
TABLE 2.2 (Continued)

[Rows for the final 20 couples, same columns as above.]

The three histograms are shown in Figures 2.7 and 2.8. All the height distributions are seen to be roughly symmetrical and bell shaped, perhaps roughly normal? Husbands tend to be taller than their wives, a finding that simply reflects that men are on average taller than women, although there are a few couples in which the wife is taller; see the negative part of the x axis of the height difference histogram.

The histogram is generally used for two purposes: counting and displaying the distribution of a variable. According to Wilkinson (1992), however, "it is effective for neither." Histograms can often be misleading for displaying distributions because of their dependence on the number of classes chosen. Simple tallies of the observations are usually preferable for counting, particularly when shown in the form of a stem-and-leaf plot, as described in Display 2.1. Such a plot has the advantage of giving an impression of the shape of a variable's distribution while retaining the values of the individual observations. Stem-and-leaf plots of the heights of husbands and wives and of the height differences are shown in Figure 2.9.
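The dependence of a histogram on the number of classes chosen can be seen numerically by binning the same observations in two different ways. The following short Python sketch is ours, not part of the original text; the function name and the heights are invented for illustration:

```python
def histogram_counts(values, n_bins):
    """Count how many observations fall in each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is clamped into the last bin.
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    return counts

# Ten invented heights (mm): the apparent shape depends on the number of classes.
heights = [1540, 1550, 1560, 1570, 1580, 1590, 1600, 1610, 1620, 1700]
print(histogram_counts(heights, 2))  # [8, 2]
print(histogram_counts(heights, 4))  # [4, 4, 1, 1]
```

With two classes the single large value is absorbed into a broad upper bin; with four classes it stands apart, illustrating how the choice of classes changes the picture.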
FIG. 2.7. Histograms of heights of husbands and their wives.

FIG. 2.8. Histogram of height difference for 100 married couples.
Display 2.1
Stem-and-Leaf Plots

To construct the simplest form of stem-and-leaf display of a set of observations, begin by choosing a suitable pair of adjacent digits (in the heights data, the tens digit and the units digit) and "split" each data value between the two digits. For example, the value 98 would be split as follows:

Data value: 98    Split: 9/8    Stem: 9    Leaf: 8

Then a separate line in the display is allocated for each possible string of leading digits (the stems). Finally, the trailing digit (the leaf) of each data value is written down on the line corresponding to its leading digit.

A further useful graphical display of a variable's distributional properties is the box plot. This is obtained from the five-number summary of a data set, the five numbers in question being the minimum, the lower quartile, the median, the upper quartile, and the maximum. The construction of a box plot is described in Display 2.2. The box plots of the heights of husbands and their wives are shown in Figure 2.10. One unusually short man is identified in the plot of the husbands' heights, and three very short women in that of the wives; in addition, one rather tall woman is present among the wives.

2.4. THE SIMPLE SCATTERPLOT

The simple xy scatterplot has been in use since at least the 19th century. Despite its age, it remains, according to Tufte (1983), the greatest of all graphical designs:

It links at least two variables, encouraging and even imploring the viewer to assess the possible causal relationship between the plotted variables. It confronts causal theories that x causes y with empirical evidence as to the actual relationship between x and y.

Our first scatterplot, given in Figure 2.11(a), shows age of husband against age of wife for the data in Table 2.2. As might be expected, the plot indicates a strong correlation for the two ages. Adding the line y = x to the plot (see Figure 2.11(b)) highlights that there are a greater number of couples in which the husband is older than his wife than couples in which the reverse is true. Finally, in Figure 2.11(c) the bivariate scatter of the two age variables is framed with the observations on each. Plotting marginal and joint distributions together in this way is usually good data analysis practice; a further possibility for achieving this goal is shown in Figure 2.12.
Husbands' heights (mm):
N = 100  Median = 1727  Quartiles = 1690, 1771.5
Decimal point is 2 places to the right of the colon
15 : 6888
16 : 1223
16 : 666777778888888899
17 : 00000000001111112222222233333334444444
17 : 555556666677888899999
18 : 00001112334
18 : 5578

Wives' heights (mm):
N = 100  Median = 1600  Quartiles = 1570, 1645
Decimal point is 2 places to the right of the colon
Low: 1410
14 : 24
14 : 99
15 : 0011123344
15 : 5555666667777777888888899999999
16 : 00000000111111112222333444444
16 : 555666666777778899
17 : 001234
17 : 6

Height differences (mm):
N = 100  Median = 125  Quartiles = 79, 176.5
Decimal point is 2 places to the right of the colon
-1 : 0
-0 : 32
0 : 22011333444
0 : 566666777778888999
1 : 0000011111222222222333333344444
1 : 55566666677788999999
2 : 0011233444
2 : 5667889

FIG. 2.9. Stem-and-leaf plots of heights of husbands and wives and of height differences.
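The construction rule of Display 2.1 is mechanical enough to express in a few lines of code. The sketch below is illustrative only and not part of the original text (the function name is ours); it splits each value into a tens-digit stem and a units-digit leaf:

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Split each value into a stem (leading digits) and a leaf (units digit),
    then list the leaves on the line belonging to their stem, as in Display 2.1."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    return ["%d : %s" % (stem, "".join(str(leaf) for leaf in leaves))
            for stem, leaves in sorted(rows.items())]

# The value 98 splits as 9/8: stem 9, leaf 8.
for line in stem_and_leaf([98, 92, 105, 101, 87]):
    print(line)
```

Unlike a histogram bar, each display line still shows the individual observations it summarizes.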
Display 2.2
Constructing a Box Plot

The plot is based on the five-number summary of a data set:
1. minimum;
2. lower quartile;
3. median;
4. upper quartile;
5. maximum.

The distance between the upper and lower quartiles, the interquartile range, is a measure of the spread of a distribution that is quick to compute and, unlike the range, is not badly affected by outliers. The median and the upper and lower quartiles can be used to define rather arbitrary but still useful limits, L and U, to help identify possible outliers in the data:

U = UQ + 1.5 × IQR,
L = LQ − 1.5 × IQR,

where UQ is the upper quartile, LQ is the lower quartile, and IQR is the interquartile range, UQ − LQ. Observations outside the limits L and U are regarded as potential outliers and are identified separately on the box plot (where they are known as outside values). The box plot is then constructed as follows.

1. A "box" with ends at the lower and upper quartiles is first drawn.
2. A horizontal line (or some other feature) is used to indicate the position of the median in the box.
3. Next, lines are drawn from each end of the box to the most remote observations that are not outside observations as defined above (the upper and lower adjacent values).
4. Finally, the outside observations are incorporated into the final diagram by representing them individually in some way (lines, stars, etc.).

The resulting diagram schematically represents the body of the data minus the extreme observations. (The schematic accompanying the display labels the outside values, the upper adjacent value, the upper quartile, the median, the lower quartile, and the lower adjacent value.)
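The quantities in Display 2.2 are easily computed. The following Python sketch is ours, not part of the original text, and uses one common quartile convention (median of each half of the ordered data); other conventions give slightly different values:

```python
def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, and maximum,
    with quartiles taken as medians of the lower and upper halves."""
    xs = sorted(values)
    n = len(xs)

    def median(seq):
        mid = len(seq) // 2
        return seq[mid] if len(seq) % 2 else (seq[mid - 1] + seq[mid]) / 2

    lower = xs[: n // 2]          # lower half (median excluded when n is odd)
    upper = xs[(n + 1) // 2 :]    # upper half
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def outside_values(values, k=1.5):
    """Observations beyond UQ + k*IQR or below LQ - k*IQR: potential outliers."""
    _, lq, _, uq, _ = five_number_summary(values)
    iqr = uq - lq
    return [v for v in values if v > uq + k * iqr or v < lq - k * iqr]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(five_number_summary(data))  # (1, 3, 5.5, 8, 100)
print(outside_values(data))       # [100]
```

The limits U and L are exactly those of Display 2.2; the value 100 would be drawn individually on the box plot as an outside value.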
FIG. 2.10. Box plots of heights of husbands and wives.

The majority of marriages involve couples in which the husband is both taller and older than his wife. There are few couples in which the wife is taller than her husband, and in only one of the 100 married couples is the wife both older and taller than her husband.

The age difference in married couples might be investigated by plotting it against the husband's age at marriage. The relevant scattergram is shown in Figure 2.13. The diagram clearly illustrates the tendency of men marrying late in life to choose partners considerably younger than themselves, and again there are far fewer couples in which the wife is the older partner.

The relationship between the heights of the married couples might also be of interest. Figure 2.14(a) shows a plot of height of husband against height of wife. There is some indication of a positive association, but not one that is particularly strong. Again, adding the line y = x is informative; see Figure 2.14(b).

Finally, we might be interested in examining whether there is any evidence of an age difference-height difference relationship. The relevant scatterplot is shown in Figure 2.15. The points on the right-hand side of the line x = 0 represent couples in which the husband is older; those to the left, couples in which the husband is younger than his wife.

2.5. THE SCATTERPLOT MATRIX

In a set of data in which each observation involves more than two variables (multivariate data), viewing the scatterplots of each pair of variables is often a useful way to begin to examine the data. The number of scatterplots, however, quickly becomes daunting: for 10 variables, for example, there are 45 plots to consider.
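The count of distinct pairwise plots is simply "p choose 2"; a one-line check (illustrative only, not part of the original text):

```python
from math import comb

def n_scatterplots(p):
    """Number of distinct variable pairs, and hence scatterplots to inspect."""
    return comb(p, 2)

print(n_scatterplots(10))  # 45, the count quoted above for 10 variables
```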
FIG. 2.11. Scatterplots of (a) ages of husbands and wives in 100 married couples; (b) with line y = x added; (c) enhanced with observations on each variable.

Arranging the pairwise scatterplots in the form of a square grid, usually known as a draughtsman's plot or scatterplot matrix, can help in assessing all the scatterplots at the same time. Formally, a scatterplot matrix is defined as a square symmetric grid of bivariate scatterplots (Cleveland, 1994). The grid has p rows and p columns, each one corresponding to a different one of the p variables observed. Each of the grid's cells shows a scatterplot of two variables: variable j is plotted against variable i in the ijth cell, and the same variables appear again in cell ji, with the x and y axes of the scatterplot interchanged. The reason for including both the upper and lower triangles of the matrix, despite the seeming redundancy, is that it enables a row and a column to be visually scanned to see one variable plotted against all others, with the scale for that variable lined up along the horizontal or the vertical.
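Each off-diagonal panel of a scatterplot matrix displays one pairwise relationship, and the strength of linear association that a panel suggests can be checked numerically. A minimal sketch of the Pearson correlation coefficient (the function name is ours; this code is not from the original text):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy)

# A perfectly linear increasing relationship gives r = 1 (up to rounding).
print(round(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]), 6))  # 1.0
```

Computing such coefficients for every cell of the grid gives a numerical companion to the visual scan of the matrix.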
FIG. 2.12. Scatterplot of wife's height against husband's height, showing marginal distributions of each variable.

FIG. 2.13. Age difference of married couples plotted against husband's age at marriage.
FIG. 2.14. Plots of (a) husband's height and wife's height in 100 married couples; (b) enhanced with the line y = x.

FIG. 2.15. Married couple's age difference and height difference.

As our first illustration of a scatterplot matrix, Figure 2.16 shows such an arrangement for all the variables in the married couples' data given in Table 2.2.
FIG. 2.16. Scatterplot matrix of heights and ages of married couples.

From this diagram the very strong relationship between age of husband and age of wife is apparent, as are the relative weaknesses of the associations between all other pairs of variables.

For a second example of the use of a scatterplot matrix, we shall use the data shown in Table 2.3. These data arise from an experiment in which 20 subjects had their response times measured when a light was flashed into each of their eyes through lenses of powers 6/6, 6/18, 6/36, and 6/60. (A lens of power a/b means that the eye will perceive as being at a feet an object that is actually positioned at b feet.) Measurements, made for both left and right eyes, are in milliseconds. Here the data are multivariate, but of a very special kind: the same variable is measured under different conditions. Such data are usually labeled repeated measures and will be the subject of detailed consideration in Chapters 5 and 7.

The scatterplot matrix of the data in Table 2.3 is shown in Figure 2.17. The diagram shows that measurements under particular pairs of lens strengths are quite strongly related, for example, 6/6 and 6/18 for both left and right eyes. In general, however, the associations are rather weak. The implications of this pattern of associations for the analysis of these data will be taken up in Chapter 5.
TABLE 2.3
Visual Acuity and Lens Strength

[Response times (milliseconds) for 20 subjects under lens strengths L6/6, L6/18, L6/36, L6/60 (left eye) and R6/6, R6/18, R6/36, R6/60 (right eye).]

Note. Taken with permission from Crowder and Hand (1990).

2.6. ENHANCING SCATTERPLOTS

The basic scatterplot can accommodate only two variables, but there are ways in which it can be enhanced to display the values of further variables. The possibilities can be illustrated with a number of examples.

2.6.1. Ice Cream Sales

The data shown in Table 2.4 give the ice cream consumption over thirty 4-week periods, the price of ice cream in each period, and the mean temperature in each period. To display the values of all three variables on the same diagram, we use what is generally known as a bubble plot.
FIG. 2.17. Scatterplot matrix of visual acuity data.

In a bubble plot, two variables are used to form a scatterplot in the usual way, and then the values of a third variable are represented by circles with radii proportional to those values, centered on the appropriate points in the scatterplot. The bubble plot for the ice cream data is shown in Figure 2.18. The diagram illustrates that price and temperature are largely unrelated and, of more interest, demonstrates that consumption remains largely constant as price varies while temperature remains below 50°F. Above this temperature, sales of ice cream increase with temperature but remain largely independent of price. The maximum consumption corresponds to the lowest price and highest temperature. One slightly odd observation is indicated in the plot: it corresponds to a month with low consumption despite a low price and a moderate temperature.

2.6.2. Height, Weight, Sex, and Pulse Rate

As a further example of the enhancement of scatterplots, Figure 2.19 shows a plot of the height and weight of a sample of individuals with their gender indicated. Also included in the diagram are circles centered on each plotted point; the radii of these circles represent the value of the change in the subject's pulse rate after running on the spot for 1 minute.
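The radius scaling used in a bubble plot can be sketched in a few lines. This illustration is ours, not part of the original text; following the description above, radii are made proportional to the raw values (some authors prefer to scale circle areas instead, so that perceived size tracks the values more faithfully):

```python
def bubble_radii(z, max_radius=10.0):
    """Circle radii proportional to the third variable's values, largest = max_radius."""
    zmax = max(z)
    return [max_radius * v / zmax for v in z]

# Hypothetical changes in pulse rate for four subjects.
print(bubble_radii([5, 10, 20, 40]))  # [1.25, 2.5, 5.0, 10.0]
```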
TABLE 2.4
Ice Cream Data

[Thirty 4-week periods: Y (pints per capita), X1 (price per pint), X2 (mean temperature, °F).]

Figure 2.19 therefore records the values of four variables for each individual: height, weight, change in pulse rate, and gender. Information was also available on whether or not each person in the sample smoked, and this extra information is included in Figure 2.20, as described in the caption. Figure 2.20 is very useful for obtaining an overall picture of the data. The increase in pulse rate after exercise is clearly greater for men than for women, but it appears to be unrelated to whether or not an individual smokes.
FIG. 2.18. Bubble plot of ice cream data.

There is perhaps a relatively weak indication of a larger increase in pulse rate among heavier women, but no such relationship is apparent for men.

2.6.3. University Admission Rates

Bickel, Hammel, and O'Connell (1975) analyzed the relationship between admission rate and the proportion of women applying to the various academic departments at the University of California at Berkeley. The scatterplot of percentage of women applicants against percentage of applicants admitted is shown in Figure 2.21; the plot is enhanced with boxes, the sizes of which indicate the relative number of applicants.
FIG. 2.19. Scatterplot of height and weight for male (M) and female (F) participants; the radii of the circles represent change in pulse rate (I indicates a decrease in pulse rate).

The negative correlation indicated by the scatterplot is due almost exclusively to a trend for the large departments. If only a simple scatterplot had been used here, vital information about the relationship would have been lost.

2.7. COPLOTS AND TRELLIS GRAPHICS

The conditioning plot or coplot is a particularly powerful visualization tool for studying how two variables are related, conditional on one or more other variables being held constant. There are several varieties of conditioning plot, but the differences between them are largely a matter of presentation rather than of real substance.

A simple coplot can be illustrated on the data from married couples given in Table 2.2. We shall plot the wife's height against the husband's height, conditional on the husband's age. The resulting plot is shown in Figure 2.22.
FIG. 2.20. Scatterplot of height and weight with additional information about sex, smoking status, and change in pulse rate: M, male smoker; m, male nonsmoker; F, female smoker; f, female nonsmoker; I indicates a decrease in pulse rate.

In Figure 2.22 the panel at the top of the figure is known as the given panel; the panels below it are the dependence panels. Each rectangle in the given panel specifies a range of values of the husband's age. On a corresponding dependence panel, the wife's height is plotted against the husband's height for those couples in which the husband's age lies in the appropriate interval. For age intervals to be matched to dependence panels, the latter are examined in order from left to right in the bottom row and again from left to right in subsequent rows. Here the coplot implies that the relationship between the two height measures is roughly the same for each of the six age intervals.

Coplots are a particular example of a more general form of graphical display known as trellis graphics, whereby any graphical display, not simply a scatterplot, can be shown for particular intervals of some variable. For example, Figure 2.23 shows a bubble plot of height of husband, height of wife, and age of wife (circles), conditional on husband's age, for the data in Table 2.2. We leave the interpretation of this plot to the reader!
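The slicing that a coplot performs, one dependence panel per (possibly overlapping) interval of the conditioning variable, can be sketched as follows. The data, key names, and function below are invented for illustration and are not from the original text:

```python
def conditioning_panels(records, key, intervals):
    """One dependence-panel subset of the records per conditioning interval.
    Intervals may overlap, as they typically do in a coplot's given panel."""
    return [[r for r in records if lo <= r[key] <= hi] for lo, hi in intervals]

# Invented couples: wife's height vs. husband's height, conditioned on husband's age.
couples = [
    {"husband_age": 25, "husband_height": 1700, "wife_height": 1600},
    {"husband_age": 35, "husband_height": 1750, "wife_height": 1620},
    {"husband_age": 45, "husband_height": 1710, "wife_height": 1590},
]
panels = conditioning_panels(couples, "husband_age", [(20, 36), (34, 50)])
print([len(p) for p in panels])  # [2, 2]: the age-35 couple falls in both intervals
```

Each returned subset would then be drawn as one scatterplot panel, scanned left to right and bottom to top as described above.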
FIG. 2.21. Scatterplot of the percentage of female applicants versus percentage of applicants admitted for 85 departments at the University of California at Berkeley; box sizes indicate the relative numbers of applicants. (Reproduced with permission from Bickel et al., 1975.)

2.8. PROBABILITY PLOTTING

Many of the statistical methods to be described in later chapters are based on assuming either the normality of the raw data or the normality of some aspect of a fitted model. As readers will (it is hoped) recall from their introductory statistics course, normality implies that the observations in question arise from a bell-shaped probability distribution, defined explicitly by the formula

f(x) = (1 / (σ√(2π))) exp{−(x − μ)² / (2σ²)},

where μ is the mean of the distribution and σ² is its variance. Three examples of normal distributions are shown in Figure 2.24.
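The density formula above can be evaluated directly. A small Python sketch (ours, not part of the original text):

```python
from math import exp, pi, sqrt

def normal_density(x, mu=0.0, sigma=1.0):
    """f(x) = exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

# The curve is symmetric about the mean and peaks there at 1/(sigma*sqrt(2*pi)).
print(round(normal_density(0.0), 4))  # 0.3989
```

Increasing σ lowers and widens the curve; changing μ merely shifts it along the x axis, which is what the three curves of Figure 2.24 display.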
FIG. 2.22. Coplot of heights of husband and wife conditional on husband's age.

There are a variety of ways in which the normality assumption might be checked. Here we concentrate on normal probability plotting, in which the observations are plotted against a set of suitably chosen percentiles of the standard normal distribution, that is, a normal distribution with μ = 0 and σ = 1. (Readers who would like the full mathematical details behind such plots should consult Display 2.3.) Under the assumption of normality, this plot should approximate a straight line.

As a first illustration of this approach, Figure 2.25(a) shows a normal probability plot for a set of 100 observations generated from a normal distribution; the points more or less lie on the required straight line. Figure 2.25(b) shows the corresponding plot for a set of 100 observations generated from an exponential distribution, a distribution with a high degree of skewness; here there is considerable departure from linearity. In general, skewed populations are indicated by concave (left skew) and convex (right skew) plots.

Now let us look at some probability plots of real data, again using the data on married couples given in Table 2.2. Probability plots of all five variables are shown in Figure 2.26. Both height variables appear to have normal distributions, but all three age variables show some evidence of a departure from normality.
FIG. 2.23. Bubble plot of height of husband, height of wife, and age of wife (circles), conditional on age of husband.

2.9. GRAPHICAL DECEPTIONS AND GRAPHICAL DISASTERS

In general, graphical displays of the kind described in previous sections are extremely useful in the examination of data; indeed, they are almost essential both in the initial phase of data exploration and in the interpretation of the results from more formal statistical procedures, as will be seen in later chapters. Unfortunately, it is relatively easy to mislead the unwary with graphical material, and not all graphical displays are as honest as they should be! For example, consider the plot of the death rate per million from cancer of the breast, for several periods over the past three decades, shown in Figure 2.27. The rate appears to show a rather alarming increase. However, when the data are replotted with the vertical scale beginning at zero, as shown in Figure 2.28, the increase is altogether less startling. This example illustrates that undue exaggeration or compression of the scales is best avoided when one is drawing graphs (unless, of course, you are actually in the business of deceiving your audience).

A very common distortion, introduced into the graphics most popular with newspapers, television, and the media in general, occurs when both dimensions of a two-dimensional figure or icon are varied simultaneously in response to changes in a single variable.
Display 2.3
Normal Probability Plotting

A normal probability plot involves plotting the n ordered sample values, y(1) ≤ y(2) ≤ · · · ≤ y(n), against the quantiles Φ⁻¹(pᵢ) of a standard normal distribution, where Φ is the standard normal distribution function and usually pᵢ = (i − 0.5)/n. If the data arise from a normal distribution, the plot should be approximately linear.

FIG. 2.24. Normal distributions (mean = 5, sd = 1; mean = 10, sd = 5; mean = 2.5, sd = 0.5).
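The plotting coordinates described in Display 2.3 can be computed directly. A sketch (ours, not from the original text) using the Python standard library's inverse normal distribution function:

```python
from statistics import NormalDist

def normal_probability_points(sample):
    """Pairs (Phi^{-1}(p_i), y_(i)) with p_i = (i - 0.5)/n, as in Display 2.3."""
    ys = sorted(sample)
    n = len(ys)
    return [(NormalDist().inv_cdf((i - 0.5) / n), y)
            for i, y in zip(range(1, n + 1), ys)]

# Four invented observations; plot the second coordinate against the first.
pts = normal_probability_points([3.1, 2.2, 2.8, 3.5])
print([round(q, 3) for q, _ in pts])
```

For normal data the pairs fall close to a straight line with slope σ and intercept μ, which is why the visual check works.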
FIG. 2.25. Normal probability plots of (a) normal data; (b) data from an exponential distribution.
FIG. 2.26. Normal probability plots of the five variables in the married couples' data of Table 2.2.
FIG. 2.27. Death rates from cancer of the breast, where the y axis does not include the origin.

FIG. 2.28. Death rates from cancer of the breast, where the y axis does include the origin.

The examples shown in Figure 2.29, both taken from Tufte (1983), illustrate this point. Tufte quantifies the distortion in such displays with what he calls the lie factor of a graphical display, which is defined as

lie factor = (size of effect shown in graphic) / (size of effect in data).    (2.2)
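Equation (2.2) is a simple ratio, and can be checked against Tufte's (1983) oil-barrel example, in which a 454% increase in the data was drawn as a 4280% increase on the page. A one-function sketch (ours, not part of the original text):

```python
def lie_factor(effect_shown_pct, effect_in_data_pct):
    """Tufte's lie factor: size of effect shown in graphic / size of effect in data."""
    return effect_shown_pct / effect_in_data_pct

# Tufte's oil-barrel graphic: a 454% increase depicted as a 4280% increase.
print(round(lie_factor(4280, 454), 1))  # 9.4
```

A value near 1 indicates an honest graphic; values well above (or below) 1 indicate exaggeration (or suppression) of the effect.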
FIG. 2.29. Graphics exhibiting lie factors of (a) 9.4 (the price of a barrel of light crude leaving Saudi Arabia, "In the barrel...") and (b) 2.8 ("The shrinking family doctor" in California).

Lie factor values close to unity show that the graphic is probably representing the underlying numbers reasonably accurately. The lie factor for the oil barrels in Figure 2.29 is 9.4 because a 454% increase is depicted as 4280%; the lie factor for the shrinking doctors is 2.8.

Some suggestions for avoiding graphical distortion, taken from Tufte (1983), are as follows.

1. The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented.
2. Clear, detailed, and thorough labeling should be used to defeat graphical distortion and ambiguity. Write out explanations of the data on the graphic itself, and label important events in the data.

A further example, given by Cleveland (1994) and reproduced here in Figure 2.30, demonstrates that even the manner in which a simple scatterplot is drawn can lead to misperceptions about data. The example concerns the way in which a judgment about the correlation of two variables, made on the basis of looking at their scatterplot, can be distorted by enlarging the area in which the points are plotted: the coefficient of correlation in the right-hand diagram of Figure 2.30 appears greater.
FIG. 2.30. Misjudgment of size of correlation caused by enlarging the plotting area.

3. To be truthful and revealing, data graphics must bear on the heart of quantitative thinking: "compared to what?"
4. Graphics must not quote data out of context.

Above all else, show the data.

Of course, not all poor graphics are deliberately meant to deceive; they are often just not as informative as they might be. An example is shown in Figure 2.31, which originally appeared in Vetter (1980); its aim is to display the percentages of degrees awarded to women in several disciplines of science and technology during three time periods. At first glance the labels suggest that the graph is a standard divided bar chart, with the length of the bottom division of each bar showing the percentage for doctorates, the length of the middle division showing the percentage for master's degrees, and the top division showing the percentage for bachelor's degrees. A little reflection shows that this cannot be correct, because it would imply that in most cases the percentage of bachelor's degrees awarded to women is lower than the percentage of doctorates. A closer examination of the diagram reveals that the three values for each discipline during each time period are in fact determined by three adjacent vertical dotted lines: the top end of the left-hand line indicates the value for doctorates, the top end of the middle line the value for master's degrees, and the top end of the right-hand line the value for bachelor's degrees.

Cleveland (1994) discusses other problems with the diagram in Figure 2.31 and also points out that its manner of construction makes it hard to connect visually the three values of a particular type of degree for a particular discipline, that is, to see change through time. Figure 2.32 shows the same data replotted by Cleveland in a bid to achieve greater clarity.
FIG. 2.31. Proportion of degrees in science and engineering earned by women in the periods 1959-1960, 1969-1970, and 1976-1977. (Reproduced with permission from Vetter, 1980.)

It is now clear how the data are represented, and the design allows viewers to see easily the values corresponding to each degree in each discipline through time. All in all, the legend explains the figure in a comprehensive and clear fashion. Cleveland appears to have produced a plot that would satisfy even that doyen of graphical presentation, Edward R. Tufte, in his demand that "excellence in statistical graphics consists of complex ideas communicated with clarity, precision and efficiency."

Being misled by graphical displays is usually a sobering but not a life-threatening experience. However, Cleveland (1994) gives an example in which using the wrong graph contributed to a major disaster in the American space program: the explosion of the Challenger space shuttle and the deaths of the seven people on board. To assess the suggestion that low temperature might affect the performance of the O-rings that sealed the joints of the rocket motor, engineers studied the graph of the data shown in Figure 2.33. Each data point was from a shuttle flight in which the O-rings had experienced thermal distress; the horizontal axis shows the O-ring temperature, and the vertical scale shows the number of O-rings that had experienced thermal distress. On the basis of these data, Challenger was allowed to take off when the temperature was 31°F, with tragic consequences.
(Reproduced with permission from Cleveland.56 CHAPTER 2 f FIG. . Dataplotted by space shuttle enginners the evening before the Choknger accident to determine the dependence of Oring failure on temperature. and 19761 977. 1994. Percentage of degrees earned women for three deby grees (bachelors degree. discipline each and degree indicate periods the 19691 970.Thethree 19591960. and doctorate). 2. three points foreach time periods.) 0 0 60 70 Calculated jolnt temperature (‘F) 80 FIG. 2.33. masters degree. andninedisciplines.32.
The data for flights with no failures were not plotted in Figure 2.33; the engineers involved believed that these data were irrelevant to the issue of dependence. They were mistaken, as shown by the plot in Figure 2.34, which includes all the data. Here a pattern does emerge, and a dependence of failure on temperature is revealed.

[Figure 2.34: The complete set of O-ring data (number of O-ring failures against calculated joint temperature, °F).]

To end the chapter on a less sombre note, and to show that misperception and miscommunication are certainly not confined to statistical graphics, see Figure 2.35.

[Figure 2.35: Misperception and miscommunication are sometimes a way of life. (© The New Yorker collection 1961 Charles E. Martin from cartoonbank.com. All Rights Reserved.)]

SUMMARY

1. Graphical displays are an essential feature in the analysis of empirical data.
2. In some cases a graphical "analysis" may be all that is required (or merited).
3. Stem-and-leaf plots are usually more informative than histograms for displaying frequency distributions.
4. Box plots display much more information about data sets and are very useful for comparing groups. In addition, they are useful for identifying possible outliers.
5. Scatterplots are the fundamental tool for examining relationships between variables. They can be enhanced in a variety of ways to provide extra information.
6. Scatterplot matrices are a useful first step in examining data with more than two variables.
7. Beware graphical deception!

SOFTWARE HINTS

SPSS

Pie charts, bar charts, and the like are easily constructed from the Graph menu. You enter the data you want to use in the chart, select the type of chart you want from the Graph menu, define how the chart should appear, and then click OK. For example, the first steps in producing a simple bar chart would be as follows.

1. Enter the data you want to use to create the chart.
2. Click Graph, then click Bar. When you do this you will see the Bar Charts dialog box appear, and the required chart can be constructed.
3. Once the initial chart has been created, it can be edited and refined very simply by double-clicking the chart and then making the required changes.
S-PLUS

In S-PLUS, there is a powerful graphical user interface whereby a huge variety of graphs can be plotted by using either the Graph menu or the two-dimensional (2D) or three-dimensional (3D) graphics palettes. For example, when the Graph menu is used, the necessary steps to produce a scatterplot are as follows.

1. Enter the data you want to use to create the scatterplot.
2. Click Graph, then click 2D Plot.
3. Choose Plot Type, Scatter Plot (x, y1, y2, ...).
4. In the Line/Scatter Plot dialog box that now appears, choose the relevant data set and indicate the x and y variables for the plot.

Alternatively, when the command line language is used, functions such as plot, pairs, hist, pie, boxplot, dotplot, and many, many others can be used to produce graphical material. For example, if the vector drink contains the values of the percentage of crime rates for drinkers (see Table 2.1), the following command produces a pie chart:

pie(drink, density = 10, names = c("Arson", "Rape", "Violence", "Stealing", "Fraud"))

[Detailed information about the pie function, or any of the other functions mentioned above, is available by using the help command, for example, help(pie).]

Coplots and trellis graphics are currently routinely available only in S-PLUS. They can be implemented by use of the coplot function, or by using the 2D or 3D graphical palettes with the conditioning mode on. Details are given in Everitt and Rabe-Hesketh (2001).

EXERCISES

2.1. According to Cleveland (1994), "The histogram is a widely used graphical method that is at least a century old. But maturity and ubiquity do not guarantee the efficiency of a tool. The histogram is a poor method." Do you agree with Cleveland? Give your reasons.

2.2. Shortly after metric units of length were officially introduced in Australia, each of a group of 44 students was asked to guess, to the nearest meter, the width of the lecture hall in which they were sitting. Another group of 69 students in the same room were asked to guess the width in feet, to the nearest foot. (The true width of the hall was 13.1 m, or 43.0 ft.)
Guesses in meters:
8 9 10 10 10 10 10 10 11 11 11 11 12 12 13 13 13 14 14 14 15 15 15 15 15 15 15 16 16 16 16 17 17 17 17 18 18 20 22 25 27 35 38 40

Guesses in feet:
24 25 27 30 30 30 30 30 30 32 32 33 34 34 34 34 35 35 36 36 36 37 37 40 40 40 40 40 40 40 40 41 41 42 42 42 42 43 43 44 44 44 45 45 45 45 45 45 46 46 47 48 48 50 50 50 51 54 54 54 55 55 60 60 63 70 75 80 94

Construct suitable graphical displays for both sets of guesses to aid in throwing light on which set is more accurate.

2.3. Figure 2.36 shows the traffic deaths in a particular area before and after stricter enforcement of the speed limit by the police. Does the graph convince you that the efforts of the police have had the desired effect of reducing road traffic deaths? If not, why not?

[Figure 2.36: Traffic deaths before and after the introduction of stricter enforcement of the speed limit.]

2.4. Shown in Table 2.5 are the values of seven variables for 10 states in the USA.

TABLE 2.5
Information About 10 States in the USA

State            1      2     3    4      5     6     7
Alabama          3615   3624  2.1  69.05  15.1  41.3  20
California       21198  5114  1.1  71.71  10.3  62.6  20
Iowa             2861   4628  0.5  72.56  2.3   59.0  140
Mississippi      2341   3098  2.4  68.09  12.5  41.0  50
New Hampshire    812    4281  0.7  71.23  3.3   57.6  174
Ohio             10735  4561  0.8  70.82  7.4   53.2  124
Oregon           2284   4660  0.6  72.13  4.2   60.0  44
Pennsylvania     11860  4449  1.0  70.43  6.1   50.2  126
South Dakota     681    4167  0.5  72.08  1.7   53.3  172
Vermont          472    3907  0.6  71.64  5.5   57.1  168

Note. Variables are as follows: 1, population (x1000); 2, average per capita income ($); 3, illiteracy rate (% population); 4, life expectancy (years); 5, homicide rate (per 1000); 6, percentage of high school graduates; 7, average number of days per year below freezing.

1. Construct a scatterplot matrix of the data, labeling the points by state name.
2. Construct a coplot of life expectancy and homicide rate conditional on the average per capita income.

2.5. Mortality rates per 100,000 from male suicides for a number of age groups and a number of countries are shown in Table 2.6. Construct side-by-side box plots for the data from the different age groups, and comment on what the graphic says about the data.

TABLE 2.6
Mortality Rates per 100,000 from Male Suicides

Country        25-34  35-44  45-54  55-64  65-74
Canada         22     27     31     34     24
Israel         9      19     10     14     27
Japan          22     19     21     31     49
Austria        29     40     52     53     69
France         16     25     36     47     56
Germany        28     35     41     49     52
Hungary        48     65     84     81     107
Italy          7      8      11     18     27
Netherlands    8      11     18     20     28
Poland         26     29     36     32     28
Spain          4      7      10     16     22
Sweden         28     41     46     51     35
Switzerland    22     34     41     50     41
UK             10     13     15     17     22
USA            20     22     28     33     37
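The last two exercises ask for box plots. The quantities a box plot displays can be computed in a few lines of code; here is a rough sketch in Python (an addition for illustration, not the book's own code, which uses SPSS and S-PLUS; the quartile convention chosen is one of several in common use):

```python
def five_number_summary(data):
    """Median, quartiles, and outliers as used in a simple box plot.

    Quartiles are computed as medians of the lower and upper halves
    of the sorted data; points beyond 1.5 x IQR from the quartiles
    are flagged as possible outliers.
    """
    xs = sorted(data)
    n = len(xs)

    def median(v):
        m = len(v) // 2
        return v[m] if len(v) % 2 else (v[m - 1] + v[m]) / 2

    lower = xs[: n // 2]           # lower half (excludes middle point if n odd)
    upper = xs[(n + 1) // 2:]      # upper half
    q1, med, q3 = median(lower), median(xs), median(upper)
    iqr = q3 - q1
    outliers = [x for x in xs if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
    return {"min": xs[0], "q1": q1, "median": med,
            "q3": q3, "max": xs[-1], "outliers": outliers}

summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
```

For the toy data in the call above, the upper fence is 8 + 1.5 x 5 = 15.5, so the value 50 is flagged as an outlier: exactly the kind of point a box plot draws individually.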
3
Analysis of Variance I: The One-way Design
3.1. INTRODUCTION
In a study of the fecundity of fruit flies, per diem fecundity (number of eggs laid per female per day for the first 14 days of life) for 25 females of each of three genetic lines of the fruit fly Drosophila melanogaster was recorded. The results are shown in Table 3.1. The lines labeled RS and SS were selectively bred for resistance and for susceptibility to the pesticide DDT, and the NS line is a nonselected control strain. Of interest here is whether the data give any evidence of a difference in the fecundity of the three strains.

In this study, the effect of a single independent factor (genetic strain) on a response variable (per diem fecundity) is of interest. The data arise from what is generally known as a one-way design. The general question addressed by such a data set is, Do the populations giving rise to the different levels of the independent factor have different mean values? What statistical technique is appropriate for addressing this question?
3.2. STUDENT'S t TESTS
Most readers will recall Student's t tests from their introductory statistics course (those whose memories of statistical topics are just a little shaky can give
TABLE 3.1 Fecundity of Fruitflies
CHAPTER 3
Resistant (RS)
Susceptible (SS)
Nonselected (NS)
12.8 21.6 14.8 23.1 34.6 19.7 22.6 29.6 16.4 20.3 29.3 14.9 27.3 22.4 27.5 20.3 38.7 26.4 23.7 26.1 29.5 38.6 44.4 23.2 23.6
38.4 32.9 48.5 20.9 11.6 22.3 30.2 33.4 26.7 39.0 12.8 14.6 12.2 23.1 29.4 16.0 20.1 23.3 22.9 22.5 15.1 31.0 16.9 16.1 10.8
35.4 21.4 19.3 41.8 20.3 37.6 36.9 37.3 28.2 23.4 33.7 29.2 41.7 22.6 40.4 34.4 30.4 14.9 51.8 33.8 37.9 29.5 42.4 36.6 41.4
themselves a reminder by looking at the appropriate definition in the glossary in Appendix A). The most commonly used form of this test addresses the question of whether the means of two distinct populations differ. Because interest about the fruit fly data in Table 3.1 also involves the question of differences in population means, is it possible that the straightforward application of a t test to each pair of strain means would provide the required answer to whether the three strain fecundities differ? Sadly, this is an example of where putting two and two together arrives at the wrong answer. To explain why requires a little simple probability and algebra, and so the details are confined to Display 3.1. The results given there show that the consequence of applying a series of t tests to a one-way design with a moderate number of groups is very likely to be a claim of a significant difference, even when, in reality, no such difference exists. Even with only three groups, as in the fruit fly example, the separate t tests approach increases the nominal 5%
Display 3.1
The Problem with Using Multiple t Tests

The null hypothesis of interest is
H0: μ1 = μ2 = · · · = μk.
Suppose the hypothesis is investigated by using a series of t tests, one for each pair of means. The total number of t tests needed is N = k(k − 1)/2. Suppose also that each t test is performed at significance level α, so that for each of the tests,
Pr(rejecting the equality of the two means given that they are equal) = α.
Consequently,
Pr(accepting the equality of the two means when they are equal) = 1 − α.
Therefore,
Pr(accepting equality for all N t tests performed) = (1 − α)^N.
Hence, finally we have
Pr(rejecting the equality of at least one pair of means when H0 is true) = 1 − (1 − α)^N (P, say).

For particular values of k and for α = 0.05, this result leads to the following numerical values:
k     N     P
2     1     1 − (0.95)^1 = 0.05
3     3     1 − (0.95)^3 = 0.14
4     6     1 − (0.95)^6 = 0.26
10    45    1 − (0.95)^45 = 0.90
The probability of falsely rejecting the null hypothesis quickly increases above the nominal significance level of .05. It is clear that such an approach is very likely to lead to misleading conclusions. Investigators unwise enough to apply the procedure would be led to claim more statistically significant results than their data justified.
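The arithmetic behind the table in Display 3.1 is easy to reproduce. A short sketch in Python (not part of the original text) confirms the tabulated values:

```python
def prob_false_rejection(k, alpha=0.05):
    """Probability of falsely rejecting equality for at least one of the
    N = k(k-1)/2 pairwise t tests, each run at significance level alpha,
    when the null hypothesis of equal means is actually true."""
    n_tests = k * (k - 1) // 2
    return 1 - (1 - alpha) ** n_tests

for k in (2, 3, 4, 10):
    print(k, round(prob_false_rejection(k), 2))
```

With three groups the chance of at least one spurious "significant" difference is already .14, nearly three times the nominal level, and with ten groups it reaches .90.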
significance level by almost threefold. The message is clear: avoid multiple t tests like the plague! The appropriate method of analysis for a one-way design is the one-way analysis of variance.
3.3. ONE-WAY ANALYSIS OF VARIANCE
The phrase "analysis of variance" was coined by arguably the most famous statistician of the twentieth century, Sir Ronald Aylmer Fisher, who defined it as "the separation of variance ascribable to one group of causes from the variance ascribable to the other groups." Stated another way, the analysis of variance (ANOVA)
is a partitioning of the total variance in a set of data into a number of component parts, so that the relative contributions of identifiable sources of variation to the total variation in measured responses can be determined. But how does this separation or partitioning of variance help in assessing differences between means?

For the one-way design, the answer to this question can be found by considering two sample variances, one of which measures variation between the observations within the groups, and the other of which measures the variation between the group means. If the populations corresponding to the various levels of the independent variable have the same mean for the response variable, both sample variances estimate the same population value. However, if the population means differ, variation between the sample means will be greater than that between observations within groups, and therefore the two sample variances will be estimating different population values. Consequently, testing for the equality of the group means requires a test of the equality of the two variances. The appropriate procedure is an F test. (This explanation of how an F test for the equality of two variances leads to a test of the equality of a set of means is not specific to the one-way design; it applies to all ANOVA designs, which should be remembered in later chapters where the explanation is not repeated in detail.)

An alternative way of viewing the hypothesis test associated with the one-way design is that two alternative statistical models are being compared. In one, the mean and standard deviation are the same in each population; in the other, the means are different but the standard deviations are again the same. The F test assesses the plausibility of the first model. If the between group variability is greater than expected (with, say, p < .05), the second model is to be preferred.
A more formal account of both the model and the measures of variation behind the F tests in a one-way ANOVA is given in Display 3.2. The data collected from a one-way design have to satisfy the following assumptions to make the F test involved strictly valid.
1. The observations in each group come from a normal distribution.
2. The population variances of each group are the same.
3. The observations are independent of one another.
Taking the third of these assumptions first, it is likely that in most experiments and investigations the independence or otherwise of the observations will be clear cut. When unrelated subjects are randomly assigned to treatment groups, for example, independence clearly holds. And when the same subjects are observed under a number of different conditions, independence of observations is clearly unlikely, a situation we shall deal with in Chapter 5. More problematic are situations in which the groups to be compared contain possibly related subjects, for example, pupils within different schools. Such data generally require special techniques, such as multilevel modeling, for their analysis (see Goldstein, 1995, for details).
Display 3.2
The One-way Analysis of Variance Model
In a general sense, the usual model considered for a one-way design is that the subjects in a particular group come from a population with a particular expected or average value for the response variable, with differences between the subjects within a group being accounted for by some type of random variation, usually referred to as "error." So the model can be written as

observed response = expected response + error.
The expected value in the population giving rise to the observations in the ith group is assumed to be μi, leading to the following model for the observations:

yij = μi + εij,
where yij represents the jth observation in the ith group, and the εij represent random error terms, assumed to be from a normal distribution with mean zero and variance σ². The hypothesis of the equality of population means can now be written as

H0: μ1 = μ2 = · · · = μk = μ,
leading to a new model for the observations, namely,

yij = μ + εij.

There are some advantages (and, unfortunately, some disadvantages) in reformulating the model slightly, by modeling the mean value for a particular population as the sum of the overall mean value of the response plus a specific population or group effect. This leads to a linear model of the form

yij = μ + αi + εij,

where μ represents the overall mean of the response variable, αi is the effect on an observation of being in the ith group (i = 1, 2, ..., k), and again εij is a random error term, assumed to be from a normal distribution with mean zero and variance σ². When written in this way, the model uses k + 1 parameters (μ, α1, α2, ..., αk) to describe only k group means. In technical terms the model is said to be overparameterized, which causes problems because it is impossible to find unique estimates for each parameter; it's a bit like trying to solve simultaneous equations when there are fewer equations than unknowns. This feature of the usual form of ANOVA models is discussed in detail in Maxwell and Delaney (1990), and also briefly in Chapter 6 of this book. One way of overcoming the difficulty is to introduce the constraint Σ αi = 0, so that αk = −(α1 + α2 + · · · + αk−1). The number of parameters is then reduced by one, as required. (Other possible constraints are discussed in the exercises in Chapter 6.) If this model is assumed, the hypothesis of the equality of population means can be rewritten in terms of the parameters αi as
H0: α1 = α2 = · · · = αk = 0,

so that under H0 the model assumed for the observations is

yij = μ + εij

as before.
The total variation in the observations, that is, the sum of squares of deviations of the observations from the overall mean of the response variable, Σi Σj (yij − ȳ..)², can be partitioned into that due to differences in group means, the between groups sum of squares, n Σi (ȳi. − ȳ..)², where n is the number of observations in each group, and that due to differences among observations within groups, the within groups sum of squares, Σi Σj (yij − ȳi.)². (Here we have written out the formulas for the various sums of squares in mathematical terms by using the conventional "dot" notation, in which a dot represents summation over a particular suffix. In future chapters dealing with more complex ANOVA situations, we shall not bother to give the explicit formulas for sums of squares.)

These sums of squares can be converted into between groups and within groups variances by dividing them by their appropriate degrees of freedom (see the following text). Under the hypothesis of the equality of population means, both are estimates of σ². Consequently, an F test of the equality of the two variances is also a test of the equality of population means. The necessary terms for the F test are usually arranged in an analysis of variance table as follows (N = kn is the total number of observations).
Source                  DF       SS      MS              MSR
Between groups          k − 1    BGSS    BGSS/(k − 1)    MSBG/MSWG
Within groups (error)   N − k    WGSS    WGSS/(N − k)
Total                   N − 1

Here, DF is degrees of freedom; SS is sum of squares; MS is mean square; BGSS is between groups sum of squares; and WGSS is within groups sum of squares. If H0 is true (and the assumptions discussed in the text are valid), the mean square ratio (MSR) has an F distribution with k − 1 and N − k degrees of freedom. Although we have assumed an equal number of observations, n, in each group in this account, this is not necessary; unequal group sizes are perfectly acceptable, although see the relevant comments made in the Summary section of the chapter.
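The entries in the table can be computed directly from the definitions. A minimal sketch in Python (the data are invented for illustration; the book's own software examples use SPSS and S-PLUS):

```python
def one_way_anova(groups):
    """Between/within sums of squares and the F ratio for a one-way
    design; handles equal or unequal group sizes."""
    all_obs = [y for g in groups for y in g]
    grand_mean = sum(all_obs) / len(all_obs)
    k, n_total = len(groups), len(all_obs)

    # Between groups SS: group size times squared deviation of group mean.
    bgss = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within groups SS: squared deviations from each group's own mean.
    wgss = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)

    ms_between = bgss / (k - 1)        # BGSS/(k - 1)
    ms_within = wgss / (n_total - k)   # WGSS/(N - k)
    return bgss, wgss, ms_between / ms_within

bgss, wgss, f_ratio = one_way_anova([[1, 2, 3], [2, 3, 4], [4, 5, 6]])
```

For these toy data, BGSS = 14, WGSS = 6, and F = 7.0 on 2 and 6 degrees of freedom; the F ratio would then be referred to an F distribution to obtain a p value.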
The assumptions of normality and homogeneity of variance can be assessed by informal (graphical) methods and also by more formal approaches, such as the Shapiro-Wilk test for normality (Shapiro and Wilk, 1965) and Bartlett's test for homogeneity of variance (Everitt, 1998). My recommendation is to use the informal (which will be illustrated in the next section) and to largely ignore the formal. The reason for this recommendation is that formal tests of the assumptions of both normality and homogeneity are of little practical relevance, because the good news is that even when the population variances are a little unequal and the observations are a little nonnormal, the usual F test is unlikely to mislead; the test is robust against minor departures from both normality and homogeneity of variance, particularly when the numbers of observations in each of the groups being compared are equal or approximately equal. Consequently, the computed p values will not be greatly
distorted, and inappropriate conclusions are unlikely. Only if the departures from either or both of the normality and homogeneity assumptions are extreme will there be real cause for concern, and a need to consider alternative procedures. Such gross departures will generally be visible from appropriate graphical displays of the data. When there is convincing evidence that the usual ANOVA approach should not be used, there are at least three possible alternatives.
1. Transform the raw data to make it more suitable for the usual analysis; that is, perform the analysis not on the raw data, but on the values obtained after applying some suitable mathematical function. For example, if the data are very skewed, a logarithm transformation may help. Transformations are discussed in detail by Howell (1992) and are also involved in some of the exercises in this chapter and in a later chapter. Transforming the data is sometimes felt to be a trick used by statisticians, a belief that is based on the idea that the natural scale of measurement is sacrosanct in some way. This is not really the case, and indeed some measurements (e.g., pH values) are effectively already logarithmically transformed values. However, it is almost always preferable to present results in the original scale of measurement. (And it should be pointed out that these days there is perhaps not so much reason for psychologists to agonize over whether they should transform their data so as to meet assumptions of normality, etc., for reasons that will be made clear in Chapter 10.)
2. Use distribution-free methods (see Chapter 8).
3. Use a model that explicitly specifies more realistic assumptions than normality and homogeneity of variance (again, see Chapter 10).
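The effect described in point 1 is easy to demonstrate numerically. The sketch below (Python, not from the book; the data are invented) computes a moment-based skewness coefficient for a right-skewed sample before and after a log transformation:

```python
import math

def skewness(xs):
    """Moment coefficient of skewness, m3 / m2^(3/2)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

raw = [1, 2, 2, 3, 3, 4, 5, 8, 15, 40]     # invented, strongly right skewed
logged = [math.log(x) for x in raw]

print(round(skewness(raw), 2), round(skewness(logged), 2))
```

For this sample the skewness coefficient drops from about 2.2 to about 0.7 after taking logarithms, so the transformed values are much closer to the symmetric shape the ANOVA assumptions describe.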
3.4. ONE-WAY ANALYSIS OF VARIANCE OF THE FRUIT FLY FECUNDITY DATA
It is always wise to precede formal analyses of a data set with preliminary informal, usually graphical, methods. Many possible graphical displays were described in Chapter 2. Here we shall use box plots and probability plots to make an initial examination of the fruit fly data.

The box plots of the per diem fecundities of the three strains of fruit fly are shown in Figure 3.1. There is some evidence of an outlier in the resistant strain, and also a degree of skewness of the observations in the susceptible strain, which may be a problem. More about the distributional properties of each strain's observations can be gleaned from the normal probability plots shown in Figure 3.2. Those for the resistant and nonselected strains suggest that normality is not an unreasonable assumption, but the assumption is, perhaps, more doubtful for the susceptible strain. For now, however, we shall conveniently ignore these possible problems and proceed with an analysis of
variance of the raw data. (A far better way of testing the normality assumption in an ANOVA involves the use of quantities known as residuals, to be described in Chapter 4.)

The ANOVA table from a one-way analysis of variance of the fruit fly data is shown in Table 3.2. The F test for the equality of mean fecundity in the three strains has an associated p value that is very small, and so we can conclude that the strains do have different mean fecundities. Having concluded that the strains differ in their average fecundity does not, of course, necessarily imply that all strain means differ; the equality of means hypothesis might be rejected because the mean of one strain differs from the means of the two other strains, which are themselves equal. Discovering more about which particular means differ in a one-way design generally requires the use of what are known as multiple comparison techniques.

3.5. MULTIPLE COMPARISON TECHNIQUES

When a significant result has been obtained from a one-way analysis of variance, further analyses may be undertaken to find out more details of which particular means differ. A variety of procedures known generically as multiple comparison techniques are now available. These procedures all have the aim of retaining the nominal significance level at the required value when undertaking multiple tests.
[Figure 3.1: Box plots of the per diem fecundities of the three strains of fruit fly.]
TABLE 3.2
One-way ANOVA Results for the Fruit Fly Data

Source                   SS        DF    MS       F      p
Between strains          1362.21   2     681.11   8.67   <.001
Within strains (error)   5659.02   72    78.60

We shall look at two multiple comparison procedures here: the Bonferroni test and Scheffé's test.

3.5.1. Bonferroni Test

The essential feature of the Bonferroni approach to multiple testing is to compare each pair of means in turn by using a Student's t test. Clearly some more explanation is needed, because on the face of it this is the very process dismissed as hopelessly misleading in Section 3.2! Well, yes and no is the somewhat unhelpful answer; the approach differs in the following ways.

1. The series of t tests contemplated here is carried out only after a significant F value is found in the one-way ANOVA.
2. Each t test here uses the pooled estimate of error variance obtained from the observations in all groups, rather than just the two whose means are being compared. This estimate is simply the within groups mean square as defined in Display 3.2.
3. The problem of inflating the Type I error, as discussed in Section 3.2, is tackled by judging the p value from each t test against a significance level of α/m, where m is the number of t tests performed and α is the size of the Type I error (the significance level) that the investigator wishes to maintain in the testing process. (This is known as the Bonferroni correction.) In this way the overall Type I error will remain close to the desired value α. The practical consequence of using this procedure is that each t test will now have to result in a more extreme value than usual for a significant mean difference to be claimed.

The disadvantage of the Bonferroni approach is that it may be highly conservative if a large number of comparisons are involved; that is, some real differences are very likely to be missed. (Contemplating a large number of comparisons may, of course, reflect a poorly designed study.) However, the procedure can be recommended for a small number of comparisons. More details of the Bonferroni multiple comparison procedure are given in Display 3.3, and the results of its application to the fruit fly data are shown in Table 3.3.
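The Bonferroni correction itself involves only simple arithmetic. The following Python sketch (not from the book) builds the adjusted confidence interval used in the procedure; the critical value 2.45 is hard-coded as an assumption (t with 72 degrees of freedom at the α/2m point, for α = .05 and m = 3 comparisons) because the Python standard library has no t quantile function, and s² = 78.60 with n = 25 per group are taken from the fruit fly ANOVA:

```python
import math

S_SQUARED = 78.60      # within-strains (error) mean square from Table 3.2
N_PER_GROUP = 25
T_CRIT = 2.45          # assumed: t_72 at alpha/(2m), alpha = .05, m = 3

def bonferroni_interval(mean_diff, s2=S_SQUARED, n1=N_PER_GROUP,
                        n2=N_PER_GROUP, t_crit=T_CRIT):
    """Bonferroni-adjusted confidence interval for a difference in means."""
    half_width = t_crit * math.sqrt(s2 * (1 / n1 + 1 / n2))
    return mean_diff - half_width, mean_diff + half_width

low, high = bonferroni_interval(1.63)   # RS-SS comparison, value from Table 3.3
significant = not (low <= 0 <= high)    # interval excluding zero -> difference
```

The half-width is 2.45 × √(78.60 × 2/25), about 6.14, so an observed difference of 1.63 yields an interval straddling zero (no evidence of a difference), whereas the larger NS comparisons in Table 3.3 do not.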
Display 3.3
Bonferroni t Tests

The t statistic used is

t = (difference in sample means) / (s √(1/n1 + 1/n2)),

where s² is the error mean square (an estimate of σ²) and n1 and n2 are the numbers of observations in the two groups being compared. Each observed t statistic is compared with the value from a t distribution with N − k degrees of freedom corresponding to a significance level of α/m rather than α, where m is the number of comparisons made. Alternatively (and preferably), a confidence interval can be constructed as

mean difference ± t(N−k)(α/2m) × s √(1/n1 + 1/n2),

where t(N−k)(α/2m) is the t value with N − k degrees of freedom corresponding to significance level α/2m. (These confidence intervals are readily available from most statistical software; see the computer hints at the end of the chapter.)

In this example, N − k = 72 and n1 = n2 = 25 for each pair of groups, so that the t value corresponding to 72 degrees of freedom and α/6 for α = 0.05 is 2.45. Finally, s² = 78.60 (see Table 3.2). Three comparisons can be made between the pairs of means of the three strains; the confidence intervals, calculated as described in Display 3.3, are shown in Table 3.3.

TABLE 3.3
Results from the Bonferroni Procedure Used on the Fruit Fly Data

Comparison    Est. of mean difference    Lower bound    Upper bound
NS-RS*        8.12                       1.98           14.26
NS-SS*        9.74                       3.60           15.88
RS-SS         1.63                       -4.51          7.77

*Interval does not contain the value zero.

Here it is clear that the mean fecundity of the nonselected strain differs from the means of the other two groups, which themselves do not differ. The results are also shown in graphical form in Figure 3.3.

[Figure 3.3: Graphical display of the Bonferroni multiple comparison results for the fruit fly data (simultaneous 95% confidence limits; response variable: fecundity).]

3.5.2. Scheffé's Multiple Comparison Procedure

Scheffé's approach is particularly useful when a large number of comparisons are to be carried out. The test is again based on a Student's t statistic, but the critical point used for testing each of the t statistics differs from that used in the Bonferroni procedure. Details of the Scheffé test are given in Display 3.4, and the results of applying it to the fruit fly data are shown in Table 3.4 and Figure 3.4.
12 1. One general point to make about multiple comparison teststhe whole.degrees of freedom correspondingto significance level a. and the conclusions identical.44 16. Consequently.) The method can again be used to construct a confidence interval for two means as  mean difference& criticd value x sJl/nl+ 1/m.85 9.. N .3. where F~I.48 1.4 Scheffes Multiple Comparison Procedure The test statistic used here onceagain the t statistic used in the Bonferunni is procedure anddescribed in Display 3. Graphical display of Bonferroni multiple comparison resuits for fruit fly data. Bonferroni meMod response variable: fecundity 2 0 l 2 FIG. of mean difference bound Lower 8.50 and the confidence intervals each comparison are is for NSRS NSSS Comparison usss Est.(More k details are given in Maxwell and Delaney. however.64 Upper bound 14. on of is it they allerr on the side safety (nonsignificance). quite possible (although always disconcerting) to find that. One further that a host is point of such tests available.+~&)]'". and statisticians usually very wary an overreliance are are of . NSRS NSSS RSSS (""""4"""" 6 I (""""4""""4 1 l l " " " " 4" " " " I I I 4 I I 6 4 4 6 8 1 0 1 2 1 4 1 6 simultaneous 95 %confidencelimits. no pair of means is judged by the multiple comparison procedure to be significantly different.each observed test statistic is compared with [(k l)Fkl. In this case.4 Scheffc's Procedure Applied to FruitFly Data Here the critical value 2.3. Display 3. although the F test in the analysis of variance is statistically significant. The confidence intervals very similar to those are arrived at by the Bonferonni are approach.W 1. 3.63 4.N&Y) F value is the with k l . is that.74 3. 1990.9 'Interval does not contain the value e o zr.74 CHAPTER 3 TABLE3.
3.5. PLANNED COMPARISONS

Most analyses of one-way designs performed in psychological research involve the ANOVA approach described in the previous two sections, namely a one-way ANOVA in which finding a significant F value is followed by the application of some type of multiple comparison test. But there is another way! Consider, for example, the data shown in Table 3.5, which were obtained from an investigation into the effect of the stimulant caffeine on the performance of a simple task. Forty male students were trained in finger tapping. They were then divided at random into four groups of 10, and the groups received different doses of caffeine (0, 100, 200, and 300 ml). Two hours after treatment, each man was required to carry out finger tapping, and the number of taps per minute was recorded.

Here, because the question of interest is whether caffeine affects performance on the finger-tapping task, the investigator may be interested a priori in the specific hypothesis that the mean of the three groups treated with caffeine differs from the mean of the untreated group, rather than in the general hypothesis of equality of means that is usually tested in a one-way analysis of variance. Such a priori planned comparisons are generally more powerful, that is, more likely to reject the null hypothesis when it is false, than the usual catch-all F test.

The relevant test statistic for the specific hypothesis of interest can be constructed relatively simply (see Display 3.5). As also shown in Display 3.5, an appropriate confidence interval can be constructed for the comparison of interest. In the caffeine example, the interval constructed indicates that it is very likely that there is a difference in finger-tapping performance between the "no caffeine" and "caffeine" conditions: more finger tapping takes place when subjects are given the stimulant. The essential difference between planned and unplanned comparisons (i.e., those discussed in the previous section) is that the former can be assessed by
using conventional significance levels, whereas the latter require rather more stringent significance levels. An additional difference is that when a planned comparison approach is used, an omnibus analysis of variance is not required: the investigator moves straight to the comparisons of most interest. Note, however, that there is one caveat: planned comparisons have to be just that, and not the result of hindsight after inspection of the sample means!

3.6. THE USE OF ORTHOGONAL POLYNOMIALS: TREND ANALYSIS

In a one-way design where the independent variable is nominal, as in the teaching methods example, the data analysis is usually limited to testing the overall null hypothesis of the equality of the group means and subsequent post hoc comparisons by using some type of multiple comparison procedure. However, if the independent variable has levels that form an ordered scale, it is often possible to extend the analysis to examine the relationship of these levels to the group means of the dependent variable. An example is provided by the caffeine data in Table 3.5, where the levels of the independent variable take the values of 0 ml, 100 ml, 200 ml, and 300 ml. For such data most interest will center on the presence of trends of particular types, that is, increases or decreases in the means of the dependent variable over the ordered levels of the independent variable.

TABLE 3.5
Caffeine and Finger-Tapping Data (taps per minute)

0 ml    100 ml    200 ml    300 ml
242     248       246       248
245     246       248       250
244     245       250       251
248     247       252       251
247     248       248       248
248     250       250       251
242     247       246       252
244     246       248       249
246     243       245       253
242     244       250       251
Display 3.5
Planned Comparisons

The hypothesis of particular interest in the finger-tapping experiment is

H0: mu_0 = 1/3 (mu_100 + mu_200 + mu_300),

where mu_0, mu_100, mu_200, and mu_300 are the population means of the 0 ml, 100 ml, 200 ml, and 300 ml groups, respectively.

The hypothesis can be tested by using the following t statistic:

t = [xbar_0 - 1/3 (xbar_100 + xbar_200 + xbar_300)] / [s (1/10 + 1/30)^(1/2)],

where xbar_0, xbar_100, xbar_200, and xbar_300 are the sample means of the four groups, and s^2 is once again the error mean square from the one-way ANOVA. This is tested as a Student's t with 36 degrees of freedom (the degrees of freedom of the error mean square).

The values of the four sample means are xbar_0 = 244.8, xbar_100 = 246.4, xbar_200 = 248.3, and xbar_300 = 250.4; here s^2 = 4.40. The t statistic takes the value -4.66, and the associated p value is very small (p = .000043). A corresponding 95% confidence interval for the difference between the mean of the caffeine conditions and the no-caffeine mean can be constructed in the usual way to give the result (2.04, 5.10). The conclusion is that the tapping mean in the caffeine condition does not equal that when no caffeine is given; more finger tapping occurs when caffeine is given.

An equivalent method of looking at planned comparisons is to first put the hypothesis of interest in a slightly different form from that given earlier:

H0: mu_0 - 1/3 mu_100 - 1/3 mu_200 - 1/3 mu_300 = 0.

The estimate of this comparison of the four means (called a contrast because the defining constants, 1, -1/3, -1/3, -1/3, sum to zero), obtained from the sample means, is

244.8 - 1/3 x 246.4 - 1/3 x 248.3 - 1/3 x 250.4 = -3.57.

The sum of squares (and the mean square, because only a single degree of freedom is involved) corresponding to this comparison is found simply as

(-3.57)^2 / {1/10 [1^2 + (1/3)^2 + (1/3)^2 + (1/3)^2]} = 95.41.

This mean square is tested as usual against the error mean square as an F with 1 and v degrees of freedom, where v is the number of degrees of freedom of the error mean square (in this case the value 36). So F = 95.41/4.40 = 21.68, and the associated p value is .000043, agreeing with that of the t test described above. The two approaches outlined are exactly equivalent, because the calculated F statistic is actually the square of the t statistic (21.68 = [-4.66]^2). The second version of assessing a particular contrast, by using the F statistic, will, however, help in the explanation of orthogonal polynomials to be given in Display 3.6. (For more details of planned comparisons, including the general case, see Rosenthal and Rosnow, 1985.)
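The calculations in Display 3.5 can be checked directly from the group means and error mean square given there. The following Python sketch is illustrative only (Python is not the book's software); it reproduces the contrast estimate, its t statistic, and the equivalent single-degree-of-freedom sum of squares and F statistic.

```python
from math import sqrt

# Group means and error mean square from Display 3.5 (n = 10 per group).
means = {0: 244.8, 100: 246.4, 200: 248.3, 300: 250.4}
s2, n = 4.40, 10

# Contrast: no-caffeine mean versus the average of the three caffeine means.
weights = {0: 1.0, 100: -1 / 3, 200: -1 / 3, 300: -1 / 3}

estimate = sum(w * means[dose] for dose, w in weights.items())
sum_w2 = sum(w ** 2 for w in weights.values())

se = sqrt(s2 * sum_w2 / n)
t = estimate / se                      # compared with Student's t on 36 df
ss_contrast = estimate ** 2 / (sum_w2 / n)
f = ss_contrast / s2                   # equals t squared
```

Running this reproduces the values quoted in the display: t is about -4.66, the contrast sum of squares about 95.41, and F about 21.68.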
Such trends can be assessed relatively simply by using what are known as orthogonal polynomials. These correspond to particular comparisons among the levels of the independent variable, with the coefficients defining the required comparisons dependent on the number of levels of this variable. How these coefficients arise is not of great importance (and anyway is outside the level of this book), but how they are used is of interest; this is explained in Display 3.6. (The arithmetic involved is similar to that described for planned comparisons in Display 3.5.) Using these coefficients enables the between groups sum of squares to be partitioned into a number of single-degree-of-freedom terms, each of which represents the sum of squares corresponding to a particular trend component. (Comprehensive tables of orthogonal polynomial coefficients corresponding to a number of different levels of the factor in a one-way design can be found in Howell, 1992.) The particular results for the caffeine example are also shown in Display 3.6 and indicate that there is a very strong linear trend in the finger-tapping means over the four levels of caffeine. A box plot of the data demonstrates the effect very clearly (Figure 3.5).

FIG. 3.5. Box plots for the finger-tapping data, by amount of caffeine (none, 100 ml, 200 ml, 300 ml).

Display 3.6
The Use of Orthogonal Polynomials

When the levels of the independent variable form a series of ordered steps, it is often of interest to examine the relationship between the levels and the means of the response variable. In particular, the following questions would be of interest.
1. Do the means of the treatment groups increase in a linear fashion with an increase in the level of the independent variable?
2. Is the trend just linear, or is there any evidence of nonlinearity?
3. If the trend is nonlinear, what degree equation (polynomial) is required?

The simplest way to approach these and similar questions is by the use of orthogonal polynomial contrasts. Essentially, these correspond to particular comparisons among the means, representing linear trend, quadratic trend, and so on. They are defined by a series of coefficients specific for the particular number of levels of the independent variable. The coefficients are available from most statistical tables; a small part of such a table is shown here.

Levels   Trend        Coefficients
3        Linear       -1   0   1
         Quadratic     1  -2   1
4        Linear       -3  -1   1   3
         Quadratic     1  -1  -1   1
         Cubic        -1   3  -3   1
5        Linear       -2  -1   0   1   2
         Quadratic     2  -1  -2  -1   2
         Cubic        -1   2   0  -2   1
         Quartic       1  -4   6  -4   1

(Note that these coefficients are only appropriate for equally spaced levels of the independent variable.)

These coefficients can be used to produce a partition of the between groups sum of squares into single-degree-of-freedom sums of squares corresponding to the trend components. These sums of squares are found by using the approach described for planned comparisons in Display 3.5. For example, the sum of squares corresponding to the linear trend for the caffeine data is found as follows:

SS(linear) = [-3 x 244.8 + (-1) x 246.4 + 1 x 248.3 + 3 x 250.4]^2 / {1/10 [(-3)^2 + (-1)^2 + 1^2 + 3^2]} = 174.83.

The resulting ANOVA table for the finger-tapping example is as follows.

Source            DF    SS       MS       F       p
Caffeine levels    3    175.47   58.49    13.29   <.001
  Linear           1    174.83   174.83   39.71   <.001
  Quadratic        1    0.62     0.62     0.14    0.71
  Cubic            1    0.00     0.00     0.00    0.97
Within            36    158.40   4.40

Note that the sums of squares of the linear, quadratic, and cubic effects add to the between groups sum of squares. Note also that in this example the difference in the means of the four ordered caffeine levels is dominated by the linear effect; departures from linearity have a sum of squares of only (0.62 + 0.00) = 0.62.
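The trend partition in Display 3.6 can be verified numerically with the k = 4 coefficients tabulated there. The short Python sketch below is illustrative only (the book's own software is SPSS and S-PLUS) and uses the four group means and group size given in Display 3.5.

```python
# Orthogonal polynomial trend partition for the caffeine finger-tapping means.
means = [244.8, 246.4, 248.3, 250.4]   # 0, 100, 200, 300 ml groups
n = 10                                  # subjects per group

coeffs = {
    "linear":    [-3, -1, 1, 3],
    "quadratic": [1, -1, -1, 1],
    "cubic":     [-1, 3, -3, 1],
}

def trend_ss(c, xbar, n):
    """Single-df sum of squares for a contrast with coefficients c."""
    contrast = sum(ci * xi for ci, xi in zip(c, xbar))
    return contrast ** 2 / (sum(ci ** 2 for ci in c) / n)

ss = {name: trend_ss(c, means, n) for name, c in coeffs.items()}
```

The three single-degree-of-freedom sums of squares add up to the between groups sum of squares (175.47), with the linear component (174.83) dominating, as stated in the display.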
3.7. INTRODUCING ANALYSIS OF COVARIANCE

The data shown in Table 3.6 were collected from a sample of 24 first-grade children. Each child completed the Embedded Figures Test (EFT), which measures field dependence, that is, the extent to which a person can abstract the logical structure of a problem from its context. Then the children were randomly allocated to one of two experimental groups, and they were timed as they constructed a 3 x 3 pattern from nine colored blocks, taken from the Wechsler Intelligence Scale for Children (WISC). The two groups differed in the instructions they were given for the task: the "row group" were told to start with a row of three blocks, and the "corner group" were told to begin with a corner of three blocks. The experimenter was interested in whether the different instructions produced any change in the average time to complete the pattern, taking into account the varying field dependence of the children.

TABLE 3.6
WISC Blocks Data

Row Group          Corner Group
Time     EFT       Time     EFT
464      59        342      48
317      33        222      23
525      49        219      9
298      69        513      128
491      65        295      44
196      26        285      49
268      29        408      87
372      62        543      43
370      31        298      55
739      139       494      58
430      74        317      113
410      31        407      7

These data will help us to introduce a technique known as analysis of covariance, which allows comparison of group means after "adjusting" for what are called concomitant variables or, more generally, covariates. The analysis of covariance tests for group mean differences in the response variable after allowing for the possible effect of the covariate. The covariate is not in itself of experimental interest, except in that using it in an analysis can lead to increased precision by decreasing the estimate of experimental error, that is, the within group mean square in the analysis of variance. The effect of the covariate is allowed for by assuming a linear relationship between the response variable and the covariate. Details of the analysis of covariance model are given in Display 3.7.

As described, the analysis of covariance assumes that the slopes of the lines relating response and covariate are the same in each group; that is, that there is no interaction between the groups and the covariate. If there is an interaction, then it does not make sense to compare the groups at a single value of the covariate, because any difference noted will not apply for other values of the covariate.
Display 3.7
The Analysis of Covariance Model

In general terms the model is

observation = mean + group effect + covariate effect + error.

More specifically, if y_ij is used to denote the value of the response variable for the jth individual in the ith group, and x_ij is the value of the covariate for this individual, then the model assumed is

y_ij = mu + alpha_i + beta (x_ij - xbar) + epsilon_ij,

where beta is the regression coefficient linking response variable and covariate, and xbar is the grand mean of the covariate values. (The remaining terms in the equation are as in the model defined in Display 3.2.) Note that the regression coefficient is assumed to be the same in each group.

The means of the response variable adjusted for the covariate are obtained simply as

adjusted group mean = group mean + beta-hat (grand mean of covariate - group mean of covariate),

where beta-hat is the estimate of the regression coefficient in the model above.

Before applying analysis of covariance (ANCOVA) to the WISC data, we should look at some graphical displays of the data. Box plots of the recorded completion times in each experimental group are shown in Figure 3.6, and a scattergram giving the fitted regressions (see Chapter 6) for time against the EFT in each group is given in Figure 3.7. The box plots suggest a possible outlier. Here we shall conveniently ignore this possible problem and analyze all the data (but see Exercise 3.5).

FIG. 3.6. Box plots of completion times for the WISC data (row group and corner group).
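The adjusted-mean formula in Display 3.7 involves nothing more than a slope and two covariate means, so it can be sketched in a couple of lines. The Python fragment below is illustrative only: the slope, covariate means, and group means in it are hypothetical numbers chosen for easy arithmetic, not estimates from the WISC data.

```python
# Covariate-adjusted group means, following Display 3.7:
# adjusted mean = group mean + beta_hat * (grand covariate mean - group covariate mean).

def adjusted_mean(group_mean, beta_hat, grand_cov_mean, group_cov_mean):
    return group_mean + beta_hat * (grand_cov_mean - group_cov_mean)

beta_hat = 2.0      # hypothetical common within-group regression slope
grand_cov = 55.0    # hypothetical grand mean of the covariate

# Two hypothetical groups: one above and one below the grand covariate mean.
adj_a = adjusted_mean(420.0, beta_hat, grand_cov, 58.0)   # pulled down
adj_b = adjusted_mean(360.0, beta_hat, grand_cov, 52.0)   # pulled up
```

A group whose covariate mean is above the grand mean has its response mean adjusted downward (here 420 becomes 414), and vice versa, which is exactly the "equalizing" role the covariate plays in the model.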
FIG. 3.7. Scatterplot of time against the EFT for the WISC data, showing the fitted regressions for each group.

The scatterplot suggests that the slopes of the two regression lines are not too dissimilar for application of the ANCOVA model outlined in Display 3.7. Table 3.7 shows the default results from using SPSS for the ANCOVA of the data in Table 3.6; in Table 3.8 the corresponding results from using S-PLUS for the analysis are shown. You will notice that the results are similar but not identical: the test for the covariate differs. The reason for this difference will be made clear in Chapter 4. However, despite the small difference in the results from the two packages, the conclusion from each is the same: it appears that field dependence is predictive of time to completion of the task, but that there is no difference between the two experimental groups in mean completion time.

TABLE 3.7
ANCOVA Results from SPSS for the WISC Blocks Data

Source             SS        DF    MS        F      p
Field dependence   109991     1    109991    9.41   .006
Group              11743      1    11743     1.00   .33
Error              245531    21    11692
TABLE 3.8
ANCOVA of the WISC Data by Using S-PLUS

Source             SS        DF    MS        F      p
Field dependence   110263     1    110263    9.43   .006
Group              11743      1    11743     1.00   .33
Error              245531    21    11692

Further examples of analysis of covariance are given in the next chapter, and then in Chapter 6 the model is put in a more general context. However, one point about the method should be made here: it concerns its use with naturally occurring, or intact, groups rather than groups formed by random allocation.

The analysis of covariance was originally intended to be used in investigations in which randomization had been used to assign patients to treatment groups. In such studies, experimental precision could be increased by removing from the error term in the ANOVA that part of the residual variability in the response that was linearly predictable from the covariate. Gradually, however, the technique became more widely used to test hypotheses that were generally stated in such terms as "the group differences on the response are zero when the group means on the covariate are made equal," or "the group means on the response after adjustment for mean differences on the covariate are equal." Indeed, some authors (e.g., McNemar, 1962) have suggested that "if there is only a small chance difference between the groups on the covariate, the use of covariance adjustment may not be worth the effort."
Such advice is clearly unsound: in the case of groups formed by randomization, any group differences on the covariate are necessarily the result of chance, and more powerful tests of group differences result from the decrease in experimental error achieved when analysis of covariance is used in association with random allocation. Such a comment therefore rules out the very situation for which the ANCOVA was originally intended. In fact, it is the use of analysis of covariance in an attempt to undo built-in differences among intact groups that causes most concern. For example, Figure 3.8 shows a plot of reaction time and age for psychiatric patients belonging to two distinct diagnostic groups. An ANCOVA with reaction time as response and age as covariate might lead to the conclusion that reaction time does not differ in the two groups. Is such a conclusion sensible? An examination of Figure 3.8 clearly shows that it is not, because the ages of the two groups do not overlap.
In using an ANCOVA here, the fitted model has essentially been extrapolated into a region with no data. Presumably, it is this type of problem that provoked the following somewhat acidic comment from Anderson (1963): "one may well wonder what exactly it means to ask what the data would be like if they were not like they are!" Clearly, some thought has to be given to the use of analysis of covariance on intact groups, and readers are referred to Fleiss and Tanur (1972) and Stevens (1992) for more details.

FIG. 3.8. Plot of reaction time against age for two groups of psychiatric patients.

3.8. HOTELLING'S T^2 TEST AND ONE-WAY MULTIVARIATE ANALYSIS OF VARIANCE

The data shown in Table 3.9 were obtained from a study reported by Novince (1977), which was concerned with improving the social skills of college females and reducing their anxiety in heterosexual encounters. There were three groups in the study: a control group, a behavioral rehearsal group, and a behavioral rehearsal plus cognitive restructuring group. The values of the following four dependent variables were recorded for each subject in the study: anxiety (physiological anxiety in a series of heterosexual encounters), a measure of social skills in social interactions, appropriateness, and assertiveness.

Between group differences in this example could be assessed by separate one-way analyses of variance on each of the four dependent variables.
TABLE 3.9
Social Skills Data: anxiety, social skills, appropriateness, and assertiveness scores for the 11 subjects in each of the behavioral rehearsal, control, and behavioral rehearsal plus cognitive restructuring groups.
Note. From Novince (1977).
However, in many situations involving multiple dependent variables, a preferable procedure is to consider the dependent variables simultaneously. This is particularly sensible if the variables are correlated and believed to share a common conceptual meaning; that is, the dependent variables considered together make sense as a group. Consequently, the real question of interest is, Does the set of variables as a whole indicate any between group differences? To answer this question requires the use of a relatively complex technique known as multivariate analysis of variance, or MANOVA.

The ideas behind this approach are introduced most simply by considering first the two-group situation and the multivariate analog of Student's t test for testing the equality of two means, namely Hotelling's T^2 test. The test is described in Display 3.8 (the little adventure in matrix algebra is unavoidable, I'm afraid), and its application to the two experimental groups in Table 3.9 is detailed in Table 3.10. The conclusion to be drawn from the T^2 test is that the mean vectors of the two experimental groups do not differ.

It might be thought that the results from Hotelling's T^2 test would simply reflect those of a series of univariate t tests, in the sense that if no differences are found by the separate t tests, then Hotelling's T^2 test will also lead to the conclusion that the population multivariate mean vectors do not differ, whereas if any significant difference is found for the separate variables, then the T^2 statistic will also be significant. In fact, this is not necessarily the case (if it were, the T^2 test would be a waste of time): it is possible to have no significant differences for each variable tested separately but a significant T^2 value, and vice versa. A full explanation of the differences between a univariate and a multivariate test for a situation involving just two variables is given in Display 3.9.

When a set of dependent variables is to be compared in more than two groups, the multivariate analog of the one-way ANOVA is used. Without delving into the technical details, let us say that what the procedure attempts to do is to combine the dependent variables in some optimal way, taking into consideration the correlations between them and deciding what unique information each variable provides. A number of such composite variables giving maximum separation between groups are derived, and these form the basis of the comparisons made. Unfortunately, in the multivariate situation, when there are more than two groups to be compared, no single test statistic can be derived that is always the most powerful for detecting all types of departures from the null hypothesis of the equality of the mean vectors of the groups. A number of different test statistics have been suggested that may give different results when used on the same set of data, although the resulting conclusion from each is often the same. (When only two groups are involved, all the proposed test statistics are equivalent to Hotelling's T^2.) Details and formulas for the most commonly used MANOVA test statistics are given in Stevens (1992) and in the glossary in Appendix A. (The various test statistics are eventually transformed into F statistics to enable p values to be calculated.) Only the results of applying the tests to the data in Table 3.9 will be discussed here; they are given in Table 3.11.
Display 3.8
Hotelling's T^2 Test

If there are p dependent variables, the null hypothesis is that the p means of the first population equal the corresponding means of the second population. By introduction of some vector and matrix nomenclature, the null hypothesis can be written as

H0: mu_1 = mu_2,

where mu_1 and mu_2 contain the population means of the dependent variables in the two groups; that is, they are the population mean vectors.

The test statistic is

T^2 = [n1 n2 / (n1 + n2)] D^2,

where n1 and n2 are the numbers of observations in the two groups, and D^2 is defined as

D^2 = (xbar_1 - xbar_2)' S^(-1) (xbar_1 - xbar_2),

where xbar_1 and xbar_2 are the sample mean vectors in the two groups, and S is the estimate of the assumed common covariance matrix, calculated as

S = [(n1 - 1) S1 + (n2 - 1) S2] / (n1 + n2 - 2),

where S1 and S2 are the sample covariance matrices in each group.

Note that the form of the multivariate test statistic is very similar to that of the univariate independent samples t test of your introductory course: it involves a difference between the "means" (here the means are vectors, and the difference between them takes into account the covariances of the variables) and an assumed common "variance" (here the variance becomes a covariance matrix).

Under H0 (and when the assumptions listed below are true), the statistic F given by

F = (n1 + n2 - p - 1) T^2 / [(n1 + n2 - 2) p]

has a Fisher F distribution with p and n1 + n2 - p - 1 degrees of freedom.

The assumptions of the test are completely analogous to those of an independent samples t test. They are as follows: (a) the data in each population are from a multivariate normal distribution; (b) the populations have the same covariance matrix; and (c) the observations are independent.
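The sequence of steps in Display 3.8 can be sketched for the simplest case of p = 2 variables. The Python fragment below is illustrative only (not the book's software) and uses a small made-up data set, not the social skills data of Table 3.9; the 2 x 2 matrix inverse is written out by hand to keep the sketch self-contained.

```python
# Hotelling's T^2 for two groups and p = 2 variables, following Display 3.8.

def mean_vec(rows):
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(2)]

def cov_mat(rows, m):
    """Sample covariance matrix with divisor n - 1."""
    n = len(rows)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j] / (n - 1)
    return s

g1 = [(-1, -1), (1, 1), (-1, 1), (1, -1)]   # made-up group 1, mean (0, 0)
g2 = [(1, -1), (3, 1), (1, 1), (3, -1)]     # made-up group 2, mean (2, 0)
n1, n2, p = len(g1), len(g2), 2

m1, m2 = mean_vec(g1), mean_vec(g2)
s1, s2 = cov_mat(g1, m1), cov_mat(g2, m2)

# Pooled covariance matrix S = ((n1-1)S1 + (n2-1)S2) / (n1+n2-2).
S = [[((n1 - 1) * s1[i][j] + (n2 - 1) * s2[i][j]) / (n1 + n2 - 2)
      for j in range(2)] for i in range(2)]

det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
Sinv = [[S[1][1] / det, -S[0][1] / det],
        [-S[1][0] / det, S[0][0] / det]]

d = [m1[0] - m2[0], m1[1] - m2[1]]
D2 = sum(d[i] * Sinv[i][j] * d[j] for i in range(2) for j in range(2))
T2 = n1 * n2 / (n1 + n2) * D2
F = (n1 + n2 - p - 1) * T2 / ((n1 + n2 - 2) * p)  # compare with F(p, n1+n2-p-1)
```

For this toy data set the arithmetic is exact: D^2 = 3, T^2 = 6, and F = 2.5 on 2 and 5 degrees of freedom.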
TABLE 3.10
Calculation of Hotelling's T^2 for the Data in Table 3.9

The two sample mean vectors and the sample covariance matrices S1 and S2 of the two experimental groups are first calculated and then combined to give the pooled covariance matrix S. Applying the formulas in Display 3.8 with n1 = n2 = 11 gives D^2 = 0.6956, and consequently T^2 = (11 x 11/22) x 0.6956 = 3.826. The resulting F value is 0.8130 with 4 and 17 degrees of freedom, and the corresponding p value is .53.

Display 3.9
Univariate and Multivariate Tests for Equality of Means for Two Variables

Suppose we have a sample of n observations on two variables, x1 and x2, and we wish to test whether the population means of the two variables, mu1 and mu2, are both zero. Assume that the mean and standard deviation of the x1 observations are xbar1 and s1, respectively, and of the x2 observations, xbar2 and s2.

If we test separately whether each mean takes the value zero, then we would use two t tests. For example, to test mu1 = 0 against mu1 not equal to 0, the appropriate test statistic is

t = xbar1 / (s1 / sqrt(n)).

The hypothesis mu1 = 0 would be rejected at the alpha percent level of significance if t < -t(alpha) or t > t(alpha), where t(alpha) is the 100(1 - alpha/2) percent point of the t distribution with n - 1 degrees of freedom. Thus the hypothesis would not be rejected if xbar1 fell within the interval [-s1 t(alpha)/sqrt(n), s1 t(alpha)/sqrt(n)].
Similarly, the hypothesis mu2 = 0 for the variable x2 would not be rejected if the mean xbar2 of the observations fell within a corresponding interval with s2 substituted for s1. The multivariate hypothesis [mu1, mu2] = [0, 0] would therefore not be rejected if both these conditions were satisfied.

If we were to plot the point (xbar1, xbar2) against rectangular axes, the area within which the point could lie and the multivariate hypothesis not be rejected is given by the rectangle ABCD of the diagram below, where AB and DC are of length 2 s1 t(alpha)/sqrt(n), while AD and BC are of length 2 s2 t(alpha)/sqrt(n).

Suppose, however, that the variables x1 and x2 are moderately highly correlated. Then all points (x1, x2), and hence (xbar1, xbar2), should lie reasonably close to the straight line MN through the origin marked on the diagram. Hence samples consistent with the multivariate hypothesis should be represented by points (xbar1, xbar2) that lie within a region encompassing the line MN. When we take account of the nature of the variation of bivariate normal samples that include correlation, this region can be shown to be an ellipse such as that marked on the diagram. The point P is not consistent with this region and, in fact, the multivariate hypothesis should be rejected for this sample. Thus the inference drawn from the two separate univariate tests conflicts with the one drawn from a single multivariate test, and it is the wrong inference.

A sample giving the (xbar1, xbar2) values represented by point Q would give the other type of mistake, where the application of two separate univariate tests leads to the rejection of the null hypothesis, but the correct multivariate inference is that the hypothesis should not be rejected. (This explanation is taken with permission from Krzanowski, 1991.)
TABLE 3.11
Multivariate Tests for the Data in Table 3.9

Test Name    Value      Corresponding F Value    DF1     DF2     p
Pillai       0.67980    3.60443                  8.00    56.00   .002
Hotelling    1.57723    5.12600                  8.00    52.00   <.001
Wilks        0.36906    4.36109                  8.00    54.00   <.001

Each test indicates that the three groups are significantly different on the set of four variables. Combined with the earlier T^2 test on the two experimental groups, this result seems to imply that it is the control group that gives rise to the differences. This conclusion could be checked by using the multivariate equivalent of the multiple comparison tests described in Section 3.4 (see Stevens, 1992, for details). (The multivariate test statistics are based on assumptions analogous to those of the F tests in a univariate one-way ANOVA; that is, the data are assumed to come from a multivariate normal distribution, and each population is assumed to have the same covariance matrix.)

In situations in which the dependent variables of interest can genuinely be regarded as a set, there are a number of advantages in using a multivariate approach rather than a series of separate univariate analyses.
1. The use of a series of univariate tests leads to a greatly inflated Type 1 error rate; the problem is analogous to the multiple t test problem described in Section 3.2.
2. The univariate tests ignore important information, namely the correlations among the dependent variables. The multivariate tests take this information into account.
3. Although the groups may not be significantly different on any of the variables individually, jointly the set of variables may reliably differentiate the groups; that is, small differences on several variables may combine to produce a reliable overall difference. Thus the multivariate test will be more powerful in this case.

However, it should be remembered that these advantages are genuine only if the dependent variables can honestly be regarded as a set sharing a conceptual meaning. A multivariate analysis of variance on a number of dependent variables when there is not a strong rationale for regarding them simultaneously is not to be recommended.
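The inflation mentioned in point 1 is easy to quantify if, purely for illustration, the separate tests are assumed independent (in practice the dependent variables are correlated, so the exact inflation differs, but the qualitative point stands). A short Python sketch:

```python
# Familywise Type 1 error when each of q independent dependent variables is
# tested separately at alpha = .05: the chance of at least one false positive
# is 1 - (1 - alpha)^q, which grows well beyond the nominal 5% level.
alpha = 0.05
familywise = {q: 1 - (1 - alpha) ** q for q in (1, 2, 4, 10)}
```

With the four dependent variables of the social skills example, the familywise rate under this independence assumption is already about 0.19 rather than the nominal 0.05.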
SUMMARY

1. A one-way ANOVA is used to compare the means of k populations when k >= 3.
2. The null and alternative hypotheses are H0: mu1 = mu2 = ... = muk; H1: not all means are equal.
3. The test of the null hypothesis involves an F test for the equality of two variances.
4. A significant F test should be followed by one or other multiple comparison test to assess which particular means differ, although care is needed in the application of such tests.
5. When the levels of the independent variable form an ordered scale, the use of orthogonal polynomials is often informative.
6. Including variables that are linearly related to the response variable as covariates increases the precision of the study, in the sense of leading to a more powerful F test for the equality of means hypothesis.
7. Several dependent variables can be treated simultaneously by MANOVA, although this approach is only really sensible if the variables can truly be regarded, in some sense, as a unit.
8. In the one-way ANOVA model considered in this chapter, the levels of the independent variable have been regarded as fixed; that is, the levels used are those of specific interest. An alternative model, in which the levels are considered as a random sample from some population of possible levels, might also have been considered. However, because such models are usually of greater interest in the case of more complex designs, their description will be left until later chapters.
9. The examples in this chapter have all involved the same number of subjects in each group. This is not a necessary condition for a one-way ANOVA, although it is helpful because it makes departures from the assumptions of normality and homogeneity of variance even less critical than usual. Some of the formulas in the displays require minor amendments for unequal group sizes.

COMPUTER HINTS

SPSS
In SPSS the first steps in conducting a one-way ANOVA are as follows.
1. Click Statistics, click General Linear Model, and then click GLM-General Factorial; the GLM-General Factorial dialog box will appear.
2. Select the response variable and the grouping variable.
3. Click Options to refine the analysis and to select things such as estimates of effect sizes and post hoc tests.
To conduct a one-way MANOVA in SPSS, the steps are as follows.
1. Click Statistics, click General Linear Model, and then click GLM-Multivariate; the GLM-Multivariate dialog box becomes available.
2. Select the multiple response variables and the grouping variable.
3. Click Options to choose any particular options required, such as homogeneity tests, descriptive statistics, and so on.

S-PLUS
In S-PLUS, a simple one-way ANOVA can be conducted by using the following steps.
1. Click Statistics, click ANOVA, and click Fixed Effects; the ANOVA dialog box will then appear.
2. Select the dependent variable and the grouping variable from the relevant data set.
3. Click the Plot tag of the dialog box if, for example, some type of residual plot is required.

Multiple comparison tests are conducted by using the following.
1. Click Statistics, click ANOVA, and click Multiple Comparisons to get the Multiple Comparisons dialog.
2. Select the analysis of variance object saved from a previous analysis of variance, and select the method required, the confidence level, and so on.

A one-way MANOVA can be undertaken as follows.
1. Click Statistics, click Multivariate, and click MANOVA to get the MANOVA dialog.
2. Choose the relevant data set and select the dependent variables and the grouping (independent) variable.

In S-PLUS, an ANOVA can also be undertaken by means of the command line approach, using the aov function. For example, if the fruit fly data (see Table 3.1)
were stored in a data frame called fruit, with variable names group and fecund, a one-way ANOVA could be carried out by using the command

   aov(fecund~group, data=fruit)

A multivariate analysis of variance is available by using the manova function, and multiple comparisons can be applied by using the multicomp function. The multiple comparison procedure produces a useful graphical display of the results (see Figures 3.3 and 3.4).

EXERCISES

3.1. Reproduce all the results given in this chapter by using your favorite piece of statistical software. This exercise should be repeated in subsequent chapters, and those readers finding any differences between their results and mine are invited to e-mail me at b.everitt@iop.kcl.ac.uk.

3.2. The data given in Table 3.12 were collected in an investigation described by Kapor (1981), in which the effect of knee-joint angle on the efficiency of cycling was studied. Efficiency was measured in terms of the distance pedalled on an ergocycle until exhaustion. The experimenter selected knee-joint angles

TABLE 3.12
The Effect of Knee-Joint Angle on the Efficiency of Cycling: Total Distance Covered (km)

5.0  8.0  8.6  7.0  6.4  7.0
8.3  4.2  3.1  6.0  10.6  5.5
3.4  7.0  6.0  3.2  11.2  8.0
9.0  7.8  9.0  3.3  3.2  4.2
3.1  4.1  4.6  6.6  10.0  8.5
of particular interest: 50, 70, and 90 degrees. Thirty subjects were available for the experiment, and 10 subjects were randomly allocated to each angle. The drag of the ergocycle was kept constant at 14.7 N, and subjects were instructed to pedal at a constant speed of 20 km/h.
1. Carry out an initial graphical inspection of the data to assess whether there are any aspects of the observations that might be a cause for concern in later analyses.
2. Derive the appropriate analysis of variance table for the data.
3. Investigate the mean differences in more detail by using a suitable multiple comparisons test.
4. Use an orthogonal polynomial approach to investigate whether there is any evidence of a linear or quadratic trend in the group means.

3.3. Suggest suitable estimators for the parameters mu and alpha_i in the one-way ANOVA model given in Display 3.1. Use your suggested estimators on the fruit fly data.

3.4. The data in Table 3.13 were collected in an investigation of maternal behavior of laboratory rats. In the study, rat pups of different ages were used. The response variable was the time (in seconds) required for the mother to retrieve the pup to the nest, after the pup had been moved a fixed distance away. Carry out a one-way analysis of variance of the data.

3.5. The data in Table 3.14 show the anxiety scores, both before and after the operation, for patients undergoing wisdom tooth extraction by one of three

TABLE 3.13
Maternal Behavior in Rats

Age of Pup
5 Days    20 Days    35 Days
15        30         40
10        15         35
25        20         50
15        25         43
20        23         45
18        20         40
methods. The ages of the patients (who were all women) were also recorded. Patients were randomly assigned to the three methods.

TABLE 3.14
Anxiety Scores for Patients Having Wisdom Teeth Extracted

Columns: Method of Extraction (Method 1, Method 2, Method 3), Age (years), Initial Anxiety, Final Anxiety

Ages: 27 32 23 28 30 35 32 21 26 27 29 29 31 36 23 26 22 20 28 32 33 35 21 28 27 23 25 26 27 29

Anxiety scores (as printed): 31.9 32.8 34.0 30.0 31.1 33.4 32.5 32.0 29.0 31.0 36.0 34.5 32.0 29.1 31.2 31.5 33.0 29.9 34.0 32.2 31.0 34.9 28.9 33.0 30.9 30.4 32.5 33.9 34.2 35.0 34.3 32.6 33.5 32.9 34.1 31.4 32.1 34.0 30.5 34.6 32.2 31.8 32.0 32.3 34.0 30.1 29.9 33.2 35.0 31.6 30.7 34.8 36.8 32.0 29.2 31.1 30.2 36.6 29.5 32.4
3. Carry out an ANOVA of the difference between the two anxiety scores.
4. Carry out an ANCOVA of the anxiety scores on discharge, using the anxiety scores prior to the operation as a covariate.
5. Suggest a suitable model for an ANCOVA of the anxiety scores on discharge by using the two covariates, anxiety score prior to the operation and age. Carry out the appropriate analysis.
6. Comment on the difference between the analyses in steps 3 and 4.

3.6. Reanalyze the WISC data after removing any observation you feel might reasonably be labeled an outlier, if you think its removal is justified.

3.7. Calculate the adjusted group means for time to completion by using the formula given in Display 3.7.

3.8. Show that, when only a single variable is involved, the test statistic for Hotelling's T2 given in Display 3.8 is equivalent to the test statistic used in an independent samples t test.

3.9. Apply Hotelling's T2 test to each pair of groups in the data. Use the three T2 values and the Bonferroni correction procedure to assess which groups differ.
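Before moving on, it may help to see the arithmetic behind a one-way ANOVA laid out once, since every package (aov in S-PLUS, GLM in SPSS) is computing exactly these quantities. The sketch below is illustrative only and uses made-up data for three groups of six observations; it is not part of the original text.

```python
# Illustrative sketch: the sums of squares behind a one-way ANOVA,
# as computed by, e.g., S-PLUS's aov(fecund ~ group). Data are made up.
groups = {
    "g1": [4, 6, 8, 8, 4, 0],
    "g2": [10, 6, 14, 8, 10, 6],
    "g3": [15, 9, 12, 0, 4, 2],
}
all_obs = [x for g in groups.values() for x in g]
N = len(all_obs)
grand = sum(all_obs) / N

# Between-groups SS: squared deviations of group means about the grand mean.
ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
# Within-groups SS: squared deviations of observations about their group mean.
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups.values() for x in g)

df_between = len(groups) - 1
df_within = N - len(groups)
F = (ss_between / df_between) / (ss_within / df_within)
print(round(F, 2))  # 1.34
```

Note that the two sums of squares add up to the total sum of squares about the grand mean, which is the partition on which the F test rests.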
4
Analysis of Variance II: Factorial Designs

4.1. INTRODUCTION

Many experiments in psychology involve the simultaneous study of the effects of two or more factors on a response variable. Such an arrangement is usually referred to as a factorial design. Consider, for example, the data shown in Table 4.1, taken from Lindman (1974), which give the improvement scores made by patients with one of two psychiatric diagnoses, being treated with one of three tranquillizer drugs. The questions of interest about these data concern the equality or otherwise of the average improvement in the two diagnostic categories, and similarly for the three drugs. Because these questions are the same as those considered in the previous chapter, the reader might enquire why we do not simply apply a one-way ANOVA separately to the data for diagnostic categories and for drugs. The answer is that such analyses would omit an aspect of a factorial design that is often very important, as we shall demonstrate in the next section.

4.2. INTERACTIONS IN A FACTORIAL DESIGN

The results of a one-way analysis of variance of the data in Table 4.1 for (1) psychiatric categories only, (2) tranquillizer drugs only, and (3) all six groups of observations in Table 4.1 are shown in Table 4.2. At first sight the results appear a little curious. The F test for the equality of the psychiatric category means is not significant, and neither is that for the
tranquillizer drug means. Nevertheless, the F test for the six groups together is significant, indicating a difference between the six means that has not been detected by the separate one-way analyses of variance. A clue to the cause of the problem is provided by considering the degrees of freedom associated with each of the three between groups sums of squares. That corresponding to psychiatric categories has a single degree of freedom, and that for drugs has two degrees of freedom, making a total of three degrees of freedom for the separate analyses. However, the between groups sum of squares for the six groups combined has five degrees of freedom.

TABLE 4.1
Improvement Scores for Two Psychiatric Categories and Three Tranquillizer Drugs

                     Drug
Category    B1          B2           B3
A1          4, 6, 8     10, 6, 14    0, 4, 2
A2          8, 4, 0     8, 10, 6     15, 9, 12

TABLE 4.2
ANOVA of Improvement Scores in Table 4.1

Source        SS        DF    MS       F       P

Psychiatric Categories Only
Diagnosis      18.00     1    18.00    0.97    .34
Error         298.00    16    18.63

Drugs Only
Drugs          48.00     2    24.00    1.34    .29
Error         268.00    15    17.87

All Six Groups
Groups        210.00     5    42.00    4.76    .012
Error         106.00    12     8.83
The separate one-way analyses of variance appear to have omitted some aspect of the variation in the data. What has been left out is the effect of the combination of factor levels that is not predictable from the sum of the effects of the two factors separately, an effect usually known as the interaction between the factors. Both the model for, and the analysis of, a factorial design have to allow for the possibility of such interaction effects.

4.3. TWO-WAY DESIGNS

Many aspects of the modeling and analysis of factorial designs can conveniently be illustrated by using examples with only two factors. Such two-way designs are considered in this section; an example with more than two factors will be discussed in the next section.

The general model for a two-way design is described in Display 4.1. Notice that this model contains a term to represent the possible interaction between the factors. The levels of A and B are assumed to be of particular interest to the experimenter, and so both factors and their interaction are regarded as having fixed effects (see Section 4.5).

Display 4.1
Model for a Two-Way Design with Factors A and B

In general terms the model is

   observed response = expected response + error.
The expected value of the response variable for a subject is assumed to be made up of an effect for the level of A under which the subject is observed, the corresponding effect for the level of B, and the interaction effect for the particular combination. Consequently, the model above can be rewritten as

   observed response = overall population mean + factor A effect
                       + factor B effect + AB interaction effect + error.

More specifically, let y_ijk represent the kth observation in the jth level of factor B (with b levels) and the ith level of factor A (with a levels). Assume there are n subjects in each cell of the design. The model assumed for the observations can now be written as

   y_ijk = mu + alpha_i + beta_j + gamma_ij + epsilon_ijk,

where mu represents the overall mean, alpha_i is the effect on an observation of being in the ith level of factor A, beta_j is the corresponding effect for the jth level of factor B, gamma_ij represents the interaction effect, and epsilon_ijk is a random error term assumed to have a normal distribution with mean zero and variance sigma^2.
The total variation in the observations is partitioned into that caused by differences between the levels of factor A, that caused by differences between the levels of factor B, that caused by the interaction of A and B, and that caused by differences among observations in the same cell. The analysis of variance table for the model is

Source    SS      DF            MS                   F
A         ASS     a-1           ASS/(a-1)            MSA/error MS
B         BSS     b-1           BSS/(b-1)            MSB/error MS
AB        ABSS    (a-1)(b-1)    ABSS/[(a-1)(b-1)]    MSAB/error MS
Error     WCSS    ab(n-1)       WCSS/[ab(n-1)]

The hypotheses of interest, that is, no factor A effect, no factor B effect, and no AB interaction, imply the following about the parameters of the model:

   H0(1): alpha_1 = alpha_2 = ... = alpha_a = 0,
   H0(2): beta_1 = beta_2 = ... = beta_b = 0,
   H0(3): gamma_11 = gamma_12 = ... = gamma_ab = 0.

Under each hypothesis, a different model is assumed to be adequate to describe the observations; for example, under H0(3) the model is

   y_ijk = mu + alpha_i + beta_j + epsilon_ijk,

that is, the main effects model. Under the null hypotheses given above, all the mean squares are estimators of sigma^2. The error mean square is also an estimate of sigma^2, but one that does not rely on the truth or otherwise of any of the null hypotheses. Consequently, each mean square ratio can be tested as an F statistic to assess the corresponding null hypothesis. The F tests are valid under the following assumptions: (a) normality of the error terms in the model, (b) homogeneity of variance, and (c) independence of the observations for each subject. Formulated in this way, the model has too many parameters, and constraints have to be introduced, as explained in Chapter 3.
The most common method is to require that the parameters of this fixed effects model satisfy

   sum_i alpha_i = 0,   sum_j beta_j = 0,   sum_i gamma_ij = 0,   sum_j gamma_ij = 0.

For further details, see Maxwell and Delaney (1990). The results of applying the model described in Display 4.1 to the psychiatric data in Table 4.1 are shown in Table 4.3.
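The partition given in Display 4.1 can be sketched directly. The following illustrative Python fragment (not from the original text) applies it to the improvement scores of Table 4.1, with factor A the diagnostic category and factor B the drug, and reproduces the sums of squares of Table 4.3.

```python
# Two-way partition of Display 4.1 for the balanced improvement-score data
# of Table 4.1 (a = 2 categories, b = 3 drugs, n = 3 subjects per cell).
cells = {  # (level of A, level of B) -> observations
    (0, 0): [4, 6, 8], (0, 1): [10, 6, 14], (0, 2): [0, 4, 2],
    (1, 0): [8, 4, 0], (1, 1): [8, 10, 6],  (1, 2): [15, 9, 12],
}
a, b, n = 2, 3, 3
grand = sum(sum(v) for v in cells.values()) / (a * b * n)
mean_A = [sum(sum(cells[(i, j)]) for j in range(b)) / (b * n) for i in range(a)]
mean_B = [sum(sum(cells[(i, j)]) for i in range(a)) / (a * n) for j in range(b)]
cell_mean = {k: sum(v) / n for k, v in cells.items()}

ss_A = b * n * sum((m - grand) ** 2 for m in mean_A)
ss_B = a * n * sum((m - grand) ** 2 for m in mean_B)
ss_AB = n * sum((cell_mean[(i, j)] - mean_A[i] - mean_B[j] + grand) ** 2
                for i in range(a) for j in range(b))
ss_error = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

df_AB = (a - 1) * (b - 1)
df_error = a * b * (n - 1)
F_AB = (ss_AB / df_AB) / (ss_error / df_error)
print(round(F_AB, 2))  # 8.15
```

The four sums of squares (18, 48, 144, and 106) add to the total sum of squares about the grand mean, which is what justifies reading them as a partition of the variation.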
TABLE 4.3
Two-Way ANOVA of Psychiatric Improvement Scores

Source    SS     DF    MS      F       P
A         18      1    18      2.04    .18
B         48      2    24      2.72    .11
A x B     144     2    72      8.15    .006
Error     106    12    8.83

The significant interaction now explains the previously confusing results of the separate one-way analyses of variance.

4.3.1. Rat Data

To provide a further example of a two-way ANOVA, the data in Table 4.4 will be used. These data arise from an experiment in which rats were randomly assigned to receive each combination of one of four drugs and one of three diets for a week, and then their time to negotiate a standard maze was recorded. Four rats were used

TABLE 4.4
Rat Data: Times to Run a Maze (seconds)

                       Diet
Drug    d1                   d2                  d3
D1      31, 45, 46, 43       23, 36, 29, 40      22, 21, 18, 23
D2      82, 110, 88, 72      61, 49, 124, 92     30, 37, 29, 38
D3      45, 63, 76, 43       44, 35, 31, 40      23, 25, 24, 22
D4      71, 66, 62, 45       71, 38, 56, 102     31, 30, 36, 33
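The cell summaries reported later in Table 4.5 come straight from Table 4.4. As a quick illustrative sketch (not part of the original text), here is the mean and standard deviation for the drug D1, diet d1 cell.

```python
# Sketch: mean and (n - 1)-denominator standard deviation for the
# drug D1 / diet d1 cell of Table 4.4, as reported in Table 4.5.
import math

cell = [31, 45, 46, 43]   # D1, d1 times from Table 4.4
n = len(cell)
mean = sum(cell) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in cell) / (n - 1))
print(round(mean, 2), round(sd, 2))  # 41.25 6.95
```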
for each diet-drug combination. The ANOVA table for the data and the cell means and standard deviations are shown in Table 4.5. The observed standard deviations in this table shed some doubt on the acceptability of the homogeneity assumption, a problem we shall conveniently ignore here (but see Exercise 4.1). The F tests in Table 4.5 show that there is evidence of a difference in drug means and in diet means, but no evidence of a drug x diet interaction. It appears that what is generally known as a main effects model provides an adequate description of these data. In other words, diet and drug act additively on time to run the maze.

The results of both the Bonferroni and Scheffé multiple comparison procedures for diet and drug are given in Table 4.6. All the results are displayed graphically in Figure 4.1. Both multiple comparison procedures give very similar results for this example. The mean time to run the maze appears to differ for diets 1 and 3 and for diets 2 and 3, but not for diets 1 and 2. For the drugs, the average time differs for drugs 1 and 2, drugs 1 and 4, and drugs 2 and 3.

When the interaction effect is nonsignificant, as here, the interaction mean square becomes a further estimate of the error variance, sigma^2. Consequently, there is the possibility of pooling the error sum of squares and the interaction sum of squares to provide a possibly improved error mean square based on a larger number of degrees of freedom. In some cases the use of a pooled error mean square will provide more powerful tests of the main effects. The results of applying this procedure to the rat data are shown in Table 4.7; here the results are very similar to those given in Table 4.5. Although pooling is often recommended, it can be dangerous for a number of reasons. In particular, cases may arise in which the test for interaction is nonsignificant, but in which there is in fact an interaction. As a result, the pooled mean square may be larger than the original error mean square (this happens for the rat data), and the difference may be large enough for the gain from the increase in degrees of freedom to be more than offset. The net result is that the experimenter loses rather than gains power. Pooling is really only acceptable when there are good a priori reasons for assuming no interaction, and when the decision to use a pooled error term is based on considerations that are independent of the observed data.

With only four observations in each cell of the table, it is clearly not possible to use, say, normal probability plots to assess the normality requirement needed within each cell by the ANOVA F tests used in Table 4.5. In general, testing the normality assumption in an ANOVA should be done not on the raw data, but on the deviations between the observed values and the values predicted by the model fitted to the data. The predicted values from a main effects model for the rat data are shown in Table 4.8; the terms are calculated from
   fitted value = grand mean + (diet mean - grand mean)
                  + (drug mean - grand mean).                      (4.1)
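Equation (4.1) can be sketched in a few lines. The fragment below (illustrative, not from the original text) uses the grand, diet, and drug means given in the note to Table 4.8 and recovers the fitted value for the diet d1, drug D2 cell.

```python
# Equation (4.1) in miniature: fitted values for the main effects model,
# built from the grand, diet, and drug means given in the note to Table 4.8.
grand = 47.94
diet_mean = {"d1": 61.75, "d2": 54.44, "d3": 27.63}
drug_mean = {"D1": 31.42, "D2": 67.67, "D3": 39.25, "D4": 53.42}

def fitted(diet, drug):
    # grand mean + diet effect + drug effect
    return grand + (diet_mean[diet] - grand) + (drug_mean[drug] - grand)

print(round(fitted("d1", "D2"), 2))  # 81.48
```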
TABLE 4.5
ANOVA Table for Rat Data

Source         SS          DF    MS         F        P
Diet           10330.13     2    5165.06    23.22    <.001
Drug            9212.06     3    3070.69    13.82    <.001
Diet x Drug     2501.38     6     416.90     1.87    .11
Error           8007.25    36     222.42

Cell Means and Standard Deviations

                      Diet
Drug           d1        d2        d3
D1   Mean      41.25     32.00     21.00
     SD         6.95      7.53      2.16
D2   Mean      88.00     81.50     33.50
     SD        16.08     33.63      4.65
D3   Mean      56.75     37.50     23.50
     SD        15.67      5.69      1.29
D4   Mean      61.00     66.75     32.50
     SD        11.28     27.09      2.65
The normality assumption may now be assessed by looking at the distribution of the differences between the observed and fitted values, the so-called residuals. These terms should have a normal distribution. A normal probability plot of the residuals from fitting a main effects model to the rat data (Figure 4.2) shows that a number of the largest residuals give some cause for concern. Again, it will be left as an exercise (see Exercise 4.1) for the reader to examine this possible problem in more detail. (We shall have much more to say about residuals and their use in Chapter 6.)
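To make the definition concrete, the residual for a single rat is just its observed time minus the fitted value of Table 4.8; the one-line sketch below (illustrative, not from the original text) shows this for the first rat in the drug D1, diet d1 cell.

```python
# A residual in the sense used above: observed time (Table 4.4) minus the
# main effects fitted value (Table 4.8) for the D1 / d1 cell.
observed = 31.0
fitted = 45.22
residual = observed - fitted
print(round(residual, 2))  # -14.22
```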
4.3.2. Slimming Data

Our final example of the analysis of a two-way design involves the data shown in Table 4.9. These data arise from an investigation into types of slimming regime. In
TABLE 4.6
Multiple Comparison Test Results for Rat Data

                 Estimate    Std. Error    Lower Bound    Upper Bound
Diet
Bonferroni(a)
 d1-d2            7.31        5.27           -5.93          20.60
 d1-d3           34.10        5.27           20.90(b)       47.46(b)
 d2-d3           26.80        5.27           13.60(b)       40.16(b)
Scheffé(c)
 d1-d2            7.31        5.27           -6.15          20.80
 d1-d3           34.10        5.27           20.70(b)       47.60(b)
 d2-d3           26.80        5.27           13.30(b)       40.36(b)

Drug
Bonferroni(d)
 D1-D2          -36.20        6.09          -53.20         -19.30(b)
 D1-D3           -7.83        6.09          -24.80           9.17
 D1-D4          -22.00        6.09          -39.00          -5.00(b)
 D2-D3           28.40        6.09           11.40(b)       45.40(b)
 D2-D4           14.20        6.09           -2.75          31.20
 D3-D4          -14.20        6.09          -31.20           2.83
Scheffé(e)
 D1-D2          -36.20        6.09          -54.10         -18.40(b)
 D1-D3           -7.83        6.09          -25.70          10.00
 D1-D4          -22.00        6.09          -39.90          -4.15(b)
 D2-D3           28.40        6.09           10.60(b)       46.30(b)
 D2-D4           14.20        6.09           -3.60          32.10
 D3-D4          -14.20        6.09          -32.00           3.69

(a) 95% simultaneous confidence intervals for specified linear combinations, by the Bonferroni method; critical point: 2.511; response variable: time.
(b) Intervals that exclude zero.
(c) 95% simultaneous confidence intervals for specified linear combinations, by the Scheffé method; critical point: 2.553; response variable: time.
(d) 95% simultaneous confidence intervals for specified linear combinations, by the Bonferroni method; critical point: 2.792; response variable: time.
(e) 95% simultaneous confidence intervals for specified linear combinations, by the Scheffé method; critical point: 2.932; response variable: time.
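The intervals in Table 4.6 all have the form estimate plus or minus critical point times standard error, with the standard error of a difference of two group means equal to sqrt(2 x error MS / m) for groups of size m. The sketch below (illustrative, not from the original text) rebuilds the d1-d3 Bonferroni interval from the error mean square of Table 4.5 and the critical point quoted in the table footnote.

```python
# Rebuilding one entry of Table 4.6: the Bonferroni interval for d1 - d3.
# Error MS (222.42) is from Table 4.5; each diet group has 16 rats.
import math

error_ms = 222.42
m = 16                       # rats per diet
se = math.sqrt(2 * error_ms / m)
crit_bonf = 2.511            # Bonferroni critical point for the diet contrasts
est_d1_d3 = 34.10            # d1 - d3 mean difference

lower = est_d1_d3 - crit_bonf * se
upper = est_d1_d3 + crit_bonf * se
print(round(se, 2), round(lower, 2), round(upper, 2))
```

This reproduces the tabulated standard error of 5.27 and the d1-d3 bounds up to rounding in the final digit.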
this case, the two factor variables are treatment, with two levels, namely whether or not a woman was advised to use a slimming manual based on psychological behaviorist theory as an addition to the regular package offered by the clinic, and status, also with two levels, that is, "novice" and "experienced" slimmer. The dependent variable recorded was a measure of weight change over 3 months, with negative values indicating a decrease in weight.
FIG. 4.1. Multiple comparison results for rat data: (a) Bonferroni tests for diets, (b) Scheffé tests for diets, (c) Bonferroni tests for drugs, (d) Scheffé tests for drugs. Each panel shows simultaneous 95% confidence limits; response variable: time.
The ANOVA table and the cell means and standard deviations for these data are shown in Table 4.10. Here the main effects of treatment and status are significant, but in addition there is a significant interaction between treatment and status. The presence of a significant interaction means that some care is needed in arriving at a sensible interpretation of the results. A plot of the four cell means, as shown in Figure 4.3, is generally helpful in drawing conclusions about the significant
TABLE 4.7
Pooling Error and Interaction Terms in the ANOVA of the Rat Data

Source          SS          DF    MS         F        P
Diet            10330.13     2    5165.06    20.64    <.0001
Drugs            9212.06     3    3070.69    12.27    <.0001
Pooled error    10508.63    42     250.21
TABLE 4.8
Fitted Values for Main Effects Model on Rat Data

              Diet
Drug    d1       d2       d3
D1      45.22    37.92    11.10
D2      81.48    74.17    47.35
D3      53.06    45.75    18.94
D4      67.23    59.92    33.10

Note. The grand mean of the observations is 47.94; diet means are d1: 61.75, d2: 54.44, d3: 27.63; drug means are D1: 31.42, D2: 67.67, D3: 39.25, D4: 53.42.
interaction between the two factors. Examining the plot, we find it clear that the decrease in weight produced by giving novice slimmers access to the slimming manual is far greater than that achieved with experienced slimmers, where the reduction is negligible. Formal significance tests of the differences between experienced and novice slimmers for each treatment level could be performed (such tests are usually referred to as tests of simple effects), but they are unnecessary because the interpretation of a significant interaction is usually clear from a plot of the cell means. (It is never wise to perform more significance tests than are absolutely essential.) The significant main effects might be interpreted as indicating differences in the average weight change of novice compared with experienced slimmers, and of slimmers having access to the manual compared with those who do not. In the presence of the large interaction, however, such an interpretation is not at all helpful. The clear message from these results is as follows.
TABLE 4.9
Slimming Data

                            Type of Slimmer
Manual Given?    Novice                          Experienced
No manual        -2.85, -1.98, -2.12, 0.00       -2.42, 0.00, -2.14, -0.84
Manual           -4.44, -8.11, -9.40, -3.50       0.00, -1.64, -2.40, -2.15

Note: Response variable is a measure of weight change over 3 months; negative values indicate a decrease in weight.
FIG. 4.2. Normal probability plot of residuals from a main effects model fitted to the rat data.
TABLE 4.10
ANOVA of Slimming Data

Analysis of Variance Table

Source           SS       DF    MS       F       P
Condition (C)    21.83     1    21.83    7.04    .021
Status (S)       25.53     1    25.53    8.24    .014
C x S            20.95     1    20.95    6.76    .023
Error            37.19    12     3.10

Cell Means and Standard Deviations

                      Novice     Experienced
No manual   Mean      -1.74      -1.35
            SD         1.22       1.13
Manual      Mean      -6.36      -1.55
            SD         2.84       1.08
FIG. 4.3. Interaction plot for slimming data.
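The crossing pattern in Figure 4.3 can be put into a single number as a difference of differences: the manual-versus-no-manual change for novices compared with the same change for experienced slimmers. The sketch below is illustrative and not part of the original text; the cell means are those of Table 4.10.

```python
# The interaction in Figure 4.3 as a difference of differences,
# using the cell means of Table 4.10.
means = {("novice", "manual"): -6.36, ("novice", "no manual"): -1.74,
         ("experienced", "manual"): -1.55, ("experienced", "no manual"): -1.35}

novice_gain = means[("novice", "manual")] - means[("novice", "no manual")]
experienced_gain = (means[("experienced", "manual")]
                    - means[("experienced", "no manual")])
print(round(novice_gain, 2), round(experienced_gain, 2))  # -4.62 -0.2
```

The manual is worth more than 4.5 units of extra weight loss to a novice and essentially nothing to an experienced slimmer, which is the conclusion drawn informally from the plot.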
1. No manual: no difference in the weight change of novice and experienced slimmers.
2. Manual: novice slimmers win! (It might be useful to construct a confidence interval for the difference in the weight change of novice and experienced slimmers when both have access to the manual; see Exercise 4.2.)

One final point to note about the two-way design is that when there is only a single observation available in each cell, the error mean square has zero degrees of freedom. In such cases, it is generally assumed that the factors do not interact, so that the interaction mean square can be used as "error" to provide F tests for the main effects.

4.4. HIGHER-ORDER FACTORIAL DESIGNS

Maxwell and Delaney (1990) and Boniface (1995) give details of an experiment in which the effects of three treatments on blood pressure were investigated. The three treatments were as follows.

1. The first is drug, with three levels, drug X, drug Y, and drug Z.
2. The second is biofeed, in which biofeedback is either present or absent.
3. The third is diet, in which a special diet is either given or not given.

Seventy-two subjects were used in the experiment, with six being randomly allocated to each of the 12 treatment combinations. The data are shown in Table 4.11. A suitable model for a three-factor design is outlined in Display 4.2, and the results of applying this model to the blood pressure data are shown in Table 4.12. The diet, biofeed, and drug main effects are all significant beyond the 5% level. None of the first-order interactions are significant, but the three-way, second-order interaction of diet, drug, and biofeedback is significant. Just what does such an effect imply, and what are its implications for interpreting the analysis of variance results?

First, a significant second-order interaction implies that the first-order interaction between two of the variables differs in form or magnitude in the different levels of the remaining variable. Once again, the simplest way of gaining insight here is to construct the appropriate graphical display; here the interaction plots of diet and biofeedback, for each of the three drugs, will help. These plots are shown in Figure 4.4. For drug X, the diet x biofeedback interaction plot indicates that diet has a negligible effect when biofeedback is present, but substantially reduces blood pressure when biofeedback is absent. For drug Y, the situation is essentially the reverse of that for drug X.
TABLE 4.11
Blood Pressure Data

                Biofeedback Present                 Biofeedback Absent
                Drug X   Drug Y   Drug Z            Drug X   Drug Y   Drug Z

Diet absent (values as printed):
170 175 165 180 106 158 161 173 157 152 181 190 186 194 201 215 219 209 164 166 159 182 187 174 204 180 187 199 170 194 173 194 197 189 176 198 164 189 169 164 176 175

Diet present (values as printed):
189 194 217 206 199 195 171 173 196 199 180 203 202 228 189 206 224 204 205 199 170 166 179 179 162 184 183 156 180 173

Display 4.2
Model for a Three-Factor Design with Factors A, B, and C

In general terms the model is

   observed response = mean + factor A effect + factor B effect + factor C effect
                       + AB interaction + AC interaction + BC interaction
                       + ABC interaction + error.

More specifically, let y_ijkl represent the lth observation in the ith level of factor A (with a levels), the jth level of factor B (with b levels), and the kth level of factor C (with c levels). Assume that there are n subjects per cell, and that the factor levels are of specific interest, so that A, B, and C have fixed effects. The linear model for the observations in this case is

   y_ijkl = mu + alpha_i + beta_j + gamma_k + delta_ij + theta_ik + tau_jk
            + omega_ijk + epsilon_ijkl.

Here alpha, beta, and gamma represent the main effects of A, B, and C, respectively; delta, theta, and tau represent the first-order interactions AB, AC, and BC; omega represents the second-order interaction ABC; and the epsilon_ijkl are random error terms assumed to be normally distributed with zero mean and variance sigma^2. (Once again the parameters have to be constrained in some way to make the model acceptable; see Maxwell and Delaney, 1990, for details.)
For drug Z, the blood pressure difference when the diet is given and when it is not is approximately equal for both levels of biofeedback. The effect of drug, therefore, is not consistent for all combinations of diet and biofeedback. The implication of a significant second-order interaction is that there is little point in drawing conclusions about either the nonsignificant first-order interactions or the significant main effects. It would, for example, be misleading to conclude, on the basis of the significant main effect, anything about the specific effects of these three drugs on blood pressure. The interpretation of the data might become simpler by carrying out separate two-way analyses of variance within each drug (see Exercise 4.3).

Clearly, factorial designs will become increasingly complex as the number of factors is increased. A further problem is that the number of subjects required for
diet absent Drug 2 diet present . 3. both in the sampling scheme in the and of interest. a suitable model such a design when both factors regarded as ranfor are 4. .ANALYSIS OF VARIANCE 11: FACTORIAL DESIGNS 113 Absent (a) Present 210 r Biofeed 7. A model is called amixed effects model if some of the factorsin the design have fixed effects and some have random effects. In fact. f 6 180 170 Present Drug Y. No doubt readers will not find wildly exciting until some explanationat this is hand as to why it is necessary to differentiate between factors this way. this dom is shown in Display The important point to note about model. diet present Absent a (C) W Drug 2.1). i compared with thatfor fixed effect factors (see Display is that the termsa. 170 Present Absent Biofeed FIG. when 4. in the philosophy behind random effects models different m that behind the is quite h used parameters use of fixed effects models. Interaction plots for blood pressure d a t a .4. diet absent DNQ Y. 4.3. The difference can be described relatively simply by using a twoway design.
and they be r can estimated as shown in Display4. ' nb MSB MSAB 8. estimator of u2 +nu: + nbu. as usual. estimator of 2. the B mean square and theAB mean square both estimate u2 + U .'. = 0 H$'):' = 0. If ' is true. The linear model for the observations is Ydjt + + + cijtl where now thecri are assumed to be random variables from a normal distribution with mean zero and variance Similarly. the various mean squares provide estimators of combinationsof the variancesof the parameters: (a)A mean square. can be obtained from a + + a = MSA MSAB . Now. : H so the ratioof the A mean square and theAB mean square provides an F test of the hypothesis. = nu MSAB errorMS 3 = 2 Y n Bj. . the ratio of the AB mean square and theerror mean square can be tested as an F statistic to provide atest of the hypothesis. If H ) is true. so i ' the ratioof the B mean square and theAB mean square provides an F test of the hypothesis Estimators of the variancesU. Bj and yil are assumed to be random U. assessing whether the main effect variances are zero now involve the interaction mean square rather than theerror mean square.'. and (d) error meansquare. however. The terms in the analysis variance tableare of calculated as for the fixed effects model described in Display but the F tests for 4.I14 CHAPTER 4 Display 4. The hypotheses of interest in the random effects model are H) a = 0 :: . the test for whether the . (b) mean square. ' HZ': U.3. normal respectively. If H$)is true. the A mean square and the AB mean square both estimate a2 nu:.'. estimator of u2 nu:. (c) AB mean B square. r The parameters of interest a ethe variances of these distributions.. As usual the 6 i j k are error terms randomly sampled from a distribution with mean zero and variance a2. estimator of U* +nu: +nu$. then the mean square and the error meansquare both estimatea2. aY The calculation of the sumsof squares and mean squares remains the same for as the fixed effects model. U. AB Consequently. 
and y i j a enow assumedto be random variableswith particulardistributions.3 Random EffectsModel for a 'NoWay Design with Factors and B A Let yijt represent thekth observation inthejth level of factor B (with b levels = P +(XI Sj randomly sampled from some population possible levels) and the ith level of of factor A (with a levels again randomly sampled from a population of possible levels). and . variables from normal distributions and zero means with variancesand U:.1. U. However.
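The variance component estimators at the end of Display 4.3 amount to simple arithmetic on the mean squares. The sketch below, in plain Python, implements them; the mean squares and design sizes passed in are purely illustrative values, not figures from any table in this chapter, and negative estimates are truncated at zero, a common convention for method-of-moments estimates.

```python
def variance_components(msa, msb, msab, mse, n, a, b):
    """Method-of-moments variance component estimates for a two-way
    random effects design: a levels of factor A, b levels of factor B,
    n observations per cell.  Negative estimates are set to zero."""
    estimates = {
        "error": mse,                    # sigma^2
        "AB": (msab - mse) / n,          # sigma^2_gamma
        "A": (msa - msab) / (n * b),     # sigma^2_alpha
        "B": (msb - msab) / (n * a),     # sigma^2_beta
    }
    return {term: max(0.0, value) for term, value in estimates.items()}

# Illustrative mean squares only (n = 5 observations per cell, a = 2, b = 4)
est = variance_components(msa=5165.25, msb=3070.90, msab=416.06,
                          mse=222.38, n=5, a=2, b=4)
```

The corresponding F tests divide each main effect mean square by the AB mean square, and the AB mean square by the error mean square, as described in the display.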
The main differences between using fixed effects and random effects models are as follows.

1. In a random effects model, interest lies in the estimation and testing of variance parameters; the experimenter is not interested in the means of the particular levels observed.
2. Because the levels of a random effects factor have been chosen randomly, planned or post hoc comparisons are no longer relevant.
3. An advantage of a random effects model is that it allows the experimenter to generalize beyond the levels of the factors actually observed.

It must be emphasized, however, that generalizations from the levels of a random effects factor observed to a population of levels, and the tests of significance for such a factor, are warranted only when the levels of the factor have actually been selected at random from a population of levels. It seems unlikely that such a situation will hold in most psychological experiments that use factorial designs; consequently, fixed effects models are those most generally used, except in particular situations.

To illustrate the application of a random effects model, the data in Table 4.3 will again be used, making the unrealistic assumption that both the particular diets and the particular drugs used are a random sample from a large population of possible diets and a large population of possible drugs. The resulting analysis is shown in Table 4.13. The F tests indicate that the variances of both the drug and diet parameter distributions are not zero, but the hypothesis that the variance of the distribution of the interaction parameters is zero is not rejected. (The test of this last hypothesis is the same as that used for testing the hypothesis of no interaction in the fixed effects model.)

TABLE 4.13
ANOVA of Rat Data by Using a Random Effects Model

Source           DF    SS          MS         F
Diets             2    10330.42    5165.25    12.42
Drugs             3     9212.37    3070.90     7.38
Diets × Drugs     6     2501.37     416.06     1.87
Error            36     8007.42     222.38

FACTORIAL DESIGNS WITH UNEQUAL NUMBERS OF OBSERVATIONS IN EACH CELL

More observant readers will have noticed that the examples discussed in the previous sections of this chapter have all had the same number of observations in each cell; they have been balanced designs. In most experimental situations, equality of cell size is the aim, although even in well-designed experiments things can, of course, go wrong. A researcher may be left with an experiment having unequal numbers of observations in each cell as a result of the death of an experimental animal, for example, or because of the failure of a subject to attend a planned experimental session. In this way an aimed-for balanced design can become unbalanced, although in an experimental situation the imbalance in cell sizes is likely to be small. In observational studies, however, far larger inequalities in cell size may arise. Two examples will serve to illustrate this point.

First, the data shown in Table 4.14 come from a sociological study of Australian Aboriginal and White children reported by Quine (1975). In the study, children of both sexes, from four age groups (final grade in primary school and first, second, and third form in secondary school), and from two cultural groups were used. The children in each age group were classified as slow or average learners. The response variable of interest was the number of days absent from school during the school year. Children who had suffered a serious illness during the year were excluded. The basic design of the study is then an unequally replicated 4 × 2 × 2 × 2 factorial.

Second, the data for the second example are shown in Table 4.15. These data arise from an investigation into the effect of a mother's postnatal depression on child development. Mothers who gave birth to their firstborn child in a major teaching hospital in London were divided into two groups, depressed and not depressed, on the basis of their mental state 3 months after the birth. The children's fathers were also divided into two groups, namely those who had a previous history of psychiatric illness and those who did not. The dependent variable to be considered first is the child's IQ at age 4. (The other variables in Table 4.15 will be used later in this chapter.)

So unbalanced factorial designs do occur. But does it matter? Why do such designs merit special consideration? Why is the usual ANOVA approach used earlier in this chapter not appropriate? The answers to all these questions lie in recognizing that in unbalanced factorial designs the factors forming the design
12 8 1.ANALYSIS OF VARIANCE II: FACTORIAL DESIGNS TABLE 4. 7 .17.13. IS.5.23. 11. 14 5. 11. 12 0.2. I .14.22 6.7 0. 22.32.81 L 5. 11.30.11.47.7.34.40.0.67 0 .21.36.9. 0 . 18. 16. 11.5.S3.13.5.10.9.36.U. S7 14.23. 14 6.11. 10.40 6. 17 3.3.27 12.14.48.6.15 0. 14.5.41. 14.20.14 Study of Australian Aboriginal a dwhite Children n 1l 7 1 2 3 4 A A M M M M M 6 7 8 9 10 5 A A A F0 F0 F1 F1 F2 A A A 12 13 14 1s 16 17 18 19 20 21 22 2 3 2 4 25 26 27 28 29 30 31 32 11 A A A A M M M F F F F F F F F A A A A F2 F3 F3 F0 F0 F1 F1 F2 F2 F3 F3 N N N N N N N N N N N N N N M M M M M M M M F F F F F F F F F0 F0 F1 F2 F2 F1 F3 F3 F0 F0 F1 F1 F2 F2 N N F 3 F3 SL AL SL AL SL AL SL AL SL AL SL AL SL AL SL AL SL AL SL AL SL AL SL AL SL AL SL AL SL AL SL AL 2.0.27.69 O 25 10.46 12.5.5.15 8.32.7.5.60. 1 .53.5.s.24. 16.5.7.6.20.2.22.3.28 0. 33 5.6. U). 17.3 22.19 8.3.7. 1.25.6.13.45 5.9.36 8. 11.54 5’5.2.15 S. IS 7.0.6. 14. 2 .8.38 3 S.3.1. 17.28.s. 16.43.30.37 S.
I8 1 CHAPTER 4 TABLE 4.15 Data Obtained in a Study of the Effectof Postnatal Depressionof the Motheron a Child's Cognitive Development sa of Mother Depression Child IQ PS VS QS HuSband's Psychiatric History 1 2 3 4 5 ND ND 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 ND ND D ND ND ND ND ND ND D D ND D 103 124 124 104 96 92 124 99 92 116 99 56 65 67 52 46 46 50 55 64 61 52 42 61 63 44 46 58 55 43 68 46 43 65 50 46 ND ND ND ND ND ND ND D ND D ND ND ND D ND D ND ND ND 22 81 117 100 89 l25 l27 112 48 139 118 107 106 129 117 123 118 84 61 58 22 47 68 54 48 64 64 50 41 58 45 50 55 51 50 38 53 47 41 66 23 68 58 45 46 68 57 20 75 64 22 41 63 53 46 67 71 64 25 64 63 43 64 64 58 57 70 71 64 64 61 58 53 67 63 61 60 56 41 66 37 52 16 35 36 37 38 39 ND ND ND ND ND 117 102 141 124 110 98 109 48 66 60 50 54 64 120 l27 103 71 67 55 77 61 47 48 53 68 52 43 61 52 69 58 52 48 50 63 59 48 No history No history No history No history No history No history No history No history No history No history No history History History No history History No history No history History No history No history No history No history History No history No history No history History No history History No history No history No history No history No history No history No history No history No history No history (Continued) .
15 119 (Continued) s x of e Mother Depression Child IQ PS VS QS Husband's Psychiatric History 40 41 42 43 44 46 45 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 ND ND ND ND ND ND ND ND ND ND ND ND ND ND ND ND ND ND ND ND ND D ND ND D ND ND ND ND ND ND ND ND D ND 118 117 115 119 117 92 101 119 144 65 60 52 63 63 45 53 59 78 66 64 57 119 127 113 l27 103 128 86 112 115 117 110 67 61 60 67 62 56 42 48 65 67 54 63 54 59 50 99 57 70 45 62 59 48 58 54 61 48 55 58 59 55 52 59 65 62 67 58 68 48 60 28 No history No history No history 139 117 96 111 118 126 126 89 70 64 12 0 ND ND ND ND ND 134 93 115 99 99 122 106 124 52 54 58 66 69 49 56 74 47 55 50 58 55 49 66 65 45 51 58 68 48 61 72 57 45 60 62 64 66 36 49 68 46 60 59 62 41 47 66 51 47 50 62 67 55 36 50 59 45 60 54 61 48 44 64 100 114 43 61 54 58 56 56 47 74 59 68 48 55 No history No history No history No history No history No history No history No history No history No history No history No history No history No history No his~~ry No history No history No history No history No history No history No history No history No history No history No history No history No history No history No history History No history No history History No history No history No history (Continued) .ANALYSIS OF VARIANCE 1: FACTORIAL DESIGNS 1 TABLE 4.
are no longer independent, as they are in a balanced design. For example, the 2 × 2 table of counts of children of depressed (D) and nondepressed (ND) mothers and "well" and "ill" fathers for the data in Table 4.15 is as follows.

                 Father
Mother       Well       Ill       Total
ND           75 (70)    4 (8)      79
D             9 (14)    6 (2)      15
Total        84         10         94

The numbers in parentheses show the expected number of observations (rounded to the nearest whole number) under the hypothesis that the mother's state is independent of the father's state. (Review the coverage of the chi-square test given in your introductory course, and see Chapter 9, for details of how expected values are calculated.) Clearly there are more depressed mothers whose husbands are psychiatrically ill than expected under independence.

The result of this dependence of the factor variables is that it is no longer possible to partition the total variation in the response variable into nonoverlapping (or orthogonal) sums of squares representing factor main effects and interactions.
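The expected counts in the table above are each row total times the column total, divided by the grand total. A minimal sketch in plain Python, using the observed counts just given (unrounded, the four expected values come out near 70.6, 8.4, 13.4, and 1.6):

```python
def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

observed = [[75, 4],   # nondepressed mothers: father well, father ill
            [9, 6]]    # depressed mothers:    father well, father ill
expected = expected_counts(observed)
```

Note that the expected counts preserve the observed row and column totals, which is a quick way to check the computation by hand.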
In an unbalanced two-way design, for example, there is a proportion of the variance of the response variable that can be attributed to (explained by) either factor A or factor B. A consequence is that A and B together explain less of the variation in the dependent variable than the sum of what each explains alone. The result is that the sum of squares attributable to a factor depends on which other factors have already been allocated a sum of squares; in other words, the sums of squares of factors depend on the order in which they are considered. (This point will be taken up in more detail in the discussion of regression analysis in Chapter 6.)

The dependence between the factor variables in an unbalanced factorial design, and the consequent lack of uniqueness in partitioning the variation in the response variable, has led to a great deal of confusion about what is the most appropriate analysis of such designs. The issues are not straightforward, and even statisticians (yes, even statisticians!) are not wholly agreed on the most suitable method of analysis for all situations, as is witnessed by the discussion following the papers of Nelder (1977) and Aitkin (1978). Basically, the discussion over the analysis of unbalanced factorial designs has involved the question of whether what are called sequential, or Type I, sums of squares or unique, or Type III, sums of squares should be used.

Type I Sums of Squares (Sequential Sums of Squares)

Here the sums of squares come from fitting the terms in the model sequentially. In terms of a two-way design with factors A and B, these sums of squares are as follows:

sum of squares for A;
sum of squares for B after A (usually denoted B|A);
sum of squares for AB after A and B (AB|A, B).

To establish a suitable model for the data (see later), we would need to calculate both the sums of squares above and those with the order of A and B reversed:

sum of squares for B;
sum of squares for A after B (A|B);
sum of squares for AB after B and A.

(The interaction sum of squares would be identical in both cases.)
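The order dependence of sequential sums of squares is easy to demonstrate numerically. The sketch below, in plain Python with invented toy data (not any data set from this chapter), fits the single-factor and additive two-factor models by least squares, the additive fit via simple backfitting, and compares A ignoring B with A adjusted for B. In an unbalanced layout the two differ, although the total explained by a full sequential decomposition is the same whichever factor comes first.

```python
def sse_oneway(y, factor):
    """Residual sum of squares after fitting a single factor (cell means)."""
    sse = 0.0
    for level in set(factor):
        group = [y[i] for i in range(len(y)) if factor[i] == level]
        mean = sum(group) / len(group)
        sse += sum((v - mean) ** 2 for v in group)
    return sse

def sse_additive(y, A, B, sweeps=500):
    """Residual SS of the additive model y = mu + a_i + b_j, fitted by
    Gauss-Seidel backfitting (coordinate descent on the least-squares fit)."""
    mu = sum(y) / len(y)
    a = dict.fromkeys(A, 0.0)
    b = dict.fromkeys(B, 0.0)
    for _ in range(sweeps):
        for level in a:
            idx = [i for i in range(len(y)) if A[i] == level]
            a[level] = sum(y[i] - mu - b[B[i]] for i in idx) / len(idx)
        for level in b:
            idx = [i for i in range(len(y)) if B[i] == level]
            b[level] = sum(y[i] - mu - a[A[i]] for i in idx) / len(idx)
    return sum((y[i] - mu - a[A[i]] - b[B[i]]) ** 2 for i in range(len(y)))

# Unbalanced two-way layout with unequal cell counts (toy data)
y = [1.0, 2.0, 3.0, 5.0, 2.0, 6.0, 7.0, 8.0]
A = [0, 0, 0, 0, 1, 1, 1, 1]
B = [0, 0, 0, 1, 0, 1, 1, 1]

grand_mean = sum(y) / len(y)
sst = sum((v - grand_mean) ** 2 for v in y)

ss_a = sst - sse_oneway(y, A)                            # A ignoring B
ss_b = sst - sse_oneway(y, B)                            # B ignoring A
ss_a_after_b = sse_oneway(y, B) - sse_additive(y, A, B)  # A|B
ss_b_after_a = sse_oneway(y, A) - sse_additive(y, A, B)  # B|A
```

Here ss_a is 18.0 but ss_a_after_b is only 1.5, yet the two orders agree on the total explained: ss_a + ss_b_after_a equals ss_b + ss_a_after_b.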
Type III Sums of Squares (Unique Sums of Squares)

Here the sums of squares come from fitting each term in the model after all other terms; in this sense the sums of squares are said to be unique to the particular term. Thus in our two-way design we would have

sum of squares for A = A|B, AB;
sum of squares for B = B|A, AB;
sum of squares for AB = AB|A, B.

(In a balanced design, Type I and Type III sums of squares are identical.)

Which type should be used? Well, if you take as your authority Professor David Howell in his best-selling textbook Statistical Methods for Psychology, then there is no doubt: you should always use Type III sums of squares and never Type I. In Maxwell and Delaney's Designing Experiments and Analyzing Data the same recommendation appears, although somewhat less strongly. However, in the papers of Nelder and Aitkin referred to earlier, adjusting main effects for the interaction, as is done in Type III sums of squares, is severely criticized on both theoretical and pragmatic grounds. The arguments are relatively subtle, but in essence they go something like this.

1. When models are fitted to data, the principle of parsimony is of critical importance. In choosing among possible models, we do not adopt complex models for which there is no empirical evidence.
2. Thus, in deciding on possible models for the data, we do not include the interaction term unless it has been shown to be necessary; additivity of A and B is assumed unless there is convincing evidence to the contrary.
3. So, if there is no convincing evidence of an AB interaction, we do not retain the term in the model, and a Type III sum of squares for A, in which A is adjusted for AB as well as for B, makes no sense.
4. If the interaction term is necessary in the model, then the experimenter will usually wish to consider simple effects of A at each level of B separately (see the biofeedback example). A test of the hypothesis of no A main effect would not usually be carried out if the AB interaction is significant; tests on main effects involved in the interaction are either not carried out or, if carried out, not interpreted.
5. If the AB interaction is not significant, then adjusting for it is of no interest and causes a substantial loss of power in testing the A and B main effects.

(The issue does not arise so clearly in the balanced case, in which the sum of squares for A, say, is independent of whether interaction is assumed or not.) How the respective sums of squares are calculated will become clear in Chapter 6.

The arguments of Nelder and Aitkin against the use of Type III sums of squares are persuasive and powerful. Their recommendation to use Type I sums of squares, perhaps considering effects in a number of orders, as the most suitable way in which to identify an appropriate model for a data set, is also convincing and is strongly endorsed by this author.

Let us now consider the analysis of the two unbalanced data sets introduced earlier, beginning with the one involving postnatal depression and child's IQ. We shall consider only the factors of mother's mental state and husband's mental state here. The results of two analyses using Type I sums of squares are shown in Table 4.16, and a plot of average IQ in the four cells of the design is shown in Figure 4.5. (Here we have not graphed the data in any way prior to our analysis; this may have been a mistake! See Exercise 4.5.)

TABLE 4.16
ANOVA of Postnatal Depression Data

Order 1
Source               DF    SS          MS         F       P
Mother                1     2332.21    2332.21    9.29    .003
Husband               1     1711.58    1711.58    6.82    .011
Mother × Husband      1      508.44     508.44    2.03    .16
Error                90    22583.29     250.93

Order 2
Source               DF    SS          MS         F       P
Husband               1     2526.93    2526.93   10.07    .002
Mother                1     1516.86    1516.86    6.04    .016
Husband × Mother      1      508.44     508.44    2.03    .16
Error                90    22583.29     250.93

It appears that the effect on a child's IQ of a father with a psychiatric history is small if the mother is not depressed, but that when the mother is depressed there is a substantially lower IQ.

The Australian data given in Table 4.14 are complex, and here we shall only scratch the surface of the possible analyses. (A very detailed discussion of the analysis of these data is given in Aitkin, 1978.) An analysis of variance table for a particular order of effects is given in Table 4.17. The significant origin × sex × type interaction is the first term that requires explanation; in establishing a suitable model for the data, it is this interaction term that is the key. Here this task is left to readers (see Exercise 4.7).
FIG. 4.5. Interaction plot for the postnatal depression data: average child's IQ for nondepressed and depressed mothers, with separate lines for husbands with and without a psychiatric history.

In Table 4.18 is the default SPSS analysis of variance table for these data, which gives Type III sums of squares, inappropriately for the reasons given above. Look, for example, at the origin line in this table and compare it with that in Table 4.17. The difference is large: adjusting the origin effect for the large number of mostly nonsignificant interactions has dramatically reduced the origin sum of squares and the associated F test value. Only the values in the lines corresponding to the origin × sex × grade × type effect and the error term are common to Tables 4.17 and 4.18.
03 298.58 .02 2.45 221.39 .98 .46 349.49 116.72 .67 0.77 .44 1050.03 .92 0.26 129.60 .19 1.43 177.45 3.71 0.44 3152.57 144.ANALYSIS OF VARIANCE II: FACTORIAL DESIGNS TABLE 4.4 3 .06 1.72 349.22 MS F ~ P Qpe Origin Sex Grade l 1 3 1 Origin x Sex Origin x Grade Sex x Grade Origin x Q e p Sex x Qpe Grade x Q e p Origin x Sex x Grade Le Origin x Sex x ? p Origin x Grade x ')p I.93 666.35 0.12 76.66 537.77 129.19 .69 378.16 .762 0.79 1998.82 1361.09 1827.83 137.06 53.0 9 3.21 .09 609.104 3.40 <.89 3.4 l .21 0.53 1019.18 .86 .98 1612.81 177.18 Default ANOVA Table Given bySPSS for Data on Australian School Children Sourre DF 1 I ss 228.46 0.16 0.38 4.60 23527.04 .30 3.77 3.85 1.49 2171.54 824.62 326.M .12 76.52 2.12 34.m .21 .69 .69 342.56 126.02 679.82 73.57 144.27 1S5 0.28 349.16 116.79 0.85 824.22 MS F P Origin Sex Grade m x Sex Origin Origin x Grade origin x Q e p Sex x Grade x Q e p Sex x Grade Origin x Sex x Grade Origin x Grade x Qpe Sex x Grade x Q e p Origin x Sex x Grade x Q e p Error 3 1 1 3 1 m 1 2 3 3 3 3 3 122 228.37 11.60 23527.00 3.53 192.85 13.762 1.53 .47 349.69 342.e Sex x Grade x Type Origin x Sex x Grade x Q e p 1 3 3 1 I 3 3 1 Error 3 3 3 122 2637.95 0.75 0.W1 .68 1.86 2038.82 453.65 0.04 .61 .91 3.72 0.47 412.03 895.53 192.28 0.02 0.019 .53 339.60 .60 .75 5.49 723.013 .17 Analysis of Data on Australian Children 125 Order I Source DF ss 2637.6 1 TABLE 4.08 53.19 55 .87 980.
4.7. ANALYSIS OF COVARIANCE IN FACTORIAL DESIGNS

Analysis of covariance in the context of a one-way design was introduced in Chapter 3. The technique can be generalized to factorial designs, and in this section an example of its use in a 3 × 3 two-way design will be illustrated by using data from Howell (1992). The data are given in Table 4.19 and arise from an experiment in which subjects performed either a pattern recognition task, a cognitive task, or a driving simulation task, under three smoking conditions: active smoking (smoked during or just before the task); delayed smoking (not smoked for 3 hours); and nonsmoking (nonsmokers). The dependent variable was the number of errors on the task. Each subject was also given a distractibility score; higher scores indicate a greater possibility of being distracted.

TABLE 4.19
Smoking and Performance Data

Pattern Recognition
NS Errors Distract 12 8 9 107 123 133 12 7 1 4 101 75 138 8 10 83 94 7 10 9 86 117 112 11 8 10 10 8 8 130 111 102 120 134 118 97 11 10 DS Errors Distract AS Errors Distract

Cognitive Task
4 8 1 1 1 6 1 7 5 6 9 6 6 94 138 126 127 124 100 103 120 91 138 7 1 6 88 118 8 9 1 9 7 1 6 1 9 1 1 2 2 1 2 1 8 8 1 0 64 135 130 106 l 2 3 117 124 141 95 98 95 103 134 119 123 NS Errors Distract DS Errors Distract AS Errors 27 34 19 20 56 35 23 37 4 30 4 42 34 19 49 126 154 113 87 125 130 103 139 85 131 98 107 107 96 143 48 29 34 6 113 100 114 74 18 63 9 54 28 71 60 54 51 25 49 76 162 80 118 99 146 132 135 111 106 96 21 44 Distract 34 65 55 33 42 54 108 191 112128 76 128 98 145107 142 61 75 38 61 51 32 47 144 131 110 132

Driving Simulation
NS Errors Distract DS Errors 110 14 2 2 15 5 96 114 125 102 112 137 168 0 17 9 14 16 15 3 9 15 13 109 111 137 106 117 101 116 5 1 1 Distract Errors Distract 7 0 93 108 102 3 130 83 2 6 0 1 2 1 7 100 123 131 6 1 1 14 4 5 1 6 3 103 101 99 116 81 103 78 139 102 AS 0 0 91 109 92 2 0 6 4 1 106 99 109 136 102 119 0 84 114 67 68 0 6 2 3

Note. AS, active smoking; DS, delayed smoking; NS, nonsmoking.

In his analysis of these data, Howell gives the ANOVA table shown in Table 4.20. This table arises from using SPSS and selecting UNIQUE sums of squares (the default). The arguments against such sums of squares are the same here as in the previous section, and a preferable analysis of these data involves Type I sums of squares, with the covariate ordered first and the group × task interaction ordered last. The results of this approach are shown in Table 4.21. There is strong evidence of a group × task interaction, which again we leave as an exercise for the reader to investigate in more detail. (It is this conflict between Type I and Type III sums of squares that led to the different analysis of covariance results given by SPSS and S-PLUS on the WISC data reported in the previous chapter.)
TABLE 4.20
ANCOVA of Pattern Recognition Experiment by Using Type III Sums of Squares

Source                 DF    SS          MS          F        P
Covariate (distract)    1     4644.88     4644.88    64.93    .000
Group                   2      563.26      281.63     3.94    .022
Task                    2    23870.49    11935.24   166.84    .000
Group × Task            4     1626.51      406.63     5.68    .000
Error                 125     8942.32       71.54
TABLE 4.21
ANCOVA of Pattern Recognition Data by Using Type I Sums of Squares with the Covariate Ordered First

Source           DF    SS          MS          F        P
Distract          1    10450.43    10450.43   146.08    <.001
Task              2    23726.78    11863.39   165.83    <.001
Group             2      585.88      292.94     4.10     .019
Group × Task      4     1626.51      406.63     5.68    <.001
Error           125     8942.32       71.54
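The entries of an ANOVA table such as Table 4.21 are linked by simple arithmetic: each mean square is its sum of squares divided by its degrees of freedom, and each F statistic is the effect mean square divided by the error mean square. A quick consistency check in plain Python on the Table 4.21 figures (small rounding differences aside):

```python
# (source, degrees of freedom, sum of squares) rows from Table 4.21
rows = [
    ("distract", 1, 10450.43),
    ("task", 2, 23726.78),
    ("group", 2, 585.88),
    ("group x task", 4, 1626.51),
]
error_df, error_ss = 125, 8942.32
error_ms = error_ss / error_df                 # about 71.54

# Reconstructed F statistic for each effect
f_stats = {name: (ss / df) / error_ms for name, df, ss in rows}
```

Every reconstructed F agrees with the tabled value to within rounding, a useful sanity check whenever an ANOVA table has been transcribed by hand.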
4.8. MULTIVARIATE ANALYSIS OF VARIANCE FOR FACTORIAL DESIGNS

In the study of the effect of a mother's postnatal depression on child development, a number of cognitive variables in addition to IQ were recorded for each child at age 4; they are given in Table 4.15. Separate analyses of variance could be carried out on these variables (using Type I sums of squares), but in this case the investigator may genuinely be more interested in answering questions about the set of three variables. Such an approach might be sensible if the variables were to be regarded as indicators of some more fundamental concept not directly measurable, a so-called latent variable. The required analysis is now a multivariate analysis of variance, as described in the previous chapter. The results are shown in Table 4.22. (Here we have included sex of child as an extra factor.) Note that because the design is unbalanced, the order of effects is important. Also note that because each term in the table has only a single degree of freedom, the four possible multivariate test statistics defined in the glossary all lead to the same approximate F values and the same p values; consequently, only the value of the Pillai test statistic is given in Table 4.22.

For this set of cognitive variables, only the mother's mental state is significant. The means of the three variables for depressed and nondepressed mothers are as follows.

Mother    PS      VS      QS
D         50.9    54.8    51.9
ND        53.3    57.1    55.3

The children of depressed mothers have lower average scores on each of the three cognitive variables.
TABLE 4.22
MANOVA of Cognitive Variables from a Postnatal Depression Study

Source                    Pillai Statistic    Approx. F    DF1    DF2    P
Mother                    0.09                2.83         3      84     .04
Husband                   0.05                1.45         3      84     .23
Sex                       0.05                1.42         3      84     .24
Mother × Husband          0.07                2.25         3      84     .09
Mother × Sex              0.01                0.31         3      84     .82
Husband × Sex             0.002               0.07         3      84     .98
Mother × Husband × Sex    0.04                1.11         3      84     .35
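Because every hypothesis in Table 4.22 has a single degree of freedom, the Pillai statistic V has an exact F equivalent: F = [V/(1 − V)] × (νe − p + 1)/p, on p and νe − p + 1 degrees of freedom, where p is the number of response variables (3 here) and νe the error degrees of freedom (86 here, giving DF2 = 84). A sketch in plain Python; because the tabled V values are rounded to two decimal places, the reproduced F values only approximate those printed in Table 4.22.

```python
def pillai_to_f(v, p, error_df):
    """Exact F transform of the Pillai trace for a single-df hypothesis."""
    df1 = p
    df2 = error_df - p + 1
    return (v / (1.0 - v)) * df2 / df1, df1, df2

# Mother main effect: tabled Pillai statistic 0.09
f_mother, df1, df2 = pillai_to_f(0.09, p=3, error_df=86)
```

The tabled F for the mother effect is 2.83, against roughly 2.77 from the rounded V; the gap comes from the rounding of V, not from the formula.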
4.9. SUMMARY

1. Factorial designs allow interactions between factors to be studied.
2. The factors in factorial designs can be assumed to have random or fixed effects. The latter are generally more applicable to psychological experiments, but random effects will be of more importance in the next chapter and in Chapter 7.
3. Planned comparison and multiple comparison tests can be applied to factorial designs in a similar way to that described for a one-way design in Chapter 3.
4. Unbalanced factorial designs require care in their analysis. It is important to be aware of what procedure for calculating sums of squares is being employed by whatever statistical software is to be used.

COMPUTER HINTS

Essentially the same hints apply as those given in Chapter 3. The most important thing to keep a lookout for is the difference in how the various packages deal with unbalanced factorial designs. Those such as SPSS that give Type III sums of squares as their default should be used with caution, and then only after specifically requesting Type I sums of squares and then considering main effects before interactions, first-order interactions before second-order interactions, and so on. When S-PLUS is used for the analysis of variance of unbalanced designs, only Type I sums of squares are provided.
EXERCISES

4.1. Reexamine and, if you consider it necessary, reanalyze the rat data in the light of the differing cell standard deviations and the nonnormality of the residuals, as reported in the text.

4.2. Using the slimming data in Table 4.9, construct the appropriate 95% confidence interval for the difference in weight change between novice and experienced slimmers when members of both groups have access to a slimming manual.

4.3. In the drug, biofeedback, and diet example, carry out separate two-way analyses of variance of the data corresponding to each drug. What conclusions do you reach from your analyses?

4.4. In the postnatal depression data, separately analyze each of the three variables used in the MANOVA in Section 4.8. Comment on the results.

4.5. Produce box plots of the IQ scores in the postnatal depression data for the four cells in the 2 × 2 design with factors of father's psychiatric history and
Display 4.4
ANCOVA Model for a Two-Way Design with Factors A and B

In general terms the model is

    observed response = mean + factor A effect + factor B effect
                        + AB interaction effect + covariate effect + error.

More specifically, the analysis of covariance model for a two-way design is

    yijk = μ + αi + βj + γij + β(xijk − x̄) + εijk,

where yijk is the kth observed value of the response variable in the ijth cell of the design, and xijk is the corresponding value of the covariate, which has grand mean x̄. The regression coefficient linking the response and covariate variables is β. The other terms correspond to those described in Display 4.1.
The adjusted value of an observation is given by

    adjusted value = observed response value
                     + estimated regression coefficient
                       × (grand mean of covariate − observed covariate value).
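The adjusted value defined at the end of Display 4.4 just slides each response along the fitted regression line to the covariate grand mean. A minimal sketch in plain Python; the responses, covariate values, and slope below are invented for illustration (in practice the slope comes from the ANCOVA fit itself):

```python
def adjusted_values(y, x, beta_hat):
    """Covariate-adjusted responses: y + beta_hat * (x_bar - x)."""
    x_bar = sum(x) / len(x)
    return [yi + beta_hat * (x_bar - xi) for yi, xi in zip(y, x)]

y = [12.0, 8.0, 15.0, 9.0]        # invented responses
x = [110.0, 96.0, 130.0, 100.0]   # invented covariate values
adj = adjusted_values(y, x, beta_hat=0.2)
```

Note that the adjustment leaves the overall mean of the responses unchanged; only their relative positions shift.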
TABLE 4.23
Blood Pressure, Family History, and Smoking Status
Smoking Status
Family History
Nonsmoker
Exsmoker
Current Smoker
Yes
No
15 2 156 103 129 110 128 135 114 110 91 136 105 125 103 110
114 107
134 140 120 115 128 105
135 120 123 113 145 120 10 4
165
110
125
90
123 108 113 10 6
TABLE 4.24
Data from a Trial of Estrogen Patches in the Treatment of Postnatal Depression
Treatment    Baseline 1
Baseline 2
Depression Score
Placebo
18 2 5 2 4 19 22 21 21 26 20
24 24
18 27 17 15 20 28 16 26 19 20 22 27 15 28 18 20 21 24 25 22 26
15 10 12 5 5 9 11 13 6 18
10
Active
27 19 25 19 21 21 25 25 15 27
7 8 2 6 11 5 11 6 6 10
mother's depression. Do these plots give you any concerns about the results of the analysis reported in Table 4.16? If so, reanalyze the data as you see fit.

4.6. Consider different orders of main effects in the multivariate analysis described in Section 4.8.

4.7. Plot some suitable graphs to aid in the interpretation of the origin × sex × type interaction found in the analysis of the data on Australian school children.

4.8. Graphically explore the relationship between errors and the distraction measure in the smoking and performance data in Table 4.19. Do your graphs cause you any concern about the validity of the ANCOVA reported in Section 4.7?

4.9. Use the formula given in Display 4.4 to calculate the "adjusted" values of the observations for the pattern recognition experiment. Using these values, plot a suitable diagram to aid in the interpretation of the significant group × task interaction.
4.10. The data in Table 4.23 (taken from Boniface, 1995) were collected during a survey of systolic blood pressure of individuals classed according to smoking status and family history of circulation and heart problems. Carry out an analysis of variance of the data and state your conclusions. Examine the residuals from fitting what you consider the most suitable model, and use them to assess the assumptions of your analysis of variance.

4.11. In the model for a balanced two-way fixed effects design (see Display 4.1), suggest sensible estimators for the main effect and interaction parameters.

4.12. The observations in Table 4.24 are part of the data collected in a clinical trial of the use of estrogen patches in the treatment of postnatal depression. Carry out an analysis of variance of the posttreatment measure of depression, using both pretreatment values as covariates.
5

Analysis of Repeated Measure Designs

5.1. INTRODUCTION

Many studies undertaken in the behavioral sciences and related disciplines involve recording the value of a response variable for each subject under more than one condition, on more than one occasion, or both. The subjects may often be arranged in different groups, either those naturally occurring, such as gender, or those formed by random allocation, for example, when competing therapies are assessed. Such a design is generally referred to as involving repeated measures. Three examples will help in getting a clearer picture of a repeated measures type of study.

1. Visual acuity data. This example is one already introduced in Chapter 2, involving response times of subjects using their right and left eyes when light was flashed through lenses of different powers. The data are given in Table 2.3. Here there are two within subject factors, eye and lens strength.

2. Field dependence and a reverse Stroop task. Subjects selected randomly from a large group of potential subjects identified as having field-independent or field-dependent cognitive style were required to read two types of words (color and form names) under three cue conditions: normal, congruent, and incongruent.
Here there two within subjects factors.1.1 Field Independence and a Reverse Stmop T s ak bl Form Subject Cl h Color c3 0 Cl 0 c2 (C) 0 c2 (C) c3 0 182 176 1 219 206 191 2 175 186 183 3 166 165 190 4 210 5185 182 171 187 179 6 182 171 175 183 174 187 168 7 8 185 186 9 189 10 191 208 192 11 162 168 163 12 170 162 a2 (Field Dependenf) a (Ficld Independent) 1 161 156 148 161 138 212 178 174 167 153 173 168 135 142 146 185 150 184 210 178 169 159 185 201 183 177 187 169 141 147 I60 145 151 267 216 277 235 400 183 165 13 14 15 16 17 18 19 20 21 22 23 2 4 216 150 223 162 163 172 159 237 205 140 150 214 404 215 179 159 233 177 190 186 164 140 146 184 1 4 4 165 156 192 189 170 143 150 238 207 228 225 217 205 230208 211 144 155 187 139 151 271 165 379 187 161 183 140 156 163 148 177 163 Note. cognitive style. congruent. type and cue. The dependent variable was the timein milliseconds taken to read the stimulus are are words. The data shown in Table 5. one with severe dependence and one with moderate dependence on alcohol. had their (in salsolinol excretion levels millimoles) recordedon four consecutive days (for in chemistry.normal.C.W O groups of subjects. incongruent. Alcohol dependenceand salsolinol excretion. Response variable t m in milliseconds.N. and one between subjects factor. salsolinol an alkaloid is those readers without the necessary expertise . is i e I. 3.134 CHAPTER 5 TABLE 5.
The data are given in Table 5.2.

[TABLE 5.2. Salsolinol Excretion Rates (mmol) for Moderately and Severely Dependent Alcoholic Patients: the excretion level on each of days 1 to 4 for each subject in Group 1 (moderate dependence) and Group 2 (severe dependence); the individual entries are not reproduced here.]

Here there is a single within subject factor, time, and one between subjects factor, level of dependence. Primary interest centers on whether the two groups behaved differently over time.

Researchers typically adopt the repeated measures paradigm as a means of reducing error variability and/or as the natural way of measuring certain phenomena (e.g., developmental changes over time, and learning and memory tasks). In this type of design, effects of the experimental factors giving rise to the repeated measures are assessed relative to the average response made by a subject on all conditions or occasions. In essence, each subject serves as his or her own control, and, accordingly, variability caused by differences in the average responsiveness of the subjects is eliminated from the extraneous error variance. A consequence of this is that the power to detect the effects of within subject experimental factors is increased compared with testing in a between subjects design.
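The gain from letting each subject serve as his or her own control can be illustrated with a small simulation. The numbers below are hypothetical and are not taken from any of the data sets in this chapter; they simply show how differencing removes stable subject-to-subject variability from the error term.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 30  # hypothetical number of subjects

# Each subject has a large, stable level of responsiveness ...
subject = rng.normal(0, 10, n)          # between-subject variability
effect = 2.0                            # small true condition effect
cond_a = subject + rng.normal(0, 2, n)  # ... plus measurement noise
cond_b = subject + effect + rng.normal(0, 2, n)

# Between-subjects comparison: subject variability inflates the error term
se_between = np.sqrt(cond_a.var(ddof=1) / n + cond_b.var(ddof=1) / n)

# Within-subjects comparison: differencing eliminates the subject term
se_within = np.sqrt((cond_b - cond_a).var(ddof=1) / n)

print(se_between, se_within)  # the within-subject standard error is far smaller
```

Because the subject effects cancel in the paired differences, the same condition effect is estimated against a much smaller standard error, which is exactly the power advantage described above.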
The common feature of the three examples introduced in this section is that the subjects have the response variable recorded more than once. In the visual acuity example, the repeated measures arise from observing subjects under two different factors, the within subject factors. In the field dependence example, the subjects can in addition be divided into two groups: the two levels of a between group factor. In the third example, where time is the single within subject factor, subjects are simply observed over time; this type of study is rather different from other repeated measures designs, and such designs are often given a different label, longitudinal designs. Because of their different nature, we shall consider them in a separate chapter (Chapter 7).

In both of the first two examples it is possible (indeed likely) that the conditions under which a subject is observed are given in a random order. In a longitudinal design, of course, randomization of the repeated measures is not an option. It would be quite possible to use different groups of subjects for each condition combination, giving rise to a factorial design as discussed in the previous chapter. This, however, would be to overlook the fundamental difference that in a repeated measures design the same subjects are observed under the levels of at least some of the factors of interest, rather than a different sample, as required in a factorial ANOVA design. The primary purpose of using a repeated measures design in such cases is the control that the approach provides over individual differences between subjects. In the area of the behavioral sciences, such differences are often quite large relative to the differences produced by manipulation of the experimental conditions or treatments that the investigator is trying to evaluate. In other examples of repeated measures, such as longitudinal studies, the repeated measures on the same subject are a necessary part of the design.

Unfortunately, the advantages of a repeated measures design come at a cost, and that is the probable lack of independence of the repeated measurements. Observations made under different conditions involving the same subjects will very likely be correlated rather than independent. You will remember that in Chapters 3 and 4 the independence of the observations was one of the assumptions underlying the techniques described, so it should come as no great surprise that accounting for the dependence between observations in repeated measure designs brings about some problems in the analysis of such designs, as we shall see later.

5.2. ANALYSIS OF VARIANCE FOR REPEATED MEASURE DESIGNS

The structure of the visual acuity study and the field dependence study is, superficially at least, similar to that of the factorial designs of Chapter 4, and it might be imagined that the analysis of variance procedures described there could again be applied here.
It is possible, however, to use relatively straightforward ANOVA procedures for repeated measures data if three particular assumptions about the observations are valid. They are as follows.

1. Normality: the data arise from populations with normal distributions.
2. Homogeneity of variance: the variances of the assumed normal distributions are required to be equal.
3. Sphericity: the variances of the differences between all pairs of repeated measurements are equal. This condition implies that the correlations between pairs of repeated measures are also equal, the so-called compound symmetry pattern.

We have encountered the first two requirements in previous chapters, and they need not be further discussed here. The sphericity condition has not been met before, however; it is particularly critical in the ANOVA of repeated measures data, and the condition will be discussed in detail in Section 5.3. For the moment, let us assume that this is the best of all possible worlds, in which data are always normal, variances of different populations are always equal, and repeated measures always meet the sphericity requirement.

5.2.1. Repeated Measures Model for Visual Acuity Data

The results of a repeated measures ANOVA for the visual acuity data are shown in Table 5.3. The model on which this analysis is based is described in Display 5.1. The model allows the repeated measurements to be correlated, but only to the extent of the compound symmetry pattern to be described in Section 5.3.

[TABLE 5.3. ANOVA of Visual Acuity Data, giving the sum of squares, degrees of freedom, mean square, F, and p for the sources Eye, Lens strength, Eye x Strength, and their error terms; the individual entries are not reproduced here.]

It appears that in this experiment only lens strength affects the response variable.
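As a sketch of how the entries of a table like Table 5.3 are assembled, the code below computes the sums of squares and F ratios for a design with two within subject factors, following the decomposition given in Display 5.1: each effect is tested against its own subject-by-effect interaction. The data generated here are artificial, not the visual acuity measurements.

```python
import numpy as np

def rm_anova_two_way(Y):
    """F ratios for a fully within-subjects two-factor design.

    Y has shape (n, a, b): n subjects, a levels of factor A, b of factor B.
    Each effect is tested against its subject-by-effect interaction."""
    n, a, b = Y.shape
    grand = Y.mean()
    subj = Y.mean(axis=(1, 2))   # subject means
    A = Y.mean(axis=(0, 2))      # factor A level means
    B = Y.mean(axis=(0, 1))      # factor B level means
    AB = Y.mean(axis=0)          # cell means, a x b
    SA = Y.mean(axis=2)          # subject x A means, n x a
    SB = Y.mean(axis=1)          # subject x B means, n x b

    ss_a = n * b * ((A - grand) ** 2).sum()
    ss_b = n * a * ((B - grand) ** 2).sum()
    ss_ab = n * ((AB - A[:, None] - B[None, :] + grand) ** 2).sum()
    ss_sa = b * ((SA - subj[:, None] - A[None, :] + grand) ** 2).sum()
    ss_sb = a * ((SB - subj[:, None] - B[None, :] + grand) ** 2).sum()
    ss_s = a * b * ((subj - grand) ** 2).sum()
    ss_sab = ((Y - grand) ** 2).sum() - (ss_s + ss_a + ss_b
                                         + ss_ab + ss_sa + ss_sb)

    f_a = (ss_a / (a - 1)) / (ss_sa / ((a - 1) * (n - 1)))
    f_b = (ss_b / (b - 1)) / (ss_sb / ((b - 1) * (n - 1)))
    f_ab = (ss_ab / ((a - 1) * (b - 1))) / (ss_sab / ((a - 1) * (b - 1) * (n - 1)))
    return f_a, f_b, f_ab

# Artificial data: 10 subjects, a 2-level factor A with a real effect,
# and a 3-level factor B with no effect
rng = np.random.default_rng(0)
Y = rng.normal(size=(10, 2, 3))
Y[:, 1, :] += 3.0
f_a, f_b, f_ab = rm_anova_two_way(Y)
print(f_a, f_b, f_ab)  # f_a is large; f_b and f_ab are unremarkable
```

The p values would then be obtained from F distributions with the degrees of freedom shown in the table of Display 5.1.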
Display 5.1. Repeated Measures ANOVA Model for a Two-Factor Design (A and B) with Repeated Measures on Both Factors

Let y_ijk represent the observation on the ith subject for factor A level j and factor B level k. The model for the observations is

  y_ijk = µ + α_j + β_k + γ_jk + τ_i + (τα)_ij + (τβ)_ik + (τγ)_ijk + ε_ijk,

where α_j represents the effect of the jth level of factor A (with a levels), β_k is the effect of the kth level of factor B (with b levels), and γ_jk is the interaction effect of the two factors. The term τ_i is a constant associated with subject i, and (τα)_ij, (τβ)_ik, and (τγ)_ijk represent interaction effects of subject i with each factor and with their interaction. Finally, the ε_ijk represent random error terms. The factor A (α_j), factor B (β_k), and AB interaction (γ_jk) terms are assumed to be fixed effects, but the subject and error terms are assumed to be random variables from normal distributions with zero means and variances specific to each term. This is an example of a mixed model. Correlations between the repeated measures arise from the subject terms, because these apply to each of the measurements made on the subject. As usual, we assume that there are n subjects in the study (see Maxwell and Delaney, 1990, for full details). The analysis of variance table is as follows:

  Source                    SS     DF                      MS     MSR (F)
  A                         ASS    a - 1                   AMS    AMS/EMS1
  A x Subjects (error 1)    ESS1   (a - 1)(n - 1)          EMS1
  B                         BSS    b - 1                   BMS    BMS/EMS2
  B x Subjects (error 2)    ESS2   (b - 1)(n - 1)          EMS2
  AB                        ABSS   (a - 1)(b - 1)          ABMS   ABMS/EMS3
  AB x Subjects (error 3)   ESS3   (a - 1)(b - 1)(n - 1)   EMS3

Here it is useful to give specific expressions both for the various sums of squares above and for the terms that the corresponding mean squares are estimating (the expected mean squares), although the information looks a little daunting! (Continued)
Display 5.1 (Continued)

Here S represents subjects, and the various sigma terms are the variances of the random effects in the model, identified by an obvious correspondence to these terms. F tests of the various effects involve different error terms, essentially for the same reason as discussed in Chapter 4 with respect to a simple random effects model. For example, under the hypothesis of no factor A effect, the mean square for A and the mean square for A x Subjects are both estimators of the same variance, and thus the ratio of these two mean squares provides an F test of the hypothesis that the populations corresponding to the levels of A have the same mean. If, however, the factor effects α_j are not zero, the corresponding term is greater than zero and the mean square for A estimates a larger variance than the mean square for A x Subjects. The same applies to the B and AB effects: the numerator mean squares estimate terms that are nonzero unless the corresponding null hypothesis, that the α_j, β_k, or γ_jk terms are all zero, is true. (Maxwell and Delaney, 1990, give an extended discussion of this point.)

The means for the four lens strengths, 6/60, 6/18, 6/36, and 6/6, are 112.23, 115.93, 118.05, and 115.10, respectively. It may be of interest to look in more detail at the differences in these means, because the lens strengths form some type of ordered scale (see Exercise 5.2).

5.2.2. Model for Field Dependence Example

A suitable model for the field dependence example is described in Display 5.2. Again the model allows the repeated measurements to be related, but only to the extent of having the compound symmetry pattern. (The effects of departures from this pattern will be discussed later.) Application of the model to the field dependence data gives the analysis of variance table shown in Table 5.4. The main effects of word type and cue are highly significant, but a detailed interpretation of the results is left as an exercise for the reader (see Exercise 5.1).

5.3. SPHERICITY AND COMPOUND SYMMETRY

The analyses of variance of the two repeated measures examples described above are valid only if the normality, homogeneity, and sphericity assumptions hold for the data. It is the latter that is of greatest importance in the repeated measures situation, because if the assumption is not satisfied the F tests used are positively biased, leading to an increase in the probability of rejecting the null hypothesis when it is true.
Display 5.2. Repeated Measures ANOVA Model for a Three-Factor Design (A, B, and C) with Repeated Measures on Two Factors (A and B)

Let y_ijkl represent the observation on the ith subject for factor A level j, factor B level k, and factor C level l. The model for the observations is

  y_ijkl = µ + α_j + β_k + γ_jk + δ_l + θ_jl + ω_kl + χ_jkl + τ_i(l) + (τα)_ij + (τβ)_ik + (τγ)_ijk + ε_ijkl,

where the terms are as in the model in Display 5.1, with the addition of parameters for the between subjects factor C (with c levels), δ_l, and for its interactions with the within subject factors and with their interaction (θ_jl, ω_kl, and χ_jkl). Again the factor A, factor B, and factor C terms are assumed to be fixed effects, but the subject and error terms are assumed to be random effects that are normally distributed with mean zero and particular variances. Note that there are no subject effects involving factor C, because subjects are not crossed with this factor: it is a between subjects factor. The analysis of variance table is as follows.

  Source                                      SS      DF                        MSR (F)
  Between subjects
    C                                         CSS     c - 1                     CMS/EMS1
    Subjects within C (error 1)               ESS1    n - c
  Within subjects
    A                                         ASS     a - 1                     AMS/EMS2
    C x A                                     CASS    (c - 1)(a - 1)            CAMS/EMS2
    C x A x Subjects within C (error 2)       ESS2    (n - c)(a - 1)
    B                                         BSS     b - 1                     BMS/EMS3
    C x B                                     CBSS    (c - 1)(b - 1)            CBMS/EMS3
    C x B x Subjects within C (error 3)       ESS3    (n - c)(b - 1)
    AB                                        ABSS    (a - 1)(b - 1)            ABMS/EMS4
    C x A x B                                 CABSS   (c - 1)(a - 1)(b - 1)     CABMS/EMS4
    C x A x B x Subjects within C (error 4)   ESS4    (n - c)(a - 1)(b - 1)

Here n is the total number of subjects; that is, n = n_1 + n_2 + ... + n_c, where n_l is the number of subjects in level l of the between subjects factor C.
[TABLE 5.4. ANOVA for Reverse Stroop Experiment, giving the sums of squares, degrees of freedom, mean squares, F, and p for the between subjects and within subjects sources; the individual entries are not reproduced here.]

That is, a positively biased F test increases the size of the Type I error over the nominal value set by the experimenter, and this will lead to an investigator's claiming a greater number of "significant" results than are actually justified by the data.

Sphericity implies that the variances of the differences between any pair of the repeated measurements are the same. In terms of the covariance matrix of the repeated measures, Σ, it requires the compound symmetry pattern; that is, the variances on the main diagonal must equal one another, and the covariances off the main diagonal must also all be the same. Specifically, the covariance matrix must have the following form:

  Σ = ( σ²    ρσ²   ...   ρσ²
        ρσ²   σ²    ...   ρσ²
        ...
        ρσ²   ρσ²   ...   σ²  ),      (5.1)

where σ² is the assumed common variance and ρ is the assumed common correlation coefficient of the repeated measures (see the glossary in Appendix A, where covariance matrices are defined). In designs with between subjects factors, this pattern must hold in each level of the between subject factors: quite an assumption! As we shall see in Chapter 7, departures from the sphericity assumption are very likely in the case of longitudinal data, but they are perhaps less likely to be so much of a problem in experimental situations in which the levels of the within subject factors are given in a random order to each subject in the study. But where departures are suspected, what can be done?
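Before turning to remedies, note that the two characterizations of sphericity given above, equal variances of all pairwise differences and the compound symmetry matrix of Equation (5.1), are easy to check numerically. The values of σ² and ρ below are arbitrary illustrations, not estimates from any of the chapter's data sets.

```python
import numpy as np

p, sigma2, rho = 4, 2.0, 0.5
# Compound symmetry: sigma^2 on the diagonal, rho * sigma^2 off it (Eq. 5.1)
Sigma = sigma2 * (rho * np.ones((p, p)) + (1 - rho) * np.eye(p))

# Var(x_r - x_s) = Var(x_r) + Var(x_s) - 2 Cov(x_r, x_s)
diff_vars = [Sigma[r, r] + Sigma[s, s] - 2 * Sigma[r, s]
             for r in range(p) for s in range(r + 1, p)]
print(diff_vars)  # every pairwise difference has variance 2*sigma2*(1-rho) = 2.0
```

All six pairwise differences have exactly the same variance, which is the sphericity property that the ANOVA F tests rely on.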
5.4. CORRECTION FACTORS IN THE ANALYSIS OF VARIANCE OF REPEATED MEASURE DESIGNS

Box (1954) and Greenhouse and Geisser (1959) considered the effects of departures from the sphericity assumption in a repeated measures ANOVA. They demonstrated that the extent to which a set of repeated measures data deviates from the sphericity assumption can be summarized in terms of a parameter, ε, which is a function of the variances and covariances of the repeated measures. (In some ways it is a pity that ε has become the established way to represent this correction term, as there is some danger of confusion with all the epsilons occurring as error terms in models. Try not to become too confused!) An estimate of this parameter can be used to decrease the degrees of freedom of F tests for the within subjects effects, to account for the departure from sphericity. In this way, larger F values will be needed to claim statistical significance than when the correction is not used. For those readers who enjoy wallowing in gory mathematical details, the formula for ε is given in Display 5.3.

Display 5.3. Correction Factor ε

The correction factor is given by

  ε = p²(σ̄_d − σ̄)² / [(p − 1)(Σ_r Σ_s σ²_rs − 2p Σ_r σ̄²_r. + p²σ̄²)],

where p is the number of repeated measures on each subject; σ_rs, r = 1, ..., p, s = 1, ..., p, represents the elements of the population covariance matrix of the repeated measures; σ̄_d is the mean of the elements of the main diagonal; σ̄ is the mean of all the elements of the covariance matrix; and σ̄_r. is the mean of the elements in row r. (Covariance matrices are defined in the glossary in Appendix A.)

When sphericity holds, ε takes its maximum value of one and the F tests need not be amended. The minimum value of ε is 1/(p − 1), where p is the number of repeated measures. Some authors, for example Greenhouse and Geisser, have suggested using this lower bound in all cases so as to avoid the need to estimate ε from the data (see below).
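The formula in Display 5.3 translates directly into code. As a check, a covariance matrix with the compound symmetry pattern should give ε = 1; the matrix used below is an arbitrary illustration, not one of the covariance matrices from the Stroop data.

```python
import numpy as np

def box_epsilon(S):
    """Correction factor of Display 5.3, computed from a p x p
    covariance matrix of the repeated measures."""
    p = S.shape[0]
    diag_mean = np.trace(S) / p      # mean of the main-diagonal elements
    grand_mean = S.mean()            # mean of all elements
    row_means = S.mean(axis=1)       # mean of each row
    num = p ** 2 * (diag_mean - grand_mean) ** 2
    den = (p - 1) * ((S ** 2).sum()
                     - 2 * p * (row_means ** 2).sum()
                     + p ** 2 * grand_mean ** 2)
    return num / den

# Under compound symmetry no correction is needed: epsilon = 1
p, rho = 4, 0.5
cs = 2.0 * (rho * np.ones((p, p)) + (1 - rho) * np.eye(p))
print(box_epsilon(cs))  # -> 1.0 (to floating-point accuracy)
```

Perturbing the diagonal of this matrix (so the variances of the pairwise differences are no longer equal) produces a value of ε below one, and for any covariance matrix the result stays between 1/(p − 1) and 1.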
Using the lower bound in this way removes the increased risk of falsely rejecting the null hypothesis. Such an approach is, however, very conservative; that is, it will too often fail to reject the null hypothesis when it is false. Consequently, it is preferable to estimate ε from the data, particularly now that most software packages will estimate ε routinely. In fact, there have been two suggestions for such estimation.

1. Greenhouse and Geisser (1959) suggest simply substituting sample values for the population quantities in Display 5.3.
2. Huynh and Feldt (1976) suggest taking the estimate of ε to be min(1, a/b), where

  a = ng(p − 1)ε̂ − 2  and  b = (p − 1)[g(n − 1) − (p − 1)ε̂];

here n is the number of subjects in each group, g is the number of groups, p is the number of repeated measures, and ε̂ is the Greenhouse and Geisser estimate.

To illustrate the use of the correction factor approach to repeated measures data, we shall again use the field dependence data in Table 5.1, but now we shall assume that the six repeated measures arise from measuring the response variable, time, under six different conditions, C1, C2, C3, C4, C5, and C6, given in a random order to each subject (an assumption that is not strictly true, but one that we shall ignore here; see Exercise 5.1). The scatterplot matrices of the observations in each group are shown in Figures 5.1 (field independent) and 5.2 (field dependent). Although there are not really sufficient observations to make convincing claims, there does appear to be some evidence of a lack of compound symmetry in the data. This is reinforced by a look at the variances of the differences between pairs of observations and at the covariance matrices of the repeated measures in each level of cognitive style, shown in Table 5.5. Clearly, there are large differences between the groups in terms of variances and covariances that may have implications for the analysis. Details of the calculation of both the Greenhouse and Geisser and the Huynh and Feldt estimates of ε are given in Table 5.6.
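As a sketch, the Huynh and Feldt adjustment just defined is simple to compute once the Greenhouse and Geisser estimate is available. The call below uses the design sizes of the Stroop example (g = 2 groups of n = 12 subjects, p = 6 repeated measures) and the Greenhouse and Geisser estimate quoted later in the text.

```python
def huynh_feldt(eps_gg, n, g, p):
    """Huynh-Feldt estimate min(1, a/b), computed from the
    Greenhouse-Geisser estimate eps_gg; n subjects per group,
    g groups, p repeated measures."""
    a = n * g * (p - 1) * eps_gg - 2
    b = (p - 1) * (g * (n - 1) - (p - 1) * eps_gg)
    return min(1.0, a / b)

# Stroop design: 2 groups of 12 subjects, 6 conditions, eps_gg = 0.2768
print(huynh_feldt(0.2768, n=12, g=2, p=6))  # approximately 0.303
```

The Huynh and Feldt value is always at least as large as the Greenhouse and Geisser estimate, and the min(1, ...) caps it at one, the value appropriate when sphericity holds.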
The results of the repeated measures ANOVA of the data, with and without using the correction factors, are given in Table 5.7. Here, use of the correction factors does not alter which effects are statistically significant, although the p values have changed considerably.

5.5. MULTIVARIATE ANALYSIS OF VARIANCE FOR REPEATED MEASURE DESIGNS

An alternative to the use of correction factors in the analysis of repeated measures data, when the sphericity assumption is judged to be inappropriate, is to use multivariate analysis of variance (MANOVA). This technique has already been considered briefly in Chapters 3 and 4, in the analysis of studies in which a series of different response variables are observed on each subject. However, the method can also be applied in the repeated measures situation, in which a single response variable is observed under a number of different conditions and/or at a number of different times.

The main advantage of using MANOVA for the analysis of repeated measures designs is that no assumptions now have to be made about the pattern of covariances between the repeated measures. In particular, these covariances need not satisfy the compound symmetry condition. A disadvantage of using MANOVA for repeated measures is often stated to be the technique's relatively low power when the assumption of compound symmetry is actually valid. However, Davidson (1972) compared the power of the univariate and multivariate analysis of variance approaches when compound symmetry holds.
[FIG. 5.1. Scatterplot matrix of the six repeated measures for the field-independent group.]
[FIG. 5.2. Scatterplot matrix of the six repeated measures for the field-dependent group.]
[TABLE 5.5. Covariance Matrices and Variances of Differences of Repeated Measures for Stroop Data: the variances of the differences between each pair of conditions, and the 6 x 6 covariance matrices of the repeated measures, for the field-dependent and field-independent groups; the individual entries are not reproduced here.]
Davidson's conclusion was that the multivariate approach is nearly as powerful as the univariate one when the number of observations exceeds the number of repeated measures by more than 20.

[TABLE 5.6. Calculating Correction Factors: the mean, S, of the two covariance matrices in Table 5.5 is formed, and its diagonal mean, overall mean, row means, and sums of squared elements are substituted into the formula of Display 5.3 to give the Greenhouse and Geisser estimate ε̂ = 0.2768; the Huynh and Feldt estimate, min(1, a/b), is 0.3029.]

To illustrate the use of the MANOVA approach to the analysis of repeated measures data, we shall apply it to the field dependence data, again assuming that the six repeated measures for each subject arise from measuring the response variable, time, under six conditions given in a random order. For the moment, forget the division of the data into the two groups, field independent and field dependent, and assume that the null hypothesis of interest is whether the population means of the response variable differ for the six conditions. The multivariate approach to testing this hypothesis uses a version of Hotelling's T² statistic, as described in Display 5.4. Numerical results for the Stroop data are given in Table 5.8; there is a highly significant effect of condition.
For these data, the Greenhouse and Geisser estimate of the correction factor is ε̂ = 0.2768, so that the adjusted degrees of freedom for the within subject tests are as follows.

  Condition: df1 = 0.2768 x 5 = 1.38, df2 = 0.2768 x 110 = 30.45.
  Group x Condition: df1 = 1.38, df2 = 30.45, p = .171.

The Huynh and Feldt estimate of ε is 0.3029, so that the corresponding adjusted values are these.

  Condition: df1 = 0.3029 x 5 = 1.51, df2 = 0.3029 x 110 = 33.32.
  Group x Condition: df1 = 1.51, df2 = 33.32, p = .168.

In both cases the condition effect remains highly significant.

[TABLE 5.7. ANOVA With and Without Correction Factors for Stroop Data: sums of squares, degrees of freedom, mean squares, F, and p for Group (G), Condition (C), G x C, and the corresponding error terms; the repeated measures are assumed to arise from a single within subject factor with six levels. The individual entries are not reproduced here.]

Display 5.4. Multivariate Test for Equality of Means of the Levels of a Within Subject Factor A with p Levels

The null hypothesis is that, in the population, the means of the p levels of A are equal; that is,

  H0: µ1 = µ2 = ... = µp,

where µ1, µ2, ..., µp are the population means of the levels of A. This is equivalent to

  H0: µ1 − µ2 = 0, µ2 − µ3 = 0, ..., µ(p−1) − µp = 0.

Assessing whether these differences are simultaneously equal to zero gives the required multivariate test of H0. The appropriate test statistic is Hotelling's T², applied to the corresponding sample mean differences, d̄' = [x̄1 − x̄2, x̄2 − x̄3, ..., x̄(p−1) − x̄p]; that is,

  T² = n d̄' S_d⁻¹ d̄,

where n is the sample size and S_d is the covariance matrix of the differences between the repeated measurements. Under H0, the quantity

  F = [(n − p + 1) / ((n − 1)(p − 1))] T²

has an F distribution with p − 1 and n − p + 1 degrees of freedom.
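The test of Display 5.4 is easy to carry out directly: form the p − 1 differences between adjacent conditions, then compute T² and the corresponding F statistic. The data below are simulated with an artificial mean shift in the first condition; they are not the Stroop measurements.

```python
import numpy as np

def hotelling_within(X):
    """Display 5.4: test that the p within-subject condition means are equal.

    X has shape (n, p). Returns T^2, F, and the F degrees of freedom."""
    n, p = X.shape
    D = X[:, :-1] - X[:, 1:]         # differences between adjacent conditions
    dbar = D.mean(axis=0)            # vector of mean differences
    Sd = np.cov(D, rowvar=False)     # covariance matrix of the differences
    t2 = n * dbar @ np.linalg.solve(Sd, dbar)
    f = (n - p + 1) / ((n - 1) * (p - 1)) * t2
    return t2, f, (p - 1, n - p + 1)

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))
X[:, 0] += 2.0                       # condition 1 has a genuinely higher mean
t2, f, df = hotelling_within(X)
print(t2, f, df)                     # a large F on (3, 17) degrees of freedom
```

Note that no compound symmetry assumption enters anywhere: the covariance matrix of the differences, S_d, is estimated without restriction, which is exactly the advantage of the MANOVA route described above.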
[TABLE 5.8. Multivariate Test of Equality of Condition Means for Stroop Data: for each of the 24 subjects, the differences between adjacent conditions (C1 − C2, C2 − C3, C3 − C4, C4 − C5, C5 − C6) are formed; the vector of the means of these differences and their covariance matrix give Hotelling's T² ≈ 115, a highly significant result. The individual differences and matrix entries are not reproduced here.]
Keselman, Keselman, and Lix (1995) give some detailed comparisons of univariate and multivariate approaches to the analysis of repeated measure designs and, for balanced designs, recommend the degrees of freedom adjusted univariate F test, as they suggest that the multivariate approach is affected more by departures from normality.

Now what about the multivariate equivalent of the group x condition test? Again this involves Hotelling's T² test as introduced in Chapter 3, with the mean vectors there being replaced by the vectors of group mean differences, d̄1 and d̄2, and with S1 and S2 now the covariance matrices of the differences between the repeated measurements in each group. Details are given in Display 5.5, and the numerical results for the Stroop data are given in Table 5.9. There is no evidence of a significant group x condition interaction.

Display 5.5. Multivariate Test for an A x B Interaction in a Repeated Measures Design with a Within Subject Factor A (at a Levels) and a Between Subjects Factor B (at b Levels)

Here the null hypothesis is that the differences between the means of adjacent levels of A are the same at each level of B; that is,

  H0: µ1(1) − µ2(1) = µ1(2) − µ2(2), ..., µ(a−1)(1) − µa(1) = µ(a−1)(2) − µa(2),

where µi(j) represents the population mean of level i of A at level j of B. (The restriction to two levels for B is used simply for convenience of description.) Testing these equalities simultaneously gives the multivariate test for the A x B interaction. The relevant test is Hotelling's two-sample T², as described in Chapter 3, applied to the vectors of differences in each group.

5.6. MEASURING RELIABILITY FOR QUANTITATIVE VARIABLES: THE INTRACLASS CORRELATION COEFFICIENT

One of the most important issues in designing experiments is the question of the reliability of the measurements. The concept of the reliability of a quantitative variable can best be introduced by the use of statistical models that relate the unknown "true" values of a variable to the corresponding observed values.
[TABLE 5.9. Multivariate Test for Group x Condition Interaction for Stroop Data: the covariance matrices S1 and S2 of the condition differences in each group, and the two vectors of group mean differences, are combined in the two-sample form of Hotelling's T²; the test is not significant, confirming the absence of a group x condition interaction. The individual entries are not reproduced here.]

Examples of the models used in assessing reliability are given in Display 5.6, together with the details of calculating the intraclass correlation coefficient. Consequently, this is a convenient point at which to introduce such models, because applying them generally involves a particular type of repeated measures design; whatever the details, they all lead to the intraclass correlation coefficient as the way of indexing reliability.

To illustrate the calculation of this coefficient, the ratings given by a number of judges to the competitors in a synchronized swimming competition will be used; these ratings are given in Table 5.10. (Of course, the author realizes that synchronized swimming may be only a minority interest among psychologists.) Before undertaking the necessary calculations, it might be useful to examine the data graphically in some way. Figure 5.3 gives box plots of the scores given by each judge, and Figure 5.4 shows the draughtsman's plot of the ratings made by each pair of judges. The box plots show that the scores given by the first judge vary considerably more than those given by the other four judges. The scatterplots show that although there is, in general, a pattern of relatively strong relationships between the scores given by each pair of the five judges, this is not universally so; the relationship between judges 4 and 5, for example, is far less satisfactory.

The relevant ANOVA table for the synchronized swimming data is shown in Table 5.11. The resulting value of the intraclass correlation coefficient, 0.683, is not particularly impressive, and the synchronized swimmers involved in the competition might have some cause for concern over whether their performances were being judged fairly and consistently.
Display 5.6. Models for the Reliability of Quantitative Variables

Let x represent the observed value of some variable of interest for a particular individual. A possible model for x is

  x = t + ε,

where t is the underlying true value of the variable for the individual and ε is the measurement error. Assume that t has a distribution with mean µ and variance σ²_t, and that ε has a distribution with mean zero and variance σ²_ε, with t and ε independent of each other; that is, the size of the measurement error does not depend on the size of the true value. (If the observation was made a second time, say some days later, it would almost certainly differ to some degree from the first recording.) A consequence of this model is that variability in the observed scores is a combination of true score variance and error variance.

The reliability, R, of the measurements is defined as the ratio of the true score variance to the observed score variance,

  R = σ²_t / (σ²_t + σ²_ε),

which can be rewritten as

  R = 1 / (1 + σ²_ε/σ²_t),

so that the error variance forms a decreasing proportion of the variability in the observations as R increases; its upper limit of unity is achieved when the error variance is zero. In the reverse case, as σ²_t decreases, σ²_ε forms an increasing proportion of the observed variance, and R decreases toward a lower limit of zero, which is reached when all the variability in the measurements results from the error component of the model. R is usually known as the intraclass correlation coefficient. It can be directly interpreted as the proportion of the variance of an observation that is due to between subject variability in the true scores.

When each of r observers rates a quantitative characteristic of interest on each of n subjects, the appropriate model becomes

  x = t + o + ε,

where o represents the observer effect, which is assumed to be distributed with zero mean and variance σ²_o. The three terms t, o, and ε are assumed to be independent of one another, so that the variance of an observation is now

  σ²_x = σ²_t + σ²_o + σ²_ε.

(Continued)
Display 5.6 (Continued)

The intraclass correlation coefficient for this situation is given by

  R = σ²_t / (σ²_t + σ²_o + σ²_ε).

An analysis of variance of the raters' scores for each subject leads to the following table:

  Source     DF              MS
  Subjects   n - 1           SMS
  Raters     r - 1           RMS
  Error      (n - 1)(r - 1)  EMS

It can be shown that the population or expected values of the three mean squares are

  E(SMS) = σ²_ε + rσ²_t,
  E(RMS) = σ²_ε + nσ²_o,
  E(EMS) = σ²_ε.

By equating the observed values of the three mean squares to their expected values, the following estimators for the three variance terms σ²_t, σ²_o, and σ²_ε are found:

  σ̂²_t = (SMS − EMS) / r,
  σ̂²_o = (RMS − EMS) / n,
  σ̂²_ε = EMS.

An estimator of R is then simply

  R̂ = σ̂²_t / (σ̂²_t + σ̂²_o + σ̂²_ε).

In the case of two raters giving scores on a variable to the same n subjects, the intraclass correlation coefficient is equivalent to Pearson's product moment correlation coefficient between 2n pairs of observations, of which the first n are the original values and the second n are the original values in reverse order. When only two raters are involved, the value of the intraclass correlation depends in part on the corresponding product moment correlation coefficient and in part on the differences between the means and standard deviations of the two sets of ratings.
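The estimation route of Display 5.6, equating the three observed mean squares to their expectations, can be sketched as follows. The scores here are simulated from the model x = t + o + ε rather than taken from Table 5.10, so the true reliability is known in advance.

```python
import numpy as np

def intraclass_correlation(scores):
    """Estimate R from an n x r table (n subjects rated by r raters),
    using the mean squares and estimators of Display 5.6."""
    n, r = scores.shape
    grand = scores.mean()
    subj = scores.mean(axis=1)
    rater = scores.mean(axis=0)

    sms = r * ((subj - grand) ** 2).sum() / (n - 1)      # subjects MS
    rms = n * ((rater - grand) ** 2).sum() / (r - 1)     # raters MS
    resid = scores - subj[:, None] - rater[None, :] + grand
    ems = (resid ** 2).sum() / ((n - 1) * (r - 1))       # error MS

    var_t = (sms - ems) / r      # true-score variance
    var_o = (rms - ems) / n      # observer (rater) variance
    var_e = ems                  # error variance
    return var_t / (var_t + var_o + var_e)

# Simulated judging: 40 subjects, 5 raters; true R = 9 / (9 + 1 + 1) = 0.818
rng = np.random.default_rng(3)
t = rng.normal(0, 3, 40)     # true scores, variance 9
o = rng.normal(0, 1, 5)      # rater effects, variance 1
x = t[:, None] + o[None, :] + rng.normal(0, 1, (40, 5))
print(round(intraclass_correlation(x), 2))  # close to the true value of 0.82
```

With a design of this size, the sample estimate typically falls within a few hundredths of the population reliability, which is why examining the judges' scores in Table 5.10 through this single coefficient is informative.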
TABLE 5.10
Judges' Scores for 40 Competitors in a Synchronized Swimming Competition
(Each of five judges scored each of the 40 swimmers; the scores range from about 24 to 33 points.)
FIG. 5.3. Boxplots of scores given to synchronized swimming competitors by five judges.

Sample sizes required in reliability studies concerned with estimating the intraclass correlation coefficient are discussed by Donner and Eliasziw (1987), and the same authors (Eliasziw and Donner, 1987) also consider the question of the optimal choice of r and n that minimizes the overall cost of a reliability study. Their conclusion is that an increase in r for fixed n provides more information than an increase in n for fixed r. The relationship between the intraclass correlation coefficient and Pearson's coefficient is given explicitly in Display 5.7.
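The two-rater equivalence described earlier, in which the intraclass correlation is the Pearson correlation computed over 2n pairs (the n original pairs plus the same pairs in reverse order), is easy to demonstrate. The Python below is ours, with made-up scores; the closed-form check in the comments follows directly from that definition:

```python
# Illustration (scores are made up) of the two-rater device described in
# the text: the intraclass correlation coefficient equals the Pearson
# correlation computed over 2n pairs -- the n original (rater 1, rater 2)
# pairs followed by the same n pairs in reverse order.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def icc_two_raters(r1, r2):
    # Pearson correlation over the doubled, reversed data set
    return pearson(r1 + r2, r2 + r1)

rater1 = [27.0, 28.5, 30.0, 26.5, 29.0]   # hypothetical scores
rater2 = [27.5, 28.0, 30.5, 26.0, 29.5]
print(round(icc_two_raters(rater1, rater2), 3))   # 0.939
```

As the text notes, the result depends partly on the ordinary correlation between the two raters and partly on the differences between their means and standard deviations.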
FIG. 5.4. Scatterplot matrix of the scores given by each pair of judges in rating the synchronized swimmers.

TABLE 5.11
ANOVA and Calculation of Intraclass Correlation Coefficient for Synchronized Swimming Judgments

ANOVA Table

Source     DF    MS
Swimmers   39    13.70
Judges     4     4.77
Error      156   1.05

The required estimates of the three variance terms in the model are

σ̂_t² = (13.70 − 1.05)/5 = 2.53,
σ̂_o² = (4.77 − 1.05)/40 = 0.093,
σ̂_ε² = 1.05.

Consequently, the estimate of the intraclass correlation coefficient is

R̂ = 2.53/(2.53 + 0.093 + 1.05) = 0.69.

Display 5.7
Relationship Between the Intraclass Correlation Coefficient, R, and the Product Moment Correlation Coefficient, in the Case of Two Raters

R = [rs₁s₂ − (x̄₁ − x̄₂)²/4] / [(s₁² + s₂²)/2 + (x̄₁ − x̄₂)²/4],

where x̄₁ and x̄₂ are the mean values of the scores of the two raters, s₁² and s₂² are the corresponding variances, and r is the product moment correlation between the two sets of ratings.

SUMMARY

1. Repeated measures data arise frequently in many areas of psychological research and require special care in their analysis.
2. Univariate analysis of variance of repeated measures designs depends on the assumption of sphericity. This assumption is unlikely to be valid in most situations.
3. When sphericity does not hold, either a correction factor approach or a MANOVA approach can be used (other possibilities are described by Goldstein, 1995).
4. Recently developed methods for the analysis of repeated measures designs having a nonnormal response variable are described in Davis (1991).

COMPUTER HINTS

SPSS
In SPSS the basic steps to conduct a repeated measures analysis are as follows.

1. Click Statistics, click General Linear Model, and then click GLM-Repeated Measures to get the GLM-Repeated Measures Define Variable(s) dialog box.
2. Specify the names of the within subject factors and the number of levels of these factors. Click Add.
3. Click Define and move the names of the repeated measure variables to the Within Subjects Variables box.

Both univariate tests, with and without correction factors, and multivariate tests can be obtained. A test of sphericity is also given.

TABLE 5.12
Skin Resistance Data
(Resistance, in kilohms, measured by each of five electrode types applied to the arms of each of 16 subjects.)
TABLE 5.13
Data from Quitting Smoking Experiment
(Ratings, on a 1 to 10 point scale, of the desire to smoke "right now," made at home and at work, both before and after quitting, by male and female subjects in each of three treatment groups: taper, immediate, and aversion.)
S-PLUS
In S-PLUS, the main function for analyzing repeated measures is lme, but because this is perhaps more applicable to longitudinal studies than to the examples covered in this chapter, we shall not say more about it until Chapter 7.

EXERCISES

5.1. Show the equivalence of the two forms of the null hypothesis given in Display 5.1.

5.2. Apply both the correction factor approach and the MANOVA approach to the Stroop data, now keeping both within subject factors. How do the results compare with the simple univariate ANOVA results given in the text? Do you think any observations should be removed prior to the analysis of these data?

5.3. The data in Table 5.12 were collected in an experiment in which five different types of electrodes were applied to the arms of 16 subjects and the resistance was measured in kilohms. The experiment was carried out to see whether all electrode types performed similarly. Calculate the intraclass correlation coefficient of the electrodes. Are there any observations that you think may be having an undue influence on the results?

5.4. Reanalyze the visual acuity data by using orthogonal polynomial contrasts for lens strength.

5.5. The data in Table 5.13 are taken from an investigation of cigarette smoking. Three different procedures for quitting smoking (tapering off, immediate stopping, and aversion therapy) were compared. Subjects were randomly allocated to a treatment and were asked to rate (on a 1 to 10 point scale) their desire to smoke "right now" in two different environments (home versus work), both before and after quitting. Carry out both univariate and multivariate analyses of the data, noting that the groups formed by the two between subjects factors, gender and treatment group, have different numbers of subjects.
6
Simple Linear Regression and Multiple Regression Analysis

6.1. INTRODUCTION

In Table 6.1 a small set of data appears giving the average vocabulary size of children at various ages. Is it possible (or indeed sensible) to try to use these data to construct a model for predicting the vocabulary size of children older than 6, and how should we go about it? Such questions serve to introduce one of the most widely used of statistical techniques: regression analysis. (Incidentally, the term regression was first introduced by Galton in the 19th century to characterize a tendency toward mediocrity, that is, toward the average, observed in the offspring of parents.) In very general terms, regression analysis involves the development and use of statistical techniques designed to reflect the way in which variation in an observed random variable changes with changing circumstances. More specifically, the aim of a regression analysis is to derive an equation relating a dependent and an explanatory variable or, more commonly, several explanatory variables. The derived equation may sometimes be used solely for prediction, but more often its primary purpose is as a way of establishing the relative importance of the explanatory variable(s) in determining the response variable, that is, in establishing a useful model to describe the data. (It has to be admitted that the method is also often misused.)
162 CHAPTER 6

TABLE 6.1
The Average Oral Vocabulary Size of Children at Various Ages

Age (Years)   Number of Words
1.0           3
1.5           22
2.0           272
2.5           446
3.0           896
3.5           1222
4.0           1540
4.5           1870
5.0           2072
6.0           2562

In this chapter we shall concern ourselves with regression models for a response variable that is continuous; in Chapter 10 we shall consider suitable regression models for categorical response variables.

No doubt most readers will have covered simple linear regression for a response variable and a single explanatory variable in their introductory statistics course. Nevertheless, at the risk of enduring a little boredom, it may be worthwhile to read the next section, both as an aide memoire and as an initial step in dealing with the more complex procedures needed when several explanatory variables are considered.

6.2. SIMPLE LINEAR REGRESSION

The essential components of the simple linear regression model involving a single explanatory variable are shown in Display 6.1. (Those readers who require a more detailed account should consult Daly, Hand, Jones, Lunn, and McConway, 1995.) Fitting the model to the vocabulary data gives the results shown in Table 6.2. A plot of the fitted line, its 95% confidence interval, and the original data are given in Figure 6.1. The confidence interval for the regression coefficient of age, (513.35, 610.51), indicates that there is a very strong relationship between vocabulary size and age. Because the estimated regression coefficient is positive, the relationship is such that as age increases so does vocabulary size (all rather obvious even from a simple plot of the data!). The estimated regression coefficient implies that the increase in average vocabulary size corresponding to an increase in age of a year is approximately 562 words.
LINEAR AND MULTIPLE REGRESSION ANALYSIS 163

Display 6.1
Simple Linear Regression Model

Assume y₁, y₂, ..., yₙ are the n observed values of the response variable, and x₁, x₂, ..., xₙ are the corresponding values of the explanatory variable. The simple linear regression model is

yᵢ = α + βxᵢ + εᵢ, i = 1, ..., n,

where α and β, the two parameters of the model, are the intercept and slope of the line, respectively. The basic idea of simple linear regression is that the mean values of the response variable lie on a straight line when plotted against values of the explanatory variable.

The εᵢ, i = 1, ..., n, are known as residual or error terms and measure how much an observed value, yᵢ, differs from the value predicted by the model, namely α + βxᵢ. The error terms are assumed to be normally distributed with mean zero and variance σ². The variance of y for a given value of x is assumed to be independent of x.

Estimators of α and β are found by minimizing the sum of the squared deviations of observed and predicted values, a procedure known as least squares. The resulting least-squares estimators are

β̂ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²,
α̂ = ȳ − β̂x̄,

where ȳ and x̄ are the sample means of the response and explanatory variables.

An estimator of σ² is s², given by

s² = Σ(yᵢ − ŷᵢ)² / (n − 2),

where ŷᵢ = α̂ + β̂xᵢ is the predicted value, or fitted value, of the response variable for an individual with explanatory variable value xᵢ.

Estimators of the standard errors of the estimated slope and estimated intercept are given by

SE(β̂) = s / [Σ(xᵢ − x̄)²]^(1/2),
SE(α̂) = s[1/n + x̄²/Σ(xᵢ − x̄)²]^(1/2).

Confidence intervals for, and tests of hypotheses about, the slope and intercept parameters can be constructed in the usual way from these standard error estimators.

The variability of the response variable can be partitioned into a part that is due to regression on the explanatory variable, Σ(ŷᵢ − ȳ)², and a residual, Σ(yᵢ − ŷᵢ)².

(Continued)
Display 6.1 (Continued)

The relevant terms are usually arranged in an analysis of variance table as follows.

Source      DF      SS     MS             F
Regression  1       RGSS   RGSS/1         RGMS/RSMS
Residual    n − 2   RSS    RSS/(n − 2)

The F statistic provides a test of the hypothesis that the slope parameter, β, is zero. The residual mean square gives the estimate of σ².

TABLE 6.2
Results of Fitting a Simple Linear Regression Model to the Vocabulary Data

Parameter Estimates

Coefficient   Estimate   SE
Intercept     −763.86    88.25
Slope (age)   561.93     24.29

ANOVA Table

Source      DF   SS           MS           F
Regression  1    7294087.86   7294087.86   535.19
Residual    8    109032.0     13629.0

It is possible to use the derived linear regression equation to predict average vocabulary size at different ages. For example, for age 5.5 the prediction is

average vocabulary size = −763.86 + 561.93 × 5.5 = 2326.7.    (6.1)

As ever, an estimate of this kind is of little use without some measure of its variability; that is, a confidence interval for the prediction is needed. Some relevant formulas are given in Display 6.2, and predicted vocabulary scores with their confidence intervals (CIs) for a number of ages are given in Table 6.3. Note that the confidence intervals become wider, that is, the prediction becomes less certain, as the age at which a prediction is made departs further from the mean of the observed ages.

Thus the derived regression equation does allow predictions to be made, but it is of considerable importance to reflect a little on whether such predictions are really sensible in this situation. Using the fitted equation to predict future vocabulary scores is based on the assumption that the observed upward trend in vocabulary scores with age will continue unabated into future ages at the same rate as observed between ages 1 and 6, namely between approximately 513 and 610 words per year.
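The Table 6.2 estimates can be reproduced from the Display 6.1 formulas in a few lines. The Python below is only an illustration (the book's own computing hints use SPSS and S-PLUS):

```python
# Least-squares fit of the simple linear regression model to the
# Table 6.1 vocabulary data; the results should match Table 6.2.

age   = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0]
words = [3, 22, 272, 446, 896, 1222, 1540, 1870, 2072, 2562]

n = len(age)
xbar = sum(age) / n
ybar = sum(words) / n
sxx = sum((x - xbar) ** 2 for x in age)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(age, words))

slope = sxy / sxx                 # estimated words gained per year of age
intercept = ybar - slope * xbar   # estimated vocabulary size at age zero

print(round(slope, 2), round(intercept, 2))   # 561.93 -763.86
```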
FIG. 6.1. Plot of vocabulary data, showing the fitted regression line and its 95% confidence interval (dotted lines).

Display 6.2
Using the Simple Linear Regression Model for Prediction

The predicted response value corresponding to a value of the explanatory variable of, say, x₀ is

ŷ₀ = α̂ + β̂x₀.

An estimator of the variance of a predicted value is provided by

s²[1 + 1/n + (x₀ − x̄)²/Σ(xᵢ − x̄)²].

(Note that the variance of the prediction increases as x₀ gets further away from x̄.)

A confidence interval for a prediction can now be constructed in the usual way:

ŷ₀ ± t × SE(ŷ₀).

Here t is the value of Student's t with n − 2 degrees of freedom for the required 100(1 − α)% confidence interval.
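Display 6.2's prediction machinery, applied to the vocabulary fit, looks like this in Python (an illustrative sketch, not code from the book):

```python
# Prediction and its standard error for the fitted vocabulary model,
# following Display 6.2; the variance grows as the new age x0 moves
# away from the mean of the observed ages.

age   = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0]
words = [3, 22, 272, 446, 896, 1222, 1540, 1870, 2072, 2562]

n = len(age)
xbar = sum(age) / n
ybar = sum(words) / n
sxx = sum((x - xbar) ** 2 for x in age)
slope = sum((x - xbar) * (y - ybar) for x, y in zip(age, words)) / sxx
intercept = ybar - slope * xbar

# residual mean square s^2, on n - 2 degrees of freedom
s2 = sum((y - (intercept + slope * x)) ** 2
         for x, y in zip(age, words)) / (n - 2)

def predict_with_se(x0):
    pred = intercept + slope * x0
    var = s2 * (1 + 1 / n + (x0 - xbar) ** 2 / sxx)
    return pred, var ** 0.5

pred, se = predict_with_se(5.5)
print(round(pred, 1))   # 2326.7; the standard error here is about 133.6
```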
This assumption is clearly false, because the rate of vocabulary acquisition will gradually decrease. Extrapolation outside the range of the observed values of the explanatory variable is known in general to be a risky business, and this particular example is no exception.

TABLE 6.3
Predictions and Their Confidence Intervals for the Vocabulary Data

Age    Predicted Vocabulary Size   SE of Prediction   95% CI
5.5    2326.7                      133.6              (2018.6, 2634.8)
7.0    3169.7                      151.9              (2819.4, 3520.0)
10.0   4855.4                      203.7              (4385.7, 5325.1)
20.0   10474.7                     423.7              (9497.7, 11451.6)

Now you might remember that in Chapter 1 it was remarked that if you do not believe in a model you should not perform operations and analyses that assume it is true. Bearing this warning in mind, is the simple linear model fitted to the vocabulary data really believable? A little thought shows that it is not. The estimated intercept on the y axis, that is, the estimated vocabulary size at age zero, is −763.86, with an approximate 95% confidence interval of (−940.4, −587.4). This is clearly a silly interval for a value that is known a priori to be zero. An apparently more suitable model would be one in which the intercept is constrained to be zero. Estimation for such a model is described in Display 6.3.

Display 6.3
Simple Linear Regression Model with Zero Intercept

The model is now

yᵢ = βxᵢ + εᵢ.

Application of least squares to this model gives the following estimator for β:

β̂ = Σxᵢyᵢ / Σxᵢ².

An estimator of the variance of β̂ is given by

Var(β̂) = s²/Σxᵢ²,

where s² is the residual mean square from the relevant analysis of variance table.
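A quick check of the Display 6.3 arithmetic on the vocabulary data (Python used here purely as a calculator):

```python
# Zero intercept fit of Display 6.3: the line is forced through the
# origin, so beta-hat = sum(x*y) / sum(x*x), with n - 1 residual df.

age   = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0]
words = [3, 22, 272, 446, 896, 1222, 1540, 1870, 2072, 2562]

n = len(age)
beta = sum(x * y for x, y in zip(age, words)) / sum(x * x for x in age)

rss = sum((y - beta * x) ** 2 for x, y in zip(age, words))
s2 = rss / (n - 1)                               # one parameter estimated
se_beta = (s2 / sum(x * x for x in age)) ** 0.5

print(round(beta, 2), round(se_beta, 2))   # 370.96 30.84
```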
FIG. 6.2. Plot of vocabulary data, showing the fitted zero intercept line.

For the vocabulary data, the estimated value of the zero intercept regression coefficient is 370.96, with an estimated standard error of 30.84. The fitted line and the original data are plotted in Figure 6.2. It is apparent from this plot that our supposedly more appropriate model does not fit the data as well as the one rejected as being unsuitable on logical grounds. So what has gone wrong? The answer is that the relationship between age and vocabulary size is more complex than is allowed for by either of the two regression models considered above.

One way of investigating the possible failings of a model was introduced in Chapter 4, namely the examination of residuals, that is, the differences between an observed value, yᵢ, and the value predicted by the fitted model, ŷᵢ:

rᵢ = yᵢ − ŷᵢ.    (6.2)

In a regression analysis there are various ways of plotting these values that can be helpful in assessing particular components of the regression model. The most useful plots are as follows.

1. A histogram or stem-and-leaf plot of the residuals can be useful in checking for symmetry and specifically for normality of the error terms in the regression model.
2. Plotting the residuals against the corresponding values of the explanatory variable. Any sign of curvature in the plot might suggest that, say, a quadratic term in the explanatory variable should be included in the model.
3. Plotting the residuals against the fitted values of the response variable (not the response values themselves, for reasons spelled out in Rawlings, 1988). If the variability of the residuals appears to increase with the size of the fitted values, a transformation of the response variable prior to fitting is indicated.

FIG. 6.3. Idealized residual plots.

Figure 6.3 shows some idealized residual plots that indicate particular points about models.

1. Figure 6.3(a) is what is looked for to confirm that the fitted model is appropriate.
2. Figure 6.3(b) suggests that the assumption of constant variance is not justified, so that some transformation of the response variable before fitting might be sensible.
3. Figure 6.3(c) implies that the model requires a quadratic term in the explanatory variable.

(In practice, the residual plots obtained might be somewhat more difficult to interpret than these idealized plots.)

Table 6.4 shows the numerical values of the residuals for the vocabulary data, and Figure 6.4 shows the residuals plotted against age. Here there are very few observations on which to make convincing claims, but the pattern of residuals does seem to suggest that a more suitable model might be found for these data; see Exercise 6.1. (The raw residuals defined and used above suffer from certain problems that make them less helpful in investigating fitted models than they might be. The problem will be taken up in detail in Section 6.5, where we also discuss a number of other regression diagnostics.)

TABLE 6.4
Residuals from Fitting a Simple Linear Regression to the Vocabulary Data

Age   Vocabulary Size   Predicted Size   Residual
1.0   3                 −201.93          204.93
1.5   22                79.03            −57.03
2.0   272               359.99           −87.99
2.5   446               640.96           −194.96
3.0   896               921.92           −25.92
3.5   1222              1202.89          19.11
4.0   1540              1483.85          56.15
4.5   1870              1764.82          105.18
5.0   2072              2045.78          26.22
6.0   2562              2607.71          −45.71

6.3. MULTIPLE LINEAR REGRESSION

Multiple linear regression represents a generalization, to more than a single explanatory variable, of the simple linear regression procedure described in the previous section. It is now the relationship between a response variable and several explanatory variables that becomes of interest. Details of the model, including the estimation of its parameters by least squares and the calculation of standard errors, are given in Display 6.4. (Readers to whom a matrix is still a mystery should avoid this display at all costs.)
FIG. 6.4. Residuals for the vocabulary data plotted against age.
Display 6.4
Multiple Regression Model

The multiple linear regression model for a response variable y, with observed values y₁, y₂, ..., yₙ, and p explanatory variables, x₁, x₂, ..., x_p, with observed values xᵢ₁, xᵢ₂, ..., xᵢp for i = 1, ..., n, is

yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + ··· + β_pxᵢp + εᵢ.

The regression coefficients β₁, β₂, ..., β_p give the amount of change in the response variable associated with a unit change in the corresponding explanatory variable, conditional on the other explanatory variables in the model remaining unchanged.

The explanatory variables are strictly assumed to be fixed; that is, they are not random variables. In practice, where this is rarely the case, the results from a multiple regression analysis are interpreted as being conditional on the observed values of the explanatory variables.

The "linear" in multiple linear regression refers to the parameters rather than to the explanatory variables, so the model remains linear if, for example, a quadratic term for one of these variables is included. (An example of a nonlinear model is y = β₁e^(β₂x₁) + β₃e^(β₄x₂).)

The residual terms in the model, εᵢ, i = 1, ..., n, are assumed to have a normal distribution with mean zero and variance σ². This implies that, for given values of the explanatory variables, the response variable is normally distributed with a mean that is a linear function of the explanatory variables and a variance that is not dependent on these variables.
As in the simple linear regression model, the least-squares procedure is used to estimate the parameters of the multiple regression model. The resulting estimators are most conveniently written with the help of some matrices and vectors. The result might look complicated, particularly if you are not very familiar with matrix algebra, but you can take my word that it looks even more horrendous when written without the use of matrices and vectors! By introducing a vector y' = [y₁, y₂, ..., yₙ], a vector β' = [β₀, β₁, ..., β_p], and an n × (p + 1) matrix X whose ith row is [1, xᵢ₁, xᵢ₂, ..., xᵢp], we can then write the multiple regression model for the n observations concisely as

y = Xβ + ε,

where ε' = [ε₁, ε₂, ..., εₙ].

(Continued)
Display 6.4 (Continued)

The least-squares estimators of the parameters in the multiple regression model are given by the set of equations

β̂ = (X'X)⁻¹X'y.

These matrix manipulations are, of course, easily performed on a computer, but you must ensure that there are no linear relationships between the explanatory variables, for example, that one variable is the sum of several others; otherwise, your regression software will complain.

The covariance matrix of the parameter estimates in the multiple regression model is estimated from

S = s²(X'X)⁻¹,

where s² is the residual mean square defined below. The diagonal elements of this matrix give the variances of the estimated regression coefficients, and the off-diagonal elements give their covariances.

A measure of the fit of the model is provided by the multiple correlation coefficient, R, defined as the correlation between the observed values of the response variable, y₁, ..., yₙ, and the values predicted by the fitted model,

ŷᵢ = β̂₀ + β̂₁xᵢ₁ + ··· + β̂_pxᵢp.

The value of R² gives the proportion of variability in the response variable accounted for by the explanatory variables. An overall F test can be used to assess the omnibus, and in most practical situations relatively uninteresting, null hypothesis that all the regression coefficients are zero, that is, that none of the chosen explanatory variables is predictive of the response variable. (More details of the model in matrix form and the least-squares estimation process are given in Rawlings, 1988.)
The variation in the response variable can be partitioned into a part due to regression on the explanatory variables and a residual, as for simple linear regression. The relevant terms are arranged in an analysis of variance table as follows.

Source      DF          SS     MS                F
Regression  p           RGSS   RGSS/p            RGMS/RSMS
Residual    n − p − 1   RSS    RSS/(n − p − 1)

The residual mean square s² is an estimator of σ².
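The matrix result β̂ = (X'X)⁻¹X'y can be illustrated with a bare-bones Python implementation. This sketch is ours, not the book's: it solves the normal equations by Gaussian elimination rather than forming an explicit inverse, and the toy data are not from the text:

```python
# Least-squares estimation for the multiple regression model of
# Display 6.4: solve the normal equations (X'X) b = X'y.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting."""
    m = len(a)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for i in range(m):
        p = max(range(i, m), key=lambda k: abs(aug[k][i]))
        aug[i], aug[p] = aug[p], aug[i]
        for k in range(i + 1, m):
            f = aug[k][i] / aug[i][i]
            aug[k] = [u - f * v for u, v in zip(aug[k], aug[i])]
    x = [0.0] * m
    for i in range(m - 1, -1, -1):
        x[i] = (aug[i][m] - sum(aug[i][j] * x[j]
                                for j in range(i + 1, m))) / aug[i][i]
    return x

def least_squares(X, y):
    """X has a leading column of ones for the intercept."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

# Toy data generated from y = 1 + 2*x1 + 3*x2 exactly:
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, 4, 6, 8]
print([round(b, 6) for b in least_squares(X, y)])   # [1.0, 2.0, 3.0]
```

As the display warns, this breaks down if the columns of X are linearly related, because X'X is then singular.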
6.3.1. A Simple Example of Multiple Regression

As a gentle introduction to the topic, the multiple linear regression model will be applied to the data on ice cream sales introduced in Chapter 2 (see Table 2.4). Here the response variable is the consumption of ice cream, measured over thirty 4-week periods. The explanatory variables believed to influence consumption are average price and mean temperature in the 4-week periods. A scatterplot matrix of the three variables (see Figure 6.5) suggests that temperature is of most importance in determining consumption.

The results of applying the multiple regression model to these data are shown in Table 6.5. The F value of 23.27, with 2 and 27 degrees of freedom, has an associated p value that is very small. Clearly, the hypothesis that both regression coefficients are zero is not tenable. The multiple correlation coefficient (defined in Display 6.4) takes the value 0.80, implying that the two explanatory variables, price and temperature, account for 64% of the variability in consumption.

The negative regression coefficient for price indicates that, for a given temperature, consumption decreases with increasing price. The positive coefficient for temperature implies that, for a given price, consumption increases with increasing temperature. The sizes of the two regression coefficients might appear to imply that price is of more importance than temperature in predicting consumption, but this is an illusion produced by the different scales of the two variables. The raw regression coefficients should not be used to judge the relative importance of the explanatory variables, although the standardized values of these coefficients can, partially at least, be used in this way.

The standardized values might be obtained by applying the regression model to the values of the response variable and explanatory variables after each has been divided by its standard deviation. In such an analysis, each regression coefficient represents the change in the standardized response variable associated with a change of one standard deviation unit in an explanatory variable, again conditional on the other explanatory variables remaining constant. The standardized regression coefficients can, however, be found without undertaking this further analysis, simply by multiplying the raw regression coefficient by the standard deviation of the appropriate explanatory variable and dividing by the standard deviation of the response variable. In the ice cream example the relevant standard deviations are as follows:

Consumption: 0.06579,
Price: 0.00834,
Temperature: 16.422.

The standardized regression coefficients become as shown in Table 6.5.

FIG. 6.5. Scatterplot matrix of the variables in the ice cream data.

TABLE 6.5
Multiple Regression Results for the Ice Cream Consumption Data

The model for the data is

consumption = β₀ + β₁ × price + β₂ × temperature.

The least-squares estimates of the regression coefficients in the model are as follows.

Parameter          Estimate   SE
β₁ (price)         −1.4018    0.9251
β₂ (temperature)   0.00303    0.0005

The ANOVA table is

Source      SS       DF   MS        F
Regression  0.07943  2    0.03972   23.27
Residual    0.04620  27   0.00171

A comparison of the standardized coefficients suggests that temperature is more important than price in determining consumption:

Price: −1.4018 × 0.00834/0.06579 = −0.1777,
Temperature: 0.00303 × 16.422/0.06579 = 0.7563.

One further point about the multiple regression model that can usefully be illustrated with this simple example is the effect of entering the explanatory variables into the model in a different order.
As we shall see later in the chapter, if the explanatory variables were independent (sometimes the term orthogonal is used), their estimated regression coefficients would remain unchanged with the addition of more variables to the model. The change produced in a previously estimated regression coefficient when an additional explanatory variable is included in the model results from the correlation between the explanatory variables. (Incidentally, this lack of independence of the explanatory variables is also the explanation of the overlapping sums of squares found in the unbalanced factorial designs introduced in Chapter 4.) This will become of more relevance in later sections, but simply examining some numerical results from the ice cream data will be helpful for now. The relevant results are shown in Table 6.6.
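The standardization arithmetic described above (raw coefficient times the standard deviation of its explanatory variable, divided by the standard deviation of the response) is easily checked; here is the calculation in Python, purely as an illustration:

```python
# Standardized regression coefficients for the ice cream model:
# multiply each raw coefficient by the SD of its explanatory variable
# and divide by the SD of the response (consumption).

sd_consumption = 0.06579
raw = {"price": -1.4018, "temperature": 0.00303}
sd  = {"price": 0.00834, "temperature": 16.422}

standardized = {k: raw[k] * sd[k] / sd_consumption for k in raw}
print({k: round(v, 4) for k, v in standardized.items()})
# {'price': -0.1777, 'temperature': 0.7563}
```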
TABLE 6.6
Multiple Regression for the Ice Cream Data: The Effect of Order of Entering the Explanatory Variables

Entering Price Followed by Temperature
(a) The first model fitted is

consumption = β₀ + β₁ × price.

(b) The multiple correlation coefficient is 0.26, and the ANOVA table is

Source      SS       DF   MS        F
Regression  0.00846  1    0.00846   2.02
Residual    0.11710  28   0.00418

(c) Temperature is now added to the model; that is, the model fitted is

consumption = β₀ + β₁ × price + β₂ × temperature.

(d) The parameter estimates and ANOVA table are now as given in Table 6.5. Note that the estimated regression coefficient for price differs from the value obtained in the model of (a).

Entering Temperature Followed by Price
(a) The first model fitted is now

consumption = β₀ + β₁ × temperature.

(b) The multiple correlation coefficient is 0.78, and the ANOVA table is

Source      SS       DF   MS        F
Regression  0.07551  1    0.07551   42.28
Residual    0.05001  28   0.00179

(c) Price is now added to the model to give the results shown in Table 6.5.

6.3.2. An Example of Multiple Linear Regression in Which One of the Explanatory Variables Is Categorical

The data shown in Table 6.7 are taken from a study investigating a new method of measuring body composition, and give the body fat percentage, age, and sex for 20 normal adults between 23 and 61 years old. The question of interest is, How are percentage fat, age, and sex related? The data include the categorical variable sex. Can such an explanatory variable be included in a multiple regression model, and, if so, how? In fact, it is quite legitimate to include a categorical variable such as sex in a multiple
regression model. The distributional assumptions of the model (see Display 6.4) apply only to the response variable; indeed, the explanatory variables are not strictly considered random variables at all. Consequently, the explanatory variables in a multiple regression model can, in theory at least, be of any type. However, as we shall see later in Chapter 10, care is needed in deciding how to incorporate categorical variables with more than two categories. For a two-category variable such as sex there are no real problems, except perhaps of interpretation.

TABLE 6.7
Human Fatness, Age, and Sex
(Percentage body fat, age, and sex for 20 normal adults. Ages: 23, 23, 27, 27, 39, 41, 45, 49, 50, 53, 53, 54, 54, 56, 57, 57, 58, 58, 60, 61 years; sex codes, in the same order: 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1. Note: 0 = male, 1 = female.)

Details of the possible regression models for the human fat data are shown in Display 6.5. The results of fitting a multiple regression model to the human fat data, with age, sex, and the interaction of age and sex as explanatory variables, are shown in Table 6.8. The fitted model can be interpreted in the following manner.

1. For men, for whom sex is coded as zero and so sex × age is also zero, the fitted model is

%fat = 3.47 + 0.36 × age.    (6.5)
Display 6.5
Possible Models That Might Be Considered for the Human Fat Data in Table 6.7

The first model that might be considered is the simple linear regression model relating percentage fat to age:
  %fat = β0 + β1 × age.
Allowing for the effect of age, a further question of interest might be, does a person's sex have any bearing on their percentage of fatness? The appropriate model would be
  %fat = β0 + β1 × age + β2 × sex,
where sex is a dummy variable that codes men as zero and women as one. Because there is no interaction between age and sex in this model, the effect of a person's sex on fatness is the same for all ages; equivalently, the effect of age on fatness is the same for both sexes. The model describes the situation shown in the diagram below, namely two parallel lines with a vertical separation β2.
Suppose now that a model is required that does allow for the possibility of an age × sex interaction effect on percentage of fatness. Such a model must include a new variable, defined as the product of the variables age and sex. The new model becomes
  %fat = β0 + β1 × age + β2 × sex + β3 × age × sex.
To understand this equation better, first consider the percentage of fatness of men, for whom sex is coded as zero. Here the values of both sex and sex × age are zero, and the model reduces to
  %fat = β0 + β1 × age.
(Continued)
Display 6.5 (Continued)
However, for women sex = 1 and so sex × age = age, and the model becomes
  %fat = (β0 + β2) + (β1 + β3) × age.
Thus the new model allows the lines for males and females to be other than parallel (see diagram below), and the parameter β3 is a measure of the difference between the slopes of the two lines.

TABLE 6.8
Multiple Regression Results for Human Fatness Data
The model fitted is %fat = β0 + β1 × age + β2 × sex + β3 × age × sex. The least-squares estimates of the regression coefficients in the model are β̂0 = 3.47, β̂1 = 0.36, β̂2 = 16.64, and β̂3 = −0.12, each with its standard error. The multiple correlation coefficient takes the value 0.8738. The regression ANOVA table is
  Source      SS       DF  MS      F
  Regression  1176.18  3   392.06  17.22
  Residual    364.28   16  22.77
For women, for whom sex is coded as one and so sex × age is simply age, the fitted model is
  %fat = 3.47 + 0.36 × age + 16.64 − 0.12 × age.    (6.6)
Collecting together terms leads to
  %fat = 20.11 + 0.24 × age.    (6.7)
The fitted model actually represents two separate simple linear regressions, with different intercepts and different slopes for men and women. Figure 6.6 shows a plot of the fitted model together with the original data. The interaction effect is clearly relatively small, and it seems that a model including only sex and age might describe the data adequately. In such a model the estimated regression coefficient for sex (11.567) is simply the estimated difference between the percentage fat of men and that of women, which is assumed to be the same at all ages. (Categorical explanatory variables with more than two categories have to be recoded in terms of a series of binary dummy variables before being used in a regression analysis, but we shall not give an example of such a model until Chapter 10.)

FIG. 6.6. Plot of the fitted model for the human fat data.
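Readers with access to suitable software can reproduce a fit of this kind quite easily. The following short sketch, written in Python with NumPy (it is not part of the original text), fits the interaction model of Display 6.5 by least squares; the (age, sex, %fat) values used are invented for illustration and are not the Table 6.7 data.

```python
import numpy as np

# Hypothetical (age, sex, %fat) observations -- illustrative only,
# not the Table 6.7 data. Sex is coded 0 = male, 1 = female.
age = np.array([23, 27, 39, 45, 53, 57, 61, 25, 33, 41, 50, 58], dtype=float)
sex = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
fat = np.array([9.5, 12.1, 14.0, 18.2, 21.5, 24.0, 26.1,
                28.0, 30.5, 31.8, 33.9, 36.2])

# Design matrix for the interaction model:
#   %fat = b0 + b1*age + b2*sex + b3*(age*sex)
X = np.column_stack([np.ones_like(age), age, sex, age * sex])
beta, *_ = np.linalg.lstsq(X, fat, rcond=None)
b0, b1, b2, b3 = beta

# For men (sex = 0) the fitted line is b0 + b1*age;
# for women (sex = 1) it is (b0 + b2) + (b1 + b3)*age.
print("men:   intercept %.2f, slope %.3f" % (b0, b1))
print("women: intercept %.2f, slope %.3f" % (b0 + b2, b1 + b3))
```

Because the interaction model is saturated with respect to the sex groups, the two implied lines agree exactly with separate simple regressions fitted to men and women.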
Predicting Crime Rates in the USA: A More Complex Example of Using Multiple Linear Regression

In Table 6.9, data for 47 states of the USA are given. The data originate from the Uniform Crime Report of the FBI and other government sources, and will be used to investigate how the crime rate in 1960 depended on the other variables listed. Again it is useful to begin our investigation of these data by examining the scatterplot matrix of the data, shown in Figure 6.7. The scatterplots involving crime rate (the top row of the plot) indicate that some, at least, of the explanatory variables are predictive of crime rate. One disturbing feature of the plot is the very strong relationship between police expenditure in 1959 and in 1960. The reasons that such a strongly correlated pair of explanatory variables can cause problems for multiple regression will be taken up in Section 6.6; here we shall preempt one of the suggestions to be made in that section for dealing with the problem, namely, by simply dropping expenditure in 1960 from consideration.

The overall test that all regression coefficients in a multiple regression are zero is seldom of great interest. In most applications it will be rejected, because it is unlikely that all the explanatory variables chosen for study will be unrelated to the response variable. The investigator is far more likely to be interested in the question of whether some subset of the explanatory variables exists that might be as successful as the full set in explaining the variation in the response variable. If using a particular (small) number of explanatory variables results in a model that fits the data only marginally worse than one with a much larger set, then a more parsimonious description of the data is achieved (see Chapter 1). How can the most important explanatory variables be identified? Readers might look again at the results for the crime rate data in Table 6.10 and imagine that the answer to this question is relatively straightforward, namely, to select those variables for which the corresponding t statistic is significant and drop the remainder. Unfortunately, such a simple approach is of only limited value, because of the relationships between the explanatory variables: regression coefficients and their associated standard errors are estimated conditional on the other variables in the model. If a variable is removed from a model, the regression coefficients of the remaining variables (and their standard errors) have to be re-estimated from a further analysis. (Of course, if the explanatory variables happened to be orthogonal to one another,
there would be no problem and the t statistics could be used in selecting the most important explanatory variables. This is, however, of little consequence in most practical applications of multiple regression.)

The results of fitting the multiple regression model to the crime rate data are given in Table 6.10. The global hypothesis that the regression coefficients are all zero is overwhelmingly rejected. The square of the multiple correlation coefficient is 0.75, indicating that the explanatory variables account for 75% of the variability in the crime rates of the 47 states.
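The conditional nature of estimated regression coefficients is easy to demonstrate numerically. In the sketch below (Python with NumPy; the data are simulated, not the crime rates), two explanatory variables are constructed to be almost identical, mimicking Ex0 and Ex1; with both in the model the individual coefficients are unstable, while dropping one changes the survivor's coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
ex0 = rng.normal(size=n)
ex1 = ex0 + 0.05 * rng.normal(size=n)      # near-duplicate of ex0
y = 1.0 + 2.0 * ex0 + rng.normal(size=n)   # response depends on ex0 only

def ols(X, y):
    """Least-squares coefficients for design matrix X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

both = ols(np.column_stack([np.ones(n), ex0, ex1]), y)
one = ols(np.column_stack([np.ones(n), ex1]), y)

# With both near-collinear variables present, only the SUM of their
# coefficients is well determined; alone, ex1's coefficient shifts.
print("both in model:", both[1:], " ex1 alone:", one[1])
```

The individual coefficients in the two-variable fit carry little information, but their sum, and the single coefficient in the reduced model, both recover the underlying slope of about 2.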
TABLE 6.9
Crime in the USA, 1960
State   R   Age   S   Ed   Ex0   Ex1   LF   M   N   NW   U1   U2   W   X
(Values of the crime rate R and the 13 explanatory variables for states 1 through 40.)
(Continued)
(Values for states 41 through 47.)

Note. R, crime rate: number of offenses known to the police per 1,000,000 population. Age, age distribution: the number of males aged 14-24 years per 1,000 of total state population. S, binary variable distinguishing southern states (coded as 1) from the rest. LF, labor force participation rate per 1,000 civilian urban males in the age group 14-24 years. M, number of males per 1,000 females. N, state population size in hundreds of thousands. NW, number of nonwhites per 1,000. U1, unemployment rate of urban males per 1,000 in the age group 14-24 years. U2, unemployment rate of urban males per 1,000 in the age group 35-39 years.

6.4. SELECTING SUBSETS OF EXPLANATORY VARIABLES

In this section we shall consider two possible approaches to the problem of identifying a subset of the explanatory variables likely to be almost as informative as the complete set of variables for predicting the response. The first method is known as all possible subsets regression, and the second is known as automatic model selection; the two are discussed in the next sections.

6.4.1. All Subsets Regression

Only the advent of modern, high-speed computing and the widespread availability of suitable statistical software has made this approach to model selection feasible. We do not intend to provide any details concerning the actual calculations involved, but instead we simply state that the result of the calculations is a table
Ex0, police expenditure: per capita expenditure on police protection by state and local government in 1960. Ex1, police expenditure as Ex0, but in 1959. Ed, educational level: mean number of years of schooling × 10 of the population aged 25 years and over. W, wealth as measured by the median value of transferable goods and assets or family income (unit 10 dollars). X, income inequality: the number of families per 1,000 earning below one half of the median income.
FIG. 6.7. Scatterplot matrix of crime rate data.
TABLE 6.10
Multiple Regression Results for Crime Rate Data (Ex0 not Used): Estimated Coefficients, SEs, and t Values
(Estimates, standard errors, and t values for the intercept and for Age, S, Ed, Ex1, LF, M, N, NW, U1, U2, W, and X.)
Note. Multiple R-squared, 0.75; F statistic, 8.64 on 12 and 34 degrees of freedom; the p value is <.001. Parameter abbreviations are defined in Table 6.9.

Such a table is usually organized according to the number of explanatory variables in the candidate model, with each candidate model identified by the list of explanatory variables it contains and by the values of one or more numerical criteria to use in comparing the various candidates. For a multiple regression with p explanatory variables, a total of 2^p − 1 models are possible, because each explanatory variable can be in or out of the model. So, for example, for the ice cream data with p = 2 explanatory variables, there are three possible models:
Model 1: temperature;
Model 2: price;
Model 3: temperature and price.
In the crime rate example, with p = 12 explanatory variables, there are 2^12 − 1 = 4095 models to consider! The numerical criterion most often used for assessing candidate models is Mallows Ck statistic, which is described in Display 6.6.
Display 6.6
Mallows Ck Statistic

Mallows Ck statistic is defined as
  Ck = (RSSk / s²) − (n − 2k),
where RSSk is the residual sum of squares from the multiple regression model with a set of k explanatory variables, and s² is the estimate of σ² obtained from the model that includes all the explanatory variables under consideration. The intercept is counted as a term in the model.
If Ck is plotted against k, the subsets of variables worth considering in searching for a parsimonious model are those lying close to the line Ck = k. In such a plot the value of k is (roughly) the contribution to Ck from the variance of the estimated parameters, whereas the remaining Ck − k is (roughly) the contribution from the bias of the model. This feature makes the plot a useful device for a broad assessment of the Ck values of a range of models.

Now for an example; to begin, we shall apply the all subsets approach to the crime data, but using only the three explanatory variables Age, Ed, and Ex1. The results are shown in Table 6.11, and the corresponding plot of Ck against k, of the kind described in Display 6.6, is shown in Figure 6.8. Here only the subset {Age, Ex1} appears to be an acceptable alternative to the use of all three variables.

TABLE 6.11
Results of All Subsets Regression for Crime Rate Data Using Only Age, Ed, and Ex1
(Each row gives one of the seven candidate subsets: Ex1; Ed; Age; Age, Ex1; Ed, Ex1; Age, Ed; and Age, Ed, Ex1, together with its size and its Ck value.)
Note. Abbreviations are defined in Table 6.9.
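The definition in Display 6.6 is simple enough to implement directly. The sketch below (Python with NumPy; the data are simulated rather than the crime rates) enumerates every subset, computes Ck with the intercept counted as a term, and picks out the minimizing subset.

```python
import numpy as np
from itertools import combinations

def rss(X, y):
    """Residual sum of squares from a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def mallows_ck(X, y):
    """Ck = RSS_k / s^2 - (n - 2k) for every subset; intercept counted in k."""
    n, p = X.shape
    full = np.column_stack([np.ones(n), X])
    s2 = rss(full, y) / (n - p - 1)        # sigma^2 estimate from full model
    results = {}
    for size in range(1, p + 1):
        for subset in combinations(range(p), size):
            Xs = np.column_stack([np.ones(n), X[:, subset]])
            k = size + 1                   # +1 for the intercept term
            results[subset] = rss(Xs, y) / s2 - (n - 2 * k)
    return results

# Toy data (not the crime rates): y depends on columns 0 and 1 only.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=60)
ck = mallows_ck(X, y)
best = min(ck, key=ck.get)
print("subset with smallest Ck:", best)
```

A useful check on any such implementation is that the Ck value of the full model is exactly the number of fitted parameters, here p + 1 = 5.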
FIG. 6.8. Plot of Ck against k for all subsets regression, using three explanatory variables in the crime rate data.

However, an inspection of the results from the analysis suggests that the following are two subsets giving a good description of the data: both contain Age, Ed, Ex1, W, and X, together with one or both of the unemployment variables (Ck = 5.65 and 5.91, respectively).

6.4.2. Automatic Model Selection

Software packages frequently offer automatic methods of selecting variables for a final regression model from a list of candidate variables. There are three typical approaches: (1) forward selection, (2) backward elimination, and (3) stepwise regression. These methods rely on significance tests known as partial F tests to select an explanatory variable for inclusion in, or deletion from, the regression model.
The forward selection approach begins with an initial model that contains only a constant term, and it successively adds explanatory variables to the model until the pool of candidate variables remaining contains no variables that, if added to the current model, would contribute statistically important information concerning the mean value of the response. The backward elimination method begins with an initial model that contains all the explanatory variables under investigation and then successively removes variables until no variables among those remaining in the model can be eliminated without adversely affecting, in a statistical sense, the predicted value of the mean response. The criterion used for assessing
TABLE 6.12
Some of the Results of All Subsets Regression for Crime Rate Data Using All 12 Explanatory Variables
(Models 1 through 39: candidate subsets of between two and five explanatory variables, each listed with its size and Ck value.)
(Continued)
(Models 40 through 50: candidate subsets of five and six explanatory variables.)
Note. Abbreviations are defined in Table 6.9.

whether a variable should be added to an existing model in forward selection, or removed in backward elimination, is based on the change in the residual sum of squares that results from the inclusion or exclusion of a variable. Details of the criterion, and of how the forward and backward methods are applied in practice, are given in Display 6.7.
The stepwise regression method of variable selection combines elements of both forward selection and backward elimination. The initial model for stepwise regression is one that contains only a constant term. Subsequent cycles of the approach involve first the possible addition of an explanatory variable to the current model, followed by the possible elimination of one of the variables included earlier, if the presence of new variables has made their contribution to the model no longer significant.
In the best of all possible worlds, the final model selected by applying each of the three procedures outlined above would be the same. Often this does happen, but it is in no way guaranteed. Certainly none of the automatic procedures for selecting subsets of variables is foolproof. They must be used with care, as the following statement from Agresti (1996) makes very clear.

Computerized variable selection procedures should be used with caution. When one considers a large number of terms for potential inclusion in a model, one or two of them that are not really important may look impressive simply due to chance. For instance, when all true effects are weak, the largest sample effect may substantially
it often makes sense include variablesof to special interest in a model and report their estimated effects even if they are not statistically significant at some level.9. for some more thoughtson automatic selection methods in regression.190 CHAPTER 6 ma f M W I I I I I I 2 4 6 8 10 12 Size of subset FIG.) . .1982a. 6. c plottedagainst k forallsubsetsregression.1982b. 4 l explanatory variables in the crime rate data. (See McKay and Campbell. 2 using overestimateits true effect.In addition.
Display 6.7
Forward Selection and Backward Elimination

The criterion used for assessing whether a variable should be added to an existing model in forward selection, or removed from an existing model in backward elimination, is as follows:

  F = (RSSk − RSSk+1) / (RSSk+1 / (n − k − 2)),

where RSSk+1 is the residual sum of squares for the model including k + 1 explanatory variables, and RSSk is that for the model with one variable fewer. The numerator, RSSk − RSSk+1, is the decrease in the residual sum of squares when a variable is added to an existing k-variable model (forward selection), or the increase in the residual sum of squares when a variable is removed from an existing (k + 1)-variable model (backward elimination).
The calculated F value is then compared with a preset term known as the F-to-enter (forward selection) or the F-to-remove (backward elimination). In the former, calculated F values greater than the F-to-enter lead to the addition of the candidate variable to the current model; in the latter, a calculated F value less than the F-to-remove leads to discarding the candidate variable from the model.
Application of both forward and stepwise regression to the crime rate data results in the selection of the five explanatory variables Age, Ed, Ex1, U2, and X; using the backward elimination technique produces these five variables plus W. The chosen subsets largely agree with the selection made by all subsets regression. Table 6.13 shows some of the typical results given by the forward selection procedure as implemented in a package such as SPSS, and the parameter estimates for a multiple regression model including only the five chosen variables are shown in Table 6.14. These five variables account for 71% of the variability of the crime rates (the original 12 explanatory variables accounted for 75%). Notice that the estimated regression coefficients and their standard errors have changed from the values given in Table 6.10, which were calculated for a model including the original 12 explanatory variables. It appears, then, that one suitable model for the crime rate data is that which includes Age, Ed, Ex1, U2, and X. Perhaps the most curious feature of this final model is that crime rate increases with increasing police expenditure (conditional on the other four explanatory variables); perhaps this reflects a mechanism whereby, as crime increases, the police are given more resources.
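The forward selection recipe of Display 6.7 can be sketched in a few lines. The implementation below (Python with NumPy) is illustrative rather than a reproduction of any package's routine; the F-to-enter value of 4.0 and the simulated data are arbitrary choices, not taken from the text.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def forward_select(X, y, f_to_enter=4.0):
    """Greedy forward selection using the partial-F criterion of Display 6.7."""
    n, p = X.shape
    chosen, remaining = [], list(range(p))
    current_rss = rss(np.ones((n, 1)), y)   # constant-only model
    while remaining:
        best_j, best_f, best_rss = None, -np.inf, None
        for j in remaining:
            cols = chosen + [j]
            new_rss = rss(np.column_stack([np.ones(n), X[:, cols]]), y)
            df = n - len(cols) - 1          # residual df of candidate model
            f = (current_rss - new_rss) / (new_rss / df)
            if f > best_f:
                best_j, best_f, best_rss = j, f, new_rss
        if best_f < f_to_enter:             # no candidate clears F-to-enter
            break
        chosen.append(best_j)
        remaining.remove(best_j)
        current_rss = best_rss
    return chosen

# Toy illustration (not the crime data): only columns 0 and 2 matter.
rng = np.random.default_rng(3)
X = rng.normal(size=(80, 5))
y = 3 * X[:, 0] + 2 * X[:, 2] + rng.normal(size=80)
print("selected columns:", forward_select(X, y))
```

On these simulated data the two genuinely influential columns are picked up, although, as the surrounding discussion warns, nothing prevents an occasional noise variable from also clearing the F-to-enter hurdle.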
TABLE 6.13
Forward Selection Results for Crime Rate Data
(Regression and residual sums of squares, degrees of freedom, mean squares, and F values for each of the five steps; the total sum of squares is 68809.277 in each case.)
Model summary:
Model 1: Constant, Ex1; R = 0.667, R² = 0.445.
Model 2: Constant, Ex1, X; R = 0.745, R² = 0.555.
Model 3: Constant, Ex1, X, Ed; R = 0.799, R² = 0.638.
Model 4: Constant, Ex1, X, Ed, Age; R = 0.819, R² = 0.670.
Model 5: Constant, Ex1, X, Ed, Age, U2; R = 0.840, R² = 0.705.
Note. No further variables are judged to contribute significantly to the prediction of crime rate. Model summary abbreviations are defined in Table 6.9.

6.5. REGRESSION DIAGNOSTICS

Having selected a more parsimonious model by using one of the techniques described in the previous section, we still must consider one further important aspect of a regression analysis, and that is to check the assumptions on which the model is based. Regression diagnostics are methods for identifying and understanding differences between a model and the data to which it is fitted. Some differences between the data and the model may be the result of isolated observations; one, a few, or many observations may be outliers, or may differ in some unexpected way from the rest of the data. Other differences may be systematic; for example, a term may be missing in a linear model. We have already described the use of residuals for this purpose in Section 6.2, but in this section we shall go into a little more detail and introduce several other useful regression diagnostics that are now available. A number of the most useful regression diagnostics are described in Display 6.8. We shall now examine the use of two of these diagnostics on the final model
selected for the crime rate data.

TABLE 6.14
Multiple Regression Results for the Final Model Selected for Crime Rate Data
(Parameter estimates, standard errors, t values, and p values for the intercept and for Age, Ex1, Ed, X, and U2; R² = 0.705.)
Note. Parameter abbreviations are defined in Table 6.9.

Figure 6.10 shows plots of the standardized and deletion residuals against fitted values. States 11, 19, and 20 might perhaps be seen as outliers, because they fall outside the (−2, 2) boundary. There is also perhaps some evidence that the variance of crime rate is not constant, as required. (Exercise 6.6 invites readers to consider more diagnostic plots for these data.)

6.6. MULTICOLLINEARITY

One of the problems that often occurs when multiple regression is used in practice is multicollinearity. The term is used to describe situations in which there are moderate to high correlations among some or all of the explanatory variables. Multicollinearity gives rise to a number of difficulties when multiple regression is applied.
1. It severely limits the size of the multiple correlation coefficient R, because the explanatory variables are largely attempting to explain much of the same variability in the response variable (see, e.g., Dizney and Gromen, 1967).
2. It makes determining the importance of a given explanatory variable difficult, because the effects of the explanatory variables are confounded as a result of their intercorrelations.
3. It increases the variances of the regression coefficients, making the parameter estimates unreliable and the use of the fitted model for prediction less stable.
Spotting multicollinearity among a set of explanatory variables may not be easy. The obvious course of action is simply to examine the correlations between these variables, but although this is often helpful, it is by no means foolproof: more subtle forms of multicollinearity may be missed. An alternative, and generally far more useful, approach is to examine what are known as the variance inflation factors (VIFs) of the explanatory variables.

Display 6.8
Regression Diagnostics

To begin, we need to introduce the hat matrix; this is defined as
  H = X(X′X)⁻¹X′,
where X is the matrix introduced in Display 6.4. In a multiple regression the predicted values of the response variable can be written in matrix form as
  ŷ = Hy,
so H "puts the hats" on y. The diagonal elements of H, h_ii, i = 1, ..., n, are such that 0 ≤ h_ii ≤ 1, with an average value of p/n. Observations with large values of h_ii are said to be leverage points, because they have most effect on the estimation of parameters, and it is often informative to produce a plot of h_ii against i (an index plot) to identify the observations that have high leverage.
The raw residuals introduced in the text are not independent, nor do they have the same variance, because var(r_i) = σ²(1 − h_ii); both properties make them less useful than they might be. Two alternative residuals are the standardized residual, r_i(std), and the deletion residual, r_i(del), defined as follows:
  r_i(std) = r_i / (s √(1 − h_ii)),
  r_i(del) = r_i / (s_(i) √(1 − h_ii)),
where s² is the residual mean square estimate of σ², and s²_(i) is the corresponding estimate after the deletion of observation i. The deletion residuals are particularly good at helping to identify outliers.
A further useful regression diagnostic is Cook's distance, defined as
  D_i = [r_i(std)]² h_ii / [p(1 − h_ii)],
which measures the influence that observation i has on the estimation of all the regression parameters in the model. Values greater than one suggest that the corresponding observation has undue influence on the estimated regression coefficients.
A full account of regression diagnostics is given in Cook and Weisberg (1982).
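The quantities in Display 6.8 can all be computed directly from the design matrix. The following sketch (Python with NumPy; simulated data) returns the leverages, the two kinds of residuals, and Cook's distances; np.linalg.inv is used for transparency rather than numerical robustness.

```python
import numpy as np

def regression_diagnostics(X, y):
    """Leverages, standardized and deletion residuals, and Cook's distances.

    X must already contain a column of ones; follows Display 6.8.
    """
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T       # the hat matrix
    h = np.diag(H)                             # leverages, average p/n
    r = y - H @ y                              # raw residuals
    s2 = r @ r / (n - p)                       # residual mean square
    r_std = r / np.sqrt(s2 * (1 - h))          # standardized residuals
    # Deletion residuals, via the usual leave-one-out identity.
    r_del = r_std * np.sqrt((n - p - 1) / (n - p - r_std**2))
    cook = r_std**2 * h / (p * (1 - h))        # Cook's distances
    return h, r_std, r_del, cook

rng = np.random.default_rng(7)
x = rng.normal(size=30)
X = np.column_stack([np.ones(30), x])
y = 1 + 2 * x + rng.normal(size=30)
h, r_std, r_del, cook = regression_diagnostics(X, y)
print("sum of leverages:", h.sum())   # trace of H equals p, here 2
```

The printed sum of leverages equals the number of fitted parameters, a handy sanity check, because H is a projection matrix of rank p.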
FIG. 6.10. Diagnostic plots for final model selected for crime rate data: (a) standardized residuals against fitted values and (b) deletion residuals against fitted values.
The variance inflation factor VIF_j for the jth variable is given by
  VIF_j = 1 / (1 − R_j²),
where R_j² is the square of the multiple correlation coefficient from the regression of the jth explanatory variable on the remaining explanatory variables. The variance inflation factor of an explanatory variable thus indicates the strength of the linear relationship between that variable and the remaining explanatory variables. A rough rule of thumb is that variance inflation factors greater than 10 give some cause for concern.
Returning to the crime rate data, we see the variance inflation factors of each of the original 13 explanatory variables in Table 6.15. Here it is clear that attempting to use both Ex0 and Ex1 in a regression model would have led to problems.

TABLE 6.15
VIFs for the Original 13 Explanatory Variables in the Crime Rate Data
(Ex0 and Ex1 each have a VIF of 100.00; the VIFs of the remaining variables, Age, S, Ed, LF, M, N, NW, U1, U2, W, and X, all lie between about 2 and 10.)
Note. Variable abbreviations are defined in Table 6.9.

How can multicollinearity be combatted? One way is to combine, in some fashion, explanatory variables that are highly correlated; in the crime rate example we could perhaps have taken the mean of Ex0 and Ex1. An alternative is simply to select one of the set of correlated variables (this is the approach used in the analysis of the crime rate data reported previously, in which only Ex1 was used). Two more complex possibilities are regression on principal components and ridge regression, both of which are described in Chatterjee and Price (1991).
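Variance inflation factors are straightforward to compute by regressing each explanatory variable on the others. The sketch below (Python with NumPy; simulated data, not the Table 6.15 values) constructs column 2 as a near copy of column 1, so both show VIFs far above the rule-of-thumb threshold of 10.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j of X on all the remaining columns (plus an intercept)."""
    n, p = X.shape
    out = []
    for j in range(p):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ beta
        tss = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2 = 1 - resid @ resid / tss
        out.append(1 / (1 - r2))
    return np.array(out)

# Toy data: column 2 is nearly a copy of column 1.
rng = np.random.default_rng(5)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 1] + 0.05 * rng.normal(size=100)
print("VIFs:", np.round(vif(X), 1))
```

The independent first column keeps a VIF near 1, while the two near-duplicate columns inflate each other's variance dramatically, exactly the pattern Ex0 and Ex1 show in Table 6.15.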
THE EQUIVALENCE OF MULTIPLE REGRESSION AND ANALYSIS OF VARIANCE: THE GENERAL LINEAR MODEL

That (it is hoped) ubiquitous creature, the more observant reader, will have noticed that the models described in this chapter look vaguely similar to those used for the analysis of variance, met in Chapters 3 and 4. It is now time to reveal that, in fact, the analysis of variance models and the multiple regression model are exactly equivalent.
In Chapter 3, the following model was introduced for a one-way design (see Display 3.2):
  y_ij = μ + α_i + ε_ij.    (6.9)
Without some constraints on the parameters, the model is overparameterized (see Chapter 3). The constraint usually adopted is to require that
  α_1 + α_2 + ... + α_k = 0,    (6.10)
where k is the number of groups. Consequently we can write
  α_k = −α_1 − α_2 − ... − α_{k−1}.    (6.11)
(Other constraints could be used to deal with the overparameterization problem; see the chapter exercises.)
How can this model be put into a form equivalent to the multiple regression model given in Display 6.4? The answer is provided in Display 6.9, using an example in which there are k = 3 groups. Display 6.10 then uses the fruit fly data (see Table 3.1) to show how the multiple regression model in Display 6.4 can be used to find the relevant sums of squares in a one-way ANOVA of those data. Notice that the parameter estimates given in Display 6.10 are those to be expected when the model is as specified in Eqs. (6.9) and (6.10): μ is the overall mean of the number of eggs laid, and the estimates of the α_i are simply the deviations of the corresponding group means from the overall mean.
Moving on now to the two-way analysis of variance, we find that the simplest way of illustrating the equivalence of the model given in Display 4.1 to the multiple regression model is to use an example in which each factor has only two
levels. Display 6.11 gives the details, and Display 6.12 describes a numerical example of performing a two-way ANOVA for a balanced design by using multiple regression. The explanatory variables for a balanced two-way design are orthogonal (uncorrelated); notice that in this case the estimates of the parameters in the multiple regression model do not change when other explanatory variables are added to the model. But now consider what happens when the multiple regression

Display 6.9
Multiple Regression Model for a One-way Design with Three Groups

Introduce two variables x1 and x2, defined as below, to label the group to which an observation belongs.
  Group 1: x1 = 1, x2 = 0.
  Group 2: x1 = 0, x2 = 1.
  Group 3: x1 = −1, x2 = −1.
The usual one-way ANOVA model for this situation is y_ij = μ + α_i + ε_ij, which, allowing for the constraint α_1 + α_2 + α_3 = 0, we can now write as
  y_ij = μ + α_1 x1 + α_2 x2 + ε_ij.
This is exactly the same form as the multiple regression model in Display 6.4.

Display 6.10
One-way ANOVA of Fruit Fly Data, Using a Multiple Regression Approach

Define two variables as specified in Display 6.9, and regress the number of eggs laid on x1 and x2. The estimates of the regression coefficients from the analysis are μ̂ = 27.42 and α̂1 = 2.79; the estimates of the α_i are simply the differences between each group mean and the grand mean. The analysis of variance table from the regression is as follows.
  Source      SS    DF  MS     F
  Regression  1362  2   681.1  8.7
  Residual    5659  72  78.6
This is exactly the same as the one-way analysis of variance of these data given in Chapter 3.
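The calculation in Display 6.10 is easy to mimic. The sketch below (Python with NumPy) uses invented three-group data rather than the fruit fly counts, builds the effect-coded design of Display 6.9, and confirms that the regression sum of squares equals the usual between-groups sum of squares.

```python
import numpy as np

# Hypothetical three-group data (not the fruit fly counts).
groups = [np.array([27., 30., 25., 28.]),
          np.array([33., 36., 31., 34.]),
          np.array([22., 20., 24., 23.])]
y = np.concatenate(groups)
n = len(y)

# Effect coding as in Display 6.9: group 1 -> (1, 0), group 2 -> (0, 1),
# group 3 -> (-1, -1), so the coefficients estimate mu, alpha1, alpha2.
x1 = np.concatenate([np.ones(4), np.zeros(4), -np.ones(4)])
x2 = np.concatenate([np.zeros(4), np.ones(4), -np.ones(4)])
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = X @ beta
ss_regression = np.sum((fitted - y.mean()) ** 2)   # from the regression
ss_between = sum(len(g) * (g.mean() - y.mean()) ** 2 for g in groups)
print("regression SS:", ss_regression, " between-groups SS:", ss_between)
```

The intercept estimate is the grand mean, the two slope estimates are the deviations of the first two group means from it, and the regression and between-groups sums of squares coincide, which is precisely the equivalence the displays assert.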
Display 6.11 Multiple Regression Model for a 2 x 2 Factorial Design (Factor A at Levels A1 and A2, Factor B at Levels B1 and B2) (See Chapter 4.)

The usual constraints on the parameters, introduced to deal with the overparameterized model, imply that the parameters satisfy the following equations:

α_1 = -α_2,
β_1 = -β_2,
γ_12 = -γ_11, γ_21 = -γ_11, γ_22 = γ_11.

In other words, there is only really a single interaction parameter for this design. The model for the observations in each of the four cells of the design can now be written explicitly as follows.

        A1                       A2
B1   μ + α_1 + β_1 + γ_11    μ - α_1 + β_1 - γ_11
B2   μ + α_1 - β_1 - γ_11    μ - α_1 - β_1 + γ_11

Now define two variables x1 and x2 as follows:

x1 = 1 if first level of A, x1 = -1 if second level of A;
x2 = 1 if first level of B, x2 = -1 if second level of B.

The original ANOVA model can now be written as

y_ijk = μ + α_1 x1 + β_1 x2 + γ_11 x3 + ε_ijk,

where x3 = x1 x x2.
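The coding in Display 6.11 can be checked directly: with one parameter per cell, the fitted regression reproduces the four cell means exactly. A small Python/numpy sketch with invented data (two observations per cell, not taken from the text):

```python
import numpy as np

# Hypothetical balanced 2x2 data: (level of A, level of B) -> observations.
cells = {
    (1, 1): np.array([10.0, 12]), (1, 2): np.array([15.0, 17]),
    (2, 1): np.array([11.0, 13]), (2, 2): np.array([24.0, 26]),
}

rows, y = [], []
for (a, b), obs in cells.items():
    x1 = 1 if a == 1 else -1      # coding from Display 6.11
    x2 = 1 if b == 1 else -1
    for v in obs:
        rows.append([1, x1, x2, x1 * x2])   # intercept, x1, x2, x3 = x1*x2
        y.append(v)
X, y = np.array(rows, float), np.array(y)

mu, a1, b1, g11 = np.linalg.lstsq(X, y, rcond=None)[0]

# Cell (A1, B1) has fitted mean mu + a1 + b1 + g11, and so on for the
# other cells, exactly as in the display.
print(mu + a1 + b1 + g11, cells[(1, 1)].mean())
```

Because the four parameters (μ, α_1, β_1, γ_11) are in one-to-one correspondence with the four cell means, the single interaction parameter really does carry all the interaction information for a 2 x 2 design.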
Display 6.12 Analysis of a Balanced Two-way Design by Using Multiple Regression

Consider the following data, which have four observations in each cell.

        B1             B2
A1   23 25 27 29    26 32 30 31
A2   22 23 21 21    37 38 40 35

Introduce the three variables x1, x2, and x3 as defined in Display 6.11 and perform a multiple regression, first entering the variables in the order x1, followed by x2, followed by x3. This leads to the following series of results.

Step 1: x1 entered. The analysis of variance table from the regression is as follows.

Source      SS      DF  MS
Regression   12.25   1  12.25
Residual    580.75  14  41.48

The regression sum of squares gives the between levels of A sum of squares that would be obtained in a two-way ANOVA of these data. The estimates of the regression coefficients at this stage are μ̂ = 28.75 and α̂_1 = -0.875.

Step 2: x1 and x2 entered. The analysis of variance table from the regression is as follows.

Source      SS      DF  MS
Regression  392.50   2  196.25
Residual    200.50  13   15.42

The difference in the regression sum of squares between steps 1 and 2, that is, 380.25, gives the sum of squares corresponding to factor B that would be obtained in a conventional analysis of variance of these data. The estimates of the regression coefficients at this stage are μ̂ = 28.75, α̂_1 = -0.875, and β̂_1 = -4.875.

(Continued)
Display 6.12 (Continued)

Step 3: x1, x2, and x3 entered. The analysis of variance table for this final regression is as follows.

Source      SS      DF  MS
Regression  536.50   3  178.83
Residual     56.50  12    4.71

The difference in the regression sum of squares between steps 2 and 3, that is, 144.00, gives the sum of squares corresponding to the A x B interaction in an analysis of variance of these data. The residual sum of squares in the final table corresponds to the error sum of squares in the usual ANOVA table. The estimates of the regression coefficients at this final stage are μ̂ = 28.75, α̂_1 = -0.875, β̂_1 = -4.875, and γ̂_11 = 3.00. Note that the estimates of the regression coefficients do not change as extra variables are brought into the model.

But now consider what happens when the multiple regression approach is used to carry out an analysis of variance of an unbalanced two-way design, as shown in Display 6.13. In this case, the order in which the explanatory variables enter the multiple regression model is of importance: both the sums of squares corresponding to each variable and the parameter estimates depend on which variables have already been included in the model. This highlights the point made previously in Chapter 4, that the analysis of unbalanced designs is not straightforward.

SUMMARY

1. Multiple regression is used to assess the relationship between a set of explanatory variables and a continuous response variable. The response variable is assumed to be normally distributed with a mean that is a linear function of the explanatory variables and a variance that is independent of the explanatory variables.
2. The explanatory variables are strictly assumed to be fixed. In practice, where this is almost never the case, the results of the multiple regression are to be interpreted conditional on the observed values of these variables.
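The sequential fits of Display 6.12 can be reproduced in a few lines. A Python/numpy sketch using the balanced data printed in the display (because the coding variables are orthogonal here, the order of entry makes no difference to the sums of squares):

```python
import numpy as np

# The balanced two-way data of Display 6.12 (four observations per cell),
# listed cell by cell: A1B1, A1B2, A2B1, A2B2.
y = np.array([23, 25, 27, 29, 26, 32, 30, 31,
              22, 23, 21, 21, 37, 38, 40, 35], float)
x1 = np.array([1] * 8 + [-1] * 8, float)        # A: 1 = A1, -1 = A2
x2 = np.array(([1] * 4 + [-1] * 4) * 2, float)  # B: 1 = B1, -1 = B2
x3 = x1 * x2                                    # interaction

def regression_ss(*cols):
    """Regression sum of squares for a fit on the given columns."""
    X = np.column_stack([np.ones_like(y)] + list(cols))
    fitted = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sum((fitted - y.mean()) ** 2)

ss1 = regression_ss(x1)              # 12.25: sum of squares for A
ss12 = regression_ss(x1, x2)         # 392.50
ss123 = regression_ss(x1, x2, x3)    # 536.50
total = np.sum((y - y.mean()) ** 2)

print(ss1, ss12 - ss1, ss123 - ss12)  # 12.25, 380.25 (B), 144.0 (A x B)
print(total - ss123)                  # residual, 56.5
```

Repeating the exercise with the unbalanced data of Display 6.13 (by deleting or duplicating a few observations) shows the sequential sums of squares changing with the order of entry, which is the contrast the text draws next.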
Display 6.13 Analysis of an Unbalanced Two-way Design by Using Multiple Regression

In this case consider the following data, in which the cells contain different numbers of observations.

        B1                        B2
A1   23 25 27 29 30 27 23 25   26 32 30 31
A2   22 23 21 21 19 23 17      37 38 40 35 39 35 38 41 32 36 40 41 38

The three variables x1, x2, and x3 are defined as specified in Display 6.11 and used in a multiple regression of these data, with variables entered first in the order x1, followed by x2, followed by x3. This leads to the following series of results.

Step 1: x1 entered. The regression ANOVA table is as follows.

Source      SS       DF  MS
Regression   149.63   1  149.63
Residual    1505.87  30   50.19

The regression sum of squares gives the A sum of squares for an unbalanced design (see Chapter 4).

Step 2: x1 entered, followed by x2. The regression ANOVA table is as follows.

Source      SS       DF  MS
Regression  1180.85   2  590.42
Residual     474.65  29   16.37

The increase in the regression sum of squares, that is, 1031.22, is the sum of squares that is due to B, conditional on A already being in the model, that is, B|A as encountered in Chapter 4.

Step 3: x1 and x2 entered, followed by x3. The regression ANOVA table is as follows.

Source      SS       DF  MS
Regression  1474.25   3  491.42
Residual     181.25  28    6.47

The increase in the regression sum of squares, that is, 293.40, is the sum of squares that is due to the interaction of A and B, conditional on A and B, that is, AB|A,B.

Now enter the variables in the order x2, followed by x1 (adding x3 after x2 and x1 will give the same results as step 3 above).

Step 1: x2 entered. The regression ANOVA table is as follows.

Source      SS       DF  MS
Regression  1177.80   1  1177.80
Residual     477.70  30    15.92

The regression sum of squares is that for B for an unbalanced design.

Step 2: x2 entered, followed by x1. The regression ANOVA table is as follows.

Source      SS       DF  MS
Regression  1180.85   2  590.42
Residual     474.65  29   16.37

The increase in the regression sum of squares, that is, 3.05, is the sum of squares that is due to A, conditional on B, that is, A|B.

(Continued)
Display 6.13 (Continued)

Note how the regression estimates for a variable alter depending on the stage at which the variable is entered into the model.

4. An extremely important aspect of a regression analysis is the inspection of a number of regression diagnostics in a bid to identify any departures from assumptions, outliers, and so on.
5. It may be possible to find a more parsimonious model for the data, that is, one with fewer explanatory variables, by using all subsets regression or one of the "stepping" methods. Care is required when the latter is used as implemented in a statistical package.
6. The multiple linear regression model and the analysis of variance models described in earlier chapters are equivalent.

COMPUTER HINTS

SPSS
To conduct a multiple regression analysis with one set of explanatory variables, use the following basic steps.
1. Click Statistics, click Regression, and click Linear to get the Multiple Regression dialog box.
2. Click the dependent variable from the variable list and move it to the Dependent box.
3. Hold down the ctrl key, and click on the explanatory variables and move them to the Independent box.
4. Click Statistics, and click Descriptives.
5. Click on the Plot tag to select residual plots.

S-PLUS
In S-PLUS, regression can be used by means of the Statistics menu as follows.
1. Click Statistics, click Regression, and click Linear.
2. Select the dependent variable and explanatory variables, or define the regression of interest in the Formula box.
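The diagnostics mentioned in summary point 4 (and used again in Exercise 6.6 below) can also be computed from first principles rather than read out of a package. A Python/numpy sketch with made-up data, showing the hat matrix diagonals (leverages) and Cook's distances for an ordinary least-squares fit:

```python
import numpy as np

# Illustrative data: 20 observations, intercept plus two predictors.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=20)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
n, p = X.shape

H = X @ np.linalg.inv(X.T @ X) @ X.T            # hat matrix
h = np.diag(H)                                  # leverages
s2 = resid @ resid / (n - p)                    # residual variance estimate
cook = resid**2 / (p * s2) * h / (1 - h) ** 2   # Cook's distances

# Leverages always sum to the number of fitted parameters, so an
# "average" leverage is p/n; points well above that deserve a look.
print(h.sum())
```

Index plots of h and of the Cook's distances (leverage or distance against observation number) are exactly the displays asked for in Exercise 6.6.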
Multiple regression can also be applied by using the command language, with the lm function and a formula specifying what is to be regressed on what. For example, if the crime in the USA data in the text are stored in a data frame, crime, with variable names as in the text, multiple regression could be applied by using

lm(R ~ Age + S + Ed + Ex1 + LF + M + N + NW + U1 + U2 + W + X)

All subsets regression is available in S-PLUS by using the leaps function. S-PLUS also has extensive facilities for finding many types of regression diagnostics and plotting them after fitting a model.

EXERCISES

6.1. Explore other possible models for the vocabulary data. Possibilities are to include a quadratic age term or to model the log of vocabulary size.

6.2. Examine the four data sets shown in Table 6.16. Also show that the estimates of the slope and intercept parameters in a simple linear regression are the

TABLE 6.16 Four Hypothetical Data Sets, Each Containing 11 Observations for Two Variables

             Data Set 1    Data Set 2    Data Set 3    Data Set 4
Observation  x     y       x     y       x     y       x     y
1            10.0  8.04    10.0  9.14    10.0  7.46     8.0  6.58
2             8.0  6.95     8.0  8.14     8.0  6.77     8.0  5.76
3            13.0  7.58    13.0  8.74    13.0 12.74     8.0  7.71
4             9.0  8.81     9.0  8.77     9.0  7.11     8.0  8.84
5            11.0  8.33    11.0  9.26    11.0  7.81     8.0  8.47
6            14.0  9.96    14.0  8.10    14.0  8.84     8.0  7.04
7             6.0  7.24     6.0  6.13     6.0  6.08     8.0  5.25
8             4.0  4.26     4.0  3.10     4.0  5.39    19.0 12.50
9            12.0 10.84    12.0  9.13    12.0  8.15     8.0  5.56
10            7.0  4.82     7.0  7.26     7.0  6.42     8.0  7.91
11            5.0  5.68     5.0  4.74     5.0  5.73     8.0  6.89
206 CHAPTER 6

TABLE 6.17 Memory Retention

t       p
1       0.84
5       0.71
15      0.61
30      0.56
60      0.54
120     0.47
240     0.45
480     0.38
720     0.36
1440    0.26
2880    0.20
5760    0.16
10080   0.08

TABLE 6.18 Marriage and Divorce Rates per 1000 per Year for 14 Countries

Marriage Rate   Divorce Rate
5.6             2.0
6.0             3.0
5.1             2.9
5.0             1.9
6.7             2.0
6.3             2.2
5.4             2.9
6.1             1.9
4.9             4.8
6.8             1.1
5.2             0.4
6.8             1.4
6.1             0.9
9.7             4.6
TABLE 6.19 Quality of Children's Testimonies

Age: 56 56 56 56 56 56 56 56 89 89 89 56 56 56 89 89 89 89 56 89 89 89 56
Gender: Male Female Male Female Male Female Female Female Male Female Male Male Female Male Female Male Female Female Female Male Male Male Male
Location: 3 2 1 2 3 3 4 2 3 2 3 1 3 2 2 4 2 3 4 2 4 4 4
Delay: 45 27 102 39 41 70 72 41 71 56 88 13 29 39 10 15 46 57 26 14 45 19 9
Prosecute: No Yes No No No Yes No No No Yes Yes No No Yes No Yes Yes No Yes No No Yes Yes
Coherence, Maturity, and Quality (as printed): 3.81 3.63 3.08 3.38 2.94 2.28 2.86 3.07 2.01 2.91 1.62 2.07 2.11 3.11 2.32 2.30 2.21 3.31 2.35 2.53 3.87 2.89 1.91 34.11 36.59 37.23 39.65 42.07 44.91 45.07 45.23 47.53 49.38 49.81 54.64 57.64 57.87 65.44 67.53 68.64 76.59 77.47 78.67 80.15 83.08 59.28

same for each set. Find the value of the multiple correlation coefficient for each data set. (This example illustrates the dangers of blindly fitting a regression model without the use of some type of regression diagnostic.)

6.3. Plot some suitable diagnostic graphics for the four data sets in Table 6.16.

6.4. The data shown in Table 6.17 give the average percentage of memory retention, p, measured against passing time, t (minutes). The measurements were taken five times during the first hour after subjects memorized a list of disconnected items, and then at various times up to a week later. Plot the data (after a suitable transformation if necessary) and investigate the relationship between retention and time by using a suitable regression model.

6.5. As mentioned in Chapter 3, ANOVA models are usually presented in overparameterized form. For example, in a one-way analysis of variance, the constraint
α_1 + α_2 + ... + α_k = 0 is often introduced to overcome the problem. However, the overparameterization in the one-way ANOVA can also be dealt with by setting one of the α_i equal to zero. Carry out a multiple regression of the fruit fly data that is equivalent to the ANOVA model with α_3 = 0. In terms of group means, what are the parameter estimates in this case?

6.6. For the final model used for the crime rate data, examine plots of the diagonal elements of the hat matrix against observation number, and produce a similar index plot for the values of Cook's distance statistic. Do these plots suggest any amendments to the analysis?

6.7. Table 6.18 shows marriage and divorce rates (per 1000 populations per year) for 14 countries. Derive the linear regression equation of divorce rate on marriage rate and show the fitted line on a scatterplot of the data. On the basis of the regression line, predict the divorce rate for a country with a marriage rate of 8 per 1000 and also for a country with a marriage rate of 14 per 1000. How much conviction do you have in each prediction?

6.8. The data shown in Table 6.19 are based on those collected in a study of the quality of statements elicited from young children. The variables are statement quality; child's age, gender, and maturity; how coherently the child gave evidence; the delay between witnessing the incident and recounting it; the location of the interview (the child's home, school, a formal interviewing room, or an interview room specially constructed for children); and whether or not the case proceeded to prosecution. Carry out a complete regression analysis on these data to see how statement quality depends on the other variables, including selecting the best subset of explanatory variables and examining residuals and other regression diagnostics. Pay careful attention to how the categorical explanatory variables with more than two categories are coded.
variables). which are usually given in different orders to different subjects. concentrating 209 . largely because of its increasing importance in clinical trials (see Everitt and Pickles. on several different occasions. there is no possibility of randomizing the“occasions:’ and it is this that essentially differentiates longitudinal data f o other repeated measure situations arising in psychology. on to With longitudinal data it is very unlikely that the measurements taken close to one another in time will have the same correlation measurements made at more as widely spaced time intervals. combinations of or conditions. The special structureof longitudinal data makes the sphericity condition described in Chapter 5. Here we shall cover the area only briefly. The analysis of longitudinal data has become something a growth industry of in statistics. INTRODUCTION Longitudinal data were introduced as a special case of repeated measures in on Chapter 5.1. and methods that depend it for their validity. very difficult justify. rm where the subjects are observed under different conditions. Such data arise when subjects are measured the same variable (or. in some cases. solely from the passing of time. Because the repeated measures arise.7 Analysis of Longitudinal Data 7. in this case.2000).
on one very simple approach and one more complex, regression-based modeling procedure.

7.2. RESPONSE FEATURE ANALYSIS: THE USE OF SUMMARY MEASURES

A relatively straightforward approach to the analysis of longitudinal data is to first transform the T (say) repeated measurements for each subject into a single number considered to capture some important aspect of a subject's response profile. The chosen summary measure has to be decided on before the analysis of the data begins and should, of course, be relevant to the particular questions that are of interest in the study. Commonly used summary measures are (1) overall mean, (2) maximum (minimum) value, (3) time to maximum (minimum) response, (4) slope of regression line of response on time, and (5) time to reach a particular value (e.g., a fixed percentage of baseline).

But first the question of why the observations on each occasion should not simply be separately analyzed has to be addressed. When two groups of subjects are being compared, as with the salsolinol data in Chapter 5 (see Table 5.2), this occasion-by-occasion approach would involve a series of t tests. (When more than two groups are present, a series of one-way analyses of variance would be required. Alternatively, some distribution-free equivalent might be used; see Chapter 8.) The procedure is straightforward but has a number of serious flaws and weaknesses.

The first is that the series of tests performed are not independent of one another, making interpretation difficult. In addition, the separate significance tests do not give an overall answer to whether or not there is a group difference and, of particular importance, do not provide a useful estimate of the overall "treatment" effect. The series of separate tests also provides little information about the longitudinal development of the mean response profiles, and the real concern is likely to involve something more global. For example, a succession of marginally significant group mean differences might be collectively compelling if the repeated measurements are only weakly correlated, but much less convincing if there is a pattern of strong correlations. The occasion-by-occasion procedure also assumes that each repeat measurement is of separate interest in its own right. This is unlikely in general.

Having identified a suitable summary measure, we calculate, for each subject, S given by

S = f(x_1, x_2, ..., x_T),

where x_1, x_2, ..., x_T are the repeated measures for the subject and f represents the chosen summary function.
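The whole procedure can be sketched in a few lines. The following Python/numpy fragment (the measurements are invented, not the salsolinol data) takes the mean as the summary function f, reduces each subject's four repeated measures to a single value S, and then compares the two groups with an ordinary pooled two-sample t statistic:

```python
import numpy as np

# Hypothetical repeated measures: rows = subjects, columns = occasions.
group1 = np.array([[3.1, 3.4, 3.0, 3.3],
                   [2.8, 2.9, 3.1, 3.0],
                   [3.5, 3.6, 3.4, 3.7]])
group2 = np.array([[3.9, 4.1, 4.0, 4.2],
                   [3.6, 3.8, 3.7, 3.9],
                   [4.0, 4.3, 4.1, 4.4]])

# Step 1: one summary value S per subject (here, f = mean over occasions).
s1 = group1.mean(axis=1)
s2 = group2.mean(axis=1)

# Step 2: a standard two-sample t statistic on the summary values.
n1, n2 = len(s1), len(s2)
sp2 = ((n1 - 1) * s1.var(ddof=1) + (n2 - 1) * s2.var(ddof=1)) / (n1 + n2 - 2)
t = (s1.mean() - s2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
print(round(t, 2))  # -3.51 for these invented data
```

Replacing the `mean(axis=1)` line with a maximum, a time-to-peak, or a fitted slope gives the other summary measures in the list above; the second step is unchanged.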
The analysis of differences between the levels of the between subject factor(s) is then reduced to a simple t test (two groups) or an analysis of variance (more than two groups, or more than a single between subject factor). (If there is any evidence of a departure from normality in the chosen summary measure, the distribution-free equivalents might be used instead.)

As a first example of the use of the summary measure approach to longitudinal data, it will be applied to the salsolinol data given in Chapter 5. (Because the data are clearly very skewed, we shall analyze the log-transformed observations.) We first need to ask, what is a suitable summary measure for these data? Here the difference in average excretion level is likely to be the question of interest, so perhaps the most obvious summary measure to use is simply the average response of each subject over the four measurement occasions. Table 7.1 gives the values of this summary measure for each subject, and also the result of constructing a confidence interval for the difference in the two groups. The confidence interval contains the value zero, and so, strictly, no claim can be made that the data give any evidence of a difference between the average excretion levels of those people who are moderately dependent and those who are severely dependent on alcohol. However, when a more pragmatic view is taken of the lack of symmetry of the confidence interval around zero, it does appear that subjects who are severely dependent on alcohol may tend to have a somewhat higher salsolinol excretion level.

Now let us examine a slightly more complex example. The data given in Table 7.2 result from an experiment in which three groups of rats were put on different diets, and after a settling-in period their bodyweights (in grams) were recorded weekly for nine weeks. Bodyweight was also recorded once before the diets began. Some observations are missing (indicated by NA in Table 7.2) because of the failure to record a bodyweight on particular occasions as intended. Before the response feature approach is applied to these data, two possible complications have to be considered.

1. What should be done about the prediet bodyweight observation taken on day 1?
2. How should the missing observations be dealt with?

We shall leave aside the first of these questions for the moment and concentrate on only the nine postdiet recordings. The simplest answer to the missing values problem is just to remove rats with such values from the analysis. This would leave 11 of the original 16 rats for analysis. This approach may be simple, but it is not good! Using it in this case would mean an almost 33% reduction in sample size, which is clearly very unsatisfactory. An alternative to simply removing rats with any missing values is to calculate the chosen summary measure from the available measurements on a rat. In this way, rats with different numbers of repeated measurements can all contribute to the analysis. Adopting this approach, and again using the mean as the chosen summary measure, leads to the values in Table 7.3 on which to base the analysis. The results of a one-way analysis of variance of these measures, and the Bonferroni
TABLE 7.1 Summary Measure Results for Logged Salsolinol Data (log to base 10 used)

Average response of each subject over the four collection occasions, by group:
Group 1 (moderate dependence): n = 6, SD = 0.1186.
Group 2 (severe dependence): n = 8, SD = 0.1980; difference in means (severe minus moderate) = 0.0858.

Note. The 95% confidence interval for the difference between the two groups is

0.0858 ± t_12 x s x (1/6 + 1/8)^(1/2),

where s^2 is the assumed common variance in the two groups, given by

s^2 = (5 x 0.1186^2 + 7 x 0.1980^2)/(6 + 8 - 2) = 0.0287.

These calculations lead to the interval (-0.1137, 0.2853). Log to base 10 is used.

multiple comparisons tests, are given in Table 7.4. (The Scheffé procedure gives almost identical results for these data.) Clearly the mean bodyweight of Group 1 differs from the mean bodyweights of Groups 2 and 3, but the means for the latter do not differ. (It should be noted here that when the number of available repeated measures differs from subject to subject in a longitudinal study, the calculated summary
measures are likely to have differing precisions. The analysis should then, ideally, take this into account by using some form of weighting. This was not attempted in the rat data example because it involves techniques outside the scope of this text; for those readers courageous enough to tackle a little mathematics, a description of a statistically more satisfactory procedure is given in Gornbein, Lazaro, and Little, 1992.)

TABLE 7.2 Bodyweights of Rats (grams)

Day: 1, 8, 15, 22, 29, 36, 43, 50, 57, 64
Group 1 (8 rats), Group 2 (4 rats), Group 3 (4 rats); weights as printed, NA denoting a missing value:
240 225 255 245 250 230 250 255 260 265 275 255 415 420 445 560 465 525 525 510
255 230 250 260 NA 262 240 258 240 255 255 270 260 260 425 430 450 565 475 530 530 520
255 265 270 275 270 268 428 440 NA 262 265 270 275 NA 270 438 448 455 590 481 535 545 530 580 485 533 540
NA 265 268 273 271 274 265 443 460 455 591 493 540 546
266 243 261 270 NA 278 276 265 442 458 451 595 493 525 538 535 538
265 238 264 274 276 284 282 273 456 475 462 612 507 543 553 550
272 247 268 273 278 279 NA 274 468 484 466 618 518 544 555 553
278 245 269 275 280 281 284 278 478 496 472 628 525 559 548 NA

Note. Diets begin after the measurement of bodyweight on day 1.

Having addressed the missing value problem, we now need to consider if and how to incorporate the prediet bodyweight recording (known generally as a baseline measurement) into an analysis. Such a pretreatment value can be used in association with the response feature approach in a number of ways. If the average response over time is the chosen summary (as in the two examples above), there are three possible methods of analysis (Frison and Pocock, 1992).

1. POST: an analysis that ignores the baseline values available and analyzes only the mean of the postdiet responses (the analysis reported in Table 7.4).
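The "use the available measurements" idea is easy to implement. In Python with numpy, coding the missing recordings as NaN lets `np.nanmean` compute each subject's summary from whatever measurements exist (the weights below are invented, not the rat data); the caveat in the text still applies, since summaries based on fewer measurements are less precise:

```python
import numpy as np

# Hypothetical weekly weights for three rats; np.nan plays the role of NA.
weights = np.array([
    [250.0, 255, np.nan, 262, 265],   # one missing recording
    [240.0, 242, 245, 247, np.nan],   # one missing recording
    [255.0, 260, 262, 264, 268],      # complete
])

per_rat_mean = np.nanmean(weights, axis=1)    # mean of available values
n_used = np.sum(~np.isnan(weights), axis=1)   # how many values each used
print(per_rat_mean, n_used)
```

Dropping any row containing a NaN would discard two of the three subjects here; the nan-aware summary keeps all of them in the analysis.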
TABLE 7.3 Means of Available Posttreatment Observations for Rat Data

Group 1 (rats 1-8), Group 2 (rats 1-4), Group 3 (rats 1-4); average bodyweights as printed:
262.1 239.9 263.0 269.6 272.6 276.1 267.4 265.9 444.0 457.4 456.4 577.3 495.9 534.7 542.9 519.8

TABLE 7.4 ANOVA of Means Given in Table 7.3

Source  SS      DF  MS      F     p
Group   239475   2  119738  89.6  <.001
Error    17375  13    1337

Bonferroni multiple comparisons (95% intervals): the estimate for group 1 versus group 2 is -223.0 (SE 22.4), with interval approximately (-285, -161); the group 1 versus group 3 interval also excludes zero; the group 2 versus group 3 interval, approximately (-120, 21), includes zero.
2. CHANGE: an analysis that involves the differences of the mean postdiet weights and the prediet value.
3. ANCOVA: here, between subject variation in the prediet measurement is taken into account by using the prediet recording as a covariate in a linear model for the comparison of postdiet means.

Each method represents a different way of adjusting the observed difference at outcome by using the prediet difference, if available. The CHANGE approach corresponds to the assumption that the difference at outcome, in the absence of a diet effect, is expected to be equal to the difference in the means of the prediet values. That such an assumption is generally false is well documented; see, for example, Senn (1994a, 1994b). The difficulty primarily involves the regression to the mean phenomenon. This refers to the process that occurs as transient components of an initial measurement are dissipated over time. Selection of high-scoring individuals for entry into a study, for example, necessarily also selects for individuals with high values of any transient component that might contribute to that score. Remeasurement during the study will tend to show a declining mean value for such groups. Consequently, groups that initially differ through the existence of transient phenomena, such as some forms of measurement error, will show a tendency to have converged on remeasurement. Thus, in the absence of a treatment effect, the expected difference at outcome need not equal the difference at baseline. Note also that randomization ensures only that treatment groups are similar in terms of expected values, so groups may actually differ not just in transient phenomena but also in permanent components of the observed scores. Although the description of transient components may bring about regression to the mean phenomena such as those previously described, the extent of regression and the mean value to which separate groups are regressing need not be expected to be the same.

The use of ANCOVA allows for some more general system of predicting what the outcome difference would have been in the absence of any diet effect, as a function of the mean difference prediet. Frison and Pocock (1992) compare the three approaches and show that an analysis of covariance is more powerful than both the analysis of change scores and the analysis of posttreatment means only. (Using the mean of several baseline values, if available, makes the analysis of covariance even more efficient, if there are moderate correlations between the repeated measures.)

The differences between the three approaches can be illustrated by comparing power curves calculated by using the results given in Frison and Pocock (1992). Figures 7.1, 7.2, and 7.3 show some examples for the situation of two treatment groups and varying degrees of correlation between the repeated
observations (the correlations between pairs of repeated measures are assumed equal in the calculation of these power curves). From these curves, it can be seen that the sample size needed to achieve a particular power for detecting a standardized treatment difference of 0.5 is always lower with an ANCOVA. When the correlation between pairs of repeated measures is small (0.3), CHANGE is worse than simply ignoring the pretreatment value available and simply using POST. As the correlation increases, CHANGE approaches ANCOVA in power, with both being considerably better than POST.

[FIG. 7.1. Power curves comparing POST, CHANGE, and ANCOVA.]

The results of applying each of CHANGE and ANCOVA to the rat data are given in Table 7.5. (The POST results were given previously in Table 7.4.) Here both POST and CHANGE suggest the presence of group differences, but ANCOVA finds the group difference not significant at the 5% level. It suggests that the observed postdiet differences are accountable for by the prediet differences. It is clear that the three groups of rats in this experiment have very different initial bodyweights, and the warnings about applying an analysis of covariance in such cases spelled out in Chapter 3 have to be kept in mind. Nevertheless, it might be argued that the postdiet differences in bodyweight between the three groups are

[FIG. 7.2. Power curves comparing POST, CHANGE, and ANCOVA.]

accountable for by their prediet differences, and are not the results of the different diets. A telling question that might be asked here is, why were the three diets tried on animals, some of which were markedly lighter than the rest?

The response feature method has a number of advantages for the analysis of longitudinal data. Three given by Matthews (1993) are as follows.

1. An appropriate choice of summary measure ensures that the analysis is focused on relevant and interpretable aspects of the data.
2. The method is statistically respectable.
3. To some extent, missing and irregularly spaced observations can be accommodated.

A disadvantage of the response feature approach is that it says little about how the observations evolve over time, and about whether different groups of subjects behave in a similar way as time progresses. When such issues are of interest, as they generally are, an alternative is needed, such as the more formal modeling procedures that are described in the next section.
218 CHAPTER 7

TABLE 7.5 CHANGE and ANCOVA for the Rat Data

CHANGE
Source  SS    DF  MS     F    p
Group   1098   2  549.0  4.3  <.05
Error   1659  13  127.6

ANCOVA
Source        SS        DF  MS        F       p
Pretreatment  254389.9   1  254389.9  1969.2  <.0001
Group          912       2     456.0     3.5  >.05
Error         1550      12     129.2

[FIG. 7.3. Power curves comparing POST, CHANGE, and ANCOVA.]
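The power comparisons behind Figures 7.1-7.3 can be sketched from the variance factors in Frison and Pocock (1992). The formulas below assume their compound-symmetry setup, with a single baseline, r posttreatment measures, and every pair of measures correlating rho; the treatment-effect variance for each analysis is proportional to the factor returned, so smaller factors mean smaller required sample sizes:

```python
def variance_factors(r, rho):
    """Relative variance of the treatment estimate for POST, CHANGE, ANCOVA
    with one baseline, r post measures, common correlation rho."""
    post = (1 + (r - 1) * rho) / r   # mean of the r post measures
    change = post + 1 - 2 * rho      # post mean minus the baseline
    ancova = post - rho ** 2         # post mean adjusted for the baseline
    return post, change, ancova

results = {}
for rho in (0.3, 0.5, 0.8):
    results[rho] = variance_factors(r=9, rho=rho)
    print(rho, [round(v, 3) for v in results[rho]])
```

The factors reproduce the qualitative story in the text: ANCOVA has the smallest variance at every correlation; CHANGE is worse than POST for rho below 0.5, equal to it at 0.5, and better above it, approaching ANCOVA as the correlation grows.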
7.3. RANDOM EFFECT MODELS FOR LONGITUDINAL DATA

A detailed analysis of longitudinal data requires consideration of models that represent both the level and the slope of a group's profile of repeated measurements and that also account adequately for the observed pattern of dependences in those measurements. Such models are now widely available, and there is a considerable literature surrounding them. Such models have witnessed a huge increase in interest in recent years, but confusingly they are often referred to by different names, including random effects models, multilevel models, hierarchical models, and random growth curve models. Fuller accounts of this increasingly important area of statistics are available in Brown and Prescott (1999) and Everitt and Pickles (2000). Here we shall try to give the flavor only of the possibilities, by examining one relatively straightforward example.

7.3.1. The Treatment of Alzheimer's Disease

The data in Table 7.6 arise from an investigation of the use of lecithin, a precursor of choline, in the treatment of Alzheimer's disease. Traditionally, it has been assumed that this condition involves an inevitable and progressive deterioration in all aspects of intellect, self-care, and personality. Recent work suggests that the disease involves pathological changes in the central cholinergic system, which it might be possible to remedy by long-term dietary enrichment with lecithin. In particular, the treatment might slow down or perhaps even halt the memory impairment associated with the condition. Patients suffering from Alzheimer's disease were randomly allocated to receive either lecithin or placebo for a 6-month period. A cognitive test score giving the number of words recalled from a previously standard list was recorded monthly for 5 months.

As with most data sets, it is important to graph the data in Table 7.6 in some informative way prior to more formal analysis. According to Diggle, Liang, and Zeger (1994), there is no single prescription for making effective graphical displays of longitudinal data, although they do offer the following simple guidelines: (1) show as much of the relevant raw data as possible rather than only data summaries; (2) highlight aggregate patterns of potential scientific interest; (3) identify both cross-sectional and longitudinal patterns; and (4) make easy the identification of unusual individuals or unusual observations.

Three graphical displays of the data in Table 7.6 are shown in Figures 7.4, 7.5, and 7.6. In the first, the raw data for each patient are plotted, and they are labeled with the group to which they have been assigned. There is considerable variability
TABLE 7.6
The Treatment of Alzheimer's Disease
(Cognitive test scores for each patient on visits 1-5.)
Note. 1, placebo; 2, lecithin groups.
FIG. 7.4. Plot of individual patient data from the lecithin trial for each treatment group.

FIG. 7.5. Plot of individual patient profiles for the lecithin trial data in Table 7.6, showing fitted linear regressions of cognitive score on visit.
FIG. 7.6. Box plots of the lecithin trial data.

Patients on lecithin appear to have some increase in cognitive score over time. This observation appears to be confirmed by Figure 7.5, which shows each group's data together with the fitted linear regression of cognitive score on visit. Finally, the box plots given in Figure 7.6 for each group and each visit give little evidence of outliers, or of worrying differences in the variability of the observations on each visit.

A further graphic that is often useful for longitudinal data is the scatterplot matrix introduced in Chapter 2. Figure 7.7 shows such a plot for the repeated measurements in the lecithin trial data, with each point labeled by group (0, placebo; 1, lecithin). Clearly, some pairs of measurements are more highly related than others.

Now let us consider some more formal models for this example. We might begin by ignoring the longitudinal character of the data and use a multiple regression model as described in Chapter 6. We could, for example, fit a model for cognitive score that includes, as the explanatory variables, Visit (with values 1, 2, 3, 4, 5) and Group (0, placebo; 1, lecithin). That is,

  yijk = β0 + β1 Visitj + β2 Groupi + εijk,    (7.2)

where yijk is the cognitive score of subject k in group i on visit j, and the εijk are error terms assumed normally distributed with mean zero and variance σ².
FIG. 7.7. Scatterplot matrix of the cognitive scores on the five visits of the lecithin trial, showing treatment group (0, placebo; 1, lecithin).
TABLE 7.7
Results of Fitting the Regression Model Described in Eq. (7.2) to the Data in Table 7.6
(Estimates, standard errors, t statistics, and p values for β0 (Intercept), β1 (Visit), and β2 (Group).)

The results of fitting the model specified in Eq. (7.2) to the data in Table 7.6 are given in Table 7.7. The results suggest both a significant Visit effect and a significant Group effect; R² = 0.107, and the F statistic for testing that all regression coefficients are zero is 14 on 2 and 232 degrees of freedom, with an associated p value of <.001. Note, however, that this model takes no account of the real structure of the data; in particular, the observations made at each visit on a subject are assumed independent of each other. Figure 7.7 clearly demonstrates that the repeated measurements in the lecithin trial have varying degrees of association, so that the model in Eq. (7.2) is quite unrealistic.

How can the model be improved? One clue is provided by the plot of the confidence intervals for each patient's estimated intercept and slope in a regression of his or her cognitive score on Visit, shown in Figure 7.8. This suggests that certainly the intercepts, and possibly also the slopes, of the patients vary widely. A suitable way to model such variability is to introduce random effect terms, and in Display 7.1 a model where this is done for the intercepts only is described. The implications of this model for the relationship between the repeated measurements are also described in Display 7.1.

FIG. 7.8. Confidence intervals for intercepts and slopes of the linear regression of cognitive score on visit for each individual patient in the lecithin trial.

The model in Display 7.1 is essentially nothing more than the repeated measures ANOVA model introduced in Chapter 5, although introduced in a somewhat different manner and now written in regression terms for the multiple response variables. In particular, the model in Display 7.1 only allows for a compound symmetry pattern of correlations for the repeated measurements. The results of fitting the model in Display 7.1 to the data in Table 7.6 are given in Table 7.8. Note that the standard error for the Group regression coefficient is greater in the random effects model than in the multiple regression model (see Table 7.7), and the reverse is the case for the regression coefficient of Visit. This arises because the model used to obtain the results in Table 7.7 ignores the repeated measures aspect of the data and incorrectly combines the between group and the within group variations in the residual standard error.

The model in Display 7.1, implying as it does the compound symmetry of the correlations between the cognitive scores on the five visits, is unlikely to capture the true correlation pattern suggested by Figure 7.7. One obvious extension of the model is to allow for a possible random effect of slope as well as of intercept, and such a model is described in Display 7.2. Note that this model allows for a more complex correlational pattern between the repeated measurements. The results of fitting this new model are given in Table 7.9.

Random effects models can be compared in a variety of ways, one of which is described in Display 7.3. For the two random effects models considered above, the AIC (Akaike information criterion) values are as follows.

Random intercepts only: AIC = 1282.
Random intercepts and random slopes: AIC = 1197.
Display 7.1
Simple Random Effects Model for the Lecithin Trial Data

The model is an extension of the multiple regression model in Eq. (7.2) to include a random intercept term for each subject:

  yijk = (β0 + ak) + β1 Visitj + β2 Groupi + εijk,

where the ak are random effects that model the shift in the intercept for each subject, which, because there is a fixed change for visit, are preserved for all values of visit. All other terms in the model are as in Eq. (7.2).
The ak are assumed to be normally distributed with mean zero and variance σa². Between subject variability in the intercepts is now modeled explicitly.
The model can be written in matrix notation as

  yk = Xk β + Z ak + εk,

where β' = [β0, β1, β2], Z' = [1, 1, 1, 1, 1], yk' = [yi1k, yi2k, yi3k, yi4k, yi5k], εk' = [εi1k, εi2k, εi3k, εi4k, εi5k], and Xk is the 5 x 3 matrix whose jth row is (1, j, Groupi).
The presence of the random effect ak, which is common to each visit's observations for subject k, allows the repeated measurements to have a covariance matrix of the following form:

  Σ = Z σa² Z' + σ² I,

where I is a 5 x 5 identity matrix. This reduces to a matrix with σa² + σ² on the main diagonal and σa² elsewhere, which is simply the compound symmetry pattern described in Chapter 5.
The model can be fitted by maximum likelihood methods, as described in Brown and Prescott (1999).
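The compound symmetry structure implied by the random intercept model is easy to verify numerically. The short Python sketch below (not part of the book's S-PLUS examples; the variance values are assumptions chosen for illustration, not the lecithin estimates) builds Σ = Z σa² Z' + σ² I for five visits and checks that every diagonal entry, every off-diagonal entry, and hence every between-visit correlation is the same.

```python
# Illustrative values only -- assumptions, not the lecithin trial estimates.
sigma2_a = 4.0    # variance of the random intercepts, sigma_a^2
sigma2 = 1.0      # residual variance, sigma^2
n_visits = 5

# Sigma = Z sigma_a^2 Z' + sigma^2 I with Z a column of ones, so every
# entry equals sigma_a^2, with sigma^2 added on the diagonal.
Sigma = [[sigma2_a + (sigma2 if i == j else 0.0) for j in range(n_visits)]
         for i in range(n_visits)]

var = Sigma[0][0]   # 5.0, identical on every diagonal entry
cov = Sigma[0][1]   # 4.0, identical for every pair of visits
rho = cov / var     # common correlation 0.8: compound symmetry
print(var, cov, rho)  # 5.0 4.0 0.8
```

With a single random intercept, the correlation between any two visits is forced to be σa²/(σa² + σ²), however far apart the visits are, which is exactly the restriction criticized in the text.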
TABLE 7.8
Results of Fitting the Model Described in Display 7.1 to the Data in Table 7.6
(Estimates, standard errors, t statistics, and p values for β0 (Intercept), β1 (Visit), and β2 (Group), together with the estimated random effects standard deviations.)

Display 7.2
Random Intercept, Random Slope Model for the Lecithin Trial Data

The model in this case is

  yijk = (β0 + ak) + (β1 + bk) Visitj + β2 Groupi + εijk.

The term bk shifts the slope for each subject; a further random effect has been added compared with the model outlined in Display 7.1. All other terms in the model are as in Display 7.1.
The bk are assumed to be normally distributed with mean zero and variance σb². Between subject variability in both the intercepts and the slopes is now modeled explicitly.
The random effects are not necessarily assumed to be independent; a covariance, σab, is allowed in the model. The random effects are assumed to be from a bivariate normal distribution with both means zero and covariance matrix Ψ.
Again, the model can be written in matrix notation as

  yk = Xk β + Z bk + εk,

where now bk' = [ak, bk] and Z is the 5 x 2 matrix whose jth row is (1, Visitj).
The model implies the following covariance matrix for the repeated measures:

  Σ = Z Ψ Z' + σ² I.
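In the same spirit as before, the covariance matrix Σ = Z Ψ Z' + σ² I implied by the random intercept and slope model can be computed directly. In the sketch below (a Python illustration, not from the book; the entries of Ψ and the residual variance are assumed values, not fitted ones) Z has rows (1, Visitj), and, unlike compound symmetry, the variances now change with visit.

```python
# Assumed illustrative parameters (not fitted values):
# Psi = [[sigma_a^2, sigma_ab], [sigma_ab, sigma_b^2]] and residual sigma^2.
psi = [[4.0, 0.5], [0.5, 0.25]]
sigma2 = 1.0
visits = [1, 2, 3, 4, 5]
Z = [[1.0, v] for v in visits]   # design matrix for the two random effects

def entry(i, j):
    """(Z Psi Z')_{ij}, with sigma^2 added on the diagonal."""
    zi, zj = Z[i], Z[j]
    zpz = sum(zi[r] * psi[r][c] * zj[c] for r in range(2) for c in range(2))
    return zpz + (sigma2 if i == j else 0.0)

Sigma = [[entry(i, j) for j in range(5)] for i in range(5)]
print([Sigma[i][i] for i in range(5)])  # [6.25, 8.0, 10.25, 13.0, 16.25]
```

The diagonal entries σa² + 2σab j + σb² j² + σ² grow with visit j, so the model can represent measurements that fan out over time, which is what Figure 7.5 suggests for these data.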
TABLE 7.9
Results of Fitting the Model Described in Display 7.2 to the Data in Table 7.6
(Estimates, standard errors, t statistics, and p values for β0 (Intercept), β1 (Visit), and β2 (Group), together with the estimated random effects parameters.)

Display 7.3
Comparing Random Effects Models for Longitudinal Data

Comparisons of random effects models for longitudinal data usually involve as their basis the values of the log-likelihoods of each model. The log-likelihood associated with a model arises from the parameter estimation procedure; see Brown and Prescott (1999) for details.
Various indices are computed that try to balance goodness of fit with the number of parameters required to achieve the fit: the search for parsimony discussed in Chapter 1 is always remembered.
One index is the Akaike information criterion, defined as

  AIC = -2L + 2n,

where L is the log-likelihood and n is the number of parameters in the model. Out of the models being compared, the one with the lowest AIC value is to be preferred.

It appears, therefore, that the model in Display 7.2 is to be preferred for these data. Clearly, the first model does not capture the varying slopes of the regressions of an individual's cognitive score on the covariate Visit. This conclusion is reinforced by examination of Figures 7.9 and 7.10, the first of which shows the fitted regressions for each individual by using the random intercept model, and the second of which shows the corresponding predictions from the random intercepts and slopes model. In the chosen model, the number of words recalled in the lecithin treated group is about 4 more than in the placebo group; the 95% confidence interval is (1.76, 6.18).

As with the multiple regression model considered in Chapter 6, the fitting of random effects models to longitudinal data has to be followed by an examination of residuals to check for violations of assumptions. The situation is now more complex, however, because of the nature of the data: several residuals are available for each subject. Thus here the plot of standardized residuals versus fitted values from the random intercepts and slopes model, given in Figure 7.11, includes a panel for each subject in the study. The plot does not appear to contain any
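The selection rule in Display 7.3 is mechanical once the log-likelihoods are in hand. The following minimal Python sketch is an illustration only: the log-likelihood values are hypothetical, back-calculated so that the AIC values reproduce the 1282 and 1197 quoted in the text, and the parameter counts are likewise assumed.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion, AIC = -2L + 2n, as in Display 7.3."""
    return -2 * log_likelihood + 2 * n_params

# Hypothetical inputs chosen only so that the AICs match the values in the
# text; they are not the fitted log-likelihoods of the lecithin models.
models = {
    "random intercepts": aic(-636.0, 5),       # -> 1282.0
    "intercepts and slopes": aic(-591.5, 7),   # -> 1197.0
}
best = min(models, key=models.get)
print(best)  # intercepts and slopes (lowest AIC is preferred)
```

The extra two parameters of the slope model (σb and σab) are more than paid for by the improvement in log-likelihood, which is why its AIC is lower.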
particularly disturbing features that might give cause for concern with the fitted model. It may also be worth looking at the distributional properties of the estimated random effects from the model (such estimates are available from suitable random effects software, such as that available in S-PLUS). Normal probability plots of the random intercept and random slope terms for the second model fitted to the lecithin data are shown in Figure 7.12. Again, there seem to be no particularly worrying departures from linearity in these plots.

SUMMARY

1. Longitudinal data are common in many areas, including the behavioral sciences, and they require particular care in their analysis.
2. The response feature, or summary measure, approach provides a simple, but not simplistic, procedure for analysis that can accommodate irregularly spaced observations and missing values.
3. The summary measure approach cannot give answers to questions about the development of the response over time, or to whether this development differs between the levels of any between subject factors.
4. Pretreatment values, if available, can be incorporated into a summary measures analysis in a number of ways, of which being used as a covariate in an ANCOVA is preferable.
5. A more detailed analysis of longitudinal data requires the use of suitable models, such as the random effects models briefly described in this chapter. Such models have been extensively developed over the past 5 years or so and are now routinely available in a number of statistical software packages.
6. In this chapter the models have been considered only for continuous, normally distributed responses, but they can be extended to deal with other types of response variable, for example, those that are categorical (see Brown and Prescott, 1999, for details).

COMPUTER HINTS

S-PLUS
In S-PLUS, random effects models are available through the Statistics menu, by means of the following route.
1. Click Statistics, click Mixed effects, and then click Linear; the Mixed Effects dialog box appears.
2. Specify the data set and the details of the model that is to be fitted.
Random effects models can also be fitted by using the lme function in the command language approach, and in many respects this is to be preferred, because many useful graphical procedures can then also be used. For example, if the lecithin trial data (see Table 7.6) are stored as a data frame, lecit, each row of which gives Subject number, Visit number, Group label, and Cognitive score, then the two random effects models considered in the text can be fitted and the results saved for further analysis by using the following commands:

lecit.fit1 <- lme(Cognitive ~ Visit + Group, random = ~1 | Subject, method = "ML", data = lecit)
lecit.fit2 <- lme(Cognitive ~ Visit + Group, random = ~Visit | Subject, method = "ML", data = lecit)

Many graphics are available for examining fits; several have been used in the text. For example, to obtain the plots of predicted values from the two models fitted (see Figures 7.9 and 7.10), we can use the following commands:

plot(augPred(lecit.fit1), aspect = "xy", grid = T)
plot(augPred(lecit.fit2), aspect = "xy", grid = T)

EXERCISES

7.1. Apply the summary measure approach to the lecithin data in Table 7.6, using as your summary measure the maximum number of words remembered on any occasion.

7.2. Fit a random effects model with a random intercept and a random slope, and suitable fixed effects, to the rat data in Table 7.2. Take care over coding the group to which a rat belongs.

7.3. The data in Table 7.10 arise from a trial of estrogen patches in the treatment of postnatal depression. Women who had suffered an episode of postnatal depression were randomly allocated to two groups; the members of one group received an estrogen patch, and the members of the other group received a dummy patch, the placebo. The dependent variable was a composite measure of depression, which was recorded on two occasions prior to randomization and for each of 4 months posttreatment. A number of observations are missing (indicated by -9).
TABLE 7.10
Estrogen Patch Trial Data
(Baseline depression scores on the two pretreatment occasions and posttreatment scores at visits 1-4 for each woman.)
Note. Group 0, placebo; Group 1, active. A -9 indicates a missing value.

1. Calculate the change score defined as the difference in the mean of the posttreatment values and the mean of the pretreatment values, and test for a difference in average change score in the two groups.
2. Using the mean as a summary, compare the posttreatment measurements of the two groups by using the response feature procedure.
3. Using the mean of the pretreatment values as a covariate and the mean of the posttreatment values as the response, carry out an ANCOVA to assess the posttreatment difference in the two groups.

Which of 1, 2, or 3 would you recommend, and why?

7.4. Find a suitable random effects model for the data in Table 7.10 and interpret the results from the chosen model.
7.5. Fit a model to the lecithin data that includes random effects for both the intercept and slope of the regression of cognitive score on Visit, and that also includes fixed effects for Group and a Group x Visit interaction.

7.6. For the random intercept and random slope model for the lecithin data described in Display 7.2, find the terms in the implied covariance matrix of the repeated measures, explicitly in terms of the parameters σa, σb, σab, and σ.
8

Distribution-Free and Computationally Intensive Methods

8.1. INTRODUCTION

The statistical procedures described in the previous chapters have relied in the main for their validity on an underlying assumption of distributional normality. How well these procedures operate outside the confines of this normality constraint varies from setting to setting. Some, for example, t tests and F tests, have been shown to be relatively robust to departures from the normality assumption. Many people believe that in most situations, normal based methods are sufficiently powerful to make consideration of alternatives unnecessary. They may have a point. Nevertheless, alternatives to normal based methods have been proposed, and it is important for psychologists to be aware of these, because many are widely quoted in the psychological literature.

One class of tests that do not rely on the normality assumptions are usually referred to as nonparametric or distribution free. (The two labels are almost synonymous, and we shall use the latter here.) Distribution-free procedures are generally based on the ranks of the raw data and are usually valid over a large class of distributions for the data, although they often assume distributional symmetry. Although slightly less efficient than their normal theory competitors when the underlying populations are normal, they are
often considerably more efficient when the underlying populations are not normal. A number of commonly used distribution-free procedures will be described in Sections 8.2-8.5.

A further class of procedures that do not require the normality assumption are those often referred to as computationally intensive, for reasons that will become apparent in Sections 8.6 and 8.7. The methods to be described in these two sections use repeated permutations of the data, or repeated sampling from the data, to generate an appropriate distribution for a test statistic under some null hypothesis of interest.

Although most of the work on distribution-free methods has concentrated on developing hypothesis testing facilities, it is also possible to construct confidence intervals for particular quantities of interest, as is demonstrated at particular points throughout this chapter.
8.2. THE WILCOXON-MANN-WHITNEY TEST AND WILCOXON'S SIGNED-RANKS TEST
First, let us deal with the question of what's in a name. The statistical literature refers to two equivalent tests, formulated in different ways, as the Wilcoxon rank-sum test and the Mann-Whitney test. The two names arise because of the independent development of the two equivalent tests by Wilcoxon (1945) and by Mann and Whitney (1947). In both cases, the authors' aim was to come up with a distribution-free alternative to the independent samples t test. The main points to remember about the Wilcoxon-Mann-Whitney test are as follows.
1. The null hypothesis to be tested is that the two populations being compared have identical distributions. (For two normally distributed populations with common variance, this would be equivalent to the hypothesis that the means of the two populations are the same.)
2. The alternative hypothesis is that the population distributions differ in location (i.e., mean or median).
3. Samples of observations are available from each of the two populations being compared.
4. The test is based on the joint ranking of the observations from the two samples.
5. The test statistic is the sum of the ranks associated with one sample (the lower of the two sums is generally used).
Further details of the Wilcoxon-Mann-Whitney test are given in Display 8.1.

Display 8.1
Wilcoxon-Mann-Whitney Test

Interest lies in testing the null hypothesis that two populations have the same probability distribution, but the common distribution is not specified. The alternative hypothesis is that the population distributions differ in location.
We assume that a sample of n1 observations, x1, x2, ..., xn1, is available from the first population and a sample of n2 observations, y1, y2, ..., yn2, from the second population.
The combined sample of n1 + n2 observations is ranked, and the test statistic, S, is the sum of the ranks of the observations from one of the samples (generally the lower of the two rank sums is used).
If both samples come from the same population, a mixture of low, medium, and high ranks is to be expected in each. If, however, the alternative hypothesis is true, one of the samples would be expected to have lower ranks than the other.
For small samples, tables giving p values are available in Hollander and Wolfe (1999), although the distribution of the test statistic can also now be found directly by using the permutational approach described in Section 8.6.
A large sample approximation is also available for the Wilcoxon-Mann-Whitney test, which is suitable when n1 and n2 are both greater than 15. Under the null hypothesis the statistic Z given by

  Z = [S - n1(n1 + n2 + 1)/2] / [n1 n2 (n1 + n2 + 1)/12]^(1/2)

has approximately a standard normal distribution, and a p value can be assigned accordingly. (We are assuming that n1 is the sample size associated with the sample giving rise to the test statistic S.)
If there are ties among the observations, then the tied observations are given the average of the ranks that they would be assigned if not tied. In this case the large sample approximation requires amending; see Hollander and Wolfe (1999) for details.
As our first illustration of the application of the test, we shall use the data shown in Table 8.1, adapted from Howell (1992). These data give the number of recent stressful life events reported by a group of cardiac patients in a local hospital and a control group of orthopedic patients in the same hospital. The result of applying the Wilcoxon-Mann-Whitney test to these data is a rank sum statistic of 21 with an associated p value of 0.13. There is no evidence that the numbers of stressful life events suffered by the two types of patient have different distributions.

To further illustrate the use of the Wilcoxon-Mann-Whitney test, we shall apply it to the data shown in Table 8.2. These data arise from a study in which the relationship between child-rearing practices and customs related to illness in several nonliterate cultures was examined. On the basis of ethnographical reports, 39 societies were each given a rating for the degree of typical oral socialization anxiety, a concept derived from psychoanalytic theory relating to the severity and rapidity of oral socialization in child rearing. For each of the societies, a judgment
TABLE 8.1
Number of Stressful Life Events Among Cardiac and Orthopedic Patients

Cardiac patients (C): 32, 8, 7, 29, 5, 0
Orthopedic patients (O): 1, 2, 4, 3, 6

Observation   Rank
0             1
1             2 (O)
2             3 (O)
3             4 (O)
4             5 (O)
5             6
6             7 (O)
7             8
8             9
29            10
32            11

Note. Sum of "O" ranks is 21.
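The rank sum in Table 8.1 is easy to reproduce. The following is a small pure-Python sketch (an illustration added here, not code from the book) that computes the Wilcoxon-Mann-Whitney rank sum for the orthopedic sample together with the large-sample Z of Display 8.1; with samples this small, the exact p value (.13 in the text) should be trusted over the normal approximation.

```python
import math

def rank_sum(sample, other):
    """Sum of the 1-based joint ranks of `sample`; ties get average ranks."""
    combined = sorted(sample + other)
    def avg_rank(v):
        positions = [i + 1 for i, x in enumerate(combined) if x == v]
        return sum(positions) / len(positions)
    return sum(avg_rank(v) for v in sample)

cardiac = [32, 8, 7, 29, 5, 0]
orthopedic = [1, 2, 4, 3, 6]

S = rank_sum(orthopedic, cardiac)      # 21.0, as in the Table 8.1 note
n1, n2 = len(orthopedic), len(cardiac)
z = (S - n1 * (n1 + n2 + 1) / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
print(S, round(z, 2))                  # 21.0 -1.64
```

The negative Z simply reflects that the orthopedic ranks fall below the null expectation of n1(n1 + n2 + 1)/2 = 30; the normal approximation is quoted here only to show the mechanics of Display 8.1.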
TABLE 8.2
Oral Socialization and Explanations of Illness: Oral Socialization Anxiety Scores

Societies in which oral explanations of illness are absent:
6, 7, 7, 7, 7, 7, 8, 8, 9, 10, 10, 10, 10, 12, 12, 13

Societies in which oral explanations of illness are present:
6, 8, 8, 10, 10, 10, 11, 11, 12, 12, 12, 12, 13, 13, 14, 14, 14, 15, 15, 16, 17
was also made (by an independent set of judges) of whether oral explanations of illness were present. Interest centers on assessing whether oral explanations of illness are more likely to be present in societies with high levels of oral socialization anxiety. This translates into whether oral socialization anxiety scores differ in location in the two types of societies. The result of the normal approximation test described in Display 8.1 is a Z value of 3.35 and an associated p value of .0008.
Display 8.2
Estimator and Confidence Interval Associated with the Wilcoxon-Mann-Whitney Test

The estimator of the location difference of the two populations, Δ, is the median of the n1 x n2 differences, yj - xi, of each pair of observations, made up of one observation from the first sample and one from the second sample.
Constructing a confidence interval for Δ involves finding the two appropriate values among the ordered n1 x n2 differences, yj - xi.
For a symmetric two-sided confidence interval for Δ, with confidence level 1 - α, determine the upper α/2 percentile point, Sα/2, of the null distribution of S, either from the appropriate table in Hollander and Wolfe (1999), Table A.6, or from a permutational approach. Calculate

  Cα = n1(n1 + 2n2 + 1)/2 + 1 - Sα/2.

The 1 - α confidence interval (ΔL, ΔU) is then found from the values in the Cα and the n1n2 + 1 - Cα positions in the ordered sample differences.
There is very strong evidence of a difference in oral socialization anxiety in the two types of societies.

The Wilcoxon-Mann-Whitney test is very likely to have been covered in the introductory statistics course enjoyed (suffered?) by many readers, generally in the context of testing a particular hypothesis, as described and illustrated above. It is unlikely, however, that such a course gave any time to consideration of the associated estimation and confidence interval construction possibilities, although both remain as relevant to this distribution-free approach as they are in applications of the similar parametric procedures. Details of the estimator associated with the rank sum test and the construction of a relevant confidence interval are given in Display 8.2, and the results of applying both to the oral socialization anxiety data appear in Table 8.3. The estimated difference in location for the two types of societies is 3 units, with a 95% confidence interval of (2, 6).

When samples from the two populations are matched in some way, or paired, the appropriate distribution-free test becomes Wilcoxon's signed-ranks test. Details of the test are given in Display 8.3. As an example of the use of the test, it is applied to the data shown in Table 8.4. These data arise from an investigation of two types of electrodes used in some psychological experiments. The researcher was interested in assessing whether the two types of electrode performed similarly. Generally, a paired t test might be considered suitable for this situation, but here a normal probability plot of the differences between the observations on the two types of electrode (see Figure 8.1) shows a distinct outlier, and so the signed-ranks test is to be preferred. The result of the large sample approximation test is a Z value of 2.20 with a p value of .03. There is some evidence of a difference between electrode
TABLE 8.3
Estimate of Location Difference and Confidence Interval Construction for Oral Socialization Data

Differences Between Pairs of Observations, One Observation from Each Sample
(The n1 x n2 ordered pairwise differences between the two samples.)
Note. The median of these differences is 3. The positions in the ordered differences of the lower and upper limits of the 95% confidence interval are 117 and 252, leading to a confidence interval of (2, 6).
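The point estimator of Display 8.2 (the Hodges-Lehmann shift estimator) takes only a few lines to compute. The sketch below is a Python illustration added here, not the book's code, and it uses a small invented example rather than the full oral socialization score lists.

```python
from statistics import median

def hodges_lehmann_shift(x, y):
    """Location-difference estimator of Display 8.2: the median of all
    n1 * n2 pairwise differences y_j - x_i."""
    return median([yj - xi for yj in y for xi in x])

# Invented example: the nine pairwise differences are
# 3, 2, 1, 5, 4, 3, 7, 6, 5, whose median is 4.
print(hodges_lehmann_shift([1, 2, 3], [4, 6, 8]))  # 4
```

For a confidence interval one would sort the same list of differences and read off the values in the Cα and n1n2 + 1 - Cα positions, exactly as done for positions 117 and 252 in Table 8.3.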
types. Here, mainly as a result of the presence of the outlier, a paired t test gives a p value of .09, implying no difference between the two electrodes. (Incidentally, the outlier identified in Figure 8.1 was put down by the experimenter to the excessively hairy arms of one subject!)

Again, you may have met the signed-ranks test on your introductory statistics course, but perhaps not the associated estimator and confidence interval, both of which are described in Display 8.4. For the electrode data, application of these procedures gives an estimate of 65 and an approximate 95% confidence interval of -537.0 to 82.5. Details of the calculations are shown in Table 8.5.
Display 8.3
Wilcoxon's Signed-Ranks Test

Assume we have two observations, xi and yi, on each of n subjects in our sample, for example, before and after treatment. We first calculate the differences zi = xi - yi between each pair of observations.
To compute the Wilcoxon signed-rank statistic, T+, form the absolute values of the differences, zi, and then order them from least to greatest.
Now assign a positive or negative sign to the ranks of the differences according to whether the corresponding difference was positive or negative. (Zero values are discarded, and the sample size n altered accordingly.)
The statistic T+ is the sum of the positive signed ranks. Tables are available for assigning p values; see Table A.4 in Hollander and Wolfe (1999).
A large sample approximation involves testing the statistic Z as a standard normal:

  Z = [T+ - n(n + 1)/4] / [n(n + 1)(2n + 1)/24]^(1/2).

If there are ties among the calculated differences, assign each of the observations in a tied group the average of the integer ranks that are associated with the tied group.
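Display 8.3 can be sketched in a few lines of pure Python (an illustration added here, not the book's code); the before and after scores in the example are invented so that the hand calculation is easy to check.

```python
import math

def signed_rank_test(x, y):
    """Wilcoxon signed-rank statistic T+ and its large-sample Z (Display 8.3)."""
    z = [a - b for a, b in zip(x, y) if a != b]    # zero differences dropped
    abs_sorted = sorted(abs(v) for v in z)
    def avg_rank(v):
        pos = [i + 1 for i, w in enumerate(abs_sorted) if w == abs(v)]
        return sum(pos) / len(pos)                 # tied groups share average ranks
    t_plus = sum(avg_rank(v) for v in z if v > 0)
    n = len(z)
    big_z = (t_plus - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return t_plus, big_z

# Invented scores: differences are 2, -1, 3, -4, 5, so the positive
# differences carry ranks 2, 3, and 5, giving T+ = 10.
t_plus, big_z = signed_rank_test([5, 2, 6, 1, 9], [3, 3, 3, 5, 4])
print(t_plus, round(big_z, 2))  # 10.0 0.67
```

With only five pairs the normal approximation is shown purely for the mechanics; in practice one would use the exact tables or a permutational computation, as the display notes.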
TABLE 8.4
Skin Resistance and Electrode Type

Subject   Electrode 1   Electrode 2
1         500           400
2         660           600
3         250           370
4         72            300
5         135           84
6         27            140
7         100           50
8         105           180
9         90            290
10        200           45
11        15            180
12        160           400
13        250           200
14        170           310
15        66            1000
16        107           48
FIG. 8.1. Probability plot for differences between electrode readings for the data in Table 8.4.
Display 8.4
Estimator and Confidence Interval Associated with Wilcoxon's Signed-Rank Statistic

An estimator of the treatment effect, θ, is the median of the n(n + 1)/2 averages of pairs of the n differences on which the statistic T+ is based, that is, the averages (zi + zj)/2 for i <= j = 1, ..., n.
For a symmetric two-sided confidence interval for θ, with confidence level 1 - α, we first find the upper α/2 percentile point, tα/2, of the null distribution of T+, from Table A.4 in Hollander and Wolfe (1999) or from a permutational computation. Then set

  Cα = n(n + 1)/2 + 1 - tα/2.

The lower and upper limits of the required confidence interval are now found as the Cα and tα/2 position values of the ordered averages of pairs of differences.
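The estimator in Display 8.4 is the median of the Walsh averages. Here is a short pure-Python sketch (added for illustration, not from the book) with a hand-checkable invented example rather than the electrode differences.

```python
from statistics import median

def walsh_estimate(z):
    """Display 8.4 estimator: the median of the n(n+1)/2 Walsh
    averages (z_i + z_j)/2 for i <= j."""
    n = len(z)
    return median([(z[i] + z[j]) / 2 for i in range(n) for j in range(i, n)])

# Invented differences [1, 3, 5]: the Walsh averages are
# 1, 2, 3, 3, 4, 5, so the estimate is 3.
print(walsh_estimate([1, 3, 5]))  # 3.0
```

A confidence interval is then read off from the sorted list of the same averages at the positions given in the display, which is how positions 7 and 107 are used in Table 8.5.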
TABLE 8.5
Estimation and Confidence Interval Construction for Electrode Data

Averages of Pairs of Sample Differences
(The 136 Walsh averages of the 16 electrode differences.)
Note. The median of these averages is 65.0. The lower and upper limits of the 95% confidence interval are in positions 7 and 107 of the ordered values of the averages, leading to the interval (-537.0, 82.5).

8.3. DISTRIBUTION-FREE TEST FOR A ONE-WAY DESIGN WITH MORE THAN TWO GROUPS

Just as the Wilcoxon-Mann-Whitney test and Wilcoxon's signed-ranks test can be considered as distribution-free analogs of the independent samples and paired samples t tests, the Kruskal-Wallis procedure is the distribution-free equivalent of
Details of the Kruskal-Wallis method for one-way designs with three or more groups are given in Display 8.5. (When the number of groups is two, the Kruskal-Wallis test is equivalent to the Wilcoxon-Mann-Whitney test.)

Display 8.5
Kruskal-Wallis Distribution-Free Procedure for One-way Designs

Assume there are k populations to be compared and that a sample of nj observations is available from population j, j = 1, ..., k. The hypothesis to be tested is that all the populations have the same probability distribution. For the Kruskal-Wallis test to be performed, the observations are first ranked without regard to group membership, and then the sums of the ranks of the observations in each group are calculated. These sums will be denoted by R1, R2, ..., Rk. If the null hypothesis is true, we would expect the Rj's to be more or less equal, apart from differences caused by the different sample sizes. A measure of the degree to which the Rj's differ from one another is given by

H = (12 / [N(N + 1)]) * (R1^2/n1 + R2^2/n2 + ... + Rk^2/nk) - 3(N + 1),

where N = n1 + n2 + ... + nk. Under the null hypothesis, the statistic H has a chi-squared distribution with k - 1 degrees of freedom.

To illustrate the use of the Kruskal-Wallis method, we apply it to the data shown in Table 8.6. These data arise from an investigation of the possible beneficial effects of the pretherapy training of clients on the process and outcome of counseling and psychotherapy. Sauber (1971) investigated four different approaches to pretherapy training: control (no treatment); therapeutic reading (TR; indirect learning); vicarious therapy pretraining (VTP; videotaped, vicarious learning); and role induction interview (RII; direct learning). Nine clients were assigned to each of these four conditions, and a measure of psychotherapeutic attraction was eventually given to each client.

Applying the Kruskal-Wallis procedure to these data gives a chi-squared test statistic that, with 3 degrees of freedom, has an associated p value showing no evidence of a difference between the four pretherapy regimes. Hollander and Wolfe (1999) describe some distribution-free analogs of the multiple comparison procedures discussed in Chapter 3; these can be used to identify which particular groups in a one-way design differ, after a significant result is obtained from the Kruskal-Wallis test.
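The arithmetic behind the Kruskal-Wallis statistic is simple enough to sketch directly. The following Python fragment is illustrative and not from the text (which uses S-PLUS); it computes H from the joint ranks, after which the p value would come from a chi-squared distribution with k - 1 degrees of freedom.

```python
def midranks(values):
    # Rank all values jointly, assigning the average rank to ties.
    svals = sorted(values)
    rank_of = {}
    i = 0
    while i < len(svals):
        j = i
        while j < len(svals) and svals[j] == svals[i]:
            j += 1
        rank_of[svals[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        i = j
    return [rank_of[v] for v in values]

def kruskal_wallis_h(groups):
    # H = 12/(N(N+1)) * sum(R_j^2 / n_j) - 3(N+1), on the joint ranks.
    allv = [v for g in groups for v in g]
    ranks = midranks(allv)
    n = len(allv)
    total, pos = 0.0, 0
    for g in groups:
        rsum = sum(ranks[pos:pos + len(g)])
        pos += len(g)
        total += rsum ** 2 / len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)
```

Packaged versions exist (for example, scipy.stats.kruskal), but the hand computation makes the role of the rank sums Rj explicit.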
TABLE 8.6
Psychotherapeutic Attraction Scores for Four Experimental Conditions (Control, Reading (TR), Videotape (VTP), and Group (RII))

8.4. DISTRIBUTION-FREE TEST FOR A ONE-WAY REPEATED MEASURES DESIGN

When the one-way design involves repeated measurements on the same subjects, a distribution-free test from Friedman (1937) can be used. The test is clearly related to a standard repeated measures ANOVA applied to ranks rather than raw scores. It is described in Display 8.6.

Display 8.6
Friedman's Rank Test for Correlated Samples

Here we assume that a sample of n subjects is observed under k conditions. First the k observations for each subject are ranked, to give the values rij, i = 1, ..., n, j = 1, ..., k. Then the sums of these ranks over conditions are found as Rj = r1j + r2j + ... + rnj, j = 1, ..., k. The Friedman statistic S is then given by

S = (12 / [nk(k + 1)]) * (R1^2 + R2^2 + ... + Rk^2) - 3n(k + 1).

Under the null hypothesis that in the population the k conditions have the same distribution, S has a chi-squared distribution with k - 1 degrees of freedom.
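The statistic in Display 8.6 can be sketched in a few lines. The Python below is an illustration, not the book's code; ties within a subject's scores are ignored here and would need the usual midranks.

```python
def friedman_s(data):
    # data[i][j] is the score of subject i under condition j.
    # S = 12/(n*k*(k+1)) * sum(R_j^2) - 3n(k+1), with R_j the column sums
    # of the within-subject ranks.
    n, k = len(data), len(data[0])
    col_rank_sums = [0.0] * k
    for row in data:
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            col_rank_sums[j] += rank
    s = 12 / (n * k * (k + 1)) * sum(r ** 2 for r in col_rank_sums)
    return s - 3 * n * (k + 1)
```

When every subject ranks the conditions identically, S attains its maximum; when the rankings balance out, S is near zero. scipy.stats.friedmanchisquare provides a packaged equivalent with the chi-squared p value.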
To illustrate the use of the Friedman method, we apply it to the data shown in Table 8.7, taken from a study reported by Nicholls and Ling (1982) of the effectiveness of a system using cues into the hand in the teaching of language to severely hearing-impaired children. In particular, they considered syllables presented to hearing-impaired children under the following seven conditions.

A: audition,
L: lip reading,
AL: audition and lip reading,
C: cued speech,
AC: audition and cued speech,
LC: lip reading and cued speech,
ALC: audition, lip reading, and cued speech.

The 18 subjects in the study were all severely hearing-impaired children who had been taught through the use of cued speech for at least 4 years. Syllables were presented to the subjects under each of the seven conditions (presented in random orders), and the subjects were asked in each case to identify the consonants in each syllable by writing down what they perceived them to be.

TABLE 8.7
Percentage of Consonants Correctly Identified by Each of the 18 Subjects under Each of the Seven Conditions (A, L, AL, C, AC, LC, ALC)
The subjects' results were scored by marking properly identified consonants in the appropriate order as correct; finally, an overall percentage correct was assigned to each participant under each experimental condition. It is these percentages that are shown in Table 8.7. Applying the Friedman procedure to the data in Table 8.7 gives a chi-squared statistic of 94.36 with 6 degrees of freedom. The associated p value is very small, and there is clearly a difference in the percentage of consonants identified in the different conditions. (Possible distribution-free approaches to more complex repeated measures situations are described in Davis, 1991.)

8.5. DISTRIBUTION-FREE CORRELATION AND REGRESSION

A frequent requirement in many psychological studies is to measure the correlation between two variables observed on a sample of subjects. In Table 8.8, for example, examination scores (out of 75) and the corresponding examination completion times (seconds) are given for 10 students, and a scatterplot of the data is shown in Figure 8.2. The usual Pearson's product moment correlation coefficient, r, takes the value 0.50, and a t test that the population correlation coefficient, rho, takes the value zero gives t = 1.63. (Readers are reminded that the test statistic here is r * sqrt[(n - 2)/(1 - r^2)], which under the null hypothesis that rho = 0 has a t distribution with n - 2 degrees of freedom.)

TABLE 8.8
Examination Scores and Time to Complete Examination

Score (out of 75)   Time (s)
49                  2160
70                  2063
55                  2013
52                  2000
61                  1420
65                  1934
57                  1519
71                  2735
69                  2329
44                  1590
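The quoted values are easy to verify. The short Python check below is illustrative (the text itself uses SPSS and S-PLUS); the data are those of Table 8.8.

```python
from math import sqrt

scores = [49, 70, 55, 52, 61, 65, 57, 71, 69, 44]            # Table 8.8
times = [2160, 2063, 2013, 2000, 1420, 1934, 1519, 2735, 2329, 1590]

def pearson_r(x, y):
    # Product moment correlation from its definition.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

r = pearson_r(scores, times)
# t statistic for H0: rho = 0, referred to t with n - 2 degrees of freedom.
t = r * sqrt((len(scores) - 2) / (1 - r ** 2))
```

Running this reproduces r = 0.50 and t = 1.63 as stated in the text.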
FIG. 8.2. Scatterplot of examination scores and completion times.

The t test just given rests on the assumption that the data arise from a bivariate normal distribution (see Everitt and Wykes, 1999, for a definition). But how can we test for independence when we are not willing to make this assumption? There are a number of possibilities.

Kendall's Tau

Kendall's tau is a measure of correlation that can be used to assess the independence of two variables without assuming that the underlying bivariate distribution is bivariate normal. Details of its calculation and testing are given in Display 8.7. For the data in Table 8.8, Kendall's tau takes the value 0.29, and the large sample z test is 1.16, with an associated p value of 0.24.

Spearman's Rank Correlation

Another commonly used distribution-free correlation measure is Spearman's coefficient. Details of the calculation and testing of the coefficient are given in Display 8.8. For the examination scores and times data, it takes the value 0.42. The estimated correlation is lower than the value given by using Pearson's coefficient, although the associated tests for independence both conclude that examination score and completion time are independent.
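Both coefficients can be computed directly from their definitions. The Python sketch below (illustrative, not part of the text) uses the Table 8.8 data and assumes no tied observations.

```python
from itertools import combinations

scores = [49, 70, 55, 52, 61, 65, 57, 71, 69, 44]            # Table 8.8
times = [2160, 2063, 2013, 2000, 1420, 1934, 1519, 2735, 2329, 1590]

def kendall_tau(x, y):
    # (concordant - discordant pairs) / (n(n-1)/2); assumes no ties.
    n = len(x)
    s = sum(1 if (x[j] - x[i]) * (y[j] - y[i]) > 0 else -1
            for i, j in combinations(range(n), 2))
    return s / (n * (n - 1) / 2)

def ranks(v):
    # Ranks 1..n for untied data.
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r

def spearman_rho(x, y):
    # 1 - 6*sum(D^2) / (n(n^2 - 1)), D the rank differences.
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

For these data the two functions reproduce the values 0.29 and 0.42 quoted above.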
Display 8.7
Kendall's Tau

Assume n observations on two variables are available. Within each variable the observations are ranked. The value of Kendall's tau is based on the number of inversions in the two sets of ranks. This term can best be explained with the help of a small example involving three subjects with scores on two variables, which after ranking give the following.

Subject   Variable 1   Variable 2
1         1            1
2         2            3
3         3            2

When the subjects are listed in the order of their ranks on variable 1, there is an inversion of the ranks on variable 2 (rank 3 appears before rank 2). Kendall's tau statistic can now be defined as

tau = 1 - 2I / [n(n - 1)/2],

where I is the number of inversions in the data. A significance test for the null hypothesis that the population correlation is zero is given by

z = tau / {2(2n + 5) / [9n(n - 1)]}^(1/2),

which is tested as a standard normal.

Display 8.8
Spearman's Correlation Coefficient for Ranked Data

Assume that n observations on two variables are available. Within each variable the observations are ranked. Spearman's correlation coefficient is defined as

r_s = 1 - 6 * (sum of D_j^2) / [n(n^2 - 1)],

where D_j is the difference in the ranks assigned to subject j on the two variables. (This formula arises simply from applying the usual Pearson correlation coefficient formula to the ranked data.) A significance test for the null hypothesis that the population correlation is zero is provided by

z = (n - 1)^(1/2) * r_s,

which can be tested as a standard normal.
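The inversion-based definition in Display 8.7 can be checked with a few lines of Python (illustrative only). For the three-subject example above, I = 1 and tau = 1 - 2(1)/3 = 1/3.

```python
def inversions(y_ranks_in_x_order):
    # Number of pairs out of order on variable 2 when the subjects are
    # listed in the order of their ranks on variable 1.
    v = y_ranks_in_x_order
    return sum(1 for i in range(len(v))
               for j in range(i + 1, len(v)) if v[i] > v[j])

def tau_from_inversions(y_ranks_in_x_order):
    # tau = 1 - 2I / [n(n-1)/2], as in Display 8.7.
    n = len(y_ranks_in_x_order)
    return 1 - 2 * inversions(y_ranks_in_x_order) / (n * (n - 1) / 2)
```

Applied to the variable 2 ranks of the examination data (in examination score order) this gives the same 0.29 as the pairwise-count definition, confirming that the two formulations agree.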
The large sample z test is 1.24, with an associated p value of 0.22. Again, the conclusion is that the two variables are independent.

Distribution-Free Simple Linear Regression

Linear regression, as described in Chapter 6, is one of the most commonly used statistical procedures. Estimation and testing of regression coefficients depend largely on the assumption that the error terms in the model are normally distributed. It is, however, possible to use a distribution-free approach to simple regression where there is a single explanatory variable, based on a method suggested by Theil (1950). Details are given in Display 8.9. We shall use the distribution-free regression procedure on the data introduced in Chapter 6 (see Table 6.1), giving the average number of words known by children at various ages.

Display 8.9
Distribution-Free Simple Linear Regression

The null hypothesis of interest is that the slope of the regression line between two variables x and y is zero. We have a sample of n observations on the two variables, (x_i, y_i), i = 1, ..., n. The test statistic, C, is given by

C = sum over 1 <= i < j <= n of c(y_j - y_i),

where the function c is zero if y_j = y_i, takes the value 1 if y_j > y_i, and the value -1 if y_j < y_i. The p value of the test statistic can be found from Table A.30 in Hollander and Wolfe (1999). A large sample approximation uses the statistic z given by

z = C / [n(n - 1)(2n + 5)/18]^(1/2),

referred to a standard normal distribution.

An estimator of the regression slope is obtained as

b = median(S_ij, 1 <= i < j <= n), where S_ij = (y_j - y_i)/(x_j - x_i).

For a symmetric two-sided confidence interval for the slope with confidence level 1 - alpha, we first obtain the upper alpha/2 percentile point, k_alpha/2, of the null distribution of C from Table A.30 in Hollander and Wolfe (1999). The lower and upper limits of the required confidence interval are then found at the Mth and Qth positions of the ordered sample values S_ij, where

M = [n(n - 1)/2 - k_alpha/2] / 2 and Q = [n(n - 1)/2 + k_alpha/2] / 2 + 1.
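Display 8.9's slope and intercept estimators amount to a few lines of code. The sketch below is illustrative Python with made-up x and y values (the vocabulary data themselves appear in Table 6.1, not here); scipy.stats.theilslopes offers a packaged version.

```python
from itertools import combinations
from statistics import median

def theil_line(x, y):
    # Slope: median of the pairwise slopes S_ij = (y_j - y_i)/(x_j - x_i).
    # Intercept: median of y - b*x, as used later in the text.
    s = [(y[j] - y[i]) / (x[j] - x[i])
         for i, j in combinations(range(len(x)), 2) if x[j] != x[i]]
    b = median(s)
    a = median(yv - b * xv for xv, yv in zip(x, y))
    return a, b
```

Because the slope is a median of pairwise slopes, a single aberrant observation has far less influence than it would on the least squares estimate.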
TABLE 8.9
Distribution-Free Estimation for the Regression Coefficient of Vocabulary Scores on Age: S_ij Values (See Display 8.9) for the Vocabulary Data

The estimates of the intercept and the slope are obtained from these values as

intercept = median(vocabulary score_j - b * age_j), b = median(S_ij).

The test statistic, C, for testing the null hypothesis that the slope is zero takes the value 45. The associated p value from the appropriate table in Hollander and Wolfe (1999) is very small, and so we can (not surprisingly) reject the hypothesis that the regression coefficient for age and vocabulary score is zero. The fitted distribution-free regression line and, for comparison, that obtained by least squares are shown on a scatterplot of the data in Figure 8.3. Here the two fitted lines are extremely similar.

Other aspects of distribution-free regression, including a procedure for multiple regression, are discussed in Hollander and Wolfe (1999). However, such methods, based on what might be termed a classical distribution-free approach, that is, one using ranks in some way, are probably not of great practical importance. Of increasing importance are the more recent developments, such as locally weighted regression and spline smoothers, which allow the data themselves to suggest the form of the regression relationship between the variables. Interested readers can consult Cleveland (1985) for details.

8.6. PERMUTATION TESTS

The Wilcoxon-Mann-Whitney test and the other distribution-free procedures described in the previous sections are all simple examples of a general class of tests known as either permutation or randomization tests. Such procedures were first
introduced in the 1930s by Fisher and Pitman, but initially they were largely of theoretical rather than practical interest because of the lack of the computer technology required to undertake the extensive computation often needed in their application. However, with today's more powerful generation of personal computers, it is often faster to calculate a p value for an exact permutation test than to look up an asymptotic approximation in a book of tables. Additionally, with each increase in computer speed and power, the permutation approach is being applied to a wider and wider variety of problems. With this approach, the statistician (or the psychologist) is not limited by the availability of tables but is free to choose a test statistic exactly matched to testing a particular null hypothesis against a particular alternative. Significance levels are then, so to speak, "computed on the fly" (Good, 1994).

FIG. 8.3. Scatterplot of vocabulary scores at different ages, showing the fitted least squares and distribution-free regressions.

The stages in a general permutation test are as follows.

1. Choose a test statistic S.
2. Compute S for the original set of observations.
3. Obtain the permutation distribution of S by repeatedly rearranging the observations. When two or more samples are involved (e.g., when the difference between groups is assessed), all the observations are combined into a single large sample before they are rearranged.
4. For a chosen significance level alpha, obtain the upper alpha percentage point of the permutation distribution, and accept or reject the null hypothesis according to whether the value of S calculated for the original observations is smaller or larger than this value.
To illustrate the application of a permutation test, consider a situation involving two treatments, in which three observations of some dependent variable of interest are made under each treatment. Suppose that the observed values are as follows.

Treatment 1: 121, 118, 110
Treatment 2: 34, 22, 12

The null hypothesis is that there is no difference between the treatments in their effect on the dependent variable. The alternative is that the first treatment results in higher values of the dependent variable. If the null hypothesis is true, the observations are expected to have almost the same values in each of the two treatment groups. (The author is aware that the result in this case is pretty clear without any test!)
The first step in a permutation test is to choose a test statistic that discriminates between the hypothesis and the alternative. An obvious candidate is the sum of the observations for the first treatment group. If the alternative hypothesis is true, this sum ought to be larger than the sum of the observations in the second treatment group. If the null hypothesis is true, then the sums of the observations in the two groups should be approximately the same; one sum might be smaller or larger than the other by chance, but the two should not be very different. The value of the chosen test statistic for the observed values is 121 + 118 + 110 = 349.

To generate the necessary permutation distribution, remember that under the null hypothesis the labels "treatment 1" and "treatment 2" provide no information about the test statistic. Consequently, it is now relatively simple to permutate the labels of the observations: simply reassign the six labels, three of treatment 1 and three of treatment 2, to the six observations, and repeat the process until all 20 possible distinct arrangements have been tabulated, as shown in Table 8.10.

From the results given in Table 8.10, it is seen that the sum of the observations in the original treatment 1 group, 349, is equalled only once and never exceeded in the distinct random relabelings. If chance alone is operating, then such an extreme value has a 1 in 20, that is, 5%, chance of occurring. Therefore, at the conventional .05 significance level, the test leads to the rejection of the null hypothesis in favor of the alternative. The resultant p value is often referred to as exact.

The Wilcoxon-Mann-Whitney test described in the previous section is also an example of a permutation test, but one applied to the ranks of the observations rather than their actual values. Originally the advantage of this procedure was that, because ranks always take the same values (1, 2, etc.), previously tabulated distributions could be used to derive p values; consequently, at least for small samples, lengthy computations were avoided.
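For so small an example, the complete permutation distribution can be enumerated directly. The following Python sketch of the calculation just described is illustrative (any language would do); the data are those of the two-treatment example.

```python
from itertools import combinations

group1, group2 = [121, 118, 110], [34, 22, 12]
pooled = group1 + group2
observed = sum(group1)

# Every way of choosing which three of the six observations receive the
# "treatment 1" label; the exact p value is the proportion of relabelings
# whose sum is at least as large as the observed sum of 349.
sums = [sum(c) for c in combinations(pooled, 3)]
p_value = sum(s >= observed for s in sums) / len(sums)
```

Only the original labeling attains 349, so the p value is 1/20 = 0.05, in agreement with Table 8.10.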
TABLE 8.10
Permutation Distribution of the Example with Three Observations in Each Group

TABLE 8.11
Data from a Study of Organization and Memory (SO2 Scores for the Training and No Training Groups)

This point will be demonstrated by using the data shown in Table 8.11, which result from a study of organization in the memory of mildly retarded children attending a special school. A bidirectional measure of intertrial subjective organization (SO2) was used to assess the amount of consistency in a memory task given to children in two groups, one of which had received training in sorting tasks. The question of interest here is whether there is any evidence of an increase in organization in the group
that had received training (more details of the study are given in Robertson, 1991). Under the null hypothesis, each of the 252 possible assignments of the 10 joint ranks to the two groups is equally likely, and the distribution of the sum of the ranks in the training group can be tabulated over all 252 permutations, as shown in Table 8.12. As can be seen, there is one arrangement in which the ranks sum to 15, one in which they sum to 16, two in which they sum to 17, and so on. From this distribution we can determine the probability of finding a value of the sum of the ranks equal to or greater than the observed value of 37, if the null hypothesis is true:

Pr(S >= 37) = (3 + 2 + 1 + 1)/252 = 0.028.

This leads to an exact p value for the test of a group difference on the SO2 scores of .028; there is evidence of a training effect. It is this type of calculation that gives the values in the tables used for assigning p values to the Wilcoxon-Mann-Whitney rank sum test.

8.7. THE BOOTSTRAP

The bootstrap is a data-based method for statistical inference. Its introduction into statistics is relatively recent, because the method is computationally intensive. The bootstrap looks like the permutational approach in many respects: it requires a minimal number of assumptions for its application, and it derives critical values for testing and for constructing confidence intervals from the data at hand. According to Efron and Tibshirani (1993), the term bootstrap derives from the phrase "to pull oneself up by one's bootstraps," widely considered to be based on the eighteenth-century adventures of Baron Munchausen by Rudolph Erich Raspe. (The relevant adventure is the one in which the Baron had fallen into the bottom of a deep lake; just when it looked as if all was lost, he thought he could pick himself up by his own bootstraps.) The stages in the bootstrap approach are as follows.

1. Choose a test statistic S.
2. Calculate S for the original set of observations.
TABLE 8.12
Permutational Distribution of the Wilcoxon-Mann-Whitney Rank Sum Statistic Applied to the SO2 Scores

Note. There are a total of 252 possible permutations of pupils to groups.
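The distribution summarized in Table 8.12 can be generated by enumerating the C(10, 5) = 252 possible assignments of the joint ranks to the training group. The Python sketch below is illustrative and assumes untied ranks 1, ..., 10 with five children per group, as in the calculation above.

```python
from itertools import combinations

# All 252 ways of choosing which five of the ranks 1..10 fall in the
# training group, and the resulting rank sums.
rank_sums = [sum(c) for c in combinations(range(1, 11), 5)]

# Exact p value: proportion of relabelings with rank sum >= 37, the
# observed training-group value.
p_value = sum(s >= 37 for s in rank_sums) / len(rank_sums)
```

The enumeration reproduces Pr(S >= 37) = 7/252 = 0.028 exactly.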
3. Obtain the bootstrap distribution of S by repeatedly resampling, with replacement, from the observations. In the multigroup situation, the samples are not combined but are resampled separately.
4. Obtain the upper alpha percentage point of the bootstrap distribution, and accept or reject the null hypothesis according to whether S for the original observations is smaller or larger than this value.
5. Alternatively, construct a confidence interval for the test statistic by using the bootstrap distribution (see the example given later).

Here, we merely illustrate the use of the bootstrap in two particular examples: first in the construction of confidence intervals for the difference in means and in medians of the two groups involved in the WISC data introduced in Chapter 3 (see Table 3.6), and then in regression, using the vocabulary scores data used earlier in this chapter and given in Chapter 6 (see Table 6.1). The random samples are found by labeling the observations in each group with the integers 1, ..., 16 and selecting random samples of these integers with replacement. In this study, 1000 bootstrap samples were used; the mean difference and the median difference were calculated for each bootstrap sample, resulting in 1000 mean and 1000 median differences. A rough 95% confidence interval can then be derived by taking the 25th and 975th largest of the replicates.
The 95% confidence interval for the mean difference can also be derived conventionally, by assuming the usual normality of the population distributions and homogeneity of variance and using the t statistic. The confidence interval for the mean difference obtained by using the bootstrap is seen to be narrower than the corresponding interval obtained conventionally with the t statistic.

Unlike a permutation test, the bootstrap does not provide exact p values, and it is generally less powerful than a permutation test. However, it may be possible to use a bootstrap procedure when no other statistical method is applicable. Full details of the bootstrap, including many fascinating examples of its application, are given by Efron and Tibshirani (1993).

TABLE 8.13
Construction of Confidence Intervals for the WISC Data by Using the Bootstrap

The procedure is based on drawing random samples of 16 observations, with replacement, from each of the row and corner groups.
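The percentile interval described in Table 8.13 can be sketched in Python as follows. This is an illustration only; the WISC scores themselves are in Table 3.6 and are not reproduced here.

```python
import random
from statistics import mean, median

def bootstrap_diff_ci(a, b, stat=mean, n_boot=1000, seed=1):
    # Percentile bootstrap interval for stat(a) - stat(b): each group is
    # resampled separately, with replacement, and with 1000 replicates the
    # 25th and 975th largest give a rough 95% interval.
    rng = random.Random(seed)
    reps = sorted(stat(rng.choices(a, k=len(a))) - stat(rng.choices(b, k=len(b)))
                  for _ in range(n_boot))
    return reps[int(0.025 * n_boot) - 1], reps[int(0.975 * n_boot) - 1]
```

Passing stat=median gives the corresponding interval for the median difference, for which no simple conventional formula is available; this is exactly the situation in which the bootstrap earns its keep.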
Histograms of the mean and median differences obtained from the bootstrap samples are shown in Figures 8.4 and 8.5.

FIG. 8.4. Histogram of mean differences for 1000 bootstrap samples of the WISC data.

FIG. 8.5. Histogram of median differences for 1000 bootstrap samples of the WISC data.

The bootstrap results for the regression of vocabulary scores on age are summarized in Table 8.14 and represented graphically in Figures 8.6 and 8.7. We see that the regression coefficients calculated from the bootstrap samples show a minor degree of skewness, and that the regression coefficient calculated from the observed data is perhaps a little biased compared with the mean of the bootstrap distribution. The bootstrap confidence interval, (498.14, 617.03), is wider than the interval given in Chapter 6 based on the assumption of normality.
TABLE 8.14
Bootstrap Results for the Regression of Vocabulary Scores on Age (Observed Value, Bias, Mean, and SE of the Regression Coefficient)

Note. Number of bootstrap samples, 1000; 95% interval, (498.14, 617.03).

FIG. 8.6. Histogram of regression coefficients of vocabulary scores on age for 1000 bootstrap samples.

8.8. SUMMARY

1. Distribution-free tests are useful alternatives to parametric approaches, for example, when the sample size is small and therefore empirical evidence for any distributional assumption is not available.
2. Such tests generally operate on ranks, and so are invariant only under transformations of the data that preserve order; they use only the ordinal property of the raw data.
FIG. 8.7. Normal probability plot of bootstrap regression coefficients for the vocabulary scores data.

3. Permutation tests and the bootstrap offer alternative approaches to making distribution-free inferences. Both methods are computationally intensive.
4. This chapter has given only a brief account of this increasingly important area of modern applied statistics. Comprehensive accounts are given by Good (1994) and by Efron and Tibshirani (1993).

COMPUTER HINTS

SPSS

Many distribution-free tests are available in SPSS. For example, to apply the Wilcoxon-Mann-Whitney test, we would use the following steps.

1. Click on Statistics, click on Nonparametric Tests, and then click on 2 Independent Samples.
2. Move the names of the relevant dependent variables to the Test Variable List.
3. Click on the relevant grouping variable and move it to the Grouping Variable box.
4. Ensure that Mann-Whitney U is checked.
5. Click on OK.

For related samples, after clicking on Nonparametric Tests, we would click on 2 Related Samples and ensure that Wilcoxon is checked in the Test Type dialog box.

S-PLUS

Various distribution-free tests described in the text are available from the Statistics menu. The following steps access the relevant dialog boxes.

1. Click on Statistics, click on Compare Samples, and then
(a) click on One Sample, and click on Signed Rank Test for the Wilcoxon's signed rank test dialog box;
(b) click on Two Samples, and click on Wilcoxon Rank Test for the Wilcoxon-Mann-Whitney test dialog box;
(c) click on k Samples, and click on Kruskal-Wallis Rank Test or Friedman Rank Test to get the dialog box for the Kruskal-Wallis one-way analysis procedure or the Friedman procedure for repeated measures.

All these distribution-free tests are also available by means of particular functions in the command language, the relevant functions being wilcox.test, kruskal.test, and friedman.test, and these can be very useful in particular applications. For example, to apply the Wilcoxon-Mann-Whitney test to the data in two vectors, y1 and y2, the command would be

wilcox.test(y1, y2)

and to apply the Wilcoxon's signed rank test to paired data contained in two vectors of the same length, x1 and x2, the command would be

wilcox.test(x1, x2, paired=T)

The density, cumulative probability, quantiles, and random generation for the distribution of the Wilcoxon-Mann-Whitney rank sum statistic are also readily available by using dwilcox, pwilcox, qwilcox, and rwilcox. The outer function is extremely helpful if distribution-free confidence intervals are required. For example, in the electrode example in the text, if the data for the two electrodes are stored in the vectors electrode1 and electrode2, then the required averages
of pairs of observations needed can be found as

diff <- electrode1 - electrode2
wij <- outer(diff, diff, "+")/2

The cor function can be used to calculate a variety of correlation coefficients, and cor.test can be used for hypothesis testing of these coefficients.

The bootstrap resampling procedure is available from the Statistics menu as follows.

1. Click on Statistics, click on Resample, and click on Bootstrap to access the Bootstrap dialog box.
2. Select the relevant data set, enter the expression for the statistic to be estimated, and, if required, click on the Options tag to alter the number of bootstrap samples from the default value of 1000.

EXERCISES

8.1. The data in Table 8.15 were obtained in a study reported by Hollander and Wolfe (1999). A measure of depression was recorded for each of nine patients on both the first and second visit after the initiation of therapy. Use the Wilcoxon signed rank test to assess whether the depression scores have changed over the two visits, and construct a confidence interval for the difference.

TABLE 8.15
Depression Scores for Nine Patients at Visit 1 and Visit 2
8.2. Investigate how the bootstrap confidence intervals for both the mean and the median differences in the WISC data change with the size of the bootstrap sample.

8.3. Reanalyze the SO2 data given in Table 8.11 by using a permutational approach, taking as the test statistic the sum of the scores in the group that had received training.

8.4. The data shown in Table 8.16 give the test scores of 13 dizygous (nonidentical) male twins (the data are taken from Hollander and Wolfe, 1999). Test the hypothesis of independence versus the alternative that the twins' scores are positively correlated.

TABLE 8.16
Test Scores of Dizygous Twins

Pair   Twin Xi   Twin Yi
1      277       256
2      169       118
3      157       137
4      139       144
5      108       146
6      213       221
7      232       184
8      229       188
9      114       97
10     232       231
11     161       114
12     149       187
13     128       230

8.5. A therapist is interested in discovering whether family psychotherapy is of any value in alleviating the symptoms of asthma in children suffering from the disease. A total of eight families, each having a child with severe asthma, is selected for study. As the response variable, the therapist uses the number of trips to the emergency room of a hospital following an asthma attack in a 2-month period. The data in Table 8.17 give the values of this variable for the eight children both before psychotherapy and after psychotherapy. Use a suitable distribution-free test to determine whether there has been any change over the two periods, and calculate a distribution-free confidence interval for the treatment effect.

8.6. Use the bootstrap approach to produce an approximate 95% confidence interval for the ratio of the two population variances of the WISC data used in this chapter. Investigate how the confidence interval changes with the number of bootstrap samples used.
TABLE 8.17
Number of Visits to the Emergency Room in a 2-Month Period

[Counts for each of the eight children, before and after psychotherapy.]

8.7. The data in Table 8.18 show the value of an index known as the psychomotor expressiveness factor (PEF) and the ratio of striatum to cerebellum radioactivity concentration 2 hours after injection of a radioisotope (the ratio is known as S:C), for 10 schizophrenic patients. The data were collected in an investigation of the dopamine hypothesis of schizophrenia. Calculate the values of Pearson's product moment correlation, Kendall's tau, and Spearman's rank correlation for the two variables, and in each case test whether the population correlation is zero.

TABLE 8.18
PEF and S:C Measurements for 10 Schizophrenic Patients

[PEF and S:C values for the 10 patients.]
9
Analysis of Categorical Data I: Contingency Tables and the Chi-Square Test

9.1. INTRODUCTION

Categorical data occur frequently in the social and behavioral sciences, where information about marital status, sex, occupation, ethnicity, and so on, of a sample of individuals is often of interest. In such cases the measurement scale consists of a set of categories. Two specific examples are as follows.

1. Political philosophy: liberal, moderate, conservative.
2. Diagnostic test for Alzheimer's disease: symptoms present, symptoms absent.

The categories of a categorical variable should be mutually exclusive, and one and only one category should apply to each subject.
Many categorical scales have a natural ordering. Examples are attitude toward legalization of abortion (disapprove in all cases, approve only in certain cases, approve in all cases), response to a medical treatment (excellent, good, fair, poor), and diagnosis of whether a patient is mentally ill (certain, probable, unlikely, definitely not). Such ordinal variables have largely been dealt with in the previous chapters; here we shall concentrate on categorical variables having unordered categories, so-called nominal variables. Examples are religious affiliation (Catholic, Protestant, Jewish, other), mode of transportation to work (automobile, bus, bicycle, other), and favorite type of music (classical, jazz, country, folk, rock). For nominal variables, the order in which the categories are listed is irrelevant, and the statistical analysis should not depend on that ordering. (It could, of course, be argued that some of these examples do have an associated natural order; favorite type of music, for example, might be listed in terms of its cultural content as classical, jazz, folk, country, and rock!)

In many cases, the researcher collecting categorical data is most interested in assessing how pairs of categorical variables are related, in particular whether they are independent of one another. Cross-classifications of pairs of categorical variables, two-dimensional contingency tables, are commonly the starting point for such an investigation.

9.2. THE TWO-DIMENSIONAL CONTINGENCY TABLE

A two-dimensional contingency table is formed from cross-classifying two categorical variables and recording how many members of the sample fall in each cell of the cross-classification. An example of a 5 x 2 contingency table is given in Table 9.1, and Table 9.2 shows a 3 x 3 contingency table. The main question asked about such tables is whether the two variables forming the table are independent of one another.

TABLE 9.1
Psychiatric Patients by Diagnosis and Whether Their Treatment Prescribed Drugs

Diagnosis              Drugs    No Drugs
Schizophrenic          105      8
Affective disorder     12       2
Neurosis               18       19
Personality disorder   47       52
Special symptoms       0        13
Such tables and the associated chi-square test should be familiar to most readers from their introductory statistics course.
But for those whose memories of these topics have become a little faded, the next section will (it is hoped) act as a refresher, and subsequent sections will consider a number of topics dealing with two-dimensional contingency tables that are not usually encountered in an introductory statistics course. (Contingency tables formed from more than two variables will be discussed in the next chapter.)

TABLE 9.2
Incidence of Cerebral Tumors

Site    A    B    C    Total
I       23   9    6    38
II      21   4    3    28
III     34   24   17   75
Total   78   37   26   141

Note. Sites: I, frontal lobes; II, temporal lobes; III, other cerebral areas. Types: A, benign tumors; B, malignant tumors; C, other cerebral tumors.

The question of whether the two variables forming a contingency table are independent or not is answered by use of the familiar chi-squared test; details are given in Display 9.1. Applying the test described in Display 9.1 to the data in Table 9.1 gives the estimated expected values shown in Table 9.3 and a chi-square value of 84.19, which with 4 degrees of freedom leads to a very small p value. Clearly, diagnosis and treatment with drugs are not independent. For Table 9.2 the chi-square statistic takes the value 7.84, which with 4 degrees of freedom leads to a p value of .098. Here there is no evidence against the independence of site and type of tumor.

TABLE 9.3
Estimated Expected Values Under the Hypothesis of Independence for the Diagnosis and Drugs Data in Table 9.1

Diagnosis              Drugs    No Drugs
Schizophrenic          74.51    38.49
Affective disorder     9.23     4.77
Neurosis               24.40    12.60
Personality disorder   65.28    33.72
Special symptoms       8.57     4.43
Display 9.1
Testing for Independence in an r x c Contingency Table

Suppose a sample of n individuals has been cross-classified with respect to two categorical variables, one with r categories and one with c categories, to form an r x c two-dimensional contingency table. The general form of a two-dimensional contingency table is as follows.

                        Variable 2
Variable 1     1      2      ...    c      Total
1              n11    n12    ...    n1c    n1.
2              n21    n22    ...    n2c    n2.
...
r              nr1    nr2    ...    nrc    nr.
Total          n.1    n.2    ...    n.c    n

Here nij represents the number of observations in the ijth cell of the table; ni. represents the total number of observations in the ith row of the table, and n.j represents the total number of observations in the jth column; both ni. and n.j are usually termed marginal totals.
The null hypothesis to be tested is that the two variables are independent. This hypothesis can be formulated more formally as

  H0: pij = pi. p.j,

where pij is the probability of an observation being in the ijth cell in the population from which the observations have been sampled, pi. is the probability of being in the ith category of the row variable, and p.j is the probability of being in the jth category of the column variable. The hypothesis is just a reflection of the elementary rule of probability that, for two independent events A and B, the probability of A and B is simply the product of the probabilities of A and of B.
In a sample of n individuals, if the variables are independent, we would expect n pi. p.j individuals in the ijth cell. Using the obvious estimators for pi. and p.j, namely ni./n and n.j/n, we can estimate this expected value to give

  Eij = ni. n.j / n.

The observed and expected values under independence are then compared by using the familiar chi-squared statistic

  X² = sum over all i and j of (nij - Eij)² / Eij.

If independence holds, then the statistic X² has approximately a chi-squared distribution with (r - 1)(c - 1) degrees of freedom. This allows p values to be assigned. The possible problems caused by use of the chi-square distribution to approximate the true null distribution of X² are taken up in the text.
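The steps in Display 9.1 are straightforward to program directly. The Python sketch below is an illustration rather than the book's own S-PLUS code; it is checked against the Table 9.2 analysis reported in the text (X² = 7.84 on 4 degrees of freedom).

```python
def chi_square_independence(table):
    """Pearson chi-squared test of independence for an r x c table.

    Returns the X^2 statistic, the degrees of freedom, and the table
    of expected values E_ij = n_i. * n_.j / n from Display 9.1.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    x2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return x2, df, expected

# Table 9.2: incidence of cerebral tumors (sites I-III by types A-C).
tumors = [[23, 9, 6], [21, 4, 3], [34, 24, 17]]
x2, df, _ = chi_square_independence(tumors)
print(round(x2, 2), df)  # 7.84 4
```

The same function applied to the Table 9.1 counts reproduces the chi-square value of 84.19 quoted above.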
The simplest form of two-dimensional contingency table is obtained when a sample of observations is cross-classified according to the values taken by two dichotomous variables, that is, categorical variables with only two categories. Several examples of such tables are shown in Table 9.4. The general form of such a 2 x 2 contingency table, the special form of the chi-square test for such tables, and the construction of a useful confidence interval associated with 2 x 2 tables are described in Display 9.2. The results of applying the chi-squared test to the data sets in Table 9.4, and the derived confidence intervals for differences in proportions, are shown in Table 9.5.

TABLE 9.4
Some Examples of 2 x 2 Contingency Tables

1. Classification of Psychiatric Patients by Sex and Diagnosis

Diagnosis        Male   Female   Total
Schizophrenia    43     32       75
Other            15     32       47
Total            58     64       122

2. Data from Pugh (1983) Involving How Juries Come to Decisions in Rape Cases (here the verdict is classified against whether or not the defense alleged that the victim was somehow partially at fault for the rape)

Verdict          Guilty   Not Guilty   Total
Fault alleged    153      24           177
Not alleged      105      76           181
Total            258      100          358

3. Incidence of Suicidal Feelings in Psychotic and Neurotic Patients

Suicidal Feelings   Psychotic   Neurotic   Total
Yes                 2           6          8
No                  18          14         32
Total               20          20         40
Display 9.2
2 x 2 Contingency Tables

The general form of a 2 x 2 contingency table is as follows.

                       Variable 2
Variable 1      Category 1   Category 2   Total
Category 1      a            b            a + b
Category 2      c            d            c + d
Total           a + c        b + d        n = a + b + c + d

The chi-squared statistic used in testing for independence can now be written in the simplified form

  X² = n(ad - bc)² / [(a + b)(c + d)(a + c)(b + d)].

For a 2 x 2 table the statistic has a single degree of freedom.
For this type of contingency table, independence implies that the probability of being in category 1 of variable 1 for category 1 of variable 2 (p1) is equal to the corresponding probability for category 2 of variable 2 (p2). Estimates of these two probabilities are given by

  p1-hat = a/(a + c),  p2-hat = b/(b + d).

The standard error of the difference of the two estimates is given by

  SE = sqrt[ p1-hat(1 - p1-hat)/(a + c) + p2-hat(1 - p2-hat)/(b + d) ].

This can be used to find a confidence interval for the difference in the two probabilities in the usual way.

TABLE 9.5
Results of Analyzing the 2 x 2 Contingency Tables in Table 9.4

1. For the schizophrenia and gender data, X² = 7.49 with an associated p value of .006. The 95% confidence interval for the difference in the probability of being diagnosed schizophrenic for men and for women is (0.07, 0.41).
2. For the rape data, X² = 35.93 with an associated p value that is very small; the defendant is more likely to be found guilty when the defense alleges that the victim was partially at fault. The 95% confidence interval for the difference in the probability of being found guilty when the defense does and does not make this allegation is (0.20, 0.37).
3. For the suicidal feelings data, X² = 1.25 with an associated p value of .26. The 95% confidence interval for the difference in the probability of having suicidal feelings in the two diagnostic categories is (-0.44, 0.04).
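Display 9.2's shortcut formula and confidence interval can be sketched in code. The function below is illustrative Python rather than anything from the book; it reproduces the figures quoted in Table 9.5 for the schizophrenia and gender data.

```python
import math

def two_by_two(a, b, c, d, z=1.96):
    """Chi-square for a 2 x 2 table (simplified form of Display 9.2),
    plus a z-based confidence interval for the difference p1 - p2,
    where p1 = a/(a+c) and p2 = b/(b+d)."""
    n = a + b + c + d
    x2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p1, p2 = a / (a + c), b / (b + d)
    se = math.sqrt(p1 * (1 - p1) / (a + c) + p2 * (1 - p2) / (b + d))
    diff = p1 - p2
    return x2, (diff - z * se, diff + z * se)

# Schizophrenia by sex (Table 9.4, data set 1): a=43, b=32, c=15, d=32.
x2, (lo, hi) = two_by_two(43, 32, 15, 32)
print(round(x2, 2))                  # 7.49
print(round(lo, 2), round(hi, 2))    # 0.07 0.41
```

Here p1 is the proportion of males diagnosed schizophrenic and p2 the corresponding proportion of females.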
The results indicate the following.

1. Sex and psychiatric diagnosis are associated, in the sense that a higher proportion of men than women are diagnosed as being schizophrenic.
2. The verdict in a rape case is associated with whether or not the defense alleged that the rape was partially the fault of the victim.
3. Diagnosis and suicidal feelings are not associated.

Three further topics that should be mentioned in connection with 2 x 2 contingency tables are

1. Yates's continuity correction (see Display 9.3),
2. Fisher's exact test (see Display 9.4), and
3. McNemar's test for matched samples (see Display 9.5).

We can illustrate the use of Fisher's exact test on the data on suicidal feelings in Table 9.4, because this table has some small expected values (see Section 9.4 for more comments). The p value from applying the test is .235, indicating again that diagnosis and suicidal feelings are not associated.
To illustrate McNemar's test, we use the data shown in Table 9.6. For these data the test statistic takes the value 1.29, which is clearly not significant, and we can conclude that depersonalization is not associated with prognosis where endogenous depressed patients are concerned.

Display 9.3
Yates's Continuity Correction

In the derivation of the null distribution of the X² statistic, a continuous probability distribution, namely the chi-square distribution, is being used as an approximation to the discrete probability distribution of observed frequencies, namely the multinomial distribution (see the glossary in Appendix A). To improve this approximation, Yates (1934) suggested a correction to the test statistic that involves subtracting 0.5 from the positive discrepancies (observed - expected), and adding 0.5 to the negative discrepancies, before these values are squared in the calculation of X². The correction may be incorporated into the formula for X² for the 2 x 2 table given in Display 9.2 to become

  X² = n(|ad - bc| - n/2)² / [(a + b)(c + d)(a + c)(b + d)].

This is now known as the chi-square value corrected for continuity. Although Yates's correction is still widely used, it is really no longer necessary, because of the routine availability of the exact methods described in Section 9.4.
Display 9.4
Fisher's Exact Test for 2 x 2 Contingency Tables

Fisher's exact test for a 2 x 2 contingency table does not use the chi-square approximation at all. Instead the exact probability distribution of the observed frequencies is used. For fixed marginal totals, the required distribution is what is known as a hypergeometric distribution. Assuming that the two variables forming the table are independent, the probability of obtaining any particular arrangement of the frequencies a, b, c, and d, when the marginal totals are as given, is

  P(a, b, c, d) = (a + b)! (a + c)! (c + d)! (b + d)! / (a! b! c! d! n!),

where a!, read "a factorial," is the shorthand method of writing the product of a and all the integers less than it. (By definition, 0! is one.) Fisher's test uses this formula to find the probability of the observed arrangement of frequencies, and of every other arrangement giving as much or more evidence of an association between the two variables, keeping in mind that the marginal totals are regarded as fixed. The sum of the probabilities of all such tables is the relevant p value for Fisher's test.

Display 9.5
McNemar's Test for Paired Samples

When the samples to be compared in a 2 x 2 contingency table are matched in some way, for example, the same subjects observed on two occasions, then the appropriate test becomes one from McNemar (1962). For a matched data set, the 2 x 2 table takes the following form.

                      Sample 1
Sample 2        A present   A absent
A present       a           b
A absent        c           d

where A is the characteristic assessed in the pairs of observations making up the matched samples. Now a represents the number of pairs of observations that both have A, and so on. To test whether the probability of having A differs in the matched populations, the relevant test statistic is

  X² = (b - c)² / (b + c),

which, if there is no difference, has a chi-squared distribution with a single degree of freedom.
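Both exact procedures are short programs. The Python sketch below is illustrative rather than the book's own code; it implements the hypergeometric formula of Display 9.4, summing over all tables (with the observed margins) no more probable than the observed one, together with the McNemar statistic of Display 9.5.

```python
from math import comb

def fisher_exact(a, b, c, d):
    """Two-sided Fisher exact p: sum the hypergeometric probabilities
    (Display 9.4) of every table with the same margins whose probability
    does not exceed that of the observed table."""
    r1, r2 = a + b, c + d
    c1 = a + c
    n = r1 + r2

    def prob(x):  # probability of the table with x in the top-left cell
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    return sum(
        prob(x)
        for x in range(max(0, c1 - r2), min(r1, c1) + 1)
        if prob(x) <= p_obs + 1e-12
    )

def mcnemar(b, c):
    """McNemar statistic (b - c)^2 / (b + c) for paired 2 x 2 data."""
    return (b - c) ** 2 / (b + c)

# Suicidal feelings (Table 9.4): a=2, b=6, c=18, d=14.
print(round(fisher_exact(2, 6, 18, 14), 3))  # 0.235, as quoted in the text
# Depersonalization data of Table 9.6: b=2, c=5.
print(round(mcnemar(2, 5), 2))  # 1.29
```

`math.comb` requires Python 3.8 or later.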
TABLE 9.6
Recovery of 23 Pairs of Depressed Patients

                            Patients Not Depersonalized
Depersonalized Patients     Recovered   Not Recovered   Total
Recovered                   14          2               16
Not recovered               5           2               7
Total                       19          4               23

9.3. BEYOND THE CHI-SQUARE TEST: FURTHER EXPLORATION OF CONTINGENCY TABLES BY USING RESIDUALS AND CORRESPONDENCE ANALYSIS

A statistical significance test is, as implied in Chapter 1, often a crude and blunt instrument. This is particularly true in the case of the chi-square test for independence in the analysis of contingency tables. After a significant value of the test statistic is found and independence is rejected, it is usually informative to try to identify the cells of the table responsible, or most responsible, for the lack of independence. Here we shall look at two approaches, the first involving suitably chosen residuals and the second a technique that attempts to represent the association in a contingency table graphically.

9.3.1. The Use of Residuals in the Analysis of Contingency Tables

After a significant chi-squared statistic is found and independence is rejected for a two-dimensional contingency table, it is usually advisable to investigate in more detail why the null hypothesis of independence fails to fit. It might be thought that this can be done relatively simply by looking at the deviations of the observed counts in each cell of the table from the estimated expected values under independence, that is, by examination of the residuals

  rij = nij - Eij,   i = 1, ..., r;  j = 1, ..., c.    (9.1)
This would, however, be very unsatisfactory, because a difference of fixed size is clearly more important for smaller samples. A more satisfactory way of defining residuals for a contingency table might be to take

  eij = (nij - Eij) / sqrt(Eij).

These terms are usually known as standardized residuals and are such that the chi-squared test statistic is given by

  X² = sum over all i and j of eij².

It is tempting to think that these residuals might be judged as standard normal variables, with values outside (-1.96, 1.96) indicating cells that depart significantly from independence. Unfortunately, it can be shown that the variance of eij is always less than or equal to one, and in some cases considerably less than one. Consequently, the use of standardized residuals for a detailed examination of a contingency table may often give a conservative reading as to which cells independence does not apply.
At the cost of some extra calculation, a more useful analysis can be achieved by using what are known as adjusted residuals, dij, as suggested by Haberman (1973). These are defined as follows:

  dij = eij / sqrt[(1 - ni./n)(1 - n.j/n)].

When the variables forming the contingency table are independent, the adjusted residuals are approximately normally distributed with mean zero and standard deviation one. Consequently, values outside (-1.96, 1.96) indicate those cells departing most from independence.
Table 9.7 shows the adjusted residuals for the data in Table 9.1. Here the values of the adjusted residuals demonstrate that all cells in the table contribute to the departure from independence of diagnosis and treatment with drugs.

TABLE 9.7
Adjusted Residuals for the Data in Table 9.1

Diagnosis              Drugs    No Drugs
Schizophrenic          7.87     -7.87
Affective disorder     1.60     -1.60
Neurosis               -2.38    2.38
Personality disorder   -4.84    4.84
Special symptoms       -5.14    5.14
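Haberman's adjusted residuals take only a few lines to compute. The Python sketch below is an illustration (not the book's S-PLUS code) applied to the Table 9.1 counts.

```python
import math

def adjusted_residuals(table):
    """Haberman's adjusted residuals for an r x c contingency table:
    d_ij = e_ij / sqrt((1 - n_i./n)(1 - n_.j/n)), where
    e_ij = (n_ij - E_ij)/sqrt(E_ij) is the standardized residual."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    out = []
    for i, row in enumerate(table):
        out.append([])
        for j, obs in enumerate(row):
            e = row_tot[i] * col_tot[j] / n
            std = (obs - e) / math.sqrt(e)
            adj = std / math.sqrt((1 - row_tot[i] / n) * (1 - col_tot[j] / n))
            out[-1].append(adj)
    return out

# Diagnosis by drug treatment (Table 9.1).
table91 = [[105, 8], [12, 2], [18, 19], [47, 52], [0, 13]]
labels = ["Schizophrenic", "Affective", "Neurosis", "Personality", "Special"]
for diag, row in zip(labels, adjusted_residuals(table91)):
    print(diag, [round(v, 2) for v in row])
```

In a table with only two columns the two residuals in each row are equal in size and opposite in sign, as the printed output shows.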
9.3.2. Correspondence Analysis

The second approach attempts to represent the association in a contingency table graphically. A brief nontechnical account of correspondence analysis is given in Display 9.6; a full account of the technique is available in Greenacre (1984).

Display 9.6
Correspondence Analysis

Correspondence analysis attempts to display graphically the relationship between the two variables forming a contingency table by deriving a set of coordinate values representing the row categories and a set representing the column categories of the table.
The coordinates are derived from a procedure known as singular value decomposition (see Everitt and Dunn, 2001, for details) applied to the matrix E, the elements of which are the standardized residuals defined in the previous subsection divided by the square root of the sample size, that is, eij/sqrt(n).
In general, the row category coordinates can be represented as uik, i = 1, ..., r, k = 1, 2, and the column category coordinates as vjk, j = 1, ..., c, k = 1, 2. For a two-dimensional representation, the first two coordinate values for each category are the most important, because they can be used to plot the categories in a scatterplot.
Small coordinate values for the row and column categories of a cell indicate that the observed and expected values for that cell are not too dissimilar. A cell for which the row and column coordinates are both large and of the same sign is one that has a larger observed value than that expected under independence; a cell for which the row and column coordinates are both large but of opposite signs is one in which the observed frequency is lower than expected under independence.
The correspondence analysis coordinates are analogous to those derived from a principal components analysis of continuous, multivariate data (see Everitt and Dunn, 2001), except that they are derived by partitioning the chi-squared statistic for the table, rather than a variance.
Correspondence analysis diagrams can be very helpful when contingency tables are being examined, but some experience is needed in interpreting the diagrams.

As a first example of using correspondence analysis, it will be applied to the data shown in Table 9.8. The results of the analysis are shown in Table 9.9 and the resulting correspondence analysis diagram in Figure 9.1.
TABLE 9.8
Cross-Classification of Eye Color and Hair Color

                        Hair Color
Eye Color    Fair   Red   Medium   Dark   Black
Light        688    116   584      188    4
Blue         326    38    241      110    3
Medium       343    84    909      412    26
Dark         98     48    403      681    85

TABLE 9.9
Derived Coordinates from Correspondence Analysis for the Hair Color and Eye Color Data

[Two coordinate values for each of the categories: eye light (EL), eye blue (EB), eye medium (EM), eye dark (ED), hair fair (hf), hair red (hr), hair medium (hm), hair dark (hd), and hair black (hb).]

The pattern in the resulting diagram (Figure 9.1) is as would be expected, with, for example, fair hair being associated with blue and light eyes, and so on.
The second example of a correspondence analysis involves the data shown in Table 9.10, collected in a survey in which people in the UK were asked which of a number of characteristics could be applied to people in the UK and in other European countries. Here the two-dimensional correspondence diagram is shown in Figure 9.2. It appears that the respondents judged, for example, the French to be stylish and sexy, the Germans efficient, and the British boring; well, they do say you can prove anything with statistics!
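For readers who want to see the mechanics of Display 9.6, the following Python sketch extracts the first two singular components of the matrix E by power iteration, with no linear-algebra library. It is an illustration only: the printed values are unscaled singular vectors, not the scaled coordinates of Table 9.9, although the relative positions (fair hair with light eyes, black hair with dark eyes) should show the same pattern.

```python
import math

def power_svd(mat, n_iter=500):
    """First two singular triplets (sigma, u, v) of a small matrix,
    via power iteration on M^T M plus one step of deflation."""
    def mat_vec(m, v):
        return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

    def transpose(m):
        return [list(col) for col in zip(*m)]

    def leading(m):
        mt = transpose(m)
        v = [1.0] * len(m[0])
        for _ in range(n_iter):
            w = mat_vec(mt, mat_vec(m, v))
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        mv = mat_vec(m, v)
        sigma = math.sqrt(sum(x * x for x in mv))
        return sigma, [x / sigma for x in mv], v

    s1, u1, v1 = leading(mat)
    deflated = [
        [mij - s1 * u1[i] * v1[j] for j, mij in enumerate(row)]
        for i, row in enumerate(mat)
    ]
    s2, u2, v2 = leading(deflated)
    return (s1, u1, v1), (s2, u2, v2)

# Table 9.8: eye color (rows) by hair color (columns).
counts = [
    [688, 116, 584, 188, 4],   # light
    [326, 38, 241, 110, 3],    # blue
    [343, 84, 909, 412, 26],   # medium
    [98, 48, 403, 681, 85],    # dark
]
n = sum(map(sum, counts))
row_tot = [sum(r) for r in counts]
col_tot = [sum(c) for c in zip(*counts)]
# Matrix E of Display 9.6: standardized residuals over sqrt(n).
E = [
    [
        (counts[i][j] - row_tot[i] * col_tot[j] / n)
        / math.sqrt(row_tot[i] * col_tot[j] / n) / math.sqrt(n)
        for j in range(len(col_tot))
    ]
    for i in range(len(row_tot))
]
first, second = power_svd(E)
print("eye coordinates, axis 1:", [round(x, 3) for x in first[1]])
print("hair coordinates, axis 1:", [round(x, 3) for x in first[2]])
```

The signs of the vectors are arbitrary, so only the relative positions of the categories are interpretable.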
TABLE 9.10
What Do People in the UK Think about Themselves and Their Partners in the European Community?

[Frequencies with which each of a set of characteristics was attributed to the French (F), Spanish (S), Italian (I), British (B), Irish (Ir), Dutch (D), and German (G).]

FIG. 9.1. Correspondence analysis diagram for the hair color and eye color data in Table 9.8.
FIG. 9.2. Correspondence analysis diagram for the data in Table 9.10.

9.4. SPARSE DATA

In the derivation of the distribution of the X² statistic, a continuous probability distribution, namely the chi-squared distribution, is being used as an approximation to the true distribution of the observed frequencies, the multinomial distribution (see the glossary in Appendix A). The p value associated with the X² statistic is calculated from the chi-square distribution under the assumption that there is a sufficiently large sample size. Unfortunately, it is not easy to describe the sample size needed for the chi-square distribution to approximate the exact distribution of X² well. One rule of thumb suggested by Cochran (1954), which has gained almost universal acceptance among psychologists (and others), is that the minimum expected count for all cells should be at least five. The problem with this rule is that it can be extremely conservative, and Cochran also gave a further rule of thumb that appears to have been largely ignored, namely that, for tables larger than 2 x 2, a minimum expected count of one is permissible as long as no more than 20% of the cells have expected values below five. In the end no simple rule covers all cases, and it is difficult to identify, a priori, whether or not a given data set is likely to suffer from the usual asymptotic inference.
One solution that is now available is to compute exact p values by using a permutational approach of the kind encountered in the previous chapter. The main idea in evaluating exact p values is to evaluate the observed table relative to a reference set of other tables of the same size that are like it in every possible respect, except in terms of reasonableness under the null hypothesis. The approach will become clearer if we use the specific example of Table 9.11, which summarizes the results of a firefighters' entrance exam.

TABLE 9.11
Firefighters' Entrance Exam Results

                       Ethnic Group of Entrant
Test Result   White   Black   Hispanic   Asian   Total
Pass          5       2       2          0       9
Fail          0       2       3          4       9
No show       0       1       0          1       2
Total         5       5       5          5       20

The chi-squared statistic for Table 9.11 takes the value 11.56; note that here the number of cells with an expected value less than five is 12, all the cells in the table! The reference set for this example consists of all 3 x 4 tables with the same row and column totals as Table 9.11; Table 9.12 shows its general form. The exact p value is then obtained by identifying all tables in the reference set whose X² values equal or exceed the observed statistic, and summing their probabilities, which under the null hypothesis of independence are found from the hypergeometric distribution formula (see Display 9.4). For example, Table 9.13(a) is a member of the reference set; its X² value is 14.67 and its exact probability is .00108, and because this X² value is more extreme than the observed value, the table contributes to the exact p value.

TABLE 9.12
Reference Set for the Firefighter Example

Test Result   White   Black   Hispanic   Asian   Total
Pass          x11     x12     x13        x14     9
Fail          x21     x22     x23        x24     9
No show       x31     x32     x33        x34     2
Total         5       5       5          5       20
Table 9.13(b) is also a member of the reference set; its X² value is 9.778, which is less than that of the observed table, and so its probability does not contribute to the exact p value.

[Table 9.13: two tables in the reference set for the firefighter data; table (a) has X² = 14.67, table (b) has X² = 9.778.]

The exact p value, obtained by summing the probabilities of all tables in the reference set whose X² values equal or exceed the observed value of 11.56, is .0398. The asymptotic p value associated with the observed X² value is .07265. Here the exact approach leads to a different conclusion, namely that the test result is not independent of race.
The real problem in calculating exact p values for contingency tables is computational. For example, the number of tables in the reference set for Table 9.14 is 1.6 billion! Fortunately, methods and associated software are now available that make this approach a practical possibility (see StatXact, Cytel Software Corporation; pralay@cytel.com).

TABLE 9.14
Reference Set for a Hypothetical 5 x 6 Contingency Table

x11   x12   x13   x14   x15   x16   7
x21   x22   x23   x24   x25   x26   7
x31   x32   x33   x34   x35   x36   12
x41   x42   x43   x44   x45   x46   4
x51   x52   x53   x54   x55   x56   4
4     5     6     5     7     7     34
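For a table as small as Table 9.11, the reference-set idea can be programmed directly by enumerating every table with the given margins. The Python sketch below is an illustration only (real applications need the smarter algorithms implemented in packages such as StatXact); the p value it returns can be compared with the .0398 quoted in the text.

```python
from math import factorial

def exact_p(table):
    """Exact p value for independence: enumerate the reference set (all
    tables sharing the observed margins) and sum the multivariate
    hypergeometric probabilities of those whose X^2 is at least the
    observed X^2."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    expected = [[ri * cj / n for cj in cols] for ri in rows]

    def x2(t):
        return sum(
            (t[i][j] - expected[i][j]) ** 2 / expected[i][j]
            for i in range(len(rows)) for j in range(len(cols))
        )

    margin_const = 1.0
    for m in rows + cols:
        margin_const *= factorial(m)

    def prob(t):
        den = factorial(n)
        for row in t:
            for cell in row:
                den *= factorial(cell)
        return margin_const / den

    obs = x2(table)
    p_sum = total = 0.0

    def fill_row(i, remaining, built):
        nonlocal p_sum, total
        if i == len(rows) - 1:          # final row is forced by the margins
            t = built + [remaining]
            pr = prob(t)
            total += pr
            if x2(t) >= obs - 1e-9:
                p_sum += pr
            return
        def place(j, left, row):
            if j == len(cols) - 1:
                if 0 <= left <= remaining[j]:
                    new_row = row + [left]
                    fill_row(i + 1,
                             [rc - v for rc, v in zip(remaining, new_row)],
                             built + [new_row])
                return
            for v in range(min(left, remaining[j]) + 1):
                place(j + 1, left - v, row + [v])
        place(0, rows[i], [])

    fill_row(0, list(cols), [])
    return p_sum, total

# Table 9.11: firefighters' entrance exam results.
firefighters = [[5, 2, 2, 0], [0, 2, 3, 4], [0, 1, 0, 1]]
p, total = exact_p(firefighters)
print(round(p, 4))
```

The `total` value is a useful internal check: the probabilities over the whole reference set must sum to one.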
9.5. THE ODDS RATIO

Defining the independence of the two variables forming a contingency table is relatively straightforward, but measuring the degree of dependence is not so clear cut, and many measures have been proposed (Everitt, 1992). In a 2 x 2 table one possible measure might be thought to be the difference between the estimates of the two probabilities of interest. An alternative, and in many respects a more acceptable, measure is the odds ratio, which is explained in Display 9.7. This statistic is of considerable practical importance in the application of both log-linear models and logistic regression (see Chapter 10). The calculation of the odds ratio and its standard error for the schizophrenia and gender data is outlined in Table 9.15.

TABLE 9.15
Calculation of the Odds Ratio and Its Confidence Interval for the Schizophrenia and Gender Data

The estimate of the odds ratio is

  ψ = (43 × 32)/(32 × 15) = 2.87.

The odds in favor of being diagnosed schizophrenic among males is nearly three times the corresponding odds for females. The estimated variance of the logarithm of the estimated odds ratio is

  1/43 + 1/32 + 1/15 + 1/32 = 0.1524.

An approximate 95% confidence interval for log ψ is

  log(2.87) ± 1.96 × sqrt(0.1524) = (0.29, 1.82).

Consequently, the required confidence interval for ψ is [exp(0.29), exp(1.82)] = [1.33, 6.16]. The odds of being diagnosed schizophrenic in males is between approximately 1.3 and 6 times the odds in females.
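Display 9.7's interval takes only a few lines to compute. The Python sketch below is illustrative and uses the Table 9.4 schizophrenia and gender counts (a = 43, b = 32, c = 15, d = 32); note that with these counts the estimated odds ratio is 2.87.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio ad/(bc) with a confidence interval built on the
    log scale using var(log psi) = 1/a + 1/b + 1/c + 1/d (Display 9.7)."""
    psi = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_lo, log_hi = math.log(psi) - z * se, math.log(psi) + z * se
    return psi, (math.exp(log_lo), math.exp(log_hi))

# Schizophrenia by sex (Table 9.4): a=43, b=32, c=15, d=32.
psi, (lo, hi) = odds_ratio(43, 32, 15, 32)
print(round(psi, 2))               # 2.87
print(round(lo, 2), round(hi, 2))  # 1.33 6.16
```

Because the interval is constructed on the log scale and then exponentiated, it is not symmetric about the point estimate.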
9.6. MEASURING AGREEMENT FOR CATEGORICAL VARIABLES: THE KAPPA STATISTIC

It is often required to measure how well two observers agree on the use of a categorical scale. The most commonly used index of such agreement is the kappa coefficient, first suggested by Cohen (1960). The data shown in Table 9.16 will be used to illustrate the use of this index. These data are taken from a study (Westlund and Kurland, 1953) in which two neurologists independently classified 149 patients into four classes: (A) certainly suffering from multiple sclerosis; (B) probably suffering from multiple sclerosis; (C) possibly suffering from multiple sclerosis; and (D) doubtful, unlikely, or definitely not suffering from multiple sclerosis.

TABLE 9.16
Diagnosis of Multiple Sclerosis by Two Neurologists

                 Neurologist 2
Neurologist 1    A    B    C    D    Total
A                38   5    0    1    44
B                33   11   3    0    47
C                10   14   5    6    35
D                3    7    3    10   23
Total            84   37   11   17   149

Note. From Westlund and Kurland (1953).

One intuitively reasonable index of agreement for the two raters is the proportion P0 of patients that they classify into the same category; for the multiple sclerosis data,

  P0 = (38 + 11 + 5 + 10)/149 = 0.429.    (9.5)

Such a measure has the virtue of simplicity, and it is readily understood. However, despite such advantages (and intuition), P0 is not an adequate index of the agreement between the two raters. The problem is that P0 makes no allowance for agreement between raters that might be attributed to chance. To explain, consider the two sets of data in Table 9.17. In both, the two observers are measured as achieving 66% agreement if P0 is calculated. Suppose, however, that each observer is simply allocating subjects at random to the three categories in accordance with their marginal rates for the three categories. For example, in the first data set, observer A would simply allocate 10% of subjects to category 1, 80% to category 2, and the remaining 10% to category 3, totally disregarding the suitability of a category for a subject; observer B proceeds likewise. Even such a cavalier rating procedure used by the two observers would lead to some agreement and a corresponding nonzero value of P0. This chance agreement, Pc, can be calculated simply from the marginal rates of each observer. For the first data set in Table 9.17, Pc is calculated as follows.
Display 9.7
The Odds Ratio

The general 2 x 2 contingency table met in Display 9.2 can be summarized in terms of the probabilities of an observation being in each of the four cells of the table:

                     Variable 2
Variable 1     Category 1   Category 2
Category 1     p11          p12
Category 2     p21          p22

The ratios p11/p12 and p21/p22 are known as odds. The first is the odds of being in category 1 rather than category 2 of variable 2, for category 1 of variable 1; the second is the corresponding odds for category 2 of variable 1. (Odds will be familiar to those readers who like the occasional flutter on the Derby at Epsom or in Kentucky!)
A possible measure for the degree of dependence of the two variables forming a 2 x 2 contingency table is the so-called odds ratio, given by

  ψ = (p11/p12)/(p21/p22) = p11 p22 / (p12 p21).

Note that ψ may take any value between zero and infinity, with a value of 1 corresponding to independence (why?).
The odds ratio ψ has a number of desirable properties for representing dependence among categorical variables that other competing measures do not have. These properties include the following.

1. ψ remains unchanged if the rows and columns are interchanged.
2. If the levels of a variable are changed (i.e., listing category 2 before category 1), ψ becomes 1/ψ.
3. Multiplying either row by a constant or either column by a constant leaves ψ unchanged.

The odds ratio is estimated from the four frequencies in an observed 2 x 2 contingency table (see Display 9.2) as

  ψ-hat = ad/(bc).

Confidence intervals for ψ can be determined relatively simply by using the following estimator of the variance of log ψ-hat:

  var(log ψ-hat) = 1/a + 1/b + 1/c + 1/d.

An approximate 95% confidence interval for log ψ is given by

  log ψ-hat ± 1.96 × sqrt(1/a + 1/b + 1/c + 1/d).

If the limits of the confidence interval for log ψ obtained in this way are ψL and ψU, then the corresponding confidence interval for ψ is simply [exp(ψL), exp(ψU)].
However.66. 3 Category 3 the number of chance agreements be expected is .33. The clearest statement (1975). repeating the calculation on the second set of data in Table 9 1 gives P = 0. in this particular table. P is given by . : to (9. d l the observed agreement might simply be due to chance.who suggested an in favor of such a correction has been made by Fleiss index that is the ratio difference between observed and chance agreement of the the maximum possible excess observed over chance agreement. of is.8) =[“ l 100 1 + 64+ l]= 0.7) 100 x 10/100 x 10/100 = 1. P . which is considerably lower than the observed . agreement.11 WOHypotheticalData Secs. . Eachof which Shows 6 % 6 Agreement Between the WOObservers Observer A Observer B Total Data Set 1 1 2 l 1 10 3 Total Data Set 2 3 Total 8 1 2 8 64 8 80 8 1 10 3 3 1 10 80 10 100 1 2 24 5 1 30 13 20 7 40 5 22 30 40 30 30 100 (Remember how “expected” values are calculated in contingency tables? See Display 9 1. Consequently. ) 2. Category 2 the number of chance agreements be expected is to 100 X 80/100 X 80/100 = 64.286 CHAPTER 9 TABLE 9. A number of authors have expressed opinions on thetoneed incorporate chance agreement into the assessment of interobserver reliability. pc (9. . that 1 . Therefore.7 .
This leads to what has become known as the kappa statistic,

    κ = (P0 - Pc)/(1 - Pc).      (9.10)

1. If there is complete agreement between the two raters, so that all the off-diagonal cells of the table are empty, κ = 1.
2. If the observed agreement is greater than chance, κ > 0.
3. If the observed agreement is equal to chance, κ = 0.
4. Finally, in the unlikely event of the observed agreement being less than chance, κ < 0, with its minimum value depending on the marginal distributions of the two raters.

The chance agreement for the multiple sclerosis data is given by

    Pc = 44/149 x 84/149 + 47/149 x 37/149 + 35/149 x 11/149 + 23/149 x 17/149 = 0.2797.      (9.11)

Consequently, for the multiple sclerosis data,

    κ = (0.429 - 0.280)/(1 - 0.280) = 0.208.      (9.12)

This calculated value of κ is an estimate of the corresponding population value and, like all such estimates, has to be accompanied by some measure of its variance so that a confidence interval can be constructed. The variance of an observed value of κ has been derived under a number of different assumptions by several authors, including Everitt (1968) and Fleiss, Cohen, and Everitt (1969). The formula for the large-sample variance of κ is rather unpleasant, but for those with a strong stomach it is reproduced in Display 9.8; it involves pij, the proportion of observations in the ijth cell of the table of counts of agreements and disagreements for the two observers; pi. and p.j, the corresponding row and column marginal proportions; and r, the number of rows and columns in the table. Its value for the multiple sclerosis data is 0.002485, which leads to an approximate 95% confidence interval of (0.108, 0.308). Thus there is some evidence that the agreement between the two raters in this example is greater than chance; otherwise the confidence interval would have included the value zero.

But how is a particular value of κ to be judged: what constitutes "good" agreement? Some arbitrary benchmarks for the evaluation of observed κ values have been given by Landis and Koch (1977).
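The calculations in Eqs. (9.5) and (9.10)-(9.12) can be checked with a few lines of code. This is a sketch from first principles (the function name is my own), using the multiple sclerosis counts of Table 9.16:

```python
def cohen_kappa(table):
    """Chance-corrected agreement for an r x r table of counts:
    kappa = (P0 - Pc) / (1 - Pc)."""
    n = sum(sum(row) for row in table)
    r = len(table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(r)]
    p0 = sum(table[i][i] for i in range(r)) / n                    # observed agreement
    pc = sum(row_tot[i] * col_tot[i] for i in range(r)) / n ** 2   # chance agreement
    return (p0 - pc) / (1 - pc)

# Table 9.16: diagnoses of 149 patients by two neurologists
ms = [[38, 5, 0, 1],
      [33, 11, 3, 0],
      [10, 14, 5, 6],
      [3, 7, 3, 10]]
# cohen_kappa(ms) gives approximately 0.208, as in Eq. (9.12)
```

Applying the same function to the first data set of Table 9.17 returns exactly zero, confirming that all of its 66% observed agreement is attributable to chance.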
They are as follows.

Kappa         Strength of Agreement
0.00          Poor
0.01-0.20     Slight
0.21-0.40     Fair
0.41-0.60     Moderate
0.61-0.80     Substantial
0.81-1.00     Perfect

Of course, any series of standards such as these is necessarily subjective. Nevertheless, they may be helpful in the informal evaluation of a series of κ values, although replacing numerical values with rather poorly defined English phrases may not be to everybody's taste.

In fact, there is no simple answer to the original question concerning what constitutes good agreement. Suppose, for example, that two examiners rating examination candidates as pass or fail had κ = 0.54 (in the moderate range according to Landis and Koch). Would the people taking the examination be satisfied by this value? This is unlikely, particularly if future candidates are to be assessed by one of the examiners but not both. If this were the case, sources of disagreement should be searched for and rectified; only then might one have sufficient confidence in the assessment of a lone examiner.

A weighted version of κ is also possible, with weights reflecting differences in the seriousness of disagreements. For example, in the multiple sclerosis data, a disagreement involving one rater classifying a patient as A and the other rater classifying the same patient as D would be very serious and would be given a high weight. An example of the calculation of weighted κ, and some comments about the choice of weights, are given by Dunn (1989). The concept of a chance-corrected measure of agreement can also be extended to situations involving more than two observers; for details, see Fleiss and Cuzick (1979) and Schouten (1985).

9.7. SUMMARY

1. Categorical data occur frequently in psychological studies.
2. The chi-square statistic can be used to assess the independence or otherwise of two categorical variables.
3. Sparse data can be a problem when the chi-square statistic is used. This is a problem that can be overcome by computing exact p values.
4. When a significant chi-square value for a contingency table has been obtained, the reasons for the departure from independence have to be examined in more detail by the use of adjusted residuals, or by correspondence analysis.
5. The odds ratio is an extremely useful measure of association.
6. The kappa statistic can be used to quantify the agreement between two observers applying a categorical scale.

COMPUTER HINTS

SPSS
The most common test used on categorical data, the chi-square test for independence in a two-dimensional contingency table, can be applied as follows.

1. Click on Statistics, click on Summarize, and then click on Crosstabs.
2. Specify the row variable in the Rows box and the column variable in the Columns box.
3. Now click on Statistics and select Chi-square; then click on Continue.
4. To get observed and expected values, click on Cells, and then click on Expected and Total, ensuring that Observed is also checked.
5. Click on Continue and then OK.

S-PLUS
The various tests described in the text can be accessed from the Statistics menu as follows.

1. Click on Statistics, click on Compare Samples, and click on Counts and Proportions.
2. Then,
   (a) click on Fisher's exact test for the Fisher test dialog box;
   (b) click on McNemar's test for the McNemar test dialog box;
   (c) click on Chi-square test for the chi-square test of independence.

These tests can also be used in the command line approach, with the relevant functions being chisq.test, fisher.test, and mcnemar.test. For example, for a contingency table in which the frequencies were stored in a matrix X, the chi-square test of independence could be applied with the command

    chisq.test(X).

If X is a 2 x 2 table, then Yates's correction would be applied by default. To stop the correction from being applied would require

    chisq.test(X, correct=F).
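For readers working in Python rather than SPSS or S-PLUS, the same tests are available in the scipy library (the frequencies again being held in a matrix X); a brief sketch:

```python
from scipy.stats import chi2_contingency, fisher_exact

X = [[10, 20],
     [30, 40]]

# Chi-square test of independence. Like S-PLUS, scipy applies Yates's
# correction to 2 x 2 tables by default; correction=False turns it off.
stat, p, dof, expected = chi2_contingency(X, correction=False)

# Fisher's exact test (scipy supports 2 x 2 tables only)
odds, p_exact = fisher_exact(X)
```

The `expected` array returned by `chi2_contingency` holds the expected frequencies under independence, matching the SPSS Cells option. McNemar's test is not in scipy but is available in statsmodels as `statsmodels.stats.contingency_tables.mcnemar`.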
EXERCISES

9.1. Table 9.18 shows a cross-classification of gender and belief in the afterlife for a sample of Americans. Test whether the two classifications are independent, and construct a confidence interval for the difference in the proportion of men and the proportion of women who believe in the afterlife.

TABLE 9.18
Gender and Belief in the Afterlife

              Belief in the Afterlife
Gender         Yes      No
Female         435     147
Male           375     134

9.2. Table 9.19 shows data presented in the case of US versus Lansdowne Swimming Club. Analyzed in the usual way by the Pearson chi-squared statistic, the results are statistically significant, but the defendant argued that, because the expected count in two cells was less than five, the software used to analyze the data printed out a warning that the chi-square test might not be valid. The government, however, lost the case. Use an exact test to settle the question.

TABLE 9.19
Application for Membership to the Lansdowne Swimming Club

Parameter                    Black Applicants    White Applicants
Accepted for membership             1                  379
Rejected for membership             5                    0
Total applicants                    6                  379

9.3. Lundberg (1940) presented the results of an experiment to answer the question, What is the degree of agreement in commonsense judgments of socioeconomic status by two persons who are themselves of radically different status?
One set of data collected is shown in Table 9.20. Investigate the level of agreement between the two raters by using the kappa coefficient.

TABLE 9.20
Janitor's and Banker's Ratings of the Socioeconomic Status of 196 Families on a Six-Point Scale

                      Janitor's Ratings
Banker's Ratings    1     2     3     4     5     6
1                   0     0     0     0     0     3
2                   0     0     1     4     4     1
3                   0     0     0    25    11     6
4                   6    21    48     4     3     0
5                   0     8    27    13     1     1
6                   0     8     0     2     0     0

9.4. Consider the following table of agreement for two raters rating a binary response, and investigate the level of agreement between the two raters.

              Rater 2
Rater 1     Yes     No    Total
Yes          15      5     20
No            5     35     40
Total        20     40     60

TABLE 9.21
Berkeley College Applications

                    Males                  Females
Department    Admitted   Refused     Admitted   Refused
A                512       313           89        19
B                353       207           17         8
C                120       205          202       391
D                138       279          131       244
E                 53       138           94       299
F                 22       351           24       317
9.5. The data in Table 9.21 classify applicants to the University of California at Berkeley by their gender, the department applied for, and whether or not they were successful in their application.
(a) Ignoring which department was applied for, find the odds ratio for the resulting 2 x 2 table, gender versus application result, and also calculate its 95% confidence interval.
(b) Calculate the chi-square test for independence of application result and department applied for, separately for men and women.
(c) Are the results from (a) and (b) consistent and, if not, why not?

9.6. Find the standardized residuals and the adjusted residuals for the incidence of cerebral tumors in Table 9.2. Comment on your results in the light of the nonsignificant chi-square statistic for these data.

9.7. Table 9.22 shows the results of mothers rating their children in two consecutive years, as to whether or not they were doing well at school. Carry out the appropriate test of whether there has been a change in the mothers' ratings over the 2 years. Show that the kappa statistic for these data is identical to both their intraclass correlation coefficient and their product moment correlation coefficient.

TABLE 9.22
Mothers' Assessments of Their Children's School Performance: Doing Well, Years 1 and 2

              Year 2
Year 1       No     Yes    Total
No           49      31     80
Yes          17      52     69
Total        66      83    149
10

Analysis of Categorical Data II: Log-Linear Models and Logistic Regression

10.1. INTRODUCTION

The two-dimensional contingency tables that were the subject of the previous chapter will have been familiar to most readers. Cross-classifications of more than two categorical variables, however, will not usually have been encountered in an introductory statistics course. It is such tables and their analysis that are the subject of this chapter. Two examples of multidimensional contingency tables appear in Tables 10.1 and 10.2, one resulting from cross-classifying three categorical variables and one from cross-classifying five such variables. Both tables will be examined in more detail later in this chapter. To begin our account of how to deal with this type of data, we shall look at three-dimensional tables.

10.2. THREE-DIMENSIONAL CONTINGENCY TABLES

The analysis of three-dimensional contingency tables poses entirely new conceptual problems compared with the analysis of two-dimensional tables. However, the extension from tables of three dimensions to those of four or more, although
TABLE 10.1
Cross-Classification of Method of Suicide by Age and Sex

                                    Method
Sex       Age (Years)      1      2      3      4      5      6
Male      10-40          398    121    455    155     55    124
Male      41-70          399     82    797    168     51     82
Male      >70             93      6    316     33     26     14
Female    10-40          259     15     95     14     40     38
Female    41-70          450     13    450     26     71     60
Female    >70            154      5    185      7     38     10

Note. Methods are 1, solid or liquid matter; 2, gas; 3, hanging, suffocating, or drowning; 4, guns, knives, or explosives; 5, jumping; 6, other.

TABLE 10.2
Danish Do-It-Yourself

                                      Accommodation Type
                                Apartment                 House
Work       Tenure   Answer   <30   31-45   46+        <30   31-45   46+
Skilled    Rent     Yes       18     15      6         34     10      2
                    No        15     13      9         28      4      6
           Own      Yes        5      3      1         56     56     35
                    No         1      1      1         12     21      8
Unskilled  Rent     Yes       17     10     15         29      3      7
                    No        34     17     19         44      2     13
           Own      Yes        2      0      3         23     52     49
                    No         3      2      0          9     31     51
Office     Rent     Yes       30     23     21         22     13     11
                    No        25     19     40         25     16     12
           Own      Yes        1      1      2         54    191     76
                    No         2      8      4         19    102     61

Note. The data arise from asking employed people whether, in the preceding year, they had carried out work on their home that they would previously have employed a craftsman to do. The answers, yes or no, are cross-classified against the age, accommodation type, tenure, and type of work of the respondent.
often increasing the complexity of both analysis and interpretation, presents no further new problems. Consequently, this section is concerned with only three-dimensional tables, and it will form the necessary basis for the discussion of models for multiway tables to be undertaken in the next section.

The first question that might be asked about a three-dimensional contingency table is, why not simply attempt its analysis by examining the two-dimensional tables resulting from summing the observed counts over one of the variables? The example shown in Table 10.3 illustrates why such a procedure is not to be recommended: it can often lead to erroneous conclusions being drawn about the data. For example, analyzing the data aggregated over the race of the victim classification gives a chi-squared statistic of 2.21 with a single degree of freedom and an associated p value of .14, implying that there is racial equality in the application of the death penalty. However, the separate analyses of the data for White victims and for Black victims lead to chi-squared values of 8.77, p = .003, and 5.54, p = .019, respectively. Claims of racial equality in the application of the death penalty now look a little more difficult to sustain.

A further example (see Table 10.4) shows that the reverse can also happen: data aggregated over a variable can show a relationship between the remaining variables when in fact no such relationship really exists. Here the aggregated data give a chi-squared statistic of 5.26 with a single degree of freedom and an associated p value of less than .05, suggesting that infant survival is associated with the amount of prenatal care received. For Clinic A, however, the chi-squared statistic is approximately zero, as it is for Clinic B, from which the conclusion is that infant survival is not related to the amount of care received.

TABLE 10.3
Racial Equality and the Death Penalty

                                              Death Penalty    Not Death Penalty
White Victim
White defendant found guilty of murder             190               1320
Black defendant found guilty of murder             110                520
Black Victim
White defendant found guilty of murder               0                 90
Black defendant found guilty of murder              60                970

2 x 2 Table from Aggregating Data over Race of Victim
White defendant found guilty of murder             190               1410
Black defendant found guilty of murder             170               1490
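The reversal just described is easy to reproduce. The sketch below applies the chi-square test to the aggregated and to the separate death-penalty tables of Table 10.3 (scipy's Yates correction is switched off to match the uncorrected statistics quoted in the text):

```python
from scipy.stats import chi2_contingency

# Rows: white defendant, black defendant; columns: death penalty, not
white_victim = [[190, 1320], [110, 520]]
black_victim = [[0, 90], [60, 970]]
aggregated = [[190, 1410], [170, 1490]]   # summed over race of victim

for name, table in [("aggregated", aggregated),
                    ("white victims", white_victim),
                    ("black victims", black_victim)]:
    stat, p, dof, _ = chi2_contingency(table, correction=False)
    print(f"{name}: X2 = {stat:.2f}, p = {p:.3f}")
```

This reproduces the values quoted in the text: X2 = 2.21 (p = .14) for the aggregated table, but X2 = 8.77 (p = .003) and X2 = 5.54 (p = .019) within the two victim strata.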
TABLE 10.4
Survival of Infants and Amount of Prenatal Care

                                Died                              Survived
Place Where            Less Prenatal   More Prenatal     Less Prenatal   More Prenatal
Care Received              Care            Care              Care            Care
Clinic A                     3               4                176             293
Clinic B                    17               2                197              23

2 x 2 Table from Aggregating Data over Clinics

                       Amount of Prenatal Care
Infant's Survival        Less      More      Total
Died                      20         6         26
Survived                 373       316        689
Total                    393       322        715

The reason for these different conclusions will become apparent later. However, these examples should make it clear why consideration of the two-dimensional tables resulting from collapsing a three-dimensional table is not a sufficient procedure for analyzing the latter.

Only a single hypothesis, namely that of the independence of the two variables involved, is of interest in a two-dimensional table. However, the situation is more complex for a three-dimensional table, and several hypotheses about the three variables may have to be assessed. For example, an investigator may wish to test that some variables are independent of some others, or that a particular variable is independent of the remainder. More specifically, the following hypotheses may be of interest in a three-dimensional table.

1. There is mutual independence of the three variables; that is, none of the variables are related.
2. There is partial independence; that is, an association exists between two of the variables, both of which are independent of the third.
3. There is conditional independence; that is, two of the variables are independent in each level of the third, but each may be associated with the third variable (this is the situation that holds in the case of the clinic data discussed above).
In addition, the variables in a three-way contingency table may display a more complex form of association, namely, what is known as a second-order relationship; this occurs when the degree or direction of the dependence of each pair of variables is different in some or all levels of the remaining variable. (This is analogous to the three-way interaction in a factorial design with three factors for a continuous response variable; see Chapter 4 for an example.)

In a three-dimensional table, each hypothesis is tested in a fashion exactly analogous to that used when independence is tested for in a two-dimensional table, namely, by comparing the estimated expected frequencies corresponding to the particular hypothesis with the observed frequencies, by means of the usual chi-squared statistic, X2, or an alternative known as the likelihood ratio statistic, XL2, given by

    XL2 = 2 Σ observed x ln(observed/expected),      (10.1)

where "observed" refers to the observed frequencies in the table and "expected" refers to the estimated expected values corresponding to a particular hypothesis (see later). In many cases X2 and XL2 will have similar values, but there are a number of advantages to the latter (Williams, 1976) that make it particularly suitable in the analysis of more complex contingency tables.

Under the hypothesis of independence, the estimated expected frequencies in a two-dimensional table are found from simple calculations involving the marginal totals of frequencies, as described in the previous chapter. In some cases the required expected frequencies corresponding to a particular hypothesis about the variables in a three-dimensional table can also be found from straightforward calculations on certain marginal totals (this will be illustrated in Displays 10.1 and 10.2). Unfortunately, estimated expected values in multiway tables cannot always be found so simply. For example, as will be demonstrated later, estimated expected values for the hypothesis of no second-order relationship between the three variables involve the application of a relatively complex iterative procedure. The details are outside the scope of this text, but they can be found in Everitt (1992). In general, investigators analyzing multiway contingency tables will obtain the required expected values and associated test statistics from a suitable piece of statistical software, and so will not need to be too concerned with the details of the arithmetic.

The first hypothesis we shall consider for a three-dimensional contingency table is that the three variables forming the table are mutually independent. Details of how this hypothesis is formulated and tested are given in Display 10.1, and in Table 10.5 the calculation of the estimated expected values, the chi-squared statistic, and the likelihood ratio statistic for the suicide data in Table 10.1 is shown. The number of degrees of freedom corresponding to each test statistic is 27 (see Everitt, 1992, for an explanation of how to determine the number of degrees of freedom).
Display 10.1
Testing the Mutual Independence Hypothesis in a Three-Dimensional Contingency Table

Using an obvious extension of the nomenclature introduced in Chapter 9 for a two-dimensional table, we can formulate the hypothesis of mutual independence as

    H0: pijk = pi.. p.j. p..k,

where pijk represents the probability of an observation being in the ijkth cell of the table, and pi.., p.j., and p..k are the marginal probabilities of belonging to the ith, jth, and kth categories of the three variables.

The estimated expected values under this hypothesis, when there is a sample of n observations cross-classified, are

    Eijk = n p̂i.. p̂.j. p̂..k.

The intuitive (and fortunately also the maximum likelihood) estimators of the marginal probabilities are

    p̂i.. = ni../n,   p̂.j. = n.j./n,   p̂..k = n..k/n,

where ni.., n.j., and n..k are the single-variable marginal totals for each variable, obtained by summing the observed frequencies over the other two variables. Using these probability estimates leads to the following estimated expected values under the hypothesis of mutual independence:

    Eijk = ni.. n.j. n..k / n^2.

Display 10.2
Testing the Partial Independence Hypothesis in a Three-Dimensional Contingency Table

With the use of the same nomenclature as before, the hypothesis of partial independence, namely that variable 1 (say) is independent of variable 2 and that variables 2 and 3 are also unrelated (however, an association between variables 1 and 3 is allowed), can be written in two equivalent forms as

    H0: pij. = pi.. p.j.  and  p.jk = p.j. p..k,

that is, pijk = pi.k p.j. The estimated expected values under this hypothesis are

    Eijk = ni.k n.j. / n,

where ni.k represents the two-variable marginal totals obtained by summing the observed frequencies over variable 2.
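The formulas in Display 10.1 translate directly into code. The sketch below (function name my own) assumes a three-way array of counts with axes ordered as variables 1, 2, and 3, and returns the expected values together with both test statistics and their degrees of freedom:

```python
import numpy as np
from scipy.special import xlogy

def mutual_independence(counts):
    """Expected values E_ijk = n_i.. n_.j. n_..k / n^2, with the X2 and
    likelihood ratio statistics and degrees of freedom for the hypothesis
    of mutual independence in a three-way table."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    ni = counts.sum(axis=(1, 2))   # n_i..
    nj = counts.sum(axis=(0, 2))   # n_.j.
    nk = counts.sum(axis=(0, 1))   # n_..k
    expected = np.einsum('i,j,k->ijk', ni, nj, nk) / n ** 2
    x2 = ((counts - expected) ** 2 / expected).sum()
    xl2 = 2 * xlogy(counts, counts / expected).sum()   # 0 * log(0) handled as 0
    I, J, K = counts.shape
    df = I * J * K - I - J - K + 2
    return expected, x2, xl2, df
```

For the 6 x 3 x 2 suicide table this df formula gives 36 - 6 - 3 - 2 + 2 = 27, as quoted in the text; the partial independence expected values of Display 10.2 are obtained analogously from the `counts.sum(axis=1)` margin (n_i.k) and the n_.j. margin.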
TABLE 10.5
Testing Mutual Independence for the Suicide Data

For the suicide data, the estimated expected value E111 under the hypothesis of mutual independence is obtained as

    E111 = 5305 x 3375/5305 x 1769/5305 x 1753/5305 = 371.9.

Other estimated expected values can be found in a similar fashion, and the full set of values for the suicide data under mutual independence is as follows.

                                    Method
Sex       Age          1      2      3      4      5      6
Male      10-40      371.9   51.3  487.5   85.5   59.6   69.6
Male      41-70      556.9   76.9  730.0  128.0   89.3  104.2
Male      >70        186.5   25.7  244.4   42.9   29.9   34.9
Female    10-40      212.7   29.4  278.8   48.9   34.1   39.8
Female    41-70      318.5   44.0  417.5   73.2   51.0   59.6
Female    >70        106.7   14.7  139.9   24.5   17.1   20.0

Note that the single-variable marginal totals of the estimated expected values under the hypothesis of mutual independence are equal to the corresponding marginal totals of the observed values; for example, for method 1,

    n1.. = 398 + 399 + 93 + 259 + 450 + 154 = 1753,

and the corresponding total of the expected values, 371.9 + 556.9 + 186.5 + 212.7 + 318.5 + 106.7, equals 1753 apart from rounding.

The values of the two possible test statistics are

    X2 = 747.37,   XL2 = 790.30.

The mutual independence hypothesis has 27 degrees of freedom. Clearly, mutual independence is not an adequate hypothesis for these data. In Table 10.6, the less restrictive partial independence hypothesis is tested for the suicide data.
A comparison of the observed values with the values to be expected under this partial independence hypothesis shows that women in all age groups are underrepresented in the use of guns, knives, or explosives (explosives!) to perform the tragic task.

Finally, consider the hypothesis of no second-order relationship between the variables in the suicide data. This hypothesis allows each pair of variables to be associated, but it constrains the degree and direction of the association to be the same in each level of the third variable. In this case, estimated expected values cannot be found directly from any set of marginal totals. They are found from the iterative procedure referred to earlier. (The technical reason for requiring an iterative process is that, in this case, the maximum likelihood equations from which the estimates arise have no explicit solution; the equations have to be solved iteratively.) Details of testing this hypothesis for the suicide data are given in Table 10.7.
TABLE 10.6
Testing the Partial Independence Hypothesis for the Suicide Data

Here, we wish to test whether method of suicide is independent of sex; we wish to allow an association between age and method, and we assume that age and sex are also unrelated. The estimated expected value E111 under this hypothesis is found as

    E111 = (3375 x 657)/5305 = 418.0,

where 657 = 398 + 259 is the marginal total for method 1 and age group 10-40, and 3375 is the total number of males. Other estimated expected values can be found in a similar fashion, and the full set under the partial independence hypothesis is as follows.

                                    Method
Sex       Age          1      2      3      4      5      6
Male      10-40      418.0   86.5  349.9  107.5   60.4  103.1
Male      41-70      540.1   60.4  793.4  123.4   77.6   90.3
Male      >70        157.1    7.0  318.7   25.4   40.7   15.3
Female    10-40      239.0   49.5  200.1   61.5   34.6   58.9
Female    41-70      308.9   34.6  453.6   70.6   44.4   51.7
Female    >70         89.9    4.0  182.3   14.6   23.3    8.7

Note that, in this case, the ni.k and n.j. marginal totals of the estimated expected values are equal to the corresponding totals of the observed values; for example, 418.0 + 239.0 = 657.0 = n1.1.

The values of the two test statistics are X2 = 485.5 and XL2 = 520.4. These statistics have 17 degrees of freedom under the partial independence hypothesis; both are highly significant, so this hypothesis also fails for the suicide data.

10.3. MODELS FOR CONTINGENCY TABLES

Statisticians are very fond of models! In the previous chapters the majority of analyses have been based on the assumption of a suitable model for the data of interest. The analysis of categorical data arranged in the form of a multiway frequency table may also be based on a particular type of model, not dissimilar to those used in the analysis of variance. As will be seen, each particular model corresponds to a specific hypothesis about the variables forming the table, but the advantages to be gained from a model-fitting procedure are that it provides a systematic approach to the analysis of complex multidimensional tables and, in addition, gives estimates of the magnitude of effects of interest.
TABLE 10.7
Testing the Hypothesis of No Second-Order Relationship Between the Variables in the Suicide Data

The hypothesis of interest is that the association between any two of the variables does not differ in either degree or direction in each level of the remaining variable. This means that the odds ratios corresponding to the 2 x 2 tables that arise from the cross-classification of pairs of categories of two of the variables are the same in all levels of the remaining variable. (Details are given in Everitt, 1992.)

Estimates of expected values under this hypothesis cannot be found from simple calculations on marginal totals of observed frequencies, as they can in Displays 10.1 and 10.2. Instead, the required estimates have to be obtained iteratively, by using a procedure described in Everitt (1992). The estimated expected values derived from this iterative procedure are, as would be expected for an acceptable model, close to the observed frequencies throughout the table. The two test statistics take the values

    X2 = 15.40,   XL2 = 14.90.

The degrees of freedom for this hypothesis are 10. Both test statistics are nonsignificant, demonstrating that this particular hypothesis is acceptable for the suicide data. Further comments on this result will be given in the next section.

The models used for contingency tables can be introduced most simply (if a little clumsily) in terms of a two-dimensional table; details are given in Display 10.3. The model introduced there is analogous to the model used in a two-way analysis of variance (see Chapter 4), but it differs in a number of aspects.

1. The data now consist of counts, rather than a score for each subject on some dependent variable.
2. The model does not distinguish between independent and dependent variables: all variables are treated alike, as response variables whose mutual associations are to be explored.
3. Whereas a linear combination of parameters is used in the analysis of variance and regression models of previous chapters, in multiway tables the natural model is multiplicative, and hence logarithms are used to obtain a model in which the parameters are combined additively.
4. In previous chapters the underlying distribution assumed for the data was the normal; with frequency data the appropriate distribution is the binomial or multinomial (see the glossary in Appendix A).
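The meaning of "no second-order relationship" can be made concrete by computing the odds ratios for one pair of (binary) variables separately at each level of the third variable; under the hypothesis, the population odds ratios are all equal. A sketch for a 2 x 2 x K array (the helper name and the illustrative counts are my own):

```python
import numpy as np

def stratum_odds_ratios(table):
    """Sample odds ratios of the first two (binary) variables, one per
    level of the third variable. Under the no-second-order hypothesis
    the corresponding population values are all equal."""
    t = np.asarray(table, dtype=float)
    return t[0, 0, :] * t[1, 1, :] / (t[0, 1, :] * t[1, 0, :])

# A table constructed so that the odds ratio is 2 in both strata
example = np.array([[[20, 40], [10, 40]],
                    [[10, 20], [10, 40]]])
```

Here `stratum_odds_ratios(example)` returns the value 2 for each of the two strata, so a no-second-order model would fit such a table; markedly unequal sample odds ratios across strata would instead point toward a second-order relationship.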
Display 10.3
Log-Linear Model for a Two-Dimensional Contingency Table with r Rows and c Columns

Again the general model considered in previous chapters, that is,

    observed response = expected response + error,

is the starting point. Here the observed response is the observed count, nij, in a cell of the table, and the expected response is the frequency to be expected under a particular hypothesis, Fij. Unlike the corresponding terms in models discussed in previous chapters, the error terms here will not be normally distributed; appropriate distributions are the binomial and multinomial (see the glossary in Appendix A).

Under the independence hypothesis, the population frequencies, Fij, are given by

    Fij = n pi. p.j,

which can be rewritten, using an obvious dot notation, as

    Fij = ni. n.j / n.

When logarithms are taken, the following linear model for the expected frequencies is arrived at:

    ln Fij = ln ni. + ln n.j - ln n.

By some simple algebra (it really is simple, but see Everitt, 1992, for details), the model can be rewritten in the form

    ln Fij = u + u1(i) + u2(j).

The form of the model is now very similar to those used in the analysis of variance (see Chapters 3 and 4). Hence ANOVA terms are used for the parameters: u is said to represent an overall mean effect, u1(i) is the main effect of category i of the row variable, and u2(j) is the main effect of the jth category of the column variable. The main effect parameters are defined as deviations of row or column means of the log frequencies from the overall mean. The values taken by the main effect parameters in this model simply reflect differences between the row or column marginal totals, and so are of little concern in the context of the analysis of contingency tables.
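Because ln Fij = ln ni. + ln n.j - ln n is exactly additive, the u terms of the independence model can be recovered as deviations of row and column means of the log expected frequencies, just as described above. A small numerical check (the table values are invented for illustration):

```python
import numpy as np

n_obs = np.array([[20., 30., 50.],
                  [10., 40., 50.]])
n = n_obs.sum()
# Expected frequencies under independence: F_ij = n_i. n_.j / n
F = np.outer(n_obs.sum(axis=1), n_obs.sum(axis=0)) / n

L = np.log(F)
u = L.mean()                 # overall mean effect
u1 = L.mean(axis=1) - u      # row main effects (deviations of row means)
u2 = L.mean(axis=0) - u      # column main effects (deviations of column means)

# The additive model reconstructs ln F_ij exactly under independence
recon = u + u1[:, None] + u2[None, :]
```

As the check confirms, `recon` equals `L` cell for cell, and the main effect parameters sum to zero within each factor, mirroring the ANOVA-style constraints.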
ANALYSIS OF CATEGORICAL DATA II 303

If the independence model fails to give a satisfactory fit to a two-dimensional table, extra terms must be added to the model to represent the association between the two variables. This leads to a further model,

    ln F_ij = u + u1(i) + u2(j) + u12(ij),    (10.1)

where the parameters u12(ij) model the association between the two variables. This is known as the saturated model for a two-dimensional contingency table, because the number of parameters in the model is equal to the number of independent cells in the table (see Everitt, 1992, for details), and the model provides a perfect fit to the observed data: estimated expected values under this model would simply be the observed frequencies themselves. The interaction parameters u12(ij) are related to odds ratios (see Exercise 10.7).

A log-linear model is fitted by estimating its parameters, and hence the expected frequencies; the parameters could be estimated by replacing the F's in the formulas above with the estimated expected values. The fit is judged by comparing the estimated expected frequencies with the observed values, using either the chi-squared or the likelihood ratio test statistic. For the independence model, this would be exactly equivalent to the usual procedure for testing independence in a two-dimensional contingency table, as described in the previous chapter.

Now consider how the log-linear model in Display 10.3 has to be extended to be suitable for a three-way table. The saturated model will now have to contain main effect parameters for each variable, parameters to represent the possible associations between each pair of variables, and, finally, parameters to represent the possible second-order relationship between the three variables. The model is

    ln F_ijk = u + u1(i) + u2(j) + u3(k) + u12(ij) + u13(ik) + u23(jk) + u123(ijk).    (10.2)

The parameters in this model are as follows.

1. u is the overall mean effect.
2. u1(i) is the main effect of variable 1.
3. u2(j) is the main effect of variable 2.
4. u3(k) is the main effect of variable 3.
5. u12(ij) is the interaction between variables 1 and 2.
6. u13(ik) is the interaction between variables 1 and 3.
7. u23(jk) is the interaction between variables 2 and 3.
8. u123(ijk) is the second-order relationship between the three variables.

The purpose of modeling a three-way table would be to find the unsaturated model with fewest parameters that adequately predicts the observed frequencies; the saturated model, of course, provides no simplification in the description of the data. As a way to assess whether some simpler model would fit a given table, particular parameters in the saturated model are set to zero and the reduced model is assessed for fit.
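For the two-dimensional case, the equivalence between fitting the independence log-linear model and the usual test of independence can be sketched in a few lines. This is plain Python; the 2 x 2 counts are invented for illustration, and both the chi-squared and the likelihood ratio statistics are computed from the observed and estimated expected frequencies:

```python
import math

observed = [[10, 20],
            [30, 40]]   # a hypothetical 2 x 2 table of counts

n = sum(sum(row) for row in observed)
row_tot = [sum(row) for row in observed]
col_tot = [sum(col) for col in zip(*observed)]

# Estimated expected frequencies under independence.
expected = [[r * c / n for c in col_tot] for r in row_tot]

cells = [(observed[i][j], expected[i][j]) for i in range(2) for j in range(2)]
chi_squared = sum((o - e) ** 2 / e for o, e in cells)
likelihood_ratio = 2 * sum(o * math.log(o / e) for o, e in cells)
```

Either statistic would be referred to a chi-squared distribution with (r - 1)(c - 1) degrees of freedom, here 1.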
304 CHAPTER 10

In practice, attention must be restricted to what are known as hierarchical models. These are such that, whenever a higher-order effect is included in a model, the lower-order effects composed from the variables in the higher effect are also included. For example, if terms u12 are included, so also must the terms u1 and u2. Therefore, models such as

    ln F_ijk = u + u2(j) + u3(k) + u123(ijk)    (10.3)

are not permissible. (This restriction to hierarchical models arises from the constraints imposed by the maximum likelihood estimation procedures used in fitting log-linear models, details of which are too technical to be included in this text. In general, the restriction is of little consequence, because most tables can be described by a series of hierarchical models.)

It is important to note that, when particular hypotheses about a multiway table are tested (or particular models fitted), certain marginal totals of the estimated expected values are constrained to be equal to the corresponding marginals of the observed values; this arises because of the form of the maximum likelihood equations. The fixed marginals, or bracket, notation is frequently used to specify the series of models fitted when a multidimensional contingency table is examined: the terms used to specify a model in the bracket notation are the marginals fixed by the model.

Each model that can be derived from the saturated model for a three-dimensional table is equivalent to a particular hypothesis about the variables forming the table; the equivalence is illustrated in Display 10.4. Particular points to note about the material in this display are as follows.

1. Model 1 is known as the minimal model for such a table.
2. The first three models are of no consequence in the analysis of a three-dimensional table.

To illustrate the use of log-linear models in practice, a series of such models will be fitted to the suicide data; details are given in Table 10.8. Differences in the likelihood ratio statistic for different models are used to assess whether models of increasing complexity (larger numbers of parameters) are needed. The aim of the procedure is to arrive at a model that gives an adequate fit to the data. The results given in Table 10.8 demonstrate that only model 7 provides an adequate fit for the suicide data. This model states that the association between age and method of suicide is the same for males and females, and that the association between sex and method is the same for all age groups. The parameter estimates (and the ratios of the estimates to their standard errors) for the fitted model are given in Table 10.9. The main effects parameters are not of great interest; their estimated values simply reflect differences between the marginal totals of the categories of
each variable. For example, the largest main effect for method is that for hanging, simply because more people use hanging, suffocating, or drowning as a method of suicide than the other five possibilities. The estimated interaction parameters are of more interest, particularly those for method and sex, which reflect that males use "solid" and "jump" less, and women use them more, than if sex was independent of method; the reverse is true for "gas" and "gun."

As always when models are fitted to observed data, it is essential to examine the fit in more detail than is provided by a single goodness-of-fit statistic such as the likelihood ratio criterion. With log-linear models, differences between the observed (O) and estimated expected (E) frequencies form the basis for this more detailed examination, generally using the standardized residual, calculated as

    standardized residual = (O - E) / sqrt(E).    (10.4)

The residuals for the final model selected for the suicide data are given in Table 10.10. All of the residuals are small, suggesting that the chosen model does give an adequate representation of the observed frequencies. (A far fuller account of log-linear models is given in Agresti, 1996.)

Display 10.4
Hierarchical Models for a General Three-Dimensional Contingency Table

A series of possible log-linear models for a three-way contingency table is as follows.

    Log-Linear Model                                                          Bracket Notation
    1. ln F_ijk = u
    2. ln F_ijk = u + u1(i)                                                   [1]
    3. ln F_ijk = u + u1(i) + u2(j)                                           [1],[2]
    4. ln F_ijk = u + u1(i) + u2(j) + u3(k)                                   [1],[2],[3]
    5. ln F_ijk = u + u1(i) + u2(j) + u3(k) + u12(ij)                         [12],[3]
    6. ln F_ijk = u + u1(i) + u2(j) + u3(k) + u12(ij) + u13(ik)               [12],[13]
    7. ln F_ijk = u + u1(i) + u2(j) + u3(k) + u12(ij) + u13(ik) + u23(jk)     [12],[13],[23]
    8. ln F_ijk = model 7 + u123(ijk)                                         [123]

The hypotheses corresponding to the first seven models are as follows.

1. All frequencies are the same.
2. Marginal totals for variable 2 and variable 3 are equal.
3. Marginal totals for variable 3 are equal.
4. The variables are mutually independent.
5. Variables 1 and 2 are associated, and both are independent of variable 3.
6. Variables 2 and 3 are conditionally independent given variable 1.
7. There is no second-order relationship between the three variables.

Model 8 is the saturated model for a three-dimensional table. (Because the first three models do not allow the observed frequencies to reflect observed differences in the marginal totals of each variable, they are of no real interest in the analysis of three-dimensional contingency tables.)
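The standardized residuals of Eq. (10.4) are simple to compute once a model's estimated expected frequencies are available. A minimal sketch (plain Python; the observed and expected values are invented, not those of Table 10.10):

```python
import math

observed = [10, 20, 30]          # hypothetical cell counts
expected = [12.0, 18.0, 30.0]    # hypothetical estimated expected frequencies

std_resid = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]
# Residuals much larger than about 2 in absolute size would flag cells
# that the model fails to describe adequately.
```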
TABLE 10.8
Log-Linear Models for Suicide Data

The goodness of fit of a series of log-linear models is given below (variable 1 is method, variable 2 is age, and variable 3 is sex).

    Model            DF    X^2     P
    [1],[2],[3]      27    790.3   <.0001
    [12],[3]         17    520.4   <.0001
    [12],[13],[23]   10    14.9    .14

The remaining models in the series, each containing only some of the two-variable association terms, likewise fail to fit (all have P < .0001).

The difference in the likelihood ratio statistic for two models can be used to choose between them. The hypothesis that the extra parameters in the more complex model are zero, for example u12(ij) = 0 for all i and j (u12 = 0 for short), is tested by the difference in the two likelihood ratio statistics, with degrees of freedom equal to the difference in the degrees of freedom of the two models. For example, for the mutual independence model and the model that allows an association between variables 1 and 2, we have X^2 = 790.3 with 27 degrees of freedom and X^2 = 520.4 with 17 degrees of freedom, respectively. Here this leads to a difference of 269.9 with 10 degrees of freedom. The result is highly significant, and the second model provides a significantly improved fit compared with the mutual independence model.

A useful way of judging a series of log-linear models is by means of an analog of the square of the multiple correlation coefficient used in multiple regression (see Chapter 6). The measure, L, is defined as

    L = [X^2(baseline model) - X^2(model of interest)] / X^2(baseline model).

L lies in the range (0,1) and indicates the percentage improvement in goodness of fit of the model being tested over the baseline model. The choice of baseline model is not fixed; it will often be the mutual independence model, but it could be the simpler of two competing models. For comparing the mutual independence and no-second-order-relationship models on the suicide data, L = (790.3 - 14.9)/790.3 = 98.1%.

10.3. LOGISTIC REGRESSION FOR A BINARY RESPONSE VARIABLE

In many multidimensional contingency tables (I am tempted to say almost all), there is one variable that can properly be considered a response, and it is the relationship of this variable to the remainder that is of most interest. Situations in which the response variable has two categories are the most common, and it is these that are considered in this section.
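The deviance-difference test and the L measure just described can be sketched in a few lines (plain Python, using the suicide-data statistics quoted above):

```python
def l_measure(x2_baseline, x2_model):
    """Proportional improvement in fit of a model over a baseline model."""
    return (x2_baseline - x2_model) / x2_baseline

# Statistics quoted in the text for the suicide data.
x2_independence, df_independence = 790.3, 27   # mutual independence model
x2_age_method, df_age_method = 520.4, 17       # model adding the age-method association
x2_no_second_order = 14.9                      # no-second-order-relationship model

# Difference test for adding the age-method association term.
diff = x2_independence - x2_age_method         # 269.9, tested as a chi-square
df_diff = df_independence - df_age_method      # on 10 degrees of freedom

# L for the no-second-order model over the mutual independence baseline.
L = l_measure(x2_independence, x2_no_second_order)   # about 0.981
```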
TABLE 10.9
Parameter Estimates in the Final Model Selected for the Suicide Data

The final model selected for the suicide data is that of no second-order relationship between the three variables, that is, model 7 in Display 10.4. The table reports, with the ratio of each estimate to its standard error in parentheses:

    the estimated main effect parameters for method (solid, gas, hanging, gun, jump, other);
    the estimated interaction parameters for method and age (age groups 10-40, 41-70, and over 70);
    the estimated interaction parameters for method and sex (male, female);
    the estimated interaction parameters for age and sex.

(Note that the parameter estimates for each variable sum to zero.)
TABLE 10.10
Standardized Residuals from the Final Model Fitted to the Suicide Data

The table gives the standardized residuals for each method (solid, gas, hanging, gun, jump, other) in the six sex-by-age cells (males and females aged 10-40, 41-70, and over 70). None of the residuals exceeds about 1.5 in absolute value.

Modeling the relationship between a response variable and a number of explanatory variables has already been considered in some detail in Chapter 6 and in a number of other chapters, so why not simply refer to the methods described previously? The reason lies with the nature of the response variable. In general, the data will consist either of observations from individual subjects, each having the value of a zero-one response variable and associated explanatory variable values, or of these observations grouped in contingency table fashion, with counts of the number of zero and one values of the response in each cell. The explanatory variables might be a mixture of categorical and continuous; it might be thought that our interest here is confined to only categorical explanatory variables, but this is not the case, as we shall see in the examples to come.

Explanations will become more transparent if considered in the context of an example, and here we will use the data shown in Table 10.11. How might such data be modeled if interest centers on how "caseness" is related to gender and GHQ score? One possibility that springs to mind is to model the probability, p, of being a case. In terms of the general model encountered in earlier chapters, that is,

    observed response = expected response + error,    (10.5)

this probability would be the expected response, and the corresponding observed response would be the proportion of individuals in the sample categorized as cases.
These data arise from a study of a psychiatric screening questionnaire called the General Health Questionnaire (GHQ) (see Goldberg, 1972).
ANALYSIS OF CATEGORICAL DATA II 309

TABLE 10.11
GHQ Data

    GHQ Score   Sex   No. of Cases   No. of Noncases
    0           F     4              80
    1           F     4              29
    2           F     8              15
    3           F     6              3
    4           F     4              2
    5           F     6              1
    6           F     3              1
    7           F     2              0
    8           F     3              0
    9           F     2              0
    10          F     1              0
    0           M     1              36
    1           M     2              25
    2           M     2              8
    3           M     1              4
    4           M     3              1
    5           M     3              1
    6           M     2              1
    7           M     4              2
    8           M     3              1
    9           M     2              0
    10          M     2              0

Note. F, female; M, male.

For a binary response, the model of Eq. (10.5) becomes

    observed response = p + error.    (10.6)

Therefore, a possible model is

    p = b0 + b1 sex + b2 GHQ.    (10.7)

To simplify things for the moment, let's ignore gender and fit the model

    p = b0 + b1 GHQ    (10.8)

by using a least-squares approach, as described in Chapter 6. Estimates of the two parameters, estimated standard errors, and predicted values of the response
are shown in Table 10.12.

TABLE 10.12
Linear Regression Model for GHQ Data with GHQ Score as the Single Explanatory Variable

    Parameter        Estimate   SE      t
    b0 (intercept)   0.136      0.065   2.09
    b1 (GHQ)         0.096      0.011   8.73

Predicted Values of p from This Model

    Observation No.   Observed Prob.   Predicted Prob.
    1                 0.041            0.136
    2                 0.100            0.232
    3                 0.303            0.328
    4                 0.500            0.425
    5                 0.700            0.521
    6                 0.750            0.617
    7                 0.714            0.713
    8                 0.750            0.809
    9                 0.857            0.905
    10                1.000            1.001
    11                1.000            1.098

Immediately a problem becomes apparent: two of the predicted values are greater than one. Using a linear regression approach here can lead to fitted probability values outside the range (0,1). An additional problem is that the error term in the linear regression model is assumed to have a normal distribution, which is clearly not suitable for a binary response. It is not sensible to contemplate using a model that is known a priori to have such serious disadvantages, so we need to consider an alternative to linear regression for binary responses. That most frequently adopted is the linear logistic model, or logistic model for short. Now a transformed value of p is modeled rather than p directly, and the transformation chosen ensures that fitted values of p lie in the interval (0,1). Details of the logistic regression model are given in Display 10.5.

Fitting the logistic regression model to the GHQ data, disregarding gender, gives the results shown in Table 10.13. Note that now the predicted values are all satisfactory and lie between 0 and 1. A graphical comparison of the fitted linear and logistic regressions is shown in Figure 10.1; we see that, in addition to the problems noted earlier, the linear regression model provides a very poor description of the data.
ANALYSIS OF CATEGORICAL DATA II 311

Display 10.5
The Logistic Regression Model

The logistic transformation, lambda, of a probability, p, is defined as follows:

    lambda = ln[p / (1 - p)].

lambda is the logarithm of the odds for the response variable, that is, the log odds of the "one" category (a "success," say); ln[p/(1 - p)] is often written as logit(p) for short. As p varies from 0 to 1, lambda varies between minus infinity and plus infinity.

The logistic regression model is a linear model for lambda, that is,

    lambda = ln[p / (1 - p)] = b0 + b1 x1 + b2 x2 + ... + bq xq,

where x1, x2, ..., xq are the q explanatory variables of interest. Modeling the logistic transformation of p, rather than p itself, avoids possible problems of finding fitted values outside their permitted range. It is sometimes convenient to consider the model as it represents p itself, and it is given by

    p = exp(b0 + b1 x1 + ... + bq xq) / [1 + exp(b0 + b1 x1 + ... + bq xq)].    (10.9)

The parameters in the model can be interpreted as the change in the ln(odds) of the response variable produced by a change of one unit in the corresponding explanatory variable, conditional on the other variables' remaining constant. The parameters are estimated by maximum likelihood (see Collett, 1991, for details).

There are a number of summary statistics that measure the discrepancy between the observed proportions of "success" and the fitted proportions from the logistic model. The most common is known as the deviance, D:

    D = 2 sum over i of { y_i ln(y_i / yhat_i) + (n_i - y_i) ln[(n_i - y_i) / (n_i - yhat_i)] },

where y_i is the number of successes in the ith category of the observations, n_i is the total number of responses in the ith category, and yhat_i = n_i phat_i, where phat_i is the predicted success probability for this category. (We are assuming here that the raw data have been collected into categories, as in the GHQ data and the Danish do-it-yourself data in the text. When this is not so, and the data consist of the original zeros and ones for the response, the deviance cannot be used as a measure of fit; see Collett, 1991, for an explanation of why not.) The deviance is distributed as chi-squared, and differences in deviance values can be used to assess competing models.

The estimated regression coefficient for GHQ score in the logistic regression is 0.74. Thus the log(odds) of being a case increases by 0.74 for a unit increase in GHQ score. An approximate 95% confidence interval for the regression coefficient is 0.74 +/- 1.96 x 0.09 = (0.56, 0.92).
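Display 10.5 states that the parameters are estimated by maximum likelihood. The following sketch shows how such a fit can be carried out for grouped binomial data with a single explanatory variable, using Newton-Raphson iteration (plain Python; the data are synthetic, constructed so that the fitted coefficients should recover the generating values, and are not the GHQ data):

```python
import math

def fit_logistic(xs, ns, ys, iterations=25):
    """Maximum likelihood fit of logit(p) = b0 + b1*x for grouped binomial data.

    xs: covariate value per group; ns: group sizes; ys: successes per group.
    Uses Newton-Raphson on the binomial log-likelihood.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in xs]
        # Score vector (first derivatives of the log-likelihood).
        g0 = sum(y - n * pi for y, n, pi in zip(ys, ns, p))
        g1 = sum(x * (y - n * pi) for x, y, n, pi in zip(xs, ys, ns, p))
        # Information matrix (negative second derivatives).
        w = [n * pi * (1.0 - pi) for n, pi in zip(ns, p)]
        i00 = sum(w)
        i01 = sum(wi * x for wi, x in zip(w, xs))
        i11 = sum(wi * x * x for wi, x in zip(w, xs))
        det = i00 * i11 - i01 * i01
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    return b0, b1

# Synthetic grouped data whose observed proportions follow
# logit(p) = -1 + 0.5*x exactly, so the fit should recover those values.
xs = [0, 1, 2, 3, 4, 5]
ns = [100] * 6
ys = [n / (1.0 + math.exp(-(-1.0 + 0.5 * x))) for n, x in zip(ns, xs)]
b0_hat, b1_hat = fit_logistic(xs, ns, ys)
```

With real grouped data such as the GHQ proportions and scores in place of the synthetic values, the same iteration produces the package estimates discussed in the text; in practice such fits are, of course, obtained from a statistical package.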
312 CHAPTER 10

TABLE 10.13
Logistic Regression Model for GHQ Data with GHQ Score as the Single Explanatory Variable

    Parameter        Estimate   SE      t
    b0 (intercept)   -2.710     0.272   -9.96
    b1 (GHQ)         0.736      0.095   7.75

Predicted Values of p from This Model

    Observation No.   Observed Prob.   Predicted Prob.
    1                 0.041            0.062
    2                 0.100            0.122
    3                 0.303            0.224
    4                 0.500            0.377
    5                 0.700            0.558
    6                 0.750            0.725
    7                 0.714            0.846
    8                 0.750            0.920
    9                 0.857            0.960
    10                1.000            0.980
    11                1.000            0.991

However, such results are not immediately helpful when they are given in terms of log(odds). Things become better if we translate everything back to odds by exponentiating the various terms. So exp(0.74) = 2.10 represents the increase in the odds of being a case when the GHQ score increases by one. The corresponding confidence interval is [exp(0.56), exp(0.92)] = [1.75, 2.51].

Now let us consider a further simple model for the GHQ data, namely a logistic regression for caseness with only gender as an explanatory variable. The results of fitting such a model are shown in Table 10.14. The estimated regression coefficient for sex, the dummy variable coding gender, is -0.037 and represents the change in log(odds) (here a decrease) as the explanatory variable increases by one; here such a change implies that the observation arises from a man rather than a woman. Transferring back to odds, exp(-0.037) = 0.96, with a 95% confidence interval of (0.55, 1.70). Because this interval contains the value one, which would indicate the independence of gender and caseness (see Chapter 9), it appears that gender does not predict caseness for these data.

If we now look at the 2 x 2 table of caseness and gender for the GHQ data,

    Sex        Female   Male
    Case       43       25
    Not case   131      79
ANALYSIS OF CATEGORICAL DATA II 313

TABLE 10.14
Logistic Regression Model for GHQ Data with Gender as the Single Explanatory Variable

    Parameter        Estimate   SE      t
    b0 (intercept)   -1.114     0.176   -6.34
    b1 (sex)         -0.037     0.289   -0.13

FIG. 10.1. Fitted linear and logistic regression models for the probability of being a case as a function of the GHQ score, for the data in Table 10.11.

we see that the odds ratio is (131 x 25)/(43 x 79) = 0.96, the same result as given by the logistic regression. For a single binary explanatory variable, the estimated regression coefficient is simply the log of the odds ratio from the 2 x 2 table relating the response variable to the explanatory variable. (Readers are encouraged to confirm that the confidence interval found from the logistic regression is the same as would be given by using the formula for the variance of the log of the odds ratio given in the previous chapter.) As a further exercise, readers might fit a logistic regression model to the GHQ data that includes both GHQ score and gender as explanatory variables.
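The equivalence just noted can be verified directly from the 2 x 2 table (plain Python, using the caseness-by-gender counts above). The confidence interval uses the variance formula for the log odds ratio from the previous chapter, the sum of the reciprocals of the four cell counts:

```python
import math

# Caseness by gender for the GHQ data.
case_f, case_m = 43, 25
notcase_f, notcase_m = 131, 79

odds_ratio = (notcase_f * case_m) / (case_f * notcase_m)   # (131 x 25)/(43 x 79)
log_or = math.log(odds_ratio)                              # about -0.037

# Approximate standard error of the log odds ratio.
se = math.sqrt(1 / case_f + 1 / case_m + 1 / notcase_f + 1 / notcase_m)

lo = math.exp(log_or - 1.96 * se)   # about 0.55
hi = math.exp(log_or + 1.96 * se)   # about 1.70
```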
706 0. equal intervals along the scale.) As a further more complex illustrationof the use of logistic regression. and (4) work of respondent: skilled. and 3for the work variable would really not do.147 0.835 0. defined follows. There are four categorical explanatory variables: (1) age: under 30. A crossclassification of work + + TABLE 10. when the it use of multiple regression with categorical explanatory variables was discussed. as here. Simply than a s coding the categories.300 5. who were asked whether. was mentioned that although such variables could be used.2. The proper to handle way because it would imply a nominal variable with more that k > 2 categories would be to recode it as a number of dummy variables. let us consider only the single explanatory variable. and over 45. In Chapter 6. workrespont h dent.2. Work Work1 Skilled 0 Unskilled 1 Office 0 Work2 0 0 1 Theresults of thelogisticregressionmodel of theform logit(p) = Blworkl BZwork2 are given in Table 10.134  6. (3) tenure: rent or own.say.237 0. or office.If they do. 3145. unskilled. because of the conditional nature of these regression coefficients w in other variables are included the fitted model. (2) accommodationtype: apartment or house.695 1.314 CHAPTER 1 0 as explanatory variables.15. Thus we will recode work terms of two dummy in as variables. they will notice that the estimated regression coefficient for sex is no longer simply the log of the odds ratio from the 2 x 2 table above.112 0.15 Logistic Regression for Danish DoItYourself Data with Work as the Single Explanatory Variable Parnmerer Estimate SE t (intercept) B (work11 1 F2 (WOW 0. they had canied out work in their home that they would have previously employed a craftsman to do. 1. whichis a categorical variable withree categories. The response variable here is the answer (yedno) to that question. in the preceding year. The data come from asample of employed men aged between 18 and 67 years. 
some care was neede in howto deal with them when they have more two categories. work1 and work2. the method is applied to the data shownin Table 10. of To begin.768 .
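The dummy coding above is mechanical; a short sketch (plain Python, with a hypothetical recoding function) makes the mapping explicit:

```python
def dummy_code(work):
    """Recode the three-category work variable as two dummy variables,
    with 'skilled' as the reference category."""
    work1 = 1 if work == "unskilled" else 0
    work2 = 1 if work == "office" else 0
    return work1, work2

codes = {w: dummy_code(w) for w in ("skilled", "unskilled", "office")}
```

Each non-reference category gets its own indicator, so a k-category variable needs k - 1 dummies; the fitted coefficients then compare each category with the reference.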
ANALYSIS OF CATEGORICAL DATA II 315

TABLE 10.16
Cross-Classification of Work Against Response for Danish Do-It-Yourself Data

    Response   Skilled   Unskilled   Office   Total
    No         119       239         301      659
    Yes        241       210         481      932
    Total      360       449         782      1591

From this table we can extract the following pair of 2 x 2 tables:

    Response   Skilled   Unskilled
    No         119       239
    Yes        241       210

    odds ratio = (119 x 210)/(241 x 239) = 0.434,  log(odds ratio) = -0.835;

    Response   Skilled   Office
    No         119       301
    Yes        241       481

    odds ratio = (119 x 481)/(241 x 301) = 0.789,  log(odds ratio) = -0.237.

We see that the coding used produces estimates for the regression coefficients of work1 and work2 that are equal to the log(odds ratios) comparing unskilled with skilled and office with skilled workers. (Readers are encouraged to repeat this exercise using the age variable, which is also represented as two dummy variables, age1 and age2.)

The results from fitting the logistic regression model with all four explanatory variables to the Danish do-it-yourself data are shown in Table 10.17. Work has been recoded as work1 and work2 as shown above, and age has been similarly coded in terms of the two dummy variables age1 and age2. So the model fitted is

    logit(p) = b0 + b1 work1 + b2 work2 + b3 age1 + b4 age2 + b5 tenure + b6 type.

The results in Table 10.17 appear to imply that work, age, and tenure are the three most important explanatory variables for predicting the probability of answering yes to the question posed in the survey. Note that the coefficients for work1 and work2 are similar, but not identical, to those in Table 10.15. They can still be interpreted as log(odds ratios), but they are now conditional, taking the effects of the other three explanatory variables into account. The same caveats apply in logistic regression as were issued in the case of multiple linear regression
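The correspondence between the single-variable logistic regression coefficients and the log odds ratios from Table 10.16 can be checked directly (plain Python, using the counts quoted above):

```python
import math

# (no, yes) counts for each work category from Table 10.16.
skilled = (119, 241)
unskilled = (239, 210)
office = (301, 481)

def log_odds_ratio(group, reference):
    """Log odds ratio of a yes response, group versus reference category."""
    return math.log((group[1] / group[0]) / (reference[1] / reference[0]))

b1 = log_odds_ratio(unskilled, skilled)       # matches the work1 coefficient, -0.835
b2 = log_odds_ratio(office, skilled)          # matches the work2 coefficient, -0.237
intercept = math.log(skilled[1] / skilled[0]) # log odds for skilled workers, 0.706
```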
in Chapter 6: these regression coefficients and the associated standard errors are estimated conditional on the other variables' being in the model. Consequently, the t values give only a rough guide to which variables should be included in a final model.

TABLE 10.17
Results from Fitting the Logistic Model to the Danish Do-It-Yourself Data by Using All Four Explanatory Variables

The table gives the estimate, standard error, and t value for each of the seven parameters of the model: the intercept and the coefficients of work1, work2, age1, age2, tenure, and type.

Important subsets of explanatory variables in logistic regression are often selected by using an approach similar to the forward, backward, and stepwise procedures described in Chapter 6, although the criterion for deciding whether or not a candidate variable should be added to, or excluded from, an existing model is different, now usually involving the deviance index of goodness of fit described in Display 10.5. Many statistical packages have automatic variable selection procedures to be used with logistic regression, but here we shall try to find a suitable model for the Danish do-it-yourself data in a relatively informal manner, by examining deviance differences as variables are added to a current model. We will start with a model including only tenure as an explanatory variable and then add explanatory variables in an order suggested by the t statistics from Table 10.17; in fitting these models, work is entered as the two dummy variables work1 and work2, and similarly age is entered as age1 and age2. Differences in the deviance values for the various models can be used to assess the effect of adding a new variable to the existing model; these differences can be tested as chi-squares, with degrees of freedom equal to the difference in the degrees of freedom of the two models being compared. Details are given in Table 10.18.

TABLE 10.18
Comparing Logistic Models for the Danish Do-It-Yourself Data

The analysis compares the models Tenure; Tenure + Work; Tenure + Work + Age; and Tenure + Work + Age + Type, with 34, 32, 30, and 29 degrees of freedom, respectively. Adding Work and then Age each produces a highly significant reduction in deviance, whereas adding Type does not.

It is clear from the results in Table 10.18 that Tenure, Work, and Age are all required in a final model. Explicitly, the final model is

    logit(p) = b0 - 0.76 work1 - 0.31 work2 - 0.11 age1 - 0.43 age2 + 1.01 tenure,    (10.10)

where b0 is the estimated intercept given in Table 10.19. Before trying to interpret this model, it would be wise to look at some diagnostics that will indicate any problems it has. Two useful diagnostics for logistic regression are described in Display 10.6. The parameter estimates for this model are shown in Table 10.19, along with the observed and predicted probabilities of a yes answer to the question posed in the survey. Helpful plots of the diagnostics are shown in Figure 10.2: Figure 10.2(a) shows the deviance residuals plotted against the fitted values, and Figure 10.2(b) shows a normal probability plot of the Pearson residuals. These plots give no obvious cause for concern, so we can now try to interpret our fitted model. It appears that, conditional on work and age, the probability of a positive response is far greater for respondents who own their home than for those who rent. Conditional on work and tenure, the two younger age groups do not differ in their probability of giving a positive response, and this probability is greater than that in the oldest age group.
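The two residual diagnostics described in Display 10.6 can be sketched as follows (plain Python; y successes out of n with fitted probability p for a single cell, and the example values are invented, not taken from the Danish data):

```python
import math

def pearson_residual(y, n, p):
    """Pearson residual for y successes out of n with fitted probability p."""
    return (y - n * p) / math.sqrt(n * p * (1 - p))

def deviance_residual(y, n, p):
    """Deviance residual; positive when y >= n*p, negative otherwise."""
    y_hat = n * p
    term = 0.0
    if y > 0:
        term += y * math.log(y / y_hat)
    if y < n:
        term += (n - y) * math.log((n - y) / (n - y_hat))
    sign = 1.0 if y >= y_hat else -1.0
    return sign * math.sqrt(2.0 * term)

# Example: 6 yes answers out of 10 with fitted probability 0.5.
x_i = pearson_residual(6, 10, 0.5)    # about 0.632
d_i = deviance_residual(6, 10, 0.5)   # about 0.635
```

For reasonably large cells the two residuals are close in value, and either can be plotted against fitted values or on a normal probability plot, as in Figure 10.2.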
Finally, conditional on age and tenure, unskilled and office workers tend to have a lower probability of responding yes than skilled workers.

We can arrange the deviance calculations in what is sometimes known as an analysis of deviance table, as in Table 10.18: each row gives a model, its deviance and degrees of freedom, and the change in deviance (and in degrees of freedom) from the preceding model.

10.4. THE GENERALIZED LINEAR MODEL

In Chapter 6 we showed that the models used in the analysis of variance and those used in multiple linear regression are equivalent versions of a linear model in which an observed response is expressed as a linear function of explanatory variables plus some random disturbance term, often referred to as the "error," even though in many cases it may not have anything to do with measurement error. Therefore, the general form of such models, as outlined in several previous chapters, is

    observed response = expected response + error,    (10.11)

    expected response = linear function of explanatory variables.    (10.12)

TABLE 10.19
Parameter Estimates and Standard Errors for the Final Model Selected for the Danish Do-It-Yourself Data

The table gives the estimate, standard error, and t value for each parameter of the final model (the intercept and the coefficients of tenure, work1, work2, age1, and age2), followed, for each of the 36 cells of the cross-classification, by the number of respondents, n, and the observed and predicted probabilities of a yes response. (Continued)
TABLE 10.19 (Continued)

The remaining cells, 31-36, follow the same layout. Note: Tenure is coded 0 for rent and 1 for own.

Display 10.6
Diagnostics for Logistic Regression

The first diagnostics for logistic regression are the Pearson residuals, defined as

    X_i = (y_i - n_i phat_i) / sqrt[n_i phat_i (1 - phat_i)],

where y_i is the observed number of successes in the ith category, n_i the corresponding number of observations, and phat_i the predicted success probability. For n_i of 5 or more, the distribution of the Pearson residuals can reasonably be approximated by a standard normal distribution, so a normal probability plot of the X_i should be linear. Pearson residuals with absolute values greater than two or three might be regarded with suspicion.

The second useful diagnostic for checking a logistic regression model is the deviance residual, defined as

    d_i = sgn(y_i - yhat_i) sqrt(2 { y_i ln(y_i / yhat_i) + (n_i - y_i) ln[(n_i - y_i)/(n_i - yhat_i)] }),

where sgn(y_i - yhat_i) is the function that makes d_i positive when y_i >= yhat_i and negative when y_i < yhat_i. Like the Pearson residual, the deviance residual is approximately normally distributed. Plotting the deviance residuals against the fitted values can be useful, indicating problems with models.

Specification of a linear model is completed by assuming some specific distribution for the error terms. In the case of ANOVA and multiple regression models, the assumed distribution is normal with mean zero and a constant variance. Now consider the log-linear and the logistic regression models introduced in this chapter. How might these models be put into a form similar to that of the models used in the analysis of variance and multiple regression? The answer is a relatively simple
FIG. 10.2. Diagnostic plots for the final model fitted to the Danish do-it-yourself data: (a) deviance residuals plotted against fitted values; (b) normal probability plot of the Pearson residuals.
adjustment of the equations given above, namely, allowing some transformation of the expected response to be modeled as a linear function of explanatory variables. That is, we introduce a model of the form

    observed response = expected response + error,    (10.13)

    f(expected response) = linear function of explanatory variables,    (10.14)

where f represents some suitable transformation. In the context of this generalized linear model (GLM), f is known as a link function. For example, for logistic regression the link function would be the logistic and the error term binomial. By also allowing the error terms to have distributions other than the normal, both log-linear models and logistic regression models can be included in the same framework as ANOVA and multiple regression models. Such models are fitted to data by using a general maximum likelihood approach, details of which are well outside the technical requirements of this text (see McCullagh and Nelder, 1989, for details).

The GLM, allowing a variety of link functions and numerous error distributions, was first introduced into statistics by Nelder and Wedderburn (1972). Nowadays, all statistical software packages can fit GLMs routinely. Apart from the unifying perspective of the GLM, its main advantage is that it provides the opportunity to carry out analyses that make more realistic assumptions about data than the normality assumption made explicitly, and more worryingly often implicitly, in the past. Psychologists in particular, and researchers in general, need to be aware of the possibilities such models offer for a richer and more satisfactory analysis of their data.

10.5. SUMMARY

1. The analysis of cross-classifications of three or more categorical variables can now be undertaken routinely by using log-linear models.
2. The log-linear models fitted to multidimensional tables correspond to particular hypotheses about the variables forming the tables.
3. Expected values and parameters in log-linear models are estimated by maximum likelihood methods. In some cases the former consist of simple functions of particular marginal totals of the observed frequencies, but in many examples the estimated frequencies have to be obtained by an iterative process.
4. The fit of a log-linear model is assessed by comparing the observed and the estimated expected values under the model, by means of the chi-squared statistic or, more commonly, the likelihood ratio statistic.
5. In a data set in which one of the categorical variables can be considered to be the response variable, logistic regression can be applied to investigate the
6. The regression parameters in such models can be interpreted in terms of odds ratios.
7. Categorical response variables with more than two categories, and ordinal types of response variables, can also be handled by logistic regression models; see Agresti (1996) for details.

COMPUTER HINTS

SPSS
The types of analyses described in this chapter can be accessed from the Statistics menu. For example, to undertake a logistic regression using forward selection to choose a subset of explanatory variables:

1. Click on Statistics, click on Regression, and then click on Logistic for the logistic regression dialog box.
2. Move the relevant binary dependent variable into the Dependent variable box.
3. Move the relevant explanatory variables into the Covariates box.
4. Choose Forward as the selected Method.
5. Click on the Plot tag to specify residual plots.
6. Click on OK.

S-PLUS
In S-PLUS, loglinear analysis and logistic regression can also be accessed by means of the Statistics menu; for example, click on Statistics, click on Regression, and then click on Loglinear (Poisson) for the loglinear models dialog box, or on Logistic for the logistic regression dialog box.

When the S-PLUS command line language is used, loglinear models and logistic regression models can be fitted by using the glm function, specifying family=poisson for the former and family=binomial for the latter, to analyze, for example, the Danish do-it-yourself data. When the data for a logistic regression are grouped, as in the GHQ example in the text, the total number of observations in each group has to be passed to the glm function by using the weights argument. So if, for example, the GHQ data were stored in a data frame GHQ with variables sex and score, we can use the following:

glm(p~sex+score, family=binomial, weights=n, data=GHQ),
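For readers without S-PLUS, the iterative maximum likelihood fit that glm carries out for a logistic regression can be sketched directly. The following minimal Newton-Raphson implementation in Python is an illustration of the fitting idea only, not the glm function itself; it handles a single explanatory variable and grouped binomial data:

```python
import math

def fit_logistic(x, y, n, iterations=25):
    """Fit logit(p) = b0 + b1*x to grouped binomial data (y successes
    in n trials per group) by Newton-Raphson maximum likelihood."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        # accumulate the score vector (g0, g1) and information matrix (h..)
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, y, n):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = ni * p * (1.0 - p)
            r = yi - ni * p
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01            # solve the 2x2 Newton step by hand
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

With two groups and two parameters the model is saturated, so the fitted probabilities reproduce the observed proportions exactly, which gives a convenient check of the code.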
where p is the vector of observed proportions and n is the vector of the numbers of observations in each group.

EXERCISES

10.1. For the estimated expected values corresponding to the no second-order relationship hypothesis for the suicide data, show that the marginal totals of the estimated expected values, Eij., Ei.k, and E.jk, are equal to the corresponding marginal totals of the observed values.

10.2. The data shown in Table 10.20 were obtained from a study of the relationship between car size and car accident injuries. Accidents were classified according to their type, severity, and whether or not the driver was ejected. Using severity as the response variable, derive and interpret a suitable logistic model for these data.

10.3. The data in Table 10.21 arise from a study in which a sample of 1008 people were asked to compare two detergents, brand M and brand X. In addition to stating their brand preference, the sample members provided information on previous use of brand M, the degree of softness of the water that they used, and the temperature of the water. Use loglinear models to explore the associations between the four variables.

10.4. The data in Table 10.22 (taken from Johnson and Albert, 1999) are for 30 students in a statistics class. The response variable y indicates whether

TABLE 10.20
Car Accident Data

                                                 Number Hurt
Car Weight   Driver Ejected   Accident Type   Severely   Not Severely
Small        No               Collision          150          350
Small        No               Rollover           112           60
Small        Yes              Collision           23           26
Small        Yes              Rollover            80           19
Standard     No               Collision         1022         1878
Standard     No               Rollover           404          148
Standard     Yes              Collision          161          111
Standard     Yes              Rollover           265           22
TABLE 10.21
Comparison of Detergents Data

                        Previous User of M       Not Previous User of M
Water       Brand
Softness    Preferred   High Temp.   Low Temp.   High Temp.   Low Temp.
Soft        X               19           57           29           63
            M               29           49           27           53
Medium      X               23           47           33           66
            M               47           55           23           50
Hard        X               24           37           42           68
            M               43           52           30           42

TABLE 10.22
Data for Class of Statistics Students

Student   Test Score   Grade in Course
1            525             B
2            533             C
3            545             D
4            582             A
5            581             C
6            576             B
7            572             B
8            609             D
9            559             B
10           543             A
11           576             C
12           525             A
13           574             D
14           582             C
15           574             B
16           471             B
17           595             C
18           557             A
19           557             A
20           584             A

[The y column (pass/fail indicator) is present in the original table, but its values cannot be reliably aligned with the rows in this copy.]
or not the student passed (y = 1) or failed (y = 0) the statistics examination at the end of the course. Also given are the students' scores on a previous math test and their grades in a prerequisite probability course.

1. Group the students into those with math test scores of 500 or less, 501-550, and more than 550, and then fit a linear model to the probability of passing, by using the midpoint of the grouping interval as the explanatory variable.
2. Use your fitted model to predict the probability of passing for students with math scores of 350 and 800.
3. Now fit a linear logistic model to the same data, and again use the model to predict the probability of passing for students with math scores of 350 and 800.
4. Finally, fit a logistic regression model to the ungrouped data by using both explanatory variables.

10.5. Show that the interaction parameter in the saturated loglinear model for a 2 x 2 contingency table is related to the odds ratio of the table.

TABLE 10.23
Menstruation of Girls in Warsaw

x   y   n
[The row alignment of this table is not recoverable in this copy. The scanned y column reads: 2, 5, 10, 17, 16, 29, 39, 51, 47, 67, 81, 79, 93, 117, 107, 92; the scanned n column reads: 120, 88, 105, 111, 100, 93, 100, 108, 99, 106, 117, 98, 97, 100, 122, 111, 94.]

Note. y = number who have reached menarche of the n in the age group with center x.
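The relation asked for in Exercise 10.5 is easy to verify numerically. Under the usual sum-to-zero (effect) parameterization, the interaction term of the saturated loglinear model for a 2 x 2 table equals one quarter of the log odds ratio; the Python fragment below checks this for an arbitrary, purely hypothetical set of counts:

```python
import math

def loglinear_interaction(n11, n12, n21, n22):
    """Interaction parameter of the saturated loglinear model for a 2x2
    table, with sum-to-zero coding: log m11 minus row, column, and
    overall effects."""
    l11, l12, l21, l22 = (math.log(v) for v in (n11, n12, n21, n22))
    overall = (l11 + l12 + l21 + l22) / 4
    row1 = (l11 + l12) / 2
    col1 = (l11 + l21) / 2
    return l11 - row1 - col1 + overall

# hypothetical counts; the odds ratio of the table is (n11*n22)/(n12*n21)
odds_ratio = (35 * 25) / (15 * 20)
```

The value returned for counts 35, 15, 20, 25 agrees with 0.25 * ln(odds ratio) to rounding error, which is the numerical content of the exercise.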
10.6. Fit a logistic regression model to the GHQ data that includes main effects for both gender and GHQ score and an interaction between the two variables.

10.7. The data in Table 10.23 relate to a sample of girls in Warsaw, the response variable indicating whether or not the girl has begun menstruation and the explanatory variable being age in years (measured to the month). Plot the estimated probability of menstruation as a function of age, and show the linear and logistic regression fits to the data on the plot.

10.8. Examine both the death penalty data (Table 10.3) and the infant survival data (Table 10.4) by fitting suitable loglinear models, and suggest what it is that leads to the spurious results when the data are aggregated over a particular variable.
This glossary includes terms encountered in introductory statistics courses and terms mentioned in this text with little or no explanation. In addition, some terms of general statistical interest that are not specific to either this text or psychology are defined. Terms that are explained in detail in the text are not included in this glossary. Terms in italics in a definition are themselves defined in the appropriate place in the glossary. Terms are listed alphabetically, using the letter-by-letter convention.

Four dictionaries of statistics that readers may also find useful are as follows.

1. Everitt, B. S. (1998). The Cambridge Dictionary of Statistics. Cambridge: Cambridge University Press.
2. Everitt, B. S. (1995). The Cambridge Dictionary of Statistics in the Medical Sciences. Cambridge: Cambridge University Press.
3. Everitt, B. S., and Wykes, T. (1999). A Dictionary of Statistics for Psychologists. London: Arnold.
4. Freund, J. E., and Williams, F. J. (1966). Dictionary/Outline of Basic Statistics. New York: Dover.
A

Acceptance region: The set of values of a test statistic for which the null hypothesis is accepted. Suppose, for example, a z test is being used to test that the mean of a population is 10 against the alternative that it is not 10. If the significance level chosen is .05, then the acceptance region consists of values of z between -1.96 and 1.96.

Additive effect: A term used when the effect of administering two treatments together is the sum of their separate effects. See also additive model.

Additive model: A model in which the explanatory variables have an additive effect on the response variable. So, for example, if variable A has an effect of size a on some response measure and variable B one of size b on the same response, then in an assumed additive model for A and B, their combined effect would be a + b.

Alpha (α): The probability of a Type I error. See also significance level.

Alternative hypothesis: The hypothesis against which the null hypothesis is tested.

Analysis of variance: The separation of variance attributable to one cause from the variance attributable to others. Provides a way of testing for differences between a set of more than two population means.

A posteriori comparisons: Synonym for post hoc comparisons.

A priori comparisons: Synonym for planned comparisons.

Asymmetrical distribution: A probability distribution or frequency distribution that is not symmetrical about some central value. An example would be a distribution with positive skewness, as shown in Figure A.1, giving the histogram of 200 reaction times (seconds) to a particular task.

Attenuation: A term applied to the correlation between two variables when both are subject to measurement error, to indicate that the value of the correlation between the true values is likely to be underestimated.

B

Balanced design: A term applied to any experimental design in which the same number of observations is taken for each combination of the experimental factors.
FIG. A.1. Example of an asymmetrical distribution.

Bartlett's test: A test for the equality of the variances of a number of populations, sometimes used prior to applying analysis of variance techniques to assess the assumption of homogeneity of variance. It is of limited practical value because of its known sensitivity to nonnormality, so that a significant result might be caused by departures from normality rather than by different variances. See also Box's test and Hartley's test.

Bell-shaped distribution: A probability distribution having the overall shape of a vertical cross section of a bell. The normal distribution is the most well-known example, but a Student's t distribution is also this shape.

Beta coefficient: A regression coefficient that is standardized so as to allow for a direct comparison between explanatory variables as to their relative explanatory power for the response variable. It is calculated from the raw regression coefficients by multiplying them by the standard deviation of the corresponding explanatory variable and then dividing by the standard deviation of the response variable.

Bias: Deviation of results or inferences from the truth, or processes leading to such deviation. More specifically, this is the extent to which the statistical method used in a study does not estimate the quantity thought to be estimated.
Binary variable: Observations that occur in one of two possible states, which are often labeled 0 and 1. Such data are frequently encountered in psychological investigations; commonly occurring examples include "improved/not improved" and "depressed/not depressed."

Bimodal distribution: A probability distribution, or a frequency distribution, with two modes. Figure A.2 shows examples.

FIG. A.2. Bimodal probability and frequency distributions.

Binomial distribution: The probability distribution of the number of successes, x, in a series of n independent trials, each of which can result in either a success or a failure. The probability of a success, p, remains constant from trial to trial. Specifically, the distribution of x is given by

P(x) = [n! / (x!(n - x)!)] p^x (1 - p)^(n-x),   x = 0, 1, 2, ..., n.

The mean of the distribution is np and its variance is np(1 - p).

Biserial correlation: A measure of the strength of the relationship between two variables, one continuous (y) and the other recorded as a binary variable (x), but having underlying continuity and normality. It is estimated from the sample values as

r_biserial = [(ȳ₁ - ȳ₀)/s_y] (pq/u),

where ȳ₁ is the sample mean of the y variable for those individuals for whom x = 1, ȳ₀ is the sample mean of the y variable for those individuals having x = 0, s_y is the standard deviation of the y values, p is the proportion of individuals with x = 1, and q = 1 - p is the proportion of individuals with x = 0. Finally, u is the ordinate (height) of a normal distribution with mean zero and standard deviation one, at the point of division between the p and q proportions of the curve. See also point-biserial correlation.

Bivariate data: Data in which the subjects each have measurements on two variables.

Box's test: A test for assessing the equality of the variances in a number of populations that is less sensitive to departures from normality than Bartlett's test. See also Hartley's test.

C

Ceiling effect: A term used to describe what happens when many subjects in a study have scores on a variable that are at or near the possible upper limit (ceiling). Such an effect may cause problems for some types of analysis because it reduces the possible amount of variation in the variable. The converse, or floor effect, causes similar problems.

Central tendency: A property of the distribution of a variable usually measured by statistics such as the mean, median, and mode.

Change scores: Scores obtained by subtracting a posttreatment score on some variable from the corresponding pretreatment, baseline value.

Chi-squared distribution: The probability distribution of the sum of squares of a number of independent normal variables with means zero and standard deviations one. This distribution arises in many areas of statistics, for example, in assessing the goodness of fit of models, particularly those fitted to contingency tables.

Coefficient of determination: The square of the correlation coefficient between two variables x and y. Gives the proportion of the variation in one variable that is accounted for by the other. For example, a correlation of 0.8 implies that 64% of the variance of y is accounted for by x.

Coefficient of variation: A measure of spread for a set of data defined as

100 x standard deviation / mean.

This was originally proposed as a way of comparing the variability in different distributions, but it was found to be sensitive to errors in the mean.

Commensurate variables: Variables that are on the same scale or expressed in the same units, for example, systolic and diastolic blood pressure.

Composite hypothesis: A hypothesis that specifies more than a single value for a parameter, for example, the hypothesis that the mean of a population is greater than some value.

Compound symmetry: The property possessed by a covariance matrix of a set of multivariate data when its main diagonal elements are equal to one another, and additionally its off-diagonal elements are also equal. Consequently, the matrix has the general form

Σ = σ² ( 1  ρ  ...  ρ
         ρ  1  ...  ρ
         ...
         ρ  ρ  ...  1 ),

where ρ is the assumed common correlation coefficient of the measures.

Confidence interval: A range of values, calculated from the sample observations, that is believed, with a particular probability, to contain the true parameter value. A 95% confidence interval, for example, implies that if the estimation process were repeated again and again, then 95% of the calculated intervals would be expected to contain the true parameter value. Note that the stated probability level refers to properties of the interval and not to the parameter itself, which is not considered a random variable.

Conservative and nonconservative tests: Terms usually encountered in discussions of multiple comparison tests. Nonconservative tests provide poor control over the per-experiment error rate. Conservative tests, in contrast, may limit the per-comparison error rate to unnecessarily low values, and tend to have low power unless the sample size is large.

Contrast: A linear function of parameters or statistics in which the coefficients sum to zero. It is most often encountered in the context of analysis of variance. For example, in an application involving, say, three treatment groups (with means x̄_T1, x̄_T2, and x̄_T3) and a control group (with mean x̄_C), the following is the contrast for comparing the mean of the control group to the average of the treatment groups:

x̄_C - (x̄_T1 + x̄_T2 + x̄_T3)/3.

See also orthogonal contrast.
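The binomial formula above is easy to check numerically. The Python fragment below (plain standard library; the variable names are mine) computes the distribution for n = 10 and p = 0.3 and confirms the stated mean and variance:

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) = n! / (x!(n - x)!) * p**x * (1 - p)**(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * pr for x, pr in enumerate(probs))
variance = sum((x - mean) ** 2 * pr for x, pr in enumerate(probs))
# the probabilities sum to 1; mean = n*p = 3; variance = n*p*(1-p) = 2.1
```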
Correlation coefficient: An index that quantifies the linear relationship between a pair of variables. For sample observations a variety of such coefficients have been suggested, of which the most commonly used is Pearson's product moment correlation coefficient, defined as

r = Σ(xᵢ - x̄)(yᵢ - ȳ) / sqrt[Σ(xᵢ - x̄)² Σ(yᵢ - ȳ)²],

where (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ) are the n sample values of the two variables of interest. The coefficient takes values between -1 and 1, with the sign indicating the direction of the relationship and the numerical magnitude its strength. Values of -1 or 1 indicate that the sample values fall on a straight line. A value of zero indicates the lack of any linear relationship between the two variables.

Correlation matrix: A square, symmetric matrix with rows and columns corresponding to variables, in which the off-diagonal elements are the correlation coefficients between pairs of variables, and the elements on the main diagonal are unity.

Covariance: For a sample of n pairs of observations, (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ), the statistic given by

c_xy = Σ(xᵢ - x̄)(yᵢ - ȳ) / (n - 1),

where x̄ and ȳ are the sample means of the xᵢ and yᵢ, respectively.

Covariance matrix: A symmetric matrix in which the off-diagonal elements are the covariances of pairs of variables, and the elements on the main diagonal are variances.

Critical region: The values of a test statistic that lead to rejection of a null hypothesis. The size of the critical region is the probability of obtaining an outcome belonging to this region when the null hypothesis is true, that is, the probability of a Type I error. See also acceptance region.

Critical value: The value with which a statistic calculated from the sample data is compared in order to decide whether a null hypothesis should be rejected. The value is related to the particular significance level chosen.

Cronbach's alpha: An index of the internal consistency of a psychological test. If the test consists of n items and an individual's score is the total answered correctly, then the coefficient is given specifically by

α = [n/(n - 1)] (1 - Σᵢ σᵢ²/σ²),

where σ² is the variance of the total scores and σᵢ² is the variance of the set of 0/1 scores representing correct and incorrect answers on item i.

Cross-validation: The division of data into two approximately equal-sized subsets, one of which is used to estimate the parameters in some model of interest and the other to assess whether the model with these parameter values fits adequately.

Cumulative frequency distribution: A listing of the sample values of a variable, together with the proportion of the observations less than or equal to each value.

D

Data dredging: A term used to describe comparisons made within a data set not specifically prescribed prior to the start of the study.

Data reduction: The process of summarizing large amounts of data by forming frequency distributions, histograms, scatter diagrams, and so on, and calculating statistics such as means, variances, and correlation coefficients. The term is also used when a low-dimensional representation of multivariate data is sought by use of procedures such as principal components analysis and factor analysis.

Data set: A general term for the observations and measurements collected during any type of scientific investigation.

Degrees of freedom: An elusive concept that occurs throughout statistics. Essentially, the term means the number of independent units of information in a sample relevant to the estimation of a parameter or the calculation of a statistic. In many cases the term corresponds to the number of parameters in a model. For example, in a 2 x 2 contingency table with a given set of marginal totals, only one of the four cell frequencies is free, and the table therefore has a single degree of freedom.

Dependent variable: See response variable.

Descriptive statistics: A general term for methods of summarizing and tabulating data that make their main features more transparent, for example, calculating means and variances and plotting histograms. See also exploratory data analysis and initial data analysis.
DF (df): Abbreviation for degrees of freedom.

Diagonal matrix: A square matrix whose off-diagonal elements are all zero.

Dichotomous variable: Synonym for binary variable.

Digit preference: The personal and often subconscious bias that frequently occurs in the recording of observations. It is usually most obvious in the final recorded digit of a measurement.

Discrete variables: Variables having only integer values, for example, number of trials to learn a particular task.

Doubly multivariate data: A term used for the data collected in those longitudinal studies in which more than a single response variable is recorded for each subject on each occasion.

Dummy variables: The variables resulting from recoding categorical variables with more than two categories into a series of binary variables. Marital status, for example, if originally labeled 1 for married, 2 for single, and 3 for divorced, widowed, or separated, could be redefined in terms of two variables as follows.

Variable 1: 1 if single, and 0 otherwise.
Variable 2: 1 if divorced, widowed, or separated, and 0 otherwise.

For a married person, both new variables would be zero. In general a categorical variable with k categories would be recoded in terms of k - 1 dummy variables. Such recoding is used before polychotomous variables are used as explanatory variables in a regression analysis to avoid the unreasonable assumption that the original numerical codes for the categories, that is, the values 1, 2, ..., k, correspond to an interval scale.

E

EDA: Abbreviation for exploratory data analysis.

Effect: Generally used for the change in a response variable produced by a change in one or more explanatory or factor variables.

Empirical: Based on observation or experiment rather than deduction from basic laws or theory.

Error rate: The proportion of subjects misclassified by an allocation rule derived from a discriminant analysis.

Estimation: The process of providing a numerical value for a population parameter on the basis of information collected from a sample. If a single figure is calculated for the unknown parameter, the process is called point estimation. If an interval is calculated within which the parameter is likely to fall, then the procedure is called interval estimation. See also least-squares estimation and confidence interval.

Estimator: A statistic used to provide an estimate for a parameter. The sample mean, for example, is an unbiased estimator of the population mean.

Experimental design: The arrangement and procedures used in an experimental study. Some general principles of good design are simplicity, avoidance of bias, the use of random allocation for forming treatment groups, replication, and adequate sample size.

Experimental study: A general term for investigations in which the researcher can deliberately influence events and investigate the effects of the intervention.

Experimentwise error rate: Synonym for per-experiment error rate.

Explanatory variables: The variables appearing on the right-hand side of the equations defining, for example, multiple regression or logistic regression, and that seek to predict or explain the response variable. Also commonly known as the independent variables, although this is not to be recommended because they are rarely independent of one another.

Exploratory data analysis: An approach to data analysis that emphasizes the use of informal graphical procedures not based on prior assumptions about the structure of the data or on formal models for the data. The essence of this approach is that, broadly speaking, data are assumed to possess the following structure:

Data = Smooth + Rough,

where Smooth is the underlying regularity or pattern in the data. The objective of the exploratory approach is to separate the Smooth from the Rough with minimal use of formal mathematics or statistical methods. See also initial data analysis.
Eyeball test: Informal assessment of data simply by inspection and mental calculation allied with experience of the particular area from which the data arise.

F

F distribution: The probability distribution of the ratio of two independent random variables, each having a chi-squared distribution, divided by their respective degrees of freedom.

Factor: A term used in a variety of ways in statistics, but most commonly to refer to a categorical variable, with a small number of levels, under investigation in an experiment as a possible source of variation. This is essentially simply a categorical explanatory variable.

Familywise error rate: The probability of making any error in a given family of inferences. See also per-comparison error rate and per-experiment error rate.

Fisher's exact test: An alternative procedure to the use of the chi-squared statistic for assessing the independence of two variables forming a 2 x 2 contingency table, particularly when the expected frequencies are small.

Fisher's z transformation: A transformation of Pearson's product moment correlation coefficient, r, given by

z = (1/2) ln[(1 + r)/(1 - r)].

The statistic z has mean (1/2) ln[(1 + ρ)/(1 - ρ)], where ρ is the population correlation value, and variance 1/(n - 3), where n is the sample size. The transformation may be used to test hypotheses and to construct confidence intervals for ρ.

Fishing expedition: Synonym for data dredging.

Fitted value: Usually used to refer to the value of the response variable as predicted by some estimated model.

Floor effect: See ceiling effect.

Follow-up: The process of locating research subjects or patients to determine whether or not some outcome of interest has occurred.
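As a numerical illustration of the z transformation (the helper names below are mine), the following computes an approximate 95% confidence interval for ρ by transforming r, adding and subtracting 1.96 standard errors on the z scale, and transforming back:

```python
import math

def fisher_z(r):
    """z = 0.5 * ln((1 + r) / (1 - r)); identical to atanh(r)."""
    return 0.5 * math.log((1 + r) / (1 - r))

def rho_confidence_interval(r, n, z_crit=1.96):
    """Approximate 95% CI for the population correlation rho.

    On the z scale the estimate is roughly normal with variance 1/(n - 3);
    tanh is the inverse of the transformation.
    """
    z = fisher_z(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)
```

As expected, the interval always contains the sample value r, and it narrows as the sample size grows.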
F test: A test for the equality of the variances of two populations having normal distributions, based on the ratio of the variances of a sample of observations taken from each. It is most often encountered in the analysis of variance, in which testing whether particular variances are the same also tests for the equality of a set of means.

Frequency distribution: The division of a sample of observations into a number of classes, together with the number of observations in each class. Acts as a useful summary of the main features of the data such as location, shape, and spread. An example of such a table is given below.

IQ Score Class Limits: 75-79, 80-84, 85-89, 90-94, 95-99, 100-104, 105-109, 110-114, >=115.
Observed Frequencies (as scanned; alignment with the classes is uncertain in this copy): 2, 1, 10, 5, 9, 1, 4, 2, 1.

Frequency polygon: A diagram used to display graphically the values in a frequency distribution. The frequencies are graphed as ordinate against the class midpoints as abscissae. The points are then joined by a series of straight lines. Particularly useful in displaying a number of frequency distributions on the same diagram.

G

Gambler's fallacy: The belief that if an event has not happened for a long time, it is bound to occur soon.

Goodness-of-fit statistics: Measures of agreement between a set of sample values and the corresponding values predicted from some model of interest.

Grand mean: Mean of all the values in a grouped data set irrespective of group.
Graphical methods: A generic term for those techniques in which the results are given in the form of a graph, diagram, or some other form of visual display.

H

H0: Symbol for null hypothesis.

H1: Symbol for alternative hypothesis.

Halo effect: The tendency of a subject's performance on some task to be overrated because of the observer's perception of the subject "doing well" gained in an earlier exercise or when assessed in a different area.

Harmonic mean: The reciprocal of the arithmetic mean of the reciprocals of a set of observations, x₁, x₂, ..., xₙ. Specifically obtained from

1/H = (1/n) Σᵢ (1/xᵢ).

Hartley's test: A simple test of the equality of the variances of a number of populations. The test statistic is the ratio of the largest to the smallest sample variances.

Hawthorne effect: A term used for the effect that might be produced in an experiment simply from the awareness by the subjects that they are participating in some form of scientific investigation. The name comes from a study of industrial efficiency at the Hawthorne Plant in Chicago in the 1920s.

Hello-goodbye effect: A phenomenon originally described in psychotherapy research, but one that may arise whenever a subject is assessed on two occasions, with some intervention between the visits. Before an intervention, a person may present himself or herself in as bad a light as possible, thereby hoping to qualify for treatment, and impressing staff with the seriousness of his or her problems. At the end of the study the person may want to please the staff with his or her improvement, and so may minimize any problems. The result is to make it appear that there has been some improvement when none has occurred, or to magnify the effects that did occur.

Histogram: A graphical representation of a set of observations in which class frequencies are represented by the areas of rectangles centered on the class interval. If the latter are all equal, the heights of the rectangles are also proportional to the observed frequencies.
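The harmonic mean definition above is a one-liner in code (Python's standard statistics module also provides a harmonic_mean function):

```python
def harmonic_mean(xs):
    """Reciprocal of the arithmetic mean of the reciprocals of xs."""
    return len(xs) / sum(1.0 / x for x in xs)
```

For positive observations it is always less than or equal to the arithmetic mean, with equality only when all the observations coincide.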
andso on. or See also null hypothesis. composite hypothesis. Finally. including checking the quality of the data. two events said to be independentif knowing the are outcome of one tellsus nothing about the other. .A just identified model corresponds to a saturatedmodel. which consists of a number of informal steps. in a number of different groups. Inference: The process of drawing conclusions about a population on the basis of measurements or observations made on a sample of individuals from the population. Identity matrix: A diagonal matrix in which all the elements on the leading diagonal areunity and all the other elements zero. More formally the concept is two defined int e r n of the probabilities of the events. sign@1 cance test. an overidentified model is one in which par of of can be estimated.340 APPENDIX A Homogeneous: A term that is used in statistics indicate the equalityof some to quantity of interest (most often a variance). two events A and B are said to be independentif P(A and B) = P(A) X P@). and there remain degrees freedom to allow the fit the model to be assessed. populations. Hypothesis testing: A general term for the procedure of assessing whether sample data are consistentotherwise with statements made about the population. where P(A) and P(B)represent the probabilities of and B. alternativehypothesis. unidentified to the model is one in which are too many parameters in relation number of there observations to make estimation possible. A Independent samples t test: See Student’s t test. significance level. D Identification: The degree to which there is sufficient information in the sample An observations to estimate the parameters in a proposed model. are Independence: Essentially. I I A Abbreviation for initial data analysis. ljpe I e m r . and ljpe I error. Initial data analysis: The first phase in the examination of a data set. In particular.
The bias canfor of of arise avariety of reasons. including failure to contact the right persons and systematic errors in recording the answers received from the respondent.for one that leptokurtic is it is positive. and constructing appropriate graphs. An example is shown in FigureA.and perhaps get ideas for a more sophisticated analysis. Interviewer bias: The bias that may occur in surveys of human populations because of the direct result the action the interviewer. Synonym for continuous variable. (Corresponding functions the sample moments are used frequency of for distributions. . Interaction: A term applied when two (or more) explanatory variables do not act independentlyon a response variable. and p2 is its vari4 ance. K Kurtosis: The extent to which the peak of a unimodal frequency distribution departs from the shape a normal distribution. index takes the value zero (other this distributions with zero kurtosis arecalledmesokurtic). J Jshaped distribution: An extremely assymetrical distribution with its maximum frequency in the initial class and a declining frequency elsewhere.3. It is usually measured for a probability distribution as P4IP: 3. also addirive efect. obtain a simple descripis of tive summary.) For a normal distribution. it  L Large sample method: Any statistical method based an approximation to a on normal distributionor otherprobability distributionthat becomes more accurate as sample size increases.STATISTICAL GLOSSARY calculating simplesummary statistics. where p is the fourth central momentof the distribution. 341 The general aim to clarify the structurethe data. See estimation. andfor a platykurtic curve is negative. See Interval variable: Intervalestimation: Interval variable: Synonym for continuous variable. by either beingmore pointed of (leptokurtic) or flatter (platykurtic).
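The sample version of the kurtosis index is easily computed from the moments. The sketch below (an illustrative transcription of the formula, using the simple moment estimates) returns the zero-centered measure:

```python
def sample_kurtosis(xs):
    """m4 / m2**2 - 3, with m2 and m4 the sample central moments.

    Roughly zero for normal data, negative for flat (platykurtic)
    samples, and positive for sharply peaked (leptokurtic) ones.
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3
```

A two-point sample split evenly between two values gives the minimum possible value of -2, while a sample concentrated near its mean with a few extreme values gives a positive result.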
FIG. A.3. Example of a J-shaped distribution (x axis: reaction time).

Least-squares estimation: A method used for estimating parameters, particularly in regression analysis, by minimizing the difference between the observed response and the value predicted by the model. For example, if the expected value of a response variable y is of the form α + βx, where x is an explanatory variable, then least-squares estimators of the parameters α and β may be obtained from n pairs of sample values (x1, y1), (x2, y2), …, (xn, yn) by minimizing S given by

S = Σ_{i=1}^{n} (y_i − α − βx_i)²

Often referred to as ordinary least squares to differentiate this simple version of the technique from more involved versions, such as weighted least squares.
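A minimal sketch of these least-squares estimators in their closed form (illustrative only; the data below are made up and lie exactly on y = 1 + 2x):

```python
# Closed-form ordinary least squares for y = alpha + beta * x, obtained
# by minimizing S = sum over i of (y_i - alpha - beta * x_i)^2.
def ols(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    beta = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
            / sum((a - xbar) ** 2 for a in x))
    alpha = ybar - beta * xbar
    return alpha, beta

alpha, beta = ols([0, 1, 2, 3], [1, 3, 5, 7])
print(alpha, beta)  # -> 1.0 2.0
```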
Leverage points: A term used in regression analysis for those observations that have an extreme value on one or more of the explanatory variables. The effect of such points is to force the fitted model close to the observed value of the response, leading to a small residual.

Likert scales: Scales often used in studies of attitudes, in which the raw scores are based on graded alternative responses to each of a series of questions. For example, the subject may be asked to indicate his or her degree of agreement with each of a series of statements relevant to the attitude. A number is attached to each possible response, for example, 1: strongly approve; 2: approve; 3: undecided; 4: disapprove; 5: strongly disapprove; and the sum of these is used as the composite score.

Logarithmic transformation: The transformation of a variable, x, obtained by taking y = ln(x). Often used when the frequency distribution of the variable shows a moderate to large degree of skewness in order to achieve normality.

Lower triangular matrix: A matrix in which all the elements above the main diagonal are zero. An example is the following:

L = 1 0 0 0
    2 3 0 0
    4 1 2 0
    3 2 1 5

M

Main effect: An estimate of the independent effect of (usually) a factor variable on a response variable in an ANOVA.

Manifest variable: A variable that can be measured directly, in contrast to a latent variable.

MANOVA: Acronym for multivariate analysis of variance.

Marginal totals: A term often used for the total number of observations in each row and each column of a contingency table.

Matched pairs: A term used for observations arising either from two individuals who are individually matched on a number of variables, for example, age and sex, or from two observations taken on the same individual on separate occasions. Essentially synonymous with paired samples.
Matched pairs t test: A Student's t test for the equality of the means of two populations, when the observations arise as paired samples. The test is based on the differences between the observations of the matched pairs. The test statistic is given by

t = d̄ / (s_d / √n)

where n is the sample size, d̄ is the mean of the differences, and s_d is their standard deviation. If the null hypothesis of the equality of the population means is true, then t has a Student's t distribution with n − 1 degrees of freedom.

Matching: The process of making a study group and a comparison group comparable with respect to extraneous factors. It is often used in retrospective studies in the selection of cases and controls to control variation in a response variable that is due to sources other than those immediately under investigation. Several kinds of matching can be identified, the most common of which is when each case is individually matched with a control subject on the matching variables, such as age, sex, occupation, and so on.

Matrix: A rectangular arrangement of numbers, algebraic functions, and so on.

Mean: A measure of location or central value for a continuous variable. For a sample of observations x1, x2, …, xn, the measure is calculated as

x̄ = (Σ_{i=1}^{n} x_i) / n

Mean vector: A vector containing the mean values of each variable in a set of multivariate data.

Measurement error: Errors in reading, calculating, or recording a numerical value. This is the difference between observed values of a variable recorded under similar conditions and some underlying true value.
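The matched pairs t statistic above can be sketched in a few lines (hypothetical before/after scores; in a real analysis the result would be referred to the t distribution on n − 1 degrees of freedom):

```python
import math

# t = dbar / (s_d / sqrt(n)), computed from the pair differences.
def matched_pairs_t(before, after):
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    dbar = sum(d) / n
    s_d = math.sqrt(sum((di - dbar) ** 2 for di in d) / (n - 1))
    return dbar / (s_d / math.sqrt(n))

t = matched_pairs_t([10, 12, 9, 11], [8, 9, 8, 9])
print(round(t, 3))  # -> 4.899
```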
Measures of association: Numerical indices quantifying the strength of the statistical dependence of two or more qualitative variables.

Median: The value in a set of ranked observations that divides the data into two parts of equal size. When there is an odd number of observations, the median is the middle value. When there is an even number of observations, the measure is calculated as the average of the two central values. It provides a measure of location of a sample that is suitable for asymmetrical distributions and is also relatively insensitive to the presence of outliers. See also mean and mode.

Misinterpretation of p values: A p value is commonly interpreted in a variety of ways that are incorrect. Most common is that it is the probability of the null hypothesis, and that it is the probability of the data having arisen by chance. For the correct interpretation, see the entry for p value.

Mixed data: Data containing a mixture of continuous variables, ordinal variables, and categorical variables.

Mode: The most frequently occurring value in a set of observations. Occasionally used as a measure of location. See also mean and median.

Model: A description of the assumed structure of a set of observations that can range from a fairly imprecise verbal account to, more usually, a formalized mathematical expression of the process assumed to have generated the observed data. The purpose of such a description is to aid in understanding the data.

Model building: A procedure that attempts to find the simplest model for a sample of observations that provides an adequate fit to the data.

Most powerful test: A test of a null hypothesis which has greater power than any other test for a given alternative hypothesis.

Multilevel models: Models for data that are organized hierarchically, for example, children within families, that allow for the possibility that measurements made on children from the same family are likely to be correlated.

Multinomial distribution: A generalization of the binomial distribution to situations in which r outcomes can occur on each of n trials, where r > 2. Specifically the distribution is given by

P(n1, n2, …, nr) = n! / (n1! n2! … nr!) × p1^{n1} p2^{n2} … pr^{nr}

where ni is the number of trials with outcome i, and pi is the probability of outcome i occurring on a particular trial.

Multiple comparison tests: Procedures for detailed examination of the differences between a set of means, usually after a general hypothesis that they are all equal has been rejected. No single technique is best in all situations, and a major distinction between techniques is how they control the possible inflation of the Type I error.

Multivariate analysis: A generic term for the many methods of analysis important in investigating multivariate data.

Multivariate analysis of variance: A procedure for testing the equality of the mean vectors of more than two populations. The technique is directly analogous to the analysis of variance of univariate data, except that the groups are compared on q response variables simultaneously. In the univariate case, F tests are used to assess the hypotheses of interest. In the multivariate case, no single test statistic can be constructed that is optimal in all situations. The most widely used of the available test statistics is Wilks's lambda, which is based on three matrices, W (the within groups matrix of sums of squares and cross products), T (the total matrix of sums of squares and cross products), and B (the between groups matrix of sums of squares and cross products), defined as follows:

W = Σ_{i=1}^{g} Σ_{j=1}^{ni} (x_ij − x̄_i)(x_ij − x̄_i)′
T = Σ_{i=1}^{g} Σ_{j=1}^{ni} (x_ij − x̄)(x_ij − x̄)′
B = Σ_{i=1}^{g} ni (x̄_i − x̄)(x̄_i − x̄)′

where x_ij, i = 1, …, g, j = 1, …, ni, represent the jth multivariate observation in the ith group, g is the number of groups, and ni is the number of observations in the ith group. The mean vector of the ith group is represented by x̄_i and the mean vector of all the observations by x̄. These matrices satisfy the equation

T = W + B.

Wilks's lambda is given by the ratio of the determinants of W and T, that is,

Λ = det(W) / det(T).

The statistic Λ can be transformed to give an F test to assess the null hypothesis of the equality of the population mean vectors. In addition to Λ, a number of other test statistics are available:

Roy's largest root criterion: the largest eigenvalue of BW⁻¹;
the Hotelling–Lawley trace: the sum of the eigenvalues of BW⁻¹;
the Pillai–Bartlett trace: the sum of the eigenvalues of BT⁻¹.

It has been found that the differences in power between the various test statistics are generally quite small, so in most situations the choice will not greatly affect conclusions.

Multivariate normal distribution: The probability distribution of a set of variables x′ = [x1, …, xq] given by

f(x) = (2π)^{−q/2} |Σ|^{−1/2} exp{−(1/2)(x − μ)′ Σ⁻¹ (x − μ)}

where μ is the mean vector of the variables and Σ is their variance–covariance matrix. This distribution is assumed by multivariate analysis procedures such as multivariate analysis of variance.

N

Newman–Keuls test: A multiple comparison test used to investigate in more detail the differences between a set of means, as indicated by a significant F test in an analysis of variance.

Nominal significance level: The significance level of a test when its assumptions are valid.

Nonorthogonal designs: Analysis of variance designs with two or more factors in which the number of observations in each cell are not equal.

Normal distribution: A probability distribution of a random variable, x, that is assumed by many statistical methods. It is specifically given by

f(x) = (1 / √(2πσ²)) exp{−(x − μ)² / (2σ²)}

where μ and σ² are, respectively, the mean and variance of x. This distribution is bell shaped.

Null distribution: The probability distribution of a test statistic when the null hypothesis is true.
Null hypothesis: The "no difference" or "no association" hypothesis to be tested (usually by means of a significance test) against an alternative hypothesis that postulates a nonzero difference or association.

Null matrix: A matrix in which all elements are zero.

Null vector: A vector, the elements of which are all zero.

O

One-sided test: A significance test for which the alternative hypothesis is directional, for example, that one population mean is greater than another. The choice between a one-sided and a two-sided test must be made before any test statistic is calculated.

Orthogonal: A term that occurs in several areas of statistics, with different meanings in each case. It is most commonly encountered in relation to two variables or two linear functions of a set of variables, to indicate statistical independence. It literally means at right angles.

Orthogonal contrasts: Sets of linear functions of either parameters or statistics in which the defining coefficients satisfy a particular relationship. Specifically, if c1 and c2 are two contrasts of a set of m parameters, such that

c1 = a11 β1 + a12 β2 + … + a1m βm
c2 = a21 β1 + a22 β2 + … + a2m βm

they are orthogonal if Σ_{i=1}^{m} a1i a2i = 0. If, in addition, Σ_{i=1}^{m} a1i² = 1 and Σ_{i=1}^{m} a2i² = 1, then the contrasts are said to be orthonormal.

Orthogonal matrix: A square matrix that is such that multiplying the matrix by its transpose results in an identity matrix.

Outlier: An observation that appears to deviate markedly from the other members of the sample in which it occurs. In the set of systolic blood pressures (125, 128, 130, 131, 198), for example, 198 might be considered an outlier. Such extreme observations may be reflecting some abnormality in the measured characteristic of a patient, or they may result from an error in the measurement or recording.

P

Paired samples: Two samples of observations with the characteristic feature that each observation in one sample has one and only one matching observation in the other sample. There are several ways in which such samples can arise in
psychological investigations. The first, self-pairing, occurs when each subject serves as his or her own control, as in, for example, therapeutic trials in which each subject receives both treatments, one on each of two separate occasions. Next, natural pairing can arise, particularly, for example, in laboratory experiments involving littermate controls. Lastly, artificial pairing may be used by an investigator to match the two subjects in a pair on important characteristics likely to be related to the response variable.

Paired samples t test: Synonym for matched pairs t test.

Parameter: A numerical characteristic of a population or a model, for example, the probability of a success in a binomial distribution.

Partial correlation: The correlation between a pair of variables after adjusting for the effect of a third. It can be calculated from the sample correlation coefficients of each pair of the variables involved as

r12.3 = (r12 − r13 r23) / √[(1 − r13²)(1 − r23²)]

Pearson's product moment correlation coefficient: An index that quantifies the linear relationship between a pair of variables. For a sample of n observations on two variables, (x1, y1), (x2, y2), …, (xn, yn), it is calculated as

r = Σ(x_i − x̄)(y_i − ȳ) / √[Σ(x_i − x̄)² Σ(y_i − ȳ)²]

The coefficient takes values between −1 and 1.

Per comparison error rate: The significance level at which each test or comparison is carried out in an experiment.

Per-experiment error rate: The probability of incorrectly rejecting at least one null hypothesis in an experiment involving one or more tests or comparisons, when the corresponding null hypothesis is true in each case.

Placebo: A treatment designed to appear exactly like a comparison treatment, but which is devoid of the active component.

Planned comparisons: Comparisons between a set of means suggested before data are collected. Usually more powerful than a general test for mean differences. See also per comparison error rate.
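A direct computational sketch of the correlation coefficient defined above (made-up data with an exact linear relation, so r comes out at its maximum value):

```python
import math

# r = S_xy / sqrt(S_xx * S_yy), the product moment correlation.
def pearson_r(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
print(r)  # -> 1.0
```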
Point-biserial correlation: A special case of Pearson's product moment correlation coefficient, used when one variable is continuous (y) and the other is a binary variable (x) representing a natural dichotomy. It is given by

r_pb = [(ȳ1 − ȳ0) / s_y] √(pq)

where ȳ1 is the sample mean of the y variable for those individuals with x = 1, ȳ0 is the sample mean of the y variable for those individuals with x = 0, s_y is the standard deviation of the y values, p is the proportion of individuals with x = 1, and q = 1 − p is the proportion of individuals with x = 0. See also biserial correlation.

Poisson distribution: The probability distribution of the number of occurrences, x, of some random event in an interval of time or space. Given by

P(x) = e^{−λ} λ^x / x!,  x = 0, 1, 2, …

The mean and variance of a variable with such a distribution are both equal to λ.

Population: In statistics this term is used for any finite or infinite collection of units, which are often people but may be, for example, institutions or events. See also sample.

Power: The probability of rejecting the null hypothesis when it is false. Power gives a method of discriminating between competing tests of the same hypothesis; the test with the higher power is preferred. It is also the basis of procedures for estimating the sample size needed to detect an effect of a particular magnitude.

Probability: The quantitative expression of the chance that an event will occur. This can be defined in a variety of ways, of which the most common is that involving long-term relative frequency:

P(A) = number of times A occurs / number of times A could occur

For example, if out of 100,000 children born in a region 51,000 are boys, then the probability of a boy is estimated as 0.51.

Probability distribution: For a discrete random variable, this is a mathematical formula that gives the probability of each value of the variable. See, for example, binomial distribution and Poisson distribution. For a continuous random
variable, this is a curve described by a mathematical formula that specifies, by way of areas under the curve, the probability that the variable falls within a particular interval. An example is the normal distribution. In both cases the term probability density is also used. (A distinction is sometimes made between density and distribution, when the latter is reserved for the probability that the random variable will fall below some value.)

p value: The probability of the observed data (or data showing a more extreme departure from the null hypothesis) when the null hypothesis is true. See also misinterpretation of p values, significance test, and significance level.

Q

Quasi-experiment: A term used for studies that resemble experiments but are weak on some of the characteristics, particularly in that the allocation of subjects to groups is not under the investigator's control. For example, if interest centered on the health effects of a natural disaster, those who experience the disaster can be compared with those who do not, but subjects cannot be deliberately assigned (randomly or not) to the two groups. See also experimental design.

R

Randomization tests: Procedures for determining statistical significance directly from data, without recourse to some particular sampling distribution. The data are divided (permuted) repeatedly between treatments, and for each division (permutation) the relevant test statistic (for example, a t or F) is calculated, to determine the proportion of the data permutations that provide as large a test statistic as that associated with the observed data. If that proportion is smaller than some significance level α, the results are significant at the α level.

Random sample: Either a set of n independent and identically distributed random variables, or a sample of n individuals selected from a population in such a way that each sample of the same size is equally likely.

Random variable: A variable, the values of which occur according to some specified probability distribution.

Random variation: The variation in a data set unexplained by identifiable sources.

Range: The difference between the largest and smallest observations in a data set. This is often used as an easy-to-calculate measure of the dispersion of a set of observations, but it is not recommended for this task because of its sensitivity to outliers. See also standard deviation.
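The randomization (permutation) idea described under randomization tests can be carried out exactly for small samples; a sketch with invented data, in which the observed split happens to be the most extreme one possible:

```python
from itertools import combinations

# Exact randomization test for the absolute difference in group means:
# every division of the pooled data into groups of the original sizes is
# considered, and the p value is the proportion of divisions giving a
# mean difference at least as large as the observed one.
def randomization_test(a, b):
    pooled = a + b
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    count = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in idx]
        total += 1
        if abs(sum(g1) / len(g1) - sum(g2) / len(g2)) >= observed - 1e-12:
            count += 1
    return count / total

p = randomization_test([1, 2, 3], [7, 8, 9])
print(p)  # -> 0.1
```

Of the 20 ways of dividing the six pooled values into two groups of three, only the observed division and its mirror image give a mean difference as large as the one observed, hence p = 2/20.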
Rank correlation coefficients: Correlation coefficients that depend on only the ranks of the variables, not on their observed values.

Ranking: The process of sorting a set of variable values into either ascending or descending order.

Rank of a matrix: The number of linearly independent rows or columns of a matrix of numbers.

Ranks: The relative positions of the members of a sample with respect to some characteristic.

Reciprocal transformation: A transformation of the form y = 1/x, which is particularly useful for certain types of variables. Resistances, for example, become conductances, and times become speeds.

Regression to the mean: The process first noted by Sir Francis Galton, that "each peculiarity in a man is shared by his kinsmen, but on the average to a less degree." Hence the tendency, for example, for tall parents to produce tall offspring who, on the average, are shorter than their parents.

Research hypothesis: Synonym for alternative hypothesis.

Response variable: The variable of primary importance in psychological investigations, because the major objective is usually to study the effects of treatment and/or other explanatory variables on this variable and to provide suitable models for the relationship between it and the explanatory variables.

Robust statistics: Statistical procedures and tests that still work reasonably well even when the assumptions on which they are based are mildly (or perhaps moderately) violated. Student's t test, for example, is robust against departures from normality.

Rounding: The procedure used for reporting numerical information to fewer decimal places than used during analysis. The rule generally adopted is that excess digits are simply discarded if the first of them is smaller than five; otherwise the last retained digit is increased by one. Thus rounding 127.249341 to three decimal places gives 127.249.

S

Sample: A selected subset of a population chosen by some process, usually with the objective of investigating particular properties of the parent population.
Sample size: The number of individuals to be included in an investigation, usually chosen so that the study has a particular power of detecting an effect of a particular size. Software is available for calculating sample size for many types of study.

Sampling distribution: The probability distribution of a statistic. For example, the sampling distribution of the arithmetic mean of samples of size n taken from a normal distribution with mean μ and standard deviation σ is a normal distribution also with mean μ, but with standard deviation σ/√n.

Sampling error: The difference between the sample result and the population characteristic being estimated. In practice, the sampling error can rarely be determined, because the population characteristic is not usually known. With appropriate sampling procedures, however, it can be kept small, and the investigator can determine its probable limits of magnitude. See also standard error.

Sampling variation: The variation shown by different samples of the same size from the same population.

Saturated model: A model that contains all main effects and all possible interactions between factors. Because such a model contains the same number of parameters as observations, it results in a perfect fit for a data set.

Scatter diagram: A two-dimensional plot of a sample of bivariate observations. The diagram is an important aid in assessing what type of relationship links the two variables.

SE: Abbreviation for standard error.

Semi-interquartile range: Half the difference between the upper and lower quartiles.

Sequential sums of squares: A term encountered primarily in regression analysis for the contributions of variables as they are added to the model in a particular sequence. Essentially, the difference in the residual sum of squares before and after adding a variable.

Significance level: The level of probability at which it is agreed that the null hypothesis will be rejected. It is conventionally set at .05.

Significance test: A statistical procedure that, when applied to a set of observations, results in a p value relative to some hypothesis. Examples include the Student's t test, the z test, and Wilcoxon's signed rank test. See also significance level.
Singular matrix: A square matrix whose determinant is equal to zero; that is, a matrix whose inverse is not defined.

Skewness: The lack of symmetry in a probability distribution. It is usually quantified by the index s given by

s = μ3 / μ2^{3/2}

where μ2 and μ3 are the second and third moments about the mean. The index takes the value zero for a symmetrical distribution. A distribution is said to have positive skewness when it has a long thin tail at the right, and negative skewness when it has a long thin tail to the left.

Split-half method: A procedure used primarily in psychology to estimate the reliability of a test. Two scores are obtained from the same test, either from alternative items, the so-called odd–even technique, or from parallel sections of items. The correlation of these scores, or some transformation of them, gives the required reliability. See also Cronbach's alpha.

Square contingency table: A contingency table with the same number of rows as columns.

Square matrix: A matrix with the same number of rows as columns. Variance–covariance matrices and correlation matrices are statistical examples.

Square root transformation: A transformation of the form y = √x, often used to make random variables suspected to have a Poisson distribution more suitable for techniques such as analysis of variance, by making their variances independent of their means.

Standard deviation: The most commonly used measure of the spread of a set of observations. It is equal to the square root of the variance.

Standard error: The standard deviation of the sampling distribution of a statistic. For example, the standard error of the sample mean of n observations is σ/√n, where σ² is the variance of the original observations.

Standardization: A term used in a variety of ways in psychological research. The most common usage is in the context of transforming a variable by dividing by its standard deviation to give a new variable with a standard deviation of 1.

Standard normal variable: A variable having a normal distribution with mean zero and variance one.
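The standardization just described, applied after centering, yields standard scores; a brief sketch with invented data:

```python
import math

# Transform observations to mean zero and unit (sample) variance.
def standardize(xs):
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return [(x - mean) / sd for x in xs]

# The transformed values have mean 0 and standard deviation 1.
z = standardize([2, 4, 6, 8])
```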
Standard scores: Variable values transformed to zero mean and unit variance.

Statistic: A numerical characteristic of a sample, for example, the sample mean and sample variance. See also parameter.

Student's t distribution: The probability distribution of the ratio of a normal variable with mean zero and standard deviation one to the square root of a chi-squared variable. In particular, the distribution of the variable

t = (x̄ − μ) / (s / √n)

where x̄ is the arithmetic mean of n observations from a normal distribution with mean μ, and s is the sample standard deviation. The shape of the distribution varies with n, and as n gets larger it approaches a standard normal distribution.

Student's t tests: Significance tests for assessing hypotheses about population means. One version is used in situations in which it is required to test whether the mean of a population takes a particular value; this is generally known as a single sample t test. Another version is designed to test the equality of the means of two populations. When independent samples are available from each population, the procedure is often known as the independent samples t test, and the test statistic is

t = (x̄1 − x̄2) / (s √(1/n1 + 1/n2))

where x̄1 and x̄2 are the means of samples of size n1 and n2 taken from each population, and s² is an estimate of the assumed common variance given by

s² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)

where s1² and s2² are the two sample variances. If the null hypothesis of the equality of the two population means is true, then t has a Student's t distribution with n1 + n2 − 2 degrees of freedom, allowing p values to be calculated. In addition to homogeneity of variance, the test assumes that each population has a normal distribution, but is known to be relatively insensitive to departures from this assumption. See also matched pairs t test.

Symmetric matrix: A square matrix that is symmetrical about its leading diagonal; that is, a matrix with elements aij such that aij = aji. In statistics, correlation matrices and covariance matrices are of this form.
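A sketch of the independent samples t statistic defined above, with the pooled variance estimate s² (hypothetical data; the result would be referred to the t distribution on n1 + n2 − 2 degrees of freedom):

```python
import math

# t = (xbar1 - xbar2) / (s * sqrt(1/n1 + 1/n2)), with s^2 pooled.
def two_sample_t(x1, x2):
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((v - m1) ** 2 for v in x1)
    ss2 = sum((v - m2) ** 2 for v in x2)
    s2 = (ss1 + ss2) / (n1 + n2 - 2)   # pooled variance estimate
    return (m1 - m2) / math.sqrt(s2 * (1 / n1 + 1 / n2))

t = two_sample_t([5, 6, 7], [1, 2, 3])
print(round(t, 3))  # -> 4.899
```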
T

Test statistic: A statistic used to assess a particular hypothesis in relation to some population. The essential requirement of such a statistic is a known distribution when the null hypothesis is true.

Tolerance: A term used in stepwise regression for the proportion of the sum of squares about the mean of an explanatory variable not accounted for by other variables already included in the regression equation. Small values indicate possible multicollinearity problems.

Trace of a matrix: The sum of the elements on the main diagonal of a square matrix, usually denoted as tr(A). So, for example, if

A = 1 2
    2 3

then tr(A) = 4.

Transformation: A change in the scale of measurement for some variable(s). Examples are the square root transformation and the logarithmic transformation.

Two-sided test: A test in which the alternative hypothesis is not directional, for example, that one population mean is either above or below the other. See also one-sided test.

Type I error: The error that results when the null hypothesis is falsely rejected.

Type II error: The error that results when the null hypothesis is falsely accepted.

U

Univariate data: Data involving a single measurement on each subject or patient.

U-shaped distribution: A probability distribution or frequency distribution shaped more or less like a letter U, though it is not necessarily symmetrical. Such a distribution has its greatest frequencies at the two extremes of the range of the variable.

V

Variance: In a population, the second moment about the mean. An unbiased estimator of the population value is provided by s², given by

s² = Σ_{i=1}^{n} (x_i − x̄)² / (n − 1)

where x1, x2, …, xn are the n sample observations and x̄ is the sample mean.
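The unbiased variance estimator above, computed directly (made-up data):

```python
# s^2 = sum((x_i - xbar)^2) / (n - 1), the unbiased variance estimator.
def sample_variance(xs):
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

v = sample_variance([2, 4, 6, 8])  # sum of squares 20, divided by 3
```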
Variance–covariance matrix: Synonymous with covariance matrix.

Vector: A matrix having only one row or column.

W

Wilcoxon's signed rank test: A distribution free method for testing the difference between two populations using matched samples. The test is based on the absolute differences of the pairs of observations in the two samples, ranked according to size, with each rank being given the sign of the difference. The test statistic is the sum of the positive ranks.

Z

z scores: Synonym for standard scores.

z test: A test for assessing hypotheses about population means when their variances are known. For example, for testing that the means of two populations are equal, that is, H0: μ1 = μ2, when the variance of each population is known to be σ², the test statistic is

z = (x̄1 − x̄2) / (σ √(1/n1 + 1/n2))

where x̄1 and x̄2 are the means of samples of size n1 and n2 from the two populations. If H0 is true, then z has a standard normal distribution. See also Student's t tests.
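A bare-bones sketch of the signed rank statistic just described (invented pairs; ties among the absolute differences are ignored here, and a complete test would refer the statistic to its null distribution):

```python
# Rank the nonzero paired differences by absolute size, give each rank
# the sign of its difference, and sum the positive ranks.
def signed_rank_statistic(x, y):
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranked = sorted(diffs, key=abs)
    return sum(rank for rank, d in enumerate(ranked, start=1) if d > 0)

w = signed_rank_statistic([10, 12, 9, 14], [9, 9, 11, 9])
print(w)  # -> 8
```

Here the differences are 1, 3, −2, 5, with ranks 1, 3, 2, 4 by absolute size, so the positive ranks 1 + 3 + 4 sum to 8.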
Appendix B

Answers to Selected Exercises

CHAPTER 1

1.2. One alternative explanation is the systematic bias that may be produced by always using the letter Q for Coke and the letter M for Pepsi. In fact, when the Coca-Cola company conducted another study in which Coke was put into both glasses, one labeled M and the other labeled Q, the results showed that a majority of people chose the glass labeled M in preference to the one labeled Q.

1.4. The quotations are from the following people: Florence Nightingale, Lloyd George, Joseph Stalin, W. H. Auden, Mr. Justice Streatfield, and Logan Pearsall Smith.

CHAPTER 2

2.1. The graph in Figure 2.34 commits the cardinal sin of quoting data out of context; remember that graphics often lie by omission, leaving out data sufficient for comparisons. Here a few more data points for other years in the area would be helpful, as would similar data for other areas where stricter enforcement of speeding had not been introduced.
CHAPTER 3

3.1. The term μ in the one-way ANOVA model is estimated by the grand mean of all the observations. The terms αi, i = 1, 2, …, k, are estimated by the deviation of the appropriate group mean from the grand mean.

3.2. Here the simple plot of the data shown in Figure B.1 indicates that the 90° group appears to contain two outliers, and that the observations in the 50° group seem to split into two relatively distinct classes. Such features of the data would, in general, have to be investigated further before any formal analysis was undertaken.

FIG. B.1. Simple plot of data from the ergocycle study (x axis: knee-joint angle in degrees, 50°, 70°, 90°).

The ANOVA table for the data is as follows.

Source          SS      DF  MS     F      p
Between angles  90.56   2   45.28  11.52  .0002
Within angles   106.11  27  3.93

Because the groups have a clear ordering, the between group variation might be split into components, each with one degree of freedom, representing variation that is due to linear and quadratic trends.

3.3. The ANOVA table for the discharge anxiety scores is as follows.

Source           SS     DF  MS    F     p
Between methods  8.53   2   4.27  2.11  .14
Within methods   54.66  27  2.02
however.82 0. you ignore then the analysis of covarianceresults follows. with the addition of the regression line of final on initial for each method. f B2.82 1 6.40 The analysis of variance for the difference.98 9.2. Consequently.+B z ( z~+ X) ~Z) 4. The line for method 2 appears to have a rather different slope from the other two lines.oooO Within methods 49.039 .44 yij = CL + a i + Bl(Xij .360 APPENDIX B 2 1 I I 30 IniUel anvlely m 92 34 FIG.75 6.14 2 13.468 . a e as r SS DF MS F p Source Initial anxiety 6.83 A suitable model that allows both age and initial anxiety covariates is be to where x and z represent initial anxiety and from this model are follows.40 26 1. respectively.20 . is as follows.54 Eijk.036 Betweenmethods 19.72 0. Plot of final versus initial anxiety scores for wisdom tooth extraction data.82 4.W3 2 Within methods 36. not I . age. The results ANCOVA F p .87 . as Source SS DF MS 1 6.13 .51 9.08 25 1.40 27 1. final score initial score. Source SS DF MS F P Betweenmethods 48.99 7. 8. Figure B2 shows a plot of final anxiety score against initial anxiety score.78 Age Between methods 19.29 24.78 1 0.W . an analysis of covariance may be justified.82 Initial anxiety 6.76 2 Withinmethods 36.
ANSWERS TO SELECTED EXERCISES 361

CHAPTER 4

4.1. The 95% confidence interval is given by

(x̄1 − x̄2) ± t12 s √(1/n1 + 1/n2),

where x̄1 and x̄2 are the mean weight losses for novice and experienced slimmers both using a manual, t12 is the appropriate value of Student's t with 12 degrees of freedom, s is the square root of the error mean square from the ANOVA of the data, and n1 and n2 are the sample sizes in the two groups. Applying this formula to the numerical values gives [6.36 − (1.55)] ± 2.18 × s √(1/n1 + 1/n2), leading to the interval [… , …].

4.2. Try a logarithmic transformation.

4.3. The separate analyses of variance for the three drugs give results in which, for each drug, the sources are Diet (D), Biofeed (B), D × B, and Error, with 1, 1, 1, and 20 degrees of freedom, respectively. The separate analyses give nonsignificant results for the diet × biofeed interaction, although for drugs X and Y the interaction terms approach significance at the 5% level. The three-way analysis given in the text demonstrates that the nature of this interaction is different for each drug.
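The interval formula in 4.1 is simple to compute directly. In this sketch the two means and the t value are taken from the answer above, but the error mean square and the sample sizes are hypothetical stand-ins (chosen so that the error degrees of freedom, n1 + n2 − 2, equal 12, matching the t value used).

```python
from math import sqrt

# Means from the answer above; error mean square and group sizes are
# made-up illustrative values, not the book's.
mean1, mean2 = 6.36, 1.55
error_mean_square = 4.0
n1 = n2 = 7                 # gives 12 error degrees of freedom
t_crit = 2.18               # Student's t with 12 df, 5% two-sided

s = sqrt(error_mean_square)
half_width = t_crit * s * sqrt(1 / n1 + 1 / n2)
ci = (mean1 - mean2 - half_width, mean1 - mean2 + half_width)
print(ci)
```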
362 APPENDIX B

4.4. The interaction parameter corresponding to a particular cell is estimated as (cell mean − row mean − column mean + grand mean). The main effect parameters are estimated by the difference between a row (column) mean and the grand mean.

4.5. An ANCOVA of the data gives the following results.

Source              SS   DF   MS   F    p
Pretreatment 1      …    1    …    …    …
Pretreatment 2      …    1    …    …    …
Between treatments  …    1    …    …    …
Error               …    17   …

CHAPTER 5

5.1. The ANOVA table for the data is as follows.

Source              SS          DF   MS
Between subjects    1399155.00  15   93277.00
Between electrodes  281575.43   4    70393.86
Error               1342723.39  60   22378.72

Thus, using the nomenclature introduced in the text, we have

σ̂A² = (93277.00 − 22378.72)/5 = 14179.66,
σ̂B² = (70393.86 − 22378.72)/16 = 3000.95.

So the estimate of the intraclass correlation coefficient is

R = 14179.66/(14179.66 + 3000.95 + 22378.72) = 0.358.

A complication with these data is the two extreme readings on subject 15. The reason given for such extreme values was that the subject had very hairy arms! Repeating the ANOVA after removing this subject gives the following results.

Source              SS         DF   MS
Between subjects    852509.70  14   60893.55
Between electrodes  120224.10  4    30056.03
Error               639109.39  56   11412.67

The intraclass correlation coefficient is now 0.439.
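The intraclass correlation calculation in 5.1 can be checked directly from the mean squares quoted above. A minimal sketch, assuming 16 subjects and 5 electrodes as in the answer:

```python
# Mean squares from the electrode ANOVA above; n = 16 subjects,
# k = 5 electrodes.
ms_subjects, ms_electrodes, ms_error = 93277.00, 70393.86, 22378.72
n_subjects, n_electrodes = 16, 5

# Variance component estimates, as in the text's nomenclature.
var_subjects = (ms_subjects - ms_error) / n_electrodes
var_electrodes = (ms_electrodes - ms_error) / n_subjects

icc = var_subjects / (var_subjects + var_electrodes + ms_error)
print(round(icc, 3))   # → 0.358
```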
ANSWERS TO SELECTED EXERCISES 363

CHAPTER 6

6.1. Introduce two dummy variables x1 and x2, now defined as follows.

Group  x1  x2
1      1   0
2      0   1
3      0   0

A multiple regression using these two variables gives the same ANOVA table as that in the corresponding display in Chapter 6, but the regression coefficients are now the differences between a group mean and the mean of Group 3.

6.2. Figure B3 shows plots of the data with the following models fitted: (a) vocabulary size = β0 + β1age; (b) vocabulary size = β0 + β1age + β2age²; and (c) vocabulary size = β0 + β1age + β2age² + β3age³. It appears that the cubic model provides the best fit. Examine the residuals from each model to check.

CHAPTER 7

7.1. One might imagine that a possible model here is p = exp(−βt), that is, a geometric loss of memory. Such a model should result in linearity in a plot of log p versus t. Figure B4 shows plots of (a) p versus t, (b) log p versus t, and (c) p versus log t. The linearity required by the geometric model plainly does not happen in plot (b). A model that does result in linearity is p = β0 + β1 log t, as plot (c) shows, although this model is more difficult to explain.

7.2. Using the maximum score as the summary measure for the lecithin trial data and applying a t test gives these results: t = 2.68 with 45 degrees of freedom and an associated p value of .0102. A 95% confidence interval is (… , …).

7.3. Fitting the specified model, with the Placebo group now coded as −1 and the Lecithin group as 1, gives the following results.

Fixed effects
Parameter      Estimate  SE   t   p
Intercept      …         …    …   …
Visit          …         …    …   …
Group          …         …    …   …
Group × Visit  …         …    …   …

Random effects: σ̂a = … , σ̂b = … , σ̂ab = … , σ̂c = …
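The dummy-variable coding in 6.1 can be illustrated with made-up data. With x1 = 1 only for Group 1 and x2 = 1 only for Group 2, least squares reproduces each group mean exactly: the intercept is the Group 3 mean, and each slope is that group's difference from Group 3.

```python
from statistics import mean

# Hypothetical responses for the three groups (not the book's data).
y = {1: [10, 12, 11], 2: [14, 15, 16], 3: [8, 9, 10]}

# For this coding the least-squares solution has a closed form:
b0 = mean(y[3])          # intercept = mean of reference Group 3
b1 = mean(y[1]) - b0     # coefficient of x1 = group 1 mean - group 3 mean
b2 = mean(y[2]) - b0     # coefficient of x2 = group 2 mean - group 3 mean

# The fitted value for any observation is simply its group mean.
fitted = {g: b0 + (b1 if g == 1 else 0) + (b2 if g == 2 else 0) for g in y}
print(b0, b1, b2)
```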
364 APPENDIX B

FIG. B3. Plots of vocabulary size against age, with the fitted linear, quadratic, and cubic models.
ANSWERS TO SELECTED EXERCISES 365

FIG. B4. Plots of the retention data: (a) p versus t; (b) log p versus t; (c) p versus log t.
366 APPENDIX B

CHAPTER 8

8.1. Spearman's correlation coefficient is 0.806, with z = 2.44 and an associated p value of .015 for testing for independence. Kendall's tau is 0.600, with an associated p value of .016 for testing for independence. Pearson's correlation coefficient is 0.717, with t = … and an associated p value of .020 for testing that the population correlation is zero.

8.2. The signed rank statistic takes the value 40, with n = 9. The associated p value is .039.

8.3. The results obtained by the author for various numbers of bootstrap samples are as follows.

N     95% Confidence Interval
200   (… , …)
400   (… , …)
800   (… , …)
1000  (… , …)

CHAPTER 9

9.1. The chi-squared statistic is …, which with a single degree of freedom has an associated p value of ….

9.2. The required confidence interval is (… , 0.064).

9.3. An exact test for the data gives a p value of less than .000001.

9.8. Here, because of the paired nature of the data, McNemar's test should be used. The test statistic is 4.08, with an associated p value of less than .05. The mothers' ratings appear to have changed, with a greater proportion of children rated as improved in year 2 than rated as not doing as well.

CHAPTER 10

10.1. The fitted model is

ln(odds of being severely hurt) = 0.9401 + 0.3367(weight) + 1.030(ejection) + 1.639(type),

where weight is 0 for small and 1 for standard, ejection is 0 for not ejected and 1 for ejected, and type is 0 for collision and 1 for rollover. An investigation of a number of logistic regression models, including ones containing interactions between the three explanatory variables, shows that the model including each of the explanatory variables provides an adequate fit to the data. The 95% confidence
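The dependence of a bootstrap interval on the number of bootstrap samples, as in 8.3, can be sketched with a simple percentile interval for a mean. The data and the function name here are invented for illustration; they are not from the book.

```python
import random
from statistics import mean

random.seed(1)
# Hypothetical sample (not the book's data).
data = [1.2, 1.4, 1.5, 1.7, 1.8, 2.0, 2.3, 2.6]

def percentile_ci(sample, n_boot, alpha=0.05):
    # Resample with replacement, compute the mean of each resample,
    # and read the interval off the sorted bootstrap means.
    stats = sorted(
        mean(random.choices(sample, k=len(sample))) for _ in range(n_boot)
    )
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

# The interval settles down as the number of bootstrap samples grows.
for n_boot in (200, 1000):
    print(n_boot, percentile_ci(data, n_boot))
```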
ANSWERS TO SELECTED EXERCISES 367

intervals for the conditional odds ratios of each explanatory variable are as follows: weight, (… , …); ejection, (… , …); type, (4.38, 6.06). Thus, for example, the odds of being severely hurt in a rollover accident are between 4.38 and 6.06 times those for a collision.

10.2. With four variables there is a host of models to deal with. A good way to begin is to compare the following three models: (a) the main effects model, that is, Brand + Prev + Soft + Temp; (b) all first-order interactions, that is, model (A) above + Brand.Prev + Brand.Soft + Brand.Temp + Prev.Soft + Prev.Temp + Soft.Temp; and (c) all second-order interactions, that is, model (B) above + Brand.Prev.Soft + Brand.Prev.Temp + Brand.Soft.Temp + Prev.Soft.Temp. The likelihood ratio goodness-of-fit statistics for these models are as follows.

Model  LR Statistic  DF
A      42.85         18
B      9.74          9
C      0.66          2

Model A does not describe the data adequately, but models B and C do. Clearly, a model more complex than A, but possibly simpler than B, is needed. Consequently, progress in searching for the best model might be made either by forward selection of interaction terms to add to A or by backward elimination of terms from B.

FIG. B5. Linear and logistic models fitted for the probability of menstruation (age, 11-17).
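The likelihood ratio statistic used to compare the log-linear models in 10.2 has the general form G² = 2 Σ O ln(O/E), with O the observed and E the expected counts under the model being assessed. A minimal sketch for the simplest case — independence in a two-way table with made-up counts:

```python
from math import log

# Made-up 2x2 table of counts (not the detergent data).
table = [[30, 10], [20, 40]]

row_tot = [sum(r) for r in table]
col_tot = [sum(c) for c in zip(*table)]
n = sum(row_tot)

# G^2 = 2 * sum O * ln(O / E), with E the expected count under
# independence: (row total * column total) / n.
g2 = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        exp = row_tot[i] * col_tot[j] / n
        g2 += 2 * obs * log(obs / exp)
print(round(g2, 2))
```

The statistic is referred to a chi-squared distribution whose degrees of freedom equal the number of cells minus the number of independent parameters fitted — which is how the 18, 9, and 2 in the table above arise for models A, B, and C.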
368 APPENDIX B

10.7. The plot of the probability of menstruation, with the fitted linear and logistic regressions, is shown in Figure B5. The estimated parameters (and their standard errors) for the two models are as follows.

Model     Parameter  Estimate  SE     t
Linear    Intercept  −2.07     0.257  …
          Age        0.20      0.019  …
Logistic  Intercept  −19.78    0.848  −23.32
          Age        1.52      0.065  23.38
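The contrast between the linear and logistic fits in Figure B5 comes down to the two functional forms: the straight line can leave the [0, 1] range, whereas the logistic curve cannot. The coefficients below are illustrative only — roughly the magnitudes reported above, but not the book's estimates.

```python
from math import exp

# Illustrative coefficients, not the fitted values from the text.
A_LOGIT, B_LOGIT = -20.0, 1.55
A_LIN, B_LIN = -2.0, 0.20

def linear_p(age):
    # Linear model: can fall below 0 or exceed 1.
    return A_LIN + B_LIN * age

def logistic_p(age):
    # Logistic model: always strictly between 0 and 1.
    return 1 / (1 + exp(-(A_LOGIT + B_LOGIT * age)))

for age in (10, 13, 17):
    print(age, round(linear_p(age), 2), round(logistic_p(age), 3))
```

At age 17 the linear prediction exceeds 1, while the logistic prediction approaches 1 smoothly — the point the figure makes graphically.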
Index

A
Adjusted group mean, 81
Adjusted value, 130
Advantages of graphical presentation, 22
Akaike information criterion (AIC), 225
All subsets regression, see variable selection methods in regression
Analysis of covariance, 78-84
  in factorial design, 126-127
  in longitudinal studies, 215-218
  for repeated measure designs, 143-150
Analysis of variance
  multivariate (MANOVA), 84-90
  one-way, 65-70
ANCOVA, see analysis of covariance
ANOVA, see analysis of variance
Assumptions of F test, 72-73
Avoiding graphical distortion, 40-45

B
Backward selection, see variable selection methods in regression
Balanced designs, 116
Bar chart, 23
Baseline measurement, 213
Benchmarks for evaluation of kappa values, 288
Between groups sum of squares, 66
Between subject factor, 134
Bimodal distribution, 24
Bonferroni correction, 72
Bonferroni test, 72-73
Bootstrap, 257-260
Bootstrap distribution, 258
Box plots, 40-41
Bracket notation for log linear models, 304
Bubble plot, 34-35

C
Calculating correction factors for repeated measures, 141
Case-control study, 9
Categorical data, 267
Challenger space shuttle, 55, 57
Chance agreement, 285
Change score analysis in repeated measures, 139-141
Chi-squared test, 269-270
Compound symmetry, 142-143
Computationally intensive methods, 253-260
Computers and statistical software, 16-17
Concomitant variables, 76
Conditional independence, 296
Conditioning plots, see Coplots
Confidence interval for Wilcoxon-Mann-Whitney test, 245
Confidence interval for Wilcoxon signed rank test, 244
Confidence intervals, 15, 241
Contingency tables
  chi-squared test for independence in, 269-270
  r × c table, 271-275
  residuals for, 275-277
  three-dimensional, 293-300
  two-by-two, 270
  two-dimensional, 268-275
Cook's distance, 194
Coplots, 80
Correction factors for repeated measures, 141-143
  Greenhouse and Geisser correction factor, 142
  Huynh and Feldt correction factor, 142
Correlation coefficients
  Kendall's tau, 250
  Pearson's product moment, 44-46, 249
  Spearman's rank, 250-252
Correspondence analysis, 277-280
Covariance matrix, 143

D
Data sets
  alcohol dependence and salsolinol excretion
  anxiety scores during wisdom teeth extraction
  applications for Berkeley college
  blood pressure and biofeedback, 110
  bodyweight of rats
  brain mass and body mass for 62 species of animal
  caffeine and finger tapping, 27
  car accident data, 324
  cerebral tumours, 290
  children's school performance, 292
  consonants correctly identified, 248
  crime in the USA, 182-183
  crime rates for drinkers and abstainers, 23
  Danish do-it-yourself, 294
  days away from school, 117
  depression scores, 61
  detergents data
  diagnosis and drugs, 269
  estrogen patches in treatment of postnatal depression, 268
  fecundity of fruit flies, 64
  field dependence and a reverse Stroop task, 98
  firefighters entrance exam, 270
  gender and belief in the afterlife, 281
  GHQ data, 290
  hair colour and eye colour
  height, weight, sex and pulse rate, 291
  heights and ages of married couples, 44-45
  human fatness, 177
  ice cream sales, 60
  improvement scores, 42
  information about 10 states in the USA
  janitor's and banker's ratings of socioeconomic status
  knee joint angle and efficiency of cycling, 95
  length guesses, 93
  marriage and divorce rates, 60
  maternal behaviour in rats, 206
  measurements for schizophrenic patients, 285
  membership of the W— Swimming Club, 265
  memory retention
  menstruation, 294
  mortality rates from male suicides, 325
  oral vocabulary size of children at various ages
  organization and memory, 162
  pairs of depressed patients, 256
  postnatal depression and child's cognitive development, 118
  proportion of degrees in science and engineering
  psychotherapeutic attraction, 247
  quality of children's testimonies, 295
  quitting smoking experiment, 220
  racial equality and the death penalty
  rat data, 101
  scores in a synchronized swimming competition
  sex and diagnosis
  skin resistance and electrode type, 154
  slimming data
  smoking and performance, 107
  social skills data, 126
  statistics students, 85
  stressful life events
  suicidal feelings, 240
  suicide by age and sex, 271
  survival of infants and amount of care, 294
  test scores of dizygous twins, 296
  time to completion of examination scores, 264
  treatment of Alzheimer's disease, 213
  university admission rates, 274
  verdict in rape cases
  visits to emergency room, 247
  visual acuity and lens strength, 138
  what people think about the European community, 40
  WISC blocks data, 159
Deletion residual, 194
Dependence panel
Deviance, 311
Deviance residual, 319
Dichotomous variable, 14
Dot notation, 68
Dot plot, 23-25
Draughtsman's plot, synonymous with scatterplot matrix
Dummy variable

E
Enhancing scatterplots, 35-40
Exact p values, 253
Expected mean squares, 138
Experiments, 10
Explanatory variable, 14
Exponential distribution, 47

F
Factor variable, 14
Factorial design, 97-101
Fisher's exact test, 270-271
Fitted marginals, 304
Fitted values, 168
Five-number summary, 32
Forward selection, see variable selection methods in regression
Friedman test, 247-249

G
Graphical deceptions and graphical disasters, 48-57
Graphical display, 21-22
Graphical misperception, 58

H
Hat matrix, 168
Histograms, 25-31
Homogeneity of variance, 72
Hotelling's T² test, 137
Huynh and Feldt correction factor, see correction factors
Hypergeometric distribution, 271

I
Idealized residual plots, 168
Independent variable, 14
Index plot, 194
Inflating the Type 1 error, 72
Initial examination of data, 2
Interaction plot, 110
Interactions, 101-110
  first order, 110
  second order, 110
Interquartile range, 34
Intraclass correlation coefficient, 150-157

K
Kappa statistic, 283-288
  large sample variance, 287
Kelvin scale, 12
Kruskal-Wallis test, 245-247

L
Latent variable
Latin square, 128
Least squares estimation, 163
Lie factor, 52
Likelihood ratio statistic, 297
Locally weighted regression, 193
Logistic regression, 306-317
Logistic transformation, 253
Log-linear models, 300-306
Longitudinal data, 209
Longitudinal designs, 12

M
Mallows Ck statistic, 185-187
MANOVA, see analysis of variance
Marginal totals, 270
McNemar's test, 245-247
Measurement scales, 12-13
  interval scales, 12
  nominal scales, 13
  ordinal scales, 12
  ratio scales, 12
Method of difference, 15
Misjudgement of size of correlation, 49
Missing observations, 219
Models
  fixed effects, 68
  hierarchical, 302-303
  linear, 67
  logistic, 306-317
  log-linear, 300-306
  main effect, 102
  minimal, 304
  mixed effects, 68
  multiplicative, 67
  random effects, 68
  saturated, 303
Multicollinearity, 185
Multidimensional contingency tables, 293
Multilevel modelling, 219-233
Multiple comparison techniques, 70-75
Multiple correlation coefficient, 172
Multiple regression and analysis of variance, equivalence of, 112-116
Multivariate analysis of variance, see analysis of variance
Mutual independence, 296

N
Non-overlapping sums of squares, 120-121
Normal distributions, 136
Normal probability plotting, 46-48

O
Observational studies, 9
Occam's razor, 8
Odds ratio, 273-274
One-way design, 65-70
Orthogonal polynomials, 75-76
Outside values, 34
Overparameterized model, 67

P
Parsimony, 8
Partial F test, 187
Partial independence, 296-297
Pearson residuals, 276
Permutation distribution, 241
Permutation tests, 253-257
Pie chart, 22-24
Pillai statistic, 128
Planned comparisons, 75-76
Pooling sums of squares, 75-76
Post hoc comparisons, 76-78
Power, 18
Power curves, 19
Prediction using simple linear regression, 165
Predictor variables, 14
Probability of falsely rejecting the null hypothesis, 65
Probability plotting, 46-48
Prospective study, 9
Psychologically relevant difference, 18
p-value, 15

Q
Quasi-experiments, 10

R
raison d'être of statistics and statisticians, 2
Random allocation, 11
Random intercepts and slopes model, 226
Random intercepts model, 227
Regression
  automatic model selection in, 187-191
  diagnostics, 191-193
  distribution-free, 252-253
  generalized linear, 317-321
  logistic, 306-317
  log-linear, 322
  multiple linear, 169-183
  residuals, 194
  simple linear, 162-169
  with zero intercept, 167-169
Regression to the mean, 215
Relationship between intraclass correlation coefficient and product moment correlation coefficient, 158
Residuals
  adjusted, 276
  deletion, 194
  standardized, 194
Response feature analysis, 210-218
Response variable, 14
Risk factor, 9

S
Sample size, determination of, 17-19
Scatterplot matrix, 35-40
Scatterplots, 32-35
Scheffé test, 73-75
Second-order relationship, 297
Sequential sums of squares, 121
Shrinking family doctor, 53
Significance testing, 15
Simple effects, 121-122
Sparse data in contingency tables, 280-283
Sphericity, 139-141
Spline smoothers, 46
S-PLUS, 17
  aov, 92
  Bootstrap, 263
  chisq.test, 289
  Compare samples, 262
  coplots, 262
  Counts and proportions, 262
  fisher.test, 289
  Friedman Rank Test, 262
  friedman.test, 262
  glm, 322
  graphics palettes, 59
  help, 59
  Kruskal-Wallis Rank Test, 262
  kruskal.test, 262
  lm, 205
  lme, 234
  mcnemar.test, 289
  Mixed effects, 234
  outer, 92
  pie, 59
  Resample, 263
  Signed Rank Test, 262
  t.test, 262
  trellis graphics, 56-57, 262
  Wilcoxon Rank Test, 262
  wilcox.test, 262
SPSS, 17
  bar charts, 58
  Chi-square test, 289
  Crosstabs, 289
  General linear model, 92
  GLM-Multivariate, 91-92
  GLM-Repeated measures, 92
  graph menu, 58-59
  Mann-Whitney U, 262
  McNemar, 289
  Nonparametric tests, 262
Standard normal distribution, 47
Standardized regression coefficients, 173-175
Standardized residual, 194
Stem-and-leaf plots, 32-33
Students' t-tests, 63-65
Summary measures, use of, 210-219
Surveys, 9

T
Testing for independence in a contingency table, 270
Testing mutual independence in a three-dimensional contingency table, 298
Testing partial independence in a three-dimensional contingency table, 298
Three-dimensional contingency tables, 293-300
Transforming data, 44-48
Trellis graphics, 56-57
Trend analysis, 75-76
Two-by-two contingency table, 270
Two-dimensional contingency tables, 268-275
Type I sums of squares, see sequential sums of squares
Type II sums of squares, see unique sums of squares

U
Unbalanced designs, 116-126
Unique sums of squares, 121

V
Variable selection methods in regression, 183-191
  all subsets, 187-191
  backward selection, 187-191
  forward selection, 187-191
  stepwise selection, 187-191
Variance inflation factor, 194-197

W
Weighted kappa, 288
Wilcoxon-Mann-Whitney test, 238-242
Wilcoxon's signed ranks test, 243-245
Within groups sum of squares, 66
Within subject factor, 133

Y
Yates's continuity correction, 270