
STATISTICS FOR PSYCHOLOGISTS

An Intermediate Course

Institute of Psychiatry, King's College, University of London

Brian S. Everitt

To Connie and Vera

Copyright © 2001 by Lawrence Erlbaum Associates, Inc. All rights reserved. No part of this book may be reproduced in any form, by photostat, microfilm, retrieval system, or any other means, without prior written permission of the publisher. Lawrence Erlbaum Associates, Inc., Publishers, 10 Industrial Avenue, Mahwah, NJ 07430

Cover design by Kathryn Houghtaling Lacey


Library of Congress Cataloging-in-Publication Data

Everitt, Brian. Statistics for psychologists : an intermediate course / Brian S. Everitt. p. cm. Includes bibliographical references and index. ISBN 0-8058-3836-8 (alk. paper) 1. Psychology-Statistical methods. I. Title. BF39 .E933 2001 519.5024154~21

00-065400

Books published by Lawrence Erlbaum Associates are printed on acid-free paper, and their bindings are chosen for strength and durability.
Printed in the United States of America 10 9 8 7 6 5 4 3 2 1

Contents

Preface
1 Statistics in Psychology: Data, Models, and a Little History
2 Graphical Methods of Displaying Data
3 Analysis of Variance I: The One-Way Design
4 Analysis of Variance II: Factorial Designs
5 Analysis of Repeated Measure Designs
6 Simple Linear Regression and Multiple Regression Analysis
7 Analysis of Longitudinal Data
8 Distribution-Free and Computationally Intensive Methods
9 Analysis of Categorical Data I: Contingency Tables and the Chi-Square Test
10 Analysis of Categorical Data II: Log-Linear Models and Logistic Regression
Appendix A Statistical Glossary
Appendix B Answers to Selected Exercises
References
Index

Preface

Psychologists (and even those training to be psychologists) need little persuading that a knowledge of statistics is important in both the design of psychological studies and in the analysis of data from such studies. As a result, almost all undergraduate psychology students are exposed to an introductory statistics course during their first year of study at college or university. The details of the topics covered in such a course will no doubt vary from place to place, but they will almost certainly include a discussion of descriptive statistics, correlation, simple regression, basic inference and significance tests, p values, and confidence intervals. In addition, in an introductory statistics course, some nonparametric tests may be described and the analysis of variance may begin to be explored.

Laudable and necessary as such courses are, they represent only the first step in equipping students with enough knowledge of statistics to ensure that they have a reasonable chance of progressing to become proficient, creative, influential, and even, perhaps, useful psychologists. Consequently, in their second or third years (and possibly in later postgraduate courses), many psychology students will be encouraged to learn more about statistics and, equally important, how to apply statistical methods in a sensible fashion. It is to these students that this book is aimed, and it is hoped that the following features of the text will help it reach its target.


1. The central theme is that statistics is about solving problems; data relevant to these problems are collected and analyzed to provide useful answers. To this end, the book contains a large number of real data sets arising from real problems. Numerical examples of the type that involve the skiing activities of belly dancers and politicians are avoided as far as possible.
2. Mathematical details of methods are largely confined to the displays. For the mathematically challenged, the most difficult of these displays can be regarded as black boxes and conveniently ignored. Attempting to understand the input to and output from the black box will be more important than understanding the minutiae of what takes place within the box. In contrast, for those students with less anxiety about mathematics, study of the relevant mathematical material (which on occasion will include the use of vectors and matrices) will undoubtedly help in their appreciation of the corresponding technique.
3. Although many statistical methods require considerable amounts of arithmetic for their application, the burden of actually performing the necessary calculations has been almost entirely removed by the development and wide availability of powerful and relatively cheap personal computers and associated statistical software packages. It is assumed, therefore, that most students will be using such tools when undertaking their own analyses. Consequently, arithmetic details are noticeable largely by their absence, although where a little arithmetic is considered helpful in explaining a technique, then it is included, usually in a table or a display.
4. There are many challenging data sets both in the text and in the exercises provided at the end of each chapter. (Answers, or hints to providing answers, to many of the exercises are given in an appendix.)
5. A (it is hoped) useful glossary is provided to both remind students about the most important statistical topics they will have covered in their introductory course and to give further details of some of the techniques covered in this text.
6. Before the exercises in each chapter, a short section entitled Computer hints is included. These sections are intended to help point the way to how to undertake the analyses reported in the chapter by using a statistical package. They are not intended to provide detailed computer instructions for the analyses. The book concentrates largely on two packages, SPSS and S-PLUS. (Details are to be found on www.spss.com and www.mathsoft.com. A useful reference for the SPSS software is Morgan and Griego, 1998, and for S-PLUS, Krause and Olson, 2000.) The main reason for choosing the former should be obvious: the package is widely used by working psychologists and psychology students and is an extremely useful tool for undertaking a variety of analyses. But why S-PLUS, when, at present at least, the package is not a favorite of psychologists? Well, there are two reasons: the first is that the author is an enthusiastic (although hardly an expert) user of the software; the second is that he sees S-PLUS as the software of the future. Admittedly, S-PLUS requires more initial effort to learn, but the reward for such investment is the ability to undertake both standard and nonstandard analyses routinely, with the added benefit of superb graphics. This is beginning to sound like a commercial, so perhaps I will stop here! (Incidentally, for those readers who do not have access to the S-PLUS package but would like to try it out, there is a free cousin, R, details of which are given on www.cran.us.r-project.org.)

It is hoped that the text will be useful in a number of different ways, including the following.
1. As the main part of a formal statistics course for advanced undergraduate and postgraduate psychology students and also for students in other areas of the behavioral sciences. For example, Chapters 2-7 could form the basis of a course on linear models, with, perhaps, Chapter 10 being used to show how such models can be generalized.
2. As a supplement to an existing course; Chapters 8 and 9, for example, could be used to make students more aware of the wider aspects of nonparametric methods and the analysis of categorical data.
3. For self-study.
4. For quick reference.

I would like to thank Maria Konsolaki for running some SPSS examples for me, and my secretary, Harriet Meteyard, for her usual superb typing and general helpful advice in preparing the book.

Brian S. Everitt
London, June 2000

Statistics in Psychology: Data, Models, and a Little History

1.1. INTRODUCTION

Psychology is a uniquely diverse discipline, ranging from biological aspects of behaviour on the one hand to social psychology on the other, and from basic research to the applied professions of clinical, counselling, educational, industrial, organizational and forensic psychology.

-Andrew M. Colman, Companion Encyclopedia of Psychology, 1994.

As Dr. Colman states, psychology is indeed a diverse and fascinating subject, and many students are attracted to it because of its exciting promise of delving into many aspects of human behavior, sensation, perception, cognition, emotion, and personality, to name but a few. It probably comes as a disagreeable shock to many such students that they are often called on, early in their studies, to learn about statistics, because the subject (and, sadly, its practitioners, statisticians) are mostly seen as anything but exciting, and the general opinion seems to be that both should be avoided as far as possible.

However, this head in the sand attitude toward statistics taken by many a psychology student is mistaken. Both statistics and statisticians can be very exciting (!), and anyway, a knowledge and understanding of the subject is essential at each stage in the often long, and frequently far from smooth, road from planning an investigation to publishing its results in a prestigious psychology journal. Statistical principles will be needed to guide the design of a study and the collection of data. The initial examination of data and their description will involve a variety of informal statistical techniques. More formal methods of estimation and significance testing may also be needed in building a model for the data and assessing its fit.

Nevertheless, most psychologists are not, and have no desire to be, statisticians (I can't think why). Consequently, questions have to be asked about what and how much statistics the average psychologist needs to know. It is generally agreed that psychologists should have some knowledge of statistics, at least, and this is reflected in the almost universal exposure of psychology students to an introductory course in the subject, covering topics such as descriptive statistics (histograms, means, variances, and standard deviations); elementary probability; the normal distribution; inference (t tests, chi-squared tests, confidence intervals, and so on); correlation and regression; and simple analyses of variance.

Although such a course often provides essential grounding in statistics, it frequently gives the wrong impression of what is and what is not of greatest importance in tackling real life problems. For many psychology students (and their teachers), for example, a p value is still regarded as the holy grail and almost the raison d'être of statistics (and providing one, the chief role of statisticians). Despite the numerous caveats issued in the literature, many psychologists still seem determined to experience joy on finding a p value of .049 and despair on finding one of .051. Again, psychology students may, on their introductory course, learn how to perform a t test (they may, poor things, even be made to carry out the arithmetic themselves), but they may still be ill equipped to answer the question, How can I summarize and understand the main features of this set of data?

The aim of this text is essentially twofold. First, it will introduce the reader to a variety of statistical techniques not usually encountered in an introductory course; second, and equally, if not more importantly, it will attempt to transform the knowledge gained in such a course into a more suitable form for dealing with the complexities of real data. Readers will, for example, be encouraged to replace the formal use of significance tests and the rigid interpretation of p values with an approach that regards such tests as giving informal guidance on possible evidence of an interesting effect. Readers may even be asked to abandon the ubiquitous significance test altogether in favor of, for example, a graphical display that makes the structure in the data apparent without any formal analyses. By building on and reshaping the statistical knowledge gained in their first-level course, students will be better equipped (it is hoped) to overcome the criticisms of much current statistical practice implied in the following quotations from two British statisticians:

Most real-life statistical problems have one or more nonstandard features. There are no routine statistical questions; only questionable statistical routines.

-Sir David Cox.

Many statistical pitfalls lie in wait for the unwary. Indeed statistics is perhaps more open to misuse than any other subject, particularly by the nonspecialist. The misleading average, the graph with fiddled axes, the inappropriate p-value and the linear regression fitted to nonlinear data are just four examples of horror stories which are part of statistical folklore.

-Christopher Chatfield.

1.2. STATISTICS DESCRIPTIVE, STATISTICS INFERENTIAL, AND STATISTICAL MODELS


This text will be primarily concerned with the following overlapping components of the statistical analysis of data.
1. The initial examination of the data, with the aim of making any interesting patterns in the data more visible.
2. The estimation of parameters of interest.
3. The testing of hypotheses about parameters.
4. Model formulation, building, and assessment.

Most investigations will involve little clear separation between each of these four components, but for now it will be helpful to try to indicate, in general terms, the unique parts of each.

1.2.1. The Initial Examination of Data

The initial examination of data is a valuable stage of most statistical investigations, not only for scrutinizing and summarizing data, but often also as an aid to later model formulation. The aim is to clarify the general structure of the data, obtain simple descriptive summaries, and perhaps get ideas for a more sophisticated analysis. At this stage, distributional assumptions might be examined (e.g., whether the data are normal), possible outliers identified (i.e., observations very different from the bulk of the data that may be the result of, for example, a recording error), relationships between variables examined, and so on. In some cases the results from this stage may contain such an obvious message that more detailed analysis becomes largely superfluous. Many of the methods used in this preliminary analysis of the data will be graphical, and it is some of these that are described in Chapter 2.
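
As a rough illustration of this stage, the following Python sketch computes a few descriptive summaries and flags possible outliers. The scores are invented for illustration, and the 1.5 × IQR rule used to flag outliers is an assumption of this sketch (a common convention), not a method prescribed in the text.

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical test scores; 210 stands in for a recording error (21 mistyped).
scores = [18, 21, 19, 24, 22, 20, 23, 210, 17, 25]


def summarize(data):
    """Return simple descriptive summaries of a list of numbers."""
    q1, _, q3 = quantiles(data, n=4)  # quartiles, default 'exclusive' method
    return {
        "n": len(data),
        "mean": mean(data),
        "sd": stdev(data),
        "median": median(data),
        "q1": q1,
        "q3": q3,
    }


def flag_outliers(data, k=1.5):
    """Flag values more than k * IQR beyond the quartiles (assumed rule)."""
    q1, _, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


summary = summarize(scores)
outliers = flag_outliers(scores)
print(summary)
print("possible outliers:", outliers)
```

With these made-up scores the mean (39.9) sits far above the median (21.5), which is itself a hint that a single suspect value is distorting the summary; a plot would usually make the same point at a glance.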

1.2.2. Estimation and Significance Testing

Although in some cases an initial examination of the data will be all that is necessary, most investigations will proceed to a more formal stage of analysis that involves the estimation of population values of interest and/or testing hypotheses about particular values for these parameters. It is at this point that the beloved significance test (in some form or other) enters the arena. Despite numerous attempts by statisticians to wean psychologists away from such tests (see, e.g., Gardner and Altman, 1986), the p value retains a powerful hold over the average psychology researcher and psychology student. There are a number of reasons why it should not.

First, the p value is poorly understood. Although p values appear in almost every account of psychological research findings, there is evidence that the general degree of understanding of the true meaning of the term is very low. Oakes (1986), for example, put the following test to 70 academic psychologists:

Suppose you have a treatment which you suspect may alter performance on a certain task. You compare the means of your control and experimental groups (say 20 subjects in each sample). Further suppose you use a simple independent means t-test and your result is t = 2.7, df = 18, P = 0.01. Please mark each of the statements below as true or false.

You have absolutely disproved the null hypothesis that there is no difference between the population means.
You have found the probability of the null hypothesis being true.
You have absolutely proved your experimental hypothesis.
You can deduce the probability of the experimental hypothesis being true.
You know, if you decided to reject the null hypothesis, the probability that you are making the wrong decision.
You have a reliable experiment in the sense that if, hypothetically, the experiment were repeated a great number of times, you would obtain a significant result on 99% of occasions.

The subjects were all university lecturers, research fellows, or postgraduate students. The results presented in Table 1.1 are illuminating. Under a relative frequency view of probability, all six statements are in fact false. Only 3 out of the 70 subjects came to this conclusion. The correct interpretation of the probability associated with the observed t value is

the probability of obtaining the observed data (or data that represent a more extreme departure from the null hypothesis) if the null hypothesis is true.
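
The definition can be made concrete with a small sketch. The code below does not reproduce the t test in the Oakes example; instead it uses an exact permutation test (a swapped-in technique, chosen because it needs no distributional tables) on invented scores. Under the null hypothesis of no treatment effect, every relabelling of the pooled scores into two groups is equally likely, so the p value is simply the proportion of relabellings that give a difference in means at least as extreme as the one observed. The data and function name are hypothetical.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def mean_diff(indices):
        sum_a = sum(pooled[i] for i in indices)
        return sum_a / n_a - (total - sum_a) / n_b

    # The observed labelling is the first n_a positions of the pooled data.
    observed = abs(mean_diff(range(n_a)))
    splits = list(combinations(range(len(pooled)), n_a))
    # Count relabellings at least as extreme as the observed one
    # (small tolerance guards against floating-point ties).
    extreme = sum(1 for s in splits if abs(mean_diff(s)) >= observed - 1e-12)
    return extreme / len(splits)


control = [12, 14, 11, 13]  # hypothetical scores
treated = [15, 17, 16, 18]  # hypothetical scores
p = permutation_p_value(control, treated)
print(round(p, 4))  # 2 of the 70 possible splits are this extreme
```

Note what the number is and is not: it is the probability of data this extreme given that the null hypothesis is true, not the probability that the null hypothesis is true given the data.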

TABLE 1.1
Frequencies and Percentages of "True" Responses in a Test of Knowledge of p Values

Statement                                                            f      %
1. The null hypothesis is absolutely disproved.                      1    1.4
2. The probability of the null hypothesis has been found.           25   35.7
3. The experimental hypothesis is absolutely proved.                 4    5.7
4. The probability of the experimental hypothesis can be deduced.   46   65.7
5. The probability that the decision taken is wrong is known.       60   85.7
6. A replication has a .99 probability of being significant.        42   60.0

Clearly the number of false statements described as true in this experiment would have been reduced if the true interpretation of a p value had been included with the six others. Nevertheless, the exercise is extremely interesting in highlighting the misguided appreciation of p values held by a group of research psychologists.

Second, a p value represents only limited information about the results from a study. Gardner and Altman (1986) make the point that the excessive use of p values in hypothesis testing, simply as a means of rejecting or accepting a particular hypothesis, at the expense of other ways of assessing results, has reached such a degree that levels of significance are often quoted alone in the main text and in abstracts of papers with no mention of other more relevant and important quantities. The implication of hypothesis testing, that there can always be a simple yes or no answer as the fundamental result from a psychological study, is clearly false, and used in this way hypothesis testing is of limited value.

The most common alternative to presenting results in terms of p values, in relation to a statistical null hypothesis, is to estimate the magnitude of some parameter of interest along with some interval that includes the population value of the parameter with some specified probability. Such confidence intervals can be found relatively simply for many quantities of interest (see Gardner and Altman, 1986, for details), and although the underlying logic of interval estimation is essentially similar to that of significance tests, they do not carry with them the pseudoscientific hypothesis testing language of such tests. Instead they give a plausible range of values for the unknown parameter. As Oakes (1986) rightly comments,

the significance test relates to what the population parameter is not: the confidence interval gives a plausible range for what the parameter is.
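
A minimal sketch of such an interval for a population mean is given below. It assumes a normal approximation (mean plus or minus a normal quantile times the standard error), which is a simplification adopted here to keep the example self-contained; for a sample this small an interval based on the t distribution would be somewhat wider. The scores and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def normal_ci(sample, level=0.95):
    """Approximate confidence interval for a mean (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    centre = mean(sample)
    standard_error = stdev(sample) / sqrt(len(sample))
    return centre - z * standard_error, centre + z * standard_error


# Hypothetical verbal ability scores.
scores = [20, 24, 19, 27, 22, 25, 21, 23, 26, 18, 24, 22]

low, high = normal_ci(scores)
print(f"mean = {mean(scores):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Reporting the interval alongside the estimate shows both the size of the quantity of interest and how precisely it has been estimated, which a bare p value does not.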


So should the p value be abandoned completely? Many statisticians would answer yes, but I think a more sensible response, at least for psychologists, would be a resounding maybe. Such values should rarely be used in a purely confirmatory way, but in an exploratory fashion they can be useful in giving some informal guidance on the possible existence of an interesting effect, even when the required assumptions of whatever test is being used are known to be invalid. It is often possible to assess whether a p value is likely to be an underestimate or overestimate and whether the result is clear one way or the other. In this text, both p values and confidence intervals will be used; purely from a pragmatic point of view, the former are needed by psychology students because they remain of central importance in the bulk of the psychological literature.

1.2.3. The Role of Models in the Analysis of Data

Models imitate the properties of real objects in a simpler or more convenient form. A road map, for example, models part of the Earth's surface, attempting to reproduce the relative positions of towns, roads, and other features. Chemists use models of molecules to mimic their theoretical properties, which in turn can be used to predict the behavior of real objects. A good model follows as accurately as possible the relevant properties of the real object, while being convenient to use.

Statistical models allow inferences to be made about an object, activity, or process, by modeling some associated observable data. Suppose, for example, a child has scored 20 points on a test of verbal ability, and after studying a dictionary for some time, scores 24 points on a similar test. If it is believed that studying the dictionary has caused an improvement, then a possible model of what is happening is

$$20 = \{\text{person's initial score}\}, \tag{1.1}$$

$$24 = \{\text{person's initial score}\} + \{\text{improvement}\}. \tag{1.2}$$

The improvement can be found by simply subtracting the first score from the second. Such a model is, of course, very naive, because it assumes that verbal ability can be measured exactly. A more realistic representation of the two scores, which allows for a possible measurement error, is

$$x_1 = \gamma + \epsilon_1, \tag{1.3}$$

$$x_2 = \gamma + \delta + \epsilon_2, \tag{1.4}$$

where $x_1$ and $x_2$ represent the two verbal ability scores, $\gamma$ represents the true initial measure of verbal ability, and $\delta$ represents the improvement score. The terms $\epsilon_1$ and $\epsilon_2$ represent the measurement error. Here the improvement score can be estimated as $x_2 - x_1$.

A model gives a precise description of what the investigator assumes is occurring in a particular situation; in the case above it says that the improvement $\delta$ is considered to be independent of $\gamma$ and is simply added to it. (An important point that should be noted here is that if you do not believe in a model, you should not perform operations and analyses on the data that assume it is true.)

Suppose now that it is believed that studying the dictionary does more good if a child already has a fair degree of verbal ability and that the various random influences that affect the test scores are also dependent on the true scores. Then an appropriate model would be

$$x_1 = \gamma \epsilon_1, \tag{1.5}$$

$$x_2 = \gamma \delta \epsilon_2. \tag{1.6}$$

Now the parameters are multiplied rather than added to give the observed scores, $x_1$ and $x_2$. Here $\delta$ might be estimated by dividing $x_2$ by $x_1$.

A further possibility is that there is a limit, $\lambda$, to improvement, and studying the dictionary improves performance on the verbal ability test by some proportion of the child's possible improvement, $\lambda - \gamma$. A suitable model would be

$$x_1 = \gamma + \epsilon_1, \tag{1.7}$$

$$x_2 = \gamma + (\lambda - \gamma)\delta + \epsilon_2. \tag{1.8}$$

With this model there is no way to estimate $\delta$ from the data unless a value of $\lambda$ is given or assumed.

One of the principal uses of statistical models is to attempt to explain variation in measurements. This variation may be the result of a variety of factors, including variation from the measurement system, variation caused by environmental conditions that change over the course of a study, variation from individual to individual (or experimental unit to experimental unit), and so on. The decision about an appropriate model should be largely based on the investigator's prior knowledge of an area. In many situations, however, additive, linear models, such as those given in Eqs. (1.1) and (1.2), are invoked by default, because such models allow many powerful and informative statistical techniques to be applied to the data. Analysis of variance techniques (Chapters 3-5) and regression analysis (Chapter 6), for example, use such models, and in recent years generalized linear models (Chapter 10) have evolved that allow analogous models to be applied to a wide variety of data types.

Formulating an appropriate model can be a difficult problem. The general principles of model formulation are covered in detail in books on scientific method, but they include the need to collaborate with appropriate experts, to incorporate as much background theory as possible, and so on. Apart from those formulated entirely on a priori theoretical grounds, most models are, to some extent at least, based on an initial examination of the data, although completely empirical models are rare. The more usual intermediate case arises when a class of models is entertained a priori, but the initial data analysis is crucial in selecting a subset of models from the class. In a regression analysis, for example, the general approach is determined a priori, but a scatter diagram and a histogram (see Chapter 2) will be of crucial importance in indicating the shape of the relationship and in checking assumptions such as normality.

The formulation of a preliminary model from an initial examination of the data is the first step in the iterative, formulation-criticism, cycle of model building. This can produce some problems because formulating a model and testing it on the same data is not generally considered good science. It is always preferable to confirm whether a derived model is sensible by testing on new data. When data are difficult or expensive to obtain, however, then some model modification and assessment of fit on the original data is almost inevitable. Investigators need to be aware of the possible dangers of such a process.

The most important principle to have in mind when testing models on data is that of parsimony; that is, the best model is one that provides an adequate fit to data with the fewest number of parameters. This is often known as Occam's razor, which for those with a classical education is reproduced here in its original form: Entia non sunt multiplicanda praeter necessitatem.
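
The three dictionary models discussed earlier in this section lead to three different estimates of the improvement from the same pair of scores, which the following sketch makes explicit. The function names are hypothetical, measurement error is ignored when forming each estimate (the first score is used in place of the true initial ability), and the limit of 40 supplied to the third model is an assumed value chosen only for illustration; as the text notes, that model cannot be used without one.

```python
def additive_improvement(x1, x2):
    """Additive model: improvement is added to the initial score."""
    return x2 - x1


def multiplicative_improvement(x1, x2):
    """Multiplicative model: improvement multiplies the initial score."""
    return x2 / x1


def ceiling_improvement(x1, x2, limit):
    """Limit model, Eqs. (1.7)-(1.8): improvement is a proportion of the
    gap between the initial score and an assumed upper limit."""
    return (x2 - x1) / (limit - x1)


x1, x2 = 20, 24  # the two test scores from the example in the text

print(additive_improvement(x1, x2))           # 4 points gained
print(multiplicative_improvement(x1, x2))     # 1.2 times the initial score
print(ceiling_improvement(x1, x2, limit=40))  # 0.2 of the possible gain
```

The same data therefore support "an improvement of 4", "an improvement of 1.2", or "an improvement of 0.2", depending entirely on which model the investigator believes, which is why the choice of model has to be justified before any estimate is interpreted.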

1.3. TYPES OF STUDY

It is said that when Gertrude Stein lay dying, she roused briefly and asked her assembled friends, "Well, what's the answer?" They remained uncomfortably quiet, at which she sighed, "In that case, what's the question?"

Research in psychology, and in science in general, is about searching for the answers to particular questions of interest. Do politicians have higher IQs than university lecturers? Do men have faster reaction times than women? Should phobic patients be treated by psychotherapy or by a behavioral treatment such as flooding? Do children who are abused have more problems later in life than children who are not abused? Do children of divorced parents suffer more marital breakdowns themselves than children from more stable family backgrounds?

In more general terms, scientific research involves a sequence of asking and answering questions about the nature of relationships among variables (e.g., How does A affect B? Do A and B vary together? Is A significantly different from B?, and so on). Scientific research is carried out at many levels that differ in the types of question asked and, therefore, in the procedures used to answer them. Thus, the choice of which methods to use in research is largely determined by the kinds of questions that are asked.

Of the many types of investigation used in psychological research, the most common are perhaps the following:

surveys;
observational studies;
quasi-experiments; and
experiments.

Some brief comments about each of these four types are given below; a more detailed account is available in the papers by Stretch, Raulin and Graziano, and Dane, which all appear in the second volume of the excellent Companion Encyclopedia of Psychology; see Colman (1994).

1.3.1. Surveys
Survey methods are based on the simple discovery that asking questions is a remarkably efficient way to obtain information from and about people (Schuman and Kalton, 1985, p. 635). Surveys involve an exchange of information between researcher and respondent; the researcher identifies topics of interest, and the respondent provides knowledge or opinion about these topics. Depending upon the length and content of the survey as well as the facilities available, this exchange can be accomplished by means of written questionnaires, in-person interviews, or telephone conversations.

Surveys conducted by psychologists are usually designed to elicit information about the respondents' opinions, beliefs, attitudes and values. Some examples of data collected in surveys and their analysis are given in several later chapters.

1.3.2. Observational Studies

Many observational studies involve recording data on the members of naturally occurring groups, generally over a period of time, and comparing the rate at which a particular event of interest occurs in the different groups (such studies are often referred to as prospective). If, for example, an investigator was interested in the health effects of a natural disaster such as an earthquake, those who experienced the earthquake could be compared with a group of people who did not. Another commonly used type of observational study is the case-control investigation. Here a group of people (the cases) that all have a particular characteristic (a certain disease, perhaps) are compared with a group of people who do not have the characteristic (the controls), in terms of their past exposure to some event or risk factor. A recent example involved women who gave birth to very low birthweight infants (less than 1500 g), compared with women who had a child of normal birthweight with respect to their past caffeine consumption.

The types of analyses suitable for observational studies are often the same as those used for experimental studies (see Subsection 1.3.4). Unlike experiments, however, the lack of control over the groups to be compared in an observational study leaves any difference between the groups detected in the study open to a variety of interpretations. In an investigation into the relationship between smoking and systolic blood pressure, for example, the researcher cannot allocate subjects to be smokers and nonsmokers (as would be required in an experimental approach), for obvious ethical reasons; instead, the systolic blood pressure of naturally occurring groups of individuals who smoke, and those who do not, are compared. In such a study any difference found between the blood pressure of the two groups would be open to three possible explanations.

1. Smoking causes a change in systolic blood pressure.
2. The level of blood pressure has a tendency to encourage or discourage smoking.
3. Some unidentified factors play a part in determining both the level of blood pressure and whether or not a person smokes.
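
The third explanation can be illustrated with a small simulation. In the sketch below, which is entirely hypothetical, age plays the role of the unidentified factor: older people are made more likely to smoke, and blood pressure is made to depend on age alone. Smoking has no causal effect on blood pressure in this simulated world, yet the two naturally occurring groups still differ. All numerical values are invented for the purpose and are not physiological claims.

```python
import random
from statistics import mean


def simulate_groups(n=5000, seed=1):
    """Simulate blood pressure in smokers and nonsmokers when only age
    (a hidden third factor) affects blood pressure."""
    rng = random.Random(seed)
    smokers, nonsmokers = [], []
    for _ in range(n):
        age = rng.uniform(20, 70)
        # Assumed: probability of smoking rises with age from 0.2 to 0.8.
        p_smoke = 0.2 + 0.6 * (age - 20) / 50
        # Assumed: blood pressure depends on age and noise, not on smoking.
        pressure = 100 + 0.5 * age + rng.gauss(0, 5)
        if rng.random() < p_smoke:
            smokers.append(pressure)
        else:
            nonsmokers.append(pressure)
    return smokers, nonsmokers


smokers, nonsmokers = simulate_groups()
difference = mean(smokers) - mean(nonsmokers)
print(f"smokers minus nonsmokers: {difference:.1f}")
```

An investigator who saw only the two group means could not tell this situation apart from one in which smoking itself raised blood pressure, which is exactly the ambiguity described above.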

1.3.3. Quasi-Experiments

Quasi-experimental designs resemble experiments proper (see next section), but they are weak on some of the characteristics. In particular (and like the observational study), the ability to manipulate the groups to be compared is not under the investigator's control. Unlike the observational study, however, the quasi-experiment involves the intervention of the investigator in the sense that he or she applies a variety of different treatments to naturally occurring groups. In an investigation of the effectiveness of three different methods of teaching mathematics to 15 year olds, for example, a method might be given to all the members of a particular class in a school. The three classes that receive the different teaching methods would be selected to be similar to each other on most relevant variables, and the methods would be assigned to classes on a chance basis.

1.3.4. Experiments

Experimental studies include most of the work psychologists carry out on animals and many of the investigations performed on human subjects in laboratories. The essential feature of an experiment is the large degree of control in the hands of the experimenters; in designed experiments, the experimenter deliberately changes the levels of the experimental factors to induce variation in the measured quantities, to lead to a better understanding of the relationship between experimental factors and the response variable. And, in particular, experimenters control the manner in which subjects are allocated to the different levels of the experimental factors. In a comparison of a new treatment with one used previously, for example, the researcher would have control over the scheme for allocating subjects to treatments. The manner in which this control is exercised is of vital importance if the results of the experiment are to be valid. If, for example, subjects who are first to volunteer are all allocated to the new treatment, then the two groups may differ in level of motivation and so subsequently in performance. Observed treatment differences would be confounded with differences produced by the allocation procedure.

The method most often used to overcome such problems is random allocation of subjects to treatments. Whether a subject receives the new or the old treatment is decided, for example, by the toss of a coin. The primary benefit that randomization has is the chance (and therefore impartial) assignment of extraneous influences among the groups to be compared, and it offers this control over such influences whether or not they are known by the experimenter to exist. Note that randomization does not claim to render the two samples equal with regard to these influences; if, however, the same procedure were applied in repeated samplings, equality would be achieved in the long run. This randomization ensures a lack of bias, whereas other methods of assignment may not.

In a properly conducted experiment (and this is the main advantage of such a study), the interpretation of an observed group difference is largely unambiguous; its cause is very likely to be the different treatments or conditions received by the two groups. In the majority of cases, the statistical methods most useful for the analysis of data derived from experimental studies are the analysis of variance procedures described in Chapters 3-5.
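The long-run behavior of random allocation can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the hidden "motivation" score, the group size of 40, and both function names are hypothetical and are not taken from the text. It compares coin-toss allocation with a scheme in which the most motivated half of the subjects (standing in for the first volunteers) all receive the new treatment.

```python
import random
from statistics import mean


def coin_toss_allocation(motivation, rng):
    """Assign each subject to 'new' or 'old' by an independent fair coin toss."""
    return [rng.choice(("new", "old")) for _ in motivation]


def volunteers_first_allocation(motivation, rng):
    """Biased scheme: the more motivated half (the early volunteers) all get 'new'."""
    cutoff = sorted(motivation)[len(motivation) // 2]
    return ["new" if m >= cutoff else "old" for m in motivation]


def average_imbalance(allocate, n_subjects=40, n_repeats=2000, seed=1):
    """Mean (new minus old) difference in a hidden covariate over repeated studies."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_repeats):
        # Hypothetical extraneous influence, unknown to the experimenter.
        motivation = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
        groups = allocate(motivation, rng)
        new = [m for m, g in zip(motivation, groups) if g == "new"]
        old = [m for m, g in zip(motivation, groups) if g == "old"]
        if new and old:  # skip the (very rare) case of an empty group
            diffs.append(mean(new) - mean(old))
    return mean(diffs)


if __name__ == "__main__":
    print("coin toss:       ", round(average_imbalance(coin_toss_allocation), 3))
    print("volunteers first:", round(average_imbalance(volunteers_first_allocation), 3))
```

In any single simulated study the two randomized groups will differ somewhat in motivation, but averaged over many repetitions the difference is close to zero, whereas the volunteers-first scheme produces a large and systematic difference.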

1.4. TYPES OF DATA


The basic material that is the foundation of all psychological investigations is the measurements and observations made on a set of subjects. Clearly, not all measurement is the same. Measuring an individual's weight is qualitatively different from measuring his or her response to some treatment on a two-category scale, such as "improved" or "not improved," for example. Whatever measurements are made, they have to be objective, precise, and reproducible, for reasons nicely summarized in the following quotation from Fleiss (1986):

The most elegant design of a study will not overcome the damage caused by unreliable or imprecise measurement. The requirement that one's data be of high quality is at least as important a component of a proper study design as the requirement for randomization, double blinding, controlling where necessary for prognostic factors, and so on. Larger sample sizes than otherwise necessary, biased estimates and even biased samples are some of the untoward consequences of unreliable measurements that can be demonstrated.

Measurement scales are differentiated according to the degree of precision involved. If it is said that an individual has a high IQ, it is not as precise as the statement that the individual has an IQ of 151. The comment that a woman is tall is not as accurate as specifying that her height is 1.88 m. Certain characteristics of interest are more amenable to precise measurement than others. With the use of an accurate thermometer, a subject's temperature can be measured very precisely. Quantifying the level of anxiety or depression of a psychiatric patient, or assessing the degree of pain of a migraine sufferer is, however, a far more difficult task. Measurement scales may be classified into a hierarchy ranging from categorical, through ordinal to interval, and finally to ratio scales. Each of these will now be considered in more detail.

1.4.1. Nominal or Categorical Measurements


Nominal or categorical measurements allow patients to be classified with respect to some characteristic. Examples of such measurements are marital status, sex, and blood group. The properties of a nominal scale are as follows.

1. Data categories are mutually exclusive (an individual can belong to only one category).
2. Data categories have no logical order; numbers may be assigned to categories but merely as convenient labels.

A nominal scale classifies without the categories being ordered. Techniques particularly suitable for analyzing this type of data are described in Chapters 9 and 10.

1.4.2. Ordinal Scales


The next level in the measurement hierarchy is the ordinal scale. This has one additional property over those of a nominal scale: a logical ordering of the categories. With such measurements, the numbers assigned to the categories indicate the amount of a characteristic possessed. A psychiatrist may, for example, grade patients on an anxiety scale as not anxious, mildly anxious, moderately anxious, or severely anxious, and he or she may use the numbers 0, 1, 2, and 3 to label the categories, with lower numbers indicating less anxiety. The psychiatrist cannot infer, however, that the difference in anxiety between patients with scores of say 0 and 1 is the same as that between patients assigned scores 2 and 3. The scores on an ordinal scale do, however, allow patients to be ranked with respect to the characteristic being assessed. Frequently, however, measurements on an ordinal scale are described in terms of their mean and standard deviation. This is not appropriate if the steps on the scale are not known to be of equal length. Andersen (1990), Chapter 15, gives a nice illustration of why this is so.
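Why the mean can mislead for ordinal scores may be seen in a small numerical sketch. The two groups of anxiety grades and the alternative labeling below are invented for illustration and do not come from Andersen (1990). Both labelings preserve the order of the four categories, yet the comparison of group means reverses, whereas the comparison of medians does not.

```python
from statistics import mean, median

# Hypothetical anxiety grades (0 = not anxious, ..., 3 = severely anxious).
group_a = [1, 1, 1, 1, 1]
group_b = [0, 0, 0, 3, 3]

# Two labelings with the same category order but different step lengths.
equal_steps = {0: 0, 1: 1, 2: 2, 3: 3}
unequal_steps = {0: 0, 1: 5, 2: 6, 3: 7}


def recode(scores, labels):
    """Replace each grade by the number a given labeling assigns to it."""
    return [labels[s] for s in scores]


def compare(statistic, labels):
    """Say which group has the larger summary statistic under a labeling."""
    a = statistic(recode(group_a, labels))
    b = statistic(recode(group_b, labels))
    if a > b:
        return "A higher"
    if b > a:
        return "B higher"
    return "tie"


if __name__ == "__main__":
    for name, labels in (("equal steps", equal_steps), ("unequal steps", unequal_steps)):
        print(name, "| mean:", compare(mean, labels), "| median:", compare(median, labels))
```

Because the choice between the two labelings is arbitrary on an ordinal scale, a conclusion that depends on that choice (as the comparison of means does here) is not a property of the data.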


The following are the properties of an ordinal scale.

1. Data categories are mutually exclusive.
2. Data categories have some logical order.
3. Data categories are scaled according to the amount of a particular characteristic they possess.

Chapter 8 covers methods of analysis particularly suitable for ordinal data.

1.4.3. Interval Scales

The third level in the measurement scale hierarchy is the interval scale. Such scales possess all the properties of an ordinal scale, plus the additional property that equal differences between category levels, on any part of the scale, reflect equal differences in the characteristic being measured. An example of such a scale is temperature on the Celsius or Fahrenheit scale; the difference between temperatures of 80°F and 90°F is the same as between temperatures of 30°F and 40°F. An important point to make about interval scales is that the zero point is simply another point on the scale; it does not represent the starting point of the scale, nor the total absence of the characteristic being measured. The properties of an interval scale are as follows.

1. Data categories are mutually exclusive.
2. Data categories have a logical order.
3. Data categories are scaled according to the amount of the characteristic they possess.
4. Equal differences in the characteristic are represented by equal differences in the numbers assigned to the categories.
5. The zero point is completely arbitrary.

1.4.4. Ratio Scales

The highest level in the hierarchy of measurement scales is the ratio scale. This type of scale has one property in addition to those listed for interval scales, namely the possession of a true zero point that represents the absence of the characteristic being measured. Consequently, statements can be made both about differences on the scale and the ratio of points on the scale. An example is weight, where not only is the difference between 100 kg and 50 kg the same as between 75 kg and 25 kg, but an object weighing 100 kg can be said to be twice as heavy as one weighing 50 kg. This is not true of, say, temperature on the Celsius or Fahrenheit scales, where a reading of 100° does not represent twice the warmth of a temperature of 50°. If, however, the temperatures were measured on the Kelvin scale, which does have a true zero point, the statement about the ratio could be made.
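The distinction between interval and ratio scales can be checked numerically. The sketch below is illustrative only; the helper names are hypothetical and the kilogram-to-pound factor is approximate. A ratio of two weights survives a change of unit, whereas a ratio of two Celsius readings changes when the same temperatures are expressed in Fahrenheit or Kelvin. Equal differences, by contrast, remain equal after conversion.

```python
import math


def celsius_to_fahrenheit(c):
    return c * 9.0 / 5.0 + 32.0


def celsius_to_kelvin(c):
    return c + 273.15


def kg_to_pounds(kg):
    return kg * 2.20462  # approximate conversion factor


def ratio_after(convert, a, b):
    """Ratio of two readings after both are converted to another unit."""
    return convert(a) / convert(b)


if __name__ == "__main__":
    # Weight (ratio scale): "twice as heavy" holds in kilograms and in pounds.
    print(100 / 50, ratio_after(kg_to_pounds, 100, 50))
    # Temperature: the ratio of 100 to 50 degrees Celsius depends on the unit.
    print(100 / 50,
          ratio_after(celsius_to_fahrenheit, 100, 50),
          ratio_after(celsius_to_kelvin, 100, 50))
    # Equal differences on the Celsius scale are still equal in Fahrenheit.
    print(celsius_to_fahrenheit(90) - celsius_to_fahrenheit(80),
          celsius_to_fahrenheit(40) - celsius_to_fahrenheit(30))
```

On the Kelvin scale, the only one of the three with a true zero, 100°C corresponds to roughly 1.15 times 50°C, which is the ratio statement that can legitimately be made.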


The properties of a ratio scale are as follows.

1. Data categories are mutually exclusive.
2. Data categories have a logical order.
3. Data categories are scaled according to the amount of the characteristic they possess.
4. Equal differences in the characteristic are represented by equal differences in the numbers assigned to the categories.
5. The zero point represents an absence of the characteristic being measured.

An awareness of the different types of measurement that may be encountered in psychological studies is important, because the appropriate method of statistical analysis will often depend on the type of variable involved; this point is taken up again in later chapters.

A further classification of variable types is into response or dependent variables (also often referred to as outcome variables), and independent or explanatory variables (also occasionally called predictor variables). Essentially, the former are the variables measured by the investigator that appear on the left-hand side of the equation defining the proposed model for the data; the latter are variables thought to possibly affect the response variable and appear on the right-hand side of the model. It is the relationship between the dependent variable and the so-called independent variables (independent variables are often related; see comments and examples in later chapters) with which most studies in psychology are concerned. One further point: in some contexts, particularly that of analysis of variance (see Chapters 3-5), the independent variables are also often known as factor variables, or simply factors.

1.5. A LITTLE HISTORY


The relationship between psychology and statistics is a necessary one (honest!). A widely quoted remark by Galton is "that until the phenomena of any branch of knowledge have been submitted to measurement and number, it cannot assume the dignity of a science." And Galton was not alone in demanding measurement and numbers as a sine qua non for attaining the dignity of a science. Lord Kelvin is quoted as saying that one cannot understand a phenomenon until it is subjected to measurement, and Thorndike has said that whatever exists, exists in some amount, and could therefore eventually be subjected to measurement and counting. Psychology has long striven to attain "the dignity of science" by submitting its observations to measurement and quantification. According to Singer (1979), David Hartley (1705-1757), in his major work, Observations on Man (1749), discussed the relevance of probability theory to the collection of scientific evidence, and argued for the use of mathematical and statistical ideas in the study of psychological processes.

A long-standing tradition in scientific psychology is the application of John Stuart Mill's experimental "method of difference" to the study of psychological problems. Groups of subjects are compared who differ with respect to the experimental treatment, but otherwise are the same in all respects. Any difference in outcome can therefore be attributed to the treatment. Control procedures such as randomization or matching on potentially confounding variables help bolster the assumption that the groups are the same in every way except the treatment conditions.

The experimental tradition in psychology has long been wedded to a particular statistical technique, namely the analysis of variance (ANOVA). The principles of experimental design and the analysis of variance were developed primarily by Fisher in the 1920s, but they took time to be fully appreciated by psychologists, who continued to analyze their experimental data with a mixture of graphical and simple statistical methods until well into the 1930s. According to Lovie (1979), the earliest paper that had ANOVA in its title was by Gaskill and Cox (1937). Other early uses of the technique are reported in Crutchfield (1938) and in Crutchfield and Tolman (1940).

Several of these early psychological papers, although paying lip service to the use of Fisher's analysis of variance techniques, relied heavily on more informal strategies of inference in interpreting experimental results. The year 1940, however, saw a dramatic increase in the use of analysis of variance in the psychological literature, and by 1943 the review paper of Garrett and Zubin was able to cite over 40 studies using an analysis of variance or covariance in psychology and education. Since then, of course, the analysis of variance in all its guises has become the main technique used in experimental psychology. An examination of 2 years of issues of the British Journal of Psychology, for example, showed that over 50% of the papers contain one or the other application of the analysis of variance. Analysis of variance techniques are covered in Chapters 3, 4, and 5, and they are then shown to be equivalent to the multiple regression model introduced in Chapter 6.

1.6. WHY CAN'T A PSYCHOLOGIST BE MORE LIKE A STATISTICIAN (AND VICE VERSA)?


Over the course of the past 50 years, the psychologist has become a voracious consumer of statistical methods, but the relationship between psychologist and statistician is not always an easy, happy, or fruitful one. Statisticians complain that psychologists put undue faith in significance tests, often use complex methods of analysis when the data merit only a relatively simple approach, and in general abuse many statistical techniques. Additionally, many statisticians feel that psychologists have become too easily seduced by user-friendly statistical software. These statisticians are upset (and perhaps even made to feel a little insecure) when their advice to plot a few graphs is ignored in favor of a multivariate analysis of covariance or similar statistical extravagance.

But if statisticians are at times horrified by the way in which psychologists apply statistical techniques, psychologists are no less horrified by many statisticians' apparent lack of awareness of what stresses psychological research can place on an investigator. A statistician may demand a balanced design with 30 subjects in each cell, so as to achieve some appropriate power for the analysis. But it is not the statistician who is faced with the frustration caused by a last-minute phone call from a subject who cannot take part in an experiment that has taken several hours to arrange. The statistician advising on a longitudinal study may call for more effort in carrying out follow-up interviews, so that no subjects are missed. It is, however, the psychologist who must continue to persuade people to talk about potentially distressing aspects of their lives, who must confront possibly dangerous respondents, or who arrives at a given (and often remote) address to conduct an interview, only to find that the person is not at home. In general, statisticians do not appear to appreciate the complex stories behind each data point in a psychological study. In addition, it is not unknown for statisticians to perform analyses that are statistically sound but psychologically naive or even misleading. An accomplished statistician, for example, once proposed an interpretation of findings regarding the benefits of nursery education, in which all subsequent positive effects could be accounted for in terms of the parents' choice of primary school. For once, it was psychologists who had to suppress a knowing smile; in the country for which the results were obtained, parents typically did not have any opportunity to choose the schools their children attended!

One way of examining the possible communication problems between psychologist and statistician is for each to know more about the language of the other. It is hoped this text will help in this process and enable young (and not so young) psychologists to learn more about the way statisticians approach the difficulties of data analysis, thus making their future consultations with statisticians more productive and less traumatic. (What is missing in this equation is, of course, a suitable Psychology for Statisticians text.)

1.7. COMPUTERS AND STATISTICAL SOFTWARE


The development of computing and advances in statistical methodology have gone hand in hand since the early 1960s, and the increasing availability, power, and low cost of today's personal computer has further revolutionized the way users of statistics work. It is probably hard for fresh-faced students of today, busily e-mailing everybody, exploring the delights of the internet, and in every other way displaying their computer literacy on the current generation of PCs, to imagine just what life was like for the statistician and the user of statistics in the days of simple electronic or mechanical calculators, or even earlier when large volumes of numerical tables were the only arithmetical aids available. It is a salutary exercise (of the "young people today hardly know they are born" variety) to illustrate with a little anecdote what things were like in the past.

Godfrey Thomson was an educational psychologist during the 1930s and 1940s. Professor A. E. Maxwell (personal communication) tells the story of Dr. Thomson's approach to the arithmetical problems faced when performing a factor analysis by hand. According to Maxwell, Godfrey Thomson and his wife would, early in the evening, place themselves on either side of their sitting room fire, Dr. Thomson equipped with several pencils and much paper, and Mrs. Thomson with a copy of Barlow's Multiplication Tables. For several hours the conversation would consist of little more than "What's 613.23 multiplied by 714.62?" and "438134.44"; "What's 904.72 divided by 986.31?" and "0.91728," and so on.

Nowadays, increasingly sophisticated and comprehensive statistics packages are available that allow investigators easy access to an enormous variety of statistical techniques. This is not without considerable potential for performing very poor and often misleading analyses (a potential that many psychologists have grasped with apparent enthusiasm), but it would be foolish to underestimate the advantages of statistical software to users of statistics such as psychologists. In this text it will be assumed that readers will be carrying out their own analyses by using one or other of the many statistical packages now available, and we shall even end each chapter with limited computer hints for carrying out the reported analyses by using SPSS or S-PLUS. One major benefit of assuming that readers will undertake analyses by using a suitable package is that details of the arithmetic behind the methods will only rarely have to be given; consequently, descriptions of arithmetical calculations will be noticeable largely by their absence, although some tables will contain a little arithmetic where this is deemed helpful in making a particular point. Some tables will also contain a little mathematical nomenclature and, occasionally, even some equations, which in a number of places will use vectors and matrices. Readers who find this too upsetting to even contemplate should pass speedily by the offending material, regarding it merely as a black box and taking heart that understanding its input and output will, in most cases, be sufficient to undertake the corresponding analyses, and understand their results.

1.8. SAMPLE SIZE DETERMINATION

One of the most frequent questions asked when first planning a study in psychology is "how many subjects do I need?" Answering the question requires consideration of a number of factors, for example, the amount of time available, the difficulty of finding the type of subjects needed, and the cost of recruiting subjects. However, when the question is addressed statistically, a particular type of approach is generally used, in which these factors are largely ignored, at least in the first instance. Statistically, sample size determination involves identifying the response variable of most interest, specifying the appropriate statistical test to be used, setting a significance level, estimating the likely variability in the chosen response, choosing the power the researcher would like to achieve (for those readers who have forgotten, power is simply the probability of rejecting the null hypothesis when it is false), and committing to a magnitude of effect that the researcher would like to investigate. Typically the concept of a psychologically relevant difference is used, that is, the difference one would not like to miss declaring to be statistically significant. These pieces of information then provide the basis of a statistical calculation of the required sample size. For example, for a study intending to compare the mean values of two treatments by means of a z test at the $\alpha$ significance level, and where the standard deviation of the response variable is known to be $\sigma$, the formula for the number of subjects required in each group is

$$ n = \frac{2 (Z_{\alpha/2} + Z_{\beta})^2 \sigma^2}{\Delta^2}, \tag{1.9} $$

where $\Delta$ is the treatment difference deemed psychologically relevant, $\beta = 1 - \text{power}$, and

$Z_{\alpha/2}$ is the value of the normal distribution that cuts off an upper tail probability of $\alpha/2$; that is, if $\alpha = 0.05$, then $Z_{\alpha/2} = 1.96$; and

$Z_{\beta}$ is the value of the normal distribution that cuts off an upper tail probability of $\beta$; that is, if power = 0.8, so that $\beta = 0.2$, then $Z_{\beta} = 0.84$.

[The formula in Eq. (1.9) is appropriate for a two-sided test of size $\alpha$.] For example, for $\Delta = 1$, $\sigma = 1$, testing at the 5% level and requiring a power of 80%, the sample size needed in each group is

$$ n = \frac{2 \times (1.96 + 0.84)^2 \times 1}{1} = 15.68. \tag{1.10} $$

So, essentially, 16 observations would be needed in each group. Readers are encouraged to put in some other numerical values in Eq. (1.9) to get some feel of how the formula works, but its general characteristics are that $n$ is an increasing function of $\sigma$ and a decreasing function of $\Delta$, both of which are intuitively understandable: if the variability of our response increases, then other things being equal, we ought to need more subjects to come to a reasonable conclusion. If we seek a bigger difference, we ought to be able to find it with fewer subjects.

In practice there are, of course, many different formulas for sample size determination, and nowadays software is widely available for sample size determination in many situations (e.g., nQuery Advisor, Statistical Solutions Ltd., www.statsolusa.com). However, one note of caution might be appropriate here, and it is that such calculations are often little more than "a guess masquerading as mathematics" (see Senn, 1997), as in practice there is often a tendency to adjust the difference sought and the power, in the light of what the researcher knows is a "practical" sample size in respect to time, money, and so on. Thus reported sample size calculations should frequently be taken with a large pinch of salt.
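Readers who wish to try other values in Eq. (1.9) could use a few lines of code such as the following. This is a minimal sketch rather than the output of any package mentioned in the text; the function name `n_per_group` is hypothetical, and the normal quantiles are taken from Python's standard library instead of being rounded to 1.96 and 0.84, so the result for the worked example is about 15.70 rather than 15.68.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Subjects per group from Eq. (1.9): two-sided z test comparing two means."""
    z_alpha2 = NormalDist().inv_cdf(1 - alpha / 2)  # cuts off upper tail alpha/2
    z_beta = NormalDist().inv_cdf(power)            # cuts off upper tail beta = 1 - power
    return 2 * (z_alpha2 + z_beta) ** 2 * sigma ** 2 / delta ** 2


if __name__ == "__main__":
    # Worked example from the text: delta = 1, sigma = 1, 5% level, 80% power.
    n = n_per_group(delta=1.0, sigma=1.0)
    print(round(n, 2), "-> round up to", math.ceil(n))
    # Halving the difference sought quadruples the required sample size.
    print(math.ceil(n_per_group(delta=0.5, sigma=1.0)))
```

Because $\sigma$ and $\Delta$ enter the formula as squares, doubling the standard deviation or halving the difference sought each multiplies the required number of subjects by four.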

1.9. SUMMARY
1. Statistical principles are central to most aspects of a psychological investigation.
2. Data and their associated statistical analyses form the evidential parts of psychological arguments.
3. Significance testing is far from the be all and end all of statistical analyses, but it does still matter because evidence that can be discounted as an artifact of sampling will not be particularly persuasive. However, p values should not be taken too seriously; confidence intervals are often more informative.
4. A good statistical analysis should highlight those aspects of the data that are relevant to the psychological arguments, do so clearly and fairly, and be resistant to criticisms.
5. Experiments lead to the clearest conclusions about causal relationships.
6. Variable type often determines the most appropriate method of analysis.

EXERCISES
1.1. A well-known professor of experimental psychology once told the author, "If psychologists carry out their experiments properly, they rarely need statistics or statisticians."
1. Guess the professor!
2. Comment on his or her remark.

1.2. The Pepsi-Cola Company carried out research to determine whether people tended to prefer Pepsi Cola to Coca Cola. Participants were asked to taste two glasses of cola and then state which they preferred. The two glasses were not labeled as Pepsi or Coke for obvious reasons. Instead the Coke glass was labeled Q and the Pepsi glass was labeled M. The results showed that "more than half choose Pepsi over Coke" (Huck and Sandler, 1979, p. 11). Are there any alternative explanations for the observed difference, other than the taste of the two drinks? Explain how you would carry out a study to assess any alternative explanation you think possible.
1.3. Suppose you develop a headache while working for hours at your computer (this is probably a purely hypothetical possibility, but use your imagination). You stop, go into another room, and take two aspirins. After about 15 minutes your headache has gone and you return to work. Can you infer a definite causal relationship between taking the aspirin and curing the headache?

1.4. Attribute the following quotations about statistics and/or statisticians.
1. To understand God's thoughts we must study statistics, for these are a measure of his purpose.
2. You cannot feed the hungry on statistics.
3. A single death is a tragedy; a million deaths is a statistic.
4. I am not a number - I am a free man.
5. Thou shalt not sit with statisticians nor commit a Social Science.
6. Facts speak louder than statistics.
7. I am one of the unpraised, unrewarded millions without whom statistics would be a bankrupt science. It is we who are born, marry, and who die in constant ratios.

2
Graphical Methods of Displaying Data

2.1. INTRODUCTION

A good graph is quiet and lets the data tell their story clearly and completely.
-H. Wainer, 1997.

According to Chambers, Cleveland, Kleiner, and Tukey (1983), "there is no statistical tool that is as powerful as a well chosen graph," and although this may be a trifle exaggerated, there is considerable evidence that there are patterns in data, and relationships between variables, that are easier to identify and understand from graphical displays than from possible alternatives such as tables. For this reason researchers who collect data are constantly encouraged by their statistical colleagues to make both a preliminary graphical examination of their data and to use a variety of plots and diagrams to aid in the interpretation of the results from more formal analyses. The prime objective of this approach is to communicate both to ourselves and to others. But just what is a graphical display? A concise description is given by Tufte (1983).


Data graphics visually display measured quantities by means of the combined use of points, lines, a coordinate system, numbers, symbols, words, shading and colour.

Tufte estimates that between 900 billion ($9 \times 10^{11}$) and 2 trillion ($2 \times 10^{12}$) images of statistical graphics are printed each year. Some of the advantages of graphical methods have been listed by Schmid (1954).

1. In comparison with other types of presentation, well-designed charts are more effective in creating interest and in appealing to the attention of the reader.
2. Visual relationships as portrayed by charts and graphs are more easily grasped and more easily remembered.
3. The use of charts and graphs saves time, because the essential meaning of large measures of statistical data can be visualized at a glance (like Chambers and his colleagues, Schmid may perhaps be accused of being prone to a little exaggeration here).
4. Charts and graphs provide a comprehensive picture of a problem that makes for a more complete and better balanced understanding than could be derived from tabular or textual forms of presentation.
5. Charts and graphs can bring out hidden facts and relationships and can stimulate, as well as aid, analytical thinking and investigation.

Schmid's last point is reiterated by the late John Tukey in his observation that "the greatest value of a picture is when it forces us to notice what we never expected to see."

During the past two decades, a wide variety of new methods for displaying data have been developed with the aim of making this particular aspect of the examination of data as informative as possible. Graphical techniques have evolved that will provide an overview, hunt for special effects in data, indicate outliers, identify patterns, diagnose (and criticize) models, and generally search for novel and unexpected phenomena. Good graphics will tell us a convincing story about the data. Large numbers of graphs may be required and computers will generally be needed to draw them for the same reasons that they are used for numerical analysis, namely that they are fast and accurate.

This chapter largely concerns the graphical methods most relevant in the initial phase of data analysis. Graphical techniques useful for diagnosing models and interpreting results will be dealt with in later chapters.

2.2. POP CHARTS


Newspapers, television, and the media in general are very fond of two very simple graphical displays, namely the pie chart and the bar chart. Both can be illustrated by using the data shown in Table 2.1, which show the percentage of people convicted of five different types of crime. In the pie charts for drinkers and abstainers (see Figure 2.1), the sections of the circle have areas proportional to the observed percentages. In the corresponding bar charts (see Figure 2.2), percentages are represented by rectangles of appropriate size placed along a horizontal axis.

TABLE 2.1
Crime Rates for Drinkers and Abstainers

Crime       Drinkers   Abstainers
Arson          6.6         6.4
Rape          11.7         9.2
Violence      20.6        16.3
Stealing      50.3        44.6
Fraud         10.8        23.5

FIG. 2.1. Pie charts for drinker and abstainer crime percentages.

FIG. 2.2. Bar charts for drinker and abstainer crime percentages.

Despite their widespread popularity, both the general and scientific use of pie charts has been severely criticized. For example, Tufte (1983) comments that "tables are preferable to graphics for many small data sets. A table is nearly always better than a dumb pie chart; the only worse design than a pie chart is several of them ... pie charts should never be used." A similar lack of affection is shown by Bertin (1981), who declares that "pie charts are completely useless," and more recently by Wainer (1997), who claims that "pie charts are the least useful of all graphical forms."

An alternative display that is always more useful than the pie chart (and often preferable to a bar chart) is the dot plot. To illustrate, we first use an example from Cleveland (1994). Figure 2.3 shows a pie chart of 10 percentages. Figure 2.4 shows the alternative dot plot representation of the same values. Pattern perception is far more efficient for the dot plot than for the pie chart. In the former it is far easier to see a number of properties of the data that are either not apparent at all in the pie chart or only barely noticeable. First, the percentages have a distribution with two modes (a bimodal distribution); odd-numbered bands lie around the value 8% and even-numbered bands around 12%. Furthermore, the shape of the pattern for the odd values as the band number increases is the same as the shape for the even values; each even value is shifted with respect to the preceding odd value by approximately 4%.

FIG. 2.3. Pie chart for 10 percentages. (Reproduced with permission from Cleveland, 1994.)

Dot plots for the crime data in Table 2.1 (see Figure 2.5) are also more informative than the pie charts in Figure 2.1, but a more exciting application of the dot plot is provided in Carr (1998). The diagram, shown in Figure 2.6, gives a particular contrast of brain mass and body mass for 62 species of animal. Labels and dots are grouped into small units of five to reduce the chances of matching error. In addition, the grouping encourages informative interpretation of the graph. Also note the thought-provoking title.
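The idea of a dot plot can be conveyed even without graphics software. The sketch below prints a rough text version for the Table 2.1 percentages; it is an illustration only, not the method used to draw Figure 2.5. The function name `dot_plot`, the axis maximum of 60, and the ordering of categories by size are all assumptions made here for the example.

```python
# Table 2.1 percentages: crime -> (drinkers, abstainers)
CRIME = {
    "Arson": (6.6, 6.4),
    "Rape": (11.7, 9.2),
    "Violence": (20.6, 16.3),
    "Stealing": (50.3, 44.6),
    "Fraud": (10.8, 23.5),
}

drinkers = {crime: pair[0] for crime, pair in CRIME.items()}
abstainers = {crime: pair[1] for crime, pair in CRIME.items()}


def dot_plot(values, width=50, axis_max=60.0):
    """Return one text line per category, ordered from smallest to largest value."""
    lines = []
    for label, value in sorted(values.items(), key=lambda item: item[1]):
        position = round(value / axis_max * width)
        # A dotted guide line leads the eye from the label to the plotted point.
        lines.append(f"{label:<10}{'.' * position}o  {value:4.1f}")
    return lines


if __name__ == "__main__":
    for title, data in (("Drinkers", drinkers), ("Abstainers", abstainers)):
        print(title)
        print("\n".join(dot_plot(data)))
        print()
```

Each value is judged by its position along a common horizontal scale, which is the feature that makes the dot plot easier to read than the angles and areas of a pie chart.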

2.3. HISTOGRAMS, STEM-AND-LEAF PLOTS, AND BOX PLOTS


The data given in Table 2.2 show the heights and ages of both partners in a sample of 100 married couples. Assessing general features of the data is difficult with the data tabulated in this way, and identifying any potentially interesting patterns is virtually impossible. A number of graphical displays of the data can help. For example, simple histograms of the heights of husbands, the heights of wives, and

FIG. 2.5. Dot plots for (a) drinker and (b) abstainer crime percentages.

FIG. 2.6. Dot plot with positional linking (taken with permission from Carr, 1998). Title: "Intelligence?" Horizontal axis: Log10(Brain Mass) - 2/3 Log10(Body Mass).

TABLE 2.2
Heights and Ages of Married Couples

Husband's Age (years): 49 25 40 52 58 32 43 47 31 26 40 35 35 35 47 38 33 32 38 29 59 26 50 49 42 33 27 57 34 28 37 56 27 36 31 57 55 47 64 31

Husband's Height (mm): 1809 1841 1659 1779 1616 1695 1730 1740 1685 1735 1713 1736 1799 1785 1758 1729 1720 1810 1725 1683 1585 1684 1674 1724 1630 1855 1700 1765 1700 1721 1829 1710 1745 1698 1853 1610 1680 1809 1580 1585

Wife's Age (years): 43 28 30 57 52 27 52 43 23 25 39 32 35 33 43 35 32 30 40 29 55 25 45 44 40 31 25 51 31 25 35 55 23 35 28 52 53 43 61 23

Wife's Height (mm): 1590 1560 1620 1540 1420 1660 1610 1580 1610 1590 1610 1700 1680 1680 1630 1570 1720 1740 1600 1600 1550 1540 1640 1640 1630 1560 1580 1570 1590 1650 1670 1600 1610 1610 1670 1510 1520 1620 1530 1570

Husband's Age at First Marriage: 25 19 38 26 30 23 33 26 26 23 23 31 19 24 24 27 28 22 31 25 23 18 25 27 28 22 21 32 28 23 22 44 25 22 20 25 21 25 21 28

TABLE 2.2 (Continued)

Husband's Age (years): 35 36 40 30 32 20 45 59 43 29 47 43 54 61 27 27 32 54 37 55 36 32 57 51 50 32 54 34 45 64 55 27 55 41 44 22 30 53 42 31

Husband's Height (mm): 1705 1675 1735 1686 1768 1754 1739 1699 1740 1731 1755 1713 1723 1783 1749 1710 1724 1620 1764 1791 1795 1738 1666 1745 1775 1669 1700 1804 1700 1664 1753 1788 1680 1715 1755 1764 1793 1731 1713 1825

Wife's Age (years): 35 35 39 24 29 21 39 52 52 26 48 42 50 64 26 32 31 53 39 45 33 32 55 52 50 32 54 32 41 61 43 28 51 41 41 21 28 47 31 28

Wife's Height (mm): 1580 1590 1670 1630 1510 1660 1610 1440 1570 1670 1730 1590 1600 1490 1660 1580 1500 1640 1650 1620 1550 1640 1560 1570 1550 1600 1660 1640 1670 1560 1760 1640 10 60 1550 1570 1590 1650 1690 1580 1590

Husband's Age at First Marriage: 25 22 23 27 21 19 25 27 25 24 21 20 23 26 20 24 31 20 21 29 30 25 24 24 22 20 20 22 27 24 31 23 26 22 24 21 29 31 23 28

TABLE 2.2 (Continued)

Husband's Age (years): 36 56 46 34 55 44 45 48 44 59 64 34 37 54 49 63 48 64 33 52

Husband's Height (mm): 1725 1828 1735 1760 1685 1685 1559 1705 1723 1700 1660 1681 1803 1866 1884 1705 1780 1801 1795 1669

Wife's Age (years): 35 55 45 34 51 39 35 45 44 47 57 33 38 59 46 60 47 55 45 47

Wife's Height (mm): 1510 160 1660 1700 1530 1490 1580 1500 1M)o 1570 1620 1410 1560 1590 1710 1580 1690 1610 1660 1610

Husband's Age at First Marriage: 26 30 22 23 34 27 34 28 41 39 32 22 23 49 25 27 22 37 17 23

the difference in husband and wife height may be a good way to begin to understand the data. The three histograms are shown in Figures 2.7 and 2.8. All the height distributions are seen to be roughly symmetrical and bell shaped, perhaps roughly normal? Husbands tend to be taller than their wives, a finding that simply reflects that men are on average taller than women, although there are a few couples in which the wife is taller; see the negative part of the x axis of the height difference histogram.

The histogram is generally used for two purposes: counting and displaying the distribution of a variable. According to Wilkinson (1992), however, it is effective for neither. Histograms can often be misleading for displaying distributions because of their dependence on the number of classes chosen. Simple tallies of the observations are usually preferable for counting, particularly when shown in the form of a stem-and-leaf plot, as described in Display 2.1. Such a plot has the advantage of giving an impression of the shape of a variable's distribution, while retaining the values of the individual observations. Stem-and-leaf plots of the heights of husbands and wives and the height differences are shown in Figure 2.9.
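The dependence of a histogram on the number of classes can be seen with a small sketch. The function name `class_counts` and the choice of equal-width classes are assumptions made for illustration; the ten values are the first ten husbands' heights as listed in Table 2.2, so the counts say nothing about the full sample.

```python
def class_counts(values, n_classes):
    """Count observations in n_classes equal-width classes.

    Assumes the values are not all identical (the class width must be > 0).
    """
    low, high = min(values), max(values)
    width = (high - low) / n_classes
    counts = [0] * n_classes
    for v in values:
        # The maximum lies on the upper edge, so clamp it into the last class.
        index = min(int((v - low) / width), n_classes - 1)
        counts[index] += 1
    return counts


# First ten husbands' heights (mm) as listed in Table 2.2.
heights = [1809, 1841, 1659, 1779, 1616, 1695, 1730, 1740, 1685, 1735]

print(class_counts(heights, 3))  # [3, 4, 3]
print(class_counts(heights, 5))  # [2, 2, 3, 1, 2]
```

With three classes the ten values look roughly symmetric; with five classes the same values show a dip in the fourth class. Neither picture is wrong, which is the point of Wilkinson's criticism.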

FIG. 2.7. Histograms of heights of husbands and their wives. (Horizontal axes: husband's heights (mm) and wife's heights (mm).)

FIG. 2.8. Histogram of height difference for 100 married couples. (Horizontal axis: height difference (mm).)
Display 2.1
Stem-and-Leaf Plots

To construct the simplest form of stem-and-leaf display of a set of observations, begin by choosing a suitable pair of adjacent digits; in the heights data, the tens digit and the units digit. Next, split each data value between the two digits. For example, the value 98 would be split as follows:

Data value    Split    Stem    and    Leaf
98            9/8      9       and    8

Then a separate line in the display is allocated for each possible string of leading digits (the stems). Finally, the trailing digit (the leaf) of each data value is written down on the line corresponding to its leading digit.
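The steps in Display 2.1 can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used to produce Figure 2.9: the function name `stem_and_leaf` and its `leaf_unit` parameter are assumptions introduced here, the sketch handles nonnegative values only, and it gives each stem a single line (Figure 2.9 splits each stem into two lines). The sample values are the first ten wives' heights as listed in Table 2.2.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=1):
    """Return the lines of a simple stem-and-leaf display.

    Each value is expressed in units of leaf_unit (finer digits are
    truncated); the last remaining digit is the leaf and the leading
    digits form the stem. Intended for nonnegative values only.
    """
    stems = defaultdict(list)
    for v in values:
        scaled = int(v // leaf_unit)
        stem, leaf = divmod(scaled, 10)
        stems[stem].append(leaf)
    lines = []
    # Walk every stem between the smallest and largest so gaps stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(d) for d in sorted(stems.get(stem, [])))
        lines.append(f"{stem} : {leaves}")
    return lines


# First ten wives' heights (mm) as listed in Table 2.2.
wives = [1590, 1560, 1620, 1540, 1420, 1660, 1610, 1580, 1610, 1590]
for line in stem_and_leaf(wives, leaf_unit=10):
    print(line)
# 14 : 2
# 15 : 46899
# 16 : 1126
```

Setting `leaf_unit=10` mirrors Figure 2.9, where the leaf is the tens digit of a height in millimetres and the units digit is dropped.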

A further useful graphical display of a variable's distributional properties is the box plot. This is obtained from the five-number summary of a data set, the five numbers in question being the minimum, the lower quartile, the median, the upper quartile, and the maximum. The construction of a box plot is described in Display 2.2. The box plots of the heights of husbands and their wives are shown in Figure 2.10. One unusually short man is identified in the plot, and three very short women. In addition, one rather tall woman is present among the wives.

2.4. THE SIMPLE SCATTERPLOT


The simple xy scatterplot has been in use since at least the 19th century. Despite its age, it remains, according to Tufte (1983),
the greatest of all graphical designs. It links at least two variables, encouraging and even imploring the viewer to assess the possible causal relationship between the plotted variables. It confronts causal theories that x causes y with empirical evidence as to the actual relationship between x and y.

Our first scatterplot, given in Figure 2.11(a), shows age of husband against age of wife for the data in Table 2.2. As might be expected, the plot indicates a strong correlation for the two ages. Adding the line y = x to the plot, see Figure 2.11(b), highlights that there are a greater number of couples in which the husband is older than his wife than there are those in which the reverse is true. Finally, in Figure 2.11(c) the bivariate scatter of the two age variables is framed with the observations on each. Plotting marginal and joint distributions together in this way is usually good data analysis practice; a further possibility for achieving this goal is shown in Figure 2.12.

Husbands' heights (mm)
N = 100   Median = 1727   Quartiles = 1690.5, 1771.5
Decimal point is 2 places to the right of the colon

15 : 6888
16 : 1223
16 : 666777778888888899
17 : 00000000001111112222222233333334444444
17 : 555556666677888899999
18 : 00001112334
18 : 5578

Wives' heights (mm)
N = 100   Median = 1600   Quartiles = 1570, 1645
Decimal point is 2 places to the right of the colon

Low: 1410

14 : 24
14 : 99
15 : 0011123344
15 : 5555666667777777888888899999999
16 : 00000000111111112222333444444
16 : 555666666777778899
17 : 001234
17 : 6

Height differences (mm)
N = 100   Median = 125   Quartiles = 79, 176.5
Decimal point is 2 places to the right of the colon

-1 : 0
-0 :
-0 : 32
 0 : 22011333444
 0 : 566666777778888999
 1 : 0000011111222222222333333344444
 1 : 55566666677788999999
 2 : 0011233444
 2 : 5667889

FIG. 2.9. Stem-and-leaf plots of heights of husbands and wives and of height differences.
Display 2.2
Constructing a Box Plot

The plot is based on the five-number summary of a data set: 1, minimum; 2, lower quartile; 3, median; 4, upper quartile; 5, maximum.

The distance between the upper and lower quartiles, the interquartile range, is a measure of the spread of a distribution that is quick to compute and, unlike the range, is not badly affected by outliers.

The median, upper, and lower quartiles can be used to define rather arbitrary but still useful limits, L and U, to help identify possible outliers in the data:

$$U = UQ + 1.5 \times IQR,$$
$$L = LQ - 1.5 \times IQR,$$

where UQ is the upper quartile, LQ is the lower quartile, and IQR the interquartile range, UQ - LQ.

Observations outside the limits L and U are regarded as potential outliers and identified separately on the box plot (and known as outside values), which is constructed as follows:

To construct a box plot, a "box" with ends at the lower and upper quartiles is first drawn. A horizontal line (or some other feature) is used to indicate the position of the median in the box. Next, lines are drawn from each end of the box to the most remote observations that, however, are not outside observations as defined in the text. The resulting diagram schematically represents the body of the data minus the extreme observations. Finally, the outside observations are incorporated into the final diagram by representing them individually in some way (lines, stars, etc.).

[Schematic box plot, labeled from top to bottom: outside values, upper adjacent value, upper quartile, median, lower quartile, lower adjacent value, outside value.]
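The limits in Display 2.2 are simple enough to check by hand, and a short sketch makes the arithmetic explicit. The function names `outlier_limits` and `outside_values` are assumptions introduced for illustration. The quartiles are taken as inputs (here the values printed in Figure 2.9) rather than computed, because software packages differ in how they define sample quartiles.

```python
def outlier_limits(lower_quartile, upper_quartile):
    """Return (L, U) = (LQ - 1.5 * IQR, UQ + 1.5 * IQR)."""
    iqr = upper_quartile - lower_quartile
    lower = lower_quartile - 1.5 * iqr
    upper = upper_quartile + 1.5 * iqr
    return lower, upper


def outside_values(data, lower_quartile, upper_quartile):
    """Return the observations lying outside the limits L and U."""
    lower, upper = outlier_limits(lower_quartile, upper_quartile)
    return [x for x in data if x < lower or x > upper]


# Wives' heights: quartiles 1570 and 1645 (Figure 2.9).
print(outlier_limits(1570, 1645))  # (1457.5, 1757.5)

# A few wives' heights from Table 2.2, including the extremes.
sample = [1410, 1420, 1440, 1590, 1600, 1760]
print(outside_values(sample, 1570, 1645))  # [1410, 1420, 1440, 1760]
```

The four values flagged for the wives agree with the three very short women and one rather tall woman noted in the text. For the husbands (quartiles 1690.5 and 1771.5) the same function gives limits of 1569 and 1893, so the height 1559 in Table 2.2 falls below the lower limit.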

FIG. 2.10. Box plots of heights of husbands and wives.

The age difference in married couples might be investigated by plotting it against husband's age at marriage. The relevant scatterplot is shown in Figure 2.13. The points on the right-hand side of the line x = 0 in Figure 2.13 represent couples in which the husband is older; those to the left, couples in which a husband is younger than his wife. The diagram clearly illustrates the tendency of men marrying late in life to choose partners considerably younger than themselves. Furthermore, there are far fewer couples in which the wife is the older partner.

The relationship between the heights of the married couples might also be of interest. Figure 2.14(a) shows a plot of height of husband against height of wife. There is some indication of a positive association, but not one that is particularly strong. Again adding the line y = x is informative; see Figure 2.14(b). There are few couples in which the wife is taller than her husband.

Finally, we might be interested in examining whether there is any evidence of an age difference-height difference relationship. The relevant scattergram is shown in Figure 2.15. The majority of marriages involve couples in which the husband is both taller and older than his wife. In only one of the 100 married couples is the wife both older and taller than her husband.

2.5. THE SCATTERPLOT MATRIX


In a set of data in which each observation involves more than two variables (multivariate data), viewing the scatterplots of each pair of variables is often a useful way to begin to examine the data. The number of scatterplots, however, quickly becomes daunting; for 10 variables, for example, there are 45 plots to consider.
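The count of 45 follows from the number of ways of choosing 2 variables from p, which is p(p - 1)/2. A minimal sketch follows; the function name `scatterplot_pairs` and the short variable labels are assumptions introduced for illustration.

```python
from itertools import combinations
from math import comb


def scatterplot_pairs(variables):
    """List every unordered pair of variables, one scatterplot per pair."""
    return list(combinations(variables, 2))


# The five variables recorded for each couple in Table 2.2.
couples = ["husband age", "husband height", "wife age", "wife height",
           "husband age at first marriage"]

print(len(scatterplot_pairs(couples)))  # 10
print(comb(10, 2))                      # 45
```

So the married couples data need 10 distinct scatterplots, and the number grows roughly with the square of the number of variables.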

FIG. 2.11. Scatterplots of (a) ages of husbands and wives in 100 married couples; (b) with line y = x added; (c) enhanced with observations on each variable. (Horizontal axis: husband's age (years).)

FIG. 2.12. Scatterplot of wife's height against husband's height, showing marginal distributions of each variable.

FIG. 2.13. Age difference of married couples plotted against husband's age at marriage.

FIG. 2.14. Plots of (a) husband's height and wife's height in married couples; (b) enhanced with the line y = x.

FIG. 2.15. Married couple's age difference and height difference.

Arranging the pairwise scatterplots in the form of a square grid, usually known as a draughtsman's plot or scatterplot matrix, can help in assessing all scatterplots at the same time. Formally, a scatterplot matrix is defined as a square symmetric grid of bivariate scatterplots (Cleveland, 1994). This grid has p rows and p columns, each one corresponding to a different one of the p variables observed. Each of the grid's cells shows a scatterplot of two variables. Because the scatterplot matrix is symmetric about its diagonal, variable j is plotted against variable i in the ijth cell, and the same variables also appear in cell ji with the x and y axes of the scatterplot interchanged. The reason for including both the upper and lower triangles in the matrix, despite the seeming redundancy, is that it enables a row and column to be visually scanned to see one variable against all others, with the scale for the one variable lined up along the horizontal or the vertical.

As our first illustration of a scatterplot matrix, Figure 2.16 shows such an arrangement for all the variables in the married couples' data given in Table 2.2.

FIG. 2.16. Scatterplot matrix of heights and ages of married couples.

From this diagram the very strong relationship between age of husband and age of wife is apparent, as are the relative weaknesses of the associations between all other pairs of variables.

For a second example of the use of a scatterplot matrix, we shall use the data shown in Table 2.3. These data arise from an experiment in which 20 subjects had their response times measured when a light was flashed into each of their eyes through lenses of powers 6/6, 6/18, 6/36, and 6/60. (A lens of power a/b means that the eye will perceive as being at a feet an object that is actually positioned at b feet.) Measurements are in milliseconds, so the data are multivariate, but of a very special kind: the same variable is measured under different conditions. Such data are usually labeled repeated measures and will be the subject of detailed consideration in Chapters 5 and 7.

The scatterplot matrix of the data in Table 2.3 is shown in Figure 2.17. The diagram shows that measurements under particular pairs of lens strengths are quite strongly related, for example, 6/6 and 6/18, for both left and right eyes. In general, however, the associations are rather weak. The implications

TABLE 2.3
Visual Acuity and Lens Strength

Subject    Left Eye: L6/6  L6/18  L6/36  L6/60    Right Eye: R6/6  R6/18  R6/36  R6/60

11 12 13 14 15 16 17 18 19 20

10

2 3 4 5 6 7 8 9

l16

110

117 115 112 113 114 119


110

119 110115 118 116 114 115 119


110
120

116 114 120 120 94 105 123 119 118

124
120 113 118 116 118 123 124 120 121 118

12 106 115 114 105

100

119 117 116 118 120


110

118 118 110 112

118 119 112 120 120122


110 112

11s

122 120

14 2

112

105

110 105 122

112 105 120

114

110

120

1122

120 110

120 118
122
110

112 I0 1 120 120 112 120

115

110

123 l24

118 117

110 105

100

117 112 120 116 117 99 105 110 115 120 119 123 118 120 119 I0 1 I IO 120 118 105

114 110 120 116 116 94 115 120 120 118 123 116 122 118

122

110

124 119

1s 1

108

l15
118 118
106

97 115 111 114 119 122 124 122 124 124 120 105 120

112

122

110

Note. Taken with permission from Crowder and Hand (1990).

of this pattern of associations for the analysis of these data will be taken up in Chapter 5.

FIG. 2.17. Scatterplot matrix of visual acuity data.

2.6. ENHANCING SCATTERPLOTS


The basic scatterplot can accommodate only two variables, but there are ways in which it can be enhanced to display further variable values. The possibilities can be illustrated with a number of examples.

2.6.1. Ice Cream Sales


The data shown in Table 2.4 give the ice cream consumption over thirty 4-week periods, the price of ice cream in each period, and the mean temperature in each period. To display the values of all three variables on the same diagram, we use what is generally known as a bubble plot. Here two variables are used to form a scatterplot in the usual way and then values of a third variable are represented by circles with radii proportional to the values and centered on the appropriate point in the scatterplot. The bubble plot for the ice cream data is shown in Figure 2.18. The diagram illustrates that price and temperature are largely unrelated and, of more interest, demonstrates that consumption remains largely constant as price varies and temperature varies below 50°F. Above this temperature, sales of ice cream increase with temperature but remain largely independent of price. The maximum consumption corresponds to the lowest price and highest temperature. One slightly odd observation is indicated in the plot; this corresponds to a month with low consumption, despite low price and moderate temperature.
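The only calculation a bubble plot needs is the conversion of the third variable into circle radii. The sketch below is one way to do that, not the method used to draw Figure 2.18; the function name `bubble_radii` and the `max_radius` parameter are assumptions introduced here. The three consumption values are those listed for periods 1, 10, and 30 in Table 2.4.

```python
def bubble_radii(values, max_radius=1.0):
    """Scale positive values to radii proportional to the values.

    The largest value receives max_radius; every other radius keeps
    the same ratio to it as the underlying values do.
    """
    largest = max(values)
    if largest <= 0:
        raise ValueError("bubble radii need at least one positive value")
    return [max_radius * v / largest for v in values]


# Consumption (pints per capita) for periods 1, 10, and 30 of Table 2.4.
consumption = [0.386, 0.256, 0.548]
radii = bubble_radii(consumption, max_radius=10.0)
print([round(r, 2) for r in radii])
```

Some plotting libraries size markers by area rather than by radius, in which case these radii would need to be squared before being passed on; that detail depends on the library and is not specified in the text.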

TABLE 2.4
Ice Cream Data

Period    Y (Pints per Capita)    X1 (Price per Pint)    X2 (Mean Temperature, °F)
 1        0.386                   0.270                  41
 2        0.374                   0.282                  56
 3        0.393                   0.277                  63
 4        0.425                   0.280                  68
 5        0.406                   0.280                  69
 6        0.344                   0.262                  65
 7        0.327                   0.275                  61
 8        0.288                   0.267                  47
 9        0.269                   0.265                  32
10        0.256                   0.277                  24
11        0.286                   0.282                  28
12        0.298                   0.270                  26
13        0.329                   0.272                  32
14        0.318                   0.287                  40
15        0.381                   0.277                  55
16        0.381                   0.287                  63
17        0.470                   0.280                  72
18        0.443                   0.277                  72
19        0.386                   0.277                  67
20        0.342                   0.277                  60
21        0.319                   0.292                  44
22        0.307                   0.287                  40
23        0.284                   0.277                  32
24        0.326                   0.285                  27
25        0.309                   0.282                  28
26        0.359                   0.265                  33
27        0.376                   0.265                  41
28        0.416                   0.265                  52
29        0.437                   0.268                  64
30        0.548                   0.260                  71

2.6.2. Height, Weight, Sex, and Pulse Rate

As a further example of the enhancement of scatterplots, Figure 2.19 shows a plot of the height and weight of a sample of individuals with their gender indicated. Also included in the diagram are circles centered on each plotted point; the radii of these circles represent the value of the change in the pulse rate of the subject after running on the spot for 1 minute. Therefore, in essence, Figure 2.19 records the values of four variables for each individual: height, weight, changes in pulse rate, and gender. Information was also available on whether or not each person in the sample smoked, and this extra information is included in Figure 2.20, as described in the caption. Figure 2.20 is very useful for obtaining an overall picture of the data. The increase in pulse rate after exercise is clearly greater for men than for women, but it appears to be unrelated to whether or not an individual smokes.

FIG. 2.18. Bubble plot of ice cream data. (Horizontal axis: mean temperature (°F); one odd observation is marked.)

There is perhaps a relatively weak indication of a larger increase in pulse rate among heavier women, but no such relationship is apparent for men. Two men in the sample, one a smoker and one a nonsmoker, show a decrease in pulse rate with exercise.

FIG. 2.19. Scatterplot of height and weight for male (M) and female (F) participants; the radii of the circles represent a change in pulse rate; I indicates a decrease in pulse rate. (Horizontal axis: height (in.).)

2.6.3. University Admission Rates

Bickel, Hammel, and O'Connell (1975) analyzed the relationship between admission rate and the proportion of women applying to the various academic departments at the University of California at Berkeley. The scatterplot of percentage of women applicants against percentage of applicants admitted is shown in Figure 2.21; boxes, the sizes of which indicate the relative number of applicants, enhance the plot. The negative correlation indicated by the scatterplot is due almost exclusively to a trend for the large departments. If only a simple scatterplot had been used here, vital information about the relationship would have been lost.

FIG. 2.20. Scatterplot of height and weight with additional information about sex, smoking status, and change in pulse rate: M, male smoker; F, female smoker; m, male nonsmoker; f, female nonsmoker; I indicates a decrease in pulse rate. (Horizontal axis: height (in.).)

2.7. COPLOTS AND TRELLIS GRAPHICS

The conditioning plot or coplot is a particularly powerful visualisation tool for studying how two variables are related, conditional on one or more other variables being held constant. There are several varieties of conditioning plot; the differences are largely a matter of presentation rather than of real substance.

A simple coplot can be illustrated on the data from married couples given in Table 2.2. We shall plot the wife's height against the husband's height conditional on the husband's age. The resulting plot is shown in Figure 2.22. In this diagram the panel at the top of the figure is known as the given panel; the panels below are the dependence panels. Each rectangle in the given panel specifies a range of values of the husband's age. On a corresponding dependence panel, the husband's height is plotted against the wife's height, for those couples in which the husband's age is in the appropriate interval. For age intervals to be matched to dependence panels, the latter are examined in order from left to right in the bottom row and again from left to right in subsequent rows. Here the coplot implies that the relationship between the two height measures is roughly the same for each of the six age intervals.

Coplots are a particular example of a more general form of graphical display known as trellis graphics, whereby any graphical display, not simply a scatterplot, can be shown for particular intervals of some variable. For example, Figure 2.23 shows a bubble plot of height of husband, height of wife, and age of wife (circles), conditional on husband's age for the data in Table 2.2. We leave interpretation of this plot to the reader!
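The bookkeeping behind a coplot, assigning each couple to the dependence panel (or panels) whose age interval contains the husband's age, can be sketched as follows. The function name `coplot_panels` and the two age intervals are hypothetical choices for illustration and are not the six intervals used in Figure 2.22. The three couples are the first three listed in Table 2.2, assuming the columns of the table line up row by row.

```python
def coplot_panels(points, intervals):
    """Group (x, y, given) triples into one panel per interval of `given`.

    Intervals are (low, high) pairs, inclusive at both ends. They may
    overlap, in which case a point appears in more than one panel.
    """
    panels = {interval: [] for interval in intervals}
    for x, y, given in points:
        for low, high in intervals:
            if low <= given <= high:
                panels[(low, high)].append((x, y))
    return panels


# (husband's height, wife's height, husband's age) for three couples.
couples = [(1809, 1590, 49), (1841, 1560, 25), (1659, 1620, 40)]

# Hypothetical, deliberately overlapping age intervals.
panels = coplot_panels(couples, [(20, 40), (35, 65)])
for interval, pts in panels.items():
    print(interval, pts)
```

Each list in `panels` would then be drawn as its own scatterplot. Allowing the intervals to overlap is a design choice that keeps neighbouring panels from changing abruptly; whether Figure 2.22 was built that way is not stated in the text.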

FIG. 2.21. Scatterplot of the percentage of female applicants versus percentage of applicants admitted for 85 departments at the University of California at Berkeley. (Reproduced with permission from Bickel et al., 1975.)

2.8. PROBABILITY PLOTTING

Many of the statistical methods to be described in later chapters are based on assuming either the normality of the raw data or the normality of some aspect of a fitted model. As readers will (it is hoped) recall from their introductory statistics course, normality implies that the observations in question arise from a bell-shaped probability distribution, defined explicitly by the formula

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}, \qquad (2.1)$$

where $\mu$ is the mean of the distribution and $\sigma^2$ is its variance. Three examples of normal distributions are shown in Figure 2.24.

FIG. 2.22. Coplot of heights of husband and wife conditional on husband's age. (Given panel: husband's age (years); horizontal axis of the dependence panels: husband's height (mm).)

There are a variety of ways in which the normality assumption might be checked. Here we concentrate on the normal probability plotting approach, in which the observations are plotted against a set of suitably chosen percentiles of the standard normal distribution, that is, a normal distribution with $\mu = 0$ and $\sigma = 1$. Under the assumption of normality, this plot should approximate a straight line. Skewed populations are indicated by concave (left skew) and convex (right skew) plots. (For those readers who would like the full mathematical details behind such plots, see Display 2.3.) As a first illustration of this approach, Figure 2.25(a) shows a normal probability plot for a set of 100 observations generated from a normal distribution, and Figure 2.25(b) shows the corresponding plot for a set of 100 observations generated from an exponential distribution, a distribution with a high degree of skewness. In the first plot the points more or less lie on the required straight line; in the second, there is considerable departure from linearity.

Now let us look at some probability plots of real data, and here we shall use the data on married couples given in Table 2.2. Probability plots of all five variables are shown in Figure 2.26. Both height variables appear to have normal distributions, but all three age variables show some evidence of a departure from normality.
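The coordinates of a normal probability plot can be computed with the Python standard library alone. The sketch below assumes the plotting positions given in Display 2.3, p_i = (i - 0.5)/n; the function name `normal_plot_points` is an assumption introduced here, and the five heights are simply the first five husbands' heights listed in Table 2.2.

```python
from statistics import NormalDist


def normal_plot_points(sample):
    """Pair each ordered observation with a standard normal quantile.

    The i-th smallest of n observations is matched with the quantile
    of the standard normal distribution at p_i = (i - 0.5) / n.
    """
    n = len(sample)
    ordered = sorted(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


# First five husbands' heights (mm) as listed in Table 2.2.
points = normal_plot_points([1809, 1841, 1659, 1779, 1616])
for q, y in points:
    print(f"{q:6.3f}  {y}")
```

Plotting the second coordinate against the first gives the display described in the text. With only five points little can be judged; the plots in Figures 2.25 and 2.26 use 100 observations each.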

FIG. 2.23. Bubble plot of height of husband, height of wife, and age of wife (circles), conditional on age of husband.

Display 2.3
Normal Probability Plotting

A normal probability plot involves plotting the n ordered sample values, $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$, against the quantiles of a standard normal distribution, defined as

$$\Phi^{-1}(p_i), \quad \text{where } \Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-u^2/2}\, du,$$

where usually

$$p_i = \frac{i - 0.5}{n}.$$

If the data arise from a normal distribution, the plot should be approximately linear.

FIG. 2.24. Normal distributions. (Curves shown: mean = 5, sd = 1; mean = 10, sd = 5; mean = 2, sd = 0.5.)

FIG. 2.25. Normal probability plots of (a) normal data, (b) data from an exponential distribution. (Horizontal axes: quantiles of standard normal.)

2.9. GRAPHICAL DECEPTIONS AND GRAPHICAL DISASTERS

In general, graphical displays of the kind described in previous sections are extremely useful in the examination of data; indeed, they are almost essential both in the initial phase of data exploration and in the interpretation of the results from more formal statistical procedures, as will be seen in later chapters. Unfortunately it is relatively easy to mislead the unwary with graphical material, and not all graphical displays are as honest as they should be! For example, consider the plot of the death rate per million from cancer of the breast, for several periods over the past three decades, shown in Figure 2.27. The rate appears to show a rather alarming increase. However, when the data are replotted with the vertical scale beginning at zero, as shown in Figure 2.28, the increase in the breast cancer death rate is altogether less startling. This example illustrates that undue exaggeration or compression of the scales is best avoided when one is drawing graphs (unless, of course, you are actually in the business of deceiving your audience).

FIG. 2.27. Death rates from cancer of the breast where the y axis does not include the origin.

FIG. 2.28. Death rates from cancer of the breast where the y axis does include the origin.

A very common distortion introduced into the graphics most popular with newspapers, television, and the media in general is when both dimensions of a two-dimensional figure or icon are varied simultaneously in response to changes in a single variable. The examples shown in Figure 2.29, both taken from Tufte (1983), illustrate this point. Tufte quantifies the distortion with what he calls the lie factor of a graphical display, which is defined as

$$\text{lie factor} = \frac{\text{size of effect shown in graphic}}{\text{size of effect in data}}. \qquad (2.2)$$
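Equation (2.2) is a single ratio, but a short sketch makes the oil barrel figures quoted in the text easy to verify. The function name `lie_factor` is an assumption introduced here; the two percentages (a 454% increase in the data depicted as a 4280% increase in the graphic) are those given for Figure 2.29(a).

```python
def lie_factor(effect_in_graphic, effect_in_data):
    """Ratio of the effect shown in the graphic to the effect in the data."""
    if effect_in_data == 0:
        raise ValueError("the effect in the data must be nonzero")
    return effect_in_graphic / effect_in_data


# Oil barrel example: a 454% increase drawn as a 4280% increase.
print(round(lie_factor(4280, 454), 1))  # 9.4
```

A graphic that represents the data faithfully gives a value of 1, because the effect drawn then equals the effect in the data.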

FIG. 2.29. Graphics exhibiting lie factors of (a) 9.4 and (b) 2.8. (Panel titles: "In the Barrel..." and "The Shrinking Family Doctor in California.")

Lie factor values close to unity show that the graphic is probably representing the underlying numbers reasonably accurately. The lie factor for the oil barrels in Figure 2.29 is 9.4 because a 454% increase is depicted as 4280%. The lie factor for the shrinking doctors is 2.8.

A further example given by Cleveland (1994), and reproduced here in Figure 2.30, demonstrates that even the manner in which a simple scatterplot is drawn can lead to misperceptions about data. The example concerns the way in which judgment about the correlation of two variables made on the basis of looking at their scatterplot can be distorted by enlarging the area in which the points are plotted. The coefficient of correlation in the right-hand diagram in Figure 2.30 appears greater.

FIG. 2.30. Misjudgment of size of correlation caused by enlarging the plotting area.

Some suggestions for avoiding graphical distortion taken from Tufte (1983) are as follows.

1. The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented.
2. Clear, detailed, and thorough labeling should be used to defeat graphical distortion and ambiguity. Write out explanations of the data on the graphic itself. Label important events in the data.
3. To be truthful and revealing, data graphics must bear on the heart of quantitative thinking: "compared to what?" Graphics must not quote data out of context.
4. Above all else, show the data.

Of course, not all poor graphics are deliberately meant to deceive. They are just not as informative as they might be. An example is shown in Figure 2.31, which originally appeared in Vetter (1980); its aim is to display the percentages of degrees awarded to women in several disciplines of science and technology during three time periods. At first glance the labels suggest that the graph is a standard divided bar chart with the length of the bottom division of each bar showing the percentage for doctorates, the length of the middle division showing the percentage for master's degrees, and the top division showing the percentage for bachelor's degrees. In fact, a little reflection shows that this is not correct, because it would imply that in most cases the percentage of bachelor's degrees given to women is generally lower than the percentage of doctorates. A closer examination of the diagram reveals that the three values of the data for each discipline during each time period are determined by the three adjacent vertical dotted lines. The top end of the left-hand line indicates the value for doctorates, the top end of the middle line indicates the value for master's degrees, and the top end of the right-hand line indicates the position for bachelor's degrees.

Cleveland (1994) discusses other problems with the diagram in Figure 2.31 and also points out that its manner of construction makes it hard to connect visually the three values of a particular type of degree for a particular discipline, that is, to see change through time. Figure 2.32 shows the same data as in Figure 2.31, replotted by Cleveland in a bid to achieve greater clarity. It is now clear how the data are represented, and the design allows viewers to easily see the values corresponding to each degree in each discipline through time. Finally, the figure legend explains the graphs in a comprehensive and clear fashion. All in all, Cleveland appears to have produced a plot that would satisfy even that doyen of graphical presentation, Edward R. Tufte, in his demand that "excellence in statistical graphics consists of complex ideas communicated with clarity, precision and efficiency."

Being misled by graphical displays is usually a sobering but not a life-threatening experience. However, Cleveland (1994) gives an example in which using the wrong graph contributed to a major disaster in the American space program, namely the explosion of the Challenger space shuttle and the deaths of the seven people on board. To assess the suggestion that low temperature might affect the performance of the O-rings that sealed the joints of the rocket motor, engineers studied the graph of the data shown in Figure 2.33. Each data point was from a shuttle flight in which the O-rings had experienced thermal distress. The horizontal axis shows the O-ring temperature, and the vertical scale shows the number of O-rings that had experienced thermal distress. On the basis of these data, Challenger was allowed to take off when the temperature was 31°F, with tragic consequences.

FIG. 2.31. Proportion of degrees in science and engineering earned by women in the periods 1959-1960, 1969-1970, and 1976-1977. (Reproduced with permission from Vetter, 1980.)

FIG. 2.32. Percentage of degrees earned by women for three degrees (bachelors degree, masters degree, and doctorate), three time periods, and nine disciplines. The three points for each discipline and each degree indicate the periods 1959-1960, 1969-1970, and 1976-1977. (Reproduced with permission from Cleveland, 1994.)
FIG. 2.33. Data plotted by space shuttle engineers the evening before the Challenger accident to determine the dependence of O-ring failure on temperature.
[Horizontal axis: calculated joint temperature (°F).]


FIG. 2.34. The complete set of O-ring data.
[Horizontal axis: calculated joint temperature (°F).]

The data for no failures are not plotted in Figure 2.33. The engineers involved believed that these data were irrelevant to the issue of dependence. They were mistaken, as shown by the plot in Figure 2.34, which includes all the data. Here a pattern does emerge, and a dependence of failure on temperature is revealed. To end the chapter on a less sombre note, and to show that misperception and miscommunication are certainly not confined to statistical graphics, see Figure 2.35.

FIG. 2.35. Misperception and miscommunication are sometimes a way of life. (© The New Yorker collection 1961 Charles E. Martin from cartoonbank.com. All Rights Reserved.)

2.10. SUMMARY

1. Graphical displays are an essential feature in the analysis of empirical data.
2. In some cases a graphical analysis may be all that is required (or merited).
3. Stem-and-leaf plots are usually more informative than histograms for displaying frequency distributions.
4. Box plots display much more information about data sets and are very useful for comparing groups. In addition, they are useful for identifying possible outliers.
5. Scatterplots are the fundamental tool for examining relationships between variables. They can be enhanced in a variety of ways to provide extra information.
6. Scatterplot matrices are a useful first step in examining data with more than two variables.
7. Beware graphical deception!

SOFTWARE HINTS

SPSS
Pie charts, bar charts, and the like are easily constructed from the Graph menu. You enter the data you want to use in the chart, select the type of chart you want from the Graph menu, define how the chart should appear, and then click OK. For example, the first steps in producing a simple bar chart would be as follows.

1. Enter the data you want to use to create the chart.
2. Click Graph, then click Bar. When you do this you will see the Bar Charts dialog box appear and the required chart can be constructed.
3. Once the initial chart has been created, it can be edited and refined very simply by double-clicking the chart and then making the required changes.


S-PLUS
In S-PLUS, there is a powerful graphical user interface whereby a huge variety of graphs can be plotted by using either the Graph menu or the two-dimensional (2D) or three-dimensional (3D) graphics palettes. For example, the necessary steps to produce a scatterplot are as follows.

1. Enter the data you want to use to create the scatterplot.
2. Click Graph; then click 2D Plot.
3. Choose Plot Type, Scatter Plot (x, y1, y2, ...).
4. In the Line/Scatter Plot dialog box that now appears, choose the relevant data set and indicate the x and y variables for the plot.

Alternatively, when using the command line language, functions such as plot, pairs, hist, pie, boxplot, dotplot, and many, many others can be used to produce graphical material. For example, if the vector drink contains the values of the percentage of crime rates for drinkers (see Table 2.1), the following command produces a pie chart:

pie(drink, density=-10, names=c("Arson", "Rape", "Violence", "Stealing", "Fraud"))

[Detailed information about the pie function, or any of the other functions mentioned above, is available by using the help command; for example, help(pie).]

Coplots and trellis graphics are currently only routinely available in S-PLUS. They can be implemented by use of the coplot function, or by using the 2D or 3D graphical palettes with the conditioning mode on. Details are given in Everitt and Rabe-Hesketh (2001).

EXERCISES
2.1. According to Cleveland (1994), "The histogram is a widely used graphical method that is at least a century old. But maturity and ubiquity do not guarantee the efficiency of a tool. The histogram is a poor method." Do you agree with Cleveland? Give your reasons.

2.2. Shortly after metric units of length were officially introduced in Australia, each of a group of 44 students was asked to guess, to the nearest meter, the width of the lecture hall in which they were sitting. Another group of 69 students in the same room were asked to guess the width in feet, to the nearest foot. (The true width of the hall was 13.1 m or 43.0 ft).

Guesses in meters:
8 9 10 10 10 10 10 10 11 11 11 11 12 12 13 13 13 14 14 14 15 15
15 15 15 15 15 15 16 16 16 17 17 17 17 18 18 20 22 25 27 35 38 40

Guesses in feet:
24 25 27 30 30 30 30 30 30 32 32 33 34 34 34 35 35 36 36 36 37 37 40
40 40 40 40 40 40 40 40 41 41 42 42 42 42 43 43 44 44 44 45 45 45 45
45 45 46 46 47 48 48 50 50 50 51 54 54 54 55 55 60 60 63 70 75 80 94

Construct suitable graphical displays for both sets of guesses to aid in throwing light on which set is more accurate.

2.3. Figure 2.36 shows the traffic deaths in a particular area before and after stricter enforcement of the speed limit by the police. Does the graph convince you that the efforts of the police have had the desired effect of reducing road traffic deaths? If not, why not?

2.4. Shown in Table 2.5 are the values of seven variables for 10 states in the USA.

TABLE 2.5
Information about 10 States in the USA

                                    Variable
State               1      2     3      4      5      6     7
Alabama          3615   3624   2.1  69.05   15.1   41.3    20
California      21198   5114   1.1  71.71   10.3   62.6    20
Iowa             2861   4628   0.5  72.56    2.3   59.0   140
Mississippi      2341   3098   2.4  68.09   12.5   41.0    50
New Hampshire     812   4281   0.7  71.23    3.3   57.6   174
Ohio            10735   4561   0.8  70.82    7.4   53.2   124
Oregon           2284   4660   0.6  72.13    4.2   60.0    44
Pennsylvania    11860   4449   1.0  70.43    6.1   50.2   126
South Dakota      681   4167   0.5  72.08    1.7   52.3   172
Vermont           472   3907   0.6  71.64    5.5   57.1   168

Note. Variables are as follows: 1, population (x1000); 2, average per capita income ($); 3, illiteracy rate (% population); 4, life expectancy (years); 5, homicide rate (per 1000); 6, percentage of high school graduates; 7, average number of days per year below freezing.


FIG. 2.36. Traffic deaths before and after the introduction of a stricter enforcement of the speed limit.

TABLE 2.6
Mortality Rates per 100,000 from Male Suicides

                           Age Group
Country        25-34   35-44   45-54   55-64   65-74
Canada            22      27      31      34      24
Israel             9      19      10      14      27
Japan             22      19      21      31      49
Austria           29      40      52      53      69
France            16      25      36      47      56
Germany           28      35      41      59      52
Hungary           48      65      84      81     107
Italy              7       8      11      18      27
Netherlands        8      11      18      20      28
Poland            26      29      36      32      28
Spain              4       7      10      16      22
Sweden            28      41      46      51      35
Switzerland       22      34      41      50      41
UK                10      13      15      17      22
USA               20      22      28      33      37


1. Construct a scatterplot matrix of the data, labeling the points by state name.
2. Construct a coplot of life expectancy and homicide rate conditional on the average per capita income.

2.5. Mortality rates per 100,000 from male suicides for a number of age groups and a number of countries are shown in Table 2.6. Construct side-by-side box plots for the data from different age groups, and comment on what the graphic says about the data.

Analysis of Variance I: The One-way Design

3.1. INTRODUCTION
In a study of fecundity of fruit flies, per diem fecundity (number of eggs laid per female per day for the first 14 days of life) for 25 females of each of three genetic lines of the fruit fly Drosophila melanogaster was recorded. The results are shown in Table 3.1. The lines labeled RS and SS were selectively bred for resistance and for susceptibility to the pesticide, DDT, and the NS line as a nonselected control strain. Of interest here is whether the data give any evidence of a difference in fecundity of the three strains. In this study, the effect of a single independent factor (genetic strain) on a response variable (per diem fecundity) is of interest. The data arise from what is generally known as a one-way design. The general question addressed in such a data set is, Do the populations giving rise to the different levels of the independent factor have different mean values? What statistical technique is appropriate for addressing this question?

3.2. STUDENT'S t TESTS


Most readers will recall Student's t tests from their introductory statistics course (those whose memories of statistical topics are just a little shaky can give themselves a reminder by looking at the appropriate definition in the glossary in Appendix A). The most commonly used form of this test addresses the question of whether the means of two distinct populations differ. Because interest about the fruit fly data in Table 3.1 also involves the question of differences in population means, is it possible that the straightforward application of the t test to each pair of strain means would provide the required answer to whether the strain fecundities differ? Sadly, this is an example of where putting two and two together arrives at the wrong answer. To explain why requires a little simple probability and algebra, and so the details are confined to Display 3.1. The results given there show that the consequences of applying a series of t tests to a one-way design with a moderate number of groups is very likely to be a claim of a significant difference, even when, in reality, no such difference exists. Even with only three groups as in the fruit fly example, the separate t tests approach increases the nominal 5% significance level by almost threefold. The message is clear: avoid multiple t tests like the plague! The appropriate method of analysis for a one-way design is the one-way analysis of variance.

TABLE 3.1
Fecundity of Fruit Flies

Resistant (RS)   Susceptible (SS)   Nonselected (NS)
     12.8              38.4               35.4
     21.6              32.9               21.4
     14.8              48.5               19.3
     23.1              20.9               41.8
     34.6              11.6               20.3
     19.7              22.3               37.6
     22.6              30.2               36.9
     29.6              33.4               37.3
     16.4              26.7               28.2
     20.3              39.0               23.4
     29.3              12.8               33.7
     14.9              14.6               29.2
     27.3              12.2               41.7
     22.4              23.1               22.6
     27.5              29.4               40.4
     20.3              16.0               34.4
     38.7              20.1               30.4
     26.4              23.3               14.9
     23.7              22.9               51.8
     26.1              22.5               33.8
     29.5              15.1               37.9
     38.6              31.0               29.5
     44.4              16.9               42.4
     23.2              16.1               36.6
     23.6              10.8               41.4

Display 3.1
The Problem with Using Multiple t Tests

The null hypothesis of interest is

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k.$$

Suppose the hypothesis is investigated by using a series of t tests, one for each pair of means. The total number of t tests needed is $N = k(k-1)/2$. Suppose that each t test is performed at significance level $\alpha$, so that for each of the tests,

$$\Pr(\text{rejecting the equality of the two means given that they are equal}) = \alpha.$$

Consequently,

$$\Pr(\text{accepting the equality of the two means when they are equal}) = 1 - \alpha.$$

Therefore,

$$\Pr(\text{accepting equality for all } N \ t \text{ tests performed}) = (1 - \alpha)^N.$$

Hence, finally we have

$$\Pr(\text{rejecting the equality of at least one pair of means when } H_0 \text{ is true}) = 1 - (1 - \alpha)^N \quad (P \text{ say}).$$

For particular values of k and for $\alpha = 0.05$, this result leads to the following numerical values:

 k     N     P
 2     1     1 - (0.95)^1  = 0.05
 3     3     1 - (0.95)^3  = 0.14
 4     6     1 - (0.95)^6  = 0.26
10    45     1 - (0.95)^45 = 0.90

The probability of falsely rejecting the null hypothesis quickly increases above the nominal significance level of .05. It is clear that such an approach is very likely to lead to misleading conclusions. Investigators unwise enough to apply the procedure would be led to claim more statistically significant results than their data justified.
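The arithmetic of Display 3.1 is easy to check numerically. The following Python lines are only an illustrative sketch: the function name familywise_error is hypothetical (it does not come from the text), and the calculation simply evaluates $P = 1 - (1 - \alpha)^N$ with $N = k(k-1)/2$, which, like the display itself, multiplies the probabilities as if the N tests were independent.

```python
def familywise_error(k, alpha=0.05):
    """Return (N, P) for k groups.

    N = k(k - 1)/2 is the number of pairwise t tests, and
    P = 1 - (1 - alpha)**N is the chance of at least one false rejection.
    (Hypothetical helper written for illustration only.)
    """
    n_tests = k * (k - 1) // 2
    p_any_rejection = 1 - (1 - alpha) ** n_tests
    return n_tests, p_any_rejection


# Reproduce the rows of the table in Display 3.1.
for k in (2, 3, 4, 10):
    n_tests, p = familywise_error(k)
    print(f"k = {k:2d}   N = {n_tests:2d}   P = {p:.2f}")
```

With three groups, as in the fruit fly example, P is about 0.14, which is the "almost threefold" increase over the nominal 5% level.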

3.3. ONE-WAY ANALYSIS OF VARIANCE

The phrase "analysis of variance" was coined by arguably the most famous statistician of the twentieth century, Sir Ronald Aylmer Fisher, who defined it as "the separation of variance ascribable to one group of causes from the variance ascribable to the other groups." Stated another way, the analysis of variance (ANOVA) is a partitioning of the total variance in a set of data into a number of component parts, so that the relative contributions of identifiable sources of variation to the total variation in measured responses can be determined. But how does this separation or partitioning of variance help in assessing differences between means?

For the one-way design, the answer to this question can be found by considering two sample variances, one of which measures variation between the observations within the groups, and the other measures the variation between the group means. If the populations corresponding to the various levels of the independent variable have the same mean for the response variable, both the sample variances described estimate the same population value. However, if the population means differ, variation between the sample means will be greater than that between observations within groups, and therefore the two sample variances will be estimating different population values. Consequently, testing for the equality of the group means requires a test of the equality of the two variances. The appropriate procedure is an F test. (This explanation of how an F test for the equality of two variances leads to a test of the equality of a set of means is not specific to the one-way design; it applies to all ANOVA designs, which should be remembered in later chapters where the explanation is not repeated in detail.)

An alternative way of viewing the hypothesis test associated with the one-way design is that two alternative statistical models are being compared. In one the mean and standard deviation are the same in each population; in the other the means are different but the standard deviations are again the same. The F test assesses the plausibility of the first model. If the between group variability is greater than expected (with, say, p < .05), the second model is to be preferred.

A more formal account of both the model and the measures of variation behind the F tests in a one-way ANOVA is given in Display 3.2.

The data collected from a one-way design have to satisfy the following assumptions to make the F test involved strictly valid.

1. The observations in each group come from a normal distribution.
2. The population variances of each group are the same.
3. The observations are independent of one another.

Taking the third of these assumptions first, it is likely that in most experiments and investigations the independence or otherwise of the observations will be clear cut. When unrelated subjects are randomly assigned to treatment groups, for example, then independence clearly holds. And when the same subjects are observed under a number of different conditions, independence of observations is clearly unlikely, a situation we shall deal with in Chapter 5. More problematic are situations in which the groups to be compared contain possibly related subjects, for example, pupils within different schools. Such data generally require special techniques such as multilevel modeling for their analysis (see Goldstein, 1995, for details).

Display 3.2
The One-way Analysis of Variance Model

In a general sense, the usual model considered for a one-way design is that the subjects in a particular group come from a population with a particular expected or average value for the response variable, with differences between the subjects within a group being accounted for by some type of random variation, usually referred to as "error." So the model can be written as

observed response = expected response + error.

The expected value in the population giving rise to the observations in the ith group is assumed to be $\mu_i$, leading to the following model for the observations:

$$y_{ij} = \mu_i + \epsilon_{ij},$$

where $y_{ij}$ represents the jth observation in the ith group, and the $\epsilon_{ij}$ represent random error terms, assumed to be from a normal distribution with mean zero and variance $\sigma^2$.

The hypothesis of the equality of population means can now be written as

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k = \mu,$$

leading to a new model for the observations, namely,

$$y_{ij} = \mu + \epsilon_{ij}.$$

There are some advantages (and, unfortunately, some disadvantages) in reformulating the model slightly, by modeling the mean value for a particular population as the sum of the overall mean value of the response plus a specific population or group effect. This leads to a linear model of the form

$$y_{ij} = \mu + \alpha_i + \epsilon_{ij},$$

where $\mu$ represents the overall mean of the response variable, $\alpha_i$ is the effect on an observation of being in the ith group ($i = 1, 2, \ldots, k$), and again $\epsilon_{ij}$ is a random error term, assumed to be from a normal distribution with mean zero and variance $\sigma^2$.

When written in this way, the model uses $k + 1$ parameters ($\mu, \alpha_1, \alpha_2, \ldots, \alpha_k$) to describe only k group means. In technical terms the model is said to be overparameterized, which causes problems because it is impossible to find unique estimates for each parameter; it's a bit like trying to solve simultaneous equations when there are fewer equations than unknowns.

This feature of the usual form of ANOVA models is discussed in detail in Maxwell and Delaney (1990), and also briefly in Chapter 6 of this book. One way of overcoming the difficulty is to introduce the constraint $\sum_{i=1}^{k} \alpha_i = 0$, so that $\alpha_k = -(\alpha_1 + \alpha_2 + \cdots + \alpha_{k-1})$. The number of parameters is then reduced by one as required. (Other possible constraints are discussed in the exercises in Chapter 6.)

If this model is assumed, the hypothesis of the equality of population means can be rewritten in terms of the parameters $\alpha_i$ as

$$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_k = 0,$$

so that under $H_0$ the model assumed for the observations is

$$y_{ij} = \mu + \epsilon_{ij}$$

as before.

The total variation in the observations, that is, the sum of squares of deviations of observations from the overall mean of the response variable $\left[\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{..})^2\right]$, can be partitioned into that due to difference in group means, the between groups sum of squares $\left[\sum_{i=1}^{k} n(\bar{y}_{i.} - \bar{y}_{..})^2\right]$, where n is the number of observations in each group, and that due to differences among observations within groups, the within groups sum of squares $\left[\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i.})^2\right]$. (Here we have written out the formulas for the various sums of squares in mathematical terms by using the conventional "dot" notation, in which a dot represents summation over a particular suffix. In future chapters dealing with more complex ANOVA situations, we shall not bother to give the explicit formulas for sums of squares.)

These sums of squares can be converted into between groups and within groups variances by dividing them by their appropriate degrees of freedom (see following text). Under the hypothesis of the equality of population means, both are estimates of $\sigma^2$. Consequently, an F test of the equality of the two variances is also a test of the equality of population means.

The necessary terms for the F test are usually arranged in an analysis of variance table as follows (N = kn is the total number of observations).

Source                   DF       SS      MS               MSR (F)
Between groups           k - 1    BGSS    BGSS/(k - 1)     MSBG/MSWG
Within groups (error)    N - k    WGSS    WGSS/(N - k)
Total                    N - 1

Here, DF is degrees of freedom; SS is sum of squares; MS is mean square; BGSS is between groups sum of squares; and WGSS is within groups sum of squares.

If $H_0$ is true (and the assumptions discussed in the text are valid), the mean square ratio (MSR) has an F distribution with $k - 1$ and $N - k$ degrees of freedom.

Although we have assumed an equal number of observations, n, in each group in this account, this is not necessary; unequal group sizes are perfectly acceptable, although see the relevant comments made in the Summary section of the chapter.
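The partition described in Display 3.2 can be sketched in a few lines of Python. What follows is a minimal illustration rather than a definitive implementation: the function name one_way_anova and the tiny data set toy are hypothetical, made up purely to show the arithmetic, and the between groups sum of squares is weighted by each group's own size so that unequal group sizes are also handled.

```python
def one_way_anova(groups):
    """Partition the total sum of squares for a one-way design.

    groups: list of lists, one list of observations per group
    (group sizes may differ). Hypothetical helper for illustration.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between groups SS: squared deviations of the group means from the
    # grand mean, each weighted by the number of observations in the group.
    bgss = sum(len(g) * (m - grand_mean) ** 2
               for g, m in zip(groups, group_means))
    # Within groups SS: squared deviations of observations from their own
    # group mean.
    wgss = sum((y - m) ** 2
               for g, m in zip(groups, group_means) for y in g)

    ms_between = bgss / (k - 1)
    ms_within = wgss / (n_total - k)
    return {"BGSS": bgss, "WGSS": wgss,
            "DF": (k - 1, n_total - k),
            "MSR": ms_between / ms_within}


# Hypothetical toy data (not from the book): three groups of three values.
toy = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
result = one_way_anova(toy)
print(result)
```

For the toy data the between groups sum of squares is 26 and the within groups sum of squares is 6, giving a mean square ratio of 13 on 2 and 6 degrees of freedom; the two sums of squares add up to the total sum of squares, 32.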

The assumptions of normality and homogeneity of variance can be assessed by informal (graphical) methods and also by more formal approaches, such as the Shapiro-Wilk test for normality (Shapiro and Wilk, 1965) and Bartlett's test for homogeneity of variance (Everitt, 1998). My recommendation is to use the informal (which will be illustrated in the next section) and to largely ignore the formal. The reason for this recommendation is that formal tests of the assumptions of both normality and homogeneity are of little practical relevance, because the good news is that even when the population variances are a little unequal and the observations are a little nonnormal, the usual F test is unlikely to mislead; the test is robust against minor departures from both normality and homogeneity of variance, particularly when the numbers of observations in each of the groups being compared are equal or approximately equal. Consequently, the computed p values will not be greatly distorted, and inappropriate conclusions are unlikely. Only if the departures from either or both the normality and homogeneity assumptions are extreme will there be real cause for concern, and a need to consider alternative procedures. Such gross departures will generally be visible from appropriate graphical displays of the data.

When there is convincing evidence that the usual ANOVA approach should not be used, there are at least three possible alternatives.

1. Transform the raw data to make it more suitable for the usual analysis; that is, perform the analysis not on the raw data, but on the values obtained after applying some suitable mathematical function. For example, if the data are very skewed, a logarithm transformation may help. Transformations are discussed in detail by Howell (1992) and are also involved in some of the exercises in this chapter and in Chapter 4. Transforming the data is sometimes felt to be a trick used by statisticians, a belief that is based on the idea that the natural scale of measurement is sacrosanct in some way. This is not really the case, and indeed some measurements (e.g., pH values) are effectively already logarithmically transformed values. However, it is almost always preferable to present results in the original scale of measurement. (And it should be pointed out that these days there is perhaps not so much reason for psychologists to agonize over whether they should transform their data so as to meet assumptions of normality, etc., for reasons that will be made clear in Chapter 10.)
2. Use distribution-free methods (see Chapter 8).
3. Use a model that explicitly specifies more realistic assumptions than normality and homogeneity of variance (again, see Chapter 10).

3.4. ONE-WAY ANALYSIS OF VARIANCE OF THE FRUIT FLY FECUNDITY DATA

It is always wise to precede formal analyses of a data set with preliminary informal, usually graphical, methods. Many possible graphical displays were described in Chapter 2. Here we shall use box plots and probability plots to make an initial examination of the fruit fly data.

The box plots of the per diem fecundities of the three strains of fruit fly are shown in Figure 3.1. There is some evidence of an outlier in the resistant strain (per diem fecundity of 44), and also a degree of skewness of the observations in the susceptible strain, which may be a problem. More about the distributional properties of each strain's observations can be gleaned from the normal probability plots shown in Figure 3.2. Those for the resistant and nonselected strain suggest that normality is not an unreasonable assumption, but that the assumption is, perhaps, more doubtful for the susceptible strain. For now, however, we shall conveniently ignore these possible problems and proceed with an analysis of variance of the raw data. (A far better way of testing the normality assumption in an ANOVA involves the use of quantities known as residuals, to be described in Chapter 4.)

The ANOVA table from a one-way analysis of variance of the fruit fly data is shown in Table 3.2. The F test for equality of mean fecundity in the three strains has an associated p value that is very small, and so we can conclude that the three strains do have different mean fecundities.

Having concluded that the strains do differ in their average fecundity does not, of course, necessarily imply that all strain means differ; the equality of means hypothesis might be rejected because the mean of one strain differs from the means of the other two strains, which are themselves equal. Discovering more about which particular means differ in a one-way design generally requires the use of what are known as multiple comparison techniques.

3.5. MULTIPLE COMPARISON TECHNIQUES

When a significant result has been obtained from a one-way analysis of variance, further analyses may be undertaken to find out more details of which particular means differ. A variety of procedures known generically as multiple comparison techniques are now available. These procedures all have the aim of retaining the nominal significance level at the required value when undertaking multiple tests of mean differences. We shall look at two multiple comparison procedures, the Bonferroni test and Scheffé's test.

TABLE 3.2
One-way ANOVA Results for Fruit Fly Data

Source                    SS         DF    MS        F       p
Between strains           1362.21     2    681.11    8.67    < .001
Within strains (Error)    5659.02    72     78.60
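As a quick check on how the entries of Table 3.2 fit together, the mean squares and the F ratio can be recomputed from the reported sums of squares and degrees of freedom. The short Python sketch below uses only the figures printed in the table; the variable names are chosen here for illustration.

```python
# Sums of squares and degrees of freedom as reported in Table 3.2.
ss_between, df_between = 1362.21, 2
ss_within, df_within = 5659.02, 72

ms_between = ss_between / df_between   # between strains mean square
ms_within = ss_within / df_within      # within strains (error) mean square
f_ratio = ms_between / ms_within       # mean square ratio (F)

print(f"MS between = {ms_between:.3f}")
print(f"MS within  = {ms_within:.3f}")
print(f"F          = {f_ratio:.2f}")
```

Up to rounding, the results should agree with the tabulated mean squares (681.11 and 78.60) and with the F value of 8.67.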

3.5.1. Bonferroni Test

The essential feature of the Bonferroni approach to multiple testing is to compare each pair of means in turn, by using a Student's t test. However, is this not the very process dismissed as hopelessly misleading in Section 3.2? Well, yes and no is the somewhat unhelpful answer. Clearly some more explanation is needed.

1. The series of t tests contemplated here is carried out only after a significant F value is found in the one-way ANOVA.
2. Each t test here uses the pooled estimate of error variance obtained from the observations in all groups rather than just the two whose means are being compared. This estimate is simply the within groups mean square as defined in Display 3.2.
3. The problem of inflating the Type I error, as discussed in Section 3.2, is tackled by judging the p value from each t test against a significance level of $\alpha/m$, where m is the number of t tests performed and $\alpha$ is the size of the Type I error (the significance level) that the investigator wishes to maintain in the testing process. (This is known as the Bonferroni correction.)

More details of the Bonferroni multiple comparison procedure are given in Display 3.3. The practical consequence of using this procedure is that each t test will now have to result in a more extreme value than usual for a significant mean difference to be claimed. In this way the overall Type I error will remain close to the desired value $\alpha$. The disadvantage of the Bonferroni approach is that it may be highly conservative if a large number of comparisons are involved; that is, some real differences are very likely to be missed. (Contemplating a large number of comparisons may, of course, reflect a poorly designed study.) However, the procedure can be recommended for a small number of comparisons, and the results of its application to the fruit fly data are shown in Table 3.3. The results are also shown in graphical form in Figure 3.3. Here it is clear that the mean fecundity of the nonselected strain differs from the means of the other two groups, which themselves do not differ.

Display 3.3
Bonferroni t Tests

The t statistic used is

$$t = \frac{\text{mean difference}}{s\sqrt{1/n_1 + 1/n_2}},$$

where $s^2$ is the error mean square (an estimate of $\sigma^2$) and $n_1$ and $n_2$ are the number of observations in the two groups being compared.

Each observed t statistic is compared with the value from a t distribution with $N - k$ degrees of freedom corresponding to a significance level of $\alpha/m$ rather than $\alpha$, where m is the number of comparisons made.

Alternatively (and preferably) a confidence interval can be constructed as

$$\text{mean difference} \pm t_{N-k}(\alpha/2m) \times s\sqrt{1/n_1 + 1/n_2},$$

where $t_{N-k}(\alpha/2m)$ is the t value with $N - k$ degrees of freedom, corresponding to significance level $\alpha/2m$. (These confidence intervals are readily available from most statistical software; see computer hints at the end of the chapter.)

TABLE 3.3
Results from the Bonferroni Procedure Used on the Fruit Fly Data

In this example N - k = 72 and n1 = n2 = 25 for each pair of groups. Three comparisons can be made between the pairs of means of the three strains, so that the t value corresponding to 72 degrees of freedom and α/6 for α = 0.05 is 2.45. Finally s² = 78.60 (see Table 3.2). The confidence intervals calculated as described in Display 3.3 are

Comparison    Est. of mean difference    Lower bound    Upper bound
NS-RS                  8.12                  1.97          14.34 a
NS-SS                  9.74                  3.60          15.W a
RS-SS                  1.63                 -4.52           7.77

a Interval does not contain the value zero.
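The interval formula of Display 3.3 can be tried out on the quantities quoted with Table 3.3. The Python sketch below is illustrative only: the function name simultaneous_interval is hypothetical, and the t value 2.45, the error mean square 78.60, the group size 25, and the three mean differences are taken as given from the table rather than recomputed from the raw data.

```python
from math import sqrt


def simultaneous_interval(diff, s2, n1, n2, critical_value):
    """Interval: diff +/- critical_value * s * sqrt(1/n1 + 1/n2).

    Hypothetical helper; s2 is the error mean square.
    """
    half_width = critical_value * sqrt(s2) * sqrt(1 / n1 + 1 / n2)
    return diff - half_width, diff + half_width


# Values quoted with Table 3.3: error mean square, group size, and the
# t value for 72 degrees of freedom at alpha/6 with alpha = 0.05.
s2, n, t_crit = 78.60, 25, 2.45
mean_differences = {"NS-RS": 8.12, "NS-SS": 9.74, "RS-SS": 1.63}

for label, diff in mean_differences.items():
    lower, upper = simultaneous_interval(diff, s2, n, n, t_crit)
    verdict = "excludes zero" if lower > 0 or upper < 0 else "includes zero"
    print(f"{label}: ({lower:.2f}, {upper:.2f})  {verdict}")
```

Because the inputs are rounded, the computed bounds may differ slightly from the tabulated ones, but the pattern is the same: the two intervals involving the nonselected strain exclude zero, and the RS-SS interval includes it.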

3.5.2. Scheffé's Multiple Comparison Procedure

Scheffé's approach is particularly useful when a large number of comparisons are to be carried out. The test is again based on a Student's t test, but the critical point used for testing each of the t statistics differs from that used in the Bonferroni procedure. Details of the Scheffé test are given in Display 3.4, and the results of applying the test to the fruit fly data are shown in Table 3.4 and Figure 3.4.

TABLE 3.4
Scheffé's Procedure Applied to Fruit Fly Data

Here the critical value is 2.50 and the confidence intervals for each comparison are

Comparison    Est. of mean difference    Lower bound    Upper bound
NS-RS                  8.12                  1.85          14.44 a
NS-SS                  9.74                  3.48          16.W a
RS-SS                  1.63                 -4.64           1.9

a Interval does not contain the value zero.

FIG. 3.3. Graphical display of Bonferroni multiple comparison results for fruit fly data.
[Plot of simultaneous 95% confidence limits, Bonferroni method, for the comparisons NS-RS, NS-SS, and RS-SS; response variable: fecundity.]

Display 3.4
Scheffé's Multiple Comparison Procedure

The test statistic used here is once again the t statistic used in the Bonferroni procedure and described in Display 3.3. In this case, however, each observed test statistic is compared with $[(k-1)F_{k-1,N-k}(\alpha)]^{1/2}$, where $F_{k-1,N-k}(\alpha)$ is the F value with $k-1$, $N-k$ degrees of freedom corresponding to significance level $\alpha$. (More details are given in Maxwell and Delaney, 1990.) The method can again be used to construct a confidence interval for two means as

$$\text{mean difference} \pm \text{critical value} \times s\sqrt{1/n_1 + 1/n_2}.$$
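The relation between the Scheffé critical point and the underlying F value can be explored with a small Python sketch. This is an illustration under stated assumptions, not a full implementation: the function names are hypothetical, no F tables are consulted, and the calculation instead works backwards from the critical value 2.50 quoted in Table 3.4 to the F value it implies. The half-widths use s² = 78.60 and n = 25 per group, as quoted with Table 3.3, together with the Bonferroni t value of 2.45 for comparison.

```python
from math import sqrt


def scheffe_critical_value(k, f_value):
    """Scheffe critical point [(k - 1) * F]**0.5 for k groups, where
    f_value is the F value on k - 1 and N - k degrees of freedom.
    (Hypothetical helper.)"""
    return sqrt((k - 1) * f_value)


def implied_f_value(k, critical_value):
    """Invert the formula: the F value implied by a given critical point."""
    return critical_value ** 2 / (k - 1)


k = 3
f_implied = implied_f_value(k, 2.50)  # 2.50 is the value quoted in Table 3.4
print(f"implied F value: {f_implied:.3f}")
print(f"round trip:      {scheffe_critical_value(k, f_implied):.2f}")

# Interval half-widths for the fruit fly data (s^2 = 78.60, n = 25 per group).
unit = sqrt(78.60) * sqrt(1 / 25 + 1 / 25)
bonferroni_half_width = 2.45 * unit
scheffe_half_width = 2.50 * unit
print(f"Bonferroni half-width: {bonferroni_half_width:.2f}")
print(f"Scheffe half-width:    {scheffe_half_width:.2f}")
```

The two half-widths differ only in the second significant figure, which is why the Scheffé and Bonferroni intervals for these data are so alike.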

The confidence intervals are very similar to those arrived at by the Bonferroni approach, and the conclusions are identical.

One general point to make about multiple comparison tests is that, on the whole, they all err on the side of safety (nonsignificance). Consequently, it is quite possible (although always disconcerting) to find that, although the F test in the analysis of variance is statistically significant, no pair of means is judged by the multiple comparison procedure to be significantly different. One further point is that a host of such tests are available, and statisticians are usually very wary of an overreliance on such tests (the author is no exception). Readers need to avoid being seduced by the ease with which multiple comparison tests can be applied in most statistical software packages.

FIG. 3.4. Graphical display of Scheffé multiple comparison results for fruit fly data.
[Plot of simultaneous 95% confidence limits, Scheffé method, for the comparisons NS-RS, NS-SS, and RS-SS; response variable: fecundity.]

3.6. PLANNED COMPARISONS


Most analyses of one-way designs performed in psychological research involve the approach described in the previous two sections, namely a one-way ANOVA in which finding a significant F value is followed by the application of some type of multiple comparison test. But there is another way! Consider, for example, the data shown in Table 3.5, which were obtained from an investigation into the effect of the stimulant caffeine on the performance of a simple task. Forty male students were trained in finger tapping. They were then divided at random into four groups of 10, and the groups received different doses of caffeine (0, 100, 200, and 300 ml). Two hours after treatment, each man was required to carry out finger tapping and the number of taps per minute was recorded. Here, because the question of interest is whether caffeine affects performance on the finger-tapping task, the investigator may be interested a priori in the specific hypothesis that the mean of the three groups treated with caffeine differs from the mean of the untreated group, rather than in the general hypothesis of equality of means that is usually tested in a one-way analysis of variance. Such a priori planned comparisons are generally more powerful, that is, more likely to reject the null hypothesis when it is false, than the usual catch-all F test. The relevant test statistic for the specific hypothesis of interest can be constructed relatively simply (see Display 3.5). As also shown in Display 3.5, an appropriate confidence interval can be constructed for the comparison of interest. In the caffeine example, the interval constructed indicates that it is very likely that there is a difference in finger-tapping performance between the "no caffeine" and "caffeine" conditions. More finger tapping takes place when subjects are given the stimulant.

The essential difference between planned and unplanned comparisons (i.e., those discussed in the previous section) is that the former can be assessed by using conventional significance levels whereas the latter require rather stringent significance levels. An additional difference is that when a planned comparison approach is used, an omnibus analysis of variance is not required. The investigator moves straight to the comparisons of most interest. However, there is one caveat: planned comparisons have to be just that, and not the result of hindsight after inspection of the sample means!

TABLE 3.5
Caffeine and Finger-Tapping Data

0 ml   100 ml   200 ml   300 ml
242    248      246      248
245    246      248      250
244    245      250      251
248    247      252      251
247    248      248      248
248    250      250      251
242    247      246      252
244    246      248      249
246    243      245      253
242    244      250      251

Note. The response variable is the number of taps per minute.

3.7. THE USE OF ORTHOGONAL POLYNOMIALS: TREND ANALYSIS

In a one-way design where the independent variable is nominal, as in the teaching methods example, the data analysis is usually limited to testing the overall null hypothesis of the equality of the group means and subsequent post hoc comparisons by using some type of multiple comparison procedure. However, if the independent variable has levels that form an ordered scale, it is often possible to extend the analysis to examine the relationship of these levels to the group means of the dependent variable. An example is provided by the caffeine data in Table 3.5, where the levels of the independent variable take the values of 0 ml, 100 ml, 200 ml, and 300 ml. For such data most interest will center on the presence of trends of particular types, that is, increases or decreases in the means of the dependent variable over the ordered levels of the independent variable. Such

Display 3.5
Planned Comparisons

The hypothesis of particular interest in the finger-tapping experiment is

$$H_0: \mu_0 = \tfrac{1}{3}[\mu_{100} + \mu_{200} + \mu_{300}],$$

where $\mu_0$, $\mu_{100}$, $\mu_{200}$, and $\mu_{300}$ are the population means of the 0 ml, 100 ml, 200 ml, and 300 ml groups, respectively. The hypothesis can be tested by using the following t statistic:

$$t = \frac{\bar{x}_0 - \tfrac{1}{3}(\bar{x}_{100} + \bar{x}_{200} + \bar{x}_{300})}{s(1/10 + 1/30)^{1/2}},$$

where $\bar{x}_0$, $\bar{x}_{100}$, $\bar{x}_{200}$, and $\bar{x}_{300}$ are the sample means of the four groups, and $s^2$ is once again the error mean square from the one-way ANOVA. The values of the four sample means are $\bar{x}_0 = 244.8$, $\bar{x}_{100} = 246.4$, $\bar{x}_{200} = 248.3$, and $\bar{x}_{300} = 250.4$. Here $s^2 = 4.40$. The t statistic takes the value -4.66. This is tested as a Student's t with 36 degrees of freedom (the degrees of freedom of the error mean square). The associated p value is very small (p = .000043), and the conclusion is that the tapping mean in the caffeine condition does not equal that when no caffeine is given. A corresponding 95% confidence interval for the mean difference between the two conditions can be constructed in the usual way to give the result (-5.10, -2.04). More finger tapping occurs when caffeine is given.

An equivalent method of looking at planned comparisons is to first put the hypothesis of interest in a slightly different form from that given earlier:

$$H_0: \mu_0 - \tfrac{1}{3}\mu_{100} - \tfrac{1}{3}\mu_{200} - \tfrac{1}{3}\mu_{300} = 0.$$

The estimate of this comparison of the four means (called a contrast because the defining constants, 1, -1/3, -1/3, -1/3, sum to zero), obtained from the sample means, is

$$244.8 - \tfrac{1}{3} \times 246.4 - \tfrac{1}{3} \times 248.3 - \tfrac{1}{3} \times 250.4 = -3.57.$$

The sum of squares (and the mean square, because only a single degree of freedom is involved) corresponding to this comparison is found simply as

$$\frac{(-3.57)^2}{\tfrac{1}{10}[1 + (1/3)^2 + (1/3)^2 + (1/3)^2]} = 95.41.$$

This mean square is tested as usual against the error mean square as an F with 1 and $\nu$ degrees of freedom, where $\nu$ is the number of degrees of freedom of the error mean square (in this case the value 36). So F = 95.41/4.40 = 21.684. The associated p value is .000043, agreeing with that of the t test described above. The two approaches outlined are exactly equivalent, because the calculated F statistic is actually the square of the t statistic (21.68 = [-4.67]^2). The second version of assessing a particular contrast by using the F statistic, and so on, will, however, help in the explanation of orthogonal polynomials to be given in Display 3.6.

(For more details, including the general case, see Rosenthal and Rosnow, 1985.)
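The arithmetic of the planned comparison can be retraced with a short Python sketch. The data values below are an assumed reading of the finger-tapping table, chosen so that the group means agree with those quoted in the display (244.8, 246.4, 248.3, and 250.4); the variable names are illustrative and only the standard library is used.

```python
import math

# Finger-tapping data (taps per minute) keyed by caffeine dose in ml.
# Assumed reading of Table 3.5, consistent with the group means in Display 3.5.
groups = {
    0:   [242, 245, 244, 248, 247, 248, 242, 244, 246, 242],
    100: [248, 246, 245, 247, 248, 250, 247, 246, 243, 244],
    200: [246, 248, 250, 252, 248, 250, 246, 248, 245, 250],
    300: [248, 250, 251, 251, 248, 251, 252, 249, 253, 251],
}
# Contrast: no caffeine versus the average of the three caffeine groups.
coefs = {0: 1.0, 100: -1.0 / 3, 200: -1.0 / 3, 300: -1.0 / 3}

means = {dose: sum(v) / len(v) for dose, v in groups.items()}

# Error (within-group) mean square from the one-way ANOVA.
df_error = sum(len(v) - 1 for v in groups.values())
ss_error = sum((x - means[dose]) ** 2 for dose, v in groups.items() for x in v)
ms_error = ss_error / df_error

# Contrast estimate and the scale factor sum(c_i^2 / n_i).
estimate = sum(coefs[dose] * means[dose] for dose in groups)
scale = sum(coefs[dose] ** 2 / len(groups[dose]) for dose in groups)

t_stat = estimate / math.sqrt(ms_error * scale)  # t with df_error degrees of freedom
ss_contrast = estimate ** 2 / scale              # single-degree-of-freedom sum of squares
f_stat = ss_contrast / ms_error                  # algebraically the square of t_stat

print(f"estimate = {estimate:.2f}, t = {t_stat:.2f}, "
      f"SS = {ss_contrast:.2f}, F = {f_stat:.2f}")
```

With these data the contrast estimate, t statistic, and contrast sum of squares agree with the display to two decimal places; the F ratio comes out marginally below 21.684 because the display divides by the rounded error mean square of 4.40.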

FIG. 3.5. Box plots for finger-tapping data (horizontal axis: amount of caffeine, from none to 300 ml).

trends can be assessed relatively simply by using what are known as orthogonal polynomials. These correspond to particular comparisons among the levels of the independent variable, with the coefficients defining the required comparisons dependent on the number of levels of this variable. How these coefficients arise is not of great importance (and anyway is outside the level of this book), but how they are used is of interest; this is explained in Display 3.6. (Comprehensive tables of orthogonal polynomial coefficients corresponding to a number of different levels of the factor in a one-way design can be found in Howell, 1992.) Using these coefficients enables the between groups sum of squares to be partitioned into a number of single-degree-of-freedom terms, each of which represents the sum of squares corresponding to a particular trend component. (The arithmetic involved is similar to that described for planned comparisons in Display 3.5.) The particular results for the caffeine example are also shown in Display 3.6 and indicate that there is a very strong linear trend in the finger-tapping means over the four levels of caffeine. A box plot of the data demonstrates the effect very clearly (Figure 3.5).

3.8. INTRODUCING ANALYSIS OF COVARIANCE


The data shown in Table 3.6 were collected from a sample of 24 first-grade children. Each child completed the Embedded Figures Test (EFT), which measures field dependence, that is, the extent to which a person can abstract the logical structure of a problem from its context. Then the children were randomly allocated to one

Display 3.6
The Use of Orthogonal Polynomials

When the levels of the independent variable form a series of ordered steps, it is often of interest to examine the relationship between the levels and the means of the response variable. In particular the following questions would be of interest.

1. Do the means of the treatment groups increase in a linear fashion with an increase in the level of the independent variable?
2. Is the trend just linear or is there any evidence of nonlinearity?
3. If the trend is nonlinear, what degree equation (polynomial) is required?

The simplest way to approach these and similar questions is by the use of orthogonal polynomial contrasts. Essentially, these correspond to particular comparisons among the means, representing linear trend, quadratic trend, and so on. They are defined by a series of coefficients specific for the particular number of levels of the independent variable. The coefficients are available from most statistical tables. A small part of such a table is shown here.

                       Level
k   Trend        1    2    3    4    5
3   Linear      -1    0    1
    Quadratic    1   -2    1
4   Linear      -3   -1    1    3
    Quadratic    1   -1   -1    1
    Cubic       -1    3   -3    1
5   Linear      -2   -1    0    1    2
    Quadratic    2   -1   -2   -1    2
    Cubic       -1    2    0   -2    1
    Quartic      1   -4    6   -4    1

(Note that these coefficients are only appropriate for equally spaced levels of the independent variable.)

These coefficients can be used to produce a partition of the between groups sums of squares into single-degree-of-freedom sums of squares corresponding to the trend components. These sums of squares are found by using the approach described for planned comparisons in Display 3.5. For example, the sum of squares corresponding to the linear trend for the caffeine data is found as follows:

$$SS_{\text{linear}} = \frac{[-3 \times 244.8 + (-1) \times 246.4 + 1 \times 248.3 + 3 \times 250.4]^2}{\tfrac{1}{10}[(-3)^2 + (-1)^2 + (1)^2 + (3)^2]} = 174.83.$$

The resulting ANOVA table for the finger-tapping example is as follows.

Source            DF    SS       MS       F       p
Caffeine levels    3    175.47    58.49   13.29   <.001
  Linear           1    174.83   174.83   39.71   <.001
  Quadratic        1      0.62     0.62    0.14    .71
  Cubic            1      0.00     0.00    0.00    .97
Within            36    158.50     4.40

Note that the sums of squares of the linear, quadratic, and cubic effects add to the between groups sum of squares. Note also that in this example the difference in the means of the four ordered caffeine levels is dominated by the linear effect; departures from linearity have a sum of squares of only (0.62 + 0.00) = 0.62.
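The partition into trend components can be sketched in a few lines of Python. The sketch uses only the four group means quoted for the caffeine example and the group size of 10, together with the standard orthogonal polynomial coefficients for four equally spaced levels; the function and variable names are illustrative assumptions.

```python
# Group means for the 0, 100, 200, and 300 ml groups; n = 10 subjects per group.
means = [244.8, 246.4, 248.3, 250.4]
n = 10

# Orthogonal polynomial coefficients for four equally spaced levels.
contrasts = {
    "linear":    [-3, -1,  1, 3],
    "quadratic": [ 1, -1, -1, 1],
    "cubic":     [-1,  3, -3, 1],
}


def contrast_ss(coefs, group_means, group_size):
    """Single-degree-of-freedom sum of squares for one contrast."""
    estimate = sum(c * m for c, m in zip(coefs, group_means))
    return estimate ** 2 / (sum(c * c for c in coefs) / group_size)


trend_ss = {name: contrast_ss(c, means, n) for name, c in contrasts.items()}

# Between groups sum of squares computed directly, for comparison.
grand_mean = sum(means) / len(means)
between_ss = n * sum((m - grand_mean) ** 2 for m in means)

for name, value in trend_ss.items():
    print(f"{name:9s} SS = {value:.3f}")
print(f"sum of trend SS = {sum(trend_ss.values()):.3f}, between SS = {between_ss:.3f}")
```

Working from the means rounded to one decimal place, the linear component comes out at about 174.85 rather than the 174.83 shown in the display, presumably a rounding difference; the three components still sum to the between groups sum of squares.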

TABLE 3.6
WISC Blocks Data

     Row Group          Corner Group
   Time     EFT        Time     EFT
   464       59        342       48
   317       33        222       23
   525       49        219        9
   298       69        513      128
   491       65        295       44
   196       26        285       49
   268       29        408       87
   372       62        543       43
   370       31        298       55
   739      139        494       58
   430       74        317      113
   410       31        407        7

of two experimental groups. They were timed as they constructed a 3 x 3 pattern from nine colored blocks, taken from the Wechsler Intelligence Scale for Children (WISC). The two groups differed in the instructions they were given for the task: the row group were told to start with a row of three blocks, and the corner group were told to begin with a corner of three blocks. The experimenter was interested in whether the different instructions produced any change in the average time to complete the pattern, taking into account the varying field dependence of the children.

These data will help us to introduce a technique known as analysis of covariance, which allows comparison of group means after adjusting for what are called concomitant variables, or, more generally, covariates. The analysis of covariance tests for group mean differences in the response variable, after allowing for the possible effect of the covariate. The covariate is not in itself of experimental interest, except in that using it in an analysis can lead to increased precision by decreasing the estimate of experimental error, that is, the within group mean square in the analysis of variance. The effect of the covariate is allowed for by assuming a linear relationship between the response variable and the covariate. Details of the analysis of covariance model are given in Display 3.7. As described, the analysis of covariance assumes that the slopes of the lines relating response and covariate are the same in each group; that is, there is no interaction between the groups and the covariate. If there is an interaction, then it does not make sense to compare the groups at a single value of the covariate, because any difference noted will not apply for other values of the covariate.

Display 3.7
The Analysis of Covariance Model

In general terms the model is

observation = mean + group effect + covariate effect + error.

More specifically, if $y_{ij}$ is used to denote the value of the response variable for the jth individual in the ith group and $x_{ij}$ is the value of the covariate for this individual, then the model assumed is

$$y_{ij} = \mu + \alpha_i + \beta(x_{ij} - \bar{x}) + \epsilon_{ij},$$

where $\beta$ is the regression coefficient linking response variable and covariate and $\bar{x}$ is the grand mean of the covariate values. (The remaining terms in the equation are as in the model defined in Display 3.2.) Note that the regression coefficient is assumed to be the same in each group. The means of the response variable adjusted for the covariate are obtained simply as

adjusted group mean = group mean + $\hat{\beta}$ (grand mean of covariate - group mean of covariate),

where $\hat{\beta}$ is the estimate of the regression coefficient in the model above.
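The adjustment formula can be illustrated with a small Python sketch. The data are hypothetical (they are not the WISC data), and the common slope is estimated here by the pooled within-group regression coefficient, which is the least squares estimate under the equal-slopes model; all names are illustrative.

```python
def mean(values):
    return sum(values) / len(values)


def pooled_slope(groups):
    """Common within-group slope: pooled Sxy divided by pooled Sxx."""
    sxy = 0.0
    sxx = 0.0
    for x, y in groups.values():
        mx, my = mean(x), mean(y)
        sxy += sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx += sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def adjusted_means(groups):
    """Group mean + slope * (grand mean of covariate - group mean of covariate)."""
    beta = pooled_slope(groups)
    grand_x = mean([a for x, _ in groups.values() for a in x])
    return {g: mean(y) + beta * (grand_x - mean(x)) for g, (x, y) in groups.items()}


# Hypothetical data: (covariate values, response values) for two groups.
groups = {
    "A": ([1, 2, 3], [2, 4, 6]),
    "B": ([3, 4, 5], [7, 9, 11]),
}

print("common slope:", pooled_slope(groups))
print("adjusted means:", adjusted_means(groups))
```

In this made-up example the raw group means differ by 5 (4 versus 9), but much of that difference is attributable to the groups' different covariate means; after adjustment the means are 6 and 7.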

FIG. 3.6. Box plots of completion times for the WISC data.

Before applying analysis of covariance (ANCOVA) to the WISC data, we should look at some graphical displays of the data. Box plots of the recorded completion times in each experimental group are shown in Figure 3.6, and a scattergram giving the fitted regressions (see Chapter 6) for time against the EFT in each group is given in Figure 3.7. The box plots suggest a possible outlier. Here we shall conveniently ignore this possible problem and analyze all the data (but see Exercise 3.5). The scatterplot suggests that the slopes of the two regression lines are not too dissimilar for application of the ANCOVA model outlined in Display 3.7.

TABLE 3.7
ANCOVA Results from SPSS for WISC Blocks Data

Source              SS       DF    MS       F      p
Field dependence    109991    1    109991   9.41   .006
Group                11743    1     11743   1.00   .33
Error               245531   21     11692

FIG. 3.7. Scatterplot for time against the EFT for the WISC data, showing fitted regressions for each group.

Table 3.7 shows the default results from using SPSS for the ANCOVA of the data in Table 3.6. In Table 3.8 the corresponding results from using S-PLUS for the analysis are shown. You will notice that the results are similar but not identical. The test for the covariate differs. The reason for this difference will be made clear in Chapter 4. However, despite the small difference in the results from the two packages, the conclusion from each is the same: it appears that field dependence is predictive of time to completion of the task, but that there is no difference between the two experimental groups in mean completion time.

TABLE 3.8
ANCOVA of WISC Data by Using S-PLUS

Source              SS       DF    MS       F      p
Field dependence    110263    1    110263   9.43   0.006
Group                11743    1     11743   1.00   0.33
Error               245531   21     11692

Further examples of analysis of covariance are given in the next chapter, and then in Chapter 6 the model is put in a more general context. However, one point about the method that should be made here concerns its use with naturally occurring or intact groups, rather than groups formed by random allocation.

The analysis of covariance was originally intended to be used in investigations in which randomization had been used to assign patients to treatment groups. Experimental precision could be increased by removing from the error term in the ANOVA that part of the residual variability in the response that was linearly predictable from the covariate. Gradually, however, the technique became more widely used to test hypotheses that were generally stated in such terms as the group differences on the response are zero when the group means on the covariate are made equal, or the group means on the response after adjustment for mean differences on the covariate are equal. Indeed, some authors (e.g., McNemar, 1962) have suggested that if there is only a small chance difference between the groups on the covariate, the use of covariance adjustment may not be worth the effort. Such a comment rules out the very situation for which the ANCOVA was originally intended, because in the case of groups formed by randomization, any group differences on the covariate are necessarily the result of chance! Such advice is clearly unsound because more powerful tests of group differences result from the decrease in experimental error achieved when analysis of covariance is used in association with random allocation.

In fact, it is the use of analysis of covariance in an attempt to undo built-in differences among intact groups that causes concern. For example, Figure 3.8 shows a plot of reaction time and age for psychiatric patients belonging to two distinct diagnostic groups. An ANCOVA with reaction time as response and age as covariate might lead to the conclusion that reaction time does not differ in the two groups. In other words, given that two patients, one from each group, are of approximately the same age, their reaction times are also likely to be similar. Is such a conclusion sensible? An examination of Figure 3.8 clearly shows that it is not, because the ages of the two groups do not overlap and the ANCOVA has essentially extrapolated into a region with no data. Presumably, it is this type of problem that provoked the following somewhat acidic comment from Anderson (1963): "one may well wonder what exactly it means to ask what the data would be like if they were not like they are!" Clearly, some thought has to be given to the use of analysis of covariance on intact groups, and readers are referred to Fleiss and Tanur (1972) and Stevens (1992) for more details.

FIG. 3.8. Plot of reaction time against age for two groups of psychiatric patients.

3.9. HOTELLING'S T² TEST AND ONE-WAY MULTIVARIATE ANALYSIS OF VARIANCE

The data shown in Table 3.9 were obtained from a study reported by Novince (1977), which was concerned with improving the social skills of college females and reducing their anxiety in heterosexual encounters. There were three groups in the study: a control group, a behavioral rehearsal group, and a behavioral rehearsal plus cognitive restructuring group. The values of the following four dependent variables were recorded for each subject in the study: anxiety (physiological anxiety in a series of heterosexual encounters); a measure of social skills in social interactions; appropriateness; and assertiveness.

Between group differences in this example could be assessed by separate one-way analyses of variance on each of the four dependent variables. An alternative

TABLE 3.9
Social Skills Data

Anxiety   Social Skills   Appropriateness   Assertiveness

5 4 4

Behavioral Rehearsal 5 3

4 3

4 5
5

4 5 5 4

5 5

4 4 4
4

3 4 4 5 5 4 5
4 4

3 4

4
4

4 3

4 5 4 5

3 4

Control Group 6 6
5

6
4 7 5 5 5 5

2 4

2 2 2

2
2

1 2

4 2

4 1 3

2 4
3

3 1

6
4

Behavioral Rehearsal + Cognitive Restructuring


4 4

4 2 4

3 3 3 3

3 3 3 3

4 4 4 4
4

4 5

4 4
5 5

4 4
6

4 5 4

4 5 6 4 3
4

4 5 5 4
4 5 4

3 4

3 4

Note. From Novince (1977).


and, in many situations involving multiple dependent variables, a preferable procedure is to consider the four dependent variables simultaneously. This is particularly sensible if the variables are correlated and believed to share a common conceptual meaning. In other words, the dependent variables considered together make sense as a group. Consequently, the real question of interest is, Does the set of variables as a whole indicate any between group differences? To answer this question requires the use of a relatively complex technique known as multivariate analysis of variance, or MANOVA.

The ideas behind this approach are introduced most simply by considering first the two-group situation and the multivariate analog of Student's t test for testing the equality of two means, namely Hotelling's T² test. The test is described in Display 3.8 (the little adventure in matrix algebra is unavoidable, I'm afraid), and its application to the two experimental groups in Table 3.9 is detailed in Table 3.10. The conclusion to be drawn from the T² test is that the mean vectors of the two experimental groups do not differ.

It might be thought that the results from Hotelling's T² test would simply reflect those of a series of univariate t tests, in the sense that if no differences are found by the separate t tests, then Hotelling's T² test will also lead to the conclusion that the population multivariate mean vectors do not differ, whereas if any significant difference is found for the separate variables, the T² statistic will also be significant. In fact, this is not necessarily the case (if it were, the T² test would be a waste of time); it is possible to have no significant differences for each variable tested separately but a significant T² value, and vice versa. A full explanation of the differences between a univariate and a multivariate test for a situation involving just two variables is given in Display 3.9.

When a set of dependent variables is to be compared in more than two groups, the multivariate analog of the one-way ANOVA is used. Without delving into the technical details, let us say that what the procedure attempts to do is to combine the dependent variables in some optimal way, taking into consideration the correlations between them and deciding what unique information each variable provides. A number of such composite variables giving maximum separation between groups are derived, and these form the basis of the comparisons made. Unfortunately, in the multivariate situation, when there are more than two groups to be compared, no single test statistic can be derived that is always the most powerful for detecting all types of departures from the null hypothesis of the equality of the mean vectors of the groups. A number of different test statistics have been suggested that may give different results when used on the same set of data, although the resulting conclusion from each is often the same. (When only two groups are involved, all the proposed test statistics are equivalent to Hotelling's T².) Details and formulas for the most commonly used MANOVA test statistics are given in Stevens (1992), for example, and in the glossary in Appendix A. (The various test statistics are eventually transformed into F statistics

Display 3.8
Hotelling's T² Test

If there are p dependent variables, the null hypothesis is that the p means of the first population equal the corresponding means of the second population. By introduction of some vector and matrix nomenclature, the null hypothesis can be written as

$$H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2,$$

where $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$ contain the population means of the dependent variables in the two groups; that is, they are the population mean vectors. The test statistic is

$$T^2 = \frac{n_1 n_2}{n_1 + n_2} D^2,$$

where $n_1$ and $n_2$ are the numbers of observations in the two groups and $D^2$ is defined as

$$D^2 = (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)' \mathbf{S}^{-1} (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2),$$

where $\bar{\mathbf{x}}_1$ and $\bar{\mathbf{x}}_2$ are the sample mean vectors in the two groups and $\mathbf{S}$ is the estimate of the assumed common covariance matrix calculated as

$$\mathbf{S} = \frac{(n_1 - 1)\mathbf{S}_1 + (n_2 - 1)\mathbf{S}_2}{n_1 + n_2 - 2},$$

where $\mathbf{S}_1$ and $\mathbf{S}_2$ are the sample covariance matrices in each group. Note that the form of the multivariate test statistic is very similar to that of the univariate independent sample t test of your introductory course; it involves a difference between the means (here the means are vectors and the difference between them takes into account the covariances of the variables) and an assumed common variance (here the variance becomes a covariance matrix). Under $H_0$ (and when the assumptions listed below are true), the statistic F given by

$$F = \frac{(n_1 + n_2 - p - 1)T^2}{(n_1 + n_2 - 2)p}$$

has a Fisher F distribution with $p$ and $n_1 + n_2 - p - 1$ degrees of freedom.

The assumptions of the test are completely analogous to those of an independent samples t test. They are as follows: (a) the data in each population are from a multivariate normal distribution; (b) the populations have the same covariance matrix; and (c) the observations are independent.

to enable p values to be calculated.) Only the results of applying the tests to the data in Table 3.9 will be discussed here (the results are given in Table 3.11). Each test indicates that the three groups are significantly different on the set of four variables. Combined with the earlier T² test on the two experimental groups, this result seems to imply that it is the control group that gives rise to the difference. This

TABLE 3.10
Calculation of Hotelling's T² for the Data in Table 3.9

The two sample mean vectors are given by

$$\bar{\mathbf{x}}_1' = [4.27, 4.36, 4.18, 2.82], \quad \bar{\mathbf{x}}_2' = [4.09, 4.27, 4.27, 4.09].$$

The sample covariance matrices of the two groups are

$$\mathbf{S}_1 = \begin{pmatrix} 0.42 & -0.31 & -0.25 & -0.44 \\ -0.31 & 0.45 & 0.33 & 0.37 \\ -0.25 & 0.33 & 0.36 & 0.34 \\ -0.44 & 0.37 & 0.34 & 0.56 \end{pmatrix}, \quad
\mathbf{S}_2 = \begin{pmatrix} 0.09 & -0.13 & -0.13 & -0.11 \\ -0.13 & 0.82 & 0.62 & 0.57 \\ -0.13 & 0.62 & 0.62 & 0.47 \\ -0.11 & 0.57 & 0.47 & 0.49 \end{pmatrix}.$$

The combined covariance matrix is

$$\mathbf{S} = \begin{pmatrix} 0.28 & -0.24 & -0.21 & -0.31 \\ -0.24 & 0.71 & 0.52 & 0.52 \\ -0.21 & 0.52 & 0.54 & 0.45 \\ -0.31 & 0.52 & 0.45 & 0.59 \end{pmatrix}.$$

Consequently, D² = 0.6956 and T² = 3.826. The resulting F value is 0.8130 with 4 and 17 degrees of freedom. The corresponding p value is .53.
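The last step of this calculation, from D² to T² and then to F, can be checked with a short Python sketch. It takes D² = 0.6956 and p = 4 from the table and assumes 11 subjects in each of the two experimental groups, which is what the quoted 17 error degrees of freedom imply for equal group sizes; the function names are illustrative.

```python
def hotelling_t2(d2, n1, n2):
    """T^2 = n1 * n2 / (n1 + n2) * D^2."""
    return n1 * n2 / (n1 + n2) * d2


def t2_to_f(t2, n1, n2, p):
    """Convert T^2 to F with p and n1 + n2 - p - 1 degrees of freedom."""
    df2 = n1 + n2 - p - 1
    f_value = df2 * t2 / ((n1 + n2 - 2) * p)
    return f_value, p, df2


# D^2 and p as quoted in Table 3.10; group sizes of 11 are an assumption.
t2 = hotelling_t2(0.6956, 11, 11)
f_value, df1, df2 = t2_to_f(t2, 11, 11, 4)
print(f"T2 = {t2:.3f}, F = {f_value:.4f} on {df1} and {df2} df")
```

Under that assumption the sketch reproduces T² = 3.826 and an F value of about 0.813 on 4 and 17 degrees of freedom.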

Display 3.9
Univariate and Multivariate Tests for Equality of Means for Two Variables

Suppose we have a sample of n observations on two variables $x_1$ and $x_2$, and we wish to test whether the population means of the two variables, $\mu_1$ and $\mu_2$, are both zero. Assume the mean and standard deviation of the $x_1$ observations are $\bar{x}_1$ and $s_1$, respectively, and of the $x_2$ observations, $\bar{x}_2$ and $s_2$. If we test separately whether each mean takes the value zero, then we would use two t tests. For example, to test $\mu_1 = 0$ against $\mu_1 \neq 0$ the appropriate test statistic is

$$t = \frac{\bar{x}_1 - 0}{s_1/\sqrt{n}}.$$

The hypothesis $\mu_1 = 0$ would be rejected at the $\alpha$ percent level of significance if $t < -t_{100(1-\frac{1}{2}\alpha)}$ or $t > t_{100(1-\frac{1}{2}\alpha)}$; that is, if $\bar{x}_1$ fell outside the interval $[-s_1 t_{100(1-\frac{1}{2}\alpha)}/\sqrt{n},\; s_1 t_{100(1-\frac{1}{2}\alpha)}/\sqrt{n}]$, where $t_{100(1-\frac{1}{2}\alpha)}$ is the $100(1-\frac{1}{2}\alpha)$ percent point of the t distribution with $n - 1$ degrees of freedom. Thus the hypothesis would not be rejected if $\bar{x}_1$ fell within this interval.

Similarly, the hypothesis $\mu_2 = 0$ for the variable $x_2$ would not be rejected if the mean, $\bar{x}_2$, of the $x_2$ observations fell within a corresponding interval with $s_2$ substituted for $s_1$. The multivariate hypothesis $[\mu_1, \mu_2] = [0, 0]$ would therefore not be rejected if both these conditions were satisfied. If we were to plot the point $(\bar{x}_1, \bar{x}_2)$ against rectangular axes, the area within which the point could lie and the multivariate hypothesis not rejected is given by the rectangle ABCD of the diagram below, where AB and DC are of length $2 s_1 t_{100(1-\frac{1}{2}\alpha)}/\sqrt{n}$ while AD and BC are of length $2 s_2 t_{100(1-\frac{1}{2}\alpha)}/\sqrt{n}$.

Thus a sample that gave the means $(\bar{x}_1, \bar{x}_2)$ represented by the point P would lead to acceptance of the multivariate hypothesis. Suppose, however, that the variables $x_1$ and $x_2$ are moderately highly correlated. Then all points $(x_1, x_2)$, and hence $(\bar{x}_1, \bar{x}_2)$, should lie reasonably close to the straight line MN through the origin marked on the diagram. Hence samples consistent with the multivariate hypothesis should be represented by points $(\bar{x}_1, \bar{x}_2)$ that lie within a region encompassing the line MN. When we take account of the nature of the variation of bivariate normal samples that include correlation, this region can be shown to be an ellipse such as that marked on the diagram. The point P is not consistent with this region and, in fact, the hypothesis should be rejected for this sample. Thus the inference drawn from the two separate univariate tests conflicts with the one drawn from a single multivariate test, and it is the wrong inference.

A sample giving the $(\bar{x}_1, \bar{x}_2)$ values represented by point Q would give the other type of mistake, where the application of two separate univariate tests leads to the rejection of the null hypothesis, but the correct multivariate inference is that the hypothesis should not be rejected. (This explanation is taken with permission from Krzanowski, 1991.)

TABLE 3.11
Multivariate Tests for the Data in Table 3.9

Test Name    Value     Corresponding F Value   DF1    DF2     p
Pillai       0.67980   3.60443                 8.00   56.00   .002
Hotelling    1.57723   5.12600                 8.00   52.00   <.001
Wilk         0.36906   4.36109                 8.00   54.00   <.001

conclusion could be checked by using the multivariate equivalent of the multiple comparison tests described in Section 3.5 (see Stevens, 1992, for details). (The multivariate test statistics are based on analogous assumptions to those of the F tests in a univariate one-way ANOVA; that is, the data are assumed to come from a multivariate normal distribution and each population is assumed to have the same covariance matrix. For more details see Stevens, 1992, and the glossary in Appendix A.)

In situations in which the dependent variables of interest can genuinely be regarded as a set, there are a number of advantages in using a multivariate approach rather than a series of separate univariate analyses.

1. The use of a series of univariate tests leads to a greatly inflated Type 1 error rate; the problem is analogous to the multiple t test problem described in Section 3.2.
2. The univariate tests ignore important information, namely the correlations among the dependent variables. The multivariate tests take this information into account.
3. Although the groups may not be significantly different on any of the variables individually, jointly the set of variables may reliably differentiate the groups; that is, small differences on several variables may combine to produce a reliable overall difference. Thus the multivariate test will be more powerful in this case.

However, it should be remembered that these advantages are genuine only if the dependent variables can honestly be regarded as a set sharing a conceptual meaning. A multivariate analysis of variance on a number of dependent variables when there is no strong rationale for regarding them simultaneously is not to be recommended.
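The inflation of the Type 1 error rate mentioned in the first point can be given a rough size with a one-line calculation, sketched here in Python. The sketch assumes that the separate tests are independent, which will not hold exactly for correlated dependent variables, so the figure should be read only as a guide; the function name is illustrative.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one Type 1 error in m independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


# Four dependent variables, each tested separately at the 5% level.
rate = familywise_error_rate(0.05, 4)
print(f"{rate:.4f}")
```

For four separate tests at the .05 level the chance of at least one false rejection is about .19 under independence, and it can never exceed 4 x .05 = .20 whatever the correlations among the variables.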


3.10. SUMMARY

1. A one-way ANOVA is used to compare the means of k populations when $k \geq 3$.
2. The null and alternative hypotheses are
   $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$,
   $H_1$: not all means are equal.
3. The test of the null hypothesis involves an F test for the equality of two variances.
4. A significant F test should be followed by one or other multiple comparison test to assess which particular means differ, although care is needed in the application of such tests.
5. When the levels of the independent variable form an ordered scale, the use of orthogonal polynomials is often informative.
6. Including variables that are linearly related to the response variable as covariates increases the precision of the study in the sense of leading to a more powerful F test for the equality of means hypothesis.
7. The examples in this chapter have all involved the same number of subjects in each group. This is not a necessary condition for a one-way ANOVA, although it is helpful because it makes departures from the assumptions of normality and homogeneity of variance even less critical than usual. Some of the formulas in displays require minor amendments for unequal group sizes.
8. In the one-way ANOVA model considered in this chapter, the levels of the independent variable have been regarded as fixed; that is, the levels used are those of specific interest. An alternative model in which the levels are considered as a random sample from some population of possible levels might also have been considered. However, because such models are usually of greater interest in the case of more complex designs, their description will be left until later chapters.
9. Several dependent variables can be treated simultaneously by MANOVA; this approach is only really sensible if the variables can truly be regarded, in some sense, as a unit.
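The F test summarized above compares a between groups mean square with a within groups mean square. The following is a minimal sketch of that calculation in Python; the function name and the three small groups of scores are hypothetical and are not taken from any table in this chapter.

```python
def one_way_anova(groups):
    """Return (between SS, within SS, between DF, within DF, F) for a list of groups."""
    all_values = [y for g in groups for y in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Between groups sum of squares: group size times squared deviation of each group mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within groups sum of squares: squared deviations from each group's own mean.
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio


# Hypothetical scores for k = 3 groups (illustration only).
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ss_b, ss_w, df_b, df_w, f_ratio = one_way_anova(toy_groups)
print(ss_b, ss_w, df_b, df_w, f_ratio)
```

The function also accepts unequal group sizes, because each group mean is weighted by its own number of observations.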

COMPUTER HINTS

SPSS
In SPSS the first steps in conducting a one-way ANOVA are as follows.

1. Click Statistics, click General Linear Model, and then click GLM-General Factorial. The GLM-General Factorial dialog box will appear.
2. Select the response variable and the grouping variable.
3. Click on Options to choose any particular options required such as homogeneity tests, estimates of effect sizes, descriptive statistics, and so on.

To conduct a one-way MANOVA, the basic steps are as follows.

1. Click Statistics, click General Linear Model, and then click GLM-Multivariate. The GLM-Multivariate dialog box now becomes available.
2. Select the dependent variables and the grouping variable from the relevant data set.
3. Click Options to refine the analysis and select things such as estimates and post hoc tests.

S-PLUS

In S-PLUS, a simple one-way ANOVA can be conducted by using the following steps.

1. Click Statistics, click ANOVA, and click Fixed Effects; then the ANOVA dialog box will appear.
2. Choose the relevant data set and select the dependent variable and the grouping (independent) variable.
3. Click the Plot tag of the dialog box if, for example, some type of residual plot is required.

A one-way MANOVA can be undertaken as follows.

1. Click Statistics, click Multivariate, and click MANOVA to get the MANOVA dialog.
2. Select the multiple response variables and the grouping variable.

Multiple comparison tests are conducted by using the following.

1. Click Statistics, click ANOVA, and click Multiple Comparisons to get the Multiple Comparisons dialog.
2. Select the resulting analysis of variance object saved from a previous analysis of variance and select the method required, the confidence level, and so on.

In S-PLUS, an ANOVA can be undertaken by means of the command line approach using the aov function. If, for example, the fruit fly data (see Table 3.1) were stored in a data frame called fruit with variable names group and fecund, a one-way ANOVA could be carried out by using the command aov(fecund~group, data=fruit). Multiple comparisons can be applied by using the multicomp function. The multiple comparison procedure produces a useful graphical display of the results (see Figures 3.3 and 3.4). A multivariate analysis of variance is available by using the manova function.

EXERCISES
3.1. Reproduce all the results given in this chapter by using your favorite piece of statistical software. (This exercise should be repeated in subsequent chapters, and those readers finding any differences between their results and mine are invited to e-mail me at b.everitt@iop.kcl.ac.uk.)

3.2. The data given in Table 3.12 were collected in an investigation described by Kapor (1981), in which the effect of knee-joint angle on the efficiency of cycling was studied. Efficiency was measured in terms of the distance pedalled on an ergocycle until exhaustion. The experimenter selected three knee-joint angles of particular interest: 50°, 70°, and 90°. Thirty subjects were available for the experiment and 10 subjects were randomly allocated to each angle. The drag of the ergocycle was kept constant at 14.7 N, and subjects were instructed to pedal at a constant speed of 20 km/h.

1. Carry out an initial graphical inspection of the data to assess whether there are any aspects of the observations that might be a cause for concern in later analyses.
2. Derive the appropriate analysis of variance table for the data.
3. Investigate the mean differences in more detail using a suitable multiple comparisons test.

TABLE 3.12 The Effect of Knee-Joint Angle on the Efficiency of Cycling: Total Distance Covered (km)

8.4 7.0 3.0 8.0 7.8 3.3 4.3 3.6 8.0 6.8

10.6 7.5 5.1 5.6 10.2 11.0 6.8 9.4 10.4 8.8

3.2 4.2 3.1 6.9 7.2 3.5 3.1 4.5 3.8 3.6

3.3. Suggest suitable estimators for the parameters $\mu$ and $\alpha_i$ in the one-way ANOVA model given in Display 3.2. Use your suggested estimators on the fruit fly data.

3.4. The data in Table 3.13 were collected in an investigation of maternal behavior of laboratory rats. The response variable was the time (in seconds) required for the mother to retrieve the pup to the nest, after being moved a fixed distance away. In the study, pups of different ages were used.

1. Carry out a one-way analysis of variance of the data.
2. Use an orthogonal polynomial approach to investigate whether there is any evidence of a linear or quadratic trend in the group means.
TABLE 3.13 Maternal Behavior in Rats

Age of Pup
5 Days    20 Days    35 Days
15        30         40
10        15         35
25        20         50
15        25         43
20        23         45
18        20         40

3.5. The data in Table 3.14 show the anxiety scores, both before and after the operation, for patients undergoing wisdom tooth extraction by one of three methods. The ages of the patients (who were all women) were also recorded. Patients were randomly assigned to the three methods.

1. Carry out a one-way ANOVA on the anxiety scores on discharge.
2. Construct some scatterplots to assess informally whether the relationship between anxiety score on discharge and anxiety score prior to the operation is the same in each treatment group, and whether the relationship is linear.
3. If you think it is justified, carry out an ANCOVA of the anxiety scores on discharge, using anxiety scores prior to the operation as a covariate.
4. Carry out an ANOVA of the difference between the two anxiety scores.
5. Comment on the difference between the analyses in steps 3 and 4.
6. Suggest a suitable model for an ANCOVA of anxiety score on discharge by using the two covariates of anxiety score prior to the operation and age. Carry out the appropriate analysis.

TABLE 3.14 Anxiety Scores for Patients Having Wisdom Teeth Extracted

Method of Extraction    Age (years)    Initial Anxiety    Final Anxiety

Method 1    Method 2    Method 3

27 32 23 28 30 35 32 21 26 27 29 29 31 36 23 26 22 20 28 32 33 35 21 28 27 23 25 26 27 29

30.2 35.3 32.4 31.9 28.4 30.5 34.8 32.5 33.0 29.9 32.6 33.0 31.7 34.0 29.9 32.2 31.0 32.0 33.0 31.1 29.9 30.0 29.0 30.0 30.0 29.6 32.0 31.0 30.1 31.0

35.0

32.0 34.8 36.0 34.2 30.3 33.2

34.0 34.2 31.1 31.5 32.9 34.3 32.8 32.5 32.9 33.1 30.4 32.6 32.8 34.1 34.1 33.2 33.0 34.1 31.0 34.0 34.0 35.0 36.0
3.6. Reanalyze the WISC data in Table 3.6 after removing any observation you feel might reasonably be labeled as an outlier. Calculate the adjusted group means for time to completion by using the formula given in Display 3.7.

3.7. Show that when only a single variable is involved, the test statistic for Hotelling's $T^2$ given in Display 3.8 is equivalent to the test statistic used in an independent samples t test.

3.8. Apply Hotelling's $T^2$ test to each pair of groups in the data in Table 3.9. Use the three $T^2$ values and the Bonferroni correction procedure to assess which groups differ.

Analysis of Variance II: Factorial Designs

4.1. INTRODUCTION
Many experiments in psychology involve the simultaneous study of the effects of two or more factors on a response variable. Such an arrangement is usually referred to as a factorial design. Consider, for example, the data shown in Table 4.1, taken from Lindman (1974), which gives the improvement scores made by patients with one of two diagnoses, being treated with different tranquillizers. The questions of interest about these data concern the equality or otherwise of the average improvement in the two diagnostic categories, and similarly for the different drugs. Because these questions are the same as those considered in the previous chapter, the reader might enquire why we do not simply apply a one-way ANOVA separately to the data for diagnostic categories and for drugs. The answer is that such analyses would omit an aspect of a factorial design that is often very important, as we shall demonstrate in the next section.

4.2. INTERACTIONS IN A FACTORIAL DESIGN


The results of a one-way analysis of (1) psychiatric categories only, (2) tranquillizer drugs only, and (3) all six groups of observations in Table 4.1 are shown in Table 4.2. At first sight the results appear a little curious. The F test for the equality of the means of the psychiatric category is not significant, and neither is that for the
TABLE 4.1 Improvement Scores for Two Psychiatric Categories and Three Tranquillizer Drugs

                 Drug
Category    B1    B2    B3
A1           8     8     4
             4    10     6
             0     6     8
A2          10     0    15
             6     4     9
            14     2    12

TABLE 4.2 ANOVA of Improvement Scores in Table 4.1

Source        SS        DF    MS       F       p
Psychiatric Categories Only
Diagnosis     20.05     1     20.05    1.07    0.31
Error         289.90    16    18.68
Drugs Only
Drugs         44.11     2     22.05    1.20
Error         274.83    15    18.32
All Six Groups
Groups        212.28    5     42.45
Error         106.67    12    8.89

tranquillizer drug means. Nevertheless, the F test for the six groups together is significant, indicating a difference between the six means that has not been detected by the separate one-way analyses of variance. A clue to the cause of the problem is provided by considering the degrees of freedom associated with each of the three between groups sums of squares. That corresponding to psychiatric categories has a single degree of freedom and that for drugs has two degrees of freedom, making a total of three degrees of freedom for the separate analyses. However, the between groups sum of squares for the six groups combined has five degrees of freedom.

The separate one-way analyses of variance appear to have omitted some aspect of the variation in the data. What has been left out is the effect of the combination of factor levels that is not predictable from the sum of the effects of the two factors separately, an effect usually known as the interaction between the factors. Both the model for, and the analysis of, a factorial design have to allow for the possibility of such interaction effects.

4.3. TWO-WAY DESIGNS


Many aspects of the modeling and analysis of factorial designs can conveniently be illustrated by using examples with only two factors. Such two-way designs are considered in this section. An example with more than two factors will be discussed in the next section. The general model for a two-way design is described in Display 4.1. Notice that this model contains a term to represent the possible interaction between factors.
Display 4.1
Model for a Two-Way Design with Factors A and B

The observed value of the response variable for each subject is assumed to be of the form

observed response = expected response + error.

The expected value of the response variable for a subject is assumed to be made up of an effect for the level of A under which the subject is observed, the corresponding term for factor B, and the interaction effect for the particular AB combination. Consequently, the model above can be rewritten as

observed response = overall population mean + factor A effect + factor B effect + AB interaction effect + error.

More specifically, let $y_{ijk}$ represent the kth observation in the jth level of factor B (with b levels) and the ith level of factor A (with a levels). Assume there are n subjects in each cell of the design. The levels of A and B are assumed to be of particular interest to the experimenter, and so both factors and their interaction are regarded as having fixed effects (see Section 4.5).

The model assumed for the observations can now be written as

$$y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk},$$

where $\mu$ represents the overall mean, $\alpha_i$ is the effect on an observation of being in the ith level of factor A, $\beta_j$ is the corresponding effect for the jth level of factor B, $\gamma_{ij}$ represents the interaction effect, and $\epsilon_{ijk}$ is a random error term assumed to have a normal distribution with mean zero and variance $\sigma^2$.

Formulated in this way, the model has too many parameters and constraints have to be introduced as explained in Chapter 3. The most common method is to require that the parameters of this fixed-effects model are such that $\sum_{i=1}^{a} \alpha_i = \sum_{j=1}^{b} \beta_j = \sum_{i=1}^{a} \gamma_{ij} = \sum_{j=1}^{b} \gamma_{ij} = 0$. For further details, see Maxwell and Delaney (1990).

The hypotheses of interest, namely no factor A effect, no factor B effect, and no interaction, imply the following about the parameters of the model:

$$H_0^{(1)}: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0,$$
$$H_0^{(2)}: \beta_1 = \beta_2 = \cdots = \beta_b = 0,$$
$$H_0^{(3)}: \gamma_{11} = \gamma_{12} = \cdots = \gamma_{ab} = 0.$$

Under each hypothesis, a different model is assumed to be adequate to describe the observations; for example, under $H_0^{(3)}$ this model is

$$y_{ijk} = \mu + \alpha_i + \beta_j + \epsilon_{ijk},$$

that is, the main effects model.

The total variation in the observations is partitioned into that caused by differences between the levels of the factor A, that caused by differences between the levels of the factor B, that caused by the interaction of A and B, and that caused by differences among observations in the same cell.

Under the null hypotheses given above, the factor A, the factor B, and the interaction mean squares are all estimators of $\sigma^2$. The error mean square is also an estimate of $\sigma^2$ but one that does not rely on the truth or otherwise of any of the null hypotheses. Consequently, each mean square ratio can be tested as an F statistic to assess each of the null hypotheses specified.

The analysis of variance table for the model is

Source                  SS      DF                MS                       F
A                       ASS     a - 1             ASS/(a - 1)              MSA/error MS
B                       BSS     b - 1             BSS/(b - 1)              MSB/error MS
AB                      ABSS    (a - 1)(b - 1)    ABSS/[(a - 1)(b - 1)]    MSAB/error MS
Error (within cells)    WCSS    ab(n - 1)         WCSS/[ab(n - 1)]

The F tests are valid under the following assumptions: (a) normality of the error terms in the model, (b) homogeneity of variance, and (c) independence of the observations for each subject.

The total variation in the observations can be partitioned into parts due to each factor and a part due to their interaction as shown. The results of applying the model described in Display 4.1 to the psychiatric data in Table 4.1 are shown in Table 4.3. The significant interaction term now explains the previously confusing results of the separate one-way analyses of variance. (The detailed interpretation of interaction effects will be discussed later.)
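The mean squares and F ratios of Table 4.3 follow directly from its sums of squares and degrees of freedom, as laid out in the analysis of variance table of Display 4.1. The short Python sketch below takes the SS and DF entries as printed in Table 4.3 and recomputes the remaining columns; the dictionary layout and function name are illustrative choices only.

```python
# Sums of squares and degrees of freedom as printed in Table 4.3.
table_4_3 = {
    "A": (18.0, 1),
    "B": (48.0, 2),
    "A x B": (144.0, 2),
    "Error": (106.0, 12),
}


def mean_squares_and_f(table, error_key="Error"):
    """Return {source: (mean square, F ratio)}; the F ratio is None for the error row."""
    error_ss, error_df = table[error_key]
    error_ms = error_ss / error_df
    result = {}
    for source, (ss, df) in table.items():
        ms = ss / df
        f_ratio = None if source == error_key else ms / error_ms
        result[source] = (ms, f_ratio)
    return result


anova_4_3 = mean_squares_and_f(table_4_3)
for source, (ms, f_ratio) in anova_4_3.items():
    print(source, round(ms, 2), None if f_ratio is None else round(f_ratio, 2))

# The effect degrees of freedom (1 + 2 + 2) recover the five degrees of freedom
# of the between groups sum of squares for the six groups combined.
effects_df = sum(df for source, (ss, df) in table_4_3.items() if source != "Error")
print(effects_df)
```

The last line makes the earlier degrees of freedom argument explicit: the two degrees of freedom missing from the separate one-way analyses are those of the interaction.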

TABLE 4.3 Two-Way ANOVA of Psychiatric Improvement Scores

Source    DF    SS     MS      F       p
A         1     18     18      2.04    0.18
B         2     48     24      2.72    0.11
A x B     2     144    72      8.15    0.006
Error     12    106    8.83

TABLE 4.4 Rat Data: Times to Run a Maze (seconds)

                           Diet
Drug    d1                 d2                  d3
D1      31  45  46  43     36  29  40  23      22  21  18  23
D2      82 110  88  72     92  61  49 124      30  37  38  29
D3      43  45  63  76     44  35  31  40      23  25  24  22
D4      45  71  66  62     56 102  71  38      30  36  31  33

4.3.1. Rat Data

To provide a further example of a two-way ANOVA, the data in Table 4.4 will be used. These data arise from an experiment in which rats were randomly assigned to receive each combination of one of four drugs and one of three diets for a week, and then their time to negotiate a standard maze was recorded; four rats were used for each diet-drug combination. The ANOVA table for the data and the cell means and standard deviations are shown in Table 4.5. The observed standard deviations in this table cast some doubt on the acceptability of the homogeneity assumption, a problem we shall conveniently ignore here (but see Exercise 4.1). The F tests in Table 4.5 show that there is evidence of a difference in drug means and diet means, but no evidence of a drug x diet interaction. It appears that what is generally known as a main effects model provides an adequate description of these data. In other words, diet and drug (the main effects) act additively on time to run the maze.

The results of both the Bonferroni and Scheffé multiple comparison procedures for diet and drug are given in Table 4.6. All the results are displayed graphically in Figure 4.1. Both multiple comparison procedures give very similar results for this example. The mean time to run the maze appears to differ for diets 1 and 3 and for diets 2 and 3, but not for diets 1 and 2. For the drugs, the average time is different for drug 1 and drug 2 and also for drugs 2 and 3.

When the interaction effect is nonsignificant as here, the interaction mean square becomes a further estimate of the error variance, $\sigma^2$. Consequently, there is the possibility of pooling the error sum of squares and interaction sum of squares to provide a possibly improved error mean square based on a larger number of degrees of freedom. In some cases the use of a pooled error mean square will provide more powerful tests of the main effects. The results of applying this procedure to the rat data are shown in Table 4.7. Here the results are very similar to those given in Table 4.5.

Although pooling is often recommended, it can be dangerous for a number of reasons. In particular, cases may arise in which the test for interaction is nonsignificant, but in which there is in fact an interaction. As a result, the pooled mean square may be larger than the original error mean square (this happens for the rat data), and the difference may be large enough for the increase in degrees of freedom to be more than offset. The net result is that the experimenter loses rather than gains power. Pooling is really only acceptable when there are good a priori reasons for assuming no interaction, and the decision to use a pooled error term is based on considerations that are independent of the observed data.

With only four observations in each cell of the table, it is clearly not possible to use, say, normal probability plots to assess the normality requirement needed within each cell by the ANOVA F tests used in Table 4.5. In general, testing the normality assumption in an ANOVA should be done not on the raw data, but on deviations between the observed values and the predicted or fitted values from the model applied to the data. These predicted values from a main effects model for the rat data are shown in Table 4.8; the terms are calculated from

fitted value = grand mean + (drug mean - grand mean) + (diet mean - grand mean).    (4.1)
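The sums of squares reported for the rat data in Table 4.5 and the fitted values defined by Equation (4.1) can be checked with a few lines of code. The Python sketch below is one possible way of doing so for a balanced two-way layout; the function and variable names are illustrative, and the maze times are entered as read from Table 4.4, with the allocation of values to cells checked against the cell means of Table 4.5.

```python
# Maze times (seconds) from Table 4.4, keyed by (drug, diet); four rats per cell.
rat_times = {
    ("D1", "d1"): [31, 45, 46, 43], ("D1", "d2"): [36, 29, 40, 23], ("D1", "d3"): [22, 21, 18, 23],
    ("D2", "d1"): [82, 110, 88, 72], ("D2", "d2"): [92, 61, 49, 124], ("D2", "d3"): [30, 37, 38, 29],
    ("D3", "d1"): [43, 45, 63, 76], ("D3", "d2"): [44, 35, 31, 40], ("D3", "d3"): [23, 25, 24, 22],
    ("D4", "d1"): [45, 71, 66, 62], ("D4", "d2"): [56, 102, 71, 38], ("D4", "d3"): [30, 36, 31, 33],
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def two_way_anova(cells):
    """Sums of squares and main effects fitted values for a balanced two-way layout.

    `cells` maps (row level, column level) to a list of n observations.
    """
    rows = sorted({row for row, _ in cells})
    cols = sorted({col for _, col in cells})
    n = len(next(iter(cells.values())))
    grand = mean(y for obs in cells.values() for y in obs)
    row_means = {r: mean(y for c in cols for y in cells[(r, c)]) for r in rows}
    col_means = {c: mean(y for r in rows for y in cells[(r, c)]) for c in cols}
    cell_means = {key: mean(obs) for key, obs in cells.items()}

    ss_rows = n * len(cols) * sum((m - grand) ** 2 for m in row_means.values())
    ss_cols = n * len(rows) * sum((m - grand) ** 2 for m in col_means.values())
    ss_cells = n * sum((m - grand) ** 2 for m in cell_means.values())
    ss_error = sum((y - cell_means[key]) ** 2 for key, obs in cells.items() for y in obs)
    # Equation (4.1): grand mean + (row mean - grand mean) + (column mean - grand mean).
    fitted = {(r, c): grand + (row_means[r] - grand) + (col_means[c] - grand)
              for r in rows for c in cols}
    return {
        "ss_rows": ss_rows,                              # drug sum of squares here
        "ss_cols": ss_cols,                              # diet sum of squares here
        "ss_interaction": ss_cells - ss_rows - ss_cols,
        "ss_error": ss_error,
        "fitted": fitted,
    }


rat_anova = two_way_anova(rat_times)
print({k: round(v, 2) for k, v in rat_anova.items() if k != "fitted"})
print(round(rat_anova["fitted"][("D2", "d1")], 2))
```

Residuals for a normal probability plot would then be each observation minus the fitted value of its cell.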

TABLE 4.5 ANOVA Table for Rat Data

Source         DF    SS          MS         F        p
Diet           2     10330.13    5165.06    23.22    <.001
Drug           3     9212.06     3070.69    13.82    <.001
Diet x Drug    6     2501.38     416.90     1.87     .11
Error          36    8007.25     222.42

Cell Means and Standard Deviations

                     Diet
Drug            d1       d2       d3
D1    Mean      41.25    32.00    21.00
      SD        6.95     7.53     2.16
D2    Mean      88.00    81.50    33.50
      SD        16.08    33.63    4.65
D3    Mean      56.75    37.50    23.50
      SD        15.67    5.69     1.29
D4    Mean      61.00    66.75    32.50
      SD        11.28    27.09    2.65

The normality assumption may now be assessed by looking at the distribution of the differences between the observed and fitted values, the so-called residuals. These terms should have a normal distribution. A normal probability plot of the residuals from fitting a main effects model to the rat data (Figure 4.2) shows that a number of the largest residuals give some cause for concern. Again, it will be left as an exercise (see Exercise 4.1) for the reader to examine this possible problem in more detail. (We shall have much more to say about residuals and their use in Chapter 6.)
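Two of the calculations described in this subsection, pooling the interaction and error terms (Table 4.7) and forming simultaneous confidence intervals for differences between means (Table 4.6), need only the entries of Table 4.5. The Python sketch below illustrates both. It assumes that the standard error of a difference is the square root of 2 x error mean square / (observations per mean), which agrees with the standard errors printed in Table 4.6, and it takes the critical points from the notes to that table rather than computing them.

```python
import math

# Values printed in Table 4.5 for the rat data.
error_ss, error_df = 8007.25, 36
interaction_ss, interaction_df = 2501.38, 6
diet_ms, drug_ms = 5165.06, 3070.69

# Pooling: add the interaction line to the error line (compare Table 4.7).
pooled_df = error_df + interaction_df
pooled_ms = (error_ss + interaction_ss) / pooled_df
original_ms = error_ss / error_df
f_diet_pooled = diet_ms / pooled_ms
f_drug_pooled = drug_ms / pooled_ms


def interval(estimate, n_per_mean, critical_point, ms_error=original_ms):
    """Standard error and simultaneous interval for a difference between two means,
    each mean being based on n_per_mean observations."""
    se = math.sqrt(2 * ms_error / n_per_mean)
    half_width = critical_point * se
    return se, estimate - half_width, estimate + half_width


# Each diet mean is based on 16 rats and each drug mean on 12 rats.
# Critical points (2.511 and 2.792) are the Bonferroni values quoted under Table 4.6.
se_diet, low_diet, high_diet = interval(7.31, n_per_mean=16, critical_point=2.511)    # d1-d2
se_drug, low_drug, high_drug = interval(-36.20, n_per_mean=12, critical_point=2.792)  # D1-D2

print(round(pooled_ms, 2), round(f_diet_pooled, 2), round(f_drug_pooled, 2))
print(round(se_diet, 2), round(low_diet, 2), round(high_diet, 2))
print(round(se_drug, 2), round(low_drug, 2), round(high_drug, 2))
```

The comparison of `pooled_ms` with `original_ms` shows the point made above: for these data the pooled error mean square is the larger of the two.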

4.3.2. Slimming Data

Our final example of the analysis of a two-way design involves the data shown in Table 4.9. These data arise from an investigation into types of slimming regime. In

TABLE 4.6 Multiple Comparison Test Results for Rat Data

                  Estimate    Std Error    Lower Bound    Upper Bound
Diet
Bonferroni (a)
  d1-d2           7.31        5.27         -5.93          20.6
  d1-d3           34.10       5.27         20.90          47.4 (b)
  d2-d3           26.80       5.27         13.60          40.1 (b)
Scheffé (c)
  d1-d2           7.31        5.27         -6.15          20.8
  d1-d3           34.10       5.27         20.70          47.6 (b)
  d2-d3           26.80       5.27         13.30          40.3 (b)
Drug
Bonferroni (d)
  D1-D2           -36.20      6.09         -53.20         -19.30 (b)
  D1-D3           -7.83       6.09         -24.80         9.17
  D1-D4           -22.00      6.09         -39.00         -5.00 (b)
  D2-D3           28.40       6.09         11.40          45.40 (b)
  D2-D4           14.20       6.09         -2.75          31.20
  D3-D4           -14.20      6.09         -31.20         2.83
Scheffé (e)
  D1-D2           -36.20      6.09         -54.1          -18.40 (b)
  D1-D3           -7.83       6.09         -25.7          10.00
  D1-D4           -22.00      6.09         -39.9          -4.15 (b)
  D2-D3           28.40       6.09         10.6           46.30 (b)
  D2-D4           14.20       6.09         -3.6           32.10
  D3-D4           -14.20      6.09         -32.0          3.69

(a) 95% simultaneous confidence intervals for specified linear combinations, by the Bonferroni method; critical point: 2.511; response variable: time.
(b) Intervals that exclude zero.
(c) 95% simultaneous confidence intervals for specified linear combinations, by the Scheffé method; critical point: 2.553; response variable: time.
(d) 95% simultaneous confidence intervals for specified linear combinations, by the Bonferroni method; critical point: 2.792; response variable: time.
(e) 95% simultaneous confidence intervals for specified linear combinations, by the Scheffé method; critical point: 2.932; response variable: time.

this case, the two factor variables are treatment, with two levels, namely whether or not a woman was advised to use a slimming manual based on psychological behaviorist theory as an addition to the regular package offered by the clinic, and status, also with two levels, that is, novice and experienced slimmer. The dependent variable recorded was a measure of weight change over 3 months, with negative values indicating a decrease in weight.

FIG. 4.1. Multiple comparison results for rat data: (a) Bonferroni tests for diets, (b) Scheffé tests for diets, (c) Bonferroni tests for drugs, (d) Scheffé tests for drugs. Each panel shows simultaneous 95% confidence limits for the pairwise differences; response variable: time.

The ANOVA table and the means and standard deviations for these data are shown in Table 4.10. Here the main effects of treatment and status are significant but in addition there is a significant interaction between treatment and status. The presence of a significant interaction means that some care is needed in arriving at a sensible interpretation of the results. A plot of the four cell means as shown in Figure 4.3 is generally helpful in drawing conclusions about the significant

TABLE 4.7 Pooling Error and Interaction Terms in the ANOVA of the Rat Data

Source          DF    SS          MS         F        p
Diet            2     10330.13    5165.06    20.64    <.0001
Drugs           3     9212.06     3070.69    12.27    <.0001
Pooled error    42    10508.63    250.21

TABLE 4.8 Fitted Values for Main Effects Model on Rat Data

               Diet
Drug    d1       d2       d3
D1      45.22    37.92    11.10
D2      81.48    74.17    47.35
D3      53.06    45.75    18.94
D4      67.23    59.92    33.10

Note. Grand mean of observations is 47.94; diet means are d1: 61.75, d2: 54.44, d3: 27.63; drug means are D1: 31.42, D2: 67.67, D3: 39.25, D4: 53.42.

interaction between the two factors. Examining the plot, we find it clear that the decrease in weight produced by giving novice slimmers access to the slimming manual is far greater than that achieved with experienced slimmers, where the reduction is negligible. Formal significance tests of the differences between experienced and novice slimmers for each treatment level could be performed (they are usually referred to as tests of simple effects), but in general they are unnecessary because the interpretation of a significant interaction is usually clear from a plot of cell means. (It is never wise to perform more significance tests than are absolutely essential.) The significant main effects might be interpreted as indicating differences in the average weight change for novice compared with experienced slimmers, and for slimmers having access to the manual compared with those who do not. In the presence of the large interaction, however, such an interpretation is not at all helpful. The clear message from these results is as follows.

TABLE 4.9 Slimming Data

                      Type of Slimmer
Manual Given?     Novice     Experienced
No manual         -2.85      -2.42
                  -1.98       0.00
                  -2.12      -2.74
                   0.00      -0.84
Manual            -4.44       0.00
                  -8.11      -1.64
                  -9.40      -2.40
                  -3.50      -2.15

Note: Response variable is a measure of weight change over 3 months; negative values indicate a decrease in weight.

FIG. 4.2. Normal probability plot of residuals from a main effects model fitted to the rat data (horizontal axis: quantiles of standard normal).

TABLE 4.10 ANOVA of Slimming Data

Analysis of Variance Table

Source           SS       DF    MS       F       p
Condition (C)    21.83    1     21.83    7.04    0.021
Status (S)       25.53    1     25.53    8.24    0.014
C x S            20.95    1     20.95    6.76    0.023
Error            37.19    12    3.10

Cell Means and Standard Deviations

Parameter       Novice    Experienced
No manual
  Mean          -1.74     -1.50
  SD            1.22      1.30
Manual
  Mean          -6.36     -1.55
  SD            2.84      1.08

FIG. 4.3. Interaction plot for slimming data (horizontal axis: status, novice or experienced; separate lines for no manual and manual).

1. No manual: no difference in weight change of novice and experienced slimmers.
2. Manual: novice slimmers win!

(It might be useful to construct a confidence interval for the difference in the weight change of novice and experienced slimmers when both have access to the manual; see Exercise 4.2.)

One final point to note about the two-way design is that when there is only a single observation available in each cell, the error mean square has zero degrees of freedom. In such cases, it is generally assumed that the factors do not interact so that the interaction mean square can be used as "error" to provide F tests for the main effects.
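The two-line message drawn from the slimming data can be read directly from the cell means of Table 4.10. The Python sketch below computes the novice minus experienced difference within each condition (the simple effects mentioned earlier) and the difference between those two differences, which is the contrast that the interaction term measures. The names used are illustrative, and the cell means are entered with the signs implied by the data of Table 4.9.

```python
# Cell means from Table 4.10 (weight change over 3 months), keyed by (condition, status).
cell_means = {
    ("no manual", "novice"): -1.74,
    ("no manual", "experienced"): -1.50,
    ("manual", "novice"): -6.36,
    ("manual", "experienced"): -1.55,
}


def simple_effect(condition):
    """Novice minus experienced mean weight change within one condition."""
    return cell_means[(condition, "novice")] - cell_means[(condition, "experienced")]


effect_no_manual = simple_effect("no manual")
effect_manual = simple_effect("manual")
# Interaction contrast: how much the novice/experienced gap changes when the manual is given.
interaction_contrast = effect_manual - effect_no_manual

print(round(effect_no_manual, 2), round(effect_manual, 2), round(interaction_contrast, 2))
```

Without an interaction the two simple effects would be roughly equal and the contrast close to zero; here nearly all of the novice/experienced difference arises in the manual condition.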

4.4. HIGHER-ORDER FACTORIAL DESIGNS


Maxwell and Delaney (1990) and Boniface (1995) give details of an experiment in which the effects of three treatments on blood pressure were investigated. The three treatments were as follows.

1. The first is drug, with three levels, drug X, drug Y, and drug Z;
2. The second is biofeed, in which biofeedback is either present or absent;
3. The third is diet, in which a special diet is either given or not given.

Seventy-two subjects were used in the experiment, with six being randomly allocated to each of the 12 treatment combinations. The data are shown in Table 4.11. A suitable model for a three-factor design is outlined in Display 4.2. The results of applying this model to the blood pressure data are shown in Table 4.12. The diet, biofeed, and drug main effects are all significant beyond the 5% level. None of the first-order interactions are significant, but the three-way, second-order interaction of diet, drug, and biofeedback is significant. Just what does such an effect imply, and what are its implications for interpreting the analysis of variance results?

First, a significant second-order interaction implies that the first-order interaction between two of the variables differs in form or magnitude in the different levels of the remaining variable. Once again the simplest way of gaining insight here is to construct the appropriate graphical display. Here the interaction plots of diet and biofeedback, for each of the three drugs, will help. These plots are shown in Figure 4.4. For drug X the diet x biofeedback interaction plot indicates that diet has a negligible effect when biofeedback is supplied, but substantially reduces blood pressure when biofeedback is absent. For drug Y the situation is essentially the reverse of that for drug X. For drug Z, the blood pressure difference

TABLE 4.11 Blood Pressure Data

                  Biofeedback Present             Biofeedback Absent
                Drug X    Drug Y    Drug Z      Drug X    Drug Y    Drug Z
Diet Absent     170       186       180         173       189       202
                175       194       187         194       194       228
                165       201       199         197       217       190
                180       215       170         190       206       206
                160       219       204         176       199       224
                158       209       194         198       195       204
Diet Present    161       164       162         164       171       205
                173       166       184         190       173       199
                157       159       183         169       196       170
                152       182       156         164       199       160
                181       187       180         176       180       179
                190       174       173         175       203       179

Display 4.2
Model for a Three-Factor Design with Factors A, B, and C

In general terms the model is

observed response = mean + factor A effect + factor B effect + factor C effect + AB interaction + AC interaction + BC interaction + ABC interaction + error.

More specifically, let $y_{ijkl}$ represent the lth observation in the kth level of factor C (with c levels), the jth level of factor B (with b levels), and the ith level of factor A (with a levels). Assume that there are n subjects per cell, and that the factor levels are of specific interest so that A, B, and C have fixed effects. The linear model for the observations in this case is

$$y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + \delta_{ij} + \tau_{ik} + \omega_{jk} + \theta_{ijk} + \epsilon_{ijkl}.$$

Here $\alpha_i$, $\beta_j$, and $\gamma_k$ represent main effects of A, B, and C, respectively; $\delta_{ij}$, $\tau_{ik}$, and $\omega_{jk}$ represent first-order interactions, AB, AC, and BC; $\theta_{ijk}$ represents the second-order interaction, ABC; and $\epsilon_{ijkl}$ are random error terms assumed to be normally distributed with zero mean and variance $\sigma^2$. (Once again the parameters have to be constrained in some way to make the model acceptable; see Maxwell and Delaney, 1990, for details.)

when the diet is given and when it is not is approximately equal for both levels of biofeedback. The implication of a significant second-order interaction is that there is little point in drawing conclusions about either the nonsignificant first-order interaction or the significant main effects. The effect of drug, for example, is not consistent for all combinations of diet and biofeedback. It would, therefore, be misleading to conclude on the basis of the significant main effect anything about the specific effects of these three drugs on blood pressure. The interpretation of the data might become simpler by carrying out separate two-way analyses of variance within each drug (see Exercise 4.3). Clearly, factorial designs will become increasingly complex as the number of factors is increased. A further problem is that the number of subjects required for

TABLE 4.12 ANOVA for Blood Pressure Data

Source                   SS      DF    MS        F        p
Diet                     5202    1     5202.0    33.20    <.001
Biofeed                  2048    1     2048.0    13.07    <.001
Drug                     3675    2     1837.5    11.73    <.001
Diet x biofeed           32      1     32.0      0.20     .65
Diet x drug              903     2     451.5     2.88     .06
Biofeed x drug           259     2     129.5     0.83     .44
Diet x biofeed x drug    1075    2     537.5     3.42     .04
Error                    9400    60    156.67

a complete factorial design quickly becomes prohibitively large. Consequently, alternative designs have to be considered that are more economical in terms of subjects. Perhaps the most common of these is the latin square (described in detail by Maxwell and Delaney, 1990). In such a design, economy in number of subjects required is achieved by assuming a priori that there are no interactions between factors.
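The meaning of the second-order interaction in the blood pressure example can be seen numerically by tabulating the effect of diet separately for each combination of drug and biofeedback. The Python sketch below does this from the observations of Table 4.11. The data entry is an assumption in one respect: a few entries of the table are hard to read and have been taken as 160, 166, 190, 196, and 199, a reading chosen because it reproduces the error sum of squares of 9400 given in Table 4.12. All variable names are illustrative.

```python
# Blood pressure readings as read from Table 4.11, keyed by (diet, biofeedback, drug).
blood_pressure = {
    ("absent", "present", "X"): [170, 175, 165, 180, 160, 158],
    ("absent", "present", "Y"): [186, 194, 201, 215, 219, 209],
    ("absent", "present", "Z"): [180, 187, 199, 170, 204, 194],
    ("absent", "absent", "X"): [173, 194, 197, 190, 176, 198],
    ("absent", "absent", "Y"): [189, 194, 217, 206, 199, 195],
    ("absent", "absent", "Z"): [202, 228, 190, 206, 224, 204],
    ("present", "present", "X"): [161, 173, 157, 152, 181, 190],
    ("present", "present", "Y"): [164, 166, 159, 182, 187, 174],
    ("present", "present", "Z"): [162, 184, 183, 156, 180, 173],
    ("present", "absent", "X"): [164, 190, 169, 164, 176, 175],
    ("present", "absent", "Y"): [171, 173, 196, 199, 180, 203],
    ("present", "absent", "Z"): [205, 199, 170, 160, 179, 179],
}

cell_means = {key: sum(obs) / len(obs) for key, obs in blood_pressure.items()}

# Effect of diet (diet present minus diet absent) within each drug x biofeedback combination.
diet_effect = {
    (drug, biofeed): cell_means[("present", biofeed, drug)] - cell_means[("absent", biofeed, drug)]
    for drug in ("X", "Y", "Z")
    for biofeed in ("present", "absent")
}

# Within-cell (error) sum of squares, for comparison with Table 4.12.
error_ss = sum((y - cell_means[key]) ** 2 for key, obs in blood_pressure.items() for y in obs)

for (drug, biofeed), effect in diet_effect.items():
    print(f"drug {drug}, biofeedback {biofeed}: diet effect = {effect:+.1f}")
print("error SS =", error_ss)
```

If there were no second-order interaction, the way the diet effect changes between the two biofeedback conditions would be about the same for every drug; the printed differences can be compared with the description of Figure 4.4 given above.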

4.5. RANDOM EFFECTS AND FIXED EFFECTS MODELS


In the summary of Chapter 3, a passing reference was made to random and fixed-effects factors. It is now time to consider these terms in a little more detail, beginning with some formal definitions.

1. A factor is random if its levels consist of a random sample of levels from a population of all possible levels.
2. A factor is fixed if its levels are selected by a nonrandom process or its levels consist of the entire population of possible levels.

Three basic types of models can be constructed, depending on what types of factors are involved.

1. A model is called a fixed effects model if all the factors in the design have fixed effects.
2. A model is called a random effects model if all the factors in the design have random effects.
3. A model is called a mixed effects model if some of the factors in the design have fixed effects and some have random effects.

FIG. 4.4. Interaction plots for blood pressure data: (a) drug X, (b) drug Y, (c) drug Z. Each panel is plotted against biofeed (present, absent), with separate lines for diet absent and diet present.

No doubt readers will not find this wildly exciting until some explanation is at hand as to why it is necessary to differentiate between factors in this way. In fact, the philosophy behind random effects models is quite different from that behind the use of fixed effects models, both in the sampling scheme used and in the parameters of interest. The difference can be described relatively simply by using a two-way design; a suitable model for such a design when both factors are regarded as random is shown in Display 4.3. The important point to note about this model, when compared with that for fixed effect factors (see Display 4.1), is that the terms $\alpha_i$,

Display 4.3 Random Effects Model for a Two-Way Design with Factors A and B

Let $y_{ijk}$ represent the $k$th observation in the $j$th level of factor B (with $b$ levels randomly sampled from some population of possible levels) and the $i$th level of factor A (with $a$ levels again randomly sampled from a population of possible levels). The linear model for the observations is

$$y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk},$$

where now the $\alpha_i$ are assumed to be random variables from a normal distribution with mean zero and variance $\sigma_\alpha^2$. Similarly, $\beta_j$ and $\gamma_{ij}$ are assumed to be random variables from normal distributions with zero means and variances $\sigma_\beta^2$ and $\sigma_\gamma^2$, respectively. As usual the $\epsilon_{ijk}$ are error terms randomly sampled from a normal distribution with mean zero and variance $\sigma^2$.

The hypotheses of interest in the random effects model are

$$H_0^{(\alpha)}: \sigma_\alpha^2 = 0, \qquad H_0^{(\beta)}: \sigma_\beta^2 = 0, \qquad H_0^{(\gamma)}: \sigma_\gamma^2 = 0.$$

The calculation of the sums of squares and mean squares remains the same as for the fixed effects model. Now, however, the various mean squares provide estimators of combinations of the variances of the parameters: (a) A mean square, estimator of $\sigma^2 + n\sigma_\gamma^2 + nb\sigma_\alpha^2$; (b) B mean square, estimator of $\sigma^2 + n\sigma_\gamma^2 + na\sigma_\beta^2$; (c) AB mean square, estimator of $\sigma^2 + n\sigma_\gamma^2$; and (d) error mean square, as usual, estimator of $\sigma^2$.

If $H_0^{(\gamma)}$ is true, then the AB mean square and the error mean square both estimate $\sigma^2$. Consequently, the ratio of the AB mean square and the error mean square can be tested as an F statistic to provide a test of the hypothesis. If $H_0^{(\alpha)}$ is true, the A mean square and the AB mean square both estimate $\sigma^2 + n\sigma_\gamma^2$, so the ratio of the A mean square and the AB mean square provides an F test of the hypothesis. If $H_0^{(\beta)}$ is true, the B mean square and the AB mean square both estimate $\sigma^2 + n\sigma_\gamma^2$, so the ratio of the B mean square and the AB mean square provides an F test of the hypothesis.

Estimators of the variances $\sigma_\alpha^2$, $\sigma_\beta^2$, and $\sigma_\gamma^2$ can be obtained from

$$\hat{\sigma}_\alpha^2 = \frac{\mathrm{MSA} - \mathrm{MSAB}}{nb}, \qquad \hat{\sigma}_\beta^2 = \frac{\mathrm{MSB} - \mathrm{MSAB}}{na}, \qquad \hat{\sigma}_\gamma^2 = \frac{\mathrm{MSAB} - \mathrm{error\ MS}}{n}.$$

$\beta_j$, and $\gamma_{ij}$ are now assumed to be random variables with particular distributions. The parameters of interest are the variances of these distributions, and they can be estimated as shown in Display 4.3. The terms in the analysis of variance table are calculated as for the fixed effects model described in Display 4.1, but the F tests for assessing whether the main effect variances are zero now involve the interaction mean square rather than the error mean square. However, the test for whether the

TABLE 4.13 ANOVA of Rat Data by Using a Random Effects Model

Source          DF   SS         MS        F       p
Diets           2    10330.13   5165.06   12.39   <.0001
Drugs           3    9212.06    3070.69   7.37    <.001
Diets x drugs   6    2501.38    416.90    1.87    .11
Error           36   8007.25    222.42

Note. $\hat{\sigma}_\alpha^2 = (5165.06 - 416.90)/16 = 296.76$, $\hat{\sigma}_\beta^2 = (3070.69 - 416.90)/12 = 221.15$.

variance of the distribution of the interaction parameters is zero is the same as that used for testing the hypothesis of no interaction in the fixed effects model. To illustrate the application of a random effects model, the data in Table 4.3 will again be used, making the unrealistic assumption that both the particular diets and the particular drugs used are a random sample from a large population of possible diets and drugs. The resulting analysis is shown in Table 4.13. The F tests indicate that the variances of both the drug and diet parameter distributions are not zero; however, the hypothesis that $\sigma_\gamma^2 = 0$ is not rejected. The main differences between using fixed effects and random effects models are as follows.

1. Because the levels of a random effects factor have been chosen randomly, the experimenter is not interested in the means of the particular levels observed. In particular, planned or post hoc comparisons are no longer relevant.
2. In a random effects model, interest lies in the estimation and testing of variance parameters.
3. An advantage of a random effects model is that it allows the experimenter to generalize beyond the levels of factors actually observed.

It must be emphasized that generalizations to a population of levels from the tests of significance for a random effects factor are warranted only when the levels of the factor are actually selected at random from a population of levels. It seems unlikely that such a situation will hold in most psychological experiments that use factorial designs, and, consequently, fixed effects models are those most generally used except in particular situations, as we shall see in Chapters 5 and 7.
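As a rough check on the random effects analysis, the F ratios and variance component estimates of Table 4.13 can be recomputed from the printed mean squares with the formulas in Display 4.3. The sketch below is illustrative only: the variable names are hypothetical, and the cell size (n = 4) and numbers of levels (3 diets, 4 drugs) are assumptions inferred from the degrees of freedom in the table rather than stated in this section.

```python
# Mean squares as printed in Table 4.13 (A = diets, B = drugs).
ms_a, ms_b, ms_ab, ms_error = 5165.06, 3070.69, 416.90, 222.42

# Assumed design sizes, inferred from DF = 2, 3, and 36 in Table 4.13.
a, b, n = 3, 4, 4

# In the random effects model the main effects are tested against the
# AB mean square; the interaction is tested against the error mean square.
f_a = ms_a / ms_ab
f_b = ms_b / ms_ab
f_ab = ms_ab / ms_error

# Variance component estimators from Display 4.3.
var_a = (ms_a - ms_ab) / (n * b)
var_b = (ms_b - ms_ab) / (n * a)
var_ab = (ms_ab - ms_error) / n

print(f"F(diets) = {f_a:.2f}, F(drugs) = {f_b:.2f}, F(interaction) = {f_ab:.2f}")
print(f"variance estimates: diets {var_a:.2f}, drugs {var_b:.2f}, interaction {var_ab:.2f}")
```

In a fixed effects analysis of the same data, all three F ratios would instead use the error mean square as the denominator.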

4.6. FACTORIAL DESIGNS WITH UNEQUAL NUMBERS OF OBSERVATIONS IN EACH CELL


More observant readers will have noticed that all the examples discussed in the previous sections of this chapter have had the same number of observations in each cell; they have been balanced designs. In most experimental situations, equality of cell size is the aim, although even in well-designed experiments, things can, of course, go wrong. A researcher may be left with an experiment having unequal numbers of observations in each cell as a result of the death of an experimental animal, for example, or because of the failure of a subject to attend a planned experimental session. In this way, an aimed-for balanced design can become unbalanced, although in an experimental situation the imbalance in cell sizes is likely to be small. In observational studies, however, far larger inequalities in cell size may arise. Two examples will serve to illustrate this point.

First, the data shown in Table 4.14 come from a sociological study of Australian Aboriginal and White children reported by Quine (1975). In the study, children of both sexes, from four age groups (final grade in primary schools and first, second, and third form in secondary school), and from two cultural groups were used. The children in each age group were classified as slow or average learners. The basic design of the study is then an unequally replicated 4 x 2 x 2 x 2 factorial. The response variable of interest was the number of days absent from school during the school year. Children who had suffered a serious illness during the year were excluded.

Second, the data for the second example are shown in Table 4.15. These data arise from an investigation into the effect of a mother's postnatal depression on child development. Mothers who give birth to their first-born child in a major teaching hospital in London were divided into two groups, depressed and not depressed, on the basis of their mental state 3 months after the birth. The children's fathers were also divided into two groups, namely those who had a previous history of psychiatric illness and those who did not. The dependent variable to be considered first is the child's IQ at age 4. (The other variables in Table 4.15 will be used later in this chapter.)

So unbalanced factorial designs do occur. But does it matter? Why do such designs merit special consideration? Why is the usual ANOVA approach used earlier in this chapter not appropriate? The answers to all these questions lie in recognizing that in unbalanced factorial designs the factors forming the design

TABLE 4.14 Study of Australian Aboriginal and White Children

1 2 3 4

A A

M M
M M M

6 7 8 9 10

A A A

F0 F0 F1 F1
F2

A A

12 13 14 1s 16 17 18 19 20 21 22 2 3 2 4 25 26 27 28 29 30 31 32

11

A A A A

M M M
F F F F F F F F

A A

F2 F3 F3 F0 F0 F1 F1 F2 F2 F3 F3

N N N N N N N N N N N N N N

M M M M M M M M
F F F F F F F F

F0
F0 F1

F2 F2

F1

F3

F3 F0 F0 F1 F1 F2 F2

N
N

F 3

F3

SL AL SL AL SL AL SL AL SL AL SL AL SL AL SL AL SL AL SL AL SL AL SL AL SL AL SL AL SL AL SL AL

2, 11, 14 5,5.13,20.22 6.6. IS 7. 14 6,32,S3, S7 14, 16, 16. 17.40.43.46 12,15 8.23.23.28.34.36.38 3 S, 11.24.45 5,6,6,9,13,U,25,32,53,54 55, 11, 17.19 8,13,14,20.47,48.60.81
L

5.9.7 0.2.3.5. 10, 14,21,36,40 6.17.67 0 , 0 . 2 , 7 , 11, 12 0.0.5.5.5, 11, 17 3.3 22,30,36 8.0, 1.5.7, 16.27 12.15 0,30, I . 14,27,41.69 O 25 10.11. U), 33 5,7,0,1,s,5,s,5,7,11,15 S, 14.6.6.7.28 0,5,14,2,2,3,8.10,12

8 1,9, 22,3,3, IS, 18.22.37 S,

TABLE 4.15 Data Obtained in a Study of the Effect of Postnatal Depression of the Mother on a Child's Cognitive Development

Mother   Depression   Sex of Child   IQ   PS   VS   QS   Husband's Psychiatric History

1 2 3 4
5

ND
ND

6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
34

ND ND D ND ND ND ND ND ND D D ND D

103 124
124

104 96 92 124 99 92 116


99

56 65 67 52
46 46

50
55

64

61
52

42 61 63
44 46

58
55

43 68
46

43
65
50

46

ND ND

ND

ND
ND ND

ND
D
ND

D ND

ND ND

D ND D ND
ND ND

22 81 117 100 89 l25 l27 112 48 139 118 107 106 129 117 123 118
84

61 58 22 47 68 54 48
64 64 50

41 58 45
50

55

51
50

38 53 47 41
66

23 68 58
45 46

68 57 20 75
64

22 41 63 53 46 67 71
64 25
64

63 43
64 64

58 57 70 71
64
64

61 58 53 67 63
61 60

56

41
66

37 52
16

35 36 37 38 39

ND ND ND ND ND

117 102 141 124 110 98


109

48
66 60
50

54 64

120 l27 103

71 67
55

77 61 47 48 53 68 52

43 61 52 69 58 52 48
50

63 59 48

No history No history No history No history No history No history No history No history No history No history No history History History No history History No history No history History No history No history No history No history History No history No history No history History No history History No history No history No history No history No history No history No history No history No history No history (Continued)


40

41 42 43
44
46

45 47 48 49 50 51 52 53
54

55 56 57 58 59
60

61 62 63
64

65
66

67 68 69 70 71 72 73 74 75 76 77 78 79

ND ND ND ND ND ND ND ND ND ND ND ND ND ND ND ND ND ND ND ND ND D ND ND D ND ND ND ND ND ND
ND ND D ND

118 117 115 119 117 92


101

119
144

65 60 52 63 63 45 53 59 78
66

64

57

119 127 113 l27 103 128 86 112 115 117


110

67 61
60

67 62 56 42 48 65 67 54 63 54 59
50

99

57 70 45 62 59 48 58
54

61 48 55 58 59 55 52 59 65 62 67 58 68 48
60
28

No history

No history No history

139 117 96 111 118 126 126 89

70
64

12 0

ND ND ND ND ND

134 93 115 99 99 122 106


124

52 54 58 66 69 49 56 74 47 55 50 58 55 49
66

65 45 51 58 68 48 61 72 57 45
60

62
64

66 36 49 68
46

60 59 62 41 47 66 51 47 50 62 67 55 36 50 59 45
60 54

61 48

44 64

100 114

43 61

54 58 56 56

47 74 59 68 48 55

No history No history No history No history No history No history No history No history No history No history No history No history No history No history No his~~ry No history No history No history No history No history No history No history No history No history No history No history No history No history No history No history History No history No history History No history No history No history


80 81 82 83
84

85 86 87 88 a9
90

91 92 93 94

ND ND ND ND ND ND ND D D ND ND ND D D ND 51

48 62 62 48 65 S8 65 50

121 119 108 110 127 118 107 123 102 110 114 118 101 121 114

64 63

66

61

48 66 61 66 58 53 52 59 58
64

60 56 71 S6 54 68 52 47
S6

50 60 50

55 47

52 47 44 64 62

63 56 62

No history No history No history No history No history No history No history No history No history History No history No history No history No history No history

Note. ND, not depressed; D, depressed; PS, perceptual score; VS, verbal score; QS, quantitative score.

are no longer independent as they are in a balanced design. For example, the 2 x 2 table of counts of children of depressed (D) and nondepressed (ND) mothers and well and ill fathers for the data in Table 4.15 is

            Mother
Father      ND        D        Total
Well        75(70)    9(14)    84
Ill         4(8)      6(2)     10
Total       79        15       94

The numbers in parentheses show the expected number of observations (rounded to the nearest whole number) under the hypothesis that the mother's state is independent of the father's state. Clearly there are more depressed mothers whose husbands are psychiatrically ill than expected under independence. (Review the coverage of the chi-square test given in your introductory course and in Chapter 9 for details of how expected values are calculated.) The result of this dependence of the factor variables is that it is now no longer possible to partition the total variation in the response variable into nonoverlapping

or orthogonal sums of squares representing factor main effects and interactions. For example, in an unbalanced two-way design, there is a proportion of the variance of the response variable that can be attributed to (explained by) either factor A or factor B. A consequence is that A and B together explain less of the variation of the dependent variable than the sum of that which each explains alone. The result is that the sum of squares that can be attributed to a factor depends on which other factors have already been allocated a sum of squares; in other words, the sums of squares of factors depend on the order in which they are considered. (This point will be taken up in more detail in the discussion of regression analysis in Chapter 6.)

The dependence between the factor variables in an unbalanced factorial design, and the consequent lack of uniqueness in partitioning the variation in the response variable, has led to a great deal of confusion about what is the most appropriate analysis of such designs. The issues are not straightforward and even statisticians (yes, even statisticians!) are not wholly agreed on the most suitable method of analysis for all situations, as is witnessed by the discussion following the papers of Nelder (1977) and Aitkin (1978).

Basically the discussion over the analysis of unbalanced factorial designs has involved the question of whether what are called sequential or Type I sums of squares, or unique or Type III sums of squares, should be used. In terms of a two-way design with factors A and B, these sums of squares are as follows.

Type I Sum of Squares (Sequential Sums of Squares)

Here the sums of squares come from fitting the terms in the model sequentially, giving, for example,

Sum of squares for A,
Sum of squares for B after A (usually denoted as B|A),
Sum of squares for AB after A and B (AB|A, B).

To establish a suitable model for the data (see later), we would need to calculate both the sums of squares above, and these with the order of A and B reversed, that is,

Sum of squares for B,
Sum of squares for A|B,
Sum of squares for AB|B, A.

(The interaction sum of squares would be identical in both cases.)
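The order dependence of sequential sums of squares can be seen in a small numerical sketch. The code below is not from the text: it uses invented data for an unbalanced 2 x 2 design, a hypothetical helper `residual_ss`, and 0/1 indicator coding for the two factors. Each sequential sum of squares is obtained as the drop in residual sum of squares when a term is added to the model.

```python
def residual_ss(columns, y):
    """Residual sum of squares after regressing y on the given columns.

    Uses Gram-Schmidt orthogonalisation; the columns are assumed to be
    linearly independent.
    """
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            coef = sum(s * t for s, t in zip(v, q)) / sum(t * t for t in q)
            v = [s - coef * t for s, t in zip(v, q)]
        basis.append(v)
    resid = list(y)
    for q in basis:
        coef = sum(s * t for s, t in zip(resid, q)) / sum(t * t for t in q)
        resid = [s - coef * t for s, t in zip(resid, q)]
    return sum(r * r for r in resid)


# Invented unbalanced data: (level of A, level of B) -> observations.
cells = {
    (0, 0): [10, 12, 11, 13, 9],
    (0, 1): [14, 15],
    (1, 0): [16, 18],
    (1, 1): [25, 27, 26, 24, 28, 26],
}

y, one, a, b, ab = [], [], [], [], []
for (i, j), values in cells.items():
    for value in values:
        y.append(float(value))
        one.append(1.0)          # constant term
        a.append(float(i))       # indicator for factor A
        b.append(float(j))       # indicator for factor B
        ab.append(float(i * j))  # interaction column

rss_mean = residual_ss([one], y)
rss_a = residual_ss([one, a], y)
rss_b = residual_ss([one, b], y)
rss_a_b = residual_ss([one, a, b], y)
rss_full = residual_ss([one, a, b, ab], y)

# Order 1: A, then B|A.  Order 2: B, then A|B.
ss_a, ss_b_after_a = rss_mean - rss_a, rss_a - rss_a_b
ss_b, ss_a_after_b = rss_mean - rss_b, rss_b - rss_a_b
# The interaction is fitted last in both orders, so AB|A,B is the same.
ss_ab = rss_a_b - rss_full

print(f"Order 1: A = {ss_a:.2f}, B|A = {ss_b_after_a:.2f}, AB|A,B = {ss_ab:.2f}")
print(f"Order 2: B = {ss_b:.2f}, A|B = {ss_a_after_b:.2f}, AB|B,A = {ss_ab:.2f}")
```

With equal cell sizes the two orders would give the same main effect sums of squares; here the sum of squares for A changes markedly depending on whether B has already been fitted, although the two main effect sums of squares add to the same total in either order.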

Type III Sum of Squares (Unique Sums of Squares)

Here the sums of squares come from fitting each term in the model, after all other terms; these sums of squares are said to be unique for the particular term. Thus in our two-way design we would have

Sum of squares for A = A|B, AB,
Sum of squares for B = B|A, AB,
Sum of squares for AB = AB|A, B.

(Again, the interaction sum of squares will be the same as in the sequential approach.)

In a balanced design, Type I and Type III sums of squares are identical. How the respective sums of squares are calculated will become clear in Chapter 6, but the question here is, Which type should be used? Well, if you take as your authority Professor David Howell in his best-selling text book, Statistical Methods for Psychology, then there is no doubt: you should always use Type III sums of squares and never Type I. And in Maxwell and Delaney's Designing Experiments and Analysing Data, the same recommendation appears, although somewhat less strongly. However, in the papers by Nelder and Aitkin referred to earlier, adjusting main effects for interaction as used in Type III sums of squares is severely criticized both on theoretical and pragmatic grounds. The arguments are relatively subtle, but in essence they go something like this.

1. When models are fit to data, the principle of parsimony is of critical importance. In choosing among possible models, we do not adopt complex models for which there is no empirical evidence.
2. So if there is no convincing evidence of an AB interaction, we do not retain the term in the model. Thus additivity of A and B is assumed unless there is convincing evidence to the contrary.
3. The issue does not arise as clearly in the balanced case, for there the sum of squares for A, say, is independent of whether interaction is assumed or not. Thus in deciding on possible models for the data, we do not include the interaction term unless it has been shown to be necessary, in which case tests on main effects involved in the interaction are not carried out (or if carried out, not interpreted, see biofeedback example).
4. Thus the argument proceeds that the Type III sum of squares for A in which it is adjusted for AB as well as B makes no sense.
5. First, if the interaction term is necessary in the model, then the experimenter will usually wish to consider simple effects of A at each level of B separately. A test of the hypothesis of no A main effect would not usually be carried out if the AB interaction is significant.

TABLE 4.16 ANOVA of Postnatal Depression Data

Source              DF   SS         MS        F       p
Order 1
Mother              1    1711.40    1711.40   6.82    .01
Husband             1    1323.29    1323.29   5.27    .02
Mother x Husband    1    2332.21    2332.21   9.29    .003
Error               90   22583.58   250.93
Order 2
Husband             1    2526.44    2526.44   10.07   0.002
Mother              1    508.26     508.25    2.025   0.16
Husband x Mother    1    2332.21    2332.21   9.29    0.003
Error               90   22583.58   250.93

6. If the AB interaction is not significant, then adjusting for it is of no interest and causes a substantial loss of power in testing the A and B main effects.

The arguments of Nelder and Aitkin against the use of Type III sums of squares are persuasive and powerful. Their recommendation to use Type I sums of squares, perhaps considering effects in a number of orders, as the most suitable way in which to identify a suitable model for a data set is also convincing and strongly endorsed by this author.

Let us now consider the analysis of the two unbalanced datasets introduced earlier, beginning with the one involving postnatal depression and child's IQ. We shall consider only the factors of mother's mental state and husband's mental state here. The results of two analyses using Type I sums of squares are shown in Table 4.16. Here it is the interaction term that is the key, and a plot of average IQ in the four cells of the design is shown in Figure 4.5. It appears that the effect on child's IQ of a father with a psychiatric history is small if the mother is not depressed, but when the mother is depressed, there is a substantially lower IQ. (Here we have not graphed the data in any way prior to our analysis; this may have been a mistake! See Exercise 4.5.)

The Australian data given in Table 4.14 are complex, and here we shall only scratch the surface of the possible analyses. (A very detailed discussion of the analysis of these data is given in Aitkin, 1978.) An analysis of variance table for a particular order of effects is given in Table 4.17. The significant origin x sex x type interaction is the first term that requires explanation; this task is left to readers (see Exercise 4.6).

FIG. 4.5. Interaction plot for postnatal depression data. [Horizontal axis: mother (not depressed, depressed); separate lines for husband with no history and husband with history.]

In Table 4.18 is the default SPSS analysis of variance table for these data, which, inappropriately for the reasons given above, gives Type III sums of squares. Look, for example, at the origin line in this table and compare it with that in Table 4.17. The difference is large; adjusting the origin effect for the large number of mostly nonsignificant interactions has dramatically reduced the origin sum of squares and the associated F test value. Only the values in the lines corresponding to the origin x sex x grade x type effect and the error term are common to Tables 4.17 and 4.18.

TABLE 4.17 Analysis of Data on Australian Children

Order 1
Source                         DF    SS         MS        F       p
Origin                         1     2637.69    2637.69   13.68   <.001
Sex                            1     342.09     342.09    1.77    .19
Grade                          3     1827.77    609.26    3.16    .03
Type                           1     129.57     129.57    0.67    .41
Origin x Sex                   1     144.44     144.44    0.75    .39
Origin x Grade                 3     3152.79    1050.93   5.45    .002
Sex x Grade                    3     1998.91    666.30    3.46    .019
Origin x Type                  1     3.12       3.12      0.02    .90
Sex x Type                     1     76.49      76.49     0.40    .53
Grade x Type                   3     2171.45    723.82    3.75    .013
Origin x Sex x Grade           3     221.54     73.85     0.38    .77
Origin x Sex x Type            1     824.03     824.03    4.27    .04
Origin x Grade x Type          3     895.69     298.56    1.55    .21
Sex x Grade x Type             3     378.47     126.16    0.65    .58
Origin x Sex x Grade x Type    3     349.60     116.53    0.60    .61
Error                          122   23527.22   192.85

TABLE 4.18 Default ANOVA Table Given by SPSS for Data on Australian School Children

Sourre

DF
1 I

ss
228.72 349.53 1019.43 177.104 3.82 1361.06 53.21 0.86 2038.98 1612.12 34.87 980.47 412.46 349.60 23527.22

MS

Origin Sex Grade

m x Sex Origin

Origin x Grade origin x Q e p Sex x Grade x Q e p Sex x Grade Origin x Sex x Grade Origin x Grade x Qpe Sex x Grade x Q e p Origin x Sex x Grade x Q e p Error

3 1 1 3
1

1 2 3 3 3 3 3 122

228.72 .28 349.53 339.81 177.04 .89 3.82 453.69 .08 53.21 .95 0.86 .02 679.66 537.37 11.62 326.83 137.49 116.53 192.85

1.19 1.762 1.762 0.92 0.02 2.35 0.28 0.00 3.52 2.79 0.06 1.72 0.71
0.60

.18 .16

.4 3

.60
.M

.98
.19

55

.61

4.7. ANALYSIS OF COVARIANCE IN FACTORIAL DESIGNS

Analysis of covariance in the context of a one-way design was introduced in Chapter 3. The technique can be generalized to factorial designs, and in this section an example of its use in a 3 x 3 two-way design will be illustrated by using data from Howell (1992). The data are given in Table 4.19 and arise from an experiment in which subjects performed either a pattern recognition task, a cognitive task,

TABLE 4.19 Smoking and Performance Data

NS

Errors Distract

12 8 9 107 123 133 12 7 1 4 101 75 138 8

10 83 94

10 9 86 117 112

11

8 10 10 8 8 130 111 102 120 134 118 97

11

10

DS Errors Distract AS Errors Distract Cognitive Task

4 8 1 1 1 6 1 7 5 6 9 6 6 94 138 126 127 124 100 103 120 91 138

7 1 6 88 118

8 9 1 9 7 1 6 1 9 1 1 2 2 1 2 1 8 8 1 0 64 135 130 106 l 2 3 117 124 141 95 98 95 103 134 119 123

NS

Errors Distract DS Errors Distract


AS Errors

27 34 19 20 56 35 23 37 4 30 4 42 34 19 49 126 154 113 87 125 130 103 139 85 131 98 107 107 96 143 48 29 34 6 113 100 114 74

18 63 9 54 28 71 60 54 51 25 49 76 162 80 118 99 146 132 135 111 106 96


21 44

Distract

34 65 55 33 42 54 108 191 112128 76 128 98 145107 142

61 75 38

61 51 32 47 144 131 110 132

Driving Simulation

NS

Errors

Distract DS
Errors

110

14 2 2 15 5 96 114 125 102 112 137 168

0 17 9 14 16

15 3 9 15 13 109 111 137 106 117 101 116


5 1 1

Distract
Errors Distract

7 0 93 108 102
3 130 83 2

0 1 2 1 7 100 123 131 6

1 1 14 4 5 1 6 3 103 101 99 116 81 103 78 139 102

AS

0 0 91 109 92

2 0 6 4 1 106 99 109 136 102 119

84 114 67 68

Note. AS, active smoking; DS, delayed smoking; NS, nonsmoking.

or a driving simulation task, under three smoking conditions. These were active smoking, smoked during or just before the task; delayed smoking, not smoked for 3 hours; and nonsmoking, which consisted of nonsmokers. The dependent variable was the number of errors on the task. Each subject was also given a distractibility score; higher scores indicate a greater possibility of being distracted.

In his analysis of these data, Howell gives the ANOVA table shown in Table 4.20. This table arises from using SPSS and selecting UNIQUE sum of squares (the default). The arguments against such sums of squares are the same here as in the previous section, and a preferable analysis of these data involves Type I sums of squares with perhaps the covariate ordered first and the group x task interaction ordered last. The results of this approach are shown in Table 4.21. There is strong evidence of a group x task interaction, which again we leave as an exercise for the reader to investigate in more detail. (It is this conflict between Type I and Type III sums of squares that led to the different analyses of covariance results given by SPSS and S-PLUS on the WISC data reported in the previous chapter.)
TABLE 4.20 ANCOVA of Pattern Recognition Experiment by Using Type III Sums of Squares

Source                 DF    SS         MS         F        p
Covariate (distract)   1     4644.88    4644.88    64.93    .000
Group                  2     563.26     281.63     3.94     .022
Task                   2     23870.49   11935.24   166.84   .000
Group x Task           4     1626.51    406.63     5.68     .000
Error                  125   8942.32    71.54

TABLE 4.21 ANCOVA of Pattern Recognition Data by Using Type I Sums of Squares with the Covariate Ordered First

Source         DF    SS         MS         F        p
Distract       1     10450.43   10450.43   146.08   <.001
Task           2     23726.78   11863.39   165.83   <.001
Group          2     585.88     292.94     4.10     .02
Group x Task   4     1626.51    406.63     5.68     <.001
Error          125   8942.32    71.54

4.8. MULTIVARIATE ANALYSIS OF VARIANCE FOR FACTORIAL DESIGNS

In the study of the effect of mother's postnatal depression on child development, a number of cognitive variables in addition to IQ were recorded for each child at age 4 and are given in Table 4.15. Separate analyses of variance could be carried out on these variables (using Type I sums of squares), but in this case the investigator may genuinely be more interested in answering questions about the set of three variables. Such an approach might be sensible if the variables were to be regarded as indicators of some more fundamental concept not directly measurable, a so-called latent variable. The required analysis is now a multivariate analysis of variance, as described in the previous chapter. The results are shown in Table 4.22. (Here we have included sex of child as an extra factor.) Note that because the design is unbalanced, order of effects is important. Also note that because each term in the ANOVA table has only a single degree of freedom, the four possible multivariate test statistics defined in the glossary all lead to the same approximate F values and the same p values. Consequently, only the value of the Pillai test statistic is given in Table 4.22. For this set of cognitive variables only the mother's mental state is significant. The means for the three variables for depressed and nondepressed mothers are as follows.

Mother   PS     VS     QS
D        50.9   54.8   51.9
ND       53.3   57.1   55.3

The children of depressed mothers have lower average scores on each of the three cognitive variables.
TABLE 4.22 MANOVA of Cognitive Variables from a Postnatal Depression Study

Source                   Pillai Statistic   Approx. F   DF1   DF2   p
Mother                   0.09               2.83        3     84    .04
Husband                  0.05               1.45        3     84    .23
Sex                      0.05               1.42        3     84    .24
Mother x Husband         0.07               2.25        3     84    .09
Mother x Sex             0.01               0.31        3     84    .82
Husband x Sex            0.002              0.07        3     84    .98
Mother x Husband x Sex   0.04               1.11        3     84    .35
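The link between the Pillai statistic and the approximate F values in Table 4.22 can be sketched numerically. The conversion used below, F = V / (1 - V) x DF2 / DF1, is the standard one for the special case in which each term has a single degree of freedom; it is not given in the text, and the function name is hypothetical. Because the Pillai values in the table are printed to only two or three decimals, the recomputed F values agree with the printed ones only approximately.

```python
def pillai_to_f(pillai, df1, df2):
    """F statistic for a single-degree-of-freedom term: V / (1 - V) * df2 / df1."""
    return pillai / (1.0 - pillai) * df2 / df1


# Pillai statistics as printed (rounded) in Table 4.22; DF1 = 3 and DF2 = 84.
pillai_values = {
    "Mother": 0.09,
    "Husband": 0.05,
    "Sex": 0.05,
    "Mother x Husband": 0.07,
    "Mother x Sex": 0.01,
    "Husband x Sex": 0.002,
    "Mother x Husband x Sex": 0.04,
}

recomputed_f = {
    source: pillai_to_f(v, 3, 84) for source, v in pillai_values.items()
}

for source, f_value in recomputed_f.items():
    print(f"{source}: approximate F = {f_value:.2f}")
```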

4.9. SUMMARY

1. Factorial designs allow interactions between factors to be studied.
2. The factors in factorial designs can be assumed to have random or fixed effects. The latter are generally more applicable to psychological experiments, but random effects will be of more importance in the next chapter and in Chapter 7.
3. Planned comparison and multiple comparison tests can be applied to factorial designs in a similar way to that described for a one-way design in Chapter 3.
4. Unbalanced factorial designs require care in their analyses. It is important to be aware of what procedure for calculating sums of squares is being employed by whatever statistical software is to be used.

COMPUTER HINTS

Essentially, the same hints apply as those given in Chapter 3. The most important thing to keep a look out for is the differences in how the various packages deal with unbalanced factorial designs. Those such as SPSS that give Type III sums of squares as their default should be used with caution, and then only after specifically requesting Type I sums of squares and then considering main effects before interactions, first-order interactions before second-order interactions, and so on. When S-PLUS is used for the analysis of variance of unbalanced designs, only Type I sums of squares are provided.

EXERCISES

4.1. Reexamine and, if you consider it necessary, reanalyze the rat data in the light of the differing cell standard deviations and the nonnormality of the residuals as reported in the text.

4.2. Using the slimming data in Table 4.9, construct the appropriate 95% confidence interval for the difference in weight change between novice and experienced slimmers when members of both groups have access to a slimming manual.

4.3. In the drug, biofeed, and diet example, carry out separate two-way analyses of variance of the data corresponding to each drug. What conclusions do you reach from your analysis?

4.4. In the postnatal depression data, separately analyze each of the three variables used in the MANOVA in Section 4.7. Comment on the results.

4.5. Produce box plots of the IQ score in the postnatal depression data for the four cells in the 2 x 2 design with factors of father's psychiatric history and

Display 4.4 ANCOVA Model for a Two-Way Design with Factors A and B

In general terms the model is

observed response = mean + factor A effect + factor B effect + AB interaction effect + covariate effect + error.

More specifically, the analysis of covariance model for a two-way design is

$$y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \beta(x_{ijk} - \bar{x}) + \epsilon_{ijk},$$

where $y_{ijk}$ is the $k$th observed value of the response variable in the $ij$th cell of the design and $x_{ijk}$ is the corresponding value of the covariate, which has grand mean $\bar{x}$. The regression coefficient linking the response and covariate variable is $\beta$. The other terms correspond to those described in Display 4.1.

The adjusted value of an observation is given by

adjusted value = observed response value + estimated regression coefficient x (grand mean of covariate - observed covariate value).
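The adjusted value formula at the end of Display 4.4 translates directly into code. The sketch below is illustrative only: the function and argument names are hypothetical, and the numbers in the example call are invented rather than taken from Table 4.19.

```python
def adjusted_value(response, slope, covariate, covariate_grand_mean):
    """Observed response + slope * (grand mean of covariate - observed covariate)."""
    return response + slope * (covariate_grand_mean - covariate)


# Invented example: 12 errors, distractibility score 130, grand mean
# distractibility 110, and an estimated regression coefficient of 0.1.
example = adjusted_value(response=12, slope=0.1, covariate=130, covariate_grand_mean=110)
print(example)
```

An observation whose covariate lies above the grand mean is adjusted downward when the estimated regression coefficient is positive, and upward when its covariate lies below the grand mean.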

TABLE 4.23 Blood Pressure, Family History, and Smoker Status


Smoking Status
Family History

Nonsmoker

Exsmoker

Current Smoker

Ys e

No

15 2 156 103 129 110 128 135 114 110 91 136 105 125 103 110

114 107

134 140 120 115 128 105

135 120 123 113 145 120 10 4

165

110

125

90

123 108 113 10 6

TABLE 4.24 Data from a Trial of Estrogen Patches in the Treatment of Postnatal Depression

Treatment   Baseline 1   Baseline 2   Depression Score

Placebo

18 2 5 2 4 19 22 21 21 26 20
24 24

18 27 17 15 20 28 16 26 19 20 22 27 15 28 18 20 21 24 25 22 26

15 10 12 5 5 9 11 13 6 18
10

Active

27 19 25 19 21 21 25 25 15 27

7 8 2 6 11 5 11 6 6 10

mother's depression. Do these plots give you any concerns about the results of the analysis reported in Table 4.16? If so, reanalyze the data as you see fit.

4.6. Consider different orders of main effects in the multivariate analysis described in Section 4.7.

4.7. Plot some suitable graphs to aid the interpretation of the origin x sex x type interaction found in the analysis of the data on Australian school children.

4.8. Graphically explore the relationship between errors and the distraction measure in the smoking and performance data in 4.17. Do your graphs cause Table you any concern about the validityof the ANCOVA reported inSection 4.6?

4.9. Use the formula given in Display 4.4 to calculate the adjusted values of the observations for the pattern recognition experiment. Using these values, plot a suitable diagram to aid in the interpretation of the significant group x task interaction.
4.10. The data in Table 4.23 (taken from Boniface, 1995) were collected during a survey of systolic blood pressure of individuals classed according to smoking status and family history of circulation and heart problems. Carry out an analysis of variance of the data and state your conclusions. Examine the residuals from fitting what you consider the most suitable model and use them to assess the assumptions of your analysis of variance.

4.11. In the model for a balanced two-way fixed effects design (see Display 4.1), suggest sensible estimators for the main effect and interaction parameters.

4.12. The observations in Table 4.24 are part of the data collected in a clinical trial of the use of estrogen patches in the treatment of postnatal depression. Carry out an analysis of variance of the posttreatment measure of depression, using both pretreatment values as covariates.

Analysis of Repeated Measure Designs

5.1. INTRODUCTION
Many studies undertaken in the behavioral sciences and related disciplines involve recording the value of a response variable for each subject under more than one condition, on more than one occasion, or both. The subjects may often be arranged in different groups, either those naturally occurring such as gender, or those formed by random allocation, for example, when competing therapies are assessed. Such a design is generally referred to as involving repeated measures. Three examples will help in getting a clearer picture of a repeated measures type of study.
1. Visual acuity data. This example is one already introduced in Chapter 2, involving response times of subjects using their right and left eyes when light was flashed through lenses of different powers. The data are given in Table 2.3. Here there are two within subject factors, eye and lens strength.
2. Field dependence and a reverse Stroop task. Subjects selected randomly from a large group of potential subjects identified as having field-independent or field-dependent cognitive style were required to read two types of words (color and form names) under three cue conditions: normal, congruent, and incongruent.


CHAPTER 5
TABLE 5.1 Field Independence and a Reverse Stroop Task

           b1 Form                      b2 Color
Subject    c1 (N)   c2 (C)   c3 (I)     c1 (N)   c2 (C)   c3 (I)

182

176

1 219 206 191 2 175 186 183 3 166 165 190 4 210 5185 182 171 187 179 6 182 171 175 183 174 187 168 7 8 185 186 9 189 10 191 208 192 11 162 168 163 12 170 162
a2 (Field Dependent)

a1 (Field Independent)

161 156 148 161 138 212 178 174 167 153 173 168 135 142

146 185

150 184 210 178

169 159 185 201 183 177 187 169 141 147

I60

145

151

267 216

277 235 400 183

165

13 14 15 16 17 18 19 20 21 22 23 2 4

216

150 223 162 163 172 159 237 205

140 150 214 404 215 179 159 233 177 190 186

164

140 146 184 1 4 4 165 156 192 189 170 143 150 238 207 228 225 217 205 230208 211 144 155 187 139 151

271 165 379

187 161

183 140

156 163 148 177 163

Note. Response variable is time in milliseconds. N, normal; C, congruent; I, incongruent.

The dependent variable was the time in milliseconds taken to read the stimulus words. The data are shown in Table 5.1. Here there are two within subjects factors, type and cue, and one between subjects factor, cognitive style.
3. Alcohol dependence and salsolinol excretion. Two groups of subjects, one with severe dependence and one with moderate dependence on alcohol, had their salsolinol excretion levels (in millimoles) recorded on four consecutive days (for those readers without the necessary expertise in chemistry, salsolinol is an alkaloid

TABLE 5.2 Salsolinol Excretion Rates (mmol) for Moderately and Severely Dependent Alcoholic Patients

Day

Subject

Group I (Moderate Dependence) 1 0.33 2 5.30 3 2.50 4 0.98 5 0.39 6 0.31 Group 2 (Severe Dependence) 1 7 0.64 8 0.73 9 0.70 10 11 2.50 12 1.90 13 0.50 14

0.70 0.90 2.10 0.32 0.69 6.34 2 0.70 1.85 4.20 1.60 1.30 1.20 1.30 0.40

2.33 1.80 1.12 3.91 0.73 0.63 3 1.00 3.60 7.30 1.40 0.70 2.60 4.40 1.10

3.20 0.70 1.01 0.66 2.45 3.86 4 1.40 2.60 5.40 7.10 0.70 1.80 2.80 8.10

0.40 7.80

with a structure similar to heroin). Primary interest centers on whether the groups behaved differently over time. The data are given in Table 5.2. Here there is a single within subject factor, time, and one between subjects factor, level of dependence.
Researchers typically adopt the repeated measures paradigm as a means of reducing error variability and/or as the natural way of measuring certain phenomena (e.g., developmental changes over time, and learning and memory tasks). In this type of design, effects of experimental factors giving rise to the repeated measures are assessed relative to the average response made by a subject on all conditions or occasions. In essence, each subject serves as his or her own control, and, accordingly, variability caused by differences in average responsiveness of the subjects is eliminated from the extraneous error variance. A consequence of this is that the power to detect the effects of within subjects experimental factors is increased compared to testing in a between subjects design.


Unfortunately, the advantages of a repeated measures design come at a cost, and that is the probable lack of independence of the repeated measurements. Observations made under different conditions involving the same subjects will very likely be correlated rather than independent. You will remember that in Chapters 3 and 4 the independence of the observations was one of the assumptions underlying the techniques described, so it should come as no great surprise that accounting for the dependence between observations in repeated measure designs brings about some problems in the analysis of such designs, as we shall see later.
The common feature of the three examples introduced in this section is that the subjects have the response variable recorded more than once. In the field-dependence example, the repeated measures arise from observing subjects under two different factors, the within subject factors. In addition, the subjects can be divided into two groups: the two levels of the between group factor. In the visual acuity example, only within subject factors occur. In both these examples it is possible (indeed likely) that the conditions under which a subject is observed are given in a random order. In the third example, however, where time is the single within subject factor, randomization is not, of course, an option. This makes the type of study where subjects are simply observed over time rather different from other repeated measures designs, and they are often given a different label, longitudinal designs. Because of their different nature, we shall consider them in a separate chapter (Chapter 7).
In a longitudinal design, the repeated measures on the same subject are a necessary part of the design. In other examples of repeated measures, however, it would be quite possible to use different groups of subjects for each condition combination, giving rise to a factorial design as discussed in the previous chapter. The primary purpose of using a repeated measures design in such cases is the control that the approach provides over individual differences between subjects. In the area of behavioral sciences, such differences are often quite large relative to differences produced by manipulation of the experimental conditions or treatments that the investigator is trying to evaluate.

5.2. ANALYSIS OF VARIANCE FOR REPEATED MEASURE DESIGNS


The structure of the visual acuity study and the field-dependence study is, superficially at least, similar to that of the factorial designs of Chapter 4, and it might be imagined that the analysis of variance procedures described there could again be applied here. This would be to overlook the fundamental difference that in a repeated measures design the same subjects are observed under the levels of at least some of the factors of interest, rather than a different sample as required in a factorial design. It is possible, however, to use relatively straightforward ANOVA


procedures for repeated measures data if three particular assumptions about the observations are valid. They are as follows.
1. Normality: the data arise from populations with normal distributions.
2. Homogeneity of variance: the variances of the assumed normal distributions are required to be equal.
3. Sphericity: the variances of the differences between all pairs of repeated measurements are equal. This condition implies that the correlations between pairs of repeated measures are also equal, the so-called compound symmetry pattern. Sphericity must also hold in all levels of the between subjects part of the design for the analyses to be described in Subsections 5.2.1 and 5.2.2 to be strictly valid.

We have encountered the first two requirements in previous chapters, and they need not be further discussed here. The sphericity condition has not, however, been met before, and it is particularly critical in the ANOVA of repeated measures data; the condition will be discussed in detail in Section 5.3. For the moment, however, let us assume that this is the best of all possible worlds, in which data are always normal, variances of different populations are always equal, and repeated measures always meet the sphericity requirement.

5.2.1. Repeated Measures Model for Visual Acuity Data

The results of a repeated measures ANOVA for the visual acuity data are shown in Table 5.3. The model on which this analysis is based is described in Display 5.1. The model allows the repeated measurements to be correlated but only to the extent of the compound symmetry pattern to be described in Section 5.3. It appears that only lens strength affects the response variable in this experiment. The means for
TABLE 5.3 ANOVA of Visual Acuity Data

Source             SS         DF    MS        F        p
Eye                52.90       1    52.90     0.61     .44
Error            1650.60      19    86.87
Lens strength     571.85       3   190.62    12.99    <.001
Error             836.15      57    14.67
Eye x Strength     43.75       3    14.58     1.51     .22
Error             551.75      57     9.68

Display 5.1
Repeated Measures ANOVA Model for Two-Factor Design (A and B) with Repeated Measures on Both Factors

Let $y_{ijk}$ represent the observation on the ith subject for factor A level j and factor B level k. The model for the observations is

$$y_{ijk} = \mu + \alpha_j + \beta_k + \gamma_{jk} + \tau_i + (\tau\alpha)_{ij} + (\tau\beta)_{ik} + (\tau\gamma)_{ijk} + \epsilon_{ijk},$$

where $\alpha_j$ represents the effect of the jth level of factor A (with a levels), $\beta_k$ is the effect of the kth level of factor B (with b levels), and $\gamma_{jk}$ the interaction effect of the two factors. The term $\tau_i$ is a constant associated with subject i, and $(\tau\alpha)_{ij}$, $(\tau\beta)_{ik}$, and $(\tau\gamma)_{ijk}$ represent interaction effects of subject i with each factor and their interaction. As usual, $\epsilon_{ijk}$ represent random error terms. Finally, we assume that there are n subjects in the study (see Maxwell and Delaney, 1990, for full details).
The factor A ($\alpha_j$), factor B ($\beta_k$), and the AB interaction ($\gamma_{jk}$) terms are assumed to be fixed effects, but the subject and error terms are assumed to be random variables from normal distributions with zero means and variances specific to each term. This is an example of a mixed model.
Correlations between the repeated measures arise from the subject terms because these apply to each of the measurements made on the subject.
The analysis of variance table is as follows:

Source                      SS      DF                 MS                          MSR(F)
A                           ASS     a-1                AMS = ASS/(a-1)             AMS/EMS1
A x Subjects (error 1)      ESS1    (a-1)(n-1)         EMS1 = ESS1/[(a-1)(n-1)]
B                           BSS     b-1                BMS = BSS/(b-1)             BMS/EMS2
B x Subjects (error 2)      ESS2    (b-1)(n-1)         EMS2 = ESS2/[(b-1)(n-1)]
AB                          ABSS    (a-1)(b-1)         ABMS = ABSS/[(a-1)(b-1)]    ABMS/EMS3
AB x Subjects (error 3)     ESS3    (a-1)(b-1)(n-1)    EMS3 = ESS3/[(a-1)(b-1)(n-1)]

Here it is useful to give specific expressions both for the various sums of squares above and for the terms that the corresponding mean squares are estimating (the expected mean squares), although the information looks a little daunting!

Here S represents subjects, the sigma terms are variances of the various random effects in the model, identified by an obvious correspondence to these terms, and $\theta_\alpha^2$, $\theta_\beta^2$, and $\theta_\gamma^2$ are A, B, and AB effects that are nonzero unless the corresponding null hypothesis that the $\alpha_j$, $\beta_k$, or $\gamma_{jk}$ terms are zero is true.

F tests of the various effects involve different error terms essentially for the same reason as discussed in Chapter 4, Section 4.5, in respect to a simple random effects model. For example, under the hypothesis of no factor A effect, the mean square for A and the mean square for A x Subjects are both estimators of the same variance. If, however, the factor effects $\alpha_j$ are not zero, then $\theta_\alpha^2$ is greater than zero and the mean square for A estimates a larger variance than the mean square for A x Subjects. Thus the ratio of these two mean squares provides an F test of the hypothesis that the populations corresponding to the levels of A have the same mean. (Maxwell and Delaney, 1990, give an extended discussion of this point.)
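To make the choice of error term concrete, the following minimal sketch computes the F ratio for a single within subject factor A, using the A x Subjects mean square as the denominator. The function name `repeated_measures_f` and the small data set (four subjects, three levels) are hypothetical and serve only to show the arithmetic; a design with two within subject factors repeats the same idea with a separate error term for each effect.

```python
def repeated_measures_f(data):
    """F ratio for one within subject factor A.

    data[i][j] is the response of subject i under level j of A.
    Returns (F, df_effect, df_error); the error term is A x Subjects.
    """
    n = len(data)      # number of subjects
    a = len(data[0])   # number of levels of A
    grand = sum(sum(row) for row in data) / (n * a)
    level_means = [sum(row[j] for row in data) / n for j in range(a)]
    subject_means = [sum(row) / a for row in data]

    # Sum of squares for A: variation of the level means about the grand mean.
    ss_a = n * sum((m - grand) ** 2 for m in level_means)
    # A x Subjects: what remains after removing level and subject means.
    ss_error = sum(
        (data[i][j] - level_means[j] - subject_means[i] + grand) ** 2
        for i in range(n)
        for j in range(a)
    )
    df_a = a - 1
    df_error = (a - 1) * (n - 1)
    return (ss_a / df_a) / (ss_error / df_error), df_a, df_error


# Hypothetical scores: subjects differ greatly in overall level,
# but each subject responds to the three conditions in a similar way.
scores = [
    [10, 12, 15],
    [20, 21, 26],
    [30, 33, 34],
    [40, 42, 45],
]
f_ratio, df_effect, df_error = repeated_measures_f(scores)
print(f_ratio, df_effect, df_error)
```

Because the large differences between subjects are removed from the error term, the F ratio here is large even though the condition means differ by only a few units.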
the four lens strengths are as follows.
6/60 6/18 6/36 6/6 112.93 118.23 115.05 115.10

It may be of interest to look in more detail at the differences in these means because the lens strengths form some type of ordered scale (see Exercise 5.2).

5.2.2. Model for Field Dependence Example


A suitable model for the field dependence example is described in Display 5.2. Again the model allows the repeated measurements to be related, but only to the extent of having the compound symmetry pattern. (The effects of departures from this pattern will be discussed later.) Application of the model to the field-dependence data gives the analysis of variance table shown in Table 5.4. The main effects of type and cue are highly significant, but a detailed interpretation of the results is left as an exercise for the reader (see Exercise 5.1).

5.3. SPHERICITY AND COMPOUND SYMMETRY


The analyses of variance of the two repeated measures examples described above are valid only if the normality, homogeneity, and sphericity assumptions are valid for the data. It is the latter that is of greatest importance in the repeated measures situation, because if the assumption is not satisfied, the F tests used are positively biased, leading to an increase in the probability of rejecting the null


Display 5.2
Repeated Measures ANOVA Model for Three-Factor Design (A, B, and C) with Repeated Measures on Two Factors (A and B)

Let $y_{ijkl}$ represent the observation on the ith person for factor A level j, factor B level k, and factor C level l. The model for the observations is

$$y_{ijkl} = \mu + \alpha_j + \beta_k + \gamma_{jk} + \tau_i + (\tau\alpha)_{ij} + (\tau\beta)_{ik} + (\tau\gamma)_{ijk} + \delta_l + \omega_{jl} + \theta_{kl} + \chi_{jkl} + \epsilon_{ijkl},$$

where the terms are as in the model in Display 5.1, with the addition of parameters for the between subjects factor C (with c levels) and its interactions with the within subject factors and with their interaction ($\delta_l$, $\omega_{jl}$, $\theta_{kl}$, $\chi_{jkl}$).
Note that there are no subject effects involving factor C, because subjects are not crossed with this factor; it is a between subjects factor.
Again the factor A, factor B, and factor C terms are assumed to be fixed effects, but the subject terms and error terms are assumed to be random effects that are normally distributed with mean zero and particular variances.
The analysis of variance table is as follows.

Source                                        SS       MSR(F)
Between subjects
  C                                           CSS      CMS/EMS1
  Subjects within C (error 1)                 ESS1
Within subjects
  A                                           ASS      AMS/EMS2
  C x A                                       CASS     CAMS/EMS2
  C x A x Subjects within C (error 2)         ESS2
  B                                           BSS      BMS/EMS3
  C x B                                       CBSS     CBMS/EMS3
  C x B x Subjects within C (error 3)         ESS3
  AB                                          ABSS     ABMS/EMS4
  C x A x B                                   CABSS    CABMS/EMS4
  C x A x B x Subjects within C (error 4)     ESS4

n is the total number of subjects; that is, $n = n_1 + n_2 + \cdots + n_c$, where $n_l$ is the number of subjects in level l of the between subjects factor C.

TABLE 5.4 ANOVA for Reverse Stroop Experiment

Source                 SS           DF    MS          F        p
Group (G)              18906.25      1    18906.25     2.56    .12
Error                 162420.08     22     7382.73
Type (T)               25760.25      1    25760.25    12.99    .0016
G x T                   3061.78      1     3061.78     1.54    .23
Error                  43622.30     22     1982.83
Cue (C)                 5697.04      2     2848.52    22.60    <.0001
G x C                    292.62      2      146.31     1.16    .32
Error                   5545.00     44      126.02
T x C                    345.37      2      172.69     2.66    .08
G x T x C                 90.51      2       45.26     0.70    .50
Error                   2860.79     44       65.02

hypothesis when it is true; that is, increasing the size of the Type I error over the nominal value set by the experimenter. This will lead to an investigator's claiming a greater number of significant results than are actually justified by the data.
Sphericity implies that the variances of the differences between any pair of the repeated measurements are the same. In terms of the covariance matrix of the repeated measures, $\Sigma$ (see the glossary in Appendix A), it requires the compound symmetry pattern, that is, the variances on the main diagonal must equal one another, and the covariances off the main diagonal must also all be the same. Specifically, the covariance matrix must have the following form:

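$$\Sigma = \begin{pmatrix} \sigma^2 & \rho\sigma^2 & \cdots & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 & \cdots & \rho\sigma^2 \\ \vdots & \vdots & \ddots & \vdots \\ \rho\sigma^2 & \rho\sigma^2 & \cdots & \sigma^2 \end{pmatrix}, \qquad (5.1)$$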

where $\sigma^2$ is the assumed common variance and $\rho$ is the assumed common correlation coefficient of the repeated measures. This pattern must hold in each level of the between subject factors, which is quite an assumption!
As we shall see in Chapter 7, departures from the sphericity assumption are very likely in the case of longitudinal data, but perhaps less likely to be so much of a problem in experimental situations in which the levels of the within subject factors are given in a random order to each subject in the study. But where departures are suspected, what can be done?
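A small numerical check may help to connect compound symmetry with sphericity. The sketch below builds a covariance matrix of the form shown in Equation (5.1) and computes the variance of the difference between each pair of repeated measures, using Var(y_r - y_s) = Var(y_r) + Var(y_s) - 2 Cov(y_r, y_s). The function names and the values p = 4, variance 10, and correlation 0.5 are illustrative assumptions only.

```python
def compound_symmetry(p, variance, rho):
    """Covariance matrix with a common variance and a common correlation."""
    return [
        [variance if r == s else rho * variance for s in range(p)]
        for r in range(p)
    ]


def difference_variances(cov):
    """Variance of the difference for every pair of repeated measures."""
    p = len(cov)
    return {
        (r + 1, s + 1): cov[r][r] + cov[s][s] - 2 * cov[r][s]
        for r in range(p)
        for s in range(r + 1, p)
    }


sigma = compound_symmetry(p=4, variance=10.0, rho=0.5)
diffs = difference_variances(sigma)
for pair, value in diffs.items():
    print(pair, value)
```

Under compound symmetry every pair gives the same value, 2 x variance x (1 - rho), so the sphericity condition is met. Applying `difference_variances` to a sample covariance matrix gives a quick informal look at how far the data depart from this pattern.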


5.4. CORRECTION FACTORS IN THE ANALYSIS OF VARIANCE OF REPEATED MEASURE DESIGNS


Box (1954) and Greenhouse and Geisser (1959) considered the effects of departures from the sphericity assumption in a repeated measures ANOVA. They demonstrated that the extent to which a set of repeated measures data deviates from the sphericity assumption can be summarized in terms of a parameter, $\epsilon$, which is a function of the covariances and variances of the repeated measures. (In some ways it is a pity that $\epsilon$ has become the established way to represent this correction term, as there is some danger of confusion with all the epsilons occurring as error terms in models. Try not to become too confused!) Furthermore, an estimate of this parameter can be used to decrease the degrees of freedom of F tests for the within subjects effects, to account for departure from sphericity. In this way larger F values will be needed to claim statistical significance than when the correction is not used, and so the increased risk of falsely rejecting the null hypothesis is removed. For those readers who enjoy wallowing in gory mathematical details, the formula for $\epsilon$ is given in Display 5.3. When sphericity holds, $\epsilon$ takes its maximum value of one and the F tests need not be amended. The minimum value of $\epsilon$ is $1/(p - 1)$, where p is the number of repeated measures. Some authors, for example, Greenhouse and Geisser (1959), have suggested using this lower bound in all cases so as to avoid the need to estimate $\epsilon$ from the data (see below). Such an approach is, however, very conservative. That is, it will too often fail to reject the null hypothesis when it is false. The procedure is not recommended, particularly now that most software packages will estimate $\epsilon$ routinely. In fact, there have been two suggestions for such estimation, both of which are usually provided by major statistical software.

1. Greenhouse and Geisser (1959) suggest simply substituting sample values for the population quantities in Display 5.3.
Display 5.3
Greenhouse and Geisser Correction Factor

The correction factor $\epsilon$ is given by

$$\epsilon = \frac{p^2(\bar{\sigma}_{tt} - \bar{\sigma})^2}{(p-1)\left(\sum_{r=1}^{p}\sum_{s=1}^{p}\sigma_{rs}^2 - 2p\sum_{r=1}^{p}\bar{\sigma}_{r}^2 + p^2\bar{\sigma}^2\right)},$$

where p is the number of repeated measures on each subject, ($\sigma_{rs}$, r = 1, ..., p, s = 1, ..., p) represents the elements of the population covariance matrix of the repeated measures, and the remaining terms are as follows: $\bar{\sigma}$, the mean of all the elements of the covariance matrix; $\bar{\sigma}_{tt}$, the mean of the elements of the main diagonal; $\bar{\sigma}_{r}$, the mean of the elements in row r. (Covariance matrices are defined in the glossary in Appendix A.)


2. Huynh and Feldt (1976) suggest taking the estimate of $\epsilon$ to be min(1, a/b), where $a = ng(p - 1)\hat{\epsilon} - 2$ and $b = (p - 1)[g(n - 1) - (p - 1)\hat{\epsilon}]$, and where n is the number of subjects in each group, g is the number of groups, and $\hat{\epsilon}$ is the Greenhouse and Geisser estimate.
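Both estimates are easy to compute once a covariance matrix of the repeated measures is available. The sketch below is one possible implementation, assuming the Greenhouse and Geisser formula takes the form used in the arithmetic of Table 5.6 and the Huynh and Feldt rule is as stated in item 2 above; the function names and the small example matrices are hypothetical.

```python
def greenhouse_geisser(cov):
    """Greenhouse and Geisser estimate from a p x p covariance matrix."""
    p = len(cov)
    mean_all = sum(sum(row) for row in cov) / (p * p)
    mean_diag = sum(cov[r][r] for r in range(p)) / p
    row_means = [sum(row) / p for row in cov]
    sum_squares = sum(v * v for row in cov for v in row)
    numerator = p * p * (mean_diag - mean_all) ** 2
    denominator = (p - 1) * (
        sum_squares
        - 2 * p * sum(m * m for m in row_means)
        + p * p * mean_all ** 2
    )
    return numerator / denominator


def huynh_feldt(eps_gg, n, g, p):
    """Huynh and Feldt estimate: n subjects per group, g groups, p measures."""
    a = n * g * (p - 1) * eps_gg - 2
    b = (p - 1) * (g * (n - 1) - (p - 1) * eps_gg)
    return min(1.0, a / b)


# A compound symmetric matrix satisfies sphericity, so the estimate is one.
symmetric = [[4.0, 2.0, 2.0], [2.0, 4.0, 2.0], [2.0, 2.0, 4.0]]
print(greenhouse_geisser(symmetric))

# Unequal variances with zero covariances give a value below one.
unequal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 4.0]]
print(greenhouse_geisser(unequal))

# Huynh and Feldt estimate for the values quoted for the Stroop data
# (Greenhouse and Geisser estimate 0.2768, n = 12, g = 2, p = 6).
eps_hf = huynh_feldt(0.2768, n=12, g=2, p=6)
print(round(eps_hf, 3))

# Either estimate is then used to scale both degrees of freedom of the F test.
print(round(eps_hf * 5, 2), round(eps_hf * 110, 2))
```

The last two lines show how the estimate is applied in practice: the numerator and denominator degrees of freedom of each within subject F test are simply multiplied by the estimated correction factor before the p value is looked up.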

To illustrate the use of the correction factor approach to repeated measures data we shall again use the field-dependence data in Table 5.1, but now we shall assume that the six repeated measures arise from measuring the response variable, time, under six different conditions, that is, C1, C2, C3, C4, C5, C6, given in a random order to each subject. The scatterplot matrices of the observations in each group are shown in Figures 5.1 (field independent) and 5.2 (field dependent). Although there are not really sufficient observations to make convincing claims, there does appear to be some evidence of a lack of compound symmetry in the data. This is reinforced by a look at the variances of differences between pairs of observations and the covariance matrices of the repeated measures in each level of cognitive style shown in Table 5.5. Clearly, there are large differences between the groups in terms of variances and covariances that may have implications for analysis, but that we shall ignore here (but see Exercise 5.1).
Details of the calculation of both the Greenhouse and Geisser and Huynh and Feldt estimates of $\epsilon$ are given in Table 5.6, and the results of the repeated measures ANOVA of the data with and without using the correction factor are given in Table 5.7. Here use of the correction factors does not alter which effects are statistically significant, although p values have changed considerably.

5.5. MULTIVARIATE ANALYSIS OF VARIANCE FOR REPEATED MEASURE DESIGNS

An alternative approach to the use of correction factors in the analysis of repeated measures data, when the sphericity assumption is judged to be inappropriate, is to use a multivariate analysis of variance. This technique has already been considered briefly in Chapters 3 and 4, in the analysis of studies in which a series of different response variables are observed on each subject. However, this method can also be applied in the repeated measures situation, in which a single response variable is observed under a number of different conditions and/or at a number of different times. The main advantage of using MANOVA for the analysis of repeated measures designs is that no assumptions now have to be made about the pattern of covariances between the repeated measures. In particular, these covariances need not satisfy the compound symmetry condition. A disadvantage of using MANOVA for repeated measures is often stated to be the technique's relatively low power when the assumption of compound symmetry is actually valid. However, Davidson (1972) compares the power of the univariate and multivariate analysis of variance

I 46
TABLE 5 5

CHAPTER 5
Covariance Matrices and VariancesDifferencesof Repeated Measures of for Shoop Data

Variancesof Differences Between Conditions Diference Field-Dependent Variance Field-Independent Variance

c142
C1-C3 C1 x 4 C145 Cl-C6 m43
C2-a
czc5

(2x6 c x 4
c3-c5

C346 c445 C446 C546

68.52 138.08 147.61 116.02 296.79 123.36 178.00 123.54 304.27 330.81 311.42 544.63 16.08 47.00 66.27

478.36 344.99 2673.36 2595.45 3107.48 778.27 2866.99 2768.73 3364.93 2461.15 2240.99 2500.79 6 S4 1 150.97 103.66

Covariance Matrirfor Reveated Measures for Field-lndevendent GmUD Cl c 4 c2 c3 CS C6

203.12 C1 145.73 169.15 c2 221.32 145.73 375.17 c 3 221.32 203.12 01 270.14 300.09 172.23 160.82 c5 154.70 160.01 C6 343.82 375.36 164.09 185.00

185.00 190.82 160.01 156.46 161.77 192.09


C3

156.46 192.09 172.23 270.14

161.77 256.27 164.09 375.36 343.82 497.64

Covariance M a t k for Repeated Measures Field-DependenfGmup for C1

c2

c4

CS

C6

5105.30 4698.27 1698.36 4708.48 4708.48 4790.02 1443.91 4324.00 c3 4698.27 4324.00 1569.94 4636.24 c 4 1443.91 964.79 1569.94 1698.36 c 5 1603.66 1847.93 1790.64 1186.02 1044.64 C6 1309.36 1595.73 1664.55 1138.00 1193.64 1003.73
c2

c1

1847.93 1603.66 1790.64 1044.64

1595.73 1309.36 1664.55 1003.73 1138.00

TABLE 5.6 Calculating Correction Factors

The Mean of the Covariance Matrices in Table 5.5 is the Matrix S, Say, Given By

       C1         C2         C3         C4         C5         C6
C1   2637.22    2427.10    2450.70     929.59    1001.31     890.36
C2   2427.10    2490.42    2272.66     800.18     882.72     750.73
C3   2450.70    2272.66    2505.71     871.08     975.32     914.32
C4    929.59     800.18     871.08     632.44     657.39     689.55
C5   1001.31     882.72     975.32     657.39     721.14     740.91
C6    890.36     750.73     914.32     689.55     740.91     845.64

Mean of diagonal elements of S is 1638.76
Mean of all the elements in S is 1231.68
Sum of squares of all the elements in S is 72484646
The row means of S are

C1         C2         C3         C4        C5        C6
1722.72    1603.97    1664.96    763.37    829.80    805.25

Sum of the squared row means of S is 10232297

Greenhouse and Geisser Estimate (of Correction Factor)

$$\hat{\epsilon} = \frac{36 \times (1638.76 - 1231.68)^2}{5 \times (72484646 - 12 \times 10232297 + 36 \times 1231.68 \times 1231.68)} = 0.277.$$

Huynh-Feldt Estimate

$a = 12 \times 5 \times 2 \times \hat{\epsilon} - 2$,
$b = 5 \times (2 \times 11 - 5 \times \hat{\epsilon})$,
min(1, a/b) = 0.303.

approaches when compound symmetry holds and concludes that the latter is nearly as powerful as the former when the number of observations exceeds the number of repeated measures by more than 20.
To illustrate the use of the MANOVA approach to the analysis of repeated measures data, we shall apply it to the field-dependence data, again assuming that the six repeated measures for each subject arise from measuring the response variable time under six conditions given in a random order. For the moment, forget the division of the data into two groups, field independent and field dependent, and assume that the null hypothesis of interest is whether the population means of the response variable differ for the six conditions. The multivariate approach to testing this hypothesis uses a version of Hotelling's $T^2$ statistic as described in Display 5.4. Numerical results for the Stroop data are given in Table 5.8. There is a highly significant effect of condition.

TABLE 5.7 ANOVA With and Without Correction Factors for Stroop Data

Analysis of Variance Without Correction Factors

Source           SS           DF     MS          F        p
Group (G)        17578.34       1    17578.34     2.38    .14
Error           162581.49      22     7390.07
Condition (C)    28842.19       5     5768.42    11.81    <.00001
G x C             4736.28       5      947.26     1.94    .094
Error            53735.10     110      488.50

Note. Repeated measures are assumed to arise from a single within subject factor with six levels. For these data the Greenhouse and Geisser estimate of the correction factor $\epsilon$ is 0.2768. The adjusted degrees of freedom and the revised p values for the within subject tests are as follows. Condition: df1 = 0.2768 x 5 = 1.38; df2 = 0.2768 x 110 = 30.45; p = .00063; Group x Condition: df1 = 1.38; df2 = 30.45; p = .171. The Huynh-Feldt estimate of $\epsilon$ is 0.3029 so that the following is true. Condition: df1 = 0.3029 x 5 = 1.51; df2 = 0.3029 x 110 = 33.32; p = .00041; Group x Condition: df1 = 1.51; df2 = 33.32; p = .168.

Display 5.4
Multivariate Test for Equality of Means of Levels of a Within Subject Factor A with a Levels

The null hypothesis is that in the population, the means of the a levels of A are equal, that is,

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_a,$$

where $\mu_1, \mu_2, \ldots, \mu_a$ are the population means of the levels of A.
This is equivalent to

$$H_0: \mu_1 - \mu_2 = 0;\ \mu_2 - \mu_3 = 0;\ \ldots;\ \mu_{a-1} - \mu_a = 0$$

(see Exercise 5.5). Assessing whether these differences are simultaneously equal to zero gives the required multivariate test of $H_0$.
The appropriate test statistic is Hotelling's $T^2$ applied to the corresponding sample mean differences; that is, $\bar{d}' = [\bar{x}_1 - \bar{x}_2, \bar{x}_2 - \bar{x}_3, \ldots, \bar{x}_{a-1} - \bar{x}_a]$.
$T^2$ is given by

$$T^2 = n\,\bar{d}' S_d^{-1} \bar{d},$$

where n is the sample size and $S_d$ is the covariance matrix of the differences between the repeated measurements.
Under $H_0$, $F = \{(n - p + 1)/[(n - 1)(p - 1)]\}T^2$ has an F distribution with $p - 1$ and $n - p + 1$ degrees of freedom, where p is the number of repeated measures.
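The calculation in Display 5.4 can be sketched as follows. The helper names (`solve`, `hotelling_t2`, `t2_to_f`) are hypothetical, and a small Gaussian elimination routine is used in place of a matrix library so that the example is self-contained; in practice a linear algebra package would normally be preferred.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [list(map(float, matrix[i])) + [float(rhs[i])] for i in range(size)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (aug[r][size] - tail) / aug[r][r]
    return x


def hotelling_t2(mean_diffs, cov_diffs, n):
    """T^2 = n * d' S_d^{-1} d for a vector of mean differences d."""
    weights = solve(cov_diffs, mean_diffs)  # S_d^{-1} d without an explicit inverse
    return n * sum(d * w for d, w in zip(mean_diffs, weights))


def t2_to_f(t2, n, p):
    """Convert T^2 to F with p - 1 and n - p + 1 degrees of freedom."""
    f_value = (n - p + 1) / ((n - 1) * (p - 1)) * t2
    return f_value, p - 1, n - p + 1


# Hypothetical two-difference example with uncorrelated differences.
t2_small = hotelling_t2([2.0, 3.0], [[4.0, 0.0], [0.0, 9.0]], n=10)
print(t2_small)

# Conversion for the values reported for the Stroop data
# (T^2 = 115.08, n = 24 subjects, p = 6 repeated measures).
f_value, df1, df2 = t2_to_f(115.08, n=24, p=6)
print(round(f_value, 2), df1, df2)
```

With the rounded value of $T^2$ the conversion gives an F of about 19.0 on 5 and 19 degrees of freedom, in line with the value reported in Table 5.8.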

TABLE 5.8 Multivariate Test of Equality of Condition Means for Stroop Data

Differences Between Conditions

Subject    C1-C2    C2-C3    C3-C4    C4-C5    C5-C6

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

-15 8

1 16 -8 8 6

-1 7 -1 -1 0

10

19 0
4 18 -53 -16 4 4 28 -12 -22

4 -22 16 4 -19 1 -12 -16 -5 8 -55 -55 -15 25 -22 31 7 -5 -40 -21 -1

-13 3

43 38 23 38 -11 -11 20 32 28 40 33 28 117 110 25 165 47 40 2 16 31 12 67 48

-6 -8 -8 -4 -3 -3 7 -6 4 -1 6 -5 -26 -22 0 9 6 -12

14
-5 -4 -6 -5 2 -5 2 -18 -10 -6 -18 4 -4 -24 -4 -16 7 -17 -9 3 2 -22 -22 -12

-19
7 -18 3 -1 1 -12

-3

Note. The vector of means of these differences is $\bar{d}' = [-1.42, -9.33, 40.88, -8.00, -10.92]$.
The covariance matrix of the differences is

$$S_d = \begin{pmatrix} 261.91 & -228.88 & 36.90 & 12.83 & -20.53 \\ -228.88 & 442.23 & -208.65 & 34.61 & 65.64 \\ 36.90 & -208.65 & 1595.51 & -143.13 & 100.10 \\ 12.83 & 34.61 & -143.13 & 54.52 & -14.70 \\ -20.53 & 65.64 & 100.10 & -14.70 & 81.73 \end{pmatrix}.$$

$T^2 = 115.08$, $F = 19.02$, and $p < .001$.


Display 5.5
Multivariate Test for A x B Interaction in a Repeated Measures Design with Within Subject Factor A (at a Levels) and Between Subjects Factor B (at b Levels)

Here the null hypothesis can be expressed as follows:

$$H_0: \mu_1^{(1)} - \mu_2^{(1)} = \mu_1^{(2)} - \mu_2^{(2)};\ \mu_2^{(1)} - \mu_3^{(1)} = \mu_2^{(2)} - \mu_3^{(2)};\ \ldots;\ \mu_{a-1}^{(1)} - \mu_a^{(1)} = \mu_{a-1}^{(2)} - \mu_a^{(2)},$$

where $\mu_i^{(j)}$ represents the population mean of level i of A and level j of B. (The restriction to two levels for B is used simply for convenience of description.)
Testing these equalities simultaneously gives the multivariate test for the A x B interaction.
The relevant test is Hotelling's $T^2$ as described in Chapter 3, with the $\bar{x}_1$ and $\bar{x}_2$ terms being replaced by the vectors of group mean differences, $\bar{d}_1' = [\bar{x}_1^{(1)} - \bar{x}_2^{(1)}, \bar{x}_2^{(1)} - \bar{x}_3^{(1)}, \ldots, \bar{x}_{a-1}^{(1)} - \bar{x}_a^{(1)}]$, and $\bar{d}_2$ defined similarly for the second group. Now $S_1$ and $S_2$ are the covariance matrices of the differences between the repeated measurements in each group.

Now what about the multivariate equivalent of the group x condition test? Again this involves Hotelling's $T^2$ test as introduced in Chapter 3 (see Display 3.8). Details are given in Display 5.5, and the numerical results for the Stroop data are given in Table 5.9. There is no evidence of a significant group x condition interaction.
Keselman, Keselman, and Lix (1995) give some detailed comparisons of univariate and multivariate approaches to the analysis of repeated measure designs, and, for balanced designs (between subjects factors), recommend the degrees of freedom adjusted univariate F test, as they suggest that the multivariate approach is affected more by departures from normality.

5.6. MEASURING RELIABILITY FOR QUANTITATIVE VARIABLES: THE INTRACLASS CORRELATION COEFFICIENT
One of the most important issues in designing experiments is the question of the reliability of the measurements. The concept of the reliability of a quantitative variable can best be introduced by use of statistical models that relate the unknown true values of a variable to the corresponding observed values. This is

TABLE 5.9 Multivariate Test for Groups x Condition Interaction for Stroop Data

The covariance matrices of the condition differences for each group are

$$S_1 = \begin{pmatrix} 68.52 & -26.89 & -22.56 & 11.44 & 0.02 \\ -26.89 & 123.36 & -138.08 & -17.54 & -26.23 \\ -22.56 & -138.08 & 330.81 & -17.73 & 101.14 \\ 11.44 & -17.54 & -17.73 & 16.08 & -17.67 \\ 0.02 & -26.23 & 101.14 & -17.67 & 66.27 \end{pmatrix},$$

$$S_2 = \begin{pmatrix} 478.36 & -455.82 & 119.82 & 10.18 & -42.09 \\ -455.82 & 778.27 & -186.21 & 60.95 & 168.21 \\ 119.82 & -186.21 & 2461.15 & -140.85 & 85.18 \\ 10.18 & 60.95 & -140.85 & 61.54 & -7.11 \\ -42.09 & 168.21 & 85.18 & -7.11 & 103.66 \end{pmatrix}.$$

The two vectors of the means of the differences are

$\bar{d}_1' = [-0.83, -6.08, 25.08, -3.92, -11.58]$,
$\bar{d}_2' = [-2.00, -12.58, 56.67, -12.08, -10.25]$.

$T^2 = 11.83$, $F = 1.52$, $p = .23$.

a convenient point to introduce such models, because applying them generally involves a particular type of repeated measures design.
Examples of the models used for assessing reliability are given in Display 5.6; they all lead to the intraclass correlation coefficient as the way of indexing reliability. To illustrate the calculation of this coefficient, the ratings given by a number of judges to the competitors in a synchronized swimming competition will be used; these ratings are given in Table 5.10. (Of course, the author realizes that synchronized swimming may only be a minority interest among psychologists.) Before undertaking the necessary calculations, it might be useful to examine the data graphically in some way. Consequently, Figure 5.3 gives the box plots of the scores given by each judge, and Figure 5.4 shows the draughtsman's plot of the ratings made by each pair of judges. The box plots show that the scores given by the first judge vary considerably more than those given by the other four judges. The scatterplots show that although there is, in general, a pattern of relatively strong relationships between the scores given by each pair of the five judges, this is not universally so; for example, the relationship between judges 4 and 5 is far less satisfactory.
The relevant ANOVA table for the synchronized swimming data is shown in Table 5.11, together with the details of calculating the intraclass correlation coefficient. The resulting value of 0.683 is not particularly impressive, and the

Display 5.6 Models for the Reliability of Quantitative Variables

Let x represent the observed value of some variable of interest for a particular individual. If the observation was made a second time, say some days later, it would almost certainly differ to some degree from the first recording. A possible model for x is

$$x = t + \epsilon,$$

where t is the underlying true value of the variable for the individual and $\epsilon$ is the measurement error. Assume that t has a distribution with mean $\mu$ and variance $\sigma_t^2$. In addition, assume that $\epsilon$ has a distribution with mean zero and variance $\sigma_\epsilon^2$, and that t and $\epsilon$ are independent of each other; that is, the size of the measurement error does not depend on the size of the true value.

A consequence of this model is that variability in the observed scores is a combination of true score variance and error variance. The reliability, R, of the measurements is defined as the ratio of the true score variance to the observed score variance,

$$R = \frac{\sigma_t^2}{\sigma_t^2 + \sigma_\epsilon^2},$$

which can be rewritten as

$$R = \frac{1}{1 + \sigma_\epsilon^2/\sigma_t^2}.$$

R is usually known as the intraclass correlation coefficient. As $\sigma_\epsilon^2/\sigma_t^2$ decreases, so that the error variance forms a decreasing part of the variability in the observations, R increases and its upper limit of unity is achieved when the error variance is zero. In the reverse case, where $\sigma_\epsilon^2$ forms an increasing proportion of the observed variance, R decreases toward a lower limit of zero, which is reached when all the variability in the measurements results from the error component of the model.

The intraclass correlation coefficient can be directly interpreted as the proportion of variance of an observation that is due to between subject variability in the true scores.

When each of r observers rates a quantitative characteristic of interest on each of n subjects, the appropriate model becomes

$$x = t + o + \epsilon,$$

where o represents the observer effect, which is assumed to be distributed with zero mean and variance $\sigma_o^2$. The three terms t, o, and $\epsilon$ are assumed to be independent of one another so that the variance of an observation is now

$$\sigma^2 = \sigma_t^2 + \sigma_o^2 + \sigma_\epsilon^2.$$

The intraclass correlation coefficient for this situation is given by

$$R = \frac{\sigma_t^2}{\sigma_t^2 + \sigma_o^2 + \sigma_\epsilon^2}.$$

An analysis of variance of the raters' scores for each subject leads to the following ANOVA table.

Source      DF                MS
Subjects    n - 1             SMS
Raters      r - 1             RMS
Error       (n - 1)(r - 1)    EMS

It can be shown that the population or expected values of the three mean squares are

$$\text{SMS}: \sigma_\epsilon^2 + r\sigma_t^2, \qquad \text{RMS}: \sigma_\epsilon^2 + n\sigma_o^2, \qquad \text{EMS}: \sigma_\epsilon^2.$$

By equating the observed values of the three mean squares to their expected values, the following estimators for the three variance terms $\sigma_t^2$, $\sigma_o^2$, and $\sigma_\epsilon^2$ are found:

$$\hat{\sigma}_t^2 = \frac{\text{SMS} - \text{EMS}}{r}, \qquad \hat{\sigma}_o^2 = \frac{\text{RMS} - \text{EMS}}{n}, \qquad \hat{\sigma}_\epsilon^2 = \text{EMS}.$$

An estimator of R is then simply

$$\hat{R} = \frac{\hat{\sigma}_t^2}{\hat{\sigma}_t^2 + \hat{\sigma}_o^2 + \hat{\sigma}_\epsilon^2}.$$

synchronized swimmers involved in the competition might have some cause for concern over whether their performances were being judged fairly and consistently.

In the case of two raters giving scores on a variable to the same n subjects, the intraclass correlation coefficient is equivalent to Pearson's product moment correlation coefficient between 2n pairs of observations, of which the first n are the original values and the second n are the original values in reverse order. When only two raters are involved, the value of the intraclass correlation depends in part on the corresponding product moment correlation coefficient and in part on the differences between the means and standard deviations of the two sets of

TABLE 5.10 Judges' Scores for 40 Competitors in a Synchronized Swimming Competition

Columns: Swimmer (1 to 40), Judge 1, Judge 2, Judge 3, Judge 4, Judge 5

33.1 26.2 31.2 27.0 28.4 28.1 27.0 25.1 31.2 30.1 29.0 27.0 31.2 32.3 29.5 29.2 32.3 27.3 26.4 27.3 27.3 29.5 28.4 31.2 30.1 31.2 26.2 27.3 29.2 29.5 28.1 31.2 28.1 24.0 27.0 27.5 27.3 31.2 27.0 31.2

32.0 29.2 30.1 21.9 25.3 28.1 28.1 27.3 29.2 30.1 28.1 27.0 33.1 31.2 28.4 29.2 31.2 30.1 27.3 26.7 28.1 28.1 29.5 29.5 31.2 31.2 28.1 27.3 26.4 27.3 27.3 31.2 27.3 28.1 29.0 27.5 29.5 30.1 27.5 29.5

31.2 31.4 28.4 25.3 30.1 29.2 27.3 28.1 25.6 26.2 28.1 28.4 28.1 27.0 26.2 27.3 31.2 30.1 28.1 30.1 29.2 27.0 27.3 25.3 31.2 29.2 32.3 30.3 28.4 29.2 28.1 29.2 31.2 29.2 29.2 27.3 26.4 26.4 26.4 28.4 26.4 27.3 26.4 28.4 27.5 29.2 27.3 28.1 29.2 31.2 30.3 26.2 26.2 27.0 28.1 27.3 27.3 29.2 28.1 29.2 29.2 31.2 28.4 27.3 28.4 26.4 25.3 27.3 28.1 24.5 25.3 26.2 28.1 27.3 29.2 27.3 27.3 30.1 28.4

31.2 27.3 31.2 24.7 26.7 32.0 28.1 27.5 32.0 28.6 29.0 26.4 30.3 31.2 30.3 30.9 29.5 29.2 28.1 26.4 27.5 28.4 28.6 31.2 31.2 31.2 25.9 28.1 27.3 28.4 28.1 31.2 28.4 25.1 26.4 25.6 27.5 30.1 27.0 28.4

31.2

FIG. 5.3. Boxplots of scores given to synchronized swimming competitors by five judges.

ratings. The relationship between the intraclass correlation coefficient and Pearson's coefficient is given explicitly in Display 5.7. Sample sizes required in reliability studies concerned with estimating the intraclass correlation coefficient are discussed by Donner and Eliasziw (1987), and the same authors (Eliasziw and Donner, 1987) also consider the question of the optimal choice of r and n that minimizes the overall cost of a reliability study. Their conclusion is that an increase in r for fixed n provides more information than an increase in n for fixed r.

FIG. 5.4. Matrix of scatterplots for scores given by each pair of judges in rating synchronized swimmers.

TABLE 5.11 ANOVA and Calculation of Intraclass Correlation Coefficient for Synchronized Swimming Judgments

ANOVA Table

Source      SS        DF     MS
Swimmers    521.27     39    13.37
Judges       19.08      4     4.77
Error       163.96    156     1.05

Note. The required estimates of the three variance terms in the model are

$$\hat{\sigma}_t^2 = \frac{13.37 - 1.05}{5} = 2.46, \qquad \hat{\sigma}_o^2 = \frac{4.77 - 1.05}{40} = 0.093, \qquad \hat{\sigma}_\epsilon^2 = 1.05.$$

Consequently, the estimate of the intraclass correlation coefficient is

$$\hat{R} = \frac{2.46}{2.46 + 0.093 + 1.05} = 0.683.$$
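The estimators of Display 5.6 are simple enough to check by machine. The following is a minimal sketch in Python, assuming only the three mean squares of Table 5.11 and the design sizes (40 swimmers, 5 judges); the function name `icc_from_mean_squares` and its argument names are illustrative choices, not part of any statistical package.

```python
def icc_from_mean_squares(sms, rms, ems, n_subjects, n_raters):
    """Estimate the variance components and the intraclass correlation.

    sms, rms, ems are the subjects, raters, and error mean squares
    from the two-way ANOVA table described in Display 5.6.
    """
    var_true = (sms - ems) / n_raters      # between-subject (true score) variance
    var_rater = (rms - ems) / n_subjects   # between-rater (observer) variance
    var_error = ems                        # measurement error variance
    icc = var_true / (var_true + var_rater + var_error)
    return var_true, var_rater, var_error, icc


# Mean squares from Table 5.11: 40 swimmers, each rated by 5 judges.
var_true, var_rater, var_error, icc = icc_from_mean_squares(
    sms=13.37, rms=4.77, ems=1.05, n_subjects=40, n_raters=5
)
print(f"true score variance:    {var_true:.3f}")
print(f"rater variance:         {var_rater:.3f}")
print(f"error variance:         {var_error:.3f}")
print(f"intraclass correlation: {icc:.3f}")
```

With the rounded mean squares of the table, the estimate comes out at about 0.683, the value quoted in the text.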

Display 5.7 Relationship Between the Intraclass Correlation Coefficient, R, and the Product Moment Correlation Coefficient, $R_M$, in the Case of Two Raters

where $\bar{x}_1$ and $\bar{x}_2$ are the mean values of the scores of the two raters, and $s_1^2$ and $s_2^2$ are the corresponding variances.

5.7. SUMMARY

1. Repeated measures data arise frequently in many areas of psychological research and require special care in their analysis.
2. Univariate analysis of variance of repeated measures designs depends on the assumption of sphericity. This assumption is unlikely to be valid in most situations.
3. When sphericity does not hold, either a correction factor approach or a MANOVA approach can be used (other possibilities are described by Goldstein, 1995).
4. Recently developed methods for the analysis of repeated measure designs having a nonnormal response variable are described in Davis (1991).

COMPUTER HINTS

SPSS
In SPSS the basic steps to conduct a repeated measures analysis are as follows.

1. Click Statistics, click General Linear Model, and then click GLM-Repeated Measures to get the GLM-Repeated Measures Define Variable(s) dialog box.
2. Specify the names of the within subject factors, and the number of levels of these factors. Click Add.
3. Click Define and move the names of the repeated measure variables to the Within Subjects Variables box.

Both univariate tests, with and without correction factors, and multivariate tests can be obtained. A test of sphericity is also given.

TABLE 5.12 Skin Resistance Data

Subject

1 2 3 140 4 300 5 6 7
8

500 660 250 220 72 135


105

m 200
600 370
84 50

98 600 54 240
450

75
250

250 310 220

9 90 10
11 12

200 75 15 160
250

58 180 180 290 45

180 135 82 32 220 320 220 300


50

33 430 190
73
34

70 78 32
64

280

135
80

13 14 15 16

m
310 20 lo00 48

88 300
50

92 220 51

150 170 66 26 107

230 1050

280
45

TABLE 5.13 Data from Quitting Smoking Experiment

Columns: Gender, Treatment, Before (Home, Work), After (Home, Work)
4 2 4 5

Male

Taper

5 8
8

6
4

6
5

l
8 5

l
6 5

Intermediate

6
8 5

l
5

l
5

3 6 4

l 8 l 5
Aversion

6 l
6 6 5 8 4

6
6 5 6

8 9 4

5 5

l
8 Taper Female Intermediate
9

l 5

6 4 8

5 6 5 8

5 6 5 6

l
5

6
5 9
3

5 5 6 4 6 6 8 6 8 6 8
6 8 4

5 5 4 6 8 4 2

3 0 3

2 3 5 5 5
4

6
5 8
4 8

Aversion

l
4 9 5 6 4 5 5

5 8

6 5

4 9

3 3

5 8 5

6
4

3 3

6
5 4

5 6 4

6
8 4

4 8 5 6 3

l
5

S-PLUS
In S-PLUS, the main function for analyzing repeated measures is lme, but because this is perhaps more applicable to longitudinal studies than to the examples covered in this chapter, we shall not say more about it until Chapter 7.

EXERCISES

5.1. Apply both the correction factor approach and the MANOVA approach to the Stroop data, now keeping both within subject factors. How do the results compare with the simple univariate ANOVA results given in the text? Do you think any observations should be removed prior to the analysis of these data?

5.2. Reanalyze the visual acuity data by using orthogonal polynomial contrasts for lens strength.

5.3. Five different types of electrodes were applied to the arms of 16 subjects and the resistance was measured in kilohms. The results are given in Table 5.12. The experiment was carried out to see whether all electrode types performed similarly. Calculate the intraclass correlation coefficient of the electrodes. Are there any observations that you think may be having an undue influence on the results?

5.4. The data in Table 5.13 are taken from an investigation of cigarette smoking. Three different procedures for quitting smoking (tapering off, immediate stopping, and aversion therapy) were compared. Subjects were randomly allocated to each treatment and were asked to rate (on a 1- to 10-point scale) their desire to smoke right now in two different environments (home versus work) both before and after quitting. Carry out both univariate and multivariate analyses of the data, noting that the groups formed by the two between subjects factors, gender and treatment group, have different numbers of subjects.

5.5. Show the equivalence of the two forms of the null hypothesis given in Display 5.5.

6 Simple Linear Regression and Multiple Regression Analysis

6.1. INTRODUCTION

In Table 6.1 a small set of data appears giving the average vocabulary size of children at various ages. Is it possible (or indeed sensible) to try to use these data to construct a model for predicting the vocabulary size of children older than 6, and how should we go about it? Such questions serve to introduce one of the most widely used of statistical techniques, regression analysis. (It has to be admitted that the method is also often misused.) In very general terms, regression analysis involves the development and use of statistical techniques designed to reflect the way in which variation in an observed random variable changes with changing circumstances. More specifically, the aim of a regression analysis is to derive an equation relating a dependent and an explanatory variable, or, more commonly, several explanatory variables. The derived equation may sometimes be used solely for prediction, but more often its primary purpose is as a way of establishing the relative importance of the explanatory variable(s) in determining the response variable, that is, in establishing a useful model to describe the data. (Incidentally, the term regression was first introduced by Galton in the 19th Century to characterize a tendency toward mediocrity, that is, more average, observed in the offspring of parents.)

TABLE 6.1 The Average Oral Vocabulary Size of Children at Various Ages

Age (Years)    Number of Words
1.0                3
1.5               22
2.0              272
2.5              446
3.0              896
3.5             1222
4.0             1540
4.5             1870
5.0             2072
6.0             2562

In this chapter we shall concern ourselves with regression models for a response variable that is continuous; in Chapter 10 we shall consider suitable regression models for categorical response variables. No doubt most readers will have covered simple linear regression for a response variable and a single explanatory variable in their introductory statistics course. Nevertheless, at the risk of enduring a little boredom, it may be worthwhile to read the next section both as an aide memoir and as an initial step in dealing with the more complex procedures needed when several explanatory variables are considered, a situation to be discussed in Section 6.3.

6.2. SIMPLE LINEAR REGRESSION


The essential components of the simple linear regression model involving a single explanatory variable are shown in Display 6.1. (Those readers who require a more detailed account should consult Daly, Hand, Jones, Lunn, and McConway, 1995.) Fitting the model to the vocabulary data gives the results shown in Table 6.2. A plot of the fitted line, its 95% confidence interval, and the original data are given in Figure 6.1. The confidence interval for the regression coefficient of age, (513.35, 610.51), indicates that there is a very strong relationship between vocabulary size and age. Because the estimated regression coefficient is positive, the relationship is such that as age increases so does vocabulary size; all rather obvious even from a simple plot of the data! The estimated regression coefficient implies that the increase in average vocabulary size corresponding to an increase in age of a year is approximately 562 words.

Display 6.1 Simple Linear Regression Model

The basic idea of simple linear regression is that the mean values of a response variable y lie on a straight line when plotted against values of an explanatory variable, x. The variance of y for a given value of x is assumed to be independent of x. More specifically, the model can be written as

$$y_i = \alpha + \beta x_i + \epsilon_i,$$

where $y_1, y_2, \ldots, y_n$ are the n observed values of the response variable, and $x_1, x_2, \ldots, x_n$ are the corresponding values of the explanatory variable.

The $\epsilon_i$, $i = 1, \ldots, n$, are known as residual or error terms and measure how much an observed value, $y_i$, differs from the value predicted by the model, namely $\alpha + \beta x_i$.

The two parameters of the model, $\alpha$ and $\beta$, are the intercept and slope of the line. Estimators of $\alpha$ and $\beta$ are found by minimizing the sum of the squared deviations of observed and predicted values, a procedure known as least squares. The resulting estimators are

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}, \qquad \hat{\beta} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2},$$

where $\bar{y}$ and $\bar{x}$ are the sample means of the response and explanatory variable, respectively.

The error terms, $\epsilon_i$, are assumed to be normally distributed with mean zero and variance $\sigma^2$. An estimator of $\sigma^2$ is $s^2$ given by

$$s^2 = \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n - 2},$$

where $\hat{y}_i = \hat{\alpha} + \hat{\beta}x_i$ is the predicted value or fitted value of the response variable for an individual with explanatory variable value $x_i$.

Estimators of the standard errors of the estimated slope and estimated intercept are given by

$$s_{\hat{\beta}} = \frac{s}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}}, \qquad s_{\hat{\alpha}} = s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}.$$

Confidence intervals for, and tests of hypotheses about, the slope and intercept parameters can be constructed in the usual way from these standard error estimators.

The variability of the response variable can be partitioned into a part that is due to regression on the explanatory variable, $\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$, and a residual, $\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$. The relevant terms are usually arranged in an analysis of variance table as follows.

Source        SS      DF       MS             F
Regression    RGSS    1        RGSS/1         RGMS/RSMS
Residual      RSS     n - 2    RSS/(n - 2)

The F statistic provides a test of the hypothesis that the slope parameter, $\beta$, is zero.

The residual mean square gives the estimate of $\sigma^2$.

TABLE 6.2 Results of Fitting a Simple Linear Regression Model to Vocabulary Data

Parameter Estimates

Coefficient    Estimate    SE
Intercept      -763.86     88.25
Slope           561.93     24.29

ANOVA Table

Source        SS           DF    MS           F
Regression    7294087.0    1     7294087.0    535.19
Residual      109032.2     8     13629.0
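The least-squares formulas of Display 6.1 can be applied directly to the ten age and vocabulary-size pairs. The sketch below, in plain Python, assumes the data values recorded in Tables 6.1 and 6.4; the function name `fit_simple_regression` and the dictionary keys are illustrative and do not correspond to any particular package.

```python
import math

# Average vocabulary size by age (Tables 6.1 and 6.4).
ages = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0]
words = [3, 22, 272, 446, 896, 1222, 1540, 1870, 2072, 2562]


def fit_simple_regression(x, y):
    """Least-squares fit of y = alpha + beta * x with standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares and residual mean square (n - 2 degrees of freedom).
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)
    # Regression sum of squares and the F statistic for a zero slope.
    rgss = sum((intercept + slope * xi - y_bar) ** 2 for xi in x)
    return {
        "intercept": intercept,
        "slope": slope,
        "se_intercept": math.sqrt(s2 * (1 / n + x_bar ** 2 / sxx)),
        "se_slope": math.sqrt(s2 / sxx),
        "residual_ms": s2,
        "f": rgss / s2,
    }


fit = fit_simple_regression(ages, words)
for name, value in fit.items():
    print(f"{name}: {value:.2f}")
```

The intercept, slope, and standard errors printed by this sketch should agree with the parameter estimates in Table 6.2 to the two decimal places shown there.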

It is possible to use the derived linear regression equation relating average vocabulary size to age, to predict vocabulary size at different ages. For example, for age 5.5 the prediction would be

$$\text{average vocabulary size} = -763.86 + 561.93 \times 5.5 = 2326.7. \tag{6.1}$$

As ever, an estimate of this kind is of little use without some measure of its variability; that is, a confidence interval for the prediction is needed. Some relevant formulas are given in Display 6.2, and in Table 6.3, predicted vocabulary scores for a number of ages and their confidence intervals (CIs) are given. Note that the confidence intervals become wider, that is, the prediction becomes less certain, as the age at which a prediction is made departs further from the mean of the observed ages. Thus the derived regression equation does allow predictions to be made, but it is of considerable importance to reflect a little on whether such predictions are really sensible in this situation. A little thought shows that they are not. Using the fitted equation to predict future vocabulary scores is based on the assumption that the observed upward trend in vocabulary scores with age will continue unabated

FIG. 6.1. Plot of vocabulary data, showing fitted regression and 95% confidence interval (dotted line).

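Display 6.2 Using the Simple Linear Regression Model for Prediction

The predicted response value corresponding to a value of the explanatory variable of, say, $x_0$ is

$$\hat{y}_{\text{predicted}} = \hat{\alpha} + \hat{\beta}x_0.$$

An estimator of the variance of a predicted value is provided by

$$s^2_{\text{predicted}} = s^2\left[\frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2} + \frac{1}{n} + 1\right].$$

(Note that the variance of the prediction increases as $x_0$ gets further away from $\bar{x}$.)

A confidence interval for a prediction can now be constructed in the usual way:

$$\hat{y}_{\text{predicted}} \pm t \times s_{\text{predicted}}.$$

Here t is the value of t with n - 2 degrees of freedom for the required $100(1 - \alpha)\%$ confidence interval.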

TABLE 6.3 Predictions and Their Confidence Intervals for the Vocabulary Data

Age     Predicted Vocabulary Size    SE of Prediction    95% CI
5.5       2326.7                     133.58              (2018, 2635)
7.0       3169.6                     151.88              (2819, 3520)
10.0      4855.4                     203.66              (4385, 5326)
20.0     10474.7                     423.72              (9496, 11453)
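The entries of Table 6.3 follow from the prediction formulas of Display 6.2. The sketch below is one way of reproducing them in plain Python. It assumes the vocabulary data of Table 6.1, and it enters the 97.5% point of the t distribution with 8 degrees of freedom (2.306) by hand rather than computing it; the name `predict_with_interval` is illustrative only.

```python
import math

# Average vocabulary size by age (Table 6.1).
ages = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0]
words = [3, 22, 272, 446, 896, 1222, 1540, 1870, 2072, 2562]

# Assumption: tabulated 97.5% point of t with n - 2 = 8 degrees of freedom.
T_VALUE = 2.306


def predict_with_interval(x, y, x0, t_value):
    """Return (prediction, standard error, lower limit, upper limit) at x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)
    prediction = intercept + slope * x0
    # Variance of a predicted value as given in Display 6.2.
    se = math.sqrt(s2 * ((x0 - x_bar) ** 2 / sxx + 1 / n + 1))
    return prediction, se, prediction - t_value * se, prediction + t_value * se


predictions = {age: predict_with_interval(ages, words, age, T_VALUE)
               for age in (5.5, 7.0, 10.0, 20.0)}
for age, (pred, se, lower, upper) in predictions.items():
    print(f"age {age:4.1f}: {pred:8.1f}  SE {se:6.2f}  CI ({lower:.0f}, {upper:.0f})")
```

The widening of the intervals at ages 10 and 20 comes entirely from the $(x_0 - \bar{x})^2$ term, since the observed ages have a mean of only 3.3 years.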

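Display 6.3 Simple Linear Regression Model with Zero Intercept

The model is now

$$y_i = \beta x_i + \epsilon_i.$$

Application of least squares to this model gives the following estimator for $\beta$:

$$\hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}.$$

An estimator of the variance of $\hat{\beta}$ is given by

$$s^2_{\hat{\beta}} = \frac{s^2}{\sum_{i=1}^{n} x_i^2},$$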

where $s^2$ is the residual mean square from the relevant analysis of variance table.

into future ages at the same rate as observed between ages 1 and 6, namely between approximately 513 and 610 words per year. This assumption is clearly false, because the rate of vocabulary acquisition will gradually decrease. Extrapolation outside the range of the observed values of the explanatory variable is known in general to be a risky business, and this particular example is no exception. Now you might remember that in Chapter 1 it was remarked that if you do not believe in a model you should not perform operations and analyses that assume it is true. Bearing this warning in mind, is the simple linear model fitted to the vocabulary data really believable? A little thought shows that it is not. The estimated vocabulary size at age zero, that is, the estimated intercept on the y axis, is -763.86, with an approximate 95% confidence interval of (-940, -587). This is clearly a silly interval for a value that is known a priori to be zero. It reflects the inappropriate nature of the simple linear regression model for these data. An apparently more suitable model would be one in which the intercept is constrained to be zero. Estimation for such a model is described in Display 6.3. For the vocabulary data, the

FIG. 6.2. Plot of vocabulary data, showing the fitted zero intercept line.

estimated value of the regression coefficient is 370.96, with an estimated standard error of 30.84. The fitted zero intercept line and the original data are plotted in Figure 6.2. It is apparent from this plot that our supposedly more appropriate model does not fit the data as well as the one rejected as being unsuitable on logical grounds. So what has gone wrong? The answer is that the relationship between age and vocabulary size is more complex than is allowed for by either of the two regression models considered above. One way of investigating the possible failings of a model was introduced in Chapter 4, namely the examination of residuals, that is, the difference between an observed value, $y_i$, and the value predicted by the fitted model, $\hat{y}_i$,

$$r_i = y_i - \hat{y}_i. \tag{6.2}$$

In a regression analysis there are various ways of plotting these values that can be helpful in assessing particular components of the regression model. The most useful plots are as follows.

1. A histogram or stem-and-leaf plot of the residuals can be useful in checking for symmetry and specifically for normality of the error terms in the regression model.
2. Plotting the residuals against the corresponding values of the explanatory variable. Any sign of curvature in the plot might suggest that, say, a quadratic term in the explanatory variable should be included in the model.

FIG. 6.3. Idealized residual plots.

3. Plotting the residuals against the fitted values of the response variable (not the response values themselves, for reasons spelled out in Rawlings, 1988). If the variability of the residuals appears to increase with the size of the fitted values, a transformation of the response variable prior to fitting is indicated.

Figure 6.3 shows some idealized residual plots that indicate particular points about models.

1. Figure 6.3(a) is what is looked for to confirm that the fitted model is appropriate.
2. Figure 6.3(b) suggests that the assumption of constant variance is not justified so that some transformation of the response variable before fitting might be sensible.

TABLE 6.4 Residuals from Fitting a Simple Linear Regression to the Vocabulary Data

Age    Vocabulary Size    Predicted Value    Residual
1.0          3              -201.93           204.93
1.5         22                79.03           -57.03
2.0        272               360.00           -87.99
2.5        446               640.96          -194.96
3.0        896               921.93           -25.92
3.5       1222              1202.89            19.11
4.0       1540              1483.86            56.15
4.5       1870              1764.82           105.19
5.0       2072              2045.79            26.22
6.0       2562              2607.72           -45.70

3. Figure 6.3(c) implies that the model requires a quadratic term in the explanatory variable.

(In practice, of course, the residual plots obtained might be somewhat more difficult to interpret than these idealized plots.)

Table 6.4 shows the numerical values of the residuals for the vocabulary data and Figure 6.4 shows the residuals plotted against age. Here there are very few observations on which to make convincing claims, but the pattern of residuals does seem to suggest that a more suitable model might be found for these data; see Exercise 6.1. (The raw residuals defined and used above suffer from certain problems that make them less helpful in investigating fitted models than they might be. The problem will be taken up in detail in Section 6.5, where we also discuss a number of other regression diagnostics.)
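The residuals of Table 6.4 and the zero-intercept estimate of Display 6.3 can both be checked with a few lines of code. The following sketch in plain Python assumes the vocabulary data of Table 6.1. It also assumes that the residual mean square of the zero-intercept model uses n - 1 degrees of freedom (only one parameter is estimated), a choice that reproduces the standard error of 30.84 quoted in the text. All variable names are illustrative.

```python
import math

# Average vocabulary size by age (Table 6.1).
ages = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0]
words = [3, 22, 272, 446, 896, 1222, 1540, 1870, 2072, 2562]
n = len(ages)

# Ordinary least-squares line with an intercept (Display 6.1).
x_bar = sum(ages) / n
y_bar = sum(words) / n
sxx = sum((x - x_bar) ** 2 for x in ages)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, words))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Raw residuals r_i = y_i - yhat_i, as in Equation (6.2).
residuals = [y - (intercept + slope * x) for x, y in zip(ages, words)]

# Zero-intercept model (Display 6.3): slope = sum(x*y) / sum(x^2).
sum_x2 = sum(x * x for x in ages)
slope_zero = sum(x * y for x, y in zip(ages, words)) / sum_x2
rss_zero = sum((y - slope_zero * x) ** 2 for x, y in zip(ages, words))
s2_zero = rss_zero / (n - 1)  # assumption: n - 1 residual degrees of freedom
se_slope_zero = math.sqrt(s2_zero / sum_x2)

for age, r in zip(ages, residuals):
    print(f"age {age:3.1f}: residual {r:8.2f}")
print(f"zero-intercept slope: {slope_zero:.2f} (SE {se_slope_zero:.2f})")
```

The printed residuals start and end with positive or near-zero values after a run of negative ones, which is the kind of pattern that points toward a curved rather than a straight-line relationship.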

6.3. MULTIPLE LINEAR REGRESSION

Multiple linear regression represents a generalization, to more than a single explanatory variable, of the simple linear regression procedure described in the previous section. It is now the relationship between a response variable and several explanatory variables that becomes of interest. Details of the model, including the estimation of its parameters by least squares and the calculation of standard errors, are given in Display 6.4. (Readers to whom a matrix is still a mystery should avoid this display at all costs.) The regression coefficients in the


Display 6.4 Multiple Regression Model

The multiple linear regression model for a response variable y with observed values $y_1, y_2, \ldots, y_n$ and p explanatory variables, $x_1, x_2, \ldots, x_p$, with observed values $x_{i1}, x_{i2}, \ldots, x_{ip}$ for $i = 1, 2, \ldots, n$, is

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \epsilon_i.$$

The regression coefficients $\beta_1, \beta_2, \ldots, \beta_p$ give the amount of change in the response variable associated with a unit change in the corresponding explanatory variable, conditional on the other explanatory variables in the model remaining unchanged.

The explanatory variables are strictly assumed to be fixed; that is, they are not random variables. In practice, where this is rarely the case, the results from a multiple regression analysis are interpreted as being conditional on the observed values of the explanatory variables.

The linear in multiple linear regression refers to the parameters rather than the explanatory variables, so the model remains linear if, for example, a quadratic term for one of these variables is included. (An example of a nonlinear model is $y = \beta_1 e^{\beta_2 x_1} + \beta_3 e^{\beta_4 x_2}$.)

The residual terms in the model, $\epsilon_i$, $i = 1, \ldots, n$, are assumed to have a normal distribution with mean zero and variance $\sigma^2$. This implies that, for given values of the explanatory variables, the response variable is normally distributed with a mean that is a linear function of the explanatory variables and a variance that is not dependent on these variables.

The aim of multiple regression is to arrive at a set of values for the regression coefficients that makes the values of the response variable predicted from the model as close as possible to the observed values.

As in the simple linear regression model, the least-squares procedure is used to estimate the parameters in the multiple regression model.

The resulting estimators are most conveniently written with the help of some matrices and vectors. The result might look complicated, particularly if you are not very familiar with matrix algebra, but you can take my word that the result looks even more horrendous when written without the use of matrices and vectors!

So by introducing a vector $\mathbf{y}' = [y_1, y_2, \ldots, y_n]$ and an $n \times (p + 1)$ matrix $\mathbf{X}$ given by

$$\mathbf{X} = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1p} \\ 1 & x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix},$$

we can write the multiple regression model for the n observations concisely as

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon},$$

where $\boldsymbol{\epsilon}' = [\epsilon_1, \epsilon_2, \ldots, \epsilon_n]$ and $\boldsymbol{\beta}' = [\beta_0, \beta_1, \ldots, \beta_p]$.

The least-squares estimators of the parameters in the multiple regression model are given by the set of equations

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}.$$

These matrix manipulations are easily performed on a computer, but you must ensure that there are no linear relationships between the explanatory variables, for example, that one variable is the sum of several others; otherwise, your regression software will complain. (See Section 6.6.)

(More details of the model in matrix form and the least-squares estimation process are given in Rawlings, 1988.)

The variation in the response variable can be partitioned into a part due to regression on the explanatory variables and a residual, as for simple linear regression. These can be arranged in an analysis of variance table as follows.

Source        SS      DF           MS                 F
Regression    RGSS    p            RGSS/p             RGMS/RSMS
Residual      RSS     n - p - 1    RSS/(n - p - 1)

The residual mean square $s^2$ is an estimator of $\sigma^2$.

The covariance matrix of the parameter estimates in the multiple regression model is estimated from

$$\mathbf{S}_{\hat{\boldsymbol{\beta}}} = s^2(\mathbf{X}'\mathbf{X})^{-1}.$$

The diagonal elements of this matrix give the variances of the estimated regression coefficients and the off-diagonal elements their covariances.

A measure of the fit of the model is provided by the multiple correlation coefficient, R, defined as the correlation between the observed values of the response variable, $y_1, \ldots, y_n$, and the values predicted by the fitted model, that is,

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_p x_{ip}.$$

The value of $R^2$ gives the proportion of variability in the response variable accounted for by the explanatory variables.

multiple regression model give the change to be expected in the response variable when the corresponding explanatory variable changes by one unit, conditional on the other explanatory variables remaining unchanged. This is often referred to as partialling out or controlling for other variables, although such terms are probably best avoided. As in the simple linear regression model described in the previous section, the variability in the response variable in multiple regression can be partitioned into a part due to regression on the explanatory variables and a residual term. The resulting F-test can be used to assess the omnibus, and in most practical situations, relatively uninteresting null hypothesis, that all the regression coefficients are zero, i.e., none of the chosen explanatory variables are predictive of the response variable.
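For readers who find the matrix form of the least-squares estimator in Display 6.4 easier to follow in code, the sketch below solves the normal equations $(\mathbf{X}'\mathbf{X})\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$ in plain Python. Everything in it is illustrative: the helper names (`solve_linear_system`, `least_squares`) are made up for this example, and the small data set is hypothetical, constructed so that y = 2 + 3x1 - x2 exactly. In practice a numerical library would be used instead of hand-written elimination.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            # A (near) zero pivot means the explanatory variables are linearly related.
            raise ValueError("X'X is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def least_squares(x_rows, y):
    """Return [b0, b1, ..., bp] solving the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(row) for row in x_rows]  # prepend the column of ones
    columns = list(zip(*design))                    # columns of X, i.e. rows of X'
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(u * yi for u, yi in zip(ci, y)) for ci in columns]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from y = 2 + 3*x1 - 1*x2 with no error term.
x_rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
y = [3, 7, 6, 11, 9]
coefficients = least_squares(x_rows, y)
print([round(c, 6) for c in coefficients])
```

The `ValueError` branch corresponds to the warning in Display 6.4: if one explanatory variable is an exact linear combination of others, the matrix $\mathbf{X}'\mathbf{X}$ cannot be inverted and no unique set of estimates exists.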

6.3.1. A Simple Example of Multiple Regression

As a gentle introduction to the topic, the multiple linear regression model will be applied to the data on ice cream sales introduced in Chapter 2 (see Table 2.4). Here the response variable is the consumption of ice cream measured over thirty 4-week periods. The explanatory variables believed to influence consumption are average price and mean temperature in the 4-week periods. A scatterplot matrix of the three variables (see Figure 6.5) suggests that temperature is of most importance in determining consumption. The results of applying the multiple regression model to these data are shown in Table 6.5. The F value of 23.27 with 2 and 27 degrees of freedom has an associated p value that is very small. Clearly, the hypothesis that both regression coefficients are zero is not tenable. For the ice cream data, the multiple correlation coefficient (defined in Display 6.4) takes the value 0.80, implying that the two explanatory variables, price and temperature, account for 64% of the variability in consumption. The negative regression coefficient for price indicates that, for a given temperature, consumption decreases with increasing price. The positive coefficient for temperature implies that, for a given price, consumption increases with increasing temperature. The sizes of the two regression coefficients might appear to imply that price is of more importance than temperature in predicting consumption. But this is an illusion produced by the different scales of the two variables. The raw regression coefficients should not be used to judge the relative importance of the explanatory variables, although the standardized values of these coefficients can, partially at least, be used in this way. The standardized values might be obtained by applying the regression model to the values of the response variable and explanatory variables, standardized (divided by) their respective standard deviations. In such an analysis, each regression coefficient represents the change in the standardized response variable, associated with a change of one standard deviation unit in the explanatory variable, again conditional on the other explanatory variables remaining constant. The standardized regression coefficients can, however, be found without undertaking this further analysis, simply by multiplying the raw regression coefficient by the standard deviation of the appropriate explanatory variable and dividing by the standard deviation of the response variable. In the ice cream example the relevant standard deviations are as follows.

Consumption: 0.06579, Price: 0.00834, Temperature: 16.422.

The standardized regression coefficients for the two explanatory variables become as shown.

FIG. 6.5. Scatterplot matrix of variables in ice cream data.

TABLE 6.5
Multiple Regression for the Ice Cream Consumption Data

The model for the data is

$$\text{consumption} = \beta_0 + \beta_1 \times \text{price} + \beta_2 \times \text{temperature}.$$

The least-squares estimates of the regression coefficients in the model are as follows.

Parameter    Estimate    SE
$\beta_1$    -1.4018     0.9251
$\beta_2$     0.0030     0.0005

The ANOVA table is

Source        SS         DF    MS         F
Regression    0.07943     2    0.03972    23.27
Residual      0.04609    27    0.00171

Price:

$$\frac{-1.4018 \times 0.00834}{0.06579} = -0.1777.$$

Temperature:

$$\frac{0.00303 \times 16.422}{0.06579} = 0.7563.$$
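The rescaling just described is simple enough to check by machine. The following is a minimal sketch, using only the raw coefficients and standard deviations quoted for the ice cream data; the function name `standardized_coefficient` is our own label, not something taken from a statistical package.

```python
def standardized_coefficient(raw_coef, sd_explanatory, sd_response):
    """Rescale a raw regression coefficient to standard-deviation units."""
    return raw_coef * sd_explanatory / sd_response

# Standard deviations quoted in the text for the ice cream data.
sd_consumption = 0.06579
sd_price = 0.00834
sd_temperature = 16.422

# Raw coefficients as used in the worked calculation in the text.
std_price = standardized_coefficient(-1.4018, sd_price, sd_consumption)
std_temperature = standardized_coefficient(0.00303, sd_temperature, sd_consumption)

print(f"price:       {std_price:.4f}")
print(f"temperature: {std_temperature:.4f}")
```

The two printed values can be compared directly, because both are now expressed per standard deviation of the corresponding explanatory variable.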

A comparison of the standardized coefficients now suggests that temperature is more important in determining consumption. One further point about the multiple regression model that can usefully be illustrated on this simple example is the effect of entering the explanatory variables into the model in a different order. This will become of more relevance in later sections, but simply examining some numerical results from the ice cream data will be helpful for now. The relevant results are shown in Table 6.6. The change produced in a previously estimated regression coefficient when an additional explanatory variable is included in the model results from the correlation between the explanatory variables. If the variables were independent (sometimes the term orthogonal is used), their estimated regression coefficients would remain unchanged with the addition of more variables to the model. (As we shall see in Section 6.7, this lack of independence of explanatory variables is also the explanation of the overlapping sums of squares found in unbalanced factorial designs; this topic is introduced in Chapter 4.)
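The claim about orthogonal explanatory variables can be illustrated numerically. The sketch below uses simulated data rather than the ice cream data; the variable names, the seed, and the true coefficients are all assumptions made purely for the demonstration. It fits the coefficient of `x1` alone and then together with a second variable that is either exactly orthogonal to `x1` or correlated with it.

```python
import numpy as np

def ols(columns, y):
    """Least-squares coefficients for an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
z = rng.normal(size=n)

# Centre x1, then centre z and remove its projection on x1,
# so that z is exactly orthogonal to both the intercept and x1.
x1 = x1 - x1.mean()
z = z - z.mean()
z = z - x1 * (x1 @ z) / (x1 @ x1)

candidates = {"orthogonal": z, "correlated": 0.8 * x1 + 0.6 * z}
results = {}
for label, x2 in candidates.items():
    y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(scale=0.5, size=n)
    alone = ols([x1], y)[1]          # coefficient of x1 fitted on its own
    together = ols([x1, x2], y)[1]   # coefficient of x1 with x2 also fitted
    results[label] = (alone, together)
    print(f"{label}: x1 alone = {alone:.3f}, x1 with x2 = {together:.3f}")
```

In the orthogonal case the two estimates agree to rounding error; in the correlated case the coefficient of `x1` shifts markedly once `x2` enters, which is the behaviour seen for price in Table 6.6.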

TABLE 6.6
Multiple Regression for the Ice Cream Data: The Effect of Order

Entering Price Followed by Temperature

(a) The first model fitted is

$$\text{consumption} = \beta_0 + \beta_1 \times \text{price}.$$

(b) The parameter estimates in this model are $\hat{\beta}_0 = 0.92$ and $\hat{\beta}_1 = -2.05$. The multiple correlation coefficient is 0.26 and the ANOVA table is

Source        SS         DF    MS         F
Regression    0.00846     1    0.00846    2.02
Residual      0.1171     28    0.00418

(c) Temperature is now added to the model; that is, the following new model is fitted:

$$\text{consumption} = \beta_0 + \beta_1 \times \text{price} + \beta_2 \times \text{temperature}.$$

(d) The parameter estimates and ANOVA table are now as in Table 6.5. Note that the estimated regression coefficient for price is now different from the value given in (b) above.

Entering Temperature Followed by Price

(a) The first model fitted is now

$$\text{consumption} = \beta_0 + \beta_1 \times \text{temperature}.$$

(b) The parameter estimates are $\hat{\beta}_0 = 0.21$ and $\hat{\beta}_1 = 0.0031$. The multiple correlation is 0.78 and the ANOVA table is

Source        SS         DF    MS          F
Regression    0.07551     1    0.07551     42.28
Residual      0.05001    28    0.001786

(c) Price is now added to the model to give the results shown in Table 6.5.

6.3.2. An Example of Multiple Linear Regression in which One of the Explanatory Variables is Categorical

The data shown in Table 6.7 are taken from a study investigating a new method of measuring body composition, and give the body fat percentage, age, and sex for 20 normal adults between 23 and 61 years old. The question of interest is how are percentage fat, age, and sex related?

TABLE 6.7
Human Fatness, Age, and Sex

Age    Percentage Fat    Sex
23      9.5              0
23     27.9              1
27      7.8              0
27     17.8              0
39     31.4              1
41     25.9              1
45     27.4              0
49     25.2              1
50     31.1              1
53     34.7              1
53     42.0              1
54     20.0              0
54     29.1              1
56     32.5              1
57     30.3              1
57     21.0              0
58     33.0              1
58     33.8              1
60     33.8              1
61     41.1              1

Note. 1, female; 0, male.

The data in Table 6.7 include the categorical variable, sex. Can such an explanatory variable be included in a multiple regression model, and, if so, how? In fact, it is quite legitimate to include a categorical variable such as sex in a multiple regression model. The distributional assumptions of the model (see Display 6.4) apply only to the response variable. Indeed, the explanatory variables are not strictly considered random variables at all. Consequently the explanatory variables can, in theory at least, be any type of variable. However, care is needed in deciding how to incorporate categorical variables with more than two categories, as we shall see later in Chapter 10. For a two-category variable such as sex there are no real problems, except perhaps in interpretation. Details of the possible regression models for the human fat data are given in Display 6.5.

The results of fitting a multiple regression model to the human fat data, with age, sex, and the interaction of age and sex as explanatory variables, are shown in Table 6.8. The fitted model can be interpreted in the following manner.

1. For men, for whom sex is coded as zero and so sex × age is also zero, the fitted model is

$$\%\text{fat} = 3.47 + 0.36 \times \text{age}. \qquad (6.5)$$

Display 6.5
Possible Models that Might be Considered for the Human Fat Data in Table 6.7

The first model that might be considered is the simple linear regression model relating percentage fat to age:

$$\%\text{fat} = \beta_0 + \beta_1 \times \text{age}.$$

After such a model has been fitted, a further question of interest might be, Allowing for the effect of age, does a person's sex have any bearing on their percentage of fatness? The appropriate model would be

$$\%\text{fat} = \beta_0 + \beta_1 \times \text{age} + \beta_2 \times \text{sex},$$

where sex is a dummy variable that codes men as zero and women as one. The model now describes the situation shown in the diagram below, namely two parallel lines with a vertical separation of $\beta_2$. Because in this case the effect of age on fatness is the same for both sexes, or, equivalently, the effect of a person's sex is the same for all ages, the model assumes that there is no interaction between age and sex.

(Diagram: percentage fat plotted against Age; two parallel lines.)

Suppose now that a model is required that does allow for the possibility of an age × sex interaction effect on percentage of fatness. Such a model must include a new variable, defined as the product of the variables age and sex. Therefore, the new model becomes

$$\%\text{fat} = \beta_0 + \beta_1 \times \text{age} + \beta_2 \times \text{sex} + \beta_3 \times \text{age} \times \text{sex}.$$

To understand this equation better, first consider the percentage of fatness of men; here the values of both sex and sex × age are zero and the model reduces to

$$\%\text{fat} = \beta_0 + \beta_1 \times \text{age}.$$

However, for women sex = 1 and so sex × age = age and the model becomes

$$\%\text{fat} = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) \times \text{age}.$$

Thus the new model allows the lines for males and females to be other than parallel (see diagram below). The parameter $\beta_3$ is a measure of the difference between the slopes of the two lines.

TABLE 6.8
Multiple Regression Results for Human Fatness Data

The model fitted is

$$\%\text{fat} = \beta_0 + \beta_1 \times \text{age} + \beta_2 \times \text{sex} + \beta_3 \times \text{age} \times \text{sex}.$$

The least-squares estimates of the regression coefficients in the model are as follows.

Parameter    Estimate    SE
$\beta_1$      0.35      0.14
$\beta_2$     16.64      8.84
$\beta_3$     -0.11      0.19

The multiple correlation coefficient takes the value 0.8738. The regression ANOVA table is

Source        SS         DF    MS        F
Regression    1176.18     3    392.06    17.22
Residual       364.28    16     22.77

FIG. 6.6. Plot of model for human fat data, which includes a sex × age interaction.

2. For women, for whom sex is coded as one and so sex × age is simply age, the fitted model is

$$\%\text{fat} = 3.47 + 0.36 \times \text{age} + 16.64 - 0.11 \times \text{age}. \qquad (6.6)$$

Collecting together terms leads to

$$\%\text{fat} = 20.11 + 0.24 \times \text{age}. \qquad (6.7)$$

The fitted model actually represents separate simple linear regressions with different intercepts and different slopes for men and women. Figure 6.6 shows a plot of the fitted model together with the original data.

The interaction effect is clearly relatively small, and it seems that a model including only sex and age might describe the data adequately. In such a model the estimated regression coefficient for sex (11.567) is simply the estimated difference between the percentage fat of men and that of women, which is assumed to be the same at all ages. (Categorical explanatory variables with more than two categories have to be recoded in terms of a series of binary variables, dummy variables, before being used in a regression analysis, but we shall not give such an example until Chapter 10.)
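To see how the dummy variable and the product term enter the calculations, it may help to build the design matrix explicitly. The sketch below does not use the data of Table 6.7; it uses a small, noise-free, hypothetical data set generated from two known straight lines, so that the recovered coefficients can be checked by eye. All variable names are our own.

```python
import numpy as np

# Hypothetical, noise-free data generated from two known straight lines.
age = np.array([25.0, 35.0, 45.0, 55.0, 25.0, 35.0, 45.0, 55.0])
sex = np.array([0, 0, 0, 0, 1, 1, 1, 1])        # 0 = male, 1 = female
fat = np.where(sex == 0, 3.0 + 0.40 * age, 20.0 + 0.25 * age)

# Design matrix: intercept, age, sex dummy, and the age x sex product.
X = np.column_stack([np.ones_like(age), age, sex, age * sex])
b0, b1, b2, b3 = np.linalg.lstsq(X, fat, rcond=None)[0]

male_line = (b0, b1)                  # intercept and slope when sex = 0
female_line = (b0 + b2, b1 + b3)      # intercept and slope when sex = 1
print("male:   intercept %.2f, slope %.2f" % male_line)
print("female: intercept %.2f, slope %.2f" % female_line)
```

The coefficient of the dummy variable is the difference between the two intercepts, and the coefficient of the product term is the difference between the two slopes, exactly as in the step from Equation (6.6) to Equation (6.7).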

6.3.3. Predicting Crime Rates in the USA: a More Complex Example of Using Multiple Linear Regression

In Table 6.9, data for 47 states of the USA are given. These data will be used to investigate how the crime rate in 1960 depended on the other variables listed. The data originate from the Uniform Crime Report of the FBI and other government sources. Again it is useful to begin our investigation of these data by examining the scatterplot matrix of the data; it is shown in Figure 6.7. The scatterplots involving crime rate (the top row of Figure 6.7) indicate that some, at least, of the explanatory variables are predictive of crime rate. One disturbing feature of the plot is the very strong relationship between police expenditure in 1959 and in 1960. The reasons that such a strongly correlated pair of explanatory variables can cause problems for multiple regression will be taken up in Section 6.6. Here we shall preempt one of the suggestions to be made in that section for dealing with the problem, by simply dropping expenditure in 1960 from consideration.

The results of fitting the multiple regression model to the crime rate data are given in Table 6.10. The global hypothesis that the regression coefficients are all zero is overwhelmingly rejected. The square of the multiple correlation coefficient is 0.75, indicating that the 12 explanatory variables account for 75% of the variability in the crime rates of the 47 states.

The overall test that all regression coefficients in a multiple regression are zero is seldom of great interest. In most applications it will be rejected, because it is unlikely that all the explanatory variables chosen for study will be unrelated to the response variable. The investigator is far more likely to be interested in the question of whether some subset of the explanatory variables exists that might be as successful as the full set in explaining the variation in the response variable. If using a particular (small) number of explanatory variables results in a model that fits the data only marginally worse than a much larger set, then a more parsimonious description of the data is achieved (see Chapter 1).

How can the most important explanatory variables be identified? Readers might look again at the results for the crime rate data in Table 6.10 and imagine that the answer to this question is relatively straightforward; namely, select those variables for which the corresponding t statistic is significant and drop the remainder. Unfortunately, such a simple approach is only of limited value, because of the relationships between the explanatory variables. For this reason, regression coefficients and their associated standard errors are estimated conditional on the other variables in the model. If a variable is removed from a model, the regression coefficients of the remaining variables (and their standard errors) have to be reestimated from a further analysis. (Of course, if the explanatory variables happened to be orthogonal to one another, there would be no problem and the t statistics could be used in selecting the most important explanatory variables. This is, however, of little consequence in most practical applications of multiple regression.)

TABLE 6.9
Crime in the USA: 1960

State    R    Age    S    Ed    Ex0    Ex1    LF    M    N    NW    U1    U2    W    X

l 2 3 4 5

7 8 9 10 11 12 13 14 15 16 17 l8 19 20 21 22 23 24
25

26 27 28 29 30 31 32 33 34 35 36 37 38 39
40

79.1 163.5 57.8 196.9 123.4 68.2 96.3 155.5 85.6 70.5 167.4 84.9 51.1 66.4 79.8 94.6 53.9 92.9 75.0 122.5 74.2 43.9 121.6 96.8 52.3 199.3 34.2 121.6 104.3 69.6 37.3 75.4 107.2 92.3 65.3 127.2 83.1 56.6 82.6 115.1

151 143 142 136 141 121 l27 1131 1 157 140 124 134 128 135 1 152 1 142 143 135 130 125 126 1 157 132 131 130 131 135 152 119 166 140 125 1 147
126

0 0 0

91 113 89 121 121 110 111 109


90

118 0 105 0 108 0 113 0 117 87 88 0 110 66 1 104 123 0 116 128 0 108 113 0 108 74 89 47 0 % 87 0 116 78 0 116 63 0 121 160 0 109 69 0 112 82 0 107 16 6 1 89 58 0 93 55
0 0 0 0 0 1 0 l 109 104 90

58 103 45 149 109 118 82 115 65 71 121 75 67 62 57 81

123 150 177 133 149 1 145

63 118 97 102 97 100 109 87 58 104 51 88 61 104 82

56 510 950 33 301 95 583 1012 13 102 44 533 969 18 219 141 577 994 157 80 101 591 985 18 30 115 547 964 25 44 4 79 519 982 139 50 109 542 969 179 39 62 553 955 286 68 632 1029 15 7 116 580 966 101 106 71 595 97259 47 60 624 972 28 10 61 595 986 22 46 53 530 986 72 30 77 497 956 321 33 6 63 537 977 10 115 537 978 31 170 128 536 934 51 24 105 567 985 78 94 67 602 984 34 12 44 512 962 22 423 83 564 953 43 92 73 574 103836 7 57 641 984 26 14 143 63l 107177 3 71 540 965 6 4 76 571 1018 79 10 157 521 938 89168 54 521 973 4 6 2 5 4 6 20 54 535 1045 81 586 964 82 97 64 560 972 95 23 97 542 990 21 18 87 526 948 76113 98 53 l 964 9 2 4 56 638 974 24 349 47 599 1024 7 4 0 36 54 515 953 165 74 560 981 126 96

108 96 94 102 91 84 97 79 81 100 77 83 77 77 92 116 114 89 78 130 l02 97 83 142 70 102 80 103

41 36 33 39 20 29 38 35 28 24 35 31 25 27 43 47 35 34 34 58 33 34 32 42 21 41 22 28 92 36 72 26 135 40 105 43 76 24 102 35 124 50 87 38 76 28 99 27 86 35 88 31

394 557 318 673 578 689 620 472 421 526 657 580 507 529 405 427 487 631 627 626 557 288 513 540 486 674 564 537 637 396 453 617 462 589 572 559 382 425 395 488

261 194
250

167 174 126 168 206 239 174 170 172 206 190 264 247 166 165 135 166 195 276 227 176 1% 152 139 215 154 237 200 163 233 166 158 153 254 225 251 228

TABLE 6.9 (Continued)

41 42 43 44 45 47
46

88.0 54.2 82.3 103.0 45.5 50.8 84.9

148 141 162 136 139 126 130

0 0

122

109

1 0 0

121

99

88 104 121

72 56 75 95 106
46
90

66 S4 70 96 41 97 91

601 S23 S22 574 480

1012
968 989 1049

998 968 996

40

9 4

623

S99

29 19

40

19 2 208 36 49 24 22

84 107 73 135 78 113

111

20 37 27

590 489 496 37 622 S3 457 2 S93 5 40 588

144 170 224 162 249 171 160

Note. R, crime rate: number of offences known to the police per 1,000,000 population; Age, age distribution: the number of males aged 14-24 years per 1000 of total state population; S, binary variable distinguishing southern states (coded as 1) from the rest; Ed, educational level: mean number of years of schooling × 10 of the population aged 25 years and over; Ex0, police expenditure: per capita expenditure on police protection by state and local government in 1960; Ex1, police expenditure as Ex0, but in 1959; LF, labor force participation rate per 1000 civilian urban males in the age group 14-24 years; M, number of males per 1000 females; N, state population size in hundreds of thousands; NW, number of non-Whites per 1000; U1, unemployment rate of urban males per 1000 in the age group 14-24 years; U2, unemployment rate of urban males per 1000 in the age group 35-39 years; W, wealth as measured by the median value of transferable goods and assets or family income (unit 10 dollars); X, income inequality: the number of families per 1000 earning below one half of the median income.

Because the simple t statistics may be misleading when one is trying to choose subsets of explanatory variables, a number of alternative procedures have been suggested; two of these are discussed in the next section.

6.4. SELECTING SUBSETS OF EXPLANATORY VARIABLES


In this section we shall consider two possible approaches to the problem of identifying a subset of explanatory variables likely to be almost as informative as the complete set of variables for predicting the response. The first method is known as all possible subsets regression, and the second is known as automatic model selection.

6.4.1. All Subsets Regression


Only the advent of modern, high-speed computing and the widespread availability of suitable statistical software has made this approach to model selection feasible. We do not intend to provide any details concerning the actual calculations involved, but instead we simply state that the result of the calculations is a table of either all

FIG. 6.7. Scatterplot matrix of crime rate data.

TABLE 6.10
Multiple Regression Results for Crime Rate Data (Ex0 not Used)

Estimated Coefficients, SEs, and t Values

Parameter      Value       SE        t        p
(Intercept)    -739.89     155.55    -4.76    <.001
Age               1.09       0.43     2.53     0.02
S                -8.16      15.20    -0.54     0.59
Ed                1.63       0.65     2.50     0.02
Ex1               1.03       0.27     3.19    <.001
LF                0.01       0.15     0.03     0.97
M                 0.19       0.21     0.88     0.39
N                -0.016      0.13    -0.13     0.90
NW               -0.002      0.07    -0.03     0.97
U1               -0.62       0.44    -1.39     0.17
U2                1.94       0.87     2.24     0.03
W                 0.15       0.11     1.37     0.18
X                 0.82       0.24     3.41     0.01

Note. Parameter abbreviations are defined in Table 6.9. Residual standard error, 22.35 on 34 degrees of freedom; Multiple R squared, 0.75; F statistic, 8.64 on 12 and 34 degrees of freedom; the p value is <.001.

possible models, or perhaps a subset of possible models, in which each candidate model is identified by the list of explanatory variables it contains and the values of one or more numerical criteria to use in comparing the various candidates. Such a table is usually organized according to the number of explanatory variables in the candidate model.

For a multiple regression with p explanatory variables, a total of $2^p - 1$ models are possible, because each explanatory variable can be in or out of the model, and the model in which they are all out is excluded. So, for example, for the ice cream data with p = 2 explanatory variables, there are three possible models.

Model 1: temperature,
Model 2: price,
Model 3: temperature and price.

In the crime rate example with p = 12 explanatory variables, there are $2^{12} - 1 = 4095$ models to consider! The numerical criterion most often used for assessing candidate models is Mallows $C_k$ statistic, which is described in Display 6.6.
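The count of candidate models is easy to confirm by enumeration. The following sketch simply lists every non-empty subset of a set of variable names; the helper `all_subsets` is our own illustrative function, and the list of 12 names is taken from the abbreviations of Table 6.9 with Ex0 omitted.

```python
from itertools import combinations

def all_subsets(variables):
    """Every non-empty subset of the candidate explanatory variables."""
    subsets = []
    for size in range(1, len(variables) + 1):
        subsets.extend(combinations(variables, size))
    return subsets

ice_cream = all_subsets(["temperature", "price"])
print(ice_cream)

# The 12 crime rate explanatory variables used in the text (Ex0 is dropped).
crime = ["Age", "S", "Ed", "Ex1", "LF", "M", "N", "NW", "U1", "U2", "W", "X"]
crime_subsets = all_subsets(crime)
print(len(crime_subsets))
```

Each additional candidate variable doubles the number of models (less one), which is why the approach only became practical with fast computers.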

Display 6.6
Mallows $C_k$ Statistic

Mallows $C_k$ statistic is defined as

$$C_k = \frac{\mathrm{RSS}_k}{s^2} - (n - 2p),$$

where $\mathrm{RSS}_k$ is the residual sum of squares from the multiple regression model with a set of k explanatory variables and $s^2$ is the estimate of $\sigma^2$ obtained from the model that includes all the explanatory variables under consideration. Here p is the number of terms in the candidate model; when the intercept is counted as a term, as in Table 6.11, p is equal to the subset size k.

If $C_k$ is plotted against k, the subsets of variables worth considering in searching for a parsimonious model are those lying close to the line $C_k = k$. In this plot the value of k is (roughly) the contribution to $C_k$ from the variance of the estimated parameters, whereas the remaining $C_k - k$ is (roughly) the contribution from the bias of the model. This feature makes the plot a useful device for a broad assessment of the $C_k$ values of a range of models.
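The calculation in Display 6.6 can be sketched in a few lines. The code below is an illustration on simulated data, not a reproduction of the crime rate analysis; the function names, the seed, and the variable labels `a`, `b`, and `c` are assumptions. Subset size counts the intercept as a term, following the convention of Table 6.11.

```python
from itertools import combinations
import numpy as np

def rss(X, y):
    """Residual sum of squares from a least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def mallows_table(X, y, names):
    """Return (size, variables, C) for every subset; size counts the intercept."""
    n, n_vars = X.shape
    ones = np.ones((n, 1))
    full = np.hstack([ones, X])
    s2 = rss(full, y) / (n - full.shape[1])   # sigma^2 estimate from the full model
    rows = []
    for r in range(1, n_vars + 1):
        for idx in combinations(range(n_vars), r):
            size = r + 1                      # explanatory variables plus intercept
            sub = np.hstack([ones, X[:, list(idx)]])
            c = rss(sub, y) / s2 - (n - 2 * size)
            rows.append((size, tuple(names[i] for i in idx), c))
    return sorted(rows, key=lambda row: (row[0], row[2]))

# Hypothetical data: y depends on a and b but not on c.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = 2.0 + 1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=60)

table = mallows_table(X, y, ["a", "b", "c"])
for size, variables, c in table:
    print(size, variables, round(c, 2))
```

By construction the full model always has a value equal to its own size, so it lies exactly on the line $C_k = k$; the interest lies in which smaller subsets fall close to that line.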

TABLE 6.11
Results of All Subsets Regression for Crime Rate Data Using Only Age, Ed, and Ex1

Model    $C_k$    Subset Size    Variables in Model
1         9.15    2              Ex1
2        41.10    2              Ed
3        50.14    2              Age
4         3.02    3              Age, Ex1
5        11.14    3              Ed, Ex1
6        42.23    3              Age, Ed
7         4.00    4              Age, Ed, Ex1

Note. The intercept is counted as a term in the model. Abbreviations are defined in Table 6.9.

Now for an example and, to begin, we shall examine the all subsets approach on the crime data, but we use only the three explanatory variables, Age, Ed, and Ex1. The results are shown in Table 6.11, and the plot described in Display 6.6 is shown in Figure 6.8. Here only the subset Age and Ex1 appears to be an acceptable alternative to the use of all three variables.

FIG. 6.8. Plot of $C_k$ against k for all subsets regression, using three explanatory variables in the crime rate data.

Some of the results from using the all subsets regression technique on the crime rate data with all 12 explanatory variables are given in Table 6.12, and the corresponding plot of $C_k$ values appears in Figure 6.9. Here the plot is too cluttered to be helpful. However, an inspection of the results from the analysis suggests that the following are two subsets that give a good description of the data.

1. Age, Ed, Ex1, U2, X: $C_6 = 5.65$,
2. Age, Ed, Ex1, W, X: $C_6 = 5.91$.
6.4.2.

AutomaticModelSelection

Software packages frequently offer automatic methods of selecting variables for a final regression model from a list of candidate variables. There are three typical approaches: (1) forward selection, (2) backward elimination, and (3) stepwise regression. These methods rely on significance tests known as partial F tests to select an explanatory variable for inclusion in or deletion from the regression model. The forward selection approach begins with an initial model that contains only a constant term, and it successively adds explanatory variables to the model until the pool of candidate variables remaining contains no variables that, if added to the current model, would contribute information that is statistically important concerning the mean value of the response. The backward elimination method begins with an initial model that contains all the explanatory variables under investigation and then successively removes variables until no variables among those remaining in the model can be eliminated without adversely affecting, in a statistical sense, the predicted value of the mean response. The criterion used for assessing

TABLE 6.12
Some of the Results of All Subsets Regression for Crime Rate Data Using All 12 Explanatory Variables

Model    $C_k$    Subset Size    Variables in Model
1        33.50    2
2        67.89    2
3        79.03    2
4        80.36    2
5        88.41    2
6        89.80    2
7        90.30    2
8        90.38    2
9        93.58    2
10       93.61    2
11       20.28    3
12       23.56    3
13       30.06    3
14       30.89    3
15       31.38    3
16       32.59    3
17       33.56    3
18       34.91    3
19       35.46    3
20       35.48    3
21       10.88    4
22       12.19    4
23       15.95    4
24       16.82    4
25       18.90    4
26       20.40    4
27       21.33    4
28       21.36    4
29       22.19    4
30       22.27    4
31        8.40    5
32        9.86    5
33       10.40    5
34       11.19    5
35       12.39    5
36       12.47    5
37       12.65    5
38       12.67    5
39       12.85    5
40       12.87    5
41        5.65    6              Age, Ed, Ex1, U2, X
42        5.91    6              Age, Ed, Ex1, W, X
43        8.30    6
44        9.21    6
45        9.94    6
46       10.07    6
47       10.22    6
48       10.37    6
49       11.95    6
50       14.30    6

Note. Abbreviations are defined in Table 6.9.

whether a variable should be added to an existing model in forward selection, or removed in backward elimination, is based on the change in the residual sum of squares that results from the inclusion or exclusion of a variable. Details of the criterion, and how the forward and backward methods are applied in practice, are given in Display 6.7.

The stepwise regression method of variable selection combines elements of both forward selection and backward elimination. The initial model for stepwise regression is one that contains only a constant term. Subsequent cycles of the approach involve first the possible addition of an explanatory variable to the current model, followed by the possible elimination of one of the variables included earlier, if the presence of new variables has made their contribution to the model no longer significant.

In the best of all possible worlds, the final model selected by applying each of the three procedures outlined above would be the same. Often this does happen, but it is in no way guaranteed. Certainly none of the automatic procedures for selecting subsets of variables are foolproof. They must be used with care, as the following statement from Agresti (1996) makes very clear.

Computerized variable selection procedures should be used with caution. When one considers a large number of terms for potential inclusion in a model, one or two of them that are not really important may look impressive simply due to chance. For instance, when all the true effects are weak, the largest sample effect may substantially overestimate its true effect. In addition, it often makes sense to include variables of special interest in a model and report their estimated effects even if they are not statistically significant at some level.

(See McKay and Campbell, 1982a, 1982b, for some more thoughts on automatic selection methods in regression.)

FIG. 6.9. $C_k$ plotted against k for all subsets regression, using 12 explanatory variables in the crime rate data.

Display 6.7
Forward Selection and Backward Elimination

The criterion used for assessing whether a variable should be added to an existing model in forward selection or removed from an existing model in backward elimination is as follows:

$$F = \frac{SS_{\text{residual},k} - SS_{\text{residual},k+1}}{SS_{\text{residual},k+1}/(n - k - 2)},$$

where $SS_{\text{residual},k} - SS_{\text{residual},k+1}$ is the decrease in the residual sum of squares when a variable is added to an existing k-variable model (forward selection), or the increase in the residual sum of squares when a variable is removed from an existing (k + 1)-variable model (backward elimination), $SS_{\text{residual},k+1}$ is the residual sum of squares for the model including k + 1 explanatory variables, and n is the number of observations.

The calculated F value is then compared with a preset term known as the F-to-enter (forward selection) or the F-to-remove (backward elimination). In the former, calculated F values greater than the F-to-enter lead to the addition of the candidate variable to the current model. In the latter, a calculated F value less than the F-to-remove leads to discarding the candidate variable from the model.
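As a small worked illustration of the partial F criterion, the sketch below applies it to the residual sums of squares reported for Models 1 and 2 in Table 6.13, that is, to the step at which X is added to a model already containing Ex1. The function name and the threshold `F_TO_ENTER` are our own; the threshold value in particular is hypothetical, since packages differ in their defaults.

```python
def partial_f(rss_k, rss_k_plus_1, n, k):
    """Partial F for adding one variable to a model with k explanatory variables.

    The denominator uses n - k - 2 residual degrees of freedom:
    n observations, less k + 1 slopes, less the intercept.
    """
    return (rss_k - rss_k_plus_1) / (rss_k_plus_1 / (n - k - 2))

# Residual sums of squares for Models 1 and 2 of Table 6.13 (n = 47 states).
f_value = partial_f(rss_k=38223.020, rss_k_plus_1=30621.383, n=47, k=1)

F_TO_ENTER = 4.0   # hypothetical threshold, chosen only for illustration
print(f"partial F = {f_value:.2f}, enter variable: {f_value > F_TO_ENTER}")
```

The same function serves for backward elimination; only the comparison changes, with a variable being dropped when the calculated value falls below the F-to-remove.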

Application of both forward and stepwise regression to the crime rate data results in selecting the five explanatory variables, Ex1, X, Ed, Age, and U2. Using the backward elimination technique produces these five variables plus W. Table 6.13 shows some of the typical results given by the forward selection procedure implemented in a package such as SPSS. The chosen subsets largely agree with the selection made by all subsets regression.

It appears that one suitable model for the crime rate data is that which includes Age, Ex1, Ed, U2, and X. The parameter estimates, and so on, for a multiple regression model including only these variables are shown in Table 6.14. Notice that the estimated regression coefficients and their standard errors have changed from the values given in Table 6.10, calculated for a model including the original 12 explanatory variables. The five chosen variables account for 71% of the variability of the crime rates (the original 12 explanatory variables accounted for 75%). Perhaps the most curious feature of the final model is that crime rate increases with increasing police expenditure (conditional on the other four explanatory variables). Perhaps this reflects a mechanism whereby as crime increases the police are given more resources?

6.5. REGRESSION DIAGNOSTICS


Having selected a more parsimonious model by using one of the techniques described in the previous section, we still must consider one further important aspect of a regression analysis, and that is to check the assumptions on which the model

TABLE 6.13
Forward Selection Results for Crime Rate Data

Model    Source        SS           DF    MS           F
1        Regression    30586.257     1    30586.257    36.009
         Residual      38223.020    45      849.400
         Total         68809.277    46
2        Regression    38187.893     2    19093.947    27.436
         Residual      30621.383    44      695.941
         Total         68809.277    46
3        Regression    43886.746     3    14628.915    25.240
         Residual      24922.530    43      579.594
         Total         68809.277    46
4        Regression    46124.540     4    11531.135    21.349
         Residual      22684.737    42      540.113
         Total         68809.277    46
5        Regression    48500.406     5     9700.081    19.58
         Residual      20308.871    41      495.338
         Total         68809.277    46

Model Summary

Model    Terms in Model                   R        $R^2$
1        Constant, Ex1                    0.667    0.445
2        Constant, Ex1, X                 0.745    0.555
3        Constant, Ex1, X, Ed             0.799    0.638
4        Constant, Ex1, X, Ed, Age        0.819    0.670
5        Constant, Ex1, X, Ed, Age, U2    0.840    0.705

Note. No further variables are judged to contribute significantly to the prediction of crime rate. Model summary abbreviations are defined in Table 6.9.

is based. We have, in Section 6.2, already described the use of residuals for this purpose, but in this section we shall go into a little more detail and introduce several other useful regression diagnostics that are now available. These diagnostics are methods for identifying and understanding differences between a model and the data to which it is fitted. Some differences between the data and the model may be the result of isolated observations; one, or a few, observations may be outliers, or may differ in some unexpected way from the rest of the data. Other differences may be systematic; for example, a term may be missing in a linear model.

A number of the most useful regression diagnostics are described in Display 6.8. We shall now examine the use of two of these diagnostics on the final model

TABLE 6.14
Multiple Regression Results for the Final Model Selected for Crime Rate Data

Parameter    Estimate    SE       t        p
Intercept    -528.86     99.62    -5.31    <.001
Age             1.02      0.37
Ex1             1.30      0.16     8.12    <.001
Ed              2.04      0.50     4.11    <.001
X               0.65      0.16     4.18    <.001
U2              0.99      0.45     2.19     .03

Note. Parameter abbreviations are defined in Table 6.9. $R^2$ = 0.705.

selected for the crime rate data, and Figure 6.10 shows plots of the standardized and deletion residuals against fitted values. States 11, 19, and 20 might perhaps be seen as outliers because they fall outside the boundary, (-2, 2). There is also perhaps some evidence that the variance of crime rate is not constant as required. (Exercise 6.6 invites readers to consider more diagnostic plots for these data.)

6.6. MULTICOLLINEARITY

One of the problems that often occurs when multiple regression is used in practice is multicollinearity. The term is used to describe situations in which there are moderate to high correlations among some or all of the explanatory variables. Multicollinearity gives rise to a number of difficulties when multiple regression is applied.

1. It severely limits the size of the multiple correlation coefficient R because the explanatory variables are largely attempting to explain much of the same variability in the response variable (see, e.g., Dizney and Gromen, 1967).
2. It makes determining the importance of a given explanatory variable difficult because the effects of the explanatory variables are confounded as a result of their intercorrelations.
3. It increases the variances of the regression coefficients, making the use of the fitted model for prediction less stable. The parameter estimates become unreliable.
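The third difficulty can be made concrete with a little matrix arithmetic. In least squares the variance of an estimated coefficient is $\sigma^2$ times the corresponding diagonal element of $(\mathbf{X}'\mathbf{X})^{-1}$, a standard result that we simply assume here. The sketch below, on simulated variables with names and seed of our own choosing, shows how that multiplier for one coefficient grows as the correlation between two explanatory variables increases.

```python
import numpy as np

def coef_variance_factor(x1, x2):
    """Multiplier of sigma^2 in var(b1) for the model y ~ 1 + x1 + x2."""
    X = np.column_stack([np.ones(len(x1)), x1, x2])
    return np.linalg.inv(X.T @ X)[1, 1]

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
z = rng.normal(size=n)

# Centre both variables, make z orthogonal to x1, and scale each to unit
# sum of squares, so the sample correlation of x1 and x2 below is exactly rho.
x1 = x1 - x1.mean()
x1 = x1 / np.linalg.norm(x1)
z = z - z.mean()
z = z - x1 * (x1 @ z)
z = z / np.linalg.norm(z)

factors = {}
for rho in (0.0, 0.5, 0.95):
    x2 = rho * x1 + np.sqrt(1.0 - rho**2) * z
    factors[rho] = coef_variance_factor(x1, x2)
    print(f"correlation {rho:.2f}: variance multiplier {factors[rho]:.2f}")
```

With the scaling used here the multiplier is one when the two variables are uncorrelated, so the printed values show directly by what factor the variance of the coefficient is inflated.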

Display 6.8
Regression Diagnostics

To begin we need to introduce the hat matrix; this is defined as

$$\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}',$$

where X is the matrix introduced in Display 6.4. In a multiple regression the predicted values of the response variable can be written in matrix form as

$$\hat{\mathbf{y}} = \mathbf{H}\mathbf{y},$$

so H puts the hats on y.

The diagonal elements of H, $h_{ii}$, i = 1, ..., n, are such that $0 \le h_{ii} \le 1$, with an average value of p/n. Observations with large values of $h_{ii}$ are said to be leverage points, and it is often informative to produce a plot of $h_{ii}$ against i, an index plot, to identify those observations that have high leverage, that is, have most effect on the estimation of parameters.

The raw residuals introduced in the text are not independent, nor do they have the same variance because $\mathrm{var}(r_i) = \sigma^2(1 - h_{ii})$. Both properties make them less useful than they might be. Two alternative residuals are the standardized residual, $r_i^{\text{std}}$, and the deletion residual, $r_i^{\text{del}}$, defined as follows:

$$r_i^{\text{std}} = \frac{r_i}{\sqrt{s^2(1 - h_{ii})}}, \qquad r_i^{\text{del}} = \frac{r_i}{\sqrt{s_{(i)}^2(1 - h_{ii})}},$$

where $s^2$ is the residual mean square estimate of $\sigma^2$ and $s_{(i)}^2$ is the residual mean square estimate of $\sigma^2$ after the deletion of observation i. The deletion residuals are particularly good in helping to identify outliers.

A further useful regression diagnostic is Cook's distance, defined as

$$D_i = \frac{r_i^2 h_{ii}}{p s^2 (1 - h_{ii})^2}.$$

Cook's distance measures the influence observation i has on estimating all the regression parameters in the model. Values greater than one suggest that the corresponding observation has undue influence on the estimated regression coefficients.

A full account of regression diagnostics is given in Cook and Weisberg (1982).
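The quantities in Display 6.8 follow directly from the hat matrix, as the following sketch shows. It uses simulated data with one deliberately aberrant observation, not the crime rate data; the function name is our own, p is taken to be the number of columns of X including the column of ones, and the closed-form expression for the deleted mean square is a standard identity that we assume rather than derive.

```python
import numpy as np

def regression_diagnostics(X, y):
    """Leverages, standardized and deletion residuals, and Cook's distances.

    X must already contain the column of ones for the intercept.
    """
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix
    h = np.diag(H)                                # leverages h_ii
    r = y - H @ y                                 # raw residuals
    s2 = (r @ r) / (n - p)                        # residual mean square
    # Residual mean square after deleting observation i (closed form).
    s2_del = ((n - p) * s2 - r**2 / (1.0 - h)) / (n - p - 1)
    r_std = r / np.sqrt(s2 * (1.0 - h))
    r_del = r / np.sqrt(s2_del * (1.0 - h))
    cooks = r**2 * h / (p * s2 * (1.0 - h) ** 2)
    return h, r_std, r_del, cooks

# Hypothetical data with one deliberately aberrant response value.
rng = np.random.default_rng(3)
n = 30
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
y[0] += 5.0
X = np.column_stack([np.ones(n), x])

h, r_std, r_del, cooks = regression_diagnostics(X, y)
print("average leverage:", h.mean())              # equals p / n
print("largest |deletion residual| at index", int(np.argmax(np.abs(r_del))))
```

Forming the full n × n hat matrix is wasteful for large data sets; it is done here only because it mirrors the display most closely.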

Spotting multicollinearity among a set of explanatory variables may not be easy. The obvious course of action is simply to examine the correlations between these variables, but although this is often helpful, it is by no means foolproof; more subtle forms of multicollinearity may be missed. An alternative, and generally far more useful, approach is to examine what are known as the variance inflation

FIG. 6.10. Diagnostic plots for final model selected for crime rate data: (a) standardized residuals against fitted values and (b) deletion residuals against predicted values.

factors (VIFs) of the explanatory variables. The variance inflation factor $\mathrm{VIF}_j$, for the jth variable, is given by

$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2},$$

where $R_j^2$ is the square of the multiple correlation coefficient from the regression of the jth explanatory variable on the remaining explanatory variables. The variance inflation factor of an explanatory variable indicates the strength of the linear relationship between the variable and the remaining explanatory variables. A rough rule of thumb is that variance inflation factors greater than 10 give some cause for concern. Returning to the crime rate data, the variance inflation factors of each of the original 13 explanatory variables are shown in Table 6.15. Here it is clear that attempting to use both Ex0 and Ex1 in a regression model would have led to problems. How can multicollinearity be combatted? One way is to combine in some way explanatory variables that are highly correlated; in the crime rate example we could perhaps have taken the mean of Ex0 and Ex1. An alternative is simply to select one of the set of correlated variables (this is the approach used in the analysis of the crime rate data reported previously, where only Ex1 was used). Two more complex
TABLE 6.15 VIFs for the Original 13 Explanatory Variables in the Crime Rate Data

Variable    VIF
Age         2.70
S           4.76
Ed          5.00
Ex0         100.00
Ex1         100.00
LF          3.57
M           3.57
N           2.32
NW          4.17
U1          5.88
U2          5.00
W           10.00
X           8.33

Note. Variable abbreviations are defined in Table 6.9.


possibilities are regression on principal components and ridge regression, both of which are described in Chatterjee and Price (1991).
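The variance inflation factors discussed above are easily calculated by regressing each explanatory variable in turn on the others. The sketch below is illustrative only: the crime rate data are not reproduced here, so the data are simulated, and the function name `variance_inflation_factors` is ours. Two of the three simulated variables are deliberately made almost identical, mimicking the Ex0 and Ex1 problem.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2) for each column of X.

    X holds the explanatory variables only (no column of ones).
    """
    n, q = X.shape
    vifs = np.empty(q)
    for j in range(q):
        target = X[:, j]
        # Regress variable j on the remaining variables plus an intercept
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
        vifs[j] = 1 / (1 - r2)
    return vifs

# Simulated (hypothetical) explanatory variables
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # nearly a copy of x1
x3 = rng.normal(size=n)              # unrelated to the others
X = np.column_stack([x1, x2, x3])

vifs = variance_inflation_factors(X)
print(np.round(vifs, 2))
```

The first two factors come out far above the rule-of-thumb value of 10, whereas the third is close to 1. The same values are given by the diagonal of the inverse of the correlation matrix of the explanatory variables, which provides a quick check.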

6.7. THE EQUIVALENCE OF MULTIPLE REGRESSION AND ANALYSIS OF VARIANCE: THE GENERAL LINEAR MODEL


That (it is hoped) ubiquitous creature, the more observant reader, will have noticed that the models described in this chapter look vaguely similar to those used for analysis of variance, met in Chapters 3 and 4. It is now time to reveal that, in fact, the analysis of variance models and the multiple regression model are exactly equivalent. In Chapter 3, for example, the following model was introduced for a one-way design (see Display 3.2):
$$y_{ij} = \mu + \alpha_i + \epsilon_{ij}. \tag{6.9}$$

Without some constraints on the parameters, the model is overparameterized (see Chapter 3). The constraint usually adopted is to require that

$$\sum_{i=1}^{k} \alpha_i = 0, \tag{6.10}$$

where k is the number of groups. Consequently we can write

$$\alpha_k = -\alpha_1 - \alpha_2 - \cdots - \alpha_{k-1}. \tag{6.11}$$

(Other constraints could be used to deal with the overparameterization problem; see Exercise 6.5.) How can this be put into an equivalent form to the multiple regression model given in Display 6.4? The answer is provided in Display 6.9, using an example in which there are k = 3 groups. Display 6.10 uses the fruit fly data (see Table 3.1) to show how the multiple regression model in Display 6.4 can be used to find the relevant sums of squares in a one-way ANOVA of the data. Notice that the parameter estimates given in Display 6.10 are those to be expected when the model is as specified in Eqs. (6.9) and (6.10), where $\mu$ is the overall mean of the number of eggs laid, and the estimates of the $\alpha_i$ are simply the deviations of the corresponding group mean from the overall mean. Moving on now to the two-way analysis of variance, we find that the simplest way of illustrating the equivalence of the model given in Display 4.1 to the multiple regression model is to use an example in which each factor has only two

Display 6.9 Multiple Regression Model for a One-way Design with Three Groups

Introduce two variables $x_1$ and $x_2$, defined below, to label the group to which an observation belongs.

Group    x1    x2
1         1     0
2         0     1
3        -1    -1

The usual one-way ANOVA model for this situation is

$$y_{ij} = \mu + \alpha_i + \epsilon_{ij},$$

which, allowing for the constraint $\sum_{i=1}^{3} \alpha_i = 0$, we can now write as

$$y_{ij} = \mu + \alpha_1 x_1 + \alpha_2 x_2 + \epsilon_{ij}.$$

This is exactly the same form as the multiple regression model in Display 6.4.

Display 6.10 One-way ANOVA of Fruit Fly Data, Using a Multiple Regression Approach
Define two variables as specified in Display 6.9. Regress the number of eggs laid on $x_1$ and $x_2$. The analysis of variance table from the regression is as follows.

Source        SS      DF    MS       F
Regression    1362    2     681.1    8.7
Residual      5659    72    78.6

This is exactly the same as the one-way analysis of variance given in Table 3.2. The estimates of the regression coefficients from the analysis are

$$\hat{\mu} = 27.42, \quad \hat{\alpha}_1 = -2.16, \quad \hat{\alpha}_2 = -3.79.$$

The estimates of the $\alpha_i$ are simply the differences between each group mean and the grand mean.
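The coding of Display 6.9 is simple to try out numerically. The fruit fly data are not reproduced in this section, so the sketch below uses a small hypothetical balanced data set with three groups of four observations; the variable names are ours. It regresses the response on $x_1$ and $x_2$ and recovers the one-way ANOVA sums of squares and the deviations of the group means from the overall mean.

```python
import numpy as np

# Hypothetical balanced data: three groups, four observations each
groups = {1: [10, 12, 11, 13], 2: [14, 15, 13, 14], 3: [20, 19, 21, 20]}
y = np.concatenate([np.array(v, dtype=float) for v in groups.values()])
labels = np.repeat([1, 2, 3], [4, 4, 4])

# Coding of Display 6.9: group 1 -> (1, 0), group 2 -> (0, 1), group 3 -> (-1, -1)
x1 = np.where(labels == 1, 1.0, np.where(labels == 3, -1.0, 0.0))
x2 = np.where(labels == 2, 1.0, np.where(labels == 3, -1.0, 0.0))
X = np.column_stack([np.ones_like(y), x1, x2])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # mu, alpha_1, alpha_2
fitted = X @ beta
ss_reg = np.sum((fitted - y.mean()) ** 2)      # between groups sum of squares
ss_res = np.sum((y - fitted) ** 2)             # within groups sum of squares
F = (ss_reg / 2) / (ss_res / (len(y) - 3))

print("estimates:", np.round(beta, 3))
print("SS regression:", round(ss_reg, 3), "SS residual:", round(ss_res, 3))
print("F:", round(F, 2))
```

The estimate of $\alpha_3$ is not fitted directly but follows from the constraint as $-\hat{\alpha}_1 - \hat{\alpha}_2$.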

levels. Display 6.11 gives the details, and Display 6.12 describes a numerical example of performing a two-way ANOVA for a balanced design using multiple regression. Notice that in this case the estimates of the parameters in the multiple regression model do not change when other explanatory variables are added to the model. The explanatory variables for a balanced two-way design are orthogonal (uncorrelated). But now consider what happens when the multiple regression


Display 6.11 Multiple Regression Model for a 2 x 2 Factorial Design (Factor A at Levels A1 and A2, Factor B at Levels B1 and B2)

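The usual model for a 2 x 2 design is

$$y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}.$$

(See Chapter 4.) The usual constraints on the parameters introduced to deal with the overparameterized model above are as follows.

$$\sum_{i=1}^{2} \alpha_i = 0, \quad \sum_{j=1}^{2} \beta_j = 0, \quad \sum_{i=1}^{2} \gamma_{ij} = 0, \quad \sum_{j=1}^{2} \gamma_{ij} = 0.$$

The constraints imply that the parameters in the model satisfy the following equations:

$$\alpha_1 = -\alpha_2, \quad \beta_1 = -\beta_2, \quad \gamma_{1j} = -\gamma_{2j}, \quad \gamma_{i1} = -\gamma_{i2}.$$

The last two equations imply the following:

$$\gamma_{12} = -\gamma_{11}, \quad \gamma_{21} = -\gamma_{11}, \quad \gamma_{22} = \gamma_{11}.$$

In other words, there is only really a single interaction parameter for this design. The model for the observations in each of the four cells of the design can now be written explicitly as follows.

        A1                                          A2
B1      $\mu + \alpha_1 + \beta_1 + \gamma_{11}$    $\mu - \alpha_1 + \beta_1 - \gamma_{11}$
B2      $\mu + \alpha_1 - \beta_1 - \gamma_{11}$    $\mu - \alpha_1 - \beta_1 + \gamma_{11}$

Now define two variables $x_1$ and $x_2$ as follows.

$x_1 = 1$ if first level of A, $x_1 = -1$ if second level of A,
$x_2 = 1$ if first level of B, $x_2 = -1$ if second level of B.

The original ANOVA model can now be written as

$$y_{ijk} = \mu + \alpha_1 x_1 + \beta_1 x_2 + \gamma_{11} x_3 + \epsilon_{ijk},$$

where $x_3 = x_1 \times x_2$.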

Display 6.12 Analysis of a Balanced Two-way Design by Using Multiple Regression
Consider the following data, which have four observations in each cell.

        A1                  A2
B1      23, 25, 27, 29      22, 23, 21, 21
B2      26, 32, 30, 31      37, 38, 40, 35

Introduce the three variables $x_1$, $x_2$, and $x_3$ as defined in Display 6.11 and perform a multiple regression, first entering the variables in the order $x_1$, followed by $x_2$, followed by $x_3$. This leads to the following series of results.

Step 1: $x_1$ entered. The analysis of variance table from the regression is as follows.

Source        SS        DF    MS
Regression    12.25     1     12.25
Residual      580.75    14    41.48

The regression sum of squares gives the between levels of A sum of squares that would be obtained in a two-way ANOVA of these data. The estimates of the regression coefficients at this stage are

$$\hat{\mu} = 28.75, \quad \hat{\alpha}_1 = -0.875.$$

Step 2: $x_1$ and $x_2$ entered. The analysis of variance for the regression is as follows.

Source        SS        DF    MS
Regression    392.50    2     196.25
Residual      200.50    13    15.42

The difference in the regression sum of squares between steps 1 and 2, that is, 380.25, gives the sum of squares corresponding to factor B that would be obtained in a conventional analysis of variance of these data. The estimates of the regression coefficients at this stage are

$$\hat{\mu} = 28.75, \quad \hat{\alpha}_1 = -0.875, \quad \hat{\beta}_1 = -4.875.$$

Step 3: $x_1$, $x_2$, and $x_3$ entered. The analysis of variance table for this final regression is as follows.

Source        SS        DF    MS
Regression    536.50    3     178.83
Residual      56.50     12    4.71

The difference in the regression sum of squares between steps 2 and 3, that is, 144.00, gives the sum of squares corresponding to the A x B interaction in an analysis of variance of these data. The residual sum of squares in the final table corresponds to the error sum of squares in the usual ANOVA table. The estimates of the regression coefficients at this final stage are

$$\hat{\mu} = 28.75, \quad \hat{\alpha}_1 = -0.875, \quad \hat{\beta}_1 = -4.875, \quad \hat{\gamma}_{11} = 3.000.$$

Note that the estimates of the regression coefficients do not change as extra variables are brought into the model.

approach is used to carry out an analysis of variance of an unbalanced two-way design, as shown in Display 6.13. In this case, the order in which the explanatory variables enter the multiple regression model is of importance; both the sums of squares corresponding to each variable and the parameter estimates depend on which variables have already been included in the model. This highlights the point made previously in Chapter 4 that the analysis of unbalanced designs is not straightforward.
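The contrast between the balanced and unbalanced cases can be checked by fitting the sequence of regressions directly. The following sketch uses the data of Displays 6.12 and 6.13 with the coding of Display 6.11; the helper names `design` and `sequential_ss` are ours and are not part of any package. Each call reports the increase in the regression sum of squares as the variables are entered in the stated order.

```python
import numpy as np

def design(cells):
    """Build y and the coded variables from {(A level, B level): observations}."""
    y, x1, x2 = [], [], []
    for (a, b), obs in cells.items():
        y += list(obs)
        x1 += [1.0 if a == 1 else -1.0] * len(obs)
        x2 += [1.0 if b == 1 else -1.0] * len(obs)
    y = np.array(y, dtype=float)
    x1, x2 = np.array(x1), np.array(x2)
    return y, {"x1": x1, "x2": x2, "x3": x1 * x2}

def sequential_ss(y, variables, order):
    """Increase in regression SS as each variable in `order` is added in turn.

    Also returns the coefficients of the final model (intercept first,
    then the variables in the order entered).
    """
    X = np.ones((len(y), 1))
    previous, gains = 0.0, {}
    for name in order:
        X = np.column_stack([X, variables[name]])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        ss_reg = np.sum((X @ beta - y.mean()) ** 2)
        gains[name] = ss_reg - previous
        previous = ss_reg
    return gains, beta

balanced = {(1, 1): [23, 25, 27, 29], (1, 2): [26, 32, 30, 31],
            (2, 1): [22, 23, 21, 21], (2, 2): [37, 38, 40, 35]}
unbalanced = {(1, 1): [23, 25, 27, 29, 30, 27, 23, 25],
              (1, 2): [26, 32, 30, 31],
              (2, 1): [22, 23, 21, 21, 19, 23, 17],
              (2, 2): [37, 38, 40, 35, 39, 35, 38, 41, 32, 36, 40, 41, 38]}

results = {}
for label, cells in (("balanced", balanced), ("unbalanced", unbalanced)):
    y, variables = design(cells)
    for order in (("x1", "x2", "x3"), ("x2", "x1", "x3")):
        gains, beta = sequential_ss(y, variables, order)
        results[(label, order)] = (gains, beta)
        print(label, order, {k: round(v, 2) for k, v in gains.items()})
```

For the balanced data the sum of squares attached to each variable is the same whichever order is used; for the unbalanced data the sums of squares for $x_1$ and $x_2$ depend heavily on which is entered first.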

6.8. SUMMARY
1. Multiple regression is used to assess the relationship between a set of explanatory variables and a continuous response variable.
2. The response variable is assumed to be normally distributed with a mean that is a linear function of the explanatory variables and a variance that is independent of the explanatory variables.
3. The explanatory variables are strictly assumed to be fixed. In practice, where this is almost never the case, the results of the multiple regression are to be interpreted conditional on the observed values of these variables.

Display 6.13 Analysis of an Unbalanced Two-way Design by Using Multiple Regression
In this case consider the following data.

        A1                                A2
B1      23, 25, 27, 29, 30, 27, 23, 25    22, 23, 21, 21, 19, 23, 17
B2      26, 32, 30, 31                    37, 38, 40, 35, 39, 35, 38, 41, 32, 36, 40, 41, 38

Three variables $x_1$, $x_2$, and $x_3$ are defined as specified in Display 6.11 and used in a multiple regression of these data, with variables entered first in the order $x_1$, followed by $x_2$, followed by $x_3$. This leads to the following series of results.

Step 1: $x_1$ entered. The regression ANOVA table is as follows.

Source        SS         DF    MS
Regression    149.63     1     149.63
Residual      1505.87    30    50.19

The regression sum of squares gives the A sum of squares for an unbalanced design (see Chapter 4). The estimates of the regression coefficients at this stage are

$$\hat{\mu} = 29.567, \quad \hat{\alpha}_1 = -2.233.$$

Step 2: $x_1$ entered followed by $x_2$. The regression ANOVA table is as follows.

Source        SS         DF    MS
Regression    1180.86    2     590.42
Residual      474.65     29    16.37

The increase in the regression sum of squares, that is, 1031.23, is the sum of squares that is due to B, conditional on A already being in the model, that is, B|A as encountered in Chapter 4. The estimates of the regression coefficients at this stage are

$$\hat{\mu} = 29.667, \quad \hat{\alpha}_1 = -0.341, \quad \hat{\beta}_1 = -5.977.$$

Step 3: $x_1$ and $x_2$ entered followed by $x_3$. The regression ANOVA table is as follows.

Source        SS         DF    MS
Regression    1474.25    3     491.42
Residual      181.25     28    6.47

The increase in the regression sum of squares, that is, 293.39, is the sum of squares that is due to the interaction of A and B, conditional on A and B, that is, AB|A,B. The estimates of the regression coefficients at this stage are

$$\hat{\mu} = 28.606, \quad \hat{\alpha}_1 = -0.667, \quad \hat{\beta}_1 = -5.115, \quad \hat{\gamma}_{11} = 3.302.$$

Now enter the variables in the order $x_2$ followed by $x_1$ (adding $x_3$ after $x_2$ and $x_1$ will give the same results as step 3 above).

Step 1: $x_2$ entered. The regression ANOVA table is as follows.

Source        SS         DF    MS
Regression    1177.70    1     1177.70
Residual      477.80     30    15.93

The regression sum of squares is that for B for an unbalanced design. The estimates of the regression coefficients at this stage are

$$\hat{\mu} = 29.745, \quad \hat{\beta}_1 = -6.078.$$

Step 2: $x_2$ entered followed by $x_1$. The regression ANOVA table is as follows.

Source        SS         DF    MS
Regression    1180.85    2     590.42
Residual      474.65     29    16.37

The estimates of the regression coefficients at this stage are

$$\hat{\mu} = 29.667, \quad \hat{\alpha}_1 = -0.341, \quad \hat{\beta}_1 = -5.977.$$

Note how the regression estimates for a variable alter, depending on which stage the variable is entered into the model.

4. It may be possible to find a more parsimonious model for the data, that is, one with fewer explanatory variables, by using all subsets regression or one of the stepping methods. Care is required when the latter is used as implemented in a statistical package.
5. An extremely important aspect of a regression analysis is the inspection of a number of regression diagnostics in a bid to identify any departures from assumptions, outliers, and so on.
6. The multiple linear regression model and the analysis of variance models described in earlier chapters are equivalent.

COMPUTER HINTS

SPSS
To conduct a multiple regression analysis with one set of explanatory variables, use the following basic steps.
1. Click Statistics, click Regression, and click Linear.
2. Click the dependent variable from the variable list and move it to the Dependent box.
3. Hold down the ctrl key, and click on the explanatory variables and move them to the Independent box.
4. Click Statistics, and click Descriptives.

S-PLUS
In S-PLUS, regression can be used by means of the Statistics menu as follows.
1. Click Statistics, click Regression, and click Linear to get the Multiple Regression dialog box.
2. Select dependent variable and explanatory variables, or define the regression of interest in the Formula box.
3. Click on the Plot tag to select residual plots, and so on.


Multiple regression can also be applied by using the command language with the lm function with a formula specifying what is to be regressed on what. For example, if the crime in the USA data in the text are stored in a data frame, crime, with variable names as in the text, multiple regression could be applied by using

lm(R ~ Age + S + Ed + Ex1 + LF + M + N + NW + U1 + U2 + W + X, data = crime)


After fitting a model, S-PLUS has extensive facilities for finding many types of regression diagnostics and plotting them. All subsets regression is available in S-PLUS by using the leaps function.

EXERCISES
6.1. Explore other possible models for the vocabulary data. Possibilities are to include a quadratic age term or to model the log of vocabulary size.

6.2. Examine the four data sets shown in Table 6.16. Also show that the estimates of the slope and intercept parameters in a simple linear regression are the
TABLE 6.16 Four Hypothetical Data Sets, Each Containing 11 Observations for Two Variables

               Data Sets 1-3    Data Set 1    Data Set 2    Data Set 3    Data Set 4    Data Set 4
Observation    x                y             y             y             x             y
1              10.0             8.04          9.14          7.46          8.0           6.58
2              8.0              6.95          8.14          6.77          8.0           5.76
3              13.0             7.58          8.74          12.74         8.0           7.71
4              9.0              8.81          8.77          7.11          8.0           8.84
5              11.0             8.33          9.26          7.81          8.0           8.47
6              14.0             9.96          8.10          8.84          8.0           7.04
7              6.0              7.24          6.13          6.08          8.0           5.25
8              4.0              4.26          3.10          5.39          19.0          12.50
9              12.0             10.84         9.13          8.15          8.0           5.56
10             7.0              4.82          7.26          6.42          8.0           7.91
11             5.0              5.68          4.74          5.73          8.0           6.89

TABLE 6.17 Memory Retention

t (minutes)    p
1              0.84
5              0.71
15             0.61
30             0.56
60             0.54
120            0.47
240            0.45
480            0.38
720            0.36
1440           0.26
2880           0.20
5760           0.16
10080          0.08

TABLE 6.18 Marriage and Divorce Rates per 1000 per Year for 14 Countries

Marriage Rate    Divorce Rate
5.6              2.0
6.0              3.0
5.1              2.9
5.0              1.9
6.7              2.0
6.3              2.4
5.4              0.4
6.1              1.9
4.9              2.2
6.8              1.3
5.2              2.2
6.8              2.0
6.1              2.9
9.7              4.8

TABLE 6.19 Quality of Children's Testimonies

Age    Gender    Location    Coherence    Maturity    Delay    Prosecute    Quality
5-6    Male      3           3.81         3.62        45       No           34.11
5-6    Female    2           1.63         1.61        27       Yes          36.59
5-6    Male      1           3.54         3.63        102      No           37.23
5-6    Female    2           4.21         4.11        39       No           39.65
5-6    Male      3           3.30         3.12        41       No           42.07
5-6    Female    3           2.32         2.13        70       Yes          44.91
5-6    Female    4           4.51         4.31        72       No           45.23
5-6    Female    2           3.18         3.08        41       No           47.53
8-9    Male      3           3.02         3.00        71       No           54.64
8-9    Female    2           2.77         2.71        56       Yes          57.87
8-9    Male      3           3.35         3.07        88       Yes          57.07
5-6    Male      1           2.66         2.72        13       No           45.81
5-6    Female    3           4.70         4.98        29       No           49.38
5-6    Male      2           4.31         4.21        39       Yes          49.53
8-9    Female    2           2.16         2.91        10       No           67.08
8-9    Male      4           1.89         1.87        15       Yes          83.15
8-9    Female    2           1.94         1.99        46       Yes          80.67
8-9    Female    3           2.86         2.93        57       No           78.47
5-6    Female    4           3.11         3.01        26       Yes          77.59
8-9    Male      2           2.90         2.87        14       No           76.28
8-9    Male      4           2.41         2.38        45       No           59.64
8-9    Male      4           2.32         2.33        19       Yes          68.44
5-6    Male      4           2.78         2.79        9        Yes          65.07

same for each set. Find the value of the multiple correlation coefficient for each data set. (This example illustrates the dangers of blindly fitting a regression model without the use of some type of regression diagnostic.)

6.3. Plot some suitable diagnostic graphics for the four data sets in Table 6.16.
6.4. The data shown in Table 6.17 give the average percentage of memory retention, p , measured against passing time, t (minutes). The measurements were taken five times during the first hour after subjects memorized a list of disconnected items, and then at various times up to a week later. Plot the data (after a suitable transformation if necessary) and investigate the relationship between retention and time by using a suitable regression model.

6.5. As mentioned in Chapter 3, ANOVA models are usually presented in overparameterized form. For example, in a one-way analysis of variance, the constraint $\sum_{i=1}^{k} \alpha_i = 0$ is often introduced to overcome the problem. However, the overparameterization in the one-way ANOVA can also be dealt with by setting one of the $\alpha_i$ equal to zero. Carry out a multiple regression of the fruit fly data that is equivalent to the ANOVA model with $\alpha_3 = 0$. In terms of group means, what are the parameter estimates in this case?
6.6. For the final model used for the crime rate data, examine plots of the diagonal elements of the hat matrix against observation number, and produce a similar index plot for the values of Cook's distance statistic. Do these plots suggest any amendments to the analysis?

6.7. Table 6.18 shows marriage and divorce rates (per 1000 population per year) for 14 countries. Derive the linear regression equation of divorce rate on marriage rate and show the fitted line on a scatterplot of the data. On the basis of the regression line, predict the divorce rate for a country with a marriage rate of 8 per 1000 and also for a country with a marriage rate of 14 per 1000. How much conviction do you have in each prediction?
6.8. The data shown in Table 6.19 are based on those collected in a study of the quality of statements elicited from young children. The variables are statement quality; child's gender, age, and maturity; how coherently the child gave evidence; the delay between witnessing the incident and recounting it; the location of the interview (the child's home, school, a formal interviewing room, or an interview room specially constructed for children); and whether or not the case proceeded to prosecution. Carry out a complete regression analysis on these data to see how statement quality depends on the other variables, including selecting the best subset of explanatory variables and examining residuals and other regression diagnostics. Pay careful attention to how the categorical explanatory variables with more than two categories are coded.

Analysis of Longitudinal Data

7.1. INTRODUCTION

Longitudinal data were introduced as a special case of repeated measures in Chapter 5. Such data arise when subjects are measured on the same variable (or, in some cases, variables), on several different occasions. Because the repeated measures arise, in this case, solely from the passing of time, there is no possibility of randomizing the occasions, and it is this that essentially differentiates longitudinal data from other repeated measure situations arising in psychology, where the subjects are observed under different conditions, or combinations of conditions, which are usually given in different orders to different subjects. The special structure of longitudinal data makes the sphericity condition described in Chapter 5, and methods that depend on it for their validity, very difficult to justify. With longitudinal data it is very unlikely that the measurements taken close to one another in time will have the same correlation as measurements made at more widely spaced time intervals. The analysis of longitudinal data has become something of a growth industry in statistics, largely because of its increasing importance in clinical trials (see Everitt and Pickles, 2000). Here we shall cover the area only briefly, concentrating

CHAPTER 7

on one very simple approach and one more complex, regression-based modeling procedure. However, first the question of why the observations for each occasion should not be separately analyzed has to be addressed. When two groups of subjects are being compared, as with the salsolinol data in Chapter 5 (see Table 5.2), this occasion-by-occasion approach would involve a series of t tests. When more than two groups are present, a series of one-way analyses of variance would be required. (Alternatively, some distribution-free equivalent might be used; see Chapter 8.) The procedure is straightforward but has a number of serious flaws and weaknesses. The first is that the series of tests performed are not independent of one another, making interpretation difficult. For example, a succession of marginally significant group mean differences might be collectively compelling if the repeated measurements are only weakly correlated, but much less convincing if there is a pattern of strong correlations. The occasion-by-occasion procedure also assumes that each repeated measurement is of separate interest in its own right. This is unlikely in general, because the real concern is likely to involve something more global; in particular, the series of separate tests provide little information about the longitudinal development of the mean response profiles. In addition, the separate significance tests do not give an overall answer to whether or not there is a group difference and, of particular importance, do not provide a useful estimate of the overall treatment effect.

7.2. RESPONSE FEATURE ANALYSIS: THE USE OF SUMMARY MEASURES


A relatively straightforward approach to the analysis of longitudinal data is to first transform the T (say) repeated measurements for each subject into a single number considered to capture some important aspect of a subject's response profile; that is, for each subject, calculate S given by

$$S = f(x_1, x_2, \ldots, x_T),$$

where $x_1, x_2, \ldots, x_T$ are the repeated measures for the subject and f represents the chosen summary function. The chosen summary measure S has to be decided on before the analysis of the data begins and should, of course, be relevant to the particular questions that are of interest in the study. Commonly used summary measures are (1) overall mean, (2) maximum (minimum) value, (3) time to maximum (minimum) response, (4) slope of regression line of response on time, and (5) time to reach a particular value (e.g., a fixed percentage of baseline).

Having identified a suitable summary measure, the analysis of differences in the levels of the between subject factor(s) is reduced to a simple t test (two groups) or an analysis of variance (more than two groups or more than a single between subjects factor). (Alternatively, the distribution-free equivalents might be used if there is any evidence of a departure from normality in the chosen summary measure.)

As a first example of the use of the summary measure approach to longitudinal data, it will be applied to the salsolinol data given in Chapter 5. (Because the data are clearly very skewed, we shall analyze the log-transformed observations.) We first need to ask, What is a suitable summary measure for these data? Here the difference in average excretion level is likely to be one question of interest, so perhaps the most obvious summary measure to use is simply the average response of each subject over the four measurement occasions. Table 7.1 gives the values of this summary measure for each subject and also the result of constructing a confidence interval for the difference in the two groups. The confidence interval contains the value zero, and so, strictly, no claim can be made that the data give any evidence of a difference between the average excretion levels of those people who are moderately dependent and those who are severely dependent on alcohol. However, when a more pragmatic view is taken of the lack of symmetry of the confidence interval around zero, it does appear that subjects who are severely dependent on alcohol may tend to have a somewhat higher salsolinol excretion level.

Now let us examine a slightly more complex example. The data given in Table 7.2 result from an experiment in which three groups of rats were put on different diets, and after a settling in period their bodyweights (in grams) were recorded weekly for nine weeks. Bodyweight was also recorded once before the diets began. Some observations are missing (indicated by NA in Table 7.2) because of the failure to record a bodyweight on particular occasions as intended. Before the response feature approach is applied to these data, two possible complications have to be considered.
1. What should be done about the prediet bodyweight observation taken on day 1?
2. How should the missing observations be dealt with?

We shall leave aside the first of these questions for the moment and concentrate on only the nine postdiet recordings. The simplest answer to the missing values problem is just to remove rats with such values from the analysis. This would leave 11 of the original 16 rats for analysis. This approach may be simple but it is not good! Using it in this case would mean an almost 33% reduction in sample size, which is clearly very unsatisfactory. An alternative to simply removing rats with any missing values is to calculate the chosen summary measure from the available measurements on a rat. In this way, rats with different numbers of repeated measurements can all contribute to the analysis. Adopting this approach, and again using the mean as the chosen summary measure, leads to the values in Table 7.3 on which to base the analysis. The results of a one-way analysis of variance of these measures, and the Bonferroni

TABLE 7.1 Summary Measure Results for Logged Salsolinol Data (log to base 10 used)

Average Response for Each Subject

Group                 Subject    Average Response
Group 1 (moderate)    1          0.06
                      2          0.19
                      3          0.19
                      4          -0.02
                      5          -0.08
                      6          0.17
Group 2 (severe)      1          -0.05
                      2          0.28
                      3          0.52
                      4          0.20
                      5          0.05
                      6          0.41
                      7          0.37
                      8          0.06

Means and Standard Deviations

Group       Mean      SD        n
Moderate    0.0858    0.1186    6
Severe      0.2301    0.1980    8

Note. Log to base 10 is used. The 95% confidence interval for the difference between the two groups is

$$(0.2301 - 0.0858) \pm t_{12} \times s \times \sqrt{1/6 + 1/8},$$

where $s^2$ is the assumed common variance in the two groups and is given by

$$s^2 = \frac{5 \times 0.1186^2 + 7 \times 0.1980^2}{8 + 6 - 2} = 0.0287.$$

These calculations lead to the interval (-0.0552, 0.3438).
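The interval in the note to Table 7.1 can be reproduced from the group means, standard deviations, and sample sizes alone. The sketch below does this with the standard library only; the tabulated value 2.1788 for the upper 2.5% point of the t distribution with 12 degrees of freedom is supplied by hand as an assumption rather than computed.

```python
import math

# Summary statistics from Table 7.1 (log base 10 salsolinol, subject means)
mean_moderate, sd_moderate, n_moderate = 0.0858, 0.1186, 6
mean_severe, sd_severe, n_severe = 0.2301, 0.1980, 8

T_12 = 2.1788  # assumed tabulated 97.5% point of t with 12 df

df = n_moderate + n_severe - 2
# Pooled estimate of the assumed common variance
pooled_var = ((n_moderate - 1) * sd_moderate**2
              + (n_severe - 1) * sd_severe**2) / df
# Standard error of the difference in group means
se = math.sqrt(pooled_var * (1 / n_moderate + 1 / n_severe))

diff = mean_severe - mean_moderate
lower, upper = diff - T_12 * se, diff + T_12 * se
print(round(pooled_var, 4), round(lower, 3), round(upper, 3))
```

Small differences in the final decimal place from the values in the table arise from rounding of the summary statistics.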

multiple comparisons tests, are given in Table 7.4. (The Scheffé procedure gives almost identical results for these data.) Clearly the mean bodyweight of Group 1 differs from the mean bodyweight of Groups 2 and 3, but the means for the latter do not differ. (It should be noted here that when the number of available repeated measures differs from subject to subject in a longitudinal study, the calculated summary

TABLE 7.2 Bodyweights of Rats (grams)

Group    Day: 1    8    15    22    29    36    43    50    57    64

1
1

240 225 255 245

1 1 1 1

2 2 2 2 3 3 3 3

250 230 250 255 260 265 275 255 415 420 445 560 465 525 525 510

255 230
250

260
NA

262
240

258
240

255 255 270 260 260 425 430 450 565 475 530 530 520

255 265 270 275 270 268 428 440


NA

262 265 270 275

NA
270 438 448 455 590 481 535 545 530

580 485 533 540


NA

265 268 273 271 274 265 443 460 455 591 493
540 546

266 243 261 270

NA
278 276 265 442 458 45 1 595 493 525 538 535

538

265 238 264 274 276 284 282 273 456 415 462 612 507 543 553 550

272 247 268 273 278 219


NA

274 468 484 466 618 518 544 555 553

218 245 269 215 280 28l 284 278 478 4% 472 628 525 559 548
NA

Note. NA, missing value; 1, Diet 1; 2, Diet 2; 3, Diet 3. Diets begin after the measurement of bodyweight on day 1.

measures are likely to have differing precisions. The analysis should then, ideally, take this into account by using some form of weighting. This was not attempted in the rat data example because it involves techniques outside the scope of this text. However, for those readers courageous enough to tackle a little mathematics, a description of a statistically more satisfactory procedure is given in Gornbein, Lazaro, and Little, 1992.) Having addressed the missing value problem, we now need to consider if and how to incorporate the prediet bodyweight recording (known generally as a baseline measurement) into an analysis. Such a pretreatment value can be used in association with the response feature approach in a number of ways. For example, if the average response over time is the chosen summary (as in the two examples above), there are three possible methods of analysis (Frison and Pocock, 1992).
1. POST: an analysis that ignores the baseline values available and analyzes only the mean of the postdiet responses (the analysis reported in Table 7.4).

TABLE 7.3 Means of Available Posttreatment Observations for Rat Data

Group      Rat    Avg. Bodyweight
Group 1    1      262.9
           2      239.2
           3      261.1
           4      266.9
           5      270.6
           6      276.0
           7      272.1
           8      267.8
Group 2    1      444.1
           2      457.4
           3      456.9
           4      593.9
Group 3    1      495.4
           2      537.7
           3      542.7
           4      534.7

TABLE 7.4 ANOVA of Means Given in Table 7.3

Source    DF    SS        MS        F       p
Group     2     239475    119738    89.6    <.001
Error     13    17375     1337

Bonferroni Multiple Comparisons

Comparison Estimate

SE
-285 -223.0 -263.0 -39.5 22.4 25.9

Bound Lower

UpperBound

gl-g2
g1-g3 22.4 @-g3

-324 -111

-120 -0' 21 31.4


2. CHANGE: an analysis that involves the differences of the mean postdiet weights and the prediet value.
3. ANCOVA: here between subject variation in the prediet measurement is taken into account by using the prediet recording as a covariate in a linear model for the comparison of postdiet means.
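The three methods listed above can be viewed as three ways of correcting the observed postdiet difference by a multiple of the prediet difference: zero for POST, one for CHANGE, and an estimated regression slope for ANCOVA. The following sketch illustrates this on simulated data for two randomized groups; the sample size, variances, and true effect of 5 units are hypothetical choices of ours and have nothing to do with the rat data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500                       # subjects per group (hypothetical)
true_effect = 5.0             # true treatment difference (hypothetical)
group = np.repeat([0.0, 1.0], n)

# Each subject has a stable level plus occasion-specific noise
subject = rng.normal(0, 10, 2 * n)
baseline = 100 + subject + rng.normal(0, 5, 2 * n)
post_mean = 100 + subject + true_effect * group + rng.normal(0, 5, 2 * n)

def group_difference(values):
    """Mean of group 1 minus mean of group 0."""
    return values[group == 1].mean() - values[group == 0].mean()

# POST: compare posttreatment means only
post_estimate = group_difference(post_mean)
# CHANGE: compare posttreatment mean minus baseline
change_estimate = group_difference(post_mean - baseline)
# ANCOVA: regress posttreatment mean on group and baseline
X = np.column_stack([np.ones(2 * n), group, baseline])
coef, *_ = np.linalg.lstsq(X, post_mean, rcond=None)
ancova_estimate, slope = coef[1], coef[2]

print("POST  ", round(post_estimate, 2))
print("CHANGE", round(change_estimate, 2))
print("ANCOVA", round(ancova_estimate, 2), "slope", round(slope, 2))
```

The ANCOVA estimate equals the POST estimate minus the fitted slope times the baseline difference between the groups, so the three methods differ only in how heavily they weight the baseline difference.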

Each method represents a different way of adjusting the observed difference at outcome by using the prediet difference. POST, for example, relies on the fact that when groups are formed by randomization, then in the absence of a difference between diets, the expected value of the mean difference at outcome is zero. Hence, the factor by which the observed outcome requires correction in order to judge the diet effect is also zero. The CHANGE approach corresponds to the assumption that the difference at outcome, in the absence of a diet effect, is expected to be equal to the differences in the means of the prediet values. That such an assumption is generally false is well documented; see, for example, Senn (1994a, 1994b). The difficulty primarily involves the regression to the mean phenomenon. This refers to the process that occurs as transient components of an initial measurement are dissipated over time. Selection of high-scoring individuals for entry into a study, for example, necessarily also selects for individuals with high values of any transient component that might contribute to that score. Remeasurement during the study will tend to show a declining mean value for such groups. Consequently, groups that initially differ through the existence of transient phenomena, such as some forms of measurement error, will show a tendency to have converged on remeasurement. Randomization ensures only that treatment groups are similar in terms of expected values and so may actually differ not just in transient phenomena but also in more permanent components of the observed scores. Thus, although the dissipation of transient components may bring about regression to the mean phenomena such as those previously described, the extent of regression and the mean value to which separate groups are regressing need not be expected to be the same.
Analysis of covariance (ANCOVA) allows for such differences. The use of ANCOVA allows for some more general system of predicting what the outcome difference would have been in the absence of any diet effect, as a function of the mean difference prediet. Frison and Pocock (1992) compare the three approaches and show that an analysis of covariance is more powerful than both the analysis of change scores and analysis of posttreatment means only. (Using the mean of several baseline values, if available, makes the analysis of covariance even more efficient, if there are moderate correlations between the repeated measures.) The differences between the three approaches can be illustrated by comparing power curves calculated by using the results given in Frison and Pocock (1992). Figures 7.1, 7.2, and 7.3 show some examples for the situation with two treatment groups, a pretreatment measure, nine posttreatment measures, and varying degrees of correlation between the repeated

[Figure: Power curves for three methods of analyzing repeated measure designs. Curves: Ancova, Post scores, Change scores. Horizontal axis: Sample size.]

FIG. 7.1. Power curves comparing POST, CHANGE, and ANCOVA.

observations (the correlations between pairs of repeated measures are assumed equal in the calculation of these power curves). From these curves, it can be seen that the sample size needed to achieve a particular power for detecting a standardized treatment difference of 0.5 is always lower with an ANCOVA. When the correlation between pairs of repeated measures is small (0.3), CHANGE is worse than simply ignoring the pretreatment value available and simply using POST. As the correlation increases, CHANGE approaches ANCOVA in power, with both being considerably better than POST. The results of applying each of CHANGE and ANCOVA to the rat data are given in Table 7.5. (The POST results were given previously in Table 7.4.) Here both POST and CHANGE suggest the presence of group differences, but ANCOVA finds the group difference not significant at the 5% level, suggesting that the observed postdiet differences are accountable for by the prediet differences. It is clear that the three groups of rats in this experiment have very different initial bodyweights, and the warnings about applying an analysis of covariance in such cases spelled out in Chapter 3 have to be kept in mind. Nevertheless, it might be argued that the postdiet differences in bodyweight between the three groups are

[Figure: Power curves for three methods of analyzing repeated measure designs. Alpha = 0.05, Rho = 0.6. Curves include Post scores. Horizontal axis: Sample size.]

FIG. 7.2. Power curves comparing POST, CHANGE, and ANCOVA.

not the results of the different diets. A telling question that might be asked here is, why were the three diets tried on animals, some of which were markedly lighter than the rest? The response feature method has a number of advantages for the analysis of longitudinal data. Three given by Matthews (1993) are as follows.

1. An appropriate choice of summary measure ensures that the analysis is focused on relevant and interpretable aspects of the data.
2. The method is statistically respectable.
3. To some extent, missing and irregularly spaced observations can be accommodated.

A disadvantage of the response feature approach is that it says little about how the observations evolve over time, and whether different groups of subjects behave in a similar way as time progresses. When such issues are of interest, as they generally are, an alternative is needed, such as the more formal modeling procedures that are described in the next section.
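The earlier comparison of POST, CHANGE, and ANCOVA can be sketched numerically. The Python code below is an illustration only, not a reproduction of the Frison and Pocock calculations behind Figures 7.1 to 7.3. It assumes one baseline and r follow-up measures with common variance and a common correlation rho between every pair, uses the variance of each summary that follows from those assumptions, and applies a normal-approximation sample size formula that ignores the degree of freedom used by the covariate. The function names and the choice of 80% power are assumptions.

```python
from math import ceil
from statistics import NormalDist

def variance_factors(rho, r):
    """Variance of each summary, in units of sigma^2, for one baseline and r
    follow-up measures with common variance and common correlation rho."""
    post = (1 + (r - 1) * rho) / r     # Var(mean of follow-ups)
    change = post + 1 - 2 * rho        # Var(mean of follow-ups - baseline)
    ancova = post - rho ** 2           # residual variance after adjusting for baseline
    return {"POST": post, "CHANGE": change, "ANCOVA": ancova}

def n_per_group(factor, delta=0.5, alpha=0.05, power=0.8):
    """Normal-approximation sample size per group for standardized difference delta."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(2 * factor * z ** 2 / delta ** 2)

for rho in (0.3, 0.6, 0.9):
    f = variance_factors(rho, r=9)
    print(rho, {k: n_per_group(v) for k, v in f.items()})
```

Under these assumptions the ANCOVA factor is never larger than the other two, CHANGE is worse than POST when rho is 0.3, and CHANGE is close to ANCOVA when rho is 0.9, which matches the qualitative pattern described for the power curves.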

TABLE 7.5
CHANGE and ANCOVA for the Rat Data

Source          DF    SS          MS          F         p

CHANGE
Group            2    1098
Error           13    1659

ANCOVA
Pretreatment     1    254389.0    254389.0    1972.01   <.0001
Group            2    912
Error           12    1550

[Figure: Power curves for three methods of analyzing repeated measure designs. Rho = 0.9, Alpha = 0.05. Curves: Ancova, Post scores, Change scores. Horizontal axis: Sample size.]

FIG. 7.3. Power curves comparing POST, CHANGE, and ANCOVA.


7.3. RANDOM EFFECT MODELS FOR LONGITUDINAL DATA


A detailed analysis of longitudinal data requires consideration of models that represent both the level and the slope of a group's profile of repeated measurements, and also account adequately for the observed pattern of dependences in those measurements. Such models are now widely available, but confusingly they are often referred to by different names, including, for example, random growth curve models, multilevel models, random effects models, and hierarchical models. Such models have witnessed a huge increase in interest in recent years, and there is now a considerable literature surrounding them. Here we shall try to give only the flavor of the possibilities, by examining one relatively straightforward example. Fuller accounts of this increasingly important area of statistics are available in Brown and Prescott (1999) and Everitt and Pickles (2000).

7.3.1. The Treatment of Alzheimer's Disease

The data in Table 7.6 arise from an investigation of the use of lecithin, a precursor of choline, in the treatment of Alzheimer's disease. Traditionally, it has been assumed that this condition involves an inevitable and progressive deterioration in all aspects of intellect, self-care, and personality. Recent work suggests that the disease involves pathological changes in the central cholinergic system, which it might be possible to remedy by long-term dietary enrichment with lecithin. In particular, the treatment might slow down or perhaps even halt the memory impairment associated with the condition. Patients suffering from Alzheimer's disease were randomly allocated to receive either lecithin or placebo for a 6-month period. A cognitive test score giving the number of words recalled from a previously standard list was recorded monthly for 5 months.

As with most data sets, it is important to graph the data in Table 7.6 in some informative way prior to more formal analysis. According to Diggle, Liang, and Zeger (1994), there is no single prescription for making effective graphical displays of longitudinal data, although they do offer the following simple guidelines.

1. Show as much of the relevant raw data as possible rather than only data summaries.
2. Highlight aggregate patterns of potential scientific interest.
3. Identify both cross-sectional and longitudinal patterns.
4. Make easy the identification of unusual individuals or unusual observations.

Three graphical displays of the data in Table 7.6 are shown in Figures 7.4, 7.5, and 7.6. In the first, the raw data for each patient are plotted, and they are labeled with the group to which they have been assigned. There is considerable variability

TABLE 7.6
The Treatment of Alzheimer's Disease

Viii Group 1

l 1

U)

14
7

15 12
5

14

13
8

1 1
1

12 5
9 9
9

10
6

13

10

9 7 18

LO
7
3 17
9

1 1 1 1
1

11

5 7 4
8 5

10

15

9 9

7 16 9

6 14

12
11

12
7 8 9
4

9 12
3 10

1 1 l l

11
10

11

3 8

5
9 5 9 9

17

1 1 I

16
7

2 12 l5

14

10

12
7 1 7 14 6
14 8

1 1

5
16

2 7
2 7

1 1 1 1 2

9 19 7

0 7 l 11 16
3 1 3

7 7 0 6

0 4

10 7

2 5

2
8 6 6
10 18

5 12
8 17

5
6

9
1 3 11 7

5 12
7

2
2

2
2

I8
10

16 10 14

15

21

2 2 2 2 2 2 2 2 2 2
2

9 6

11 10
18 10 3

12
8

l2

8 3 4

11 19 3

14 9 14 12 19

16 21 l5 12

16 14

22
8 18

l1
1

10
3 7 3

11 10

7 17 I5

2
7

6
18 15 10

l8

15

3 19

4 9 4

16 5
6

10

2 2 2 2 2 2

14

l5
16 7 13
1 4 3
1 3

22 l8
17 9 16 7 16
17

22

19

19

10

6 9 4 4

9 3 13 11

10

20
9 19 21

Note. 1, placebo; 2, lecithin groups.

FIG. 7.4. Plot of individual patient profiles for the lecithin trial data in Table 7.6 (horizontal axis: Visit).

FIG. 7.5. Plot of individual patient data from the lecithin trial for each treatment group, showing fitted linear regressions of cognitive score on visit.

FIG. 7.6. Box plots of lecithin trial data.

in these profiles, although patients on lecithin appear to have some increase in their cognitive score over time. This observation appears to be confirmed by Figure 7.5, which shows each group's data, and the fitted linear regression of cognitive score on visit. Finally, the box plots given in Figure 7.6 for each group and each visit give little evidence of outliers, or of worrying differences in the variability of the observations on each visit. A further graphic that is often useful for longitudinal data is the scatterplot matrix introduced in Chapter 2. Figure 7.7 shows such a plot for the repeated measurements in the lecithin trial data, with each point labeled by group (0, placebo; 1, lecithin). Clearly, some pairs of measurements are more highly related than others.

Now let us consider some more formal models for this example. We might begin by ignoring the longitudinal character of the data and use a multiple regression model as described in Chapter 6. We could, for example, fit a model for cognitive score that includes, as the explanatory variables, Visit (with values 1, 2, 3, 4, 5) and Group (0, placebo; 1, lecithin). That is,

$$ y_{ijk} = \beta_0 + \beta_1 \mathrm{Visit}_j + \beta_2 \mathrm{Group}_i + \epsilon_{ijk}, \qquad (7.2) $$

where $y_{ijk}$ is the cognitive score for subject $k$ on visit $j$, in group $i$, and the $\epsilon_{ijk}$ are error terms assumed normally distributed with mean zero and variance $\sigma^2$. Note

FIG. 7.7. Scatterplot matrix of the cognitive scores on the five visits of the lecithin trial, showing treatment group (0, placebo; 1, lecithin).

TABLE 7.7
Results of Fitting the Regression Model Described in Eq. (7.2) to the Data in Table 7.6

Parameter           Estimate    SE       t        p

β0 (Intercept)      6.93        0.802    8.640    <.001
β1 (Visit)          0.494       0.224    2.199    .029
β2 (Group)          3.056       0.636    4.803    <.001

Note. R² = 0.107; F for testing that all regression coefficients are zero is 14 on 2 and 232 degrees of freedom; p value is <.001.

that this model takes no account of the real structure of the data; in particular, the observations made at each visit on a subject are assumed independent of each other. The results of fitting the model specified in Eq. (7.2) to the data in Table 7.6 are given in Table 7.7. The results suggest both a significant Visit effect, and a significant Group effect.

Figure 7.7 clearly demonstrates that the repeated measurements in the lecithin trial have varying degrees of association so that the model in Eq. (7.2) is quite unrealistic. How can the model be improved? One clue is provided by the plot of the confidence intervals for each patient's estimated intercept and slope in a regression of his or her cognitive score on Visit, shown in Figure 7.8. This suggests that certainly the intercepts and possibly also the slopes of the patients vary widely. A suitable way to model such variability is to introduce random effect terms, and in Display 7.1 a model where this is done for the intercepts only is described. The implications of this model for the relationship between the repeated measurements are also described in Display 7.1. The results given there show that although introduced in a somewhat different manner, the model in Display 7.1 is essentially nothing more than the repeated measure ANOVA model introduced in Chapter 5, but now written in regression terms for the multiple response variables, that is, the repeated measures. In particular, we see that the model in Display 7.1 only allows for a compound symmetry pattern of correlations for the repeated measurements. The results of fitting the model in Display 7.1 to the lecithin data are given in Table 7.8. Note that the standard error for the Group regression coefficient is greater in the random effects model than in the multiple regression model (see Table 7.7), and the reverse is the case for the regression coefficient of visit. This arises because the model used to obtain the results in Table 7.7,

FIG. 7.8. Confidence intervals for intercepts and slopes of the linear regression of cognitive score on visit for each individual patient in the lecithin trial.

Eq. (7.2), ignores the repeated measures aspect of the data and incorrectly combines the between group and the within group variations in the residual standard error. The model in Display 7.1, implying as it does the compound symmetry of the correlations between the cognitive scores on the five visits, is unlikely to capture the true correlation pattern suggested by Figure 7.7. One obvious extension of the model is to allow for a possible random effect of slope as well as of intercept, and such a model is described in Display 7.2. Note that this model allows for a more complex correlational pattern between the repeated measurements. The results of fitting this new model are given in Table 7.9. Random effects models can be compared in a variety of ways, one of which is described in Display 7.3. For the two random effects models considered above, the AIC (Akaike information criterion) values are as follows. Random intercepts only, AIC = 1282; random intercepts and random slopes, AIC = 1197.

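Display 7.1
Simple Random Effects Model for the Lecithin Trial Data

The model is an extension of the multiple regression model in Eq. (7.2) to include a random intercept term for each subject:

$$ y_{ijk} = (\beta_0 + a_k) + \beta_1 \mathrm{Visit}_j + \beta_2 \mathrm{Group}_i + \epsilon_{ijk}, $$

where the $a_k$ are random effects that model the shift in the intercept for each subject, which, because there is a fixed change for visit, are preserved for all values of visit.
The $a_k$ are assumed to be normally distributed with mean zero and variance $\sigma_a^2$.
All other terms in the model are as in Eq. (7.2).
Between subject variability in the intercepts is now modeled explicitly.
The model can be written in matrix notation as

$$ \mathbf{y}_{ik} = \mathbf{X}_i \boldsymbol{\beta} + \mathbf{Z} a_k + \boldsymbol{\epsilon}_{ik}, $$

where

$$ \boldsymbol{\beta}' = [\beta_0, \beta_1, \beta_2], \qquad \mathbf{y}_{ik}' = [y_{i1k}, y_{i2k}, y_{i3k}, y_{i4k}, y_{i5k}], $$

$$ \mathbf{X}_i = \begin{pmatrix} 1 & 1 & \mathrm{Group}_i \\ 1 & 2 & \mathrm{Group}_i \\ 1 & 3 & \mathrm{Group}_i \\ 1 & 4 & \mathrm{Group}_i \\ 1 & 5 & \mathrm{Group}_i \end{pmatrix}, \qquad \mathbf{Z}' = [1, 1, 1, 1, 1], \qquad \boldsymbol{\epsilon}_{ik}' = [\epsilon_{i1k}, \epsilon_{i2k}, \epsilon_{i3k}, \epsilon_{i4k}, \epsilon_{i5k}]. $$

The presence of the random effect that is common to each visit's observations for subject $k$ allows the repeated measurements to have a covariance matrix of the following form:

$$ \boldsymbol{\Sigma} = \mathbf{Z} \sigma_a^2 \mathbf{Z}' + \sigma^2 \mathbf{I}, $$

where $\mathbf{I}$ is a 5 x 5 identity matrix. This reduces to

$$ \boldsymbol{\Sigma} = \begin{pmatrix} \sigma_a^2 + \sigma^2 & \sigma_a^2 & \cdots & \sigma_a^2 \\ \sigma_a^2 & \sigma_a^2 + \sigma^2 & \cdots & \sigma_a^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_a^2 & \sigma_a^2 & \cdots & \sigma_a^2 + \sigma^2 \end{pmatrix}. $$

This is simply the compound symmetry pattern described in Chapter 5.
The model can be fitted by maximum likelihood methods, as described in Brown and Prescott (1999).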

TABLE 7.8
Results of Fitting the Model Described in Display 7.1 to the Data in Table 7.6

Parameter           Estimate    SE       t       p

β0 (Intercept)      6.927       0.916    7.56    <.001
β1 (Visit)          0.494       0.133    3.70    <.001
β2 (Group)          3.055       1.205    2.54    .015

Note. Random effects: σ̂_a = 3.89, σ̂ = 2.87.
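The compound symmetry structure implied by the random intercept model can be made concrete with the estimates in the note to Table 7.8. The following Python sketch is an illustration only; the function name is hypothetical, and the code simply evaluates the covariance matrix given in Display 7.1 at the reported standard deviations.

```python
def compound_symmetry(sigma_a, sigma, n_visits=5):
    """Covariance matrix Z * sigma_a^2 * Z' + sigma^2 * I for a random intercept model."""
    return [[sigma_a ** 2 + (sigma ** 2 if i == j else 0.0)
             for j in range(n_visits)] for i in range(n_visits)]

sigma_a, sigma = 3.89, 2.87          # estimates quoted in the note to Table 7.8
cov = compound_symmetry(sigma_a, sigma)
corr = cov[0][1] / cov[0][0]         # common correlation between any two visits
print(round(cov[0][0], 2), round(cov[0][1], 2), round(corr, 2))
```

Every pair of visits receives the same implied correlation, roughly 0.65 with these estimates, which is exactly the restriction that the scatterplot matrix in Figure 7.7 calls into question.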

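Display 7.2
Random Intercept, Random Slope Model for the Lecithin Trial Data

The model in this case is

$$ y_{ijk} = (\beta_0 + a_k) + (\beta_1 + b_k) \mathrm{Visit}_j + \beta_2 \mathrm{Group}_i + \epsilon_{ijk}. $$

A further random effect has been added compared with the model outlined in Display 7.1. The term $b_k$ shifts the slope for each subject.
The $b_k$ are assumed to be normally distributed with mean zero and variance $\sigma_b^2$.
The random effects are not necessarily assumed to be independent. A covariance parameter, $\sigma_{ab}$, is allowed in the model.
All other terms in the model are as in Display 7.1.
Between subject variability in both the intercepts and slopes is now modeled explicitly.
Again, the model can be written in matrix notation as

$$ \mathbf{y}_{ik} = \mathbf{X}_i \boldsymbol{\beta} + \mathbf{Z} \mathbf{b}_k + \boldsymbol{\epsilon}_{ik}, $$

where now

$$ \mathbf{Z}' = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 & 5 \end{pmatrix}, \qquad \mathbf{b}_k' = [a_k, b_k]. $$

The random effects are assumed to be from a bivariate normal distribution with both means zero and covariance matrix, $\boldsymbol{\Psi}$, given by

$$ \boldsymbol{\Psi} = \begin{pmatrix} \sigma_a^2 & \sigma_{ab} \\ \sigma_{ab} & \sigma_b^2 \end{pmatrix}. $$

The model implies the following covariance matrix for the repeated measures:

$$ \boldsymbol{\Sigma} = \mathbf{Z} \boldsymbol{\Psi} \mathbf{Z}' + \sigma^2 \mathbf{I}. $$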
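To see how the random slope changes the implied covariance pattern, the matrix product in Display 7.2 can be evaluated numerically. The Python sketch below is illustrative only; the function name and the parameter values are hypothetical and are not estimates from the lecithin data.

```python
def random_slope_covariance(var_a, var_b, cov_ab, var_e, visits=(1, 2, 3, 4, 5)):
    """Return Z Psi Z' + var_e * I, where row j of Z is [1, visits[j]]."""
    n = len(visits)
    sigma = [[0.0] * n for _ in range(n)]
    for j, tj in enumerate(visits):
        for l, tl in enumerate(visits):
            # [1, tj] Psi [1, tl]' written out term by term
            sigma[j][l] = var_a + (tj + tl) * cov_ab + tj * tl * var_b
            if j == l:
                sigma[j][l] += var_e
    return sigma

# Hypothetical parameter values, chosen only for illustration.
sigma = random_slope_covariance(var_a=4.0, var_b=1.0, cov_ab=-0.5, var_e=2.0)
variances = [sigma[j][j] for j in range(5)]
print(variances)
```

With a nonzero slope variance, the variances change from visit to visit and the covariances depend on which pair of visits is involved, so the model is no longer restricted to compound symmetry. Setting the slope variance and the covariance to zero recovers the pattern of Display 7.1.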

TABLE 7.9
Results of Fitting the Model Described in Display 7.2 to the Data in Table 7.6

Parameter           Estimate    SE       t        p

β0 (Intercept)      6.591       1.107    5.954    <.001
β1 (Visit)          0.494       0.226    2.185    .030
β2 (Group)          3.774       1.203    3.137    .003

Note. Random effects: σ̂_a = 6.22, σ̂_b = 1.43, σ̂_ab = -0.77, σ̂ = 1.76.
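The t values in Table 7.9 are simply each estimate divided by its standard error, and an approximate 95% confidence interval for the Group effect follows from the same two numbers. The Python sketch below is illustrative; the multiplier of 2 standard errors is an assumption, chosen because it reproduces the interval quoted in the text for the group difference.

```python
# Estimates and standard errors as given in Table 7.9.
table_7_9 = {
    "Intercept": (6.591, 1.107),
    "Visit": (0.494, 0.226),
    "Group": (3.774, 1.203),
}

def t_statistic(estimate, se):
    """Ratio of an estimate to its standard error."""
    return estimate / se

def approx_interval(estimate, se, multiplier=2.0):
    """Estimate plus or minus `multiplier` standard errors."""
    return estimate - multiplier * se, estimate + multiplier * se

for name, (est, se) in table_7_9.items():
    print(name, round(t_statistic(est, se), 3))

low, high = approx_interval(*table_7_9["Group"])
print(round(low, 2), round(high, 2))
```

Small discrepancies in the third decimal place of a recomputed t value are to be expected, because the tabulated estimates and standard errors are themselves rounded.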

Display 7.3
Comparing Random Effects Models for Longitudinal Data

Comparisons of random effects models for longitudinal data usually involve as their basis values of the log-likelihoods of each model.
The log-likelihood associated with a model arises from the parameter estimation procedure; see Brown and Prescott (1999) for details.
Various indices are computed that try to balance goodness of fit with the number of parameters required to achieve the fit: the search for parsimony discussed in Chapter 1 is always remembered.
One index is the Akaike Information Criterion, defined as

$$ \mathrm{AIC} = -2L + 2n_p, $$

where $L$ is the log-likelihood and $n_p$ is the number of parameters in the model.
Out of the models being compared, the one with the lowest AIC value is to be preferred.
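The definition of the AIC in Display 7.3 translates directly into code. In the Python sketch below, the log-likelihoods and parameter counts are hypothetical: they are not taken from the fitted models, and were chosen only so that, if the two models are assumed to have 5 and 7 parameters, the resulting AIC values agree with the 1282 and 1197 quoted in the text.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion as defined in Display 7.3."""
    return -2.0 * log_likelihood + 2.0 * n_params

# Hypothetical log-likelihoods and parameter counts, for illustration only.
candidates = {
    "random intercepts": aic(-636.0, 5),
    "random intercepts and slopes": aic(-591.5, 7),
}
best = min(candidates, key=candidates.get)
print(candidates, best)
```

The extra parameters of the larger model are penalized by the second term, so it is preferred only when the gain in log-likelihood outweighs the added complexity.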

It appears that the model in Display 7.2 is to be preferred for these data. This conclusion is reinforced by examination of Figures 7.9 and 7.10, the first of which shows the fitted regressions for each individual by using the random intercept model, and the second of which shows the corresponding predictions from the random intercepts and slopes model. Clearly, the first model does not capture the varying slopes of the regressions of an individual's cognitive score on the covariate visit. The number of words recalled in the lecithin treated group is about 4 more than in the placebo group; the 95% confidence interval is (1.37, 6.18).

As with the multiple regression model considered in Chapter 6, the fitting of random effects models to longitudinal data has to be followed by an examination of residuals to check for violations of assumptions. The situation is more complex now, however, because of the nature of the data: several residuals are available for each subject. Thus here the plot of standardized residuals versus fitted values from the random intercepts and slopes model given in Figure 7.11 includes a panel for each subject in the study. The plot does not appear to contain any


particularly disturbing features that might give cause for concern with the fitted model. It may also be worth looking at the distributional properties of the estimated random effects from the model (such estimates are available from suitable random effects software such as that available in S-PLUS). Normal probability plots of the random intercept and random slope terms for the second model fitted to the lecithin data are shown in Figure 7.12. Again, there seem to be no particularly worrying departures from linearity in these plots.

7.4. SUMMARY
1. Longitudinal data are common in many areas, including the behavioral sciences, and they require particular care in their analysis.
2. The response feature, summary measure approach provides a simple, but not simplistic, procedure for analysis that can accommodate irregularly spaced observations and missing values.
3. Pretreatment values, if available, can be incorporated into a summary measures analysis in a number of ways, of which being used as a covariate in an ANCOVA is preferable.
4. The summary measure approach to the analysis of longitudinal data cannot give answers to questions about the development of the response over time or to whether this development differs in the levels of any between subject factors.
5. A more detailed analysis of longitudinal data requires the use of suitable models such as the random effects models briefly described in the chapter. Such models have been extensively developed over the past 5 years or so and are now routinely available in a number of statistical software packages.
6. In this chapter the models have been considered only for continuous, normally distributed responses, but they can be extended to deal with other types of response variable, for example, those that are categorical (see Brown and Prescott, 1999, for details).

COMPUTER HINTS

S-PLUS

In S-PLUS random effects models are available through the Statistics menu, by means of the following route.

1. Click Statistics, click Mixed effects, and click Linear, and the Mixed Effects dialog box appears.
2. Specify data set and details of the model that is to be fitted.

Random effect models can also be fitted by using the lme function in the command language approach, and in many respects this is to be preferred because many useful graphic procedures can then also be used. For example, if the lecithin trial data (see Table 7.6) are stored as a data frame, lecit, each row of which gives Subject number, Group label, Visit number, and Cognitive score, then the two random effects models considered in the text can be fitted and the results saved for further analysis by using the following commands.

    # random intercept for each subject (Display 7.1)
    lecit.fit1 <- lme(Cognitive ~ Visit + Group, method = "ML", random = ~1 | Subject, data = lecit)
    # random intercept and random slope for each subject (Display 7.2)
    lecit.fit2 <- lme(Cognitive ~ Visit + Group, method = "ML", random = ~Visit | Subject, data = lecit)

Many graphics are available for examining fits, and so on. Several have been used in the text. For example, to obtain the plots of predicted values from the two models fitted (see Figures 7.9 and 7.10), we can use the following commands.

    plot(augPred(lecit.fit1), aspect = "xy", grid = T)
    plot(augPred(lecit.fit2), aspect = "xy", grid = T)

EXERCISES

7.1. Apply the summary measure approach to the lecithin data in Table 7.6, using as your summary measure the maximum number of words remembered on any occasion.

7.2. Fit a random effects model with a random intercept and a random slope, and suitable fixed effects to the rat data in Table 7.2. Take care over coding the group to which a rat belongs.

7.3. The data in Table 7.10 arise from a trial of estrogen patches in the treatment of postnatal depression. Women who had suffered an episode of postnatal depression were randomly allocated to two groups; the members of one group received an estrogen patch, and the members of the other group received a dummy patch (the placebo). The dependent variable was a composite measure of depression, which was recorded on two occasions prior to randomization and for each of 4 months posttreatment. A number of observations are missing (indicated by -9).

TABLE 7.10
Estrogen Patch Trial Data

Group    Baseline    Visit

0
0

17 18 26 25
19
19

26

0 0 0 20 0 16 0 0 28 0 0 1 1 1 1 1 1 1 1 1 1

24

18 27 16 17
15

18 1723 17 17 14 12
14

15

17
-9
13 4

18
-9

23

22 28
24

28
25

19 13 2 6
1 3

27
24

10 11 13

8
9 9 -9

27 18 21 27
24

21 27

15 15

28

19

24

14 15

12 12 17 12
14 5

16 7
1

19 15 9 15 10 13

8 7 -9
14

12
9
10

12
14

l1

17 21 18
24

21

17 20 18 28 21 6

9 7
I1

3 3

7 8

6 12 2 2 6

Note. 0, placebo; 1, active groups. A -9 indicates a missing value.

1. Using the mean as a summary, compare the posttreatment measurements of the two groups by using the response feature procedure.
2. Calculate the change score defined as the difference in the mean of posttreatment values and the mean of pretreatment values, and test for a difference in average change score in the two groups.
3. Using the mean of the pretreatment values as a covariate and the mean of posttreatment values as the response, carry out an ANCOVA to assess posttreatment difference in the two groups.
4. Which of 1, 2, or 3 would you recommend, and why?

7.4. Find a suitable random effects model for the data in Table 7.10 and interpret the results from the chosen model.


7.5. Fit a model to the lecithin data that includes random effects for both the intercept and slope of the regression of cognitive score on Visit and also includes fixed effects for Group and Group x Visit interaction.

7.6. For the random intercept and random slope model for the lecithin data described in Display 7.2, find the terms in the implied covariance matrix of the repeated measures, explicitly in terms of the parameters $\sigma$, $\sigma_a$, $\sigma_b$, and $\sigma_{ab}$.

Distribution-Free and Computationally Intensive Methods

8.1. INTRODUCTION

The statistical procedures described in the previous chapters have relied in the main for their validity on an underlying assumption of distributional normality. How well these procedures operate outside the confines of this normality constraint varies from setting to setting. Many people believe that in most situations, normal based methods are sufficiently powerful to make consideration of alternatives unnecessary. They may have a point; t tests and F tests, for example, have been shown to be relatively robust to departures from the normality assumption. Nevertheless, alternatives to normal based methods have been proposed and it is important for psychologists to be aware of these, because many are widely quoted in the psychological literature. One class of tests that do not rely on the normality assumptions are usually referred to as nonparametric or distribution free. (The two labels are almost synonymous, and we shall use the latter here.)

Distribution-free procedures are generally based on ranks of the raw data and are usually valid over a large class of distributions for the data, although they do often assume distributional symmetry. Although slightly less efficient than their normal theory competitors when the underlying populations are normal, they are often considerably more efficient when the underlying populations are not normal. A number of commonly used distribution-free procedures will be described in Sections 8.2-8.5.

A further class of procedures that do not require the normality assumption are those often referred to as computationally intensive for reasons that will become apparent in Sections 8.6 and 8.7. The methods to be described in these two sections use repeated permutations of the data, or repeated sampling from the data to generate an appropriate distribution for a test statistic under some null hypothesis of interest.

Although most of the work on distribution-free methods has concentrated on developing hypothesis testing facilities, it is also possible to construct confidence intervals for particular quantities of interest, as is demonstrated at particular points throughout this chapter.

8.2. THE WILCOXON-MANN-WHITNEY TEST AND WILCOXON'S SIGNED RANKS TEST


First, let us deal with the question of what's in a name. The statistical literature refers to two equivalent tests formulated in different ways as the Wilcoxon rank-sum test and the Mann-Whitney test. The two names arise because of the independent development of the two equivalent tests by Wilcoxon (1945) and by Mann and Whitney (1947). In both cases, the authors' aim was to come up with a distribution-free alternative to the independent samples t test. The main points to remember about the Wilcoxon-Mann-Whitney test are as follows.

1. The null hypothesis to be tested is that the two populations being compared have identical distributions. (For two normally distributed populations with common variance, this would be equivalent to the hypothesis that the means of the two populations are the same.)
2. The alternative hypothesis is that the population distributions differ in location (i.e., mean or median).
3. Samples of observations are available from each of the two populations being compared.
4. The test is based on the joint ranking of the observations from the two samples.
5. The test statistic is the sum of the ranks associated with one sample (the lower of the two sums is generally used).

Further details of the Wilcoxon-Mann-Whitney test are given in Display 8.1.

Display 8.1
Wilcoxon-Mann-Whitney Test

Interest lies in testing the null hypothesis that two populations have the same probability distribution but the common distribution is not specified.
The alternative hypothesis is that the population distributions differ in location.
We assume that a sample of $n_1$ observations, $x_1, x_2, \ldots, x_{n_1}$, are available from the first population and a sample of $n_2$ observations, $y_1, y_2, \ldots, y_{n_2}$, from the second population.
The combined sample of $n_1 + n_2$ observations are ranked and the test statistic, $S$, is the sum of the ranks of the observations from one of the samples (generally the lower of the two rank sums is used).
If both samples come from the same population, a mixture of low, medium, and high ranks is to be expected in each. If, however, the alternative hypothesis is true, one of the samples would be expected to have lower ranks than the other.
For small samples, tables giving p values are available in Hollander and Wolfe (1999), although the distribution of the test statistic can also now be found directly by using the permutational approach described in Section 8.6.
A large sample approximation is also available for the Wilcoxon-Mann-Whitney test, which is suitable when $n_1$ and $n_2$ are both greater than 15. Under the null hypothesis the statistic $Z$ given by

$$ Z = \frac{S - n_1(n_1 + n_2 + 1)/2}{[n_1 n_2 (n_1 + n_2 + 1)/12]^{1/2}} $$

has approximately a standard normal distribution, and a p value can be assigned accordingly. (We are assuming that $n_1$ is the sample size associated with the sample giving rise to the test statistic $S$.)
If there are ties among the observations, then the tied observations are given the average of the ranks that they would be assigned if not tied. In this case the large sample approximation requires amending; see Hollander and Wolfe (1999) for details.

As our first illustration of the application of the test, we shall use the data shown in Table 8.1, adapted from Howell (1992). These data give the number of recent stressful life events reported by a group of cardiac patients in a local hospital and a control group of orthopedic patients in the same hospital. The result of applying the Wilcoxon-Mann-Whitney test to these data is a rank sum statistic of 21 with an associated p value of 0.13. There is no evidence that the number of stressful life events suffered by the two types of patient have different distributions.

To further illustrate the use of the Wilcoxon-Mann-Whitney test, we shall apply it to the data shown in Table 8.2. These data arise from a study in which the relationship between child-rearing practices and customs related to illness in several nonliterate cultures was examined. On the basis of ethnographical reports, 39 societies were each given a rating for the degree of typical oral socialization anxiety, a concept derived from psychoanalytic theory relating to the severity and rapidity of oral socialization in child rearing. For each of the societies, a judgment

TABLE 8.1
Number of Stressful Life Events Among Cardiac and Orthopedic Patients

Cardiac Patients (C):       32   8   7   29   5   0
Orthopedic Patients (O):     1   2   4    3   6

Observation    Rank

 0              1
 1              2 (O)
 2              3 (O)
 3              4 (O)
 4              5 (O)
 5              6
 6              7 (O)
 7              8
 8              9
29             10
32             11

Note. Sum of O ranks is 21.
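The calculations behind Table 8.1 and Display 8.1 can be sketched in a few lines of Python. The code below is an illustration only; the function names are hypothetical. It ranks the combined sample (using average ranks for ties), sums the ranks of one sample, computes the large sample Z without any correction for ties, and also obtains an exact two-sided p value by enumerating every possible assignment of ranks to the chosen sample, which is feasible only for small samples.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

def midranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # average of positions i+1..j+1
        i = j + 1
    return ranks

def rank_sum_test(sample, other):
    """Rank sum S of `sample`, its normal-approximation Z, and an exact two-sided p."""
    n1, n2 = len(sample), len(other)
    ranks = midranks(list(sample) + list(other))
    s = sum(ranks[:n1])
    mean = n1 * (n1 + n2 + 1) / 2
    z = (s - mean) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    # Exact null distribution: every choice of n1 ranks is equally likely.
    sums = [sum(c) for c in combinations(ranks, n1)]
    p_exact = sum(abs(t - mean) >= abs(s - mean) for t in sums) / len(sums)
    return s, z, p_exact

cardiac = [32, 8, 7, 29, 5, 0]
orthopedic = [1, 2, 4, 3, 6]
s, z, p_exact = rank_sum_test(orthopedic, cardiac)
p_normal = 2 * NormalDist().cdf(-abs(z))
print(s, round(z, 2), round(p_exact, 2), round(p_normal, 2))
```

For these data the rank sum for the orthopedic patients is 21 and the exact two-sided p value rounds to 0.13, in agreement with the values quoted in the text. The normal approximation gives a somewhat smaller p value, which is not surprising given that both samples are far below the size of 15 suggested in Display 8.1.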

TABLE 8.2
Oral Socialization and Explanations of Illness: Oral Socialization Anxiety Scores

Societies in which Oral Explanations of Illness are Absent
6 7 7 7 7 7 8 8 9 10 10 10 10 12 12 13

Societies in which Oral Explanations of Illness are Present
6 8 8 10 10 10 11 11 12 12 12 12 13 13 14 14 14 15 15 16 17

was also made (by an independent set of judges) of whether oral explanations of illness were present. Interest centers on assessing whether oral explanations of illness are more likely to be present in societies with high levels of oral socialization anxiety. This translates into whether oral socialization anxiety scores differ in location in the two types of societies. The result of the normal approximation test described in Display 8.1 is a Z value of -3.35 and an associated p value of .0008.

Display 8.2
Estimator and Confidence Interval Associated with the Wilcoxon-Mann-Whitney Test

The estimator of the location difference, $\Delta$, of the two populations is the median of the $n_1 \times n_2$ differences, $y_j - x_i$, of each pair of observations, made up of one observation from the first sample and one from the second sample.
Constructing a confidence interval for $\Delta$ involves finding the two appropriate values among the ordered $n_1 \times n_2$ differences, $y_j - x_i$.
For a symmetric two-sided confidence interval for $\Delta$, with confidence level $1 - \alpha$, determine the upper $\alpha/2$ percentile point, $s_{\alpha/2}$, of the null distribution of $S$, either from the appropriate table in Hollander and Wolfe (1999), Table A.6, or from a permutational approach.
Calculate
$$C_\alpha = \cdots + 1 - s_{\alpha/2}.$$
The $1 - \alpha$ confidence interval $(\Delta_L, \Delta_U)$ is then found from the values in the $C_\alpha$ ($\Delta_L$) and the $n_1 n_2 + 1 - C_\alpha$ ($\Delta_U$) positions in the ordered sample differences.
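As a rough illustration of the estimator in Display 8.2, the sketch below computes the median of all pairwise differences for the small data set of Table 8.1 (chosen here only because it is short; the text applies the method to the oral socialization data). The function names are hypothetical, and the value of $C_\alpha$ is assumed to be supplied by the reader from tables or a permutational calculation.

```python
from statistics import median


def location_difference(x, y):
    """Median of the n1 * n2 differences y_j - x_i."""
    return median(yj - xi for xi in x for yj in y)


def interval_limits(x, y, c_alpha):
    """Values in positions C_alpha and n1*n2 + 1 - C_alpha of the ordered differences."""
    diffs = sorted(yj - xi for xi in x for yj in y)
    return diffs[c_alpha - 1], diffs[len(diffs) - c_alpha]


orthopedic = [1, 2, 4, 3, 6]      # first sample, x (Table 8.1)
cardiac = [32, 8, 7, 29, 5, 0]    # second sample, y (Table 8.1)

estimate = location_difference(orthopedic, cardiac)
print(estimate)
```

With `c_alpha = 1` the function `interval_limits` simply returns the smallest and largest differences; larger values of `c_alpha` move the limits inward.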

There is very strong evidence of a difference in oral socialization anxiety in the two types of societies. The Wilcoxon-Mann-Whitney test is very likely to have been covered in the introductory statistics course enjoyed (suffered?) by many readers, generally in the context of testing a particular hypothesis, as described and illustrated above. It is unlikely, however, that such a course gave any time to consideration of associated estimation and confidence interval construction possibilities, although both remain as relevant to this distribution-free approach as they are in applications of t tests and similar parametric procedures. Details of an estimator associated with the rank sum test and the construction of a relevant confidence interval are given in Display 8.2, and the results of applying both to the oral socialization anxiety data appear in Table 8.3. The estimated difference in location for the two types of societies is 3 units, with a 95% confidence interval of 2 to 6. When samples from the two populations are matched or paired in some way, the appropriate distribution-free test becomes Wilcoxon's signed-ranks test. Details of the test are given in Display 8.3. As an example of the use of the test, it is applied to the data shown in Table 8.4. These data arise from an investigation of two types of electrodes used in some psychological experiments. The researcher was interested in assessing whether the two types of electrode performed similarly. Generally, a paired t test might be considered suitable for this situation, but here a normal probability plot of the differences of the observations on the two electrode types (see Figure 8.1) shows a distinct outlier, and so the signed-ranks test is to be preferred. The result of the large sample approximation test is a Z value of -2.20 with a p value of .03. There is some evidence of a difference between electrode

TABLE 8.3
Estimate of Location Difference and Confidence Interval Construction for Oral Socialization Data

Differences Between Pairs of Observations, One Observation from Each Sample

 0  2  2  4  4  5  5  6  6  6  6  7  7  7  8  8  8  9  9 10 11
-1  1  1  3  3  4  4  5  5  5  5  6  6  6  7  7  7  8  8  9 10
-1  1  1  3  3  4  4  5  5  5  5  6  6  6  7  7  7  8  8  9 10
-1  1  1  3  3  4  4  5  5  5  5  6  6  6  7  7  7  8  8  9 10
-1  1  1  3  3  4  4  5  5  5  5  6  6  6  7  7  7  8  8  9 10
-1  1  1  3  3  4  4  5  5  5  5  6  6  6  7  7  7  8  8  9 10
-2  0  0  2  2  3  3  4  4  4  4  5  5  5  6  6  6  7  7  8  9
-2  0  0  2  2  3  3  4  4  4  4  5  5  5  6  6  6  7  7  8  9
-3 -1 -1  1  1  2  2  3  3  3  3  4  4  4  5  5  5  6  6  7  8
-4 -2 -2  0  0  1  1  2  2  2  2  3  3  3  4  4  4  5  5  6  7
-4 -2 -2  0  0  1  1  2  2  2  2  3  3  3  4  4  4  5  5  6  7
-4 -2 -2  0  0  1  1  2  2  2  2  3  3  3  4  4  4  5  5  6  7
-4 -2 -2  0  0  1  1  2  2  2  2  3  3  3  4  4  4  5  5  6  7
-6 -4 -4 -2 -2 -1 -1  0  0  0  0  1  1  1  2  2  2  3  3  4  5
-6 -4 -4 -2 -2 -1 -1  0  0  0  0  1  1  1  2  2  2  3  3  4  5
-7 -5 -5 -3 -3 -2 -2 -1 -1 -1 -1  0  0  0  1  1  1  2  2  3  4

Note. The median of these differences is 3. The positions in the ordered differences of the lower and upper limits of the 95% confidence interval are 117 and 252, leading to a confidence interval of (2, 6).

types. Here, mainly as a result of the presence of the outlier, a paired t test gives a p value of .09, implying no difference between the two electrodes. (Incidentally, the outlier identified in Figure 8.1 was put down by the experimenter to the excessively hairy arms of one subject!) Again you may have met the signed-ranks test on your introductory statistics course, but perhaps not the associated estimator and confidence interval, both of which are described in Display 8.4. For the electrode data, application of these procedures gives an estimate of -65 and an approximate 95% confidence interval of -537.0 to -82.5. Details of the calculations are shown in Table 8.5.

Display 8.3
Wilcoxon's Signed-Ranks Test

Assume we have two observations, $x_i$ and $y_i$, on each of n subjects in our sample, for example, before and after treatment. We first calculate the differences $z_i = x_i - y_i$ between each pair of observations.
To compute the Wilcoxon signed-rank statistic, $T^+$, form the absolute values of the differences, $z_i$, and then order them from least to greatest.
Now assign a positive or negative sign to the ranks of the differences according to whether the corresponding difference was positive or negative. (Zero values are discarded, and the sample size n altered accordingly.)
The statistic $T^+$ is the sum of the positive signed ranks. Tables are available for assigning p values; see Table A.4 in Hollander and Wolfe (1999).
A large sample approximation involves testing the statistic Z as a standard normal:
$$Z = \frac{T^+ - n(n+1)/4}{[n(n+1)(2n+1)/24]^{1/2}}.$$
If there are ties among the calculated differences, assign each of the observations in a tied group the average of the integer ranks that are associated with the tied group.

TABLE 8.4
Skin Resistance and Electrode Type

Subject    Electrode 1    Electrode 2
   1           500            400
   2           660            600
   3           250            370
   4            72            140
   5           135            300
   6            27             84
   7           100             50
   8           105            180
   9            90            180
  10           200            290
  11            15             45
  12           160            200
  13           250            400
  14           170            310
  15            66           1000
  16           107             48
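The signed-ranks calculation of Display 8.3 can be sketched for the electrode data as follows. This is an illustrative Python version only; the function name `signed_rank` is hypothetical, and the code applies the large sample formula exactly as written in the display, with no continuity correction.

```python
from math import sqrt

# Table 8.4 data
electrode_1 = [500, 660, 250, 72, 135, 27, 100, 105, 90, 200, 15, 160, 250, 170, 66, 107]
electrode_2 = [400, 600, 370, 140, 300, 84, 50, 180, 180, 290, 45, 200, 400, 310, 1000, 48]


def signed_rank(x, y):
    """Return T+ and the large sample Z statistic for paired samples."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are discarded
    n = len(diffs)
    abs_sorted = sorted(abs(d) for d in diffs)

    def rank_of(value):
        # average rank of a (possibly tied) absolute difference
        first = abs_sorted.index(value) + 1
        count = abs_sorted.count(value)
        return first + (count - 1) / 2

    t_plus = sum(rank_of(abs(d)) for d in diffs if d > 0)
    z = (t_plus - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return t_plus, z


t_plus, z = signed_rank(electrode_1, electrode_2)
print(t_plus, round(z, 2))
```

The sketch gives $T^+ = 25$ and a Z value of about -2.22, close to the -2.20 quoted in the text; the small discrepancy may reflect a continuity correction in the software used there (subtracting 0.5 from the absolute numerator gives about -2.20).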

FIG. 8.1. Probability plot for differences between electrode readings for the data in Table 8.4 (horizontal axis: quantiles of standard normal).

Display 8.4
Estimator and Confidence Interval Associated with Wilcoxon's Signed-Rank Statistic

An estimator of the treatment effect, $\theta$, is the median of the $n(n+1)/2$ averages of pairs of the n differences on which the statistic $T^+$ is based, that is, the averages $(z_i + z_j)/2$ for $i \le j = 1, \ldots, n$.
For a symmetric two-sided confidence interval for $\theta$, with confidence level $1 - \alpha$, we first find the upper $\alpha/2$ percentile point of the null distribution of $T^+$, $t_{\alpha/2}$, from Table A.4 in Hollander and Wolfe (1999) or from a permutational computation.
Then set
$$C_\alpha = \frac{n(n+1)}{2} + 1 - t_{\alpha/2}.$$
The lower and upper limits of the required confidence interval are now found as the $C_\alpha$ and $t_{\alpha/2}$ position values of the ordered averages of pairs of differences.

TABLE 8.5
Estimation and Confidence Interval Construction for Electrode Data

Averages of Pairs of Sample Differences

 100.0   80.0  -10.0   16.0  -32.5   21.5   75.0   12.5    5.0    5.0   35.0   30.0  -25.0  -20.0 -417.0   79.5
  80.0   60.0  -30.0   -4.0  -52.5    1.5   55.0   -7.5  -15.0  -15.0   15.0   10.0  -45.0  -40.0 -437.0   59.5
 -10.0  -30.0 -120.0  -94.0 -142.5  -88.5  -35.0  -97.5 -105.0 -105.0  -75.0  -80.0 -135.0 -130.0 -527.0  -30.5
  16.0   -4.0  -94.0  -68.0 -116.5  -62.5   -9.0  -71.5  -79.0  -79.0  -49.0  -54.0 -109.0 -104.0 -501.0   -4.5
 -32.5  -52.5 -142.5 -116.5 -165.0 -111.0  -57.5 -120.0 -127.5 -127.5  -97.5 -102.5 -157.5 -152.5 -549.5  -53.0
  21.5    1.5  -88.5  -62.5 -111.0  -57.0   -3.5  -66.0  -73.5  -73.5  -43.5  -48.5 -103.5  -98.5 -495.5    1.0
  75.0   55.0  -35.0   -9.0  -57.5   -3.5   50.0  -12.5  -20.0  -20.0   10.0    5.0  -50.0  -45.0 -442.0   54.5
  12.5   -7.5  -97.5  -71.5 -120.0  -66.0  -12.5  -75.0  -82.5  -82.5  -52.5  -57.5 -112.5 -107.5 -504.5   -8.0
   5.0  -15.0 -105.0  -79.0 -127.5  -73.5  -20.0  -82.5  -90.0  -90.0  -60.0  -65.0 -120.0 -115.0 -512.0  -15.5
   5.0  -15.0 -105.0  -79.0 -127.5  -73.5  -20.0  -82.5  -90.0  -90.0  -60.0  -65.0 -120.0 -115.0 -512.0  -15.5
  35.0   15.0  -75.0  -49.0  -97.5  -43.5   10.0  -52.5  -60.0  -60.0  -30.0  -35.0  -90.0  -85.0 -482.0   14.5
  30.0   10.0  -80.0  -54.0 -102.5  -48.5    5.0  -57.5  -65.0  -65.0  -35.0  -40.0  -95.0  -90.0 -487.0    9.5
 -25.0  -45.0 -135.0 -109.0 -157.5 -103.5  -50.0 -112.5 -120.0 -120.0  -90.0  -95.0 -150.0 -145.0 -542.0  -45.5
 -20.0  -40.0 -130.0 -104.0 -152.5  -98.5  -45.0 -107.5 -115.0 -115.0  -85.0  -90.0 -145.0 -140.0 -537.0  -40.5
-417.0 -437.0 -527.0 -501.0 -549.5 -495.5 -442.0 -504.5 -512.0 -512.0 -482.0 -487.0 -542.0 -537.0 -934.0 -437.5
  79.5   59.5  -30.5   -4.5  -53.0    1.0   54.5   -8.0  -15.5  -15.5   14.5    9.5  -45.5  -40.5 -437.5   59.0

Note. The median of these averages is -65. The upper and lower limits of the 95% confidence interval are in positions 7 and 107 of the ordered values of the averages, leading to the interval (-537.0, -82.5).
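The point estimate in Table 8.5 can be reproduced by forming the $n(n+1)/2$ averages of pairs of differences described in Display 8.4 and taking their median. The sketch below is an illustration in Python under that reading of the display; the variable name `averages` is hypothetical, and the confidence limits are not computed because they need a tabulated value of $t_{\alpha/2}$.

```python
from statistics import median

# Table 8.4 data
electrode_1 = [500, 660, 250, 72, 135, 27, 100, 105, 90, 200, 15, 160, 250, 170, 66, 107]
electrode_2 = [400, 600, 370, 140, 300, 84, 50, 180, 180, 290, 45, 200, 400, 310, 1000, 48]

diffs = [a - b for a, b in zip(electrode_1, electrode_2)]
n = len(diffs)

# Averages (z_i + z_j) / 2 for all i <= j, giving n(n + 1)/2 values.
averages = [(diffs[i] + diffs[j]) / 2 for i in range(n) for j in range(i, n)]

estimate = median(averages)
print(len(averages), estimate)
```

For the 16 subjects this gives 136 averages, with a median equal to the estimate of -65 reported in the text.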

8.3. DISTRIBUTION-FREE TEST FOR A ONE-WAY DESIGN WITH MORE THAN TWO GROUPS
Display 8.5
Kruskal-Wallis Distribution-Free Procedure for One-Way Designs

Assume there are k populations to be compared and that a sample of $n_j$ observations is available from population j, $j = 1, \ldots, k$.
The hypothesis to be tested is that all the populations have the same probability distribution.
For the Kruskal-Wallis test to be performed, the observations are first ranked without regard to group membership and then the sums of the ranks of the observations in each group are calculated. These sums will be denoted by $R_1, R_2, \ldots, R_k$.
If the null hypothesis is true, we would expect the $R_j$s to be more or less equal, apart from differences caused by the different sample sizes.
A measure of the degree to which the $R_j$s differ from one another is given by
$$H = \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N+1),$$
where $N = \sum_{j=1}^{k} n_j$.
Under the null hypothesis the statistic H has a chi-squared distribution with $k - 1$ degrees of freedom.

Just as the Wilcoxon-Mann-Whitney test and the Wilcoxon signed-ranks test can be considered as distribution-free analogs of the independent samples and paired samples t tests, the Kruskal-Wallis procedure is the distribution-free equivalent of the one-way ANOVA procedure described in Chapter 3. Details of the Kruskal-Wallis method for one-way designs with three or more groups are given in Display 8.5. (When the number of groups is two, the Kruskal-Wallis test is equivalent to the Wilcoxon-Mann-Whitney test.) To illustrate the use of the Kruskal-Wallis method, we apply it to the data shown in Table 8.6. These data arise from an investigation of the possible beneficial effects of the pretherapy training of clients on the process and outcome of counseling and psychotherapy. Sauber (1971) investigated four different approaches to pretherapy training: Control (no treatment), Therapeutic reading (TR) (indirect learning), Vicarious therapy pretraining (VTP) (videotaped, vicarious learning), and Group, role induction interview (RII) (direct learning). Nine clients were assigned to each of these four conditions and a measure of psychotherapeutic attraction eventually given to each client. Applying the Kruskal-Wallis procedure to these data gives a chi-squared test statistic of 4.26, which with 3 degrees of freedom has an associated p value of .23. There is no evidence of a difference between the four pretherapy regimes. (Hollander and Wolfe, 1999, describe some distribution-free analogs to the multiple comparison procedures discussed in Chapter 3, for identifying which

TABLE 8.6
Psychotherapeutic Attraction Scores for Four Experimental Conditions

Control    Reading (TR)    Videotape (VTP)    Group (RII)
   0            0                 0                1
   1            6                 5                5
   3            7                 8               12
   3            9                 9               13
   5           11                11               19
  10           13                13               22
  13           20                16               25
  17           20                17               27
  26           24                20               29

Display 8.6
Friedman's Rank Test for Correlated Samples

Here we assume that a sample of n subjects are observed under k conditions.
First the k observations for each subject are ranked, to give the values $r_{ij}$, $i = 1, \ldots, n$, $j = 1, \ldots, k$.
Then the sums of these ranks over conditions are found as $R_j = \sum_{i=1}^{n} r_{ij}$, $j = 1, \ldots, k$.
The Friedman statistic S is then given by
$$S = \frac{12}{nk(k+1)} \sum_{j=1}^{k} R_j^2 - 3n(k+1).$$
Under the null hypothesis that in the population the conditions have the same distribution, S has a chi-squared distribution with $k - 1$ degrees of freedom.

particular groups in the one-way design differ, after obtaining a significant result from the Kruskal-Wallis test.)
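The Kruskal-Wallis calculation for the psychotherapeutic attraction scores can be sketched as follows. This is a minimal Python illustration of the statistic H as commonly defined (ranking all observations together with average ranks for ties, then combining the squared rank sums); the function name `kruskal_wallis` is hypothetical, no correction for ties is applied, and the closed-form upper tail probability used here holds only for a chi-squared distribution with 3 degrees of freedom.

```python
from math import erfc, exp, pi, sqrt

# Table 8.6 data
groups = {
    "Control": [0, 1, 3, 3, 5, 10, 13, 17, 26],
    "Reading (TR)": [0, 6, 7, 9, 11, 13, 20, 20, 24],
    "Videotape (VTP)": [0, 5, 8, 9, 11, 13, 16, 17, 20],
    "Group (RII)": [1, 5, 12, 13, 19, 22, 25, 27, 29],
}


def kruskal_wallis(samples):
    """Return the group rank sums and the Kruskal-Wallis statistic H."""
    pooled = sorted(v for s in samples for v in s)

    def midrank(v):
        # average of the ranks occupied by all observations equal to v
        return pooled.index(v) + 1 + (pooled.count(v) - 1) / 2

    big_n = len(pooled)
    rank_sums = [sum(midrank(v) for v in s) for s in samples]
    h = (12 / (big_n * (big_n + 1))
         * sum(r * r / len(s) for r, s in zip(rank_sums, samples))
         - 3 * (big_n + 1))
    return rank_sums, h


rank_sums, h = kruskal_wallis(list(groups.values()))

# Upper tail area of chi-squared with 3 degrees of freedom (closed form for 3 df only).
p_value = erfc(sqrt(h / 2)) + sqrt(2 * h / pi) * exp(-h / 2)

print(rank_sums, round(h, 2), round(p_value, 2))
```

Without a tie correction the sketch gives H of about 4.25, close to the 4.26 quoted in the text; the small difference may reflect an adjustment for tied observations in the software used there.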

8.4. DISTRIBUTION-FREE TEST FOR A ONE-WAY REPEATED MEASURES DESIGN


When the one-way design involves repeated measurements on the same subject, a distribution-free test from Friedman (1937) can be used. The test is described in Display 8.6. The test is clearly related to a standard repeated measures ANOVA applied to ranks rather than raw scores.
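As a small illustration of Friedman's statistic in its usual form (ranks assigned within each subject, then summed for each condition), the sketch below uses a tiny invented data set; the scores and the function name `friedman_statistic` are hypothetical and are not taken from the text, and ties within a subject are assumed not to occur.

```python
def friedman_statistic(scores):
    """scores[i][j] is the observation for subject i under condition j (no ties within a subject)."""
    n = len(scores)
    k = len(scores[0])
    rank_sums = [0] * k
    for row in scores:
        # rank the k observations for this subject from smallest (1) to largest (k)
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    s = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    return rank_sums, s


# Hypothetical scores for 4 subjects under 3 conditions (invented for illustration).
scores = [
    [10, 20, 30],
    [12, 15, 40],
    [5, 30, 25],
    [22, 18, 35],
]

rank_sums, s = friedman_statistic(scores)
print(rank_sums, s)
```

Here the rank sums are 5, 8, and 11, and S is referred to a chi-squared distribution with k - 1 = 2 degrees of freedom.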

TABLE 8.7
Percentage of Consonants Correctly Identified under Each of Seven Conditions

Subject

AL

AC

L C

ALC

5 11.9 6 0 78.6 47.4 42.9 46.4 7 5.0 4.0 8 9 0 1.1 10 2.4 11 12 0 0 13 0 14 1.1 15 1.1 16 17 0 0 18

2 3 4

1.1 1.1 13.0


0

36.9 33.3 28.6 23.8


40.5

52.4
34.5 40.5

42.9
34.5

22.6 57.1 22.6 42.9 38.0 31.0 38.0 21.4 33.3 23.8 29.8 33.3 26.1 35.7

33.3 33.3 35.7 35.7 13.0 42.9 31.0 34.5 41.7

31.0 41.7 44.0 33.3 46.4 37.0 33.3 45.2 32.1 46.4 33.3
34.5

83.3 77.3 81.0 69.0 98.8 69.0 95.2 89.3 70.2 86.9 67.9 86.9 85.7 81.0 95.2 91.7 95.2

63.0 81.0 76.1 65.5


% A

27.4 20.2 29.8 27.4 26.2 29.8 21.4 32.1 28.6 28.6 36.9 27.4 41.7

44.0

32.1 41.7 25.0 40.5 42.9

39.3 35.7 31.0 44.0 45.2

n.4 73.8 91.7 85.7 71.4 92.9 59.5 82.1 72.6 78.6 95.2 89.3 95.2

To illustrate the use of the Friedman method, we apply it to the data shown in Table 8.7, taken from a study reported by Nicholls and Ling (1982), into the effectiveness of a system using hand cues in the teaching of language to severely hearing-impaired children. In particular, they considered syllables presented to hearing-impaired children under the following seven conditions.

A: audition,
L: lip reading,
AL: audition and lip reading,
C: cued speech,
AC: audition and cued speech,
LC: lip reading and cued speech, and
ALC: audition, lip reading, and cued speech.

The 18 subjects in the study were all severely hearing-impaired children who had been taught through the use of cued speech for at least 4 years. Syllables were presented to the subjects under each of the seven conditions (presented in random orders), and the subjects were asked in each case to identify the consonants in each syllable by writing down what they perceived them to be. The subjects'


results were scored by marking properly identified consonants in the appropriate order as correct. Finally, an overall percentage correct was assigned to each participant under each experimental condition; it is these percentages that are shown in Table 8.7. Applying the Friedman procedure to the data in Table 8.7 gives the chi-squared statistic of 94.36 with 6 degrees of freedom. The associated p value is very small, and there is clearly a difference in the percentage of consonants identified in the different conditions. (Possible distribution-free approaches to more complex repeated measures situations are described in Davis, 1991.)

8.5. DISTRIBUTION-FREE CORRELATION AND REGRESSION


A frequent requirement in many psychological studies is to measure the correlation between two variables observed on a sample of subjects. In Table 8.8, for example, examination scores (out of 75) and corresponding exam completion times (seconds) are given for 10 students. A scatterplot of the data is shown in Figure 8.2. The usual Pearson's product moment correlation coefficient, r, takes the value 0.50, and a t test that the population correlation coefficient, $\rho$, takes the value zero gives t = 1.63. (Readers are reminded that the test statistic here is $r\sqrt{(n-2)/(1-r^2)}$, which under the null hypothesis that $\rho = 0$ has a t distribution with $n - 2$ degrees of freedom.) Dealing with Pearson's coefficient involves

TABLE 8.8
Examination Scores and Time to Complete Examination

Score    Time (s)
  49       2160
  70       2063
  55       2013
  52       2000
  61       1420
  65       1934
  57       1519
  71       2735
  69       2329
  44       1590

FIG. 8.2. Scatterplot of examination scores and completion times (horizontal axis: exam score).

the assumption that the data arise from a bivariate normal distribution (see Everitt and Wykes, 1999, for a definition). But how can we test for independence when we are not willing to make this assumption? There are a number of possibilities.
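For reference, the Pearson coefficient and the associated t statistic for the examination data can be computed directly from their definitions. The sketch below is an illustrative Python version; the variable names are hypothetical.

```python
from math import sqrt

# Table 8.8 data
score = [49, 70, 55, 52, 61, 65, 57, 71, 69, 44]
time_s = [2160, 2063, 2013, 2000, 1420, 1934, 1519, 2735, 2329, 1590]

n = len(score)
mean_x = sum(score) / n
mean_y = sum(time_s) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in score)
syy = sum((y - mean_y) ** 2 for y in time_s)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(score, time_s))

r = sxy / sqrt(sxx * syy)
t = r * sqrt((n - 2) / (1 - r * r))  # referred to a t distribution with n - 2 df

print(round(r, 2), round(t, 2))
```

The values agree with those quoted in the text, r = 0.50 and t = 1.63.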

8.5.1. Kendall's Tau

Kendall's tau is a measure of correlation that can be used to assess the independence of two variables, without assuming that the underlying bivariate distribution is bivariate normal. Details of the calculation and testing of the coefficient are given in Display 8.7. For the data in Table 8.8, Kendall's tau takes the value 0.29, and the large sample z test is 1.16 with an associated p value of 0.24. The estimated correlation is lower than the value given by using Pearson's coefficient, although the associated tests for independence both conclude that examination score and completion time are independent.
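The inversion count on which Kendall's tau is based (see Display 8.7) can be sketched as follows for the examination data. This is an illustrative Python version; the function name `kendall_tau` is hypothetical, and the code assumes there are no ties in either variable, which holds for these data.

```python
from math import erfc, sqrt

# Table 8.8 data
score = [49, 70, 55, 52, 61, 65, 57, 71, 69, 44]
time_s = [2160, 2063, 2013, 2000, 1420, 1934, 1519, 2735, 2329, 1590]


def kendall_tau(x, y):
    """Return the number of inversions, tau, and the large sample z (no ties assumed)."""
    n = len(x)
    # List the y values in increasing order of x, then count pairs that are out of order.
    y_by_x = [yv for _, yv in sorted(zip(x, y))]
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if y_by_x[i] > y_by_x[j])
    tau = 1 - 2 * inversions / (n * (n - 1) / 2)
    z = tau / sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))
    return inversions, tau, z


inversions, tau, z = kendall_tau(score, time_s)
p_value = erfc(abs(z) / sqrt(2))  # two-sided p value from the standard normal

print(inversions, round(tau, 2), round(z, 2), round(p_value, 2))
```

There are 16 inversions among the 45 pairs, which leads to the values of tau and z reported in the text.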

8.5.2. Spearman's Rank Correlation

Another commonly used distribution-free correlation measure is Spearman's coefficient. Details of its calculation and testing are given in Display 8.8. For the examination scores and times data, it takes the value 0.42, and the large sample

Display 8.7
Kendall's Tau

Assume n observations on two variables are available. Within each variable the observations are ranked.
The value of Kendall's tau is based on the number of inversions in the two sets of ranks. This term can best be explained with the help of a small example involving three subjects with scores on two variables, which after ranking give the following:

Subject    Variable 1    Variable 2
   1           1             1
   2           2             3
   3           3             2

When the subjects are listed in the order of their ranks on variable 1, there is an inversion of the ranks on variable 2 (rank 3 appears before rank 2).
Kendall's tau statistic can now be defined as
$$\tau = 1 - \frac{2I}{n(n-1)/2},$$
where I is the number of inversions in the data.
A significance test for the null hypothesis that the population correlation is zero is provided by z given by
$$z = \frac{\tau}{\sqrt{[2(2n+5)]/[9n(n-1)]}},$$
which can be tested as a standard normal.

Display 8.8
Spearman's Correlation Coefficient for Ranked Data

Assume that n observations on two variables are available. Within each variable the observations are ranked.
Spearman's correlation coefficient is defined as
$$r_s = 1 - \frac{6\sum_{j=1}^{n} D_j^2}{n(n^2-1)},$$
where $D_j$ is the difference in the ranks assigned to subject j on the two variables. (This formula arises simply from applying the usual Pearson correlation coefficient formula to the ranked data.)
A significance test for the null hypothesis that the population correlation is zero is given by
$$z = (n-1)^{1/2} r_s,$$
which is tested as a standard normal.


z test is 1.24 with an associated p value of 0.22. Again, the conclusion is that the two variables are independent.
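Spearman's coefficient for the examination data can be sketched in the same way. The Python code below is an illustration using the usual rank-difference form of the coefficient, $1 - 6\sum D_j^2 / [n(n^2-1)]$, and the large sample statistic $\sqrt{n-1}\,r_s$; the helper name `ranks` is hypothetical and assumes no tied values.

```python
from math import sqrt

# Table 8.8 data
score = [49, 70, 55, 52, 61, 65, 57, 71, 69, 44]
time_s = [2160, 2063, 2013, 2000, 1420, 1934, 1519, 2735, 2329, 1590]


def ranks(values):
    """Ranks 1..n of the values (assumes no ties)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


n = len(score)
d_squared = sum((a - b) ** 2 for a, b in zip(ranks(score), ranks(time_s)))
r_s = 1 - 6 * d_squared / (n * (n * n - 1))
z = sqrt(n - 1) * r_s

print(d_squared, round(r_s, 2), round(z, 2))
```

The sum of squared rank differences is 96, giving a coefficient of 0.42 and a z value of about 1.25, close to the 1.24 quoted in the text.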

8.5.3. Distribution-Free Simple Linear Regression


Linear regression, as described in Chapter 6, is one of the most commonly used of statistical procedures. Estimation and testing of regression coefficients depends largely on the assumption that the error terms in the model are normally distributed. It is, however, possible to use a distribution-free approach to simple regression where there is a single explanatory variable, based on a method suggested by Theil (1950). Details are given in Display 8.9. We shall use the distribution-free regression procedure on the data introduced in Chapter 6 (see Table 6.1), giving the average number of words known by children
Display 8.9
Distribution-Free Simple Linear Regression

The null hypothesis of interest is that the slope of the regression line between two variables x and y is zero.
We have a sample of n observations on the two variables, $x_j, y_j$, $j = 1, \ldots, n$.
The test statistic, C, is given by
$$C = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} c(y_j - y_i),$$
where the function c is zero if $y_j = y_i$, takes the value 1 if $y_j > y_i$, and the value $-1$ if $y_j < y_i$.
The p value of the test statistic can be found from Table A.30 in Hollander and Wolfe (1999).
A large sample approximation uses the statistic z given by
$$z = \frac{C}{[n(n-1)(2n+5)/18]^{1/2}},$$
referred to a standard normal distribution.
An estimator of the regression slope is obtained as
$$\hat{\beta} = \mathrm{median}(S_{ij},\ 1 \le i < j \le n),$$
where $S_{ij} = (y_j - y_i)/(x_j - x_i)$.
For a symmetric two-sided confidence interval for $\beta$ with confidence level $1 - \alpha$, we first obtain the upper $\alpha/2$ percentile point, $k_{\alpha/2}$, of the null distribution of C from Table A.30 of Hollander and Wolfe (1999).
The lower and upper limits of the required confidence interval are found at the Mth and Qth positions of the ordered sample values $S_{ij}$, where M and Q are computed from $k_{\alpha/2}$ and the number of $S_{ij}$ values (see Hollander and Wolfe, 1999).

TABLE 8.9
Distribution-Free Estimation for Regression Coefficient of Vocabulary Scores on Age

$S_{ij}$ Values (see Display 8.9) for the Vocabulary Data

 38.00000 533.42857 600.00000 624.00000 900.00000 652.00000 648.00000 404.00000 269.00000
517.25000 607.20000 633.33333 776.00000 644.00000 566.66667 461.33333 295.33333 446.50000
487.60000 512.33333 511.80000 500.00000 582.66667 424.00000 616.00000 564.44444 585.71429
348.00000 634.00000 639.20000 572.50000 600.00000 729.33333 712.00000 650.40000 604.57143
649.33333 555.33333 588.00000 636.00000 536.00000 660.00000 532.00000 511.00000 490.00000

Note. The estimates of the intercept and the slope are obtained from these values as
$$\hat{\alpha} = \mathrm{median}(\text{vocabulary score}_j - \hat{\beta} \times \text{age}_j), \qquad \hat{\beta} = \mathrm{median}(S_{ij}).$$
This leads to the values $\hat{\alpha} = -846.67$, $\hat{\beta} = 582.67$. The test statistic, C, for testing the null hypothesis that the slope is zero takes the value 45. The associated p value from the appropriate table in Hollander and Wolfe (1999) is very small, and so we can (not surprisingly) reject the hypothesis that the regression coefficient for age and vocabulary score is zero.

at various ages. Details of the calculations and results are shown in Table 8.9. The fitted distribution-free regression line and, for comparison, that obtained by using least squares are shown on a scatterplot of the data in Figure 8.3. Here the two fitted lines are extremely similar. Other aspects of distribution-free regression, including a procedure for multiple regression, are discussed in Hollander and Wolfe (1999). Such methods based on what might be termed a classical distribution-free approach, that is, using ranks in some way, are probably not of great practical importance. However, the more recent developments such as locally weighted regression and spline smoothers, which allow the data themselves to suggest the form of the regression relationship between variables, are of increasing importance. Interested readers can consult Cleveland (1985) for details.
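The slope and intercept estimates in Table 8.9 can be reproduced with the following sketch. It is an illustration in Python only: Table 6.1 is not reproduced in this section, so the ages and vocabulary sizes below are assumed values for that table (they do reproduce the $S_{ij}$ entries of Table 8.9), and the code assumes the ages are listed in increasing order.

```python
from math import sqrt
from statistics import median

# Assumed Table 6.1 data: age in years and average vocabulary size.
age = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0]
words = [3, 22, 272, 446, 896, 1222, 1540, 1870, 2072, 2562]

n = len(age)
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]

# Pairwise slopes S_ij and the test statistic C (sum of the signs of y_j - y_i).
slopes = [(words[j] - words[i]) / (age[j] - age[i]) for i, j in pairs]
c_stat = sum((words[j] > words[i]) - (words[j] < words[i]) for i, j in pairs)
z = c_stat / sqrt(n * (n - 1) * (2 * n + 5) / 18)

slope = median(slopes)
intercept = median(w - slope * a for a, w in zip(age, words))

print(c_stat, round(z, 2), round(slope, 2), round(intercept, 2))
```

With these data all 45 pairwise slopes are positive, so C takes its largest possible value of 45, and the medians agree with the estimates given in the note to Table 8.9.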

8.6. PERMUTATION TESTS


The Wilcoxon-Mann-Whitney test and other distribution-free procedures described in the previous section are all simple examples of a general class of tests known as either permutation or randomization tests. Such procedures were first

FIG. 8.3. Scatterplot of vocabulary scores at different ages, showing fitted least squares and distribution-free regressions.

introduced in the 1930s by Fisher and Pitman, but initially they were largely of theoretical rather than practical interest because of the lack of the computer technology required to undertake the extensive computation often needed in their application. However, with each increase in computer speed and power, the permutation approach is being applied to a wider and wider variety of problems, and with today's more powerful generation of personal computers, it is often faster to calculate a p value for an exact permutation test than to look up an asymptotic approximation in a book of tables. Additionally, the statistician (or the psychologist) is not limited by the availability of tables but is free to choose a test statistic exactly matched to testing a particular null hypothesis against a particular alternative. Significance levels are then, so to speak, "computed on the fly" (Good, 1994). The stages in a general permutation test are as follows.

1. Choose a test statistic S.
2. Compute S for the original set of observations.
3. Obtain the permutation distribution of S by repeatedly rearranging the observations. When two or more samples are involved (e.g., when the difference between groups is assessed), all the observations are combined into a single large sample before they are rearranged.
4. For a chosen significance level $\alpha$, obtain the upper $\alpha$-percentage point of the permutational distribution and accept or reject the null hypothesis according to whether the value of S calculated for the original observations is smaller or larger than this value. The resultant p value is often referred to as exact.

To illustrate the application of a permutation test, consider a situation involving two treatments in which three observations of some dependent variable of interest are made under each treatment. Suppose that the observed values are as follows.

Treatment 1: 121, 118, 110
Treatment 2: 34, 22, 12

The null hypothesis is that there is no difference between the treatments in their effect on the dependent variable. The alternative is that the first treatment results in higher values of the dependent variable. (The author is aware that the result in this case is pretty clear without any test!) The first step in a permutation test is to choose a test statistic that discriminates between the hypothesis and the alternative. An obvious candidate is the sum of the observations for the first treatment group. If the alternative hypothesis is true, this sum ought to be larger than the sum of the observations in the second treatment group. If the null hypothesis is true, then the sum of the observations in each group should be approximately the same. One sum might be smaller or larger than the other by chance, but the two should not be very different. The value of the chosen test statistic for the observed values is 121 + 118 + 110 = 349. To generate the necessary permutation distribution, remember that under the null hypothesis the labels treatment 1 and treatment 2 provide no information about the test statistic, as the observations are expected to have almost the same values in each of the two treatment groups. Consequently, to permute the observations, simply reassign the six labels, three treatment 1 and three treatment 2, to the six observations; for example, treatment 1: 121, 118, 34; treatment 2: 110, 12, 22; and so on. Repeat the process until all the possible 20 distinct arrangements have been tabulated as shown in Table 8.10. From the results given in Table 8.10, it is seen that the sum of the observations in the original treatment 1 group, that is, 349, is equalled only once and never exceeded in the distinct random relabelings. If chance alone is operating, then such an extreme value has a 1 in 20 chance of occurring, that is, 5%. Therefore at the conventional .05 significance level, the test leads to the rejection of the null hypothesis in favor of the alternative.
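The enumeration just described is easy to carry out by computer. The sketch below, in Python, lists every way of assigning three of the six observations to treatment 1 and counts how often the sum is at least as large as the observed one; the variable names are hypothetical.

```python
from itertools import combinations

# The six observations; the first three are the observed treatment 1 values.
observations = [121, 118, 110, 34, 22, 12]
observed_sum = sum(observations[:3])

# Every distinct choice of three observations for treatment 1.
sums = [sum(group) for group in combinations(observations, 3)]

# One-sided p value: proportion of arrangements with a sum at least as large as observed.
p_value = sum(1 for s in sums if s >= observed_sum) / len(sums)

print(observed_sum, len(sums), p_value)
```

Only the observed arrangement reaches a sum of 349, so the p value is 1/20 = .05, as stated above.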
The Wilcoxon-Mann-Whitney test described in the previous section is also an example of a permutation test, but one applied to the ranks of the observations rather than their actual values. Originally the advantage of this procedure was that, because ranks always take the same values (1, 2, etc.), previously tabulated distributions could be used to derive p values, at least for small samples. Consequently, lengthy computations were avoided. However, it is now relatively simple

TABLE 8.10
Permutation Distribution of Example with Three Observations in Each Group

        First Group        Second Group       Sum of First Group Obs.
  1     121  118  110       34   22   12             349
  2     121  118   34      110   22   12             273
  3     121  110   34      118   22   12             265
  4     118  110   34      121   22   12             262
  5     121  118   22      110   34   12             261
  6     121  110   22      118   34   12             253
  7     121  118   12      110   34   22             251
  8     118  110   22      121   34   12             250
  9     121  110   12      118   34   22             243
 10     118  110   12      121   34   22             240
 11     121   34   22      118  110   12             177
 12     118   34   22      121  110   12             174
 13     121   34   12      118  110   22             167
 14     110   34   22      121  118   12             166
 15     118   34   12      121  110   22             164
 16     110   34   12      121  118   22             156
 17     121   22   12      118  110   34             155
 18     118   22   12      121  110   34             152
 19     110   22   12      121  118   34             144
 20      34   22   12      121  118  110              68

TABLE 8.11
Data from a Study of Organization and Memory

Training Group       0.35  0.40  0.41  0.46  0.49
No Training Group    0.34  0.37  0.33  0.35  0.39

to calculate the required permutation distribution, a point that will be demonstrated by using the data shown in Table 8.11, which results from a study of organization in the memory of mildly retarded children attending a special school. A bidirectional subjective organizational measure (SO2) was used to assess the amount of intertrial consistency in a memory task given to children in two groups, one of which had received a considerable amount of training in sorting tasks. The question of interest here is whether there is any evidence of an increase in organization in the group

TABLE 8.12
Permutational Distribution of the Wilcoxon-Mann-Whitney Rank Sum Statistic Applied to the SO2 Scores

There are a total of 252 possible permutations of pupils to groups, that is, $\binom{10}{5}$. The distribution of the sum of the ranks in the training group for all possible 252 permutations is as follows.

S    15  16  17  18  19  20  21  22  23  24  25  26  27
f     1   1   2   3   5   7   9  11  14  16  18  19  20

S    28  29  30  31  32  33  34  35  36  37  38  39  40
f    20  19  18  16  14  11   9   7   5   3   2   1   1

So, for example, there is one arrangement where the ranks sum to 15, one where they sum to 16, two where they sum to 17, and so on. From this distribution we can determine the probability of finding a value of the sum of the ranks equal to or greater than the observed value of 37, if the null hypothesis is true:
$$\Pr(S \ge 37) = (3 + 2 + 1 + 1)/252 = 0.028.$$
It is this type of calculation that gives the values in the tables for assigning p values to values of the Wilcoxon-Mann-Whitney rank sum test.

that had received training (more details of the study are given in Robertson, 1991). The permutational distribution of the Wilcoxon-Mann-Whitney rank sum statistic is shown in Table 8.12. As can be seen, this leads to an exact p value for the test of a group difference on the SO2 scores of .028. There is evidence of a training effect.
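The permutational distribution in Table 8.12 can be generated directly. The following Python sketch enumerates every choice of 5 of the ranks 1 to 10 for the training group and tabulates the rank sums; it treats the ranks as untied, as the table does, and the variable names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Frequency of each possible rank sum for 5 ranks chosen from 1..10.
distribution = Counter(sum(group) for group in combinations(range(1, 11), 5))

total = sum(distribution.values())
observed = 37
p_value = sum(f for s, f in distribution.items() if s >= observed) / total

for s in sorted(distribution):
    print(s, distribution[s])
print(total, round(p_value, 3))
```

The tabulated frequencies match those in Table 8.12, and the upper tail probability for the observed rank sum of 37 is 7/252, or .028.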

8.7. THE BOOTSTRAP


The bootstrap is a data-based method for statistical inference. Its introduction into statistics is relatively recent, because the method is computationally intensive. According to Efron and Tibshirani (1993), the term bootstrap derives from the phrase "to pull oneself up by one's bootstraps," widely considered to be based on the eighteenth century adventures of Baron Munchausen by Rudolph Erich Raspe. (The relevant adventure is the one in which the Baron had fallen into the bottom of a deep lake. Just when it looked as if all was lost, he thought he could pick himself up by his own bootstraps.) The bootstrap looks like the permutational approach in many respects, requires a minimal number of assumptions for its application, and derives critical values for testing and constructing confidence intervals from the data at hand. The stages in the bootstrap approach are as follows.

1. Choose a test statistic S.
2. Calculate S for the original set of observations.
3. Obtain the bootstrap distribution of S by repeatedly resampling from the observations. In the multigroup situation, samples are not combined but are resampled separately. Sampling is with replacement.
4. Obtain the upper $\alpha$-percentage point of the bootstrap distribution and accept or reject the null hypothesis according to whether S for the original observations is smaller or larger than this value.
5. Alternatively, construct a confidence interval for the test statistic by using the bootstrap distribution (see the example given later).

Unlike a permutation test, the bootstrap does not provide exact p values. In addition, it is generally less powerful than a permutation test. However, it may be possible to use a bootstrap procedure when no other statistical method is applicable. Full details of the bootstrap, including many fascinating examples of its application, are given by Efron and Tibshirani (1993). Here, we merely illustrate its use in two particular examples, first in the construction of confidence intervals for the difference in means and in medians of the groups involved in the WISC data introduced in Chapter 3 (see Table 3.6), and then in regression, using the vocabulary scores data used earlier in this chapter and given in Chapter 6 (see Table 6.1). The results for the WISC example are summarized in Table 8.13. The confidence interval for the mean difference obtained by using the bootstrap is seen to be narrower than the corresponding interval obtained conventionally with the t statistic.
TABLE 8.13 Construction of Confidence Intervals for the WISC Data by Using the Bootstrap

The procedure is based on drawing random samples of 16 observations with replacement from each of the row and corner groups. The random samples are found by labeling the observations in each group with the integers 1, 2, ..., 16 and selecting random samples of these integers (with replacement), using an appropriate computer algorithm (these are generally available in most statistical software packages). The mean difference and median difference are calculated for each bootstrap sample. In this study, 1000 bootstrap samples were used, resulting in 1000 mean and median differences. The first five bootstrap samples in terms of the integers 1, 2, ..., 16 were as follows.

1, 1, 3, 3, 3, 4, 5, 6, 8, 8, 9, 14, 14, 15, 16, 16
1, 2, 4, 4, 4, 5, 5, 5, 6, 10, 11, 11, 12, 12, 12, 14
1, 2, 3, 3, 5, 5, 6, 6, 9, 9, 12, 13, 14, 15, 15, 16
1, 2, 2, 6, 7, 7, 8, 9, 11, 11, 13, 13, 14, 15, 16, 16
1, 2, 3, 3, 5, 6, 7, 10, 11, 11, 12, 13, 14, 15, 16

A rough 95% confidence interval can be derived by taking the 25th and 975th largest of the replicates, leading to the following intervals: mean (16.06, 207.81), median (7.5, 193.0). The 95% confidence interval for the mean difference derived in the usual way, assuming normality of the population distributions and homogeneity of variance, is (11.65, 211.59).
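The resampling scheme described for Table 8.13 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two groups of 16 values below are hypothetical stand-ins (the WISC data are in Table 3.6), and the function names are invented for this sketch rather than taken from any package.

```python
import random
import statistics

def bootstrap_differences(group_a, group_b, n_boot=1000, seed=0):
    """Resample each group separately, with replacement, and return the
    bootstrap replicates of the mean and median differences (a minus b)."""
    rng = random.Random(seed)
    mean_diffs, median_diffs = [], []
    for _ in range(n_boot):
        # Equivalent to drawing labels 1..n with replacement within each group.
        a_star = [group_a[rng.randrange(len(group_a))] for _ in group_a]
        b_star = [group_b[rng.randrange(len(group_b))] for _ in group_b]
        mean_diffs.append(statistics.mean(a_star) - statistics.mean(b_star))
        median_diffs.append(statistics.median(a_star) - statistics.median(b_star))
    return mean_diffs, median_diffs

def percentile_interval(replicates, level=0.95):
    """Rough percentile interval; with 1000 replicates and level 0.95 this
    picks the 25th and 975th ordered values."""
    ordered = sorted(replicates)
    n = len(ordered)
    lower_rank = max(1, int(round(n * (1 - level) / 2)))
    upper_rank = min(n, int(round(n * (1 + level) / 2)))
    return ordered[lower_rank - 1], ordered[upper_rank - 1]

# Hypothetical completion times for two groups of 16 (NOT the WISC data).
row = [464, 317, 525, 298, 491, 196, 268, 372, 370, 739, 430, 410, 521, 350, 402, 288]
corner = [342, 222, 219, 513, 295, 285, 408, 543, 298, 494, 317, 407, 330, 261, 310, 276]

mean_diffs, median_diffs = bootstrap_differences(row, corner)
print("mean difference interval:  ", percentile_interval(mean_diffs))
print("median difference interval:", percentile_interval(median_diffs))
```

Because the groups are resampled separately, each replicate preserves the two group sizes, which matches stage 3 of the procedure.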

FIG. 8.4. Histogram of mean differences for 1000 bootstrap samples of the WISC data.

FIG. 8.5. Histogram of median differences for 1000 bootstrap samples of the WISC data.

Histograms of the mean and median differences obtained from the bootstrap samples are shown in Figures 8.4 and 8.5. The bootstrap results for the regression of vocabulary scores on age are summarized in Table 8.14. The bootstrap confidence interval of (498.14, 617.03) is wider than the interval given in Chapter 6 based on the assumption of normality (513.35, 610.51). The bootstrap results are represented graphically in Figures 8.6 and 8.7. We see that the regression coefficients calculated from the bootstrap samples show a minor degree of skewness, and that the regression coefficient calculated from the observed data is perhaps a little biased compared to the mean of the bootstrap distribution of 568.35.

TABLE 8.14 Bootstrap Results for Regression of Vocabulary Scores on Age

Observed Value    Bias    Mean      SE
561.93            6.42    568.35

Note. Number of bootstrap samples, 1000; 95% confidence interval (498.14, 617.03).

FIG. 8.6. Histogram of regression coefficients of vocabulary scores on age for 1000 bootstrap samples. (Horizontal axis: estimated regression coefficient.)
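The quantities of the kind reported in Table 8.14 (observed value, bias, mean, and SE of a regression coefficient) can be obtained along the following lines. This is a sketch, not the procedure used for the table: it assumes that whole (age, score) pairs are resampled, which the text does not specify, and the data shown are hypothetical rather than those of Table 6.1.

```python
import random
import statistics

def slope(xs, ys):
    """Least squares slope of y on x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def bootstrap_slope(xs, ys, n_boot=1000, seed=0):
    """Bootstrap the slope by resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    observed = slope(xs, ys)
    replicates = []
    while len(replicates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # slope undefined when all resampled x values coincide
        replicates.append(slope(bx, by))
    mean_rep = statistics.mean(replicates)
    return {
        "observed": observed,
        "mean": mean_rep,
        "bias": mean_rep - observed,   # mean of replicates minus observed value
        "se": statistics.stdev(replicates),
        "replicates": replicates,
    }

# Hypothetical ages (years) and vocabulary sizes, for illustration only.
age = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0]
vocab = [3, 22, 272, 446, 896, 1222, 1540, 1870, 2072, 2562]

result = bootstrap_slope(age, vocab)
print({k: round(v, 2) for k, v in result.items() if k != "replicates"})
```

Sorting `result["replicates"]` and taking the 25th and 975th values gives a rough 95% interval in the same way as for the WISC example.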

FIG. 8.7. Normal probability plot of bootstrap regression coefficients for vocabulary scores data. (Horizontal axis: quantiles of standard normal.)

8.8. SUMMARY
1. Distribution-free tests are useful alternatives to parametric approaches when the sample size is small and therefore evidence for any distributional assumption is not available empirically.
2. Such tests generally operate on ranks and so are invariant under transformations of the data that preserve order. They use only the ordinal property of the raw data.
3. Permutation tests and the bootstrap offer alternative approaches to making distribution-free inferences. Both methods are computationally intensive.
4. This chapter has given only a brief account of this increasingly important area of modern applied statistics. Comprehensive accounts are given by Good (1994) and by Efron and Tibshirani (1993).

COMPUTER HINTS

SPSS
Many distribution-free tests are available in SPSS; for example, to apply the Wilcoxon-Mann-Whitney test, we would use the following steps.
1. Click on Statistics, click on Nonparametric Tests, and then click on 2 Independent Samples.
2. Move the names of the relevant dependent variables to the Test Variable List.
3. Click on the relevant grouping variable and move it to the Grouping Variable box.
4. Ensure that Mann-Whitney U is checked.
5. Click on OK.

For related samples, after clicking on Nonparametric Tests we would click 2 Related Samples and ensure that Wilcoxon is checked in the Test Type dialog box.

S-PLUS
Various distribution-free tests described in the text are available from the Statistics menu. The following steps access the relevant dialog boxes.

1. Click on Statistics; click on Compare Samples; then
(a) Click on One Sample, and click on Signed Rank Test for the Wilcoxon's signed rank test dialog box.
(b) Click on Two Sample, and click on Wilcoxon Rank Test for the Wilcoxon-Mann-Whitney test dialog box.
(c) Click on k Samples, and click on Kruskal-Wallis Rank Test or Friedman Rank Test to get the dialog box for the Kruskal-Wallis one-way analysis procedure, or the Friedman procedure for repeated measures.

All these distribution-free tests are also available by a particular function in the command language; relevant functions are wilcox.test, kruskal.test, and friedman.test. For example, to apply the Wilcoxon-Mann-Whitney test to the data in two vectors, x1 and x2, the command would be

wilcox.test(x1,x2),

and to apply the Wilcoxon's signed rank test to paired data contained in two vectors of the same length, y1 and y2, the command would be

wilcox.test(y1,y2,paired=T).
The density, cumulative probability, quantiles, and random generation for the distribution of the Wilcoxon-Mann-Whitney rank sum statistic are also readily available by using dwilcox, pwilcox, qwilcox, and rwilcox, and these can be very useful in particular applications. The outer function is extremely helpful if distribution-free confidence intervals are required. For example, in the electrode example in the text, if the data for each electrode are stored in vectors, electrode1 and electrode2, then the required sums of pairs of observations needed can be found as

diff <- electrode1-electrode2,
wij <- outer(diff,diff,"+")/2,

The cor function can be used to calculate a variety of correlation coefficients, and cor.test can be used for hypothesis testing of these coefficients. The bootstrap resampling procedure is available from the Statistics menu as follows.
1. Click on Statistics, click on Resample, and click on Bootstrap to access the Bootstrap dialog box.
2. Select the relevant data set, enter the expression for the statistic to estimate, and click on the Options tag to alter the number of bootstrap samples from the default value of 1000, if required.
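For readers working outside S-PLUS, the pairwise averages produced by the outer command above can be sketched in Python. The function name and the data are hypothetical. The sketch keeps each pair once (i ≤ j), which is the usual convention for these Walsh averages, whereas outer returns the full square matrix; the median of the averages is the Hodges-Lehmann point estimate.

```python
import statistics

def walsh_averages(diffs):
    """All pairwise averages (d_i + d_j) / 2 with i <= j."""
    n = len(diffs)
    return [(diffs[i] + diffs[j]) / 2 for i in range(n) for j in range(i, n)]

# Hypothetical paired differences, for illustration only.
diffs = [0.4, -0.1, 1.2, 0.7, 0.3, 0.9]
averages = sorted(walsh_averages(diffs))
print(len(averages), "Walsh averages; median =", statistics.median(averages))
```

A distribution-free confidence interval is then read off from the ordered averages, with the ranks to use taken from the null distribution of the signed rank statistic.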

EXERCISES
8.1. The data in Table 8.15 were obtained in a study reported by Hollander and Wolfe (1999). A measure of depression was recorded for each patient on both the first and second visit after initiation of therapy. Use the Wilcoxon signed rank test to assess whether the depression scores have changed over the two visits and construct a confidence interval for the difference.
TABLE 8.15 Depression Scores

Patient    Visit 1    Visit 2
1          1.83       0.88
2          0.50       0.65
3          1.62       0.60
4          2.48       2.05
5          1.68       1.06
6          1.88       1.29
7          1.55       1.06
8          3.06       3.14
9          1.30       1.29

TABLE 8.16 Test Scores of Dizygous Twins

Pair i    Twin Xi    Twin Yi
1         277        256
2         169        118
3         157        137
4         139        144
5         108        146
6         213        221
7         232        184
8         229        188
9         114        97
10        232        231
11        161        114
12        149        187
13        128        230

8.2. The data in Table 8.16 give the test scores of 13 dizygous (nonidentical) male twins (the data are taken from Hollander and Wolfe, 1999). Test the hypothesis of independence versus the alternative that the twins' scores are positively correlated.

8.3. Reanalyze the S02 data given in Table 8.11 by using a permutational approach, taking as the test statistic the sum of the scores in the group that had received training.

8.4. Investigate how the bootstrap confidence intervals for both the mean and median differences in the WISC data change with the size of the bootstrap sample.

8.5. Use the bootstrap approach to produce an approximate 95% confidence interval for the ratio of the two population variances of the WISC data used in this chapter. Investigate how the confidence interval changes with the number of bootstrap samples used.

8.6. A therapist is interested in discovering whether family psychotherapy is of any value in alleviating the symptoms of asthma in children suffering from the disease. A total of eight families, each having a child with severe asthma, is selected for study. As the response variable, the therapist uses the number of trips to the emergency room of a hospital following an asthma attack in a 2-month period. The data shown in Table 8.17 give the values of this variable for the eight children both before psychotherapy and after psychotherapy. Use a suitable distribution-free test to determine whether there has been any change over the two periods, and calculate a distribution-free confidence interval for the treatment effect.

TABLE 8.17 Number of Visits to the Emergency Room in a 2-Month Period

           Psychotherapy
Patient    Before    After

8.7. The data in Table 8.18 show the value of an index known as the psychomotor expressiveness factor (PEF) and the ratio of striatum to cerebellum radioactivity concentration 2 hours after injection of a radioisotope (the ratio is known as S:C), for 10 schizophrenic patients. The data were collected in an investigation of the dopamine hypothesis of schizophrenia. Calculate the values of Pearson's product moment correlation, Kendall's tau, and Spearman's rank correlation for the two variables, and in each case, test whether the population correlation is zero.

TABLE 8.18 PEF and S:C Measurements for 10 Schizophrenic Patients

PEF    S:C
48     3.32
42     3.74
44     3.70
35     3.43
36     3.65
28     4.50
30     3.85
13     4.15
22     5.06
24     4.21

Analysis of Categorical Data I: Contingency Tables and the Chi-square Test

9.1. INTRODUCTION
Categorical data occur frequently in the social and behavioral sciences, where information about marital status, sex, occupation, ethnicity, and so on is frequently of interest. In such cases the measurement scale consists of a set of categories. Two specific examples are as follows.

1. Political philosophy: liberal, moderate, conservative.
2. Diagnostic test for Alzheimer's disease: symptoms present, symptoms absent.

(The categories of a categorical variable should be mutually exclusive; i.e., one and only one category should apply to each subject, unlike, say, the set of categories of liberal, Christian, Republican.) Many categorical scales have a natural ordering. Examples are attitude toward legalization of abortion (disapprove in all cases, approve only in certain cases, approve in all cases), response to a medical treatment (excellent, good, fair, poor), and diagnosis of whether a patient is mentally ill (certain, probable, unlikely, definitely not). Such ordinal variables have largely been dealt with in the previous chapter; here we shall concentrate on categorical variables having unordered categories, so-called nominal variables. Examples are religious affiliation (Catholic, Jewish, Protestant, other), mode of transportation to work (automobile, bicycle, bus, walk), and favorite type of music (classical, country, folk, jazz, rock). For nominal variables, the order of listing the categories is irrelevant and the statistical analysis should not depend on that ordering. (It could, of course, be argued that some of these examples do have an associated natural order; type of music, for example, might be listed in terms of its cultural content as country, folk, jazz, classical, and rock!)

In many cases, the researcher collecting categorical data is most interested in assessing how pairs of categorical variables are related, in particular whether they are independent of one another. Cross-classifications of pairs of categorical variables, that is, two-dimensional contingency tables, are commonly the starting point for such an investigation. Such tables and the associated chi-square test should be familiar to most readers from their introductory statistics course, but for those people whose memories of these topics have become a little faded, the next section will (it is hoped) act as a refresher. Subsequent sections will consider a number of topics dealing with two-dimensional contingency tables not usually encountered in an introductory statistics course.

9.2. THE TWO-DIMENSIONAL CONTINGENCY TABLE

A two-dimensional contingency table is formed from cross-classifying two categorical variables and recording how many members of the sample fall in each cell of the cross-classification. An example of a 5 x 2 contingency table is given in Table 9.1, and Table 9.2 shows a 3 x 3 contingency table. The main question asked about such tables is whether the two variables forming the table are independent or not. The question is answered by use of the familiar chi-squared test; details are given in Display 9.1. (Contingency tables formed from more than two variables will be discussed in the next chapter.)

TABLE 9.1 Two-Dimensional Contingency Table: Psychiatric Patients by Diagnosis and Whether Their Treatment Prescribed Drugs

Diagnosis               Drugs    No Drugs
Schizophrenic           105      8
Affective disorder      12       2
Neurosis                18       19
Personality disorder    47       52
Special symptoms        0        13

TABLE 9.2 Incidence of Cerebral Tumors

                Type
Site     A      B      C      Total
I        23     9      6      38
II       21     4      3      28
III      34     24     17     75
Total    78     37     26     141

Note. Sites: I, frontal lobes; II, temporal lobes; III, other cerebral areas. Types: A, benign tumors; B, malignant tumors; C, other cerebral tumors.

TABLE 9.3 Estimated Expected Values Under the Hypothesis of Independence for the Diagnosis and Drugs Data in Table 9.1

Diagnosis               Drugs    No Drugs
Schizophrenic           74.51    38.49
Affective disorder      9.23     4.77
Neurosis                24.40    12.60
Personality disorder    65.28    33.72
Special symptoms        8.57     4.43

Applying the test described in Display 9.1 to the data in Table 9.1 gives the estimated expected values shown in Table 9.3 and a chi-square value of 84.19 with 4 degrees of freedom. The associated p value is very small. Clearly, diagnosis and treatment with drugs are not independent. For Table 9.2 the chi-square statistic takes the value 7.84, which with 4 degrees of freedom leads to a p value of .098. Here there is no evidence against the independence of site and type of tumor.

Display 9.1 Testing for Independence in an r x c Contingency Table

Suppose a sample of n individuals have been cross-classified with respect to two categorical variables, one with r categories and one with c categories, to form an r x c two-dimensional contingency table. The general form of a two-dimensional contingency table is as follows.

                       Variable 2
Variable 1    1         2         ...    c         Total
1             n_11      n_12      ...    n_1c      n_1.
2             n_21      n_22      ...    n_2c      n_2.
...
r             n_r1      n_r2      ...    n_rc      n_r.
Total         n_.1      n_.2      ...    n_.c      n

Here $n_{ij}$ represents the number of observations in the ijth cell of the table, $n_{i.}$ represents the total number of observations in the ith row of the table, and $n_{.j}$ represents the total number of observations in the jth column of the table; both $n_{i.}$ and $n_{.j}$ are usually termed marginal totals. The null hypothesis to be tested is that the two variables are independent. This hypothesis can be formulated more formally as

$$H_0: p_{ij} = p_{i.} \times p_{.j},$$

where in the population from which the n observations have been sampled, $p_{ij}$ is the probability of an observation being in the ijth cell, $p_{i.}$ is the probability of being in the ith category of the row variable, and $p_{.j}$ is the probability of being in the jth category of the column variable. (The hypothesis is just a reflection of that elementary rule of probability that for two independent events A and B, the probability of A and B is simply the product of the probabilities of A and of B.) In a sample of n individuals we would, if the variables are independent, expect $n p_{i.} p_{.j}$ individuals in the ijth cell. Using the obvious estimators for $p_{i.}$ and $p_{.j}$, we can estimate this expected value to give

$$E_{ij} = n \times \frac{n_{i.}}{n} \times \frac{n_{.j}}{n} = \frac{n_{i.} n_{.j}}{n}.$$

The observed and expected values under independence are then compared by using the familiar chi-squared statistic

$$X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - E_{ij})^2}{E_{ij}}.$$

If independence holds then the $X^2$ statistic has approximately a chi-squared distribution with $(r-1)(c-1)$ degrees of freedom. This allows p values to be assigned. The possible problems caused by the use of the chi-square distribution to approximate the true null distribution of $X^2$ are taken up in the text.
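The calculation in Display 9.1 can be sketched in Python as follows, using the counts of Tables 9.1 and 9.2. The function names are illustrative only. To stay within the standard library, the p value uses the closed-form upper tail of the chi-squared distribution that holds for even degrees of freedom; that shortcut is an assumption of this sketch, not part of the display.

```python
import math

def expected_counts(table):
    """E_ij = (row total i) * (column total j) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(table):
    """Return the X^2 statistic and its degrees of freedom (r-1)(c-1)."""
    expected = expected_counts(table)
    x2 = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(table, expected)
             for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df

def chi2_upper_tail_even_df(x, df):
    """P(chi-squared_df > x), valid only when df is even."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))

diagnosis_by_drugs = [[105, 8], [12, 2], [18, 19], [47, 52], [0, 13]]  # Table 9.1
tumor_site_by_type = [[23, 9, 6], [21, 4, 3], [34, 24, 17]]            # Table 9.2

for name, table in [("Table 9.1", diagnosis_by_drugs),
                    ("Table 9.2", tumor_site_by_type)]:
    x2, df = chi_square(table)
    print(name, "X2 = %.2f, df = %d, p = %.3g"
          % (x2, df, chi2_upper_tail_even_df(x2, df)))
```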

TABLE 9.4 Some Examples of 2 x 2 Contingency Tables

1. Classification of Psychiatric Patients by Sex and Diagnosis

                      Sex
Diagnosis        Male    Female    Total
Schizophrenia    43      32        75
Other            15      32        47
Total            58      64        122

2. Data from Pugh (1983) Involving How Juries Come to Decisions in Rape Cases

                      Verdict
Fault          Guilty    Not Guilty    Total
Not alleged    153       24            177
Alleged        105       76            181
Total          258       100           358

3. Incidence of Suicidal Feelings in Psychotic and Neurotic Patients

                         Type of Patient
Suicidal Feelings    Psychotic    Neurotic    Total
Yes                  2            6           8
No                   18           14          32
Total                20           20          40

Note. In the second table the verdict is classified against whether the defense alleged that the victim was somehow partially at fault for the rape.

The simplest form of two-dimensional contingency table is obtained when a sample of observations is cross-classified according to the values taken by two dichotomous variables, that is, categorical variables with only two categories. Several examples of such tables are shown in Table 9.4. The general form of such a 2 x 2 contingency table, the special form of the chi-square test for such tables, and the construction of a useful confidence interval associated with 2 x 2 tables are described in Display 9.2. The results of applying the chi-squared test to the data sets in Table 9.4 and the derived confidence intervals for differences in proportions are shown in Table 9.5.

Display 9.2 2 x 2 Contingency Tables

The general form of a 2 x 2 contingency table is as follows.

                     Variable 2
Variable 1    Category 1    Category 2    Total
Category 1    a             b             a + b
Category 2    c             d             c + d
Total         a + c         b + d         n = a + b + c + d

The chi-squared statistic used in testing for independence can now be written in simplified form as

$$X^2 = \frac{n(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}.$$

For a 2 x 2 table the statistic has a single degree of freedom. For this type of contingency table, independence implies that the probability of being in category 1 of variable 1 and category 1 of variable 2 ($p_1$) is equal to the probability of being in category 1 of variable 1 and category 2 of variable 2 ($p_2$). Estimates of these two probabilities are given by

$$\hat{p}_1 = \frac{a}{a+c}, \qquad \hat{p}_2 = \frac{b}{b+d}.$$

The standard error of the difference of the two estimates is given by

$$\mathrm{SE}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{a+c} + \frac{\hat{p}_2(1-\hat{p}_2)}{b+d}}.$$

This can be used to find a confidence interval for the difference in the two probabilities in the usual way.
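A short Python sketch of the 2 x 2 calculations follows, applied to two of the tables in Table 9.4. The function name is illustrative, and the standard error is taken to be the usual large-sample one for a difference of two independent proportions, which is an assumption of this sketch; the interval uses the normal multiplier 1.96.

```python
import math

def two_by_two(a, b, c, d, z=1.96):
    """Chi-squared statistic and confidence interval for p1 - p2, where
    p1 = a / (a + c) and p2 = b / (b + d)."""
    n = a + b + c + d
    x2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p1 = a / (a + c)
    p2 = b / (b + d)
    se = math.sqrt(p1 * (1 - p1) / (a + c) + p2 * (1 - p2) / (b + d))
    diff = p1 - p2
    return x2, (diff - z * se, diff + z * se)

# Rape data: rows are fault not alleged / alleged, columns guilty / not guilty.
x2, (low, high) = two_by_two(153, 24, 105, 76)
print("rape data:     X2 = %.2f, CI = (%.2f, %.2f)" % (x2, low, high))

# Suicidal feelings: rows yes / no, columns psychotic / neurotic.
x2, (low, high) = two_by_two(2, 6, 18, 14)
print("suicidal data: X2 = %.2f, CI = (%.2f, %.2f)" % (x2, low, high))
```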

TABLE 9.5 Results of Analyzing the Three 2 x 2 Contingency Tables in Table 9.4

1. For the schizophrenia and gender data, $X^2 = 7.49$ with an associated p value of .0062. The 95% confidence interval for the difference in the probability of being diagnosed schizophrenic for men and for women is (0.07, 0.41).
2. For the rape data, $X^2 = 35.93$ with an associated p value that is very small. The 95% confidence interval for the difference in the probability of being found guilty or not guilty when the defense does not suggest that the rape is partially the fault of the victim is (0.25, 0.46), i.e., the defendant is more likely to be found guilty.
3. For the suicidal feelings data, $X^2 = 2.5$ with an associated p value of .11. The 95% confidence interval for the difference in the probability of having suicidal feelings in the two diagnostic categories is (-0.44, 0.04).

The results indicate the following.

1. That sex and psychiatric diagnosis are associated in the sense that a higher proportion of men than women are diagnosed as being schizophrenic.
2. That the verdict in a rape case is associated with whether or not the defense alleges that the rape was partially the fault of the victim.
3. Diagnosis and suicidal feelings are not associated.

Three further topics that should be mentioned in connection with 2 x 2 contingency tables are

Yates's continuity correction (see Display 9.3);
Fisher's exact test (see Display 9.4);
McNemar's test for matched samples (see Display 9.5).

We can illustrate the use of Fisher's exact test on the data on suicidal feelings in Table 9.4 because this has some small expected values (see Section 9.4 for more comments). The p value from applying the test is .235, indicating that diagnosis and suicidal feelings are not associated. To illustrate McNemar's test, we use the data shown in Table 9.6. For these data the test statistic takes the value 1.29, which is clearly not significant, and we can conclude that depersonalization is not associated with prognosis where endogenous depressed patients are concerned.
Display 9.3 Yates's Continuity Correction

In the derivation of the null distribution of the $X^2$ statistic, a continuous probability distribution, namely the chi-square distribution, is being used as an approximation to the discrete probability distribution of observed frequencies, namely the multinomial distribution (see glossary in Appendix A). To improve this approximation, Yates (1934) suggested a correction to the test statistic that involves subtracting 0.5 from the positive discrepancies (observed - expected), and adding 0.5 to the negative discrepancies, before these values are squared in the calculation of $X^2$. The correction may be incorporated into the formula for $X^2$ for the 2 x 2 table given in Display 9.2 to become

$$X^2 = \frac{n(|ad - bc| - 0.5n)^2}{(a+b)(c+d)(a+c)(b+d)}.$$

This is now known as the chi-square value corrected for continuity. Although Yates's correction is still widely used, it is really no longer necessary because of the routine availability of the exact methods described in Section 9.4.
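As a small illustration, the continuity-corrected statistic for a 2 x 2 table can be computed as below. The function name is hypothetical, and the guard that stops the correction from overshooting zero is an addition made for this sketch rather than something stated in the display.

```python
def yates_chi_square(a, b, c, d):
    """Continuity-corrected X^2 for the 2 x 2 table with cells a, b, c, d."""
    n = a + b + c + d
    # Guard (an assumption of this sketch): never let the corrected
    # discrepancy go below zero.
    discrepancy = max(0.0, abs(a * d - b * c) - 0.5 * n)
    return n * discrepancy ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Suicidal feelings data from Table 9.4 (uncorrected X^2 is 2.5).
print(yates_chi_square(2, 6, 18, 14))
```

For these data the corrected value is noticeably smaller than the uncorrected 2.5, as expected with counts this small.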

Display 9.4 Fisher's Exact Test for 2 x 2 Contingency Tables

Fisher's exact test for a 2 x 2 contingency table does not use the chi-square approximation at all. Instead the exact probability distribution of the observed frequencies is used. For fixed marginal totals, the required distribution is what is known as a hypergeometric distribution. Assuming that the two variables forming the table are independent, the probability of obtaining any particular arrangement of the frequencies a, b, c, and d when the marginal totals are as given is

$$P(a, b, c, d) = \frac{(a+b)!\,(a+c)!\,(c+d)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!},$$

where a! (read a factorial) is the shorthand method of writing the product of a and all the integers less than it. (By definition, 0! is one.) Fisher's test uses this formula to find the probability of the observed arrangement of frequencies, and of every other arrangement giving as much or more evidence of an association between the two variables, keeping in mind that the marginal totals are regarded as fixed. The sum of the probabilities of all such tables is the relevant p value for Fisher's test.
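The enumeration behind Fisher's test can be sketched in Python as follows. The function names are illustrative. The sketch interprets "as much or more evidence of an association" as "probability no larger than that of the observed table", which is one common two-sided convention and is an assumption here; the binomial-coefficient form of the probability is algebraically the same as the factorial formula above.

```python
from math import comb

def table_probability(a, b, c, d):
    """Hypergeometric probability of the table with the margins held fixed."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

def fisher_exact_two_sided(a, b, c, d):
    """Sum the probabilities of all tables with the same margins that are
    no more probable than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    p_observed = table_probability(a, b, c, d)
    total = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_probability(x, row1 - x, col1 - x, row2 - col1 + x)
        if p <= p_observed * (1 + 1e-9):  # tolerance for floating-point ties
            total += p
    return total

# Suicidal feelings data from Table 9.4.
print(round(fisher_exact_two_sided(2, 6, 18, 14), 3))
```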

Display 9.5 McNemar's Test for Paired Samples

When the samples to be compared in a 2 x 2 contingency table are matched in some way, for example, the same subjects observed on two occasions, then the appropriate test becomes one from McNemar (1962). For a matched data set, the 2 x 2 table takes the following form.

                   Sample 1
Sample 2      A present    A absent
A present     a            b
A absent      c            d

where A is the characteristic assessed in the pairs of observations making up the matched samples. Now a represents the number of pairs of observations that both have A, and so on. To test whether the probability of having A differs in the matched populations, the relevant test statistic is

$$X^2 = \frac{(b - c)^2}{b + c},$$

which, if there is no difference, has a chi-squared distribution with a single degree of freedom.
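McNemar's statistic needs only the two discordant counts, as the following Python sketch shows. The function name is illustrative; the p value uses the identity that the upper tail of a chi-squared variable with one degree of freedom equals erfc(sqrt(x/2)). The counts b = 5 and c = 2 are the discordant cells for the depressed patients data of Table 9.6.

```python
import math

def mcnemar(b, c):
    """McNemar's X^2 and its p value from the discordant counts b and c."""
    x2 = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(x2 / 2))  # upper tail, chi-squared with 1 df
    return x2, p_value

x2, p = mcnemar(5, 2)
print("X2 = %.2f, p = %.2f" % (x2, p))
```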

TABLE 9.6 Recovery of 23 Pairs of Depressed Patients

                                    Depersonalized Patients
Patients Not Depersonalized    Recovered    Not Recovered    Total
Recovered                      14           5                19
Not recovered                  2            2                4
Total                          16           7                23

9.3. BEYOND THE CHI-SQUARE TEST: FURTHER EXPLORATION OF CONTINGENCY TABLES BY USING RESIDUALS AND CORRESPONDENCE ANALYSIS

A statistical significance test is, as implied in Chapter 1, often a crude and blunt instrument. This is particularly true in the case of the chi-square test for independence in the analysis of contingency tables, and after a significant value of the test statistic is found, it is usually advisable to investigate in more detail why the null hypothesis of independence fails to fit. Here we shall look at two approaches, the first involving suitably chosen residuals and the second that attempts to represent the association in a contingency table graphically.

9.3.1. The Use of Residuals in the Analysis of Contingency Tables

After a significant chi-squared statistic is found and independence is rejected for a two-dimensional contingency table, it is usually informative to try to identify the cells of the table responsible, or most responsible, for the lack of independence. It might be thought that this can be done relatively simply by looking at the deviations of the observed counts in each cell of the table from the estimated expected values under independence, that is, by examination of the residuals:

$$r_{ij} = n_{ij} - E_{ij}, \quad i = 1, \ldots, r, \; j = 1, \ldots, c. \qquad (9.1)$$

This would, however, be very unsatisfactory because a difference of fixed size is clearly more important for smaller samples. A more satisfactory way of defining residuals for a contingency table might be to take

$$e_{ij} = (n_{ij} - E_{ij}) / \sqrt{E_{ij}}.$$

These terms are usually known as standardized residuals and are such that the chi-squared test statistic is given by

$$X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} e_{ij}^2.$$

It is tempting to think that these residuals might be judged as standard normal variables, with values outside, say, (-1.96, 1.96) indicating cells that depart significantly from independence. Unfortunately, it can be shown that the variance of $e_{ij}$ is always less than or equal to one, and in some cases considerably less than one. Consequently the use of standardized residuals for a detailed examination of a contingency table may often give a conservative reading as to which cells independence does not apply. At the cost of some extra calculation, a more useful analysis can be achieved by using what are known as adjusted residuals, $d_{ij}$, as suggested by Haberman (1973). These are defined as follows:

$$d_{ij} = \frac{e_{ij}}{\sqrt{(1 - n_{i.}/n)(1 - n_{.j}/n)}}.$$

When the variables forming the contingency table are independent, the adjusted residuals are approximately normally distributed with mean zero and standard deviation one. Consequently, values outside (-1.96, 1.96) indicate those cells departing most from independence. Table 9.7 shows the adjusted residuals for the data in Table 9.1. Here the values of the adjusted residuals demonstrate that all cells in the table contribute to the departure from independence of diagnosis and treatments with drugs. (See Exercise 9.6 for further work with residuals.)

TABLE 9.7 Adjusted Residuals for the Data in Table 9.1

Diagnosis               Drugs      No Drugs
Schizophrenic           151.56     -78.28
Affective disorder      8.56       -4.42
Neurosis                -21.69     11.21
Personality disorder    -83.70     43.23
Special symptoms        -26.41     13.64
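The two kinds of residual can be computed directly from their definitions, as in the Python sketch below. The function name is illustrative, and the adjusted residual is taken to be the standardized residual divided by the square root of (1 - row proportion)(1 - column proportion), following Haberman (1973). The output is simply what these definitions give for the counts of Table 9.1; it is not asserted here to reproduce any printed table.

```python
import math

def residual_tables(table):
    """Return (standardized, adjusted) residuals for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    standardized, adjusted = [], []
    for i, row in enumerate(table):
        e_row, d_row = [], []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            e = (observed - expected) / math.sqrt(expected)
            d = e / math.sqrt((1 - row_totals[i] / n) * (1 - col_totals[j] / n))
            e_row.append(e)
            d_row.append(d)
        standardized.append(e_row)
        adjusted.append(d_row)
    return standardized, adjusted

diagnosis_by_drugs = [[105, 8], [12, 2], [18, 19], [47, 52], [0, 13]]  # Table 9.1
e, d = residual_tables(diagnosis_by_drugs)
print("X2 from standardized residuals: %.2f"
      % sum(v ** 2 for row in e for v in row))
for row in d:
    print(["%.2f" % v for v in row])
```

The first printed line checks the identity that the squared standardized residuals sum to the chi-squared statistic.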

9.3.2. Correspondence Analysis

Correspondence analysis attempts to display graphically the relationship between the variables forming a contingency table by deriving a set of coordinate values representing the row and column categories of the table. The correspondence analysis coordinates are analogous to those derived from a principal components analysis of continuous, multivariate data (see Everitt and Dunn, 2001), except that they are derived by partitioning the chi-squared statistic for the table, rather than a variance. A brief nontechnical account of correspondence analysis is given in Display 9.6; a full account of the technique is available in Greenacre (1984).

As a first example of using a correspondence analysis, it will be applied to the data shown in Table 9.8. The results of the analysis are shown in Table 9.9 and the resulting correspondence analysis diagram in Figure 9.1. The pattern is as would be expected with, for example, fair hair being associated with blue and light eyes and so on.

The second example of a correspondence analysis will involve the data shown in Table 9.10, collected in a survey in which people in the UK were asked about which of a number of characteristics could be applied to people in the UK and in other European countries. Here the two-dimensional correspondence diagram is shown in Figure 9.2. It appears that the respondents judged, for example, the French to be stylish and sexy, the Germans efficient, and the British boring. Well, they do say you can prove anything with statistics!

Display 9.6 Correspondence Analysis

Correspondence analysis attempts to display graphically the relationship between the two variables forming a contingency table by deriving a set of coordinates representing the row and column categories of the table. The coordinates are derived from a procedure known as singular value decomposition (see Everitt and Dunn, 2001, for details) applied to the matrix E, the elements of which are

$$e_{ij} = \frac{n_{ij} - E_{ij}}{\sqrt{E_{ij}}},$$

where the terms are as defined in Display 9.1. Application of the method leads to two sets of coordinates, one set representing the row categories and the other representing the column categories. In general, the first two coordinate values for each category are the most important, because they can be used to plot the categories as a scatterplot. For a two-dimensional representation, the row category coordinates can be represented as $u_{ik}$, $i = 1, \ldots, r$, and $k = 1, 2$, and the column category coordinates as $v_{jk}$, $j = 1, \ldots, c$, and $k = 1, 2$.

A cell for which the row and column coordinates are both large and of the same sign is one that has a larger observed value than that expected under independence. A cell for which the row and column coordinates are both large but of opposite signs is one in which the observed frequency is lower than expected under independence. Small coordinate values for the row and column categories of a cell indicate that the observed and expected values are not too dissimilar. Correspondence analysis diagrams can be very helpful when contingency tables are dealt with, but some experience is needed in interpreting the diagrams.

TABLE 9.8 Cross-Classification of Eye Color and Hair Color

                         Hair Color
Eye Color    Fair    Red    Medium    Dark    Black
Light        688     116    584       188     4
Blue         326     38     241       110     3
Medium       343     84     909       412     26
Dark         98      48     403       681     81

TABLE 9.9 Derived Coordinates from Correspondence Analysis for Hair Color-Eye Color Data

Category            Coord 1    Coord 2
Eye light (EL)      0.44       0.09
Eye blue (EB)       0.40       0.17
Eye medium (EM)     -0.04      -0.24
Eye dark (ED)       -0.70      0.14
Hair fair (hf)      0.54       0.17
Hair red (hr)       0.23       0.05
Hair medium (hm)    0.04       -0.21
Hair dark (hd)      -0.59      0.11
Hair brown (hb)     -1.08      0.27

TABLE 9.10 What Do People in the UK Think about Themselves and Their Partners in the European Community?

                                     Characteristic
Nationality     1    2    3    4    5    6    7    8    9    10   11   12   13
French (F)      37   29   21   19   10   10   8    8    6    6    5    2    1
Spanish (S)     7    14   8    9    27   7    3    7    3    23   12   1    3
Italian (I)     30   12   19   10   20   7    12   6    5    13   10   1    2
British (B)     9    14   4    6    27   12   2    13   26   16   29   6    25
Irish (Ir)      1    7    1    16   30   3    10   9    5    11   22   2    27
Dutch (D)       5    4    2    2    15   2    0    13   24   1    28   4    6
German (G)      4    48   1    12   3    9    2    11   41   1    38   8    8

FIG. 9.1. Correspondence analysis diagram for the hair color-eye color data in Table 9.8. (Horizontal axis: c1.)

FIG. 9.2. Correspondence analysis diagram for the data in Table 9.10. (Horizontal axis: c1.)

9.4.

SPARSE DATA

In the derivation of the distribution of the X 2 statistic, a continuous probability


distribution, namely the chi-squared distribution, is beiig used as an approximation to the true distribution of observed frequencies, that is, the multinomial distribution (see the glossary in Appendix A). The p value associated with the X* statistic is calculated from the chi-square distribution under the assumption that there is a sufficiently large sample size. Unfortunately,it is not easy to describe the sample size needed for the chi-square distribution to approximate the exact distribution of X 2 well. One rule of thumb suggested by Cochran (1954), which h s gained almost universal acceptance among psychologists (and others), a is that the minimum expected count all cells should beat least five. The probfor lem with this rule is that it can be extremely conservative, and,in fact, Cochran also gave a further rule of thumb that appears to have been largely ignored, namely that for tables larger than 2 x 2, a minimum expected count of one is permissible as long as no more than20% of the cellshave expected values below five. In the end no simple rule covers all cases, and it is difficult to identify, a priori, whether or not a given data setlikely to suffer from the usual asymptotic is

TABLE 9.11 Firefighters' Entrance Exam Results

                     Ethnic Group of Entrant
Test Result    White    Black    Hispanic    Asian    Total
Pass             5        2         2          0        9
No show          0        1         0          1        2
Fail             0        2         3          4        9
Total            5        5         5          5       20

TABLE 9.12 Reference Set for the Firefighter Example

inference. One solution that is now available is to compute exact p values by using a permutational approach of the kind encountered in the previous chapter. The main idea in evaluating exact p values is to evaluate the observed table relative to a reference set of other tables of the same size that are like it in every possible respect, except in terms of reasonableness under the null hypothesis. The approach will become clearer if we use the specific example of Table 9.11, which summarizes the results of a firefighters' entrance exam. The reference set of tables for this example is shown in Table 9.12. The $X^2$ statistic for Table 9.11 is 11.56; but here the number of cells with expected value less than 5 is 12, that is, all cells in the table! The exact p value is then obtained by identifying all tables in the reference set whose $X^2$ values equal or exceed the observed statistic, and summing their probabilities, which under the null hypothesis of independence are found from the hypergeometric distribution formula (see Display 9.4). For example, Table 9.13(a) is a member of the reference set and has a value for $X^2$ of 14.67. Its exact probability is .00108, and because its $X^2$ value is more extreme than the observed value, it will contribute to the exact p value. Again, Table 9.13(b)

TABLE 9.13 Two Tables in the Reference Set for the Firefighter Data

20

9 2 9

9 20
Note. $X^2$ = 14.67, and the exact probability is .00108, for (a). $X^2$ = 9.778 (not larger than the observed $X^2$ value), and so it does not contribute to the exact p value, for (b).

TABLE 9.14 Reference Set for a Hypothetical 6 x 6 Contingency Table

XI I X13

XI2

x14 x16
x23
x33
X26 x24 X34 X44 X4 5

XI5 X25

x21
x3 I

x22

x32
X42

xu
14.5

W
X46 X%

7 7 12
4 34
4

x4 I

x43
x53

x5 I

x52

7 5

X55

is also a member of the reference set. Its $X^2$ value is 9.778, which is less than that of the observed table, and so its probability does not contribute to the exact p value. The exact p value calculated in this way is .0398. The asymptotic p value associated with the observed $X^2$ value is .07265. Here the exact approach leads to a different conclusion, namely that the test result is not independent of race. The real problem in calculating exact p values for contingency tables is computational. For example, the number of tables in the reference set for Table 9.14 is 1.6 billion! Fortunately, methods and associated software are now available that make this approach a practical possibility (see StatXact, Cytel Software Corporation; pralay@cytel.com).
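The permutational calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm used by StatXact: it enumerates the reference set by brute force, which is feasible for the small firefighter table but not for larger ones. The cell counts in `OBSERVED` are the Table 9.11 frequencies as reconstructed here (rows assumed to be pass, no show, fail), and all function names are hypothetical.

```python
from itertools import product
from math import factorial, prod

# Table 9.11 counts (assumed layout: rows = pass, no show, fail;
# columns = White, Black, Hispanic, Asian).
OBSERVED = [
    [5, 2, 2, 0],
    [0, 1, 0, 1],
    [0, 2, 3, 4],
]


def margins(table):
    """Return (row totals, column totals) of a two-way table."""
    return [sum(r) for r in table], [sum(c) for c in zip(*table)]


def pearson_x2(table):
    """Pearson chi-squared statistic for a two-way table."""
    rows, cols = margins(table)
    n = sum(rows)
    x2 = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            e = r * c / n
            x2 += (table[i][j] - e) ** 2 / e
    return x2


def table_probability(table):
    """Hypergeometric probability of a table given its margins."""
    rows, cols = margins(table)
    n = sum(rows)
    num = prod(factorial(m) for m in rows + cols)
    den = factorial(n) * prod(factorial(x) for row in table for x in row)
    return num / den


def compositions(total, parts):
    """Yield all ordered ways to split `total` into `parts` counts >= 0."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def reference_set(rows, cols):
    """Yield every table with the given row and column totals."""
    choices = [list(compositions(c, len(rows))) for c in cols]
    for columns in product(*choices):
        table = [list(r) for r in zip(*columns)]
        if [sum(r) for r in table] == list(rows):
            yield table


def exact_p_value(observed):
    """Sum the probabilities of all tables at least as extreme as observed."""
    rows, cols = margins(observed)
    x2_obs = pearson_x2(observed)
    # The small tolerance keeps tables whose statistic ties with the observed one.
    return sum(
        table_probability(t)
        for t in reference_set(rows, cols)
        if pearson_x2(t) >= x2_obs - 1e-9
    )


print(round(pearson_x2(OBSERVED), 2), round(exact_p_value(OBSERVED), 4))
```

Brute-force enumeration is chosen here purely for transparency; the 1.6 billion tables mentioned for Table 9.14 show why dedicated algorithms are needed in practice.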

TABLE 9.15 Calculation of the Odds Ratio and Its Confidence Interval for the Schizophrenia and Gender Data

The estimate of the odds ratio is

$$\hat{\psi} = (43 \times 52)/(32 \times 15) = 4.66.$$

The odds in favor of being diagnosed schizophrenic among males are nearly five times the corresponding odds for females. The estimated variance of the logarithm of the estimated odds ratio is

$$1/43 + 1/52 + 1/15 + 1/32 = 0.1404.$$

An approximate confidence interval for $\log \psi$ is

$$\log(4.66) \pm 1.96 \times \sqrt{0.1404} = (0.80, 2.29).$$

Consequently, the required confidence interval for $\psi$ is

$$[\exp(0.80), \exp(2.29)] = [2.24, 9.86].$$

9.5. THE ODDS RATIO


Defining the independence of the two variables forming a contingency table is relatively straightforward, but measuring the degree of dependence is not so clear cut, and many measures have been proposed (Everitt, 1992). In a 2 x 2 table one possible measure might be thought to be the difference between the estimates of the two probabilities of interest, as illustrated in Display 9.2. An alternative, and in many respects a more acceptable, measure is the odds ratio, which is explained in Display 9.7. This statistic is of considerable practical importance in the application of both log-linear models and logistic regression (see Chapter 10). The calculation of the odds ratio and its standard error for the schizophrenia and gender data are outlined in Table 9.15. The odds of being diagnosed schizophrenic in males are between approximately 2 and 10 times the odds in females.

9.6. MEASURING AGREEMENT FOR CATEGORICAL VARIABLES: THE KAPPA STATISTIC


It is often required to measure how well two observers agree on the use of a categorical scale. The most commonly used index of such agreement is the kappa coefficient, first suggested by Cohen (1960). The data shown in Table 9.16 will be used to illustrate the use of this index. These data are taken from a study (Westland and Kurland, 1953) in which two neurologists independently classified

Display 9.7 The Odds Ratio

The general 2 x 2 contingency table met in Display 9.2 can be summarized in terms of the probabilities of an observation being in each of the four cells of the table.

                      Variable 1
Variable 2      Category 1    Category 2
Category 1        p11           p21
Category 2        p12           p22

The ratios $p_{11}/p_{12}$ and $p_{21}/p_{22}$ are known as odds. The first is the odds of being in category 1 of variable 1 for the two categories of variable 2. The second is the corresponding odds for category 2 of variable 1. (Odds will be familiar to those readers who like the occasional flutter on the Derby, Epsom or Kentucky!) A possible measure for the degree of dependence of the two variables forming a 2 x 2 contingency table is the so-called odds ratio, given by

$$\psi = \frac{p_{11}/p_{12}}{p_{21}/p_{22}} = \frac{p_{11}p_{22}}{p_{12}p_{21}}.$$

Note that $\psi$ may take any value between zero and infinity, with a value of 1 corresponding to independence (why?). The odds ratio, $\psi$, has a number of desirable properties for representing dependence among categorical variables that other competing measures do not have. These properties include the following.

1. $\psi$ remains unchanged if the rows and columns are interchanged.
2. If the levels of a variable are changed (i.e., listing category 2 before category 1), $\psi$ becomes $1/\psi$.
3. Multiplying either row by a constant or either column by a constant leaves $\psi$ unchanged.

The odds ratio is estimated from the four frequencies in an observed 2 x 2 contingency table (see Display 9.2) as

$$\hat{\psi} = \frac{ad}{bc}.$$

Confidence intervals for $\psi$ can be determined relatively simply by using the following estimator of the variance of $\log \hat{\psi}$:

$$\mathrm{var}(\log \hat{\psi}) = 1/a + 1/b + 1/c + 1/d.$$

An approximate 95% confidence interval for $\log \psi$ is given by

$$\log \hat{\psi} \pm 1.96 \times \sqrt{\mathrm{var}(\log \hat{\psi})}.$$

If the limits of the confidence interval for $\log \psi$ obtained in this way are $\psi_L$, $\psi_U$, then the corresponding confidence interval for $\psi$ is simply $\exp(\psi_L)$, $\exp(\psi_U)$.
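As a rough illustration of the recipe in Display 9.7, the following Python sketch computes the estimated odds ratio, the variance of its logarithm, and the approximate 95% confidence interval. The function name `odds_ratio_ci` is hypothetical, and the assignment of the schizophrenia and gender frequencies to the cells a, b, c, d (a = 43, b = 32, c = 15, d = 52) is an assumption chosen so that ad/bc matches the estimate in Table 9.15. Because the code carries full precision through every step, the limits it prints may differ slightly in the second decimal place from the rounded hand calculation.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Estimated odds ratio ad/bc, var(log odds ratio), and an approximate CI."""
    psi_hat = (a * d) / (b * c)
    var_log = 1 / a + 1 / b + 1 / c + 1 / d
    half_width = z * sqrt(var_log)
    lower = exp(log(psi_hat) - half_width)
    upper = exp(log(psi_hat) + half_width)
    return psi_hat, var_log, (lower, upper)


# Cell labels are an assumed layout for the schizophrenia and gender data.
psi_hat, var_log, (lower, upper) = odds_ratio_ci(43, 32, 15, 52)
print(f"odds ratio = {psi_hat:.2f}, var(log) = {var_log:.4f}")
print(f"approximate 95% CI = ({lower:.2f}, {upper:.2f})")
```

Working on the log scale and exponentiating the limits is what keeps the interval inside the permissible range (zero to infinity) for an odds ratio.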

TABLE 9.16 Diagnosis of Multiple Sclerosis by Two Neurologists

           A     B     C     D    Total
A         38     5     0     1      44
B         33    11     3     0      47
C         10    14     5     6      35
D          3     7     3    10      23
Total     84    37    11    17     149

Source. Westland and Kurland (1953).

149 patients into four classes: (1) A, certainly suffering from multiple sclerosis; (2) B, probably suffering from multiple sclerosis; (3) C, possibly suffering from multiple sclerosis; and (4) D, doubtful, unlikely, and definitely not suffering from multiple sclerosis. One intuitively reasonable index of agreement for the two raters is the proportion $P_0$ of patients that they classify into the same category; for the data in Table 9.16,

$$P_0 = (38 + 11 + 5 + 10)/149 = 0.429. \tag{9.5}$$

Such a measure has the virtue of simplicity and it is readily understood. However, despite such advantages (and intuition), $P_0$ is not an adequate index of the agreement between the two raters. The problem is that $P_0$ makes no allowance for agreement between raters that might be attributed to chance. To explain, consider the two sets of data in Table 9.17. In both, the two observers are measured as achieving 66% agreement if $P_0$ is calculated. Suppose, however, that each observer is simply allocating subjects at random to the three categories in accordance with their marginal rates for the three categories. For example, observer A in the first data set would simply allocate 10% of subjects to category 1, 80% to category 2, and the remaining 10% to category 3, totally disregarding the suitability of a category for a subject. Observer B proceeds likewise. Even such a cavalier rating procedure used by the two observers would lead to some agreement and a corresponding nonzero value of $P_0$. This chance agreement, $P_c$, can be calculated simply from the marginal rates of each observer. For example, for the first data set in Table 9.17, $P_c$ is calculated as follows.

1. Category 1: the number of chance agreements to be expected is

$$100 \times 10/100 \times 10/100 = 1. \tag{9.6}$$

(Remember how expected values are calculated in contingency tables? See Display 9.1.)

2. Category 2: the number of chance agreements to be expected is

$$100 \times 80/100 \times 80/100 = 64. \tag{9.7}$$

3. Category 3: the number of chance agreements to be expected is

$$100 \times 10/100 \times 10/100 = 1. \tag{9.8}$$

Consequently, $P_c$ is given by

$$P_c = \frac{1}{100}[1 + 64 + 1] = 0.66. \tag{9.9}$$

TABLE 9.17 Two Hypothetical Data Sets, Each of Which Shows 66% Agreement Between the Two Observers

                       Observer B
Observer A       1      2      3    Total

Data Set 1
1                1      8      1      10
2                8     64      8      80
3                1      8      1      10
Total           10     80     10     100

Data Set 2
1               24      5      1      30
2               13     20      7      40
3                3      5     22      30
Total           40     30     30     100

Therefore, in this particular table, all the observed agreement might simply be due to chance. However, repeating the calculation on the second set of data in Table 9.17 gives $P_c = 0.33$, which is considerably lower than the observed agreement. A number of authors have expressed opinions on the need to incorporate chance agreement into the assessment of interobserver reliability. The clearest statement in favor of such a correction has been made by Fleiss (1975), who suggested an index that is the ratio of the difference between observed and chance agreement to the maximum possible excess of observed over chance agreement, that is, $1 - P_c$.

This leads to what has become known as the kappa statistic:

$$\kappa = (P_0 - P_c)/(1 - P_c). \tag{9.10}$$

If there is complete agreement between the two raters so that all the off-diagonal cells of the table are empty, $\kappa = 1$. If observed agreement is greater than chance, $\kappa > 0$. If the observed agreement is equal to chance, $\kappa = 0$. Finally, in the unlikely event of the observed agreement being less than chance, $\kappa < 0$, with its minimum value depending on the marginal distributions of the two raters. The chance agreement for the multiple sclerosis data is given by

$$P_c = \frac{1}{149}\left(\frac{44}{149} \times 84 + \frac{47}{149} \times 37 + \frac{35}{149} \times 11 + \frac{23}{149} \times 17\right) = 0.2797. \tag{9.11}$$

Consequently, for the multiple sclerosis data,

$$\kappa = (0.429 - 0.280)/(1 - 0.280) = 0.208. \tag{9.12}$$
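The kappa calculation above is easy to check numerically. The sketch below is an illustration only: the function name `kappa` is hypothetical, and the three tables are the Table 9.16 and Table 9.17 frequencies as reconstructed here (the assignment of observers to rows and columns is an assumption, but it does not affect kappa).

```python
def kappa(table):
    """Return (P_0, P_c, kappa) for a square table of agreement counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Observed agreement: proportion of subjects on the main diagonal.
    p_obs = sum(table[i][i] for i in range(len(table))) / n
    # Chance agreement: sum over categories of the product of marginal rates.
    p_chance = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return p_obs, p_chance, (p_obs - p_chance) / (1 - p_chance)


MULTIPLE_SCLEROSIS = [
    [38, 5, 0, 1],
    [33, 11, 3, 0],
    [10, 14, 5, 6],
    [3, 7, 3, 10],
]
DATA_SET_1 = [[1, 8, 1], [8, 64, 8], [1, 8, 1]]
DATA_SET_2 = [[24, 5, 1], [13, 20, 7], [3, 5, 22]]

for name, table in [
    ("multiple sclerosis", MULTIPLE_SCLEROSIS),
    ("data set 1", DATA_SET_1),
    ("data set 2", DATA_SET_2),
]:
    p_obs, p_chance, k = kappa(table)
    print(f"{name}: P0 = {p_obs:.3f}, Pc = {p_chance:.3f}, kappa = {k:.3f}")
```

The two hypothetical data sets make the point of the chance correction: both have 66% observed agreement, yet only the second shows agreement beyond what the marginal rates alone would produce.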

This calculated value of $\kappa$ is an estimate of the corresponding population value and, like all such estimates, has to be accompanied by some measure of its variance so that a confidence interval can be constructed. The variance of an observed value of $\kappa$ has been derived under a number of different assumptions by several authors, including Everitt (1968) and Fleiss, Cohen, and Everitt (1969). The formula for the large-sample variance of $\kappa$ is rather unpleasant, but for those with a strong stomach it is reproduced in Display 9.8. Its value for the multiple sclerosis data is 0.002485, which leads to an approximate 95% confidence interval of (0.108, 0.308). Thus there is some evidence that the agreement between the two raters in this example is greater than chance; otherwise the confidence interval would have included the value zero. However, what constitutes "good" agreement? Some arbitrary benchmarks for the evaluation of observed $\kappa$ values have been given by Landis and Koch (1977). They are as follows.

kappa          Strength of Agreement
0.00           Poor
0.01-0.20      Slight
0.21-0.40      Fair
0.41-0.60      Moderate
0.61-0.80      Substantial
0.81-1.00      Perfect

Display 9.8 Large Sample Variance of Kappa

where $p_{ij}$ is the proportion of observations in the ijth cell of the table of counts of agreements and disagreements for the two observers, $p_{i.}$ and $p_{.j}$ are the row and column marginal proportions, and r is the number of rows and columns in the table.

Of course, any series of standards such as these are necessarily subjective. Nevertheless, they may be helpful in the informal evaluation of a series of $\kappa$ values, although replacing numerical values with rather poorly defined English phrases may not be to everybody's taste. In fact, there is no simple answer to the original question concerning what constitutes good agreement. Suppose, for example, that two examiners rating examination candidates as pass or fail had $\kappa = 0.54$ (in the moderate range according to Landis and Koch). Would the people taking the examination be satisfied by this value? This is unlikely, particularly if future candidates are going to be assessed by one of the examiners but not both. If this were the case, sources of disagreement should be searched for and rectified; only then might one have sufficient confidence in the assessment of a lone examiner. The concept of a chance corrected measure of agreement can be extended to situations involving more than two observers; for details, see Fleiss and Cuzick (1979) and Schouten (1985). A weighted version of $\kappa$ is also possible, with weights reflecting differences in the seriousness of disagreements. For example, in the multiple sclerosis data a disagreement involving one rater classifying a patient as A and the other rater classifying the same patient as D would be very serious and would be given a high weight. An example of the calculation of weighted $\kappa$ and some comments about choice of weights are given by Dunn (1989).

9.7. SUMMARY

1. Categorical data occur frequently in psychological studies.
2. The chi-square statistic can be used to assess the independence or otherwise of two categorical variables.
3. When a significant chi-square value for a contingency table has been obtained, the reasons for the departure from independence have to be examined in more detail by the use of adjusted residuals, correspondence analysis, or both.
4. Sparse data can be a problem when the $X^2$ statistic is used. This is a problem that can be overcome by computing exact p values.
5. The odds ratio is an extremely useful measure of association.
6. The kappa statistic can be used to quantify the agreement between two observers applying a categorical scale.

COMPUTER HINTS

SPSS

The most common test used on categorical data, the chi-square test for independence in a two-dimensional contingency table, can be applied as follows.

1. Click on Statistics, click on Summarize, and then click on Crosstabs.
2. Specify the row variable in the Rows box and the column variable in the Columns box.
3. Now click on Statistics and select Chi-square.
4. To get observed and expected values, click on Continue, click on Cells, and then click on Expected and Total and ensure that Observed is also checked.
5. Click on Continue and then OK.

S-PLUS

Various tests described in the text can be accessed from the Statistics menu as follows.

1. Click on Statistics, click on Compare Samples, and click on Counts and Proportions. Then,
(a) Click on Fisher's exact test for the Fisher test dialog box,
(b) Click on McNemar's test for the McNemar test dialog box,
(c) Click on Chi-square test for the chi-square test of independence.

These tests can also be used in the command line approach, with the relevant functions being chisq.test, fisher.test, and mcnemar.test. For example, for a contingency table in which the frequencies were stored in a matrix X, the chi-square test of independence could be applied with the command

chisq.test(X).

If X is a 2 x 2 table, then Yates's correction would be applied by default. To stop the correction being applied would require

chisq.test(X,correct=F).
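For readers without access to S-PLUS, the effect of the continuity correction on a 2 x 2 table can be sketched in plain Python. This is a hypothetical stand-in written for illustration, not a reimplementation of the S-PLUS function: the name `chisq_2x2` and its `correct` argument are assumptions that merely echo the commands above, and the example frequencies are the four counts used in Table 9.15, arranged in an assumed layout.

```python
from math import erfc, sqrt


def chisq_2x2(table, correct=True):
    """Pearson chi-squared test for a 2 x 2 table, optionally Yates-corrected."""
    (a, b), (c, d) = table
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if correct:
        # Yates's correction shrinks |ad - bc| by n/2 (never below zero).
        diff = max(0.0, diff - n / 2)
    x2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail probability of a chi-squared variable with 1 degree of freedom.
    p_value = erfc(sqrt(x2 / 2))
    return x2, p_value


X = [[43, 32], [15, 52]]  # assumed arrangement of the Table 9.15 counts
x2_corrected, p_corrected = chisq_2x2(X)
x2_plain, p_plain = chisq_2x2(X, correct=False)
print(f"with correction:    X2 = {x2_corrected:.3f}, p = {p_corrected:.5f}")
print(f"without correction: X2 = {x2_plain:.3f}, p = {p_plain:.5f}")
```

The p value uses the identity that a chi-squared variable with one degree of freedom is a squared standard normal variable, so no statistical library is needed for the 2 x 2 case.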

EXERCISES

9.1. Table 9.18 shows a cross-classification of gender and belief in the afterlife for a sample of Americans. Test whether the two classifications are independent, and construct a confidence interval for the difference in the proportion of men and proportion of women who believe in the afterlife.

9.2. Table 9.19 shows data presented in the case of US versus Lansdowne Swimming Club. Analyzed in the usual way by the Pearson chi-squared statistic, the results are statistically significant. The government, however, lost the case, because of applying the statistic when the expected count in two cells was less than five; the defendant argued that the software used to analyze the data printed out a warning that the chi-square test might not be valid. Use an exact test to settle the question.

9.3. Lundberg (1940) presented the results of an experiment to answer the question, What is the degree of agreement in commonsense judgments of socioeconomic status by two persons who are themselves of radically different status?
TABLE 9.18 Gender and Belief in the Afterlife

              Belief in the Afterlife
Gender         Yes      No
Female         435     147
Male           375     134

TABLE 9.19 Application for Membership at the Lansdowne Swimming Club

Parameter                    Black Applicants    White Applicants
Accepted for membership             1                  379
Rejected for membership             5                    0
Total applicants                    6                  379

TABLE 9.20 Janitors' and Bankers' Ratings of the Socioeconomic Status of 196 Families on a Six-Point Scale

Janitors' Ratings    Bankers' Ratings


l

0 0 0 0 0 3

0 0 1

4 4

0 0 0 25 11

6 21 48 4 3

0 8 27 13 1 1

0 8 0

0 0

TABLE 9.21 Berkeley College Applications

                     Males                  Females
Department    Admitted   Refused     Admitted   Refused
A                512       313           89        19
B                353       207           17         8
C                120       205          202       391
D                138       279          131       244
E                 53       138           94       299
F                 22       351           24       317

One set of data collected is shown in Table 9.20. Investigate the level of agreement between the two raters by using the kappa coefficient.

9.4. Consider the following table of agreement for two raters rating a binary response.

          Yes    No    Total
Yes        15     5      20
No          5    35      40
Total      20    40      60

Show that the kappa statistic for these data is identical to both their intraclass correlation coefficient and their product moment correlation coefficient.

TABLE 9.22 Mothers' Assessments of Their Children's School Performance, Years 1 and 2

                          Doing Well, Year 2
Doing Well, Year 1       No     Yes    Total
No                       49      31      80
Yes                      17      52      69
Total                    66      83     149
9.5. Find the standardized residuals and the adjusted residuals for the incidence of cerebral tumors in Table 9.2. Comment on your results in the light of the nonsignificant chi-square statistic for these data.

9.6. The data in Table 9.21 classify applicants to the University of California at Berkeley by their gender, the department applied for, and whether or not they were successful in their application.

(a) Ignoring which department is applied for, find the odds ratio for the resulting 2 x 2 table, gender versus application result, and also calculate its 95% confidence interval.
(b) Calculate the chi-square test for independence of application result and department applied for, separately for men and women.
(c) Are the results from (a) and (b) consistent and, if not, why not?

9.7. Table 9.22 shows the results of mothers rating their children in two consecutive years, as to whether or not they were doing well at school. Carry out the appropriate test of whether there has been a change in the mothers' ratings over the 2 years.

10

Analysis of Categorical Data II: Log-Linear Models and Logistic Regression

10.1. INTRODUCTION

The two-dimensional contingency tables, which were the subject of the previous chapter, will have been familiar to most readers. Cross-classifications of more than two categorical variables, however, will not usually have been encountered in an introductory statistics course. It is such tables and their analysis that are the subject of this chapter. Two examples of multidimensional contingency tables, one resulting from cross-classifying three categorical variables and one from cross-classifying five such variables, appear in Tables 10.1 and 10.2. Both tables will be examined in more detail later in this chapter. To begin our account of how to deal with this type of data, we shall look at three-dimensional tables.

10.2. THREE-DIMENSIONAL CONTINGENCY TABLES


The analysis of three-dimensional contingency tables poses entirely new conceptual problems compared with the analysis of two-dimensional tables. However, the extension from tables of three dimensions to those of four or more, although

TABLE 10.1 Cross-Classification of Method of Suicide by Age and Sex

                                   Method
Age (Years)   Sex         1      2      3      4      5      6
10-40         Male      398    121    455    155     55    124
41-70         Male      399     82    797    168     51     82
>70           Male       93      6    316     33     26     14
10-40         Female    259     15     95     14     40     38
41-70         Female    450     13    450     26     71     60
>70           Female    154      5    185      7     38     10

Note. Methods are 1, solid or liquid matter; 2, gas; 3, hanging, suffocating, or drowning; 4, guns, knives, or explosives; 5, jumping; 6, other.

TABLE 10.2 Danish Do-It-Yourself

                                      Accommodation Type
                              Apartment               House
Work    Tenure    Answer    Age <30   31-45   46+    <30   31-45   46+

Yes Rent Skilled


No

15 13 3

Yes Own
No

18 15 4 5
1

34 28 35

Unskilled Yes Rent


No

Yes Own
No

Yes RentOffice

21

23 61

17 34 2 3 30
2 5

1 10 29 17 0

6 9 56 1 211 15 19 3 0

10

2 6
8

56 12
44 16 23

9 51
2 2

own

No

Yes No

8 4 76

19 40 25 12 5 102 191 54 1 2 19 2

3 13 52 31 13 16

7 49
11

Note. These data arise from asking employed people whether, in the preceding year, they had carried out work on their home that they would previously have employed a craftsman to do; answers, yes or no, are cross-classified against the other four variables, which are age, accommodation type, tenure, and type of work of respondent.

TABLE 10.3 Racial Equality and the Death Penalty

Parameter                                     Death Penalty    Not Death Penalty
White Victim
  White defendant found guilty of murder           190               1320
  Black defendant found guilty of murder           110                520
Black Victim
  White defendant found guilty of murder             0                 90
  Black defendant found guilty of murder            60                970

2 x 2 Table from Aggregating Data over Race of Victim
  White defendant found guilty of murder           190               1410
  Black defendant found guilty of murder           170               1490

often increasing the complexity of both analysis and interpretation, presents no further new problems. Consequently, this section is concerned with only three-dimensional tables and will form the necessary basis for the discussion of models for multiway tables to be undertaken in the next section. The first question that might be asked about a three-dimensional contingency table is, why not simply attempt its analysis by examining the two-dimensional tables resulting from summing the observed counts over one of the variables? The example shown in Table 10.3 illustrates why such a procedure is not to be recommended: it can often lead to erroneous conclusions being drawn about the data. For example, analyzing the data aggregated over the race of the victim classification gives a chi-squared statistic of 2.21 with a single degree of freedom and an associated p value of .14, implying that there is racial equality in the application of the death penalty. However, the separate analyses of the data for White victims and for Black victims lead to chi-squared values of 8.77, p = .003 and 5.54, p = .019, respectively. Claims of racial equality in the application of the death penalty now look a little more difficult to sustain. A further example (see Table 10.4) shows that the reverse can also happen; the data aggregated over a variable can show a relationship between the remaining variables when in fact no such relationship really exists. Here for the aggregated data the chi-square statistic is 5.26 with a single degree of freedom and an associated p value of less than .05, suggesting that infant survival is associated with amount of care received. For Clinic A, however, the chi-squared statistic is approximately zero as it is for Clinic B, from which the conclusion is that infant survival is not related to amount of care received.

TABLE 10.4 Survival of Infants and Amount of Prenatal Care

                                      Died                                  Survived
Place Where Care Received    Less Prenatal Care   More Prenatal Care   Less Prenatal Care   More Prenatal Care
Clinic A                             3                    4                  176                  293
Clinic B                            17                    2                  197                   23

2 x 2 Table from Aggregating Data over Clinics

                       Amount of Prenatal Care
Infant's Survival       Less      More      Total
Died                     20         6         26
Survived                373       316        689
Total                   393       322        715
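The chi-squared values quoted for Tables 10.3 and 10.4 can be reproduced with a short Python sketch, which makes the danger of collapsing concrete. The helper names `pearson_2x2` and `collapse` are hypothetical, and each 2 x 2 table is entered as the reconstructed frequencies shown in the two tables (no continuity correction is applied, which is an assumption consistent with the quoted values).

```python
def pearson_2x2(table):
    """Uncorrected Pearson chi-squared statistic for a 2 x 2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def collapse(first, second):
    """Add two 2 x 2 tables cell by cell (summing over the third variable)."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(first, second)]


# Table 10.3: rows = White, Black defendant; columns = death penalty, not.
white_victim = [[190, 1320], [110, 520]]
black_victim = [[0, 90], [60, 970]]

# Table 10.4: rows = died, survived; columns = less, more prenatal care.
clinic_a = [[3, 4], [176, 293]]
clinic_b = [[17, 2], [197, 23]]

death_penalty = {
    "White victims": pearson_2x2(white_victim),
    "Black victims": pearson_2x2(black_victim),
    "aggregated": pearson_2x2(collapse(white_victim, black_victim)),
}
prenatal_care = {
    "Clinic A": pearson_2x2(clinic_a),
    "Clinic B": pearson_2x2(clinic_b),
    "aggregated": pearson_2x2(collapse(clinic_a, clinic_b)),
}
for label, results in [("Table 10.3", death_penalty), ("Table 10.4", prenatal_care)]:
    for name, value in results.items():
        print(f"{label}, {name}: X2 = {value:.2f}")
```

In the first example the association is visible within each level of the third variable but hidden after aggregation; in the second the pattern is reversed.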

The reason for these different conclusions will become apparent later. However, these examples should make it clear why consideration of two-dimensional tables resulting from collapsing a three-dimensional table is not a sufficient procedure for analyzing the latter. Only a single hypothesis, namely that of the independence of the two variables involved, is of interest in a two-dimensional table. However, the situation is more complex for a three-dimensional table, and several hypotheses about the three variables may have to be assessed. For example, an investigator may wish to test that some variables are independent of some others, or that a particular variable is independent of the remainder. More specifically, the following hypotheses may be of interest in a three-dimensional table.

1. There is mutual independence of the three variables; that is, none of the variables are related.
2. There is partial independence; that is, an association exists between two of the variables, both of which are independent of the third.
3. There is conditional independence; that is, two of the variables are independent in each level of the third, but each may be associated with the third variable (this is the situation that holds in the case of the clinic data discussed above).

In addition, the variables in a three-way contingency table may display a more complex form of association, namely what is known as a second-order relationship; this occurs when the degree or direction of the dependence of each pair of variables is different in some or all levels of the remaining variable. (This is analogous to the three-way interaction in a factorial design with three factors for a continuous response variable; see Chapter 4 for an example.) As will be demonstrated later, each hypothesis is tested in a fashion exactly analogous to that used when independence is tested for in a two-dimensional table, namely by comparing the estimated expected frequencies corresponding to the particular hypothesis, with the observed frequencies, by means of the usual chi-squared statistic, $X^2$, or an alternative known as the likelihood ratio statistic, given by

$$X_L^2 = 2 \sum \text{observed} \times \ln(\text{observed}/\text{expected}), \tag{10.1}$$

where observed refers to the observed frequencies in the table and expected refers to the estimated expected values corresponding to a particular hypothesis (see later). In many cases $X^2$ and $X_L^2$ will have similar values, but there are a number of advantages to the latter (Williams, 1976) that make it particularly suitable in the analysis of more complex contingency tables, as will be illustrated later. Under the hypothesis of independence, the estimated expected frequencies in a two-dimensional table are found from simple calculations involving the marginal totals of frequencies as described in the previous chapter. In some cases the required expected frequencies corresponding to a particular hypothesis about the variables in a three-dimensional table can also be found from straightforward calculations on certain marginal totals (this will be illustrated in Displays 10.1 and 10.2). Unfortunately, estimated expected values in multiway tables cannot always be found so simply. For example, in a three-dimensional table, estimated expected values for the hypothesis of no second-order relationship between the three variables involve the application of a relatively complex iterative procedure. The details are outside the scope of this text, but they can be found in Everitt (1992). In general, of course, investigators analyzing multiway contingency tables will obtain the required expected values and associated test statistics from a suitable piece of statistical software, and so will not need to be too concerned with the details of the arithmetic. The first hypothesis we shall consider for a three-dimensional contingency table is that the three variables forming the table are mutually independent. Details of how this hypothesis is formulated and tested are given in Display 10.1, and in Table 10.5 the calculation of estimated expected values, the chi-squared statistic, and the likelihood ratio statistic for the suicide data in Table 10.1 are shown. The number of degrees of freedom corresponding to each test statistic is 27 (see Everitt, 1992, for an explanation of how to determine the number of the degrees of freedom

Display 10.1 Testing the Mutual Independence Hypothesis in a Three-Dimensional Contingency Table

Using an obvious extension of the nomenclature introduced in Chapter 9 for a two-dimensional table, we can formulate the hypothesis of mutual independence as

$$H_0: p_{ijk} = p_{i..}\,p_{.j.}\,p_{..k},$$

where $p_{ijk}$ represents the probability of an observation being in the ijkth cell of the table and $p_{i..}$, $p_{.j.}$, and $p_{..k}$ are the marginal probabilities of belonging to the ith, jth, and kth categories of the three variables. The estimated expected values under this hypothesis when there is a sample of n observations cross-classified are

$$E_{ijk} = n\,\hat{p}_{i..}\,\hat{p}_{.j.}\,\hat{p}_{..k},$$

where $\hat{p}_{i..}$, $\hat{p}_{.j.}$, and $\hat{p}_{..k}$ are estimates of the corresponding probabilities. The intuitive (and fortunately also the maximum likelihood) estimators of the marginal probabilities are

$$\hat{p}_{i..} = n_{i..}/n, \quad \hat{p}_{.j.} = n_{.j.}/n, \quad \hat{p}_{..k} = n_{..k}/n,$$

where $n_{i..}$, $n_{.j.}$, and $n_{..k}$ are single-variable marginal totals for each variable obtained by summing the observed frequencies over the other two variables.

Display 10.2 Testing the Partial Independence Hypothesis in a Three-Dimensional Contingency Table

With the use of the same nomenclature as before, the hypothesis of partial independence can be written in two equivalent forms as

$$H_0: p_{i.k} = p_{i..}\,p_{..k} \ \text{and} \ p_{.jk} = p_{.j.}\,p_{..k},$$

or

$$H_0: p_{ijk} = p_{ij.}\,p_{..k}.$$

The estimators of the probabilities involved are

$$\hat{p}_{..k} = n_{..k}/n, \quad \hat{p}_{ij.} = n_{ij.}/n,$$

where $n_{ij.}$ represents the two-variable marginal totals obtained by summing the observed frequencies over the third variable. Using these probability estimates leads to the following estimated expected values under this hypothesis:

$$E_{ijk} = n_{..k}\,n_{ij.}/n.$$

corresponding to a particular hypothesis). Clearly, the three variables used to form Table 10.1 are not mutually independent. Now consider a more complex partial independence hypothesis about the three variables in a three-dimensional contingency table, namely that variable 1 (say) is independent of variable 2, and that variables 2 and 3 are also unrelated. However, an association between variable 1 and 3 is allowed. Display 10.2 gives details of

TABLE 10.5 Testing Mutual Independence for the Suicide Data

For the suicide data, the estimated expected value, $E_{111}$, under the hypothesis of mutual independence is obtained as

$$E_{111} = 5305 \times 3375/5305 \times 1769/5305 \times 1753/5305 = 371.89.$$

Other estimated expected values can be found in a similar fashion, and the full set of values for the suicide data under mutual independence are as follows.

                                     Method
Age      Sex          1       2       3       4       5       6
10-40    Male      371.9    51.3   487.5    85.5    59.6    69.6
41-70    Male      556.9    76.9   730.0   128.0    89.3   104.2
>70      Male      186.5    25.7   244.4    42.9    29.9    34.9
10-40    Female    212.7    29.4   278.8    48.9    34.1    39.8
41-70    Female    318.5    44.0   417.5    73.2    51.0    59.6
>70      Female    106.6    14.7   139.8    24.5    17.2    20.0

The values of the two possible test statistics are

$$X^2 = 747.37, \quad X_L^2 = 790.30.$$

The mutual independence hypothesis has 27 degrees of freedom. Note that the single-variable marginal totals of the estimated expected values under the hypothesis of mutual independence are equal to the corresponding marginal totals of the observed values, for example,

$$n_{1..} = 398 + 399 + 93 + 259 + 450 + 154 = 1753,$$

$$E_{1..} = 371.9 + 556.9 + 186.5 + 212.7 + 318.5 + 106.6 = 1753.$$

how this hypothesis is formulated and the calculation of the estimated expected values. In Table 10.6 the partial independence hypothesis is tested for the suicide data. Clearly, even this more complicated hypothesis is not adequate for these data. A comparison of the observed values with the estimates of the values to be expected under this partial independence hypothesis shows that women in all age groups are underrepresented in the use of guns, knives, or explosives (explosives!) to perform the tragic task. (A more detailed account of how best to compare observed and expected values is given later.) Finally, consider the hypothesis of no second-order relationship between the variables in the suicide data. This allows each pair of variables to be associated, but it constrains the degree and direction of the association to be the same in each level of the third variable. Details of testing the hypothesis for the suicide data are given in Table 10.7. Note that, in this case, estimated expected values cannot be found directly from any set of marginal totals. They are found from the iterative procedure referred to earlier. (The technical reason for requiring an iterative process is that, in this case, the maximum likelihood equations from

TABLE 10.6
Testing the Partial Independence Hypothesis for the Suicide Data

Here, we wish to test whether method of suicide is independent of sex, and that age and sex are also unrelated. However, we wish to allow an association between age and method. The estimated expected value $E_{111}$ under this hypothesis is found as

$E_{111} = (3375 \times 657)/5305 = 417.98$.

Other estimated expected values can be found in a similar fashion, and the full set of such values under the partial independence hypothesis is as follows.

                              Method
Age     Sex          1       2       3       4       5       6
10-40   Male       418.0    86.5   349.9   107.5    60.4   103.1
41-70   Male       540.1    60.4   793.3   123.4    77.6    90.3
>70     Male       157.1     7.0   318.7    25.5    40.7    15.3
10-40   Female     239.0    49.5   200.1    61.5    34.6    59.0
41-70   Female     308.9    34.6   453.7    70.6    44.4    51.7
>70     Female      89.9     4.0   182.3    14.6    23.3     8.7

Note that in this case the marginal totals $n_{11.}$ and $E_{11.}$ are equal; for example, $n_{11.} = 398 + 259 = 657$ and $E_{11.} = 418.0 + 239.0 = 657$. The values of the two test statistics are

$X^2 = 485.3$, $X_L^2 = 520.4$.

These statistics have 17 degrees of freedom under the partial independence hypothesis.

which estimates arise have no explicit solution. The equations have to be solved iteratively.) Both test statistics are nonsignificant, demonstrating that this particular hypothesis is acceptable for the suicide data. Further comments on this result will be given in the next section.
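One common way of solving such equations iteratively is iterative proportional fitting, in which the fitted table is rescaled in turn to match each two-way margin of the observed table. The sketch below is an illustration under stated assumptions, not necessarily the procedure used to produce Table 10.7: the observed counts are assumed to be those of Table 10.1, and the function name `fit_no_second_order` is hypothetical.

```python
import numpy as np

# Observed counts, ASSUMED to be those of Table 10.1
# (rows: age group; columns: method 1..6).
male = np.array([[398, 121, 455, 155, 55, 124],
                 [399,  82, 797, 168, 51,  82],
                 [ 93,   6, 316,  33, 26,  14]])
female = np.array([[259, 15,  95, 14, 40, 38],
                   [450, 13, 450, 26, 71, 60],
                   [154,  5, 185,  7, 38, 10]])
n = np.stack([male.T, female.T], axis=2).astype(float)  # method x age x sex


def fit_no_second_order(n, tol=1e-8, max_iter=500):
    """Iterative proportional fitting for the model with all two-way terms."""
    fit = np.ones_like(n)
    for _ in range(max_iter):
        previous = fit.copy()
        # Rescale to match each observed two-way margin in turn.
        fit *= (n.sum(axis=2) / fit.sum(axis=2))[:, :, None]  # method x age
        fit *= (n.sum(axis=1) / fit.sum(axis=1))[:, None, :]  # method x sex
        fit *= (n.sum(axis=0) / fit.sum(axis=0))[None, :, :]  # age x sex
        if np.abs(fit - previous).max() < tol:
            break
    return fit


E = fit_no_second_order(n)
X2 = ((n - E) ** 2 / E).sum()
X2_L = 2.0 * (n * np.log(n / E)).sum()

print(np.round(E[:, :, 0].T, 1))   # fitted values for males, age x method
print(round(X2, 2), round(X2_L, 2))
```

The fitted table reproduces every two-way margin of the observed table, which is the sense in which no second-order relationship is allowed while each pair of variables remains free to be associated.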

10.3. MODELS FOR CONTINGENCY TABLES


Statisticians are very fond of models! In the previous chapters the majority of analyses have been based on the assumption of a suitable model for the data of interest. The analysis of categorical data arranged in the form of a multiway frequency table may also be based on a particular type of model, not dissimilar to those used in the analysis of variance. As will be seen, each particular model corresponds to a specific hypothesis about the variables forming the table, but the advantages to be gained from a model-fitting procedure are that it provides a systematic approach to the analysis of complex multidimensional tables and, in addition, gives estimates of the magnitude of effects of interest. The models

TABLE 10.7
Testing the Hypothesis of No Second-Order Relationship Between the Variables in the Suicide Data

The hypothesis of interest is that the association between any two of the variables does not differ in either degree or direction in each level of the remaining variable. More specifically, this means that the odds ratios corresponding to the 2 x 2 tables that arise from the cross-classification of pairs of categories of two of the variables are the same in all levels of the remaining variable. (Details are given in Everitt, 1992.) Estimates of expected values under this hypothesis cannot be found from simple calculations on marginal totals of observed frequencies as in Displays 10.1 and 10.2. Instead, the required estimates have to be obtained iteratively by using a procedure described in Everitt (1992). The estimated expected values derived from this iterative procedure are as follows.

                              Method
Age     Sex          1       2       3       4       5       6
10-40   Male       410.9   122.7   439.2   156.4    56.8   122.0
41-70   Male       379.4    77.6   819.9   166.3    51.1    84.7
>70     Male        99.7     8.7   308.9    33.4    24.1    13.3
10-40   Female     246.1    13.3   110.8    12.6    38.2    40.0
41-70   Female     469.6    17.4   427.1    27.7    70.9    57.3
>70     Female     147.3     2.3   192.1     6.6    39.9    10.7

The two test statistics take the values

$X^2 = 15.40$, $X_L^2 = 14.90$.

The degrees of freedom for this hypothesis are 10.

used for contingency tables can be introduced most simply (if a little clumsily) in terms of a two-dimensional table; details are given in Display 10.3. The model introduced there is analogous to the model used in a two-way analysis of variance (see Chapter 4) but it differs in a number of aspects.

1. The data now consist of counts, rather than a score for each subject on some dependent variable.
2. The model does not distinguish between independent and dependent variables. All variables are treated alike as response variables whose mutual associations are to be explored.
3. Whereas a linear combination of parameters is used in the analysis of variance and regression models of previous chapters, in multiway tables the natural model is multiplicative and hence logarithms are used to obtain a model in which parameters are combined additively.
4. In previous chapters the underlying distribution assumed for the data was the normal; with frequency data the appropriate distribution is binomial or multinomial (see the glossary in Appendix A).
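The third point in the list above, that a multiplicative model becomes additive on the log scale, can be illustrated numerically. The following is a minimal sketch assuming NumPy; the 2 x 3 table of counts is hypothetical and is used only to show that, under independence, the log expected frequencies decompose exactly into an overall mean plus row and column deviations (the decomposition described in Display 10.3).

```python
import numpy as np

# Hypothetical 2 x 3 table of counts, used only for illustration.
counts = np.array([[20.0, 30.0, 50.0],
                   [10.0, 40.0, 50.0]])
n = counts.sum()

# Expected frequencies under independence: (row total)(column total)/n.
F = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / n
log_F = np.log(F)

# ANOVA-style parameters: overall mean, then deviations of the row and
# column means of the log frequencies from that overall mean.
u = log_F.mean()
u_row = log_F.mean(axis=1) - u
u_col = log_F.mean(axis=0) - u

# Under independence the log frequencies are exactly additive.
reconstructed = u + u_row[:, None] + u_col[None, :]
print(np.allclose(reconstructed, log_F))  # True
```

Because the main effects are defined as deviations from the overall mean, each set sums to zero, just as in the analysis of variance.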


Display 10.3
Log-Linear Model for a Two-Dimensional Contingency Table with r Rows and c Columns

Again the general model considered in previous chapters, that is,

observed response = expected response + error,

is the starting point. Here the observed response is the observed count, $n_{ij}$, in a cell of the table, and the expected response is the frequency to be expected under a particular hypothesis, $F_{ij}$. Hence

$n_{ij} = F_{ij} + \text{error}$.

Unlike the corresponding terms in models discussed in previous chapters, the error terms here will not be normally distributed. Appropriate distributions are the binomial and multinomial (see glossary in Appendix A).

Under the independence hypothesis, the population frequencies, $F_{ij}$, are given by

$F_{ij} = n p_{i.} p_{.j}$,

which can be rewritten, using an obvious dot notation, as

$F_{ij} = F_{i.} F_{.j} / n$.

When logarithms are taken, the following linear model for the expected frequencies is arrived at:

$\ln F_{ij} = \ln F_{i.} + \ln F_{.j} - \ln n$.

By some simple algebra (it really is simple, but see Everitt, 1992, for details), the model can be rewritten in the form

$\ln F_{ij} = u + u_{1(i)} + u_{2(j)}$.

The form of the model is now very similar to those used in the analysis of variance (see Chapters 3 and 4). Consequently, ANOVA terms are used for the parameters, and $u$ is said to represent an overall mean effect, $u_{1(i)}$ is the main effect of category i of the row variable, and $u_{2(j)}$ is the main effect of the jth category of the columns variable. The main effect parameters are defined as deviations of row or columns means of log frequencies from the overall mean. Therefore, again using an obvious dot notation,

$u = \frac{1}{rc} \sum_{i=1}^{r} \sum_{j=1}^{c} \ln F_{ij}$,

$u_{1(i)} = \frac{1}{c} \sum_{j=1}^{c} \ln F_{ij} - u$,

$u_{2(j)} = \frac{1}{r} \sum_{i=1}^{r} \ln F_{ij} - u$.

The values taken by the main effects parameters in this model simply reflect differences between the row or columns marginal totals and so are of little concern


in the context of the analysis of contingency tables. They could be estimated by replacing the $F_{ij}$ in the formulas above with the estimated expected values, $E_{ij}$. The log-linear model can be fitted by estimating the parameters, and hence the expected frequencies, and comparing these with the observed values, using either the chi-squared or likelihood ratio test statistics. This would be exactly equivalent to the usual procedure for testing independence in a two-dimensional contingency table as described in the previous chapter.

If the independence model fails to give a satisfactory fit to a two-dimensional table, extra terms must be added to the model to represent the association between the two variables. This leads to a further model,

$\ln F_{ij} = u + u_{1(i)} + u_{2(j)} + u_{12(ij)}$,

where the parameters $u_{12(ij)}$ model the association between the two variables. This is known as the saturated model for a two-dimensional contingency table because the number of parameters in the model is equal to the number of independent cells in the table (see Everitt, 1992, for details). Estimated expected values under this model would simply be the observed frequencies themselves, and the model provides a perfect fit to the observed data, but, of course, no simplification in description of the data. The interaction parameters $u_{12(ij)}$ are related to odds ratios (see Exercise 10.7).

Now consider how the log-linear model in Display 10.3 has to be extended to be suitable for a three-way table. The saturated model will now have to contain

main effect parameters for each variable, parameters to represent the possible associations between each pair of variables, and finally parameters to represent the possible second-order relationship between the three variables. The model is

$\ln F_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(ij)} + u_{13(ik)} + u_{23(jk)} + u_{123(ijk)}$.     (10.2)

The parameters in this model are as follows.

1. $u$ is the overall mean effect.
2. $u_{1(i)}$ is the main effect of variable 1.
3. $u_{2(j)}$ is the main effect of variable 2.
4. $u_{3(k)}$ is the main effect of variable 3.
5. $u_{12(ij)}$ is the interaction between variables 1 and 2.
6. $u_{13(ik)}$ is the interaction between variables 1 and 3.
7. $u_{23(jk)}$ is the interaction between variables 2 and 3.
8. $u_{123(ijk)}$ is the second-order relationship between the three variables.

The purpose of modeling a three-way table would be to find the unsaturated model with fewest parameters that adequately predicts the observed frequencies. As a way to assess whether some simpler model would fit a given table, particular parameters in the saturated model are set to zero and the reduced model is


assessed for fit. However, it is important to note that, in general, attention must be restricted to what are known as hierarchical models. These are such that whenever a higher-order effect is included in a model, the lower-order effects composed from variables in the higher effect are also included. For example, if terms $u_{123}$ are included, so also must terms $u_{12}$, $u_{13}$, $u_{23}$, $u_1$, $u_2$, and $u_3$. Therefore, models such as

$\ln F_{ijk} = u + u_{2(j)} + u_{3(k)} + u_{123(ijk)}$     (10.3)

are not permissible. (This restriction to hierarchical models arises from the constraints imposed by the maximum likelihood estimation procedures used in fitting log-linear models, details of which are too technical to be included in this text. In practice, the restriction is of little consequence because most tables can be described by a series of hierarchical models.)

Each model that can be derived from the saturated model for a three-dimensional table is equivalent to a particular hypothesis about the variables forming the table; the equivalence is illustrated in Display 10.4. Particular points to note about the material in this display are as follows.

1. The first three models are of no consequence in the analysis of a three-dimensional table. Model 4 is known as the minimal model for such a table.
2. The fitted marginals or bracket notation is frequently used to specify the series of models fitted when a multidimensional contingency table is examined. The notation reflects the fact noted earlier that, when testing particular hypotheses about a multiway table (or fitting particular models), certain marginal totals of the estimated expected values are constrained to be equal to the corresponding marginals of the observed values. (This arises because of the form of the maximum likelihood equations.) The terms used to specify the model with the bracket notation are the marginals fixed by the model.
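The link between bracket notation, the hierarchy principle, and degrees of freedom can be made concrete with a short sketch. The code below is illustrative only; the function names `implied_terms` and `degrees_of_freedom` are hypothetical, and the table dimensions (six methods, three age groups, two sexes) are those of the suicide data.

```python
from itertools import combinations
from math import prod


def implied_terms(brackets):
    """All u-terms implied by a hierarchical model in bracket notation.

    Each bracket is a tuple of variable numbers, e.g. (1, 2) for [12].
    A bracket implies every lower-order term formed from its variables.
    """
    terms = set()
    for bracket in brackets:
        for size in range(1, len(bracket) + 1):
            terms.update(combinations(sorted(bracket), size))
    return terms


def degrees_of_freedom(brackets, levels):
    """Number of cells minus number of independent fitted parameters."""
    n_cells = prod(levels.values())
    n_params = 1  # the overall mean u
    for term in implied_terms(brackets):
        n_params += prod(levels[v] - 1 for v in term)
    return n_cells - n_params


levels = {1: 6, 2: 3, 3: 2}  # method, age, sex
for model in ([(1,), (2,), (3,)], [(1, 2), (3,)], [(1, 2), (1, 3), (2, 3)]):
    print(model, degrees_of_freedom(model, levels))
```

For the three models listed, this counting gives 27, 17, and 10 degrees of freedom, matching the values quoted for the mutual independence, partial independence, and no second-order relationship hypotheses.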

To illustrate the use of log-linear models in practice, a series of such models will be fitted to the suicide data given in Table 10.1. Details are given in Table 10.8. The aim of the procedure is to arrive at a model that gives an adequate fit to the data and, as shown in Table 10.8, differences in the likelihood ratio statistic for different models are used to assess whether models of increasing complexity (larger number of parameters) are needed.

The results given in Table 10.8 demonstrate that only model 7 provides an adequate fit for the suicide data. This model states that the association between age and method of suicide is the same for males and females, and that the association between sex and method is the same for all age groups. The parameter estimates (and the ratio of the estimates to their standard errors) for the fitted model are given in Table 10.9. The estimated main effects parameters are not of great interest; their values simply reflect differences between the marginal totals of the categories of each variable. For example, the largest effect for method, 1.55, arises only


Display 10.4
Hierarchical Models for a General Three-Dimensional Contingency Table

A series of possible log-linear models for a three-way contingency table is as follows.

Log-Linear Model                                                                                     Bracket Notation
1. $\ln F_{ijk} = u$
2. $\ln F_{ijk} = u + u_{1(i)}$
3. $\ln F_{ijk} = u + u_{1(i)} + u_{2(j)}$                                                           [1],[2]
4. $\ln F_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)}$                                                [1],[2],[3]
5. $\ln F_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(ij)}$                                   [12],[3]
6. $\ln F_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(ij)} + u_{13(ik)}$                      [12],[13]
7. $\ln F_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(ij)} + u_{13(ik)} + u_{23(jk)}$         [12],[13],[23]
8. $\ln F_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(ij)} + u_{13(ik)} + u_{23(jk)} + u_{123(ijk)}$     [123]

The hypotheses corresponding to the first seven models are as follows.

1. All frequencies are the same.
2. Marginal totals for variable 2 and variable 3 are equal.
3. Marginal totals for variable 3 are equal. (Because these first three models do not allow the observed frequencies to reflect observed differences in the marginal totals of each variable, they are of no real interest in the analysis of three-dimensional contingency tables.)
4. The variables are mutually independent.
5. Variables 1 and 2 are associated and both are independent of variable 3.
6. Variables 2 and 3 are conditionally independent given variable 1.
7. There is no second-order relationship between the three variables.
8. Model 8 is the saturated model for a three-dimensional table.

because more people use hanging, suffocating, or drowning as a method of suicide than the other five possibilities. The estimated interaction parameters are of more interest, particularly those for age and method and for sex and method. For example, the latter reflect that males use solid and jump less and women use them more than if sex was independent of method. The reverse is true for gas and gun.

As always when models are fitted to observed data, it is essential to examine the fit in more detail than is provided by a single goodness-of-fit statistic such as the likelihood ratio criterion. With log-linear models, differences between the observed (O) and estimated expected (E) frequencies form the basis for this more detailed examination, generally using the standardized residual calculated as

standardized residual $= (O - E)/\sqrt{E}$.     (10.4)

The residuals for the final model selected for the suicide data are given in Table 10.10. All of the residuals are small, suggesting that the chosen model does give an adequate representation of the observed frequencies. (A far fuller account of log-linear models is given in Agresti, 1996.)
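Equation (10.4) is easily applied cell by cell. The sketch below assumes NumPy and takes, as an example, the row for males aged 10-40: the observed counts are assumed to be those of Table 10.1 and the expected values are those listed in Table 10.7 for the no second-order relationship model.

```python
import numpy as np

# Males aged 10-40, methods 1..6.
# Observed counts are ASSUMED to be those of Table 10.1;
# expected values are taken from Table 10.7.
observed = np.array([398, 121, 455, 155, 55, 124], dtype=float)
expected = np.array([410.9, 122.7, 439.2, 156.4, 56.8, 122.0])

# Standardized residual, Equation (10.4): (O - E) / sqrt(E).
residuals = (observed - expected) / np.sqrt(expected)
print(np.round(residuals, 1))
```

All six values are well inside the range of about plus or minus 2 that would normally prompt closer inspection of a cell.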

TABLE 10.8
Log-Linear Models for Suicide Data

The goodness of fit of a series of log-linear models is given below (variable 1 is method, variable 2 is age, and variable 3 is sex).

Model               DF     $X_L^2$     P
[1],[2],[3]         27     790.3      <.001
[12],[3]            17     520.4      <.001
[13],[2]            22     424.6      <.001
[23],[1]            25     658.9      <.001
[12],[13]           12     154.7      <.001
[12],[23]           15     389.0      <.001
[12],[13],[23]      10      14.9       .14

The difference in the likelihood ratio statistic for two possible models can be used to choose between them. For example, the mutual independence model, [1],[2],[3], and the model that allows an association between variables 1 and 2, [12],[3], have $X_L^2 = 790.3$ with 27 degrees of freedom and $X_L^2 = 520.4$ with 17 degrees of freedom, respectively. The hypothesis that the extra parameters in the more complex model are zero, that is, $H_0: u_{12(ij)} = 0$ for all i and j ($u_{12} = 0$ for short), is tested by the difference in the two likelihood ratio statistics, with degrees of freedom equal to the difference in the degrees of freedom of the two models. Here this leads to a value of 269.9 with 10 degrees of freedom. The result is highly significant and the second model provides a significantly improved fit compared with the mutual independence model.

A useful way of judging a series of log-linear models is by means of an analog of the square of the multiple correlation coefficient as used in multiple regression (see Chapter 6). The measure L is defined as

$L = \dfrac{X_L^2(\text{baseline model}) - X_L^2(\text{model of interest})}{X_L^2(\text{baseline model})}$.

L lies in the range (0,1) and indicates the percentage improvement in goodness of fit of the model being tested over the baseline model. The choice of baseline model is not fixed. It will often be the mutual independence model, but it could be the simpler of two competing models. For comparing the mutual independence and no second-order relationship models on the suicide data, L is

$L = (790.3 - 14.9)/790.3 = 98.1\%$.
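The two comparisons described in Table 10.8 amount to simple arithmetic on the tabulated statistics. As a small illustrative sketch (the dictionary and function names are hypothetical, and only values printed in Table 10.8 are used):

```python
# Likelihood ratio statistics and degrees of freedom taken from Table 10.8.
models = {
    "[1],[2],[3]": (790.3, 27),
    "[12],[3]": (520.4, 17),
    "[12],[13],[23]": (14.9, 10),
}


def compare(simpler, richer):
    """Difference in X2_L and in degrees of freedom for two nested models."""
    (x_s, df_s), (x_r, df_r) = models[simpler], models[richer]
    return x_s - x_r, df_s - df_r


def improvement(baseline, model):
    """The measure L: proportional reduction in X2_L over the baseline."""
    return (models[baseline][0] - models[model][0]) / models[baseline][0]


diff, df = compare("[1],[2],[3]", "[12],[3]")
L = improvement("[1],[2],[3]", "[12],[13],[23]")
print(round(diff, 1), df, round(100 * L, 1))  # 269.9 10 98.1
```

The difference of 269.9 would then be referred to a chi-squared distribution with 10 degrees of freedom.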

10.4. LOGISTIC REGRESSION FOR A BINARY RESPONSE VARIABLE


In many multidimensional contingency tables (I am tempted to say in almost all), there is one variable that can properly be considered a response, and it is the relationship of this variable to the remainder, which is of most interest. Situations in which the response variable has two categories are most common, and it is these that are considered in this section. When these situations are introduced

TABLE 10.9
Parameter Estimates in the Final Model Selected for the Suicide Data

The final model selected for the suicide data is that of no second-order relationship between the three variables, that is, model 7 in Display 10.4. The estimated main effect parameters are as follows.

Category     Estimate     Estimate/SE
Solid          1.33          34.45
Gas           -1.27         -11.75
Hang           1.55          41.90
Gun           -0.64          -8.28
Jump          -0.42          -7.00
Other         -0.55          -7.64
10-40          0.25           7.23
41-70          0.56          16.77
>70           -0.81         -16.23
Male           0.41          15.33
Female        -0.41         -15.33

(Note that the parameter estimates for each variable sum to zero.)

The estimated interaction parameters for method and age and their ratio to the corresponding standard error (in parentheses) are as follows.

                                              Method
Age       Solid           Gas            Hanging          Gun            Jump           Other
10-40    -0.0 (-0.5)      0.5 (4.9)     -0.6 (-14.2)     -0.0 (-0.4)    -0.2 (-2.5)     0.3 (4.2)
41-70    -0.1 (-1.09)     0.1 (1.0)      0.1 (1.7)        0.1 (1.3)     -0.3 (-3.4)     0.0 (0.4)
>70       0.1 (1.1)      -0.6 (-3.6)     0.5 (9.6)       -0.1 (-0.6)     0.5 (4.8)     -0.4 (-3.0)

The estimated interaction parameters for method and sex and the ratios of these estimates to their standard errors (in parentheses) are as follows.

                                              Method
Sex       Solid           Gas            Hanging          Gun            Jump           Other
Male     -0.4 (-13.1)     0.4 (5.3)      0.0 (0.3)        0.6 (8.4)     -0.5 (-8.6)    -0.1 (-2.2)
Female    0.4 (13.1)     -0.4 (-5.3)    -0.0 (-0.3)      -0.6 (-8.4)     0.5 (8.6)      0.1 (2.2)

The estimated interaction parameters for age and sex and the ratios of these estimates to their standard errors (in parentheses) are as follows.

                             Age
Sex       10-40            41-70           >70
Male      0.3 (11.5)      -0.1 (-4.5)     -0.2 (-6.8)
Female   -0.3 (-11.5)      0.1 (4.5)       0.2 (6.8)
TABLE 10.10
Standardized Residuals from the Final Model Fitted to Suicide Data

                              Method
Age     Sex          1       2       3       4       5       6
10-40   Male       -0.6    -0.2     0.8    -0.1    -0.2     0.2
41-70   Male        1.0     0.5    -0.8     0.1    -0.0    -0.3
>70     Male       -0.7    -0.9     0.4    -0.1     0.4     0.2
10-40   Female      0.8     0.5    -1.5     0.4     0.3    -0.3
41-70   Female     -0.9    -1.1     1.1    -0.3     0.0     0.4
>70     Female      0.6     1.8    -0.5     0.1    -0.3    -0.2

in the context of a multidimensional contingency table, it might be thought that our interest in them is confined to only categorical explanatory variables. As we shall see in the examples to come, this is not the case, and explanatory variables might be a mixture of categorical and continuous. In general the data will consist of either observations from individual subjects having the value of a zero-one response variable and associated explanatory variable values, or these observations are grouped contingency table fashion, with counts of the number of zero and one values of the response in each cell.

Modeling the relationship between a response variable and a number of explanatory variables has already been considered in some detail in Chapter 6 and in a number of other chapters, so why not simply refer to the methods described previously? The reason lies with the nature of the response variable. Explanations will become more transparent if considered in the context of an example, and here we will use the data shown in Table 10.11. These data arise from a study of a psychiatric screening questionnaire called the General Health Questionnaire (GHQ) (see Goldberg, 1972). How might these data be modeled if interest centers on how "caseness" is related to gender and GHQ score? One possibility that springs to mind would be to consider modeling the probability, p, of being a case. In terms of the general model encountered in earlier chapters and in Display 10.1, that is,

observed response = expected response + error,     (10.5)

this probability would be the expected response and the corresponding observed response would be the proportion of individuals in the sample categorized as cases.

TABLE 10.11
GHQ Data

GHQ Score     Sex     No. of Cases     No. of Noncases
    0          F            4                80
    1          F            4                29
    2          F            8                15
    3          F            6                 3
    4          F            4                 2
    5          F            6                 1
    6          F            3                 1
    7          F            2                 0
    8          F            3                 0
    9          F            2                 0
   10          F            1                 0
    0          M            1                36
    1          M            2                25
    2          M            2                 8
    3          M            1                 4
    4          M            3                 1
    5          M            3                 1
    6          M            2                 1
    7          M            4                 2
    8          M            3                 1
    9          M            2                 0
   10          M            2                 0

Note. F, female; M, male.

Therefore, a possible model is

$p = \beta_0 + \beta_1 \text{sex} + \beta_2 \text{GHQ}$     (10.6)

and

observed response $= p + \text{error}$.     (10.7)

To simplify things for the moment, let's ignore gender and fit the model

$p = \beta_0 + \beta_1 \text{GHQ}$     (10.8)

by using a least-squares approach as described in Chapter 6. Estimates of the two parameters, estimated standard errors, and predicted values of the response

TABLE 10.12
Linear Regression Model for GHQ Data with GHQ Score as the Single Explanatory Variable

Parameter                 Estimate     SE        t
$\beta_0$ (intercept)      0.136      0.065     2.085
$\beta_1$ (GHQ)            0.096      0.011     8.698

Predicted Values of p from This Model

Observation No.     Predicted Prob.     Observed Prob.
      1                 0.136               0.041
      2                 0.233               0.100
      3                 0.328               0.303
      4                 0.425               0.500
      5                 0.521               0.700
      6                 0.617               0.818
      7                 0.713               0.714
      8                 0.809               0.750
      9                 0.905               0.857
     10                 1.001               1.000
     11                 1.097               1.000

are shown in Table 10.12. Immediately a problem becomes apparent: two of the predicted values are greater than one, but the response probability is constrained to be in the interval (0,1). Thus using a linear regression approach here can lead to fitted probability values outside the range 0,1. An additional problem is that the error term in the linear regression model is assumed to have a normal distribution; this is clearly not suitable for a binary response.

It is clearly not sensible to contemplate using a model, which is known a priori to have serious disadvantages, and so we need to consider an alternative approach to linear regression for binary responses. That most frequently adopted is the linear logistic model, or logistic model for short. Now a transformed value of p is modeled rather than p directly, and the transformation chosen ensures that fitted values of p lie in the interval (0,1). Details of the logistic regression model are given in Display 10.5. Fitting the logistic regression model to the GHQ data, disregarding gender, gives the results shown in Table 10.13. Note that now all the predicted values are satisfactory and lie between 0 and 1. A graphical comparison of the fitted linear and logistic regressions is shown in Figure 10.1. We see that in addition to the problems noted earlier with using the linear regression approach here, this model provides a very poor description of the data.
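The problem with the straight-line fit is easy to reproduce. The sketch below assumes NumPy and assumes that the case and total counts at each GHQ score are the sums over the two sexes of the counts in Table 10.11. The printed estimates in Table 10.12 appear consistent with an unweighted least-squares line through the 11 observed proportions, so that is what is fitted here; this is an assumption about how the table was produced.

```python
import numpy as np

# Cases and totals at GHQ scores 0..10, sexes combined
# (ASSUMED to be the sums over sex of the counts in Table 10.11).
ghq = np.arange(11)
cases = np.array([5, 6, 10, 7, 7, 9, 5, 6, 6, 4, 3])
totals = np.array([121, 60, 33, 14, 10, 11, 7, 8, 7, 4, 3])
observed = cases / totals

# Unweighted least-squares line through the 11 observed proportions.
slope, intercept = np.polyfit(ghq, observed, 1)
predicted = intercept + slope * ghq

print(round(intercept, 3), round(slope, 3))
print(np.round(predicted[-2:], 3))  # fitted "probabilities" at scores 9 and 10
```

The last two fitted values exceed one, which is exactly the difficulty described above.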

Display 10.5
The Logistic Regression Model

The logistic transformation, $\lambda$, of a probability, p, is defined as follows:

$\lambda = \ln[p/(1 - p)]$.

In other words, $\lambda$ is the logarithm of the odds for the response variable. As p varies from 0 to 1, $\lambda$ varies between $-\infty$ and $\infty$.

The logistic regression model is a linear model for $\lambda$, that is,

$\lambda = \ln[p/(1 - p)] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_q x_q$,

where $x_1, x_2, \ldots, x_q$ are the q explanatory variables of interest. Modeling the logistic transformation of p rather than p itself avoids possible problems of finding fitted values outside their permitted range. ($\ln[p/(1 - p)]$ is often written as logit(p) for short.)

The parameters in the model are estimated by maximum likelihood (see Collett, 1991, for details). The parameters in the model can be interpreted as the change in the ln(odds) of the response variable produced by a change of one unit in the corresponding explanatory variable, conditional on the other variables remaining constant.

It is sometimes convenient to consider the model as it represents p itself, that is,

$p = \dfrac{\exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_q x_q)}{1 + \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_q x_q)}$.

There are a number of summary statistics that measure the discrepancy between the observed proportions of success (i.e., the "one" category of the response variable, say), and the fitted proportions from the logistic model for the true success probability p. The most common is known as the deviance, D, and it is given by

$D = 2 \sum_{i=1}^{k} \left[ y_i \ln\left(\dfrac{y_i}{\hat{y}_i}\right) + (n_i - y_i) \ln\left(\dfrac{n_i - y_i}{n_i - \hat{y}_i}\right) \right]$,

where k is the number of categories, $y_i$ is the number of successes in the ith category of the observations, $n_i$ is the total number of responses in the ith category, and $\hat{y}_i = n_i \hat{p}_i$, where $\hat{p}_i$ is the predicted success probability for this category. (We are assuming here that the raw data have been collected into categories as in the GHQ data and the Danish do-it-yourself data in the text. When this is not so and the data consist of the original zeros and ones for the response, then the deviance cannot be used as a measure of fit; see Collett, 1991, for an explanation of why not.)

The deviance is distributed as chi-squared and differences in deviance values can be used to assess competing models; see the examples in the text.

The estimated regression coefficient for GHQ score in the logistic regression is 0.74. Thus the log (odds) of being a case increases by 0.74 for a unit increase in GHQ score. An approximate 95% confidence interval for the regression coefficient is

$0.74 \pm 1.96 \times 0.09 = (0.56, 0.92)$.     (10.9)
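For readers who want to see how such estimates can arise, the following sketch fits the logistic model of Display 10.5 to the grouped GHQ data by Newton-Raphson iteration and then computes the deviance and the interval in Equation (10.9). It is an illustration under stated assumptions rather than the software used for Table 10.13: NumPy is assumed, the counts are assumed to be the sums over sex of those in Table 10.11, and the function names are hypothetical.

```python
import numpy as np

# Cases and totals at GHQ scores 0..10, sexes combined
# (ASSUMED to be the sums over sex of the counts in Table 10.11).
ghq = np.arange(11, dtype=float)
cases = np.array([5, 6, 10, 7, 7, 9, 5, 6, 6, 4, 3], dtype=float)
totals = np.array([121, 60, 33, 14, 10, 11, 7, 8, 7, 4, 3], dtype=float)

X = np.column_stack([np.ones_like(ghq), ghq])  # intercept and GHQ score


def fit_logistic(X, y, n, n_iter=25):
    """Maximum likelihood for grouped binomial data by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = n * p * (1.0 - p)                  # binomial variance weights
        info = X.T @ (w[:, None] * X)          # Fisher information matrix
        beta = beta + np.linalg.solve(info, X.T @ (y - n * p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ ((n * p * (1.0 - p))[:, None] * X)
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se, p


def xlogy(a, b):
    """a * ln(a / b), taken to be zero wherever a is zero."""
    out = np.zeros_like(a)
    pos = a > 0
    out[pos] = a[pos] * np.log(a[pos] / b[pos])
    return out


beta, se, p_hat = fit_logistic(X, cases, totals)
fitted = totals * p_hat
deviance = 2.0 * (xlogy(cases, fitted)
                  + xlogy(totals - cases, totals - fitted)).sum()

lower, upper = beta[1] - 1.96 * se[1], beta[1] + 1.96 * se[1]
print(np.round(beta, 3), np.round(se, 3))
print(round(lower, 2), round(upper, 2), round(deviance, 2))
```

Exponentiating the slope and the two interval limits converts them from the log (odds) scale to the odds scale.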

TABLE 10.13
Logistic Regression Model for GHQ Data with GHQ Score as the Single Explanatory Variable

Parameter                 Estimate     SE        t
$\beta_0$ (intercept)     -2.711      0.272    -9.950
$\beta_1$ (GHQ)            0.736      0.095     7.783

Observation No.     Predicted Prob.     Observed Prob.
      1                 0.062               0.041
      2                 0.122               0.100
      3                 0.224               0.303
      4                 0.377               0.500
      5                 0.558               0.700
      6                 0.725               0.818
      7                 0.846               0.714
      8                 0.920               0.750
      9                 0.960               0.857
     10                 0.980               1.000
     11                 0.991               1.000

However, such results given as they are in terms of log (odds) are not immediately helpful. Things become better if we translate everything back to odds, by exponentiating the various terms. So, exp(0.74) = 2.08 represents the increase in the odds of being a case when the GHQ score increases by one. The corresponding confidence interval is [exp(0.56), exp(0.92)] = [1.75, 2.51].

Now let us consider a further simple model for the GHQ data, namely a logistic regression for caseness with only gender as an explanatory variable. The results of fitting such a model are shown in Table 10.14. Again, the estimated regression coefficient for gender, -0.04, represents the change in log (odds) (here a decrease) as the explanatory variable increases by one. For the dummy variable coding sex, such a change implies that the observation arises from a man rather than a woman. Transferring back to odds, we have a value of 0.96 and a 95% confidence interval of (0.55, 1.70). Because this interval contains the value one, which would indicate the independence of gender and caseness (see Chapter 9), it appears that gender does not predict caseness for these data. If we now look at the 2 x 2 table of caseness and gender for the GHQ data, that is,

                  Sex
             Female     Male
Case           43        25
Not case      131        79

FIG. 10.1. Fitted linear and logistic regression models for the probability of being a case as a function of the GHQ score for the data in Table 10.11.

TABLE 10.14
Logistic Regression Model for GHQ Data with Gender as the Single Explanatory Variable

Parameter                 Estimate     SE        t
$\beta_0$ (intercept)     -1.114      0.176    -6.338
$\beta_1$ (sex)           -0.037      0.289    -0.127

we see that the odds ratio is (131 x 25)/(43 x 79) = 0.96, the same result as given by the logistic regression. For a single binary explanatory variable, the estimated regression coefficient is simply the log of the odds ratio from the 2 x 2 table relating the response variable to the explanatory variable. (Readers are encouraged to confirm that the confidence interval found from the logistic regression is the same as would be given by using the formula for the variance of the log of the odds ratio given in the previous chapter. As a further exercise, readers might fit a logistic regression model to the GHQ data, which includes both GHQ score and gender


as explanatory variables. If they do, they will notice that the estimated regression coefficient for sex is no longer simply the log of the odds ratio from the 2 x 2 table above, because of the conditional nature of these regression coefficients when other variables are included in the fitted model.)

As a further more complex illustration of the use of logistic regression, the method is applied to the data shown in Table 10.2. The data come from a sample of employed men aged between 18 and 67 years, who were asked whether, in the preceding year, they had carried out work in their home that they would have previously employed a craftsman to do. The response variable here is the answer (yes/no) to that question. There are four categorical explanatory variables: (1) age: under 30, 31-45, and over 45; (2) accommodation type: apartment or house; (3) tenure: rent or own; and (4) work of respondent: skilled, unskilled, or office.

To begin, let us consider only the single explanatory variable, work of respondent, which is a categorical variable with three categories. In Chapter 6, when the use of multiple regression with categorical explanatory variables was discussed, it was mentioned that although such variables could be used, some care was needed in how to deal with them when they have more than two categories, as here. Simply coding the categories as, say, 1, 2, and 3 for the work variable would really not do, because it would imply equal intervals along the scale. The proper way to handle a nominal variable with k > 2 categories would be to recode it as a number of dummy variables. Thus we will recode work in terms of two dummy variables, work1 and work2, defined as follows.

Work          Work1     Work2
Skilled         0         0
Unskilled       1         0
Office          0         1
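The recoding defined above is mechanical, and a tiny sketch may help. The function name `dummy_code` and the list of respondents are hypothetical; the coding itself, with skilled as the reference category, is the one just described.

```python
def dummy_code(work):
    """Return (work1, work2) for a work category; skilled is the reference."""
    coding = {"skilled": (0, 0), "unskilled": (1, 0), "office": (0, 1)}
    return coding[work]


respondents = ["skilled", "office", "unskilled", "skilled"]  # hypothetical
design = [dummy_code(w) for w in respondents]
print(design)  # [(0, 0), (0, 1), (1, 0), (0, 0)]
```

With this coding, each dummy variable's coefficient compares one category with the skilled group, which is why no equal spacing between categories is implied.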

The results of the logistic regression model of the form logit(p) $= \beta_0 + \beta_1 \text{work1} + \beta_2 \text{work2}$ are given in Table 10.15. A cross-classification of work

TABLE 10.15
Logistic Regression for Danish Do-It-Yourself Data with Work as the Single Explanatory Variable

Parameter                 Estimate     SE        t
$\beta_0$ (intercept)      0.706      0.112     6.300
$\beta_1$ (work1)         -0.835      0.147    -5.695
$\beta_2$ (work2)         -0.237      0.134    -1.768

TABLE 10.16
Cross-Classification of Work Against Response for Danish Do-It-Yourself Data

                          Work
Response     Skilled     Unskilled     Office     Total
No             119          239          301        659
Yes            241          210          481        932
Total          360          449          782       1591

against the response is shown in Table 10.16. From this table we can extract the following pair of 2 x 2 tables:

Response     Skilled     Unskilled
No             119          239
Yes            241          210

odds ratio = (119 x 210)/(241 x 239) = 0.434, log (odds ratio) = -0.835.

Response     Skilled     Office
No             119         301
Yes            241         481

odds ratio = (119 x 481)/(241 x 301) = 0.789, log (odds ratio) = -0.237.
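These odds ratios, and the interval for the GHQ gender comparison that readers were earlier encouraged to confirm, can be checked with a few lines of code. The sketch below uses only the Python standard library; the function name is hypothetical, and the standard error is the usual large-sample formula for a log odds ratio (the square root of the sum of the reciprocals of the four cell counts), which is assumed here to be the formula referred to in the previous chapter.

```python
from math import exp, log, sqrt


def log_odds_ratio(a, b, c, d):
    """Log odds ratio (ad/bc) and its standard error for [[a, b], [c, d]]."""
    estimate = log((a * d) / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return estimate, se


# Work versus response (Table 10.16): rows No/Yes, columns skilled/other.
unskilled, _ = log_odds_ratio(119, 239, 241, 210)
office, _ = log_odds_ratio(119, 301, 241, 481)
print(round(unskilled, 3), round(office, 3))  # -0.835 -0.237

# GHQ data: rows female/male, columns not case/case.
est, se = log_odds_ratio(131, 43, 79, 25)
ci = (exp(est - 1.96 * se), exp(est + 1.96 * se))
print(round(exp(est), 2), round(se, 3), [round(x, 2) for x in ci])
```

The two log odds ratios for work match the coefficients of work1 and work2 in Table 10.15, and the standard error for the gender comparison matches the 0.289 reported in Table 10.14.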

We see that the coding used produces estimates for the regression coefficients work1 and work2 that are equal to the log (odds ratio) from comparing skilled and unskilled and skilled and office. (Readers are encouraged to repeat this exercise; use the age variable represented also as two dummy variables.)

The results from fitting the logistic regression model with all four explanatory variables to the Danish do-it-yourself data are shown in Table 10.17. Work has been recoded as work1 and work2 as shown above, and age has been similarly coded in terms of two dummy variables, age1 and age2. So the model fitted is logit(p) $= \beta_0 + \beta_1 \text{work1} + \beta_2 \text{work2} + \beta_3 \text{age1} + \beta_4 \text{age2} + \beta_5 \text{tenure} + \beta_6 \text{type}$. Note that the coefficients for work1 and work2 are similar, but not identical to those in Table 10.15. They can, however, still be interpreted as log (odds ratios), but after the effects of the other three explanatory variables are taken into account. The results in Table 10.17 appear to imply that work, age, and tenure are the three most important explanatory variables for predicting the probability of answering yes to the question posed in the survey. However, the same caveats apply to logistic regression as were issued in the case of multiple linear regression in

TABLE 10.17 Results from Fitting the Logistic Model to Danish Do-It-Yourself Data by Using All Four Observed Explanatory Variables

Parameter     SE       Estimate/SE
Intercept    0.154       1.984
work1        0.152      -5.019
work2        0.141      -2.168
age1         0.137      -0.825
age2         0.141      -3.106
tenure       0.138       7.368
type         0.147      -0.017

Chapter 6: these regression coefficients and the associated standard errors are estimated conditional on the other variables' being in the model. Consequently, the t values give only a rough guide to which variables should be included in a final model. Important subsets of explanatory variables in logistic regression are often selected by using a similar approach to the forward, backward, and stepwise procedures described in Chapter 6, although the criterion for deciding whether or not a candidate variable should be added to, or excluded from, an existing model is different, now usually involving the deviance index of goodness of fit described in Display 10.5. Many statistical packages have automatic variable selection procedures to be used with logistic regression, but here we shall try to find a suitable model for the Danish do-it-yourself data in a relatively informal manner, by examining deviance differences as variables are added to a current model. Details are given in Table 10.18. It is clear from the results in Table 10.18 that Tenure, Work, and Age are all required in a final model. The parameter estimates, and so on, for this model are shown in Table 10.19, along with the observed and predicted probabilities of giving a yes answer to the question posed in the survey. Explicitly, the final model is

$$\ln\frac{p}{1-p} = 0.30 + 1.01\,\text{tenure} - 0.76\,\text{work1} - 0.31\,\text{work2} - 0.11\,\text{age1} - 0.43\,\text{age2}. \tag{10.10}$$
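To see how the fitted equation turns into a predicted probability, the logit can be inverted numerically. The sketch below is in Python rather than the packages discussed in the text, and the function and dictionary names are illustrative assumptions. The coefficients are the three-decimal estimates from Table 10.19. Only the baseline categories (all dummy variables zero) are evaluated, because the assignment of particular age groups to age1 and age2 is not spelled out in this section.

```python
import math

# Coefficients of the final model (three-decimal values from Table 10.19).
COEF = {
    "intercept": 0.304,
    "tenure": 1.014,   # tenure is coded 0 for rent and 1 for own
    "work1": -0.763,
    "work2": -0.305,
    "age1": -0.113,
    "age2": -0.436,
}

def predicted_probability(tenure=0, work1=0, work2=0, age1=0, age2=0):
    """Invert the logit: p = 1 / (1 + exp(-eta)), eta the linear predictor."""
    eta = (COEF["intercept"]
           + COEF["tenure"] * tenure
           + COEF["work1"] * work1
           + COEF["work2"] * work2
           + COEF["age1"] * age1
           + COEF["age2"] * age2)
    return 1.0 / (1.0 + math.exp(-eta))

# Baseline work and age categories, renters versus owners.
p_rent = predicted_probability(tenure=0)
p_own = predicted_probability(tenure=1)
print(f"rent: {p_rent:.2f}, own: {p_own:.2f}")
```

The two values differ only through the tenure coefficient, which is why owning a home moves the predicted probability so noticeably.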

Before trying to interpret this model, it would be wise to look at some diagnostics that will indicate any problems it has, in a similar way to that described in Chapter 6 for multiple linear regression. Two useful diagnostics for logistic regression are described in Display 10.6. Helpful plots of these diagnostics are shown in Figure 10.2. Figure 10.2(a) shows the deviance residuals plotted against the fitted values

TABLE 10.18 Comparing Logistic Models for the Danish Do-It-Yourself Data

We will start with a model including only tenure as an explanatory variable and then add explanatory variables in an order suggested by the t statistics from Table 10.17. Differences in the deviance values for the various models can be used to assess the effect of adding the new variable to the existing model. These differences can be tested as chi-squares with degrees of freedom equal to the difference in the degrees of freedom of the two models that are being compared. We can arrange the calculations in what is sometimes known as an analysis of deviance table.

Model                         Deviance    DF    Deviance diff.    DF diff.    p
Tenure                          72.59     34
Tenure + Work                   40.61     32        31.98            2        <.0001
Tenure + Work + Age             29.67     30        10.94            2        .004
Tenure + Work + Age + Type      29.67     29         0.00            1        1.00

In fitting these models, work is entered as the two dummy variables work1 and work2, and similarly age is entered as age1 and age2.
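The p values in an analysis of deviance table follow from referring each deviance difference to a chi-squared distribution. The sketch below, in Python, is one hedged way to check them. It assumes model deviances of 72.59, 40.61, 29.67, and 29.67, which are the values consistent with the differences of 31.98, 10.94, and 0.00 reported in Table 10.18. The helper `chi2_sf` is an illustrative function that handles only 1 or 2 degrees of freedom, which is all this table needs.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable (df = 1 or 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles 1 or 2 degrees of freedom")

# (model, deviance, residual degrees of freedom)
models = [
    ("Tenure", 72.59, 34),
    ("Tenure + Work", 40.61, 32),
    ("Tenure + Work + Age", 29.67, 30),
    ("Tenure + Work + Age + Type", 29.67, 29),
]

results = []
for (_, dev0, df0), (name, dev1, df1) in zip(models, models[1:]):
    diff = dev0 - dev1        # drop in deviance from adding the new term
    df_diff = df0 - df1       # number of parameters added
    results.append((name, diff, df_diff, chi2_sf(diff, df_diff)))

for name, diff, df_diff, p in results:
    print(f"{name:28s} diff = {diff:6.2f}  df = {df_diff}  p = {p:.4f}")
```

A small p value indicates that the added variable improves the fit; a deviance difference of zero gives no evidence that the extra variable is needed.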

and Figure 10.2(b) shows a normal probability plot of the Pearson residuals. These plots give no obvious cause for concern, so we can now try to interpret our fitted model. It appears that conditional on work and age, the probability of a positive response is far greater for respondents who own their home as opposed to those who rent. And conditional on age and tenure, unskilled and office workers tend to have a lower probability of responding yes than skilled workers. Finally, it appears that the two younger age groups do not differ in their probability of giving a positive response, and that this probability is greater than that in the oldest age group.
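The two kinds of residual plotted in Figure 10.2 can be computed cell by cell. The following Python sketch uses the standard grouped-binomial definitions of the Pearson and deviance residuals; the function names are illustrative assumptions, and the cell used (54 yes answers out of 73, fitted probability 0.73) is a made-up example rather than exact data from the survey.

```python
import math

def pearson_residual(y, n, p_hat):
    """(observed - fitted count) divided by the binomial standard deviation."""
    return (y - n * p_hat) / math.sqrt(n * p_hat * (1.0 - p_hat))

def deviance_residual(y, n, p_hat):
    """Signed square root of one cell's contribution to the deviance."""
    fitted = n * p_hat

    def term(obs, exp):
        # obs * log(obs / exp), with the usual convention 0 * log(0) = 0
        return obs * math.log(obs / exp) if obs > 0 else 0.0

    d2 = 2.0 * (term(y, fitted) + term(n - y, n - fitted))
    sign = 1.0 if y >= fitted else -1.0
    return sign * math.sqrt(max(d2, 0.0))

# Hypothetical cell: 54 yes answers among 73 respondents, fitted p = 0.73.
y, n, p_hat = 54, 73, 0.73
x_i = pearson_residual(y, n, p_hat)
d_i = deviance_residual(y, n, p_hat)
print(f"Pearson residual = {x_i:.3f}, deviance residual = {d_i:.3f}")
```

For a well-fitted cell the two residuals are close to each other and small in absolute value, which is the pattern the plots are meant to reveal across all cells.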

10.5. THE GENERALIZED LINEAR MODEL

In Chapter 6 we showed that the models used in the analysis of variance and those used in multiple linear regression are equivalent versions of a linear model in which an observed response is expressed as a linear function of explanatory variables plus some random disturbance term, often referred to as the "error" even though in many cases it may not have anything to do with measurement error. Therefore, the general form of such models, as outlined in several previous chapters, is

observed response = expected response + error, (10.11)
expected response = linear function of explanatory variables. (10.12)

TABLE 10.19 Parameter Estimates and Standard Errors for the Final Model Selected for Danish Do-It-Yourself Data

Parameter    Estimate    SE        t
Intercept     0.304
Tenure^a      1.014     0.114     8.868
Work1        -0.763     0.152    -5.019
Work2        -0.305     0.141    -2.169
Age1         -0.113     0.137    -0.826
Age2         -0.436     0.140    -3.116

Observed and Predicted Probabilities of a Yes Response

Cell    Observed    Predicted    n

2 3 4
5

0.54 0.54

0.58 0.55

33 28
15

0.40
0.55

0.47 0.58
0.55

0.71
0.25

62 14
8

10 11

6 7 8 9

0.83 0.75
0.50

12 13 14 15 16 17 18 19 20 22 23 24 25 26 27 28 29 30

0.82 0.73 0.81 0.33 0.37


0.44 0.40

0.19 0.30
0.40 0.00
1. 0 0 0.72 0.63 0.49

0.47 0.79 0.77 0.71 0.79 0.77 0.71 0.39 0.36 0.29 0.39 0.36 0.29
0.64

51

6 4 2 68 77 43 27 34 73
16

23

21

0.61 0.53
0.64

5 2

0.55 0.55

0.34 0.47 0.45 0.48

0.61 0.53 0.50 0.47 0.39


0.50

100 55

3 32 83

0.47 0.39

42 61 47 29 23
(Continued)

TABLE 10.19 (Continued)

Cell    Observed    Predicted    n
31        0.67        0.73        12
32        0.71        0.71         7
33        0.33        0.64         3
34        0.74        0.73        73
35        0.72        0.71       267
36        0.63        0.71       163

^a Tenure is coded 0 for rent and 1 for own.

Display 10.6 Diagnostics for Logistic Regression

The first diagnostic for logistic regression is the Pearson residual, defined as

$$X_i = \frac{y_i - n_i \hat{p}_i}{\sqrt{n_i \hat{p}_i (1 - \hat{p}_i)}},$$

where $y_i$ is the observed number of yes responses among the $n_i$ observations in cell $i$, $\hat{p}_i$ is the fitted probability, and $\hat{y}_i = n_i \hat{p}_i$ is the fitted value.

For $n_i \ge 5$ the distribution of the Pearson residuals can be reasonably approximated by a standard normal distribution, so a normal probability plot of the $X_i$ should be linear.

Pearson residuals with absolute values greater than two or three might be regarded with suspicion.

The second useful diagnostic for checking a logistic regression model is the deviance residual, defined as

$$d_i = \operatorname{sgn}(y_i - \hat{y}_i)\left[2 y_i \ln\frac{y_i}{\hat{y}_i} + 2 (n_i - y_i) \ln\frac{n_i - y_i}{n_i - \hat{y}_i}\right]^{1/2},$$

where $\operatorname{sgn}(y_i - \hat{y}_i)$ is the function that makes $d_i$ positive when $y_i \ge \hat{y}_i$ and negative when $y_i < \hat{y}_i$. Plotting the deviance residuals against the fitted values can be useful, indicating problems with models. Like the Pearson residuals, the deviance residual is approximately normally distributed.

Specification of the model is completed by assuming some specific distribution for the error terms. In the case of ANOVA and multiple regression models, for example, the assumed distribution is normal with mean zero and a constant variance $\sigma^2$. Now consider the log-linear and the logistic regression models introduced in this chapter. How might these models be put into a similar form to the models used in the analysis of variance and multiple regression? The answer is by a relatively simple


adjustment of the equations given above, namely allowing some transformation of the expected response to be modeled as a linear function of explanatory variables; that is, by introducing a model of the form

observed response = expected response + error, (10.13)
f(expected response) = linear function of explanatory variables, (10.14)

where f represents some suitable transformation. In the context of this generalized linear model (GLM), f is known as a link function. By also allowing the error terms to have distributions other than the normal, both log-linear models and logistic regression models can be included in the same framework as ANOVA and multiple regression models. For example, for logistic regression, the link function would be the logistic and the error term binomial. The GLM allowing a variety of link functions and numerous error distributions was first introduced into statistics by Nelder and Wedderburn (1972). Such models are fitted to data by using a general maximum likelihood approach, details of which are well outside the technical requirements of this text (see McCullagh and Nelder, 1989, for details). Apart from the unifying perspective of the GLM, its main advantage is that it provides the opportunity to carry out analyses that make more realistic assumptions about data than the normality assumption made explicitly and, more worryingly, often implicitly in the past. Nowadays, all statistical software packages can fit GLMs routinely, and researchers in general, and those working in psychology in particular, need to be aware of the possibilities such models often offer for a richer and more satisfactory analysis of their data.
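Although the details of maximum likelihood fitting are outside the scope of the text, a small sketch may help to show what a GLM routine does for a binomial response with a logistic link. The Python code below applies Newton-Raphson steps to grouped binomial data. It is an illustration under stated assumptions, not the algorithm of any particular package: the functions `solve` and `fit_logistic` are hypothetical helpers written for this example, and the data are the counts of yes responses by type of work, with skilled workers as the reference group.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_logistic(X, yes, total, iterations=25):
    """Maximum likelihood for grouped binomial data with a logit link."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        score = [0.0] * k                     # gradient of the log-likelihood
        info = [[0.0] * k for _ in range(k)]  # information matrix
        for row, y, n in zip(X, yes, total):
            eta = sum(b * x for b, x in zip(beta, row))
            p = 1.0 / (1.0 + math.exp(-eta))  # inverse of the logit link
            for i in range(k):
                score[i] += row[i] * (y - n * p)
                for j in range(k):
                    info[i][j] += row[i] * row[j] * n * p * (1.0 - p)
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Rows: skilled, unskilled, office. Columns: intercept, work1, work2.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1]]
yes = [241, 210, 481]
total = [360, 449, 782]
beta = fit_logistic(X, yes, total)
print([round(b, 3) for b in beta])
```

Changing the link function and the variance function inside the loop is what turns this same scheme into a fit for other members of the GLM family, such as a log link with Poisson errors for log-linear models.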

10.6. SUMMARY

1. The analysis of cross-classifications of three or more categorical variables can now be undertaken routinely by using log-linear models.
2. The log-linear models fitted to multidimensional tables correspond to particular hypotheses about the variables forming the tables.
3. Expected values and parameters in log-linear models are estimated by maximum likelihood methods. In some cases the former consist of simple functions of particular marginal totals of observed frequencies. However, in many examples, the estimated frequencies have to be obtained by an iterative process.
4. The fit of a log-linear model is assessed by comparing the observed and estimated expected values under the model by means of the chi-squared statistic or, more commonly, the likelihood ratio statistic.
5. In a data set in which one of the categorical variables can be considered to be the response variable, logistic regression can be applied to investigate the effects of explanatory variables. The regression parameters in such models can be interpreted in terms of odds ratios.
6. Categorical response variables with more than two categories and ordinal types of response variables can also be handled by logistic regression models; see Agresti (1996) for details.

COMPUTER HINTS

SPSS
The types of analysis described in this chapter can be accessed from the Statistics menu; for example, to undertake a logistic regression by using forward selection to choose a subset of explanatory variables, we can use the following.

1. Click on Statistics, click on Regression, and then click on Logistic.
2. Move the relevant binary dependent variable into the Dependent variable box.
3. Move the relevant explanatory variables into the Covariates box.
4. Choose Forward as the selected Method.
5. Click on OK.

S-PLUS
In S-PLUS, log-linear analysis and logistic regression can be accessed by means of the Statistics menu, as follows.

1. Click on Statistics, click on Regression, and then click on Log-linear (Poisson) for the log-linear models dialog box, or Logistic for the logistic regression dialog box.
2. Click on the Plot tag to specify residual plots, and so on.

When the S-PLUS command line language is used, log-linear models and logistic regression models can be applied by using the glm function, specifying family=poisson for the former, and family=binomial for the latter. When the data for logistic regression are grouped as in the Danish do-it-yourself example in the text, then the total number of observations in each group has to be passed to the glm function by using the weights argument. So if, for example, the GHQ data were stored in a data frame GHQ with variables sex and score, the logistic regression command would be

    glm(p~sex+score,family=binomial,weights=n,data=GHQ),

where p is the vector of observed proportions, and n is the vector of number of observations.

EXERCISES
10.1. The data in Table 10.20 were obtained from a study of the relationship between car size and car accident injuries. Accidents were classified according to their type, severity, and whether or not the driver was ejected. Using severity as the response variable, derive and interpret a suitable logistic model for these accounts.

10.2. The data shown in Table 10.21 arise from a study in which a sample of 1008 people were asked to compare two detergents, brand M and brand X. In addition to stating their preference, the sample members provided information on previous use of brand M, the degree of softness of the water that they used, and the temperature of the water. Use log-linear models to explore the associations between the four variables.

10.3. Show that the marginal totals of estimated expected values $E_{ij\cdot}$, $E_{i\cdot k}$, and $E_{\cdot jk}$, corresponding to the no second-order relationship hypothesis for the suicide data, are equal to the corresponding marginal totals of the observed values.

10.4. The data in Table 10.22 (taken from Johnson and Albert, 1999) are for 30 students in a statistics class. The response variable y indicates whether
TABLE 10.20 Car Accident Data

                                                      Number Hurt
Car Weight    Driver Ejected    Accident Type    Not Severely    Severely
Small         No                Collision             350           150
Small         No                Rollover               60           112
Small         Yes               Collision              26            23
Small         Yes               Rollover               19            80
Standard      No                Collision            1878          1022
Standard      No                Rollover              148           404
Standard      Yes               Collision             111           161
Standard      Yes               Rollover               22           265

TABLE 10.21 Comparisons of Detergents Data

                                      Previous User of M         Not Previous User of M
Water Softness    Brand Preferred    High Temp.    Low Temp.    High Temp.    Low Temp.
Soft              X                     19            57           29            63
Soft              M                     29            49           27            53
Medium            X                     23            47           33            66
Medium            M                     47            55           23            50
Hard              X                     24            37           42            68
Hard              M                     43            52           30            42

TABLE 10.22 Data for Class of Statistics Students

Student

y i

Test Score

Grade in Course

I 2 3 4 5 6 7

0 0 1 0 1

9 10 11 12 13 14 15 16 17 18 19 20

1 1 1 1 1 1 1 1 1 1

0 0 1

525 533 545 582 581 576 572 609 559 543 576 525 574 582 574 471 595 557 557 584

C D

C B

B
D B

A C A

D
C B B C

A A
A


TABLE 10.23 Menstruation of Girls in Warsaw


Y
n

11.08 11.33 11.58 11.83 12.08 12.33 12.58 12.83 13.08 13.33 13.58 14.08 14.33 14.58 15.08 15.33 15.58 17.58

2 5 10 17 16 29 39 51 47 67 81 79 93 117 107 92 1049

90

120 88 105 11 1 100 93 100 108 99 106 117 98 97 100 122 11 1 94 1049

Note. Numbers y who have reached menarche from n in age group with center x.

or not the student passed (y = 1) or failed (y = 0) the statistics examination at the end of the course. Also given are the students' scores on a previous math test and their grades for a prerequisite probability course.

1. Group the students into those with maths test scores <500, 501-550, 551-600, and >600, and then fit a linear model to the probability of passing by using the midpoint of the grouping interval as the explanatory variable.
2. Use your fitted model to predict the probability of passing for students with maths scores of 350 and 800.
3. Now fit a linear logistic model to the same data and again use the model to predict the probability of passing for students with math scores of 350 and 800.
4. Finally fit a logistic regression model to the ungrouped data by using both explanatory variables.

10.5. Show that the interaction parameter in the saturated log-linear model for a 2 x 2 contingency table is related to the odds ratio of the table.

10.6. The data in Table 10.23 relate to a sample of girls in Warsaw, the response variable indicating whether or not the girl has begun menstruation and the explanatory variable age in years (measured to the month). Plot the estimated probability of menstruation as a function of age and show the linear and logistic regression fits to the data on the plot.

10.7. Fit a logistic regression model to the GHQ data that includes main effects for both gender and GHQ and an interaction between the two variables.

10.8. Examine both the death penalty data (Table 10.3) and the infants' survival data (Table 10.4) by fitting suitable log-linear models and suggest what it is that leads to the spurious results when the data are aggregated over a particular variable.

Appendix A
Statistical Glossary

This glossary includes terms encountered in introductory statistics courses and terms mentioned in this text with little or no explanation. In addition, some terms of general statistical interest that are not specific to either this text or psychology are defined. Terms that are explained in detail in the text are not included in this glossary. Terms in italics in a definition are themselves defined in the appropriate place in the glossary. Terms are listed alphabetically, using the letter-by-letter convention. Four dictionaries of statistics that readers may also find useful are as follows.

1. Everitt, B. S. (1995). The Cambridge dictionary of statistics in the medical sciences. Cambridge: Cambridge University Press.
2. Freund, J. E., and Williams, F. J. (1966). Dictionary/outline of basic statistics. New York: Dover.
3. Everitt, B. S. (1998). The Cambridge dictionary of statistics. Cambridge: Cambridge University Press.
4. Everitt, B. S., and Wykes, T. (1999). A dictionary of statistics for psychologists. London: Arnold.

A
Acceptance region: The set of values of a test statistic for which the null hypothesis is accepted. Suppose, for example, a z test is being used to test that the mean of a population is 10 against the alternative that it is not 10. If the significance level chosen is .05, then the acceptance region consists of values of z between -1.96 and 1.96.

Additive effect: A term used when the effect of administering two treatments together is the sum of their separate effects. See also additive model.

Additive model: A model in which the explanatory variables have an additive effect on the response variable. So, for example, if variable A has an effect of size a on some response measure and variable B one of size b on the same response, then in an assumed additive model for A and B, their combined effect would be a + b.

Alpha (α): The probability of a Type I error. See also significance level.

Alternative hypothesis: The hypothesis against which the null hypothesis is tested.

Analysis of variance: The separation of variance attributable to one cause from the variance attributable to others. Provides a way of testing for differences between a set of more than two population means.

A posteriori comparisons: Synonym for post hoc comparisons.

A priori comparisons: Synonym for planned comparisons.

Asymmetrical distribution: A probability distribution or frequency distribution that is not symmetrical about some central value. An example would be a distribution with positive skewness as shown in Figure A.1, giving the histogram of 200 reaction times (seconds) to a particular task.

Attenuation: A term applied to the correlation between two variables when both are subject to measurement error, to indicate that the value of the correlation between the true values is likely to be underestimated.

B
Balanced design: A term applied to any experimental design in which the same number of observations is taken for each combination of the experimental factors.

FIG. A.1. Example of an asymmetrical distribution (horizontal axis: reaction time).

Bartlett's test: A test for the equality of the variances of a number of populations, sometimes used prior to applying analysis of variance techniques to assess the assumption of homogeneity of variance. It is of limited practical value because of its known sensitivity to nonnormality, so that a significant result might be caused by departures from normality rather than by different variances. See also Box's test and Hartley's test.

Bell-shaped distribution: A probability distribution having the overall shape of a vertical cross section of a bell. The normal distribution is the most well-known example, but a Student's t distribution is also this shape.

Beta coefficient: A regression coefficient that is standardized so as to allow for a direct comparison between explanatory variables as to their relative explanatory power for the response variable. It is calculated from the raw regression coefficients by multiplying them by the standard deviation of the corresponding explanatory variable and then dividing by the standard deviation of the response variable.

Bias: Deviation of results or inferences from the truth, or processes leading to such deviation. More specifically, this is the extent to which the statistical method used in a study does not estimate the quantity thought to be estimated.

FIG. A.2. Bimodal probability and frequency distributions.

Bimodal distribution: A probability distribution, or a frequency distribution, with two modes. Figure A.2 shows examples.

Binary variable: Observations that occur in one of two possible states, which are often labeled 0 and 1. Such data are frequently encountered in psychological investigations; commonly occurring examples include improved/not improved, and depressed/not depressed.

Binomial distribution: The probability distribution of the number of successes, x, in a series of n independent trials, each of which can result in either a success or failure. The probability of a success, p, remains constant from trial to trial. Specifically, the distribution of x is given by

$$P(x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}, \qquad x = 0, 1, 2, \ldots, n.$$

The mean of the distribution is np and its variance is np(1 - p).
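The formula and the stated mean and variance can be verified numerically. The following Python sketch does this for an arbitrary illustrative choice of n = 10 and p = 0.3; the function name `binomial_pmf` is an assumption made for the example.

```python
import math

def binomial_pmf(x, n, p):
    """P(x) = n! / (x! (n - x)!) * p**x * (1 - p)**(n - x)."""
    return math.comb(n, x) * p**x * (1.0 - p) ** (n - x)

n, p = 10, 0.3  # illustrative values only
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Mean and variance computed directly from the probabilities.
mean = sum(x * q for x, q in enumerate(probs))
variance = sum((x - mean) ** 2 * q for x, q in enumerate(probs))

print(f"sum of probabilities = {sum(probs):.6f}")
print(f"mean = {mean:.4f} (np = {n * p})")
print(f"variance = {variance:.4f} (np(1 - p) = {n * p * (1 - p):.1f})")
```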

Biserial correlation: A measure of the strength of the relationship between two variables, one continuous (y) and the other recorded as a binary variable (x), but having underlying continuity and normality. It is estimated from the sample values as

$$r_b = \frac{\bar{y}_1 - \bar{y}_0}{s_y}\,\frac{pq}{u},$$

where $\bar{y}_1$ is the sample mean of the y variable for those individuals for whom x = 1, $\bar{y}_0$ is the sample mean of the y variable for those individuals having x = 0, $s_y$ is the standard deviation of the y values, p is the proportion of individuals with x = 1, and q = 1 - p is the proportion of individuals with x = 0. Finally, u is the ordinate (height) of a normal distribution with mean zero and standard deviation one, at the point of division between the p and q proportions of the curve. See also point-biserial correlation.

Bivariate data: Data in which the subjects each have measurements on two variables.

Box's test: A test for assessing the equality of the variances in a number of populations that is less sensitive to departures from normality than Bartlett's test. See also Hartley's test.

C

Ceiling effect: A term used to describe what happens when many subjects in a study have scores on a variable that are at or near the possible upper limit (ceiling). Such an effect may cause problems for some types of analysis because it reduces the possible amount of variation in the variable. The converse, or floor effect, causes similar problems.

Central tendency: A property of the distribution of a variable usually measured by statistics such as the mean, median, and mode.

Change scores: Scores obtained by subtracting a posttreatment score on some variable from the corresponding pretreatment, baseline value.

Chi-squared distribution: The probability distribution of the sum of squares of a number of independent normal variables with means zero and standard deviations one. This distribution arises in many areas of statistics, for example, assessing the goodness-of-fit of models, particularly those fitted to contingency tables.

Coefficient of determination: The square of the correlation coefficient between two variables x and y. Gives the proportion of the variation in one variable that is accounted for by the other. For example, a correlation of 0.8 implies that 64% of the variance of y is accounted for by x.

Coefficient of variation: A measure of spread for a set of data defined as

100 x standard deviation/mean.

This was originally proposed as a way of comparing the variability in different distributions, but it was found to be sensitive to errors in the mean.

Commensurate variables: Variables that are on the same scale or expressed in the same units, for example, systolic and diastolic blood pressure.

Composite hypothesis: A hypothesis that specifies more than a single value for a parameter, for example, the hypothesis that the mean of a population is greater than some value.

Compound symmetry: The property possessed by a covariance matrix of a set of multivariate data when its main diagonal elements are equal to one another, and additionally its off-diagonal elements are also equal. Consequently, the matrix has the general form

$$\Sigma = \sigma^2 \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix},$$

where $\rho$ is the assumed common correlation coefficient of the measures and $\sigma^2$ is their common variance.

Confidence interval: A range of values, calculated from the sample observations, that are believed, with a particular probability, to contain the true parameter value. A 95% confidence interval, for example, implies that if the estimation process were repeated again and again, then 95% of the calculated intervals would be expected to contain the true parameter value. Note that the stated probability level refers to properties of the interval and not to the parameter itself, which is not considered a random variable.
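The repeated-sampling interpretation can be illustrated by simulation. The Python sketch below draws many samples from a normal population with a known standard deviation, forms the usual 95% z interval for the mean each time, and records how often the interval contains the true mean. All settings (true mean 10, standard deviation 2, samples of size 25, 2000 repetitions, the random seed) are arbitrary assumptions chosen for illustration.

```python
import math
import random

def coverage(true_mean=10.0, sigma=2.0, n=25, repetitions=2000, seed=1):
    """Proportion of 95% z intervals that contain the true mean."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n)  # known-sigma interval
    hits = 0
    for _ in range(repetitions):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / repetitions

rate = coverage()
print(f"proportion of intervals containing the true mean: {rate:.3f}")
```

The printed proportion should be close to, but not exactly, 0.95, because it is itself an estimate based on a finite number of repetitions.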

Conservative and nonconservative tests: Terms usually encountered in discussions of multiple comparison tests. Nonconservative tests provide poor control over the per-experiment error rate. Conservative tests, in contrast, may limit the per-comparison error rate to unnecessarily low values, and tend to have low power unless the sample size is large.

Contrast: A linear function of parameters or statistics in which the coefficients sum to zero. It is most often encountered in the context of analysis of variance. For example, in an application involving, say, three treatment groups (with means $\bar{x}_{T_1}$, $\bar{x}_{T_2}$, and $\bar{x}_{T_3}$) and a control group (with mean $\bar{x}_C$), the following is the contrast for comparing the mean of the control group to the average of the treatment groups:

$$\bar{x}_C - \tfrac{1}{3}\bar{x}_{T_1} - \tfrac{1}{3}\bar{x}_{T_2} - \tfrac{1}{3}\bar{x}_{T_3}.$$

See also orthogonal contrast.

Correlation coefficient: An index that quantifies the linear relationship between a pair of variables. For sample observations a variety of such coefficients have been suggested, of which the most commonly used is Pearson's product moment correlation coefficient, defined as

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}},$$

where $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ are the n sample values of the two variables of interest. The coefficient takes values between -1 and 1, with the sign indicating the direction of the relationship and the numerical magnitude its strength. Values of -1 or 1 indicate that the sample values fall on a straight line. A value of zero indicates the lack of any linear relationship between the two variables.
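As a small worked illustration of Pearson's product moment correlation coefficient, the Python sketch below computes r from sums of squares and cross products of deviations from the means. The data values and the function name `pearson_r` are made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the sample means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(f"r = {r:.4f}, r squared = {r * r:.2f}")

# Points lying exactly on a straight line give r = 1 or r = -1.
r_up = pearson_r(xs, [2 * x + 1 for x in xs])
r_down = pearson_r(xs, [10 - 3 * x for x in xs])
```

The squared value printed alongside r is the coefficient of determination for the same data.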

Correlation matrix: A square, symmetric matrix with rows and columns corresponding to variables, in which the off-diagonal elements are the correlation coefficients between pairs of variables, and elements on the main diagonal are unity.

Covariance: For a sample of n pairs of observations, $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, the statistic given by

$$c_{xy} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}),$$

where $\bar{x}$ and $\bar{y}$ are the sample means of the $x_i$ and $y_i$ respectively.

Covariance matrix: A symmetric matrix in which the off-diagonal elements are the covariances of pairs of variables, and the elements on the main diagonal are variances.

Critical region: The values of a test statistic that lead to rejection of a null hypothesis. The size of the critical region is the probability of obtaining an outcome belonging to this region when the null hypothesis is true, that is, the probability of a Type I error. See also acceptance region.

Critical value: The value with which a statistic calculated from the sample data is compared in order to decide whether a null hypothesis should be rejected. The value is related to the particular significance level chosen.

Cronbach's alpha: An index of the internal consistency of a psychological test. If the test consists of n items and an individual's score is the total answered correctly, then the coefficient is given specifically by

$$\alpha = \frac{n}{n-1}\left[1 - \frac{1}{\sigma^2}\sum_{i=1}^{n}\sigma_i^2\right],$$

where $\sigma^2$ is the variance of the total scores and $\sigma_i^2$ is the variance of the set of 0,1 scores representing correct and incorrect answers on item i.
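A short computation may make the definition of Cronbach's alpha concrete. The Python sketch below applies the usual form of the coefficient, n/(n - 1) times one minus the ratio of summed item variances to the variance of the total scores, to a tiny made-up data set of four subjects answering three items. The function names and the data are illustrative assumptions, and the same variance divisor is used for items and totals so that it cancels in the ratio.

```python
def variance(values):
    """Variance with divisor equal to the number of values."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def cronbach_alpha(item_scores):
    """item_scores[s][i] is the 0/1 score of subject s on item i."""
    n_items = len(item_scores[0])
    totals = [sum(row) for row in item_scores]
    item_vars = [variance([row[i] for row in item_scores])
                 for i in range(n_items)]
    ratio = sum(item_vars) / variance(totals)
    return n_items / (n_items - 1) * (1.0 - ratio)

# Hypothetical responses: rows are subjects, columns are items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.2f}")
```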

Cross-validation: The division of data into two approximately equal sized subsets, one of which is used to estimate the parameters in some model of interest and the other to assess whether the model with these parameter values fits adequately.

Cumulative frequency distribution: A listing of the sample values of a variable, together with the proportion of the observations less than or equal to each value.

D

Data dredging: A term used to describe comparisons made within a data set not specifically prescribed prior to the start of the study.

Data reduction: The process of summarizing large amounts of data by forming frequency distributions, histograms, scatter diagrams, and so on, and calculating statistics such as means, variances, and correlation coefficients. The term is also used when a low-dimensional representation of multivariate data is sought by use of procedures such as principal components analysis and factor analysis.

Data set: A general term for observations and measurements collected during any type of scientific investigation.

Degrees of freedom: An elusive concept that occurs throughout statistics. Essentially, the term means the number of independent units of information in a sample relevant to the estimation of a parameter or calculation of a statistic. For example, in a 2 x 2 contingency table with a given set of marginal totals, only one of the four cell frequencies is free and the table has therefore a single degree of freedom. In many cases the term corresponds to the number of parameters in a model.

Dependent variable: See response variable.

Descriptive statistics: A general term for methods of summarizing and tabulating data that make their main features more transparent, for example, calculating means and variances and plotting histograms. See also exploratory data analysis and initial data analysis.

DF (df): Abbreviations for degrees of freedom.

Diagonal matrix: A square matrix whose off-diagonal elements are all zero. For example,

Dichotomous variable: Synonym for binary variable.

Digit preference: The personal and often subconscious bias that frequently occurs in the recording of observations. It is usually most obvious in the final recorded digit of a measurement.

Discrete variables: Variables having only integer values, for example, number of trials to learn a particular task.

Doubly multivariate data: A term used for the data collected in those longitudinal studies in which more than a single response variable is recorded for each subject on each occasion.

Dummy variables: The variables resulting from recoding categorical variables with more than two categories into a series of binary variables. Marital status, for example, if originally labeled 1 for married; 2 for single; and 3 for divorced, widowed, or separated could be redefined in terms of two variables as follows.

Variable 1: 1 if single, and 0 otherwise;
Variable 2: 1 if divorced, widowed, or separated, and 0 otherwise.

For a married person, both new variables would be zero. In general a categorical variable with k categories would be recoded in terms of k - 1 dummy variables. Such recoding is used before polychotomous variables are used as explanatory variables in a regression analysis to avoid the unreasonable assumption that the original numerical codes for the categories, that is, the values 1, 2, ..., k, correspond to an interval scale.

E

EDA: Abbreviation for exploratory data analysis.

Effect: Generally used for the change in a response variable produced by a change in one or more explanatory or factor variables.

336

APPENDIX

Empirical: Based on observation or experiment rather than deduction from basic laws or theory.

Error rate: The proportion of subjects misclassified by an allocation rule derived from a discriminant analysis.

Estimation: The process of providing a numerical value for a population parameter on the basis of information collected from a sample. If a single figure is calculated for the unknown parameter, the process is called point estimation. If an interval is calculated within which the parameter is likely to fall, then the procedure is called interval estimation. See also least-squares estimation and confidence interval.

Estimator: A statistic used to provide an estimate for a parameter. The sample mean, for example, is an unbiased estimator of the population mean.

Experimental design: The arrangement and procedures used in an experimental study. Some general principles of good design are simplicity, avoidance of bias, the use of random allocation for forming treatment groups, replication, and adequate sample size.

Experimental study: A general term for investigations in which the researcher can deliberately influence events and investigate the effects of the intervention.

Experimentwise error rate: Synonym for per-experiment error rate.

Explanatory variables: The variables appearing on the right-hand side of the equations defining, for example, multiple regression or logistic regression, and that seek to predict or explain the response variable. Also commonly known as the independent variables, although this is not to be recommended because they are rarely independent of one another.

Exploratory data analysis: An approach to data analysis that emphasizes the use of informal graphical procedures not based on prior assumptions about the structure of the data or on formal models for the data. The essence of this approach is that, broadly speaking, data are assumed to possess the following structure:

Data = Smooth + Rough,

where Smooth is the underlying regularity or pattern in the data. The objective of the exploratory approach is to separate the Smooth from the Rough with minimal use of formal mathematics or statistical methods. See also initial data analysis.

Eyeball test: Informal assessment of data simply by inspection and mental calculation allied with experience of the particular area from which the data arise.

F
Factor: A term used in a variety of ways in statistics, but most commonly to refer to a categorical variable, with a small number of levels, under investigation in an experiment as a possible source of variation. This is essentially simply a categorical explanatory variable.

Familywise error rate: The probability of making any error in a given family of inferences. See also per-comparison error rate and per-experiment error rate.

F distribution: The probability distribution of the ratio of two independent random variables, each having a chi-squared distribution, divided by their respective degrees of freedom.

Fisher's exact test: An alternative procedure to the use of the chi-squared statistic for assessing the independence of two variables forming a 2 x 2 contingency table, particularly when the expected frequencies are small.

Fisher's z transformation: A transformation of Pearson's product moment correlation coefficient, r, given by

$$z = \frac{1}{2}\ln\frac{1+r}{1-r}.$$

The statistic z has mean $\frac{1}{2}\ln[(1+\rho)/(1-\rho)]$, where $\rho$ is the population correlation value, and variance $1/(n-3)$, where n is the sample size. The transformation may be used to test hypotheses and to construct confidence intervals for $\rho$.
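
As an illustration of how the transformation might be used to build a confidence interval, the following is a minimal sketch rather than a definitive implementation. The function names, the sample values r = 0.6 and n = 50, and the normal critical value 1.96 (for an approximate 95% interval) are assumptions introduced here and are not taken from the text.

```python
import math

def fisher_z(r):
    """Fisher's z transformation of a sample correlation r (requires -1 < r < 1)."""
    return 0.5 * math.log((1 + r) / (1 - r))

def correlation_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for the population correlation.

    z_crit = 1.96 is an assumed normal critical value (about 95% coverage).
    """
    z = fisher_z(r)
    se = 1 / math.sqrt(n - 3)          # standard deviation of z, from variance 1/(n - 3)
    z_low, z_high = z - z_crit * se, z + z_crit * se
    # tanh is the inverse of the z transformation, so it maps back to the r scale
    return math.tanh(z_low), math.tanh(z_high)

low, high = correlation_ci(r=0.6, n=50)   # hypothetical sample values
print(round(low, 3), round(high, 3))
```

The interval is calculated on the z scale, where the variance does not depend on the unknown correlation, and only then transformed back, which is why it is not symmetric about r.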
Fishing expedition: Synonym for data dredging.

Fitted value: Usually used to refer to the value of the response variable as predicted by some estimated model.

Floor effect: See ceiling effect.

Follow-up: The process of locating research subjects or patients to determine whether or not some outcome of interest has occurred.


Frequency distribution: The division of a sample of observations into a number of classes, together with the number of observations in each class. Acts as a useful summary of the main features of the data such as location, shape, and spread. An example of such a table is given below.

IQ Scores

Class Limits: 75-79, 80-84, 85-89, 90-94, 95-99, 100-104, 105-109, 110-114, ≥115

Observed Frequency (only partly legible in the source): 10, 5, 9, 1, 4

Frequency polygon: A diagram used to display graphically the values in a frequency distribution. The frequencies are graphed as ordinate against the class midpoints as abscissae. The points are then joined by a series of straight lines. Particularly useful in displaying a number of frequency distributions on the same diagram.

F test: A test for the equality of the variances of two populations having normal distributions, based on the ratio of the variances of a sample of observations taken from each. It is most often encountered in the analysis of variance, in which testing whether particular variances are the same also tests for the equality of a set of means.

G
Gambler's fallacy: The belief that if an event has not happened for a long time, it is bound to occur soon.

Goodness-of-fit statistics: Measures of agreement between a set of sample values and the corresponding values predicted from some model of interest.

Grand mean: Mean of all the values in a grouped data set irrespective of group.

Graphical methods: A generic term for those techniques in which the results are given in the form of a graph, diagram, or some other form of visual display.

H

H0: Symbol for null hypothesis.

H1: Symbol for alternative hypothesis.

Halo effect: The tendency of a subject's performance on some task to be overrated because of the observer's perception of the subject doing well gained in an earlier exercise or when assessed in a different area.

Harmonic mean: The reciprocal of the arithmetic mean of the reciprocals of a set of observations, $x_1, x_2, \ldots, x_n$. Specifically obtained from

$$\frac{1}{H} = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{x_i}.$$
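
A short worked example may make the definition concrete. This is only a sketch: the function name and the two speeds are assumptions chosen for illustration, and the standard library's `statistics.harmonic_mean` is used purely as a cross-check.

```python
import statistics

def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals (all values must be positive)."""
    n = len(values)
    return n / sum(1 / x for x in values)

# Hypothetical data: the same distance covered at 40 and then at 60 units per hour
speeds = [40, 60]
h = harmonic_mean(speeds)

# Cross-check against the standard library implementation
print(h, statistics.harmonic_mean(speeds))
```

For these two values the harmonic mean is 48, lower than the arithmetic mean of 50, because small observations have large reciprocals and so carry more weight.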

Hartley's test: A simple test of the equality of variances of a number of populations. The test statistic is the ratio of the largest to the smallest sample variances.

Hawthorne effect: A term used for the effect that might be produced in an experiment simply from the awareness by the subjects that they are participating in some form of scientific investigation. The name comes from a study of industrial efficiency at the Hawthorne Plant in Chicago in the 1920s.

Hello-goodbye effect: A phenomenon originally described in psychotherapy research but one that may arise whenever a subject is assessed on two occasions with some intervention between the visits. Before an intervention, a person may present himself or herself in as bad a light as possible, thereby hoping to qualify for treatment, and impressing staff with the seriousness of his or her problems. At the end of the study the person may want to please the staff with his or her improvement, and so may minimize any problems. The result is to make it appear that there has been some improvement when none has occurred, or to magnify the effects that did occur.

Histogram: A graphical representation of a set of observations in which class frequencies are represented by the areas of rectangles centered on the class interval. If the latter are all equal, the heights of the rectangles are also proportional to the observed frequencies.


Homogeneous: A term that is used in statistics to indicate the equality of some quantity of interest (most often a variance), in a number of different groups, populations, and so on.

Hypothesis testing: A general term for the procedure of assessing whether sample data are consistent or otherwise with statements made about the population. See also null hypothesis, alternative hypothesis, composite hypothesis, significance test, significance level, Type I error, and Type II error.

I
IDA: Abbreviation for initial data analysis.

Identification: The degree to which there is sufficient information in the sample observations to estimate the parameters in a proposed model. An unidentified model is one in which there are too many parameters in relation to the number of observations to make estimation possible. A just identified model corresponds to a saturated model. Finally, an overidentified model is one in which parameters can be estimated, and there remain degrees of freedom to allow the fit of the model to be assessed.

Identity matrix: A diagonal matrix in which all the elements on the leading diagonal are unity and all the other elements are zero.

Independence: Essentially, two events are said to be independent if knowing the outcome of one tells us nothing about the other. More formally the concept is defined in terms of the probabilities of the two events. In particular, two events A and B are said to be independent if

$$P(A \text{ and } B) = P(A) \times P(B),$$

where P(A) and P(B) represent the probabilities of A and B.
Independent samples t test: See Student's t test.

Inference: The process of drawing conclusions about a population on the basis of measurements or observations made on a sample of individuals from the population.

Initial data analysis: The first phase in the examination of a data set, which consists of a number of informal steps, including

checking the quality of the data,
calculating simple summary statistics, and
constructing appropriate graphs.

The general aim is to clarify the structure of the data, obtain a simple descriptive summary, and perhaps get ideas for a more sophisticated analysis.

Interaction: A term applied when two (or more) explanatory variables do not act independently on a response variable. See also additive effect.

Interval estimation: See estimation.

Interval variable: Synonym for continuous variable.

Interviewer bias: The bias that may occur in surveys of human populations because of the direct result of the action of the interviewer. The bias can arise for a variety of reasons, including failure to contact the right persons and systematic errors in recording the answers received from the respondent.

J
J-shaped distribution: An extremely asymmetrical distribution with its maximum frequency in the initial class and a declining frequency elsewhere. An example is shown in Figure A.3.

K
Kurtosis: The extent to which the peak of a unimodal frequency distribution departs from the shape of a normal distribution, by either being more pointed (leptokurtic) or flatter (platykurtic). It is usually measured for a probability distribution as

$$\frac{\mu_4}{\mu_2^2} - 3,$$

where $\mu_4$ is the fourth central moment of the distribution, and $\mu_2$ is its variance. (Corresponding functions of the sample moments are used for frequency distributions.) For a normal distribution, this index takes the value zero (other distributions with zero kurtosis are called mesokurtic); for one that is leptokurtic it is positive, and for a platykurtic curve it is negative.
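
The sample version of the kurtosis index can be sketched as follows. The function names and both small data sets are assumptions made for illustration; the moments use divisor n, which is one common convention and is assumed here rather than stated in the text.

```python
def central_moment(values, k):
    """k-th sample moment about the mean, using divisor n (an assumed convention)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def kurtosis_index(values):
    """Fourth central moment over the squared variance, minus 3."""
    m2 = central_moment(values, 2)
    m4 = central_moment(values, 4)
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5]                       # evenly spread: flatter than normal
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -5, 5]     # mass at the center plus two extremes

print(kurtosis_index(flat))    # negative: platykurtic
print(kurtosis_index(peaked))  # positive: leptokurtic
```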

L
Large sample method: Any statistical method based on an approximation to a normal distribution or other probability distribution that becomes more accurate as sample size increases.


FIG. A.3. Example of a J-shaped distribution (horizontal axis: reaction time).

Least-squares estimation: A method used for estimating parameters, particularly in regression analysis, by minimizing the sum of squared differences between the observed response and the value predicted by the model. For example, if the expected value of a response variable y is of the form

$$E(y) = \alpha + \beta x,$$

where x is an explanatory variable, then least-squares estimators of the parameters $\alpha$ and $\beta$ may be obtained from n pairs of sample values $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ by minimizing S given by

$$S = \sum_{i=1}^{n}(y_i - \alpha - \beta x_i)^2$$

to give

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}, \qquad \hat{\beta} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}.$$

Often referred to as ordinary least squares to differentiate this simple version of the technique from more involved versions, such as weighted least squares.
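
For the straight-line case, the estimators have a standard closed form: the slope is the sum of cross products of deviations divided by the sum of squared deviations of x, and the intercept is the mean of y minus the slope times the mean of x. The following is a minimal sketch of that calculation; the function name and the five data pairs are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # cross products
    sxx = sum((x - x_bar) ** 2 for x in xs)                       # squared deviations of x
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data lying close to a line with slope 2
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = least_squares_line(xs, ys)
print(round(intercept, 3), round(slope, 3))
```

Weighted least squares would differ only in attaching a weight to each squared residual before summing.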


Leverage points: A term used in regression analysis for those observations that have an extreme value on one or more of the explanatory variables. The effect of such points is to force the fitted model close to the observed value of the response, leading to a small residual.

Likert scales: Scales often used in studies of attitudes, in which the raw scores are based on graded alternative responses to each of a series of questions. For example, the subject may be asked to indicate his or her degree of agreement with each of a series of statements relevant to the attitude. A number is attached to each possible response, for example, 1, strongly approve; 2, approve; 3, undecided; 4, disapprove; 5, strongly disapprove. The sum of these is used as the composite score.

Logarithmic transformation: The transformation of a variable, x, obtained by taking y = ln(x). Often used when the frequency distribution of the variable, x, shows a moderate to large degree of skewness in order to achieve normality.

Lower triangular matrix: A matrix in which all the elements above the main diagonal are zero. An example is the following:

$$L = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 3 & 0 & 0 \\ \vdots & & & \end{pmatrix}$$

M

Main effect: An estimate of the independent effect of usually a factor variable on a response variable in an ANOVA.

Manifest variable: A variable that can be measured directly, in contrast to a latent variable.

MANOVA: Acronym for multivariate analysis of variance.

Marginal totals: A term often used for the total number of observations in each row and each column of a contingency table.

Matched pairs: A term used for observations arising from either two individuals who are individually matched on a number of variables, for example, age and sex, or where two observations are taken on the same individual on two separate occasions. Essentially synonymous with paired samples.


Matched pairs t test: A Student's t test for the equality of the means of two populations, when the observations arise as paired samples. The test is based on the differences between the observations of the matched pairs. The test statistic is given by

$$t = \frac{\bar{d}}{s_d/\sqrt{n}},$$

where n is the sample size, $\bar{d}$ is the mean of the differences, and $s_d$ is their standard deviation. If the null hypothesis of the equality of the population means is true, then t has a Student's t distribution with n - 1 degrees of freedom.
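
The calculation of the matched pairs statistic (the mean difference divided by its standard error) can be sketched as below. The function name and the before and after scores are hypothetical, and the standard deviation of the differences is taken with divisor n - 1, which is the usual choice and is assumed here.

```python
import math
import statistics

def matched_pairs_t(before, after):
    """Return (t, degrees of freedom) for paired observations."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)      # sample standard deviation, divisor n - 1
    t = d_bar / (s_d / math.sqrt(n))
    return t, n - 1

# Hypothetical scores for five subjects measured on two occasions
before = [10, 12, 9, 14, 11]
after = [12, 15, 10, 15, 14]
t, df = matched_pairs_t(before, after)
print(round(t, 3), df)
```

The resulting value of t would then be referred to a Student's t distribution with the returned degrees of freedom.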

Matching: The process of making a study group and a comparison group comparable with respect to extraneous factors. It is often used in retrospective studies in the selection of cases and controls to control variation in a response variable that is due to sources other than those immediately under investigation. Several kinds of matching can be identified, the most common of which is when each case is individually matched with a control subject on the matching variables, such as age, sex, occupation, and so on. See also paired samples.

Matrix: A rectangular arrangement of numbers, algebraic functions, and so on. Two examples are

Mean: A measure of location or central value for a continuous variable. For a sample of observations $x_1, x_2, \ldots, x_n$, the measure is calculated as

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

Mean vector: A vector containing the mean values of each variable in a set of multivariate data.

Measurement error: Errors in reading, calculating, or recording a numerical value. This is the difference between observed values of a variable recorded under similar conditions and some underlying true value.

Measures of association: Numerical indices quantifying the strength of the statistical dependence of two or more qualitative variables.

Median: The value in a set of ranked observations that divides the data into two parts of equal size. When there is an odd number of observations, the median is the middle value. When there is an even number of observations, the measure is calculated as the average of the two central values. It provides a measure of location of a sample that is suitable for asymmetrical distributions and is also relatively insensitive to the presence of outliers. See also mean and mode.

Misinterpretation of p values: A p value is commonly interpreted in a variety of ways that are incorrect. Most common is that it is the probability of the null hypothesis, and that it is the probability of the data having arisen by chance. For the correct interpretation, see the entry for p value.

Mixed data: Data containing a mixture of continuous variables, ordinal variables, and categorical variables.

Mode: The most frequently occurring value in a set of observations. Occasionally used as a measure of location. See also mean and median.

Model: A description of the assumed structure of a set of observations that can range from a fairly imprecise verbal account to, more usually, a formalized mathematical expression of the process assumed to have generated the observed data. The purpose of such a description is to aid in understanding the data.

Model building: A procedure that attempts to find the simplest model for a sample of observations that provides an adequate fit to the data.

Most powerful test: A test of a null hypothesis which has greater power than any other test for a given alternative hypothesis.

Multilevel models: Models for data that are organized hierarchically, for example, children within families, that allow for the possibility that measurements made on children from the same family are likely to be correlated.

Multinomial distribution: A generalization of the binomial distribution to situations in which r outcomes can occur on each of n trials, where r > 2. Specifically the distribution is given by

$$P(n_1, n_2, \ldots, n_r) = \frac{n!}{n_1!\,n_2!\cdots n_r!}\,p_1^{n_1}p_2^{n_2}\cdots p_r^{n_r},$$

where $n_i$ is the number of trials with outcome i, and $p_i$ is the probability of outcome i occurring on a particular trial.
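
A multinomial probability is the number of distinct orderings of the outcomes (n factorial divided by the product of the factorials of the counts) multiplied by each outcome probability raised to its count. The sketch below evaluates this directly; the function name and the example counts and probabilities are assumptions for illustration only.

```python
import math

def multinomial_pmf(counts, probs):
    """Probability of observing the given outcome counts in sum(counts) trials."""
    n = sum(counts)
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)     # exact integer division at every step
    p = float(coef)
    for c, pr in zip(counts, probs):
        p *= pr ** c
    return p

# Hypothetical example: three outcomes, four trials
print(multinomial_pmf([2, 1, 1], [0.5, 0.3, 0.2]))

# With only two outcomes the formula reduces to the binomial distribution
print(multinomial_pmf([3, 2], [0.5, 0.5]))
```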

Multiple comparison tests: Procedures for detailed examination of the differences between a set of means, usually after a general hypothesis that they are all equal has been rejected. No single technique is best in all situations and a major distinction between techniques is how they control the possible inflation of the type I error.

Multivariate analysis: A generic term for the many methods of analysis important in investigating multivariate data.

Multivariate analysis of variance: A procedure for testing the equality of the mean vectors of more than two populations. The technique is directly analogous to the analysis of variance of univariate data, except that the groups are compared on q response variables simultaneously. In the univariate case, F tests are used to assess the hypotheses of interest. In the multivariate case, no single test statistic can be constructed that is optimal in all situations. The most widely used of the available test statistics is Wilks' lambda, which is based on three matrices W (the within groups matrix of sums of squares and products), T (the total matrix of sums of squares and cross products), and B (the between groups matrix of sums of squares and cross products), defined as follows:

$$W = \sum_{i=1}^{g}\sum_{j=1}^{n_i}(\mathbf{x}_{ij} - \bar{\mathbf{x}}_i)(\mathbf{x}_{ij} - \bar{\mathbf{x}}_i)',$$

$$T = \sum_{i=1}^{g}\sum_{j=1}^{n_i}(\mathbf{x}_{ij} - \bar{\mathbf{x}})(\mathbf{x}_{ij} - \bar{\mathbf{x}})',$$

$$B = \sum_{i=1}^{g} n_i(\bar{\mathbf{x}}_i - \bar{\mathbf{x}})(\bar{\mathbf{x}}_i - \bar{\mathbf{x}})',$$

where $\mathbf{x}_{ij}$, $i = 1, \ldots, g$, $j = 1, \ldots, n_i$ represent the jth multivariate observation in the ith group, g is the number of groups, and $n_i$ is the number of observations in the ith group. The mean vector of the ith group is represented by $\bar{\mathbf{x}}_i$ and the mean vector of all the observations by $\bar{\mathbf{x}}$. These matrices satisfy the equation

$$T = W + B.$$

Wilks' lambda is given by the ratio of the determinants of W and T, that is,

$$\Lambda = \frac{|W|}{|T|}.$$

The statistic, $\Lambda$, can be transformed to give an F test to assess the null hypothesis of the equality of the population mean vectors. In addition to $\Lambda$, a number of other test statistics are available.

Roy's largest root criterion: the largest eigenvalue of $BW^{-1}$;
The Hotelling-Lawley trace: the sum of the eigenvalues of $BW^{-1}$;
The Pillai-Bartlett trace: the sum of the eigenvalues of $BT^{-1}$.

It has been found that the differences in power between the various test statistics are generally quite small, and so in most situations the choice will not greatly affect conclusions.

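Multivariate normal distribution: The probability distribution of a set of variables $\mathbf{x}' = [x_1, x_2, \ldots, x_q]$ given by

$$f(\mathbf{x}) = (2\pi)^{-q/2}|\boldsymbol{\Sigma}|^{-1/2}\exp\left\{-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu})\right\},$$

where $\boldsymbol{\mu}$ is the mean vector of the variables and $\boldsymbol{\Sigma}$ is their variance-covariance matrix. This distribution is assumed by multivariate analysis procedures such as multivariate analysis of variance.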

N

Newman-Keuls test: A multiple comparison test used to investigate in more detail the differences between a set of means, as indicated by a significant F test in an analysis of variance.

Nominal significance level: The significance level of a test when its assumptions are valid.

Nonorthogonal designs: Analysis of variance designs with two or more factors in which the number of observations in each cell are not equal.

Normal distribution: A probability distribution of a random variable, x, that is assumed by many statistical methods. It is specifically given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\},$$

where $\mu$ and $\sigma^2$ are, respectively, the mean and variance of x. This distribution is bell shaped.

Null distribution: The probability distribution of a test statistic when the null hypothesis is true.


Null hypothesis: The "no difference" or "no association" hypothesis to be tested (usually by means of a significance test) against an alternative hypothesis that postulates a nonzero difference or association.

Null matrix: A matrix in which all elements are zero.

Null vector: A vector, the elements of which are all zero.

O

One-sided test: A significance test for which the alternative hypothesis is directional; for example, that one population mean is greater than another. The choice between a one-sided and two-sided test must be made before any test statistic is calculated.

Orthogonal: A term that occurs in several areas of statistics with different meanings in each case. It is most commonly encountered in relation to two variables or two linear functions of a set of variables to indicate statistical independence. It literally means at right angles.

Orthogonal contrasts: Sets of linear functions of either parameters or statistics in which the defining coefficients satisfy a particular relationship. Specifically, if $c_1$ and $c_2$ are two contrasts of a set of m parameters such that

$$c_1 = a_{11}\beta_1 + a_{12}\beta_2 + \cdots + a_{1m}\beta_m,$$

$$c_2 = a_{21}\beta_1 + a_{22}\beta_2 + \cdots + a_{2m}\beta_m,$$

they are orthogonal if $\sum_{i=1}^{m} a_{1i}a_{2i} = 0$. If, in addition, $\sum_{i=1}^{m} a_{1i}^2 = 1$ and $\sum_{i=1}^{m} a_{2i}^2 = 1$, then the contrasts are said to be orthonormal.

Orthogonal matrix: A square matrix that is such that multiplying the matrix by its transpose results in an identity matrix.

Outlier: An observation that appears to deviate markedly from the other members of the sample in which it occurs. In the set of systolic blood pressures, (125, 128, 130, 131, 198), for example, 198 might be considered an outlier. Such extreme observations may be reflecting some abnormality in the measured characteristic of a patient, or they may result from an error in the measurement or recording.

P

Paired samples: Two samples of observations with the characteristic feature that each observation in one sample has one and only one matching observation in the other sample. There are several ways in which such samples can arise in psychological investigations. The first, self-pairing, occurs when each subject serves as his or her own control, as in, for example, therapeutic trials in which each subject receives both treatments, one on each of two separate occasions. Next, natural pairing can arise particularly, for example, in laboratory experiments involving litter-mate controls. Lastly, artificial pairing may be used by an investigator to match the two subjects in a pair on important characteristics likely to be related to the response variable.
Paired samples t test: Synonym for matched pairs t test.

Parameter: A numerical characteristic of a population or a model. The probability of a success in a binomial distribution, for example.

Partial correlation: The correlation between a pair of variables after adjusting for the effect of a third. It can be calculated from the sample correlation coefficients of each pair of variables involved as

$$r_{12|3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}.$$
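
The first-order partial correlation between variables 1 and 2, adjusting for variable 3, is the difference between r12 and the product r13 r23, divided by the square root of (1 - r13²)(1 - r23²). A minimal sketch of the calculation follows; the function name and the three correlation values are hypothetical.

```python
import math

def partial_correlation(r12, r13, r23):
    """Correlation between variables 1 and 2 after adjusting for variable 3."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# Hypothetical correlations: most of the association between 1 and 2
# is accounted for by their shared relationship with variable 3
print(round(partial_correlation(0.5, 0.6, 0.8), 4))

# If variable 3 is uncorrelated with both, the correlation is unchanged
print(partial_correlation(0.5, 0.0, 0.0))
```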

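Pearson's product moment correlation coefficient: An index that quantifies the linear relationship between a pair of variables. For a sample of n observations of two variables $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ calculated as

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2\sum_{i=1}^{n}(y_i - \bar{y})^2}}.$$

The coefficient takes values between -1 and 1.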


Per-comparison error rate: The significance level at which each test or comparison is carried out in an experiment.

Per-experiment error rate: The probability of incorrectly rejecting at least one null hypothesis in an experiment involving one or more tests or comparisons, when the corresponding null hypothesis is true in each case. See also per-comparison error rate.

Placebo: A treatment designed to appear exactly like a comparison treatment, but which is devoid of the active component.

Planned comparisons: Comparisons between a set of means suggested before data are collected. Usually more powerful than a general test for mean differences.


Point-biserial correlation: A special case of Pearson's product moment correlation coefficient used when one variable is continuous (y) and the other is a binary variable (x) representing a natural dichotomy. Given by

$$r_{pb} = \frac{\bar{y}_1 - \bar{y}_0}{s_y}\sqrt{pq},$$

where $\bar{y}_1$ is the sample mean of the y variable for those individuals with x = 1, $\bar{y}_0$ is the sample mean of the y variable for those individuals with x = 0, $s_y$ is the standard deviation of the y values, p is the proportion of individuals with x = 1, and q = 1 - p is the proportion of individuals with x = 0. See also biserial correlation.
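
That the point-biserial coefficient is a special case of Pearson's coefficient can be checked numerically. The sketch below computes the difference between the two group means of y, divides by the standard deviation of y, and multiplies by the square root of pq; it then compares the result with Pearson's r calculated directly. The function names and data are hypothetical, and the standard deviation is taken with divisor n, an assumption under which the two results agree exactly.

```python
import math

def point_biserial(x, y):
    """Point-biserial correlation for binary x (coded 0/1) and continuous y."""
    n = len(y)
    y1 = [yi for xi, yi in zip(x, y) if xi == 1]
    y0 = [yi for xi, yi in zip(x, y) if xi == 0]
    p = len(y1) / n
    q = 1 - p
    mean_y = sum(y) / n
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)  # divisor n (assumed)
    return (sum(y1) / len(y1) - sum(y0) / len(y0)) / s_y * math.sqrt(p * q)

def pearson_r(x, y):
    """Pearson's product moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [0, 0, 0, 1, 1, 1]        # hypothetical group membership
y = [2, 3, 4, 5, 6, 7]        # hypothetical continuous scores
print(round(point_biserial(x, y), 4), round(pearson_r(x, y), 4))
```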

Poisson distribution: The probability distribution of the number of occurrences of some random event, x, in an interval of time or space, and given by

$$P(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \qquad x = 0, 1, 2, \ldots.$$

The mean and variance of a variable with such a distribution are both equal to $\lambda$.
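
The equality of the mean and variance can be checked numerically by summing over the probabilities exp(-λ) λ^x / x!. This is a sketch only: the function name, the value 2.5 for the parameter, and the truncation of the infinite sum at 60 terms are assumptions made for illustration.

```python
import math

def poisson_pmf(x, lam):
    """Probability of exactly x occurrences when the expected number is lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 2.5                                   # hypothetical rate
support = range(60)                         # truncation; the omitted tail is negligible here
total = sum(poisson_pmf(x, lam) for x in support)
mean = sum(x * poisson_pmf(x, lam) for x in support)
variance = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in support)

print(round(total, 6), round(mean, 6), round(variance, 6))
```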

Population: In statistics this term is used for any finite or infinite collection of units, which are often people but may be, for example, institutions or events. See also sample.

Power: The probability of rejecting the null hypothesis when it is false. Power gives a method of discriminating between competing tests of the same hypothesis; the test with the higher power is preferred. It is also the basis of procedures for estimating the sample size needed to detect an effect of a particular magnitude.

Probability: The quantitative expression of the chance that an event will occur. This can be defined in a variety of ways, of which the most common is still that involving long-term relative frequency:

$$P(A) = \frac{\text{number of times A occurs}}{\text{number of times A could occur}}.$$

For example, if out of 100,000 children born in a region, 51,000 are boys, then the probability of a boy is 0.51.
Probability distribution: For a discrete random variable, this is a mathematical formula that gives the probability of each value of the variable. See, for example, binomial distribution and Poisson distribution. For a continuous random variable, this is a curve described by a mathematical formula that specifies, by way of areas under the curve, the probability that the variable falls within a particular interval. An example is the normal distribution. In both cases the term probability density is also used. (A distinction is sometimes made between density and distribution, when the latter is reserved for the probability that the random variable will fall below some value.)

p value: The probability of the observed data (or data showing a more extreme departure from the null hypothesis) when the null hypothesis is true. See also misinterpretation of p values, significance test, and significance level.

Q
Quasi-experiment: A term used for studies that resemble experiments but are weak on some of the characteristics, particularly that manipulation of subjects to groups is not under the investigator's control. For example, if interest centered on the health effects of a natural disaster, those who experience the disaster can be compared with those who do not, but subjects cannot be deliberately assigned (randomly or not) to the two groups. See also experimental design.

R
Randomization tests: Procedures for determining statistical significance directly from data without recourse to some particular sampling distribution. The data are divided (permuted) repeatedly between treatments, and for each division (permutation) the relevant test statistic (for example, t or F) is calculated to determine the proportion of the data permutations that provide as large a test statistic as that associated with the observed data. If that proportion is smaller than some significance level $\alpha$, the results are significant at the $\alpha$ level.

Random sample: Either a set of n independent and identically distributed random variables, or a sample of n individuals selected from a population in such a way that each sample of the same size is equally likely.

Random variable: A variable, the values of which occur according to some specified probability distribution.

Random variation: The variation in a data set unexplained by identifiable sources.

Range: The difference between the largest and smallest observations in a data set. This is often used as an easy-to-calculate measure of the dispersion in a set of observations, but it is not recommended for this task because of its sensitivity to outliers.


Ranking: The process of sorting a set of variable values into either ascending or descending order.

Rank of a matrix: The number of linearly independent rows or columns of a matrix of numbers.

Ranks: The relative positions of the members of a sample with respect to some characteristic.

Rank correlation coefficients: Correlation coefficients that depend on only the ranks of the variables, not on their observed values.

Reciprocal transformation: A transformation of the form y = 1/x, which is particularly useful for certain types of variables. Resistances, for example, become conductances, and times become speeds.

Regression to the mean: The process first noted by Sir Francis Galton that "each peculiarity in man is shared by his kinsmen, but on the average to a less degree." Hence the tendency, for example, for tall parents to produce tall offspring but who, on the average, are shorter than their parents.

Research hypothesis: Synonym for alternative hypothesis.

Response variable: The variable of primary importance in psychological investigations, because the major objective is usually to study the effects of treatment and/or other explanatory variables on this variable and to provide suitable models for the relationship between it and the explanatory variables.

Robust statistics: Statistical procedures and tests that still work reasonably well even when the assumptions on which they are based are mildly (or perhaps moderately) violated. Student's t test, for example, is robust against departures from normality.

Rounding: The procedure used for reporting numerical information to fewer decimal places than used during analysis. The rule generally adopted is that excess digits are simply discarded if the first of them is smaller than five; otherwise the last retained digit is increased by one. Thus rounding 127.249341 to three decimal places gives 127.249.
S

Sample: A selected subset of a population chosen by some process, usually with the objective of investigating particular properties of the parent population.

Sample size: The number of individuals to be included in an investigation, usually chosen so that the study has a particular power of detecting an effect of a particular size. Software is available for calculating sample size for many types of study.

Sampling distribution: The probability distribution of a statistic. For example, the sampling distribution of the arithmetic mean of samples of size n taken from a normal distribution with mean $\mu$ and standard deviation $\sigma$ is a normal distribution also with mean $\mu$ but with standard deviation $\sigma/\sqrt{n}$.

Sampling error: The difference between the sample result and the population characteristic being estimated. In practice, the sampling error can rarely be determined because the population characteristic is not usually known. With appropriate sampling procedures, however, it can be kept small and the investigator can determine the probable limits of its magnitude. See also standard error.

Sampling variation: The variation shown by different samples of the same size from the same population.

Saturated model: A model that contains all main effects and all possible interactions between factors. Because such a model contains the same number of parameters as observations, it results in a perfect fit for a data set.

Scatter diagram: A two-dimensional plot of a sample of bivariate observations. The diagram is an important aid in assessing what type of relationship links the two variables.

SE: Abbreviation for standard error.

Semi-interquartile range: Half the difference between the upper and lower quartiles.

Sequential sums of squares: A term encountered primarily in regression analysis for the contributions of variables as they are added to the model in a particular sequence. Essentially the difference in the residual sum of squares before and after adding a variable.

Significance level: The level of probability at which it is agreed that the null hypothesis will be rejected. It is conventionally set at .05.

Significance test: A statistical procedure that when applied to a set of observations results in a p value relative to some hypothesis. Examples include the Student's t test, the z test, and Wilcoxon's signed rank test.


Singular matrix: A square matrix whose determinant is equal to zero; a matrix whose inverse is not defined.

Skewness: The lack of symmetry in a probability distribution. Usually quantified by the index, s, given by

$$s = \frac{\mu_3}{\mu_2^{3/2}},$$

where $\mu_2$ and $\mu_3$ are the second and third moments about the mean. The index takes the value zero for a symmetrical distribution. A distribution is said to have positive skewness when it has a long thin tail at the right, and to have negative skewness when it has a long thin tail to the left.
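As an illustration, the skewness index can be computed directly from the sample moments about the mean. The following Python sketch assumes the usual moment form of the index, the third moment divided by the second moment raised to the power 3/2; the function name and the two small samples are invented for the example.

```python
def skewness_index(values):
    """Return m3 / m2**1.5, using sample moments about the mean."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second moment about the mean
    m3 = sum((v - mean) ** 3 for v in values) / n  # third moment about the mean
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]           # hypothetical symmetrical sample
right_tailed = [1, 1, 2, 2, 3, 10]    # hypothetical sample with a long right tail

print(skewness_index(symmetric))      # zero for a symmetrical sample
print(skewness_index(right_tailed))   # positive for a long tail at the right
```

Reversing the sign of every observation in the second sample would give the same index with a negative sign, corresponding to a long tail to the left.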

Split-half method: A procedure used primarily in psychology to estimate the reliability of a test. Two scores are obtained from the same test, either from alternative items, the so-called odd-even technique, or from parallel sections of items. The correlation of these scores or some transformation of them gives the required reliability. See also Cronbach's alpha.

Square contingency table: A contingency table with the same number of rows as columns.

Square matrix: A matrix with the same number of rows as columns. Variance-covariance matrices and correlation matrices are statistical examples.

Square root transformation: A transformation of the form $y = \sqrt{x}$, often used to make random variables suspected to have a Poisson distribution more suitable for techniques such as analysis of variance by making their variances independent of their means.

Standard deviation: The most commonly used measure of the spread of a set of observations. It is equal to the square root of the variance.

Standard error: The standard deviation of the sampling distribution of a statistic. For example, the standard error of the sample mean of n observations is $\sigma/\sqrt{n}$, where $\sigma^2$ is the variance of the original observations.

Standardization: A term used in a variety of ways in psychological research. The most common usage is in the context of transforming a variable by dividing by its standard deviation to give a new variable with a standard deviation of 1.

Standard normal variable: A variable having a normal distribution with mean zero and variance one.

Standard scores: Variable values transformed to zero mean and unit variance.

Statistic: A numerical characteristic of a sample, for example, the sample mean and sample variance. See also parameter.

Student's t distribution: The probability distribution of the ratio of a normal variable with mean zero and standard deviation one, to the square root of a chi-squared variable. In particular, the distribution of the variable

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}},$$

where $\bar{x}$ is the arithmetic mean of n observations from a normal distribution with mean $\mu$, and s is the sample standard deviation. The shape of the distribution varies with n, and as n gets larger it approaches a standard normal distribution.
Student's t tests: Significance tests for assessing hypotheses about population means. One version is used in situations in which it is required to test whether the mean of a population takes a particular value. This is generally known as a single sample t test. Another version is designed to test the equality of the means of two populations. When independent samples are available from each population, the procedure is often known as the independent samples t test and the test statistic is

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{1/n_1 + 1/n_2}},$$

where $\bar{x}_1$ and $\bar{x}_2$ are the means of samples of size $n_1$ and $n_2$ taken from each population, and $s^2$ is an estimate of the assumed common variance given by

$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},$$

where $s_1^2$ and $s_2^2$ are the two sample variances. If the null hypothesis of the equality of the two population means is true, then t has a Student's t distribution with $n_1 + n_2 - 2$ degrees of freedom, allowing p values to be calculated. In addition to homogeneity, the test assumes that each population has a normal distribution but is known to be relatively insensitive to departures from this assumption. See also matched pairs t test.
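The independent samples statistic can be sketched in a few lines of Python. The code below assumes the pooled-variance form of the statistic described in this entry; the function name and the two tiny samples are hypothetical and serve only to show the arithmetic.

```python
import math


def pooled_t(sample1, sample2):
    """Return the pooled-variance t statistic and its degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    # Sample variances with divisor n - 1
    var1 = sum((x - mean1) ** 2 for x in sample1) / (n1 - 1)
    var2 = sum((x - mean2) ** 2 for x in sample2) / (n2 - 1)
    # Estimate of the assumed common variance
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


t_value, df = pooled_t([1, 2, 3], [2, 3, 4])  # hypothetical data
print(t_value, df)
```

The p value would then be obtained by referring the statistic to a Student's t distribution with the returned degrees of freedom, for example from tables or statistical software.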

Symmetric matrix: A square matrix that is symmetrical about its leading diagonal; that is, a matrix with elements $a_{ij}$ such that $a_{ij} = a_{ji}$. In statistics, correlation matrices and covariance matrices are of this form.


Test statistic: A statistic used to assess a particular hypothesis in relation to some population. The essential requirement of such a statistic is a known distribution when the null hypothesis is true.
Tolerance: A term used in stepwise regression for the proportion of the sum of squares about the mean of an explanatory variable not accounted for by other variables already included in the regression equation. Small values indicate possible multicollinearity problems.

Trace of a matrix: The sum of the elements on the main diagonal of a square matrix, usually denoted as tr(A). So, for example, if the diagonal elements of A sum to 4, then tr(A) = 4.

Transformation: A change in the scale of measurement for some variable(s). Examples are the square root transformation and logarithmic transformation.

Two-sided test: A test in which the alternative hypothesis is not directional, for example, that one population mean is either above or below the other. See also one-sided test.

Type I error: The error that results when the null hypothesis is falsely rejected.

Type II error: The error that results when the null hypothesis is falsely accepted.

U
Univariate data: Data involving a single measurement on each subject or patient.

U-shaped distribution: A probability distribution or frequency distribution shaped more or less like a letter U, though it is not necessarily symmetrical. Such a distribution has its greatest frequencies at the two extremes of the range of the variable.

V

Variance: In a population, the second moment about the mean. An unbiased estimator of the population value is provided by $s^2$ given by

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2,$$

where $x_1, x_2, \ldots, x_n$ are the n sample observations and $\bar{x}$ is the sample mean.

Variance-covariance matrix: Synonymous with covariance matrix.

Vector: A matrix having only one row or column.

W
Wilcoxon's signed rank test: A distribution free method for testing the difference between two populations using matched samples. The test is based on the absolute differences of the pairs of characters in the two samples, ranked according to size, with each rank being given the sign of the difference. The test statistic is the sum of the positive ranks.

z scores: Synonym for standard scores.

z test: A test for assessing hypotheses about population means when their variances are known. For example, for testing that the means of two populations are equal, that is $H_0: \mu_1 = \mu_2$, when the variance of each population is known to be $\sigma^2$, the test statistic is

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{1/n_1 + 1/n_2}},$$

where $\bar{x}_1$ and $\bar{x}_2$ are the means of samples of size $n_1$ and $n_2$ from the two populations. If $H_0$ is true, then z has a standard normal distribution. See also Student's t tests.

Appendix B
Answers to Selected Exercises

CHAPTER 1
1.2. One alternative explanation is the systematic bias that may be produced by always using the letter Q for Coke and the letter M for Pepsi. In fact, when the Coca Cola company conducted another study in which Coke was put into both glasses, one labeled M and the other labeled Q, the results showed that a majority of people chose the glass labeled M in preference to the glass labeled Q.

1.4. The quotations are from the following people: 1, Florence Nightingale; 2, Lloyd George; 3, Joseph Stalin; 4, W. H. Auden; 5, Mr. Justice Streatfield; 6, Logan Pearsall Smith.

CHAPTER 2
2.3. The graph in Figure 2.34 commits the cardinal sin of quoting data out of context; remember that graphics often lie by omission, leaving out data sufficient for comparisons. Here a few more data points for other years in the area would be helpful, as would similar data for other areas where stricter enforcement of speeding had not been introduced.


FIG. B.1. Simple plot of data from ergocycle study (horizontal axis: knee-joint angle in degrees).

CHAPTER 3
3.2. Here the simple plot of the data shown in Figure B1 indicates that the 90° group appears to contain two outliers, and that the observations in the 50° group seem to split into two relatively distinct classes. Such features of the data would, in general, have to be investigated further before any formal analysis was undertaken. The ANOVA table for the data is as follows.

Source            SS       DF   MS      F       p
Between angles    90.56     2   45.28   11.52   .002
Within angles    106.16    27    3.93

Because the groups have a clear ordering, the between group variation might be split into components, each with one degree of freedom, representing variation that is due to linear and quadratic trends.
3.3. The term $\mu$ in the one-way ANOVA model is estimated by the grand mean of all the observations. The terms $\alpha_i$, $i = 1, 2, \ldots, k$ are estimated by the deviation of the appropriate group mean from the grand mean.
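The estimates described in the answer to Exercise 3.3 can be sketched as follows. The group labels and observations below are invented for illustration, and the function name is hypothetical; the code simply computes the grand mean of all observations and the deviation of each group mean from it.

```python
def one_way_estimates(groups):
    """Return (grand mean, {group: group mean - grand mean})."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)  # estimate of mu
    effects = {
        name: sum(values) / len(values) - grand_mean  # estimate of alpha_i
        for name, values in groups.items()
    }
    return grand_mean, effects


# Hypothetical data for three groups
example = {"group 1": [1, 2, 3], "group 2": [4, 5, 6], "group 3": [7, 8, 9]}
mu_hat, alpha_hat = one_way_estimates(example)
print(mu_hat)
print(alpha_hat)
```

With equal group sizes, as here, the estimated group effects sum to zero.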


3.5. The ANOVA table for the discharge anxiety scores is as follows.

Source             SS      DF   MS     F      p
Between methods     8.53    2   4.27   2.11   .14
Within methods     54.66   27   2.02


FIG. B.2. Plot of final versus initial anxiety scores for wisdom tooth extraction data (horizontal axis: initial anxiety score).

Figure B2 shows a plot of final anxiety score against initial anxiety score, with the addition of the regression line of final on initial for each method. The line for method 2 appears to have a rather different slope from the other two lines. Consequently, an analysis of covariance may not be justified. If, however, you ignore B2, then the analysis of covariance results are as follows.

Source             SS      DF   MS     F      p
Initial anxiety     6.82    1   6.82   4.87   .036
Between methods    19.98    2   9.99   7.13   .003
Within methods     36.40   26   1.40

The analysis of variance for the difference, final score - initial score, is as follows.

Source             SS      DF   MS      F       p
Between methods    48.29    2   24.14   13.20   <.001
Within methods     49.40   27    1.83

A suitable model that allows both age and initial anxiety to be covariates is

$$y_{ij} = \mu + \alpha_i + \beta_1(x_{ij} - \bar{x}) + \beta_2(z_{ij} - \bar{z}) + \epsilon_{ij},$$

where x and z represent initial anxiety and age, respectively. The ANCOVA results from this model are as follows.

Source             SS      DF   MS     F      p
Initial anxiety     6.82    1   6.82   4.72   .039
Age                 0.78    1   0.78   0.54   .468
Between methods    19.51    2   9.75   6.76   <.01
Within methods     36.08   25   1.44

CHAPTER 4
4.1. Try a logarithmic transformation.

4.2. The 95% confidence interval is given by

$$(\bar{x}_1 - \bar{x}_2) \pm t_{12}\, s \sqrt{1/n_1 + 1/n_2},$$

where $\bar{x}_1$ and $\bar{x}_2$ are the mean weight losses for novice and experienced slimmers both using a manual, s is the square root of the error mean square from the ANOVA of the data, $t_{12}$ is the appropriate value of Student's t with 12 degrees of freedom, and $n_1$ and $n_2$ are the sample sizes in the two groups. Applying this formula to the numerical values gives

$$[-6.36 - (-1.55)] \pm 2.18\, s \sqrt{1/n_1 + 1/n_2},$$

leading to the interval [-7.52, -2.10].

4.3. The separate analyses of variance for the three drugs give the following results.

Drug X
Source         SS        DF   MS        F       p
Diet (D)        294.00    1    294.00    2.32   .14
Biofeed (B)     864.00    1    864.00    6.81   .02
D x B           384.00    1    384.00    3.03   .10
Error          2538.00   20    126.90

Drug Y
Source         SS        DF   MS        F       p
Diet (D)       3037.50    1   3037.50   21.01   <.001
Biofeed (B)     181.50    1    181.50    1.26   .28
D x B           541.50    1    541.50    3.74   .07
Error          2892.00   20    144.60

Drug Z
Source         SS        DF   MS        F       p
Diet (D)       2773.50    1   2773.50   13.97   .001
Biofeed (B)    1261.50    1   1261.50    6.36   .02
D x B           181.50    1    181.50    0.91   .35
Error          3970.00   20    198.50

The separate analyses give nonsignificant results for the diet x biofeed interaction, although for drugs X and Y, the interaction terms approach significance at the 5% level. The three-way analysis given in the text demonstrates that the nature of this interaction is different for each drug.


4.11. The main effect parameters are estimated by the difference between a row (column) mean and the grand mean. The interaction parameter corresponding to a particular cell is estimated as (cell mean - row mean - column mean + grand mean).
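The estimates in the answer to Exercise 4.11 can be illustrated with a small table of cell means. The sketch below assumes a balanced two-way layout, so that row, column, and grand means are simple averages of the cell means; the function name and the 2 x 2 table are hypothetical.

```python
def interaction_estimates(cell_means):
    """Return cell mean - row mean - column mean + grand mean for every cell."""
    n_rows = len(cell_means)
    n_cols = len(cell_means[0])
    row_means = [sum(row) / n_cols for row in cell_means]
    col_means = [sum(row[j] for row in cell_means) / n_rows for j in range(n_cols)]
    grand_mean = sum(row_means) / n_rows
    return [
        [cell_means[i][j] - row_means[i] - col_means[j] + grand_mean
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Hypothetical cell means for a 2 x 2 design
table = [[1.0, 2.0],
         [3.0, 6.0]]
print(interaction_estimates(table))
```

In a balanced design the estimated interaction parameters sum to zero across each row and down each column, which provides a quick check on the arithmetic.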
4.12. An ANCOVA of the data gives the following results.

Source               SS       DF   MS      F      p
Pretreatment 1         8.65    1    8.65   0.66   .43
Pretreatment 2        35.69    1   35.69   2.72   .12
Between treatments    27.57    1   27.57   2.10   .17
Error                222.66   17   13.10

CHAPTER 5
5.3. The ANOVA table for the data is

Source               SS            DF   MS
Between subjects     1399155.00    15   93277.00
Between electrodes    281575.43     4   70393.86
Error                1342723.37    60   22378.72

Thus using the nomenclature introduced in the text, we have

$$\hat{\sigma}^2_{\text{subjects}} = \frac{93277.00 - 22378.72}{5} = 14179.66,$$

$$\hat{\sigma}^2_{\text{electrodes}} = \frac{70393.86 - 22378.72}{16} = 3000.95,$$

$$\hat{\sigma}^2_{\text{error}} = 22378.72.$$

So the estimate of the intraclass correlation coefficient is

$$R = \frac{14179.66}{14179.66 + 3000.95 + 22378.72} = 0.358.$$

A complication with these data is the two extreme readings on subject 15. The reason given for such extreme values was that the subject had very hairy arms! Repeating the ANOVA after removing this subject gives the following results.

Source               SS           DF   MS
Between subjects     852509.55    14   60893.54
Between electrodes   120224.61     4   30056.15
Error                639109.39    56   11412.67

The intraclass correlation coefficient is now 0.439.
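The arithmetic behind both intraclass correlation estimates can be reproduced from the mean squares alone. The following Python sketch assumes, as the calculation in this answer does, that the variance components are obtained by subtracting the error mean square and dividing by the number of levels of the other factor; the function and argument names are hypothetical.

```python
def icc_from_mean_squares(ms_subjects, ms_electrodes, ms_error,
                          n_subjects, n_electrodes):
    """Intraclass correlation from a subjects x electrodes ANOVA table."""
    var_subjects = (ms_subjects - ms_error) / n_electrodes
    var_electrodes = (ms_electrodes - ms_error) / n_subjects
    return var_subjects / (var_subjects + var_electrodes + ms_error)


# All 16 subjects, 5 electrode types
icc_all = icc_from_mean_squares(93277.00, 70393.86, 22378.72, 16, 5)
# After removing subject 15: 15 subjects, 5 electrode types
icc_reduced = icc_from_mean_squares(60893.54, 30056.15, 11412.67, 15, 5)

print(round(icc_all, 3))      # 0.358
print(round(icc_reduced, 3))  # 0.439
```

Both printed values agree with the figures quoted in the answer, which suggests that the divisor for the electrode component becomes 15 once the extreme subject is dropped.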


CHAPTER 6
6.1. Figure B3 shows plots of the data with the following models fitted: (a) log(vocabulary size) $= \beta_0 + \beta_1\,\text{age}$; (b) vocabulary size $= \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{age}^2$; and (c) vocabulary size $= \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{age}^2 + \beta_3\,\text{age}^3$. It appears that the cubic terms model provides the best fit. Examine the residuals from each model to check.

6.4. Figure B4 shows plots of (a) p versus t; (b) log p versus t; and (c) p versus log t. One might imagine that a possible model here is $p = \exp(-\beta t)$, suggesting geometric loss of memory. Such a model should result in linearity in a plot of log p versus t, that is, plot (b) in Figure B4, but plainly this does not happen. A model that does result in linearity is to plot p against log t, although this is more difficult to explain.

6.5. Introduce two dummy variables $x_1$ and $x_2$ now defined as follows.

           Group
         1   2   3
$x_1$    1   0   0
$x_2$    0   1   0

A multiple regression using these two variables gives the same ANOVA table as that in Display 6.9, but the regression coefficients are now the differences between a group mean and the mean of Group 3.
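The interpretation of the coefficients under this dummy coding can be checked numerically. The sketch below uses invented observations for three groups and a small least squares routine written for the example (solving the normal equations by Gaussian elimination); none of the names or data come from the text.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(design, y):
    """Ordinary least squares coefficients via the normal equations."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical observations; group means are 5, 9, and 2
groups = {1: [4.0, 6.0, 5.0], 2: [8.0, 9.0, 10.0], 3: [1.0, 2.0, 3.0]}
design, y = [], []
for g, values in groups.items():
    for v in values:
        # Columns: intercept, x1 (1 for Group 1), x2 (1 for Group 2)
        design.append([1.0, 1.0 if g == 1 else 0.0, 1.0 if g == 2 else 0.0])
        y.append(v)

intercept, b1, b2 = least_squares(design, y)
print(intercept, b1, b2)
```

Here the intercept equals the mean of Group 3, and the two slopes equal the differences between the means of Groups 1 and 2 and the mean of Group 3.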

CHAPTER 7
7.1. Using the maximum score as the summary measure for the lecithin trial data and applying a t test gives these results: t = -2.68; 45 degrees of freedom; p = .0102. A 95% confidence interval is (-6.85, -0.98).

7.5. Fitting the specified model with now the Placebo group coded as -1 and the Lecithin group as 1 gives the following results.

Fixed effects
Parameters       Estimate   SE     t       p
Intercept          8.21     0.72   11.35   <.0001
Group             -2.37     0.72   -3.28   .002
Visit              0.58     0.12    4.73   <.0001
Group x Visit      1.301    0.12   10.68   <.0001

Random effects
$\hat{\sigma}_a = 4.54$, $\hat{\sigma}_b = 0.61$, $\hat{\sigma}_{ab} = -0.47$, $\hat{\sigma}_\epsilon = 1.76$


CHAPTER 8
8.1. The signed rank statistic takes the value 40 with n = 9. The associated p value is 0.0391.
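The p value quoted for Exercise 8.1 is consistent with an exact two-sided calculation, which can be sketched by enumerating every possible assignment of signs to the ranks. The code below assumes no tied or zero differences; the function name is hypothetical.

```python
from itertools import product


def signed_rank_exact_p(w_plus, n):
    """Exact two-sided p value for the sum of positive ranks (no ties, no zeros)."""
    ranks = range(1, n + 1)
    expected = n * (n + 1) / 4          # mean of the statistic under the null
    observed_dev = abs(w_plus - expected)
    count = 0
    # Under the null hypothesis each rank is equally likely to carry either sign
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - expected) >= observed_dev:
            count += 1
    return count / 2 ** n


print(round(signed_rank_exact_p(40, 9), 4))  # 0.0391
```

With n = 9 there are only 512 sign patterns, so full enumeration is immediate; for larger samples a normal approximation or statistical software would normally be used instead.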

8.3. The results obtained by the author for various numbers of bootstrap samples are as follows.

N       95% Confidence Interval
200     1.039, 7.887
400     1.012, 9.415
800     1.012, 8.790
1000    1.091, 9.980
8.7. Pearson's correlation coefficient is -0.717, with t = -2.91, DF = 8, p = .020 for testing that the population correlation is zero. Kendall's $\tau$ is -0.600, with z = -2.415, p = .016 for testing for independence. Spearman's correlation coefficient is -0.806, with z = -2.44, p = .015 for testing for independence.
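The t statistic reported for Pearson's coefficient can be recovered from the correlation and the sample size. The sketch below assumes the standard relation t = r sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom, and takes n = 10 from the 8 degrees of freedom quoted above; the function name is hypothetical.

```python
import math


def t_from_correlation(r, n):
    """Return the t statistic and degrees of freedom for testing rho = 0."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t, n - 2


t_value, df = t_from_correlation(-0.717, 10)
print(round(t_value, 2), df)  # -2.91 8
```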

CHAPTER 9
9.1. The chi-squared statistic is 0.111, which with a single degree of freedom has an associated p value of 0.739. The required confidence interval is (-0.042, 0.064).

9.3. An exact test for the data gives a p value of less than .000001.

9.8. Here because of the paired nature of the data, McNemar's test should be used. The test statistic is 4.08 with an associated p value of less than .05. The mother's ratings appear to have changed, with a greater proportion of children rated as improved in year 2 than rated as not doing as well.
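For chi-squared statistics with a single degree of freedom, such as those in Exercises 9.1 and 9.8, the p value can be checked without tables. The sketch below relies on the fact that a chi-squared variable with one degree of freedom is the square of a standard normal variable, so the upper tail probability equals erfc(sqrt(x/2)); the function name is hypothetical.

```python
import math


def chi2_p_value_1df(statistic):
    """Upper tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


print(round(chi2_p_value_1df(0.111), 3))  # 0.739, as quoted for Exercise 9.1
print(chi2_p_value_1df(4.08) < 0.05)      # True, as quoted for Exercise 9.8
```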

CHAPTER 10
10.1. An investigation of a number of logistic regression models, including ones containing interactions between the three explanatory variables, shows that a model including each of the explanatory variables provides an adequate fit to the data. The fitted model is

$$\ln(\text{odds of being severely hurt}) = -0.9401 + 0.3367(\text{weight}) + 1.030(\text{ejection}) + 1.639(\text{type}),$$

where weight is 0 for small and 1 for standard, ejection is 0 for not ejected and 1 for ejected, and type is 0 for collision and 1 for rollover. The 95% confidence intervals for the conditional odds ratios of each explanatory variable are weight, 1.18, 1.66; ejection, 2.31, 3.40; type, 4.38, 6.06. Thus, for example, the odds of being severely hurt in a rollover accident are between 4.38 and 6.06 times those for a collision.
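The link between the fitted coefficients and the odds ratios can be made concrete by exponentiating each coefficient. The following sketch uses only the coefficients and interval limits quoted in the answer to Exercise 10.1; the variable and function names are hypothetical.

```python
import math

# Coefficients of the fitted logistic model quoted above
intercept = -0.9401
coefficients = {"weight": 0.3367, "ejection": 1.030, "type": 1.639}

# The conditional odds ratio for each variable is exp(coefficient)
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}
for name, ratio in odds_ratios.items():
    print(name, round(ratio, 2))


def probability_severely_hurt(weight, ejection, accident_type):
    """Fitted probability for a given combination of 0/1 explanatory variables."""
    log_odds = (intercept
                + coefficients["weight"] * weight
                + coefficients["ejection"] * ejection
                + coefficients["type"] * accident_type)
    return 1 / (1 + math.exp(-log_odds))


# Standard weight car, occupant ejected, rollover accident
print(round(probability_severely_hurt(1, 1, 1), 3))
```

Each point estimate falls inside the corresponding 95% confidence interval given in the answer, as it should.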

10.2. With four variables there are a host of models to deal with. A good way to begin is to compare the following three models: (a) main effects model, that is, Brand + Prev + Soft + Temp; (b) all first-order interactions, that is, model (a) above + Brand.Prev + Brand.Soft + Brand.Temp + Prev.Soft + Prev.Temp + Soft.Temp; (c) all second-order interactions, that is, model (b) above + Brand.Prev.Soft + Brand.Prev.Temp + Brand.Soft.Temp + Prev.Soft.Temp. The likelihood ratio goodness-of-fit statistics for these models are as follows.

Model   LR Statistic   DF
A       42.93          18
B        9.85           9
C        0.74           2

Model A does not describe the data adequately but models B and C do. Clearly, a model more complex than A, but possibly simpler than B, is needed. Consequently, progress in searching for the best model might be made either by forward selection of interaction terms to add to A or by backward elimination of terms from B.

FIG. B.5. Linear and logistic models for probability of menstruation (horizontal axis: age).

10.8. The plot of probability of menstruation with the fitted linear and logistic regressions is shown in Figure B5. The estimated parameters (etc.) for the two models are as follows.

Linear
Parameter    Estimate   SE      t
Intercept     -2.07     0.257    -8.07
Age            0.20     0.019    10.32

Logistic
Parameter    Estimate   SE      t
Intercept    -19.78     0.848   -23.32
Age            1.52     0.065    23.38


Index

Adjusted group mean, 81 Adjusted value, 130 Advantagesof graphical presentation. 22 Akaike informationcriterion (MC), 225,228 AU subsets regression.see variable selection methods in regression Analysis of covariance. 78-84 in factorial design,126-127 in longitudinal studies, 215-218 Analysis of variance, factorial design,7 1 12 9multivariate (MANOVA),84-90, 143-150 one-way. 65-70 for repeated measuredesigns, 136139 table, 68 ANCOVA, see analysisof covariance ANOVA. see analysis of vsriance

Assumptionsof F test, 66 Avoiding graphical distortion,5 - 4 35

B
Backward selection, see variable selection methods in regnssion Balanced designs. 116 Barchart, 2-4 22 Baseline measurement, 213 Benchmarks for evaluation of kappa values, 288 Between groups sum of squares. 68 Between subject factor, 134 Bimodal distribution,2 5 Bonferonni correction, 72 Bonferonni test, 72-73 Bootshap distribution.258 Bootstrap, 257-260 Box plots, 3 W 5 Bracket notationfor log linear models, 304 Bubble plot,4 0 4 l

3173

374
C
Calculating correction factors for repeated measures, 147 Case-contml study, 9 Categorical data, 267 Challenger space shuttle, 55.57 Chance agreement, 285 Change score analysis repeated measures, for 215-218 C hi -~q~ar cd 269,270 test, Compound symmetry, 139-141 Computationally intensive methods,253-260 Computers and statistical software. 16-17 Concomitant variables,80 Conditional independence,2% Conditioning plots,see Coplots Confidence intervalfor Wllcoxon signed-rank test, 244 Confidence intervalfor Wdcoxon-Mm-Whitneytest, 2 4 1 Confidence intervals,5 Contingency tables, chi-squared testfor independence in,270 residuals for, 275-277 r x c table. 270 thrrc-dimensional, 293-300 two-by-two. 271-275 two-dimensional, 268-275 Cook's distance, 194 Coplots, 44-46 Correction factorsfor repeated measures, 142-143 Greenhouse and Geisser correction factor, 142 Huynh and Feldt correction factor, 143 Correlation coefficients Kendall's tau, 250 Pearson's product moment,249 Spearman's rank,250-252 Correspondence analysis.277-280 Covariance matrix, 141 p l e d within-group, 87,88

D
Data sets,
    alcohol dependence and salsolinol excretion, 135
    anxiety scores during wisdom teeth extraction, 95
    applications for Berkeley college, 291
    blood pressure and biofeedback, 110
    blood pressure, family history and smoking, 130
    bodyweight of rats, 213
    brain mass and body mass for 62 species of animal, 27
    caffeine and finger tapping, 76
    car accident data, 323
    cerebral tumours, 269
    children's school performance, 292
    consonants correctly identified, 248
    crime in the USA, 182-183
    crime rates for drinkers and abstainers, 23
    Danish do-it-yourself, 294
    days away from school, 117
    depression scores, 263
    detergents data, 324
    diagnosis and drugs, 268
    estrogen patches in treatment of postnatal depression, 131
    fecundity of fruit flies, 64
    field dependence and a reverse Stroop task, 134
    firefighters entrance exam, 281
    gender and belief in the afterlife, 290
    GHQ data, 309
    hair colour and eye colour, 278
    height, weight, sex and pulse rate, 44, 45
    heights and ages of married couples, 28-30
    human fatness, 177
    ice-cream sales, 42
    improvement scores, 98
    information about 10 states in the USA, 60
    janitor's and banker's ratings of socio-economic status, 291
    knee joint angle and efficiency of cycling, 93
    length guesses, 60
    marriage and divorce rates, 206
    maternal behaviour in rats, 94
    measurements for schizophrenic patients, 265
    membership of W o w n e Swimming Club, 290
    memory retention, 206
    menstruation, 325
    mortality rates from male suicides, 61
    oral vocabulary size of children at various ages, 162
    organization and memory, 256
    pairs of depressed patients, 271
    postnatal depression and child's cognitive development, 118
    proportion of degrees in science and engineering, 55, 56
    psychotherapeutic attraction scores, 247
    quality of children's testimonies, 207
    quitting smoking experiment, 159
    racial equality and the death penalty, 295
    rat data, 101
    scores in a synchronized swimming competition, 154
    sex and diagnosis, 271
    skin resistance and electrode type, 243
    slimming data, 107
    smoking and performance, 126
    social skills data, 85
    statistics students, 324
    stressful life events, 240
    suicidal feelings, 271
    suicide by age and sex, 294
    survival of infants and amount of care, 296
    test scores of dizygous twins, 264
    time to completion of examination scores, 249
    treatment of Alzheimer's disease, 220
    university admission rates, 46
    verdict in rape cases, 271
    visits to emergency room, 265
    visual acuity and lens strength, 40
    what people think about the European community, 279
    WISC blocks data, 80
Deletion residual, 194
Dependence panel, 45
Deviance residual, 319
Deviance, 311
Dichotomous variable, 271
Dot notation, 68
Dot plot, 23-25
Draughtsman's plot, synonymous with scatterplot matrix
Dummy variable, 178-180

E
Enhancing scatterplots, 40-46
Exact p values, 281
Expected mean squares, 138
Experiments, 10
Explanatory variable, 14
Exponential distribution, 47

F
Factor variable, 14
Factorial design, 97
Fisher's exact test, 273, 274
Fitted marginals, 304
Fitted values, 102, 163, 168
Five-number summary, 32
Forward selection, see variable selection methods in regression
Friedman test, 247-249

G
Graphical deceptions and graphical disasters, 48-57
Graphical display, 21-22
Graphical misperception, 58
Greenhouse and Geisser correction factor, see correction factors

H
Hat matrix, 194
Histograms, 25-31
Homogeneity of variance, 137
Hotelling's T² test, 147-150
Huynh and Feldt correction factor, see correction factors
Hypergeometric distribution, 274

84-90.

I
Idealized residual plots, 168
Independent variable, 14
Index plot, 194
Inflating the Type I error, 72
Initial examination of data, 3-4
Interaction plot, 108, 113, 124
Interactions, 97-101
    first order, 110
    second order, 110
Interquartile range, 34
Intraclass correlation coefficient, 150-157

K
Kappa statistic, 283-288
Kappa statistic, large sample variance, 287
Kelvin scale, 13
Kruskal-Wallis test, 245-247

L
Latent variable, 128
Latin square, 112
Least squares estimation, 163
Lie factor, 52
Likelihood ratio statistic, 297
Locally weighted regression, 253
Logistic regression, 306-317
Logistic transformation, 311
Longitudinal data, 209
Longitudinal designs, 136

M
Mallows' Ck statistic, 185-187
MANOVA, see analysis of variance
Marginal totals, 270
McNemar's test, 273, 274
Measurement scales
    interval scales, 13
    nominal scales, 12
    ordinal scales, 12
    ratio scales, 13
Method of difference, 15
Misjudgement of size of correlation, 54
Missing observations, 121
Models, 6-8
    analysis of covariance, 81
    in data analysis, 6-8
    fixed effects, 112-116
    for contingency tables, 300-306
    hierarchical, 219, 304
    linear, 67
    logistic, 306-317
    log-linear, 302-303
    main effect, 102
    minimal, 304
    mixed effects, 113, 138
    multiple linear regression, 171-172
    one way analysis of variance, 67
    multiplicative, 301
    random effects, 112-116, 219-233
    saturated, 303
Multicollinearity, 193-197
Multidimensional contingency tables, 293
Multilevel modelling, 66, 219
Multiple comparison techniques, 70-75
Multiple correlation coefficient, 172
Multiple regression and analysis of variance, equivalence of, 197-201
Multivariate analysis of variance, see analysis of variance
Mutual independence, 296

N
Nonoverlapping sums of squares, 120-121
Normal distributions, 46, 49

O
Observational studies, 9
Occam's razor, 8
Odds ratio, 283, 284
One-way design, 63, 65-70
Orthogonal polynomials, 76-78
Outside values, 34
Overparameterized, 67

P
Parsimony, 8
Partial F test, 187
Partial independence, 296
Partitioning of variance, 66
Pearson residuals, 19 3
Permutation distribution, 241
Permutation tests, 253-257
Pie chart, 22-24
Pillai statistic, 128
Planned comparisons, 75-76
Pooling sums of squares, 102
Post hoc comparisons, 76
Power curves, 216-218
Power, 18
Prediction using simple linear regression, 165
Predictor variables, 14
Probability of falsely rejecting the null hypothesis, 65
Probability plotting, 46-48
    normal probability plotting, 49

for two-way design,99

Prospective study, 9
Psychologically relevant difference, 18
p value, 2, 4-6

Q
Quasi-experiments, 10

R
raison d'être of statistics and statisticians, 2
Random allocation, 11
Random intercepts and slopes model, 227
Random intercepts model, 226
Regression to the mean, 215
Regression, 161-169
    automatic model selection in, 187-191
    all subsets, 183-187
    diagnostics, 191-193
    distribution-free, 252-253
    generalized linear, 317-321
    logistic, 306-317
    multiple linear, 169-183
    residuals, 167-169
    simple linear, 162-169
    with zero intercept, 1 6 1 6 7
Relationship between intraclass correlation coefficient and product moment correlation coefficient, 157
Residuals
    adjusted, 276
    in contingency tables, 275-277
    deletion, 194
    standardized, 194
Response feature analysis, 210-218
Response variable, 14
Risk factor, 9

S
Sample size, determination of, 17-19
Scatterplot matrix, 35-40
Scatterplots, 32-35
Scheffé test, 73-75
Second-order relationship, 297
Sequential sums of squares, 121
Shrinking family doctor, 53
Significance testing, 4-6
Simple effects, 1 6 0
Sparse data in contingency tables, 280-283
Sphericity, 137, 139-141
Spline smoothers, 253
S-PLUS
    ANOVA, 92
    aov, 93
    Bootstrap, 263
    chisq.test, 289
    Chi-square test, 289
    Compare samples, 262
    Coplots, 59
    Counts and proportions, 289
    fisher.test, 289
    Fisher's exact test, 289
    Friedman Rank Test, 262
    friedman.test, 262
    glm, 322
    graph menu, 59
    graphics palettes, 59
    help, 59
    kruskal.test, 262
    Kruskal-Wallis Rank Test, 262
    lm, 205
    lme, 234
    MANOVA, 92
    mcnemar.test, 289
    Mixed effects, 233
    Multiple Comparisons, 92
    outer, 263
    pie, 59
    Regression, 204
    Regression, logistic, 322
    Regression, log-linear, 322
    Resample, 263
    Signed Rank Test, 262
    trellis graphics, 59
    wilcox.test, 262
    Wilcoxon Rank Test, 262
SPSS
    bar charts, 58
    Chi-square, 289
    Crosstabs, 289
    General linear model, 91
    GLM-Multivariate, 92
    GLM-Repeated measures, 158
    graph menu, 58
    Mann-Whitney U, 262
    Nonparametric tests, 261
    Regression, 204
    Regression, logistic, 322
Standard normal distribution, 47
Standardized regression coefficients, 173-175
Standardized residual, 194
Stem-and-leaf plots, 32-33
Student's t-tests, 63-65
Summary measures, use of, 210-219
Surveys, 9
T
Testing for independence in a contingency table, 270
Testing mutual independence in a three dimensional contingency table, 298
Testing partial independence in a three dimensional contingency table, 298
Three-dimensional contingency tables, 293-300
Transforming data, 69
Trellis graphics, 44-48
Trend analysis, 76-78
Two x two contingency table, 272-273
Two-dimensional contingency tables, 268-275
Type I sums of squares, see sequential sums of squares
Type III sums of squares, see unique sums of squares

U
Unbalanced designs, 116-126
Unique sums of squares, 121-122

V
Variable selection methods in regression,
    all subsets regression, 183-187
    backward selection, 187-191
    forward selection, 187-191
    stepwise selection, 187-191
Variance inflation factor, 194-197

W
Weighted kappa, 288
Wilcoxon's signed ranks test, 243-245
Wilcoxon-Mann-Whitney test, 238-242
Within groups sum of squares, 68
Within subject factor, 133

Y
Yates's continuity correction, 273