You are on page 1of 243

Ulf Olsson

Generalized Linear Models
An Applied Approach

Copying prohibited
transmitted in any form or by any means, electronic or mechanical,
including photocopying, recording, or any information storage and
retrieval system, without permission in writing from the publisher.

The papers and inks used in this product are environment-friendly.

Art. No 31023
eISBN10 91-44-03141-6
eISBN13 978-91-44-03141-5
© Ulf Olsson and Studentlitteratur 2002
Cover design: Henrik Hast

Printed in Sweden
Studentlitteratur, Lund
Printing/year 1 2 3 4 5 6 7 8 9 10 2006 05 04 03 02

Contents

Preface ix

1 General Linear Models 1
1.1 The role of models . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 General Linear Models . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Assessing the fit of the model . . . . . . . . . . . . . . . . . . 4
1.4.1 Predicted values and residuals . . . . . . . . . . . . . 4
1.4.2 Sums of squares decomposition . . . . . . . . . . . . . 4
1.5 Inference on single parameters . . . . . . . . . . . . . . . . . . 6
1.6 Tests on subsets of the parameters . . . . . . . . . . . . . . . 7
1.7 Diﬀerent types of tests . . . . . . . . . . . . . . . . . . . . . . 7
1.8 Some applications . . . . . . . . . . . . . . . . . . . . . . . . 8
1.8.1 Simple linear regression . . . . . . . . . . . . . . . . . 8
1.8.2 Multiple regression . . . . . . . . . . . . . . . . . . . . 10
1.8.3 t tests and dummy variables . . . . . . . . . . . . . . . 12
1.8.4 One-way ANOVA . . . . . . . . . . . . . . . . . . . . . 13
1.8.5 ANOVA: Factorial experiments . . . . . . . . . . . . . 18
1.8.6 Analysis of covariance . . . . . . . . . . . . . . . . . . 21
1.8.7 Non-linear models . . . . . . . . . . . . . . . . . . . . 23
1.9 Estimability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.10 Assumptions in General linear models . . . . . . . . . . . . . 24
1.11 Model building . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.11.1 Computer software for GLM:s . . . . . . . . . . . . . . 24
1.11.2 Model building strategy . . . . . . . . . . . . . . . . . 25
1.11.3 A few SAS examples . . . . . . . . . . . . . . . . . . . 26
1.12 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

iii

. .2 Continuous response . .1. . 37 2. . . 32 2.iv Contents 2 Generalized Linear Models 31 2. . . 46 2. . . .3. . . . . . . . . . . . . . . . . .6 Response as a rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8. . . . . . . . . . . . . . . . . . 48 2. . .7 Ordinal response . 38 2. . . . .7 Numerical procedures . . .1. . . . . . .2 The binomial distribution . . . . . . . . . . . . . . . . . . . . . . . .5 Response as a count . . . . . . . . . .1. . . . . . . . . . 34 2. . . . . . . . . . . .10 Descriptive measures of fit . . . . 45 2. . . . . 33 2. 46 2. . 31 2. . . . . . . . .12 Exercises . . . . . . . . . . . .9.11 An application .1. . . . . .8.3. . . . . 42 2. . 35 2. . . . . . . . . . .4 Response as a proportion . . . . . . . . . . . . . . 42 2. . . . . .2 Generalized linear models . . . . . . . . . . . . . . . . . . . .6 Maximum likelihood estimation . . .1 Introduction . . .1 Wald tests . . . . . . . . . . . . . . . . . . . . . .4 The function b (·) . . .9. . . . . . . . .8. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3 Akaike’s information criterion . . . . . . . . . . . . . . . . . .1. . . . . . .2 The generalized Pearson χ2 statistic . . 42 2. . . . . 36 2.9.1 Types of response variables . . . . . . . . . . 37 2. . 49 2. . .4. .8. . .4 The link function . . . . . . . . . . . . . . 40 2. . 37 2. . 44 2. . . 48 2. . . . . . . . . . . . . .8 Assessing the fit of the model . . .1 The deviance . . . . . . . 32 2. . .3 Score tests . . 47 2. . . . . . . . . . . . . . . .1. 38 2.1 The Poisson distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3. . . . . . . . . . . . . .5 The linear predictor . 49 2. . 31 2. . 45 2. .4 The choice of measure of fit . . . . . . 35 2. . . . . . . . . . . . . . . . . . . .3 The Normal distribution . . . . .4 Tests of Type 1 or 3 . . . . . . . . . . . . . . . 53 c Studentlitteratur ° . . . . . . . .9. . 47 2.9 Diﬀerent types of tests . . . . 47 2. . . . . . . . .2 Likelihood ratio tests . . . . .3 The exponential family of distributions . . . . . . . .3 Response as a binary variable . . . .1 Canonical links .1. . . . . . . . . . .3. . . . . . . . . . . . . 50 2. . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . 78 4. . . . .1 Pearson residuals .6 Exercises . . . . . .1 Models for overdispersion . . . . . . . . . . . 58 3.2 The Hat matrix . . . . . . . 59 3. . .1. . . . . . . .8. . . . . . . .5. . . . . . . . . . 83 c Studentlitteratur ° . . . . . . . .2 Normal probability plot . . . .2 The choice of distribution . . . 60 3.7 Non-convergence . . . . . .4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 4 Models for continuous data 69 4. . . . . . . . 79 4.1 Plot of residuals against predicted values . . . . . . . . . . . . . . . . . . . . . . . . . . .1 Simple linear regression . . . . . . . . . . . . . . . . .4. 55 3. . .2 Variance function diagnostics . . . . . . . . . . . . . . . . . .1. .3. . . . .4 Eﬀect on data analysis . . . . . . . . . 79 4. . 73 4. . . . . . . . . . . . .5 Anscombe residuals . . . . . . . . .2 Deviance residuals . . . .4. . .4 Likelihood residuals . . .Contents v 3 Model diagnostics 55 3. .1 GLM:s as GLIM:s . . . . . . . . . .1 Leverage . 69 4. . . . .3 The Gamma distribution . .5. . . . . . . . . . . . . 69 4. . . . . .5. . . . . .3. . . . . 75 4. . . . 64 3.4. . . . . . . . 57 3. . . . . . . . . 67 3. . . . . . . 59 3. 81 4. . . . . . .2 Cook’s distance and Dfbeta . .3. . . .4 Transformation of covariates . . . . . . . . . .5. . . .8. . . . . . . .3 Residuals in generalized linear models . . . . . . . . . . . . . . . . . . . . .4 Influence diagnostics . . . . .3. . . . . . . . . . . . . . 60 3. . . . . . . . . 67 3. . . . . . . . . 77 4. . . . .3. . . . . . . . . . . . . . . . . . . . . . . .6 The choice of residuals . . . .5 Model diagnostics . .8 Applications . 58 3. . . . . . . . . . . . . . . . . . . . . . . 55 3. . . . . 78 4. . . . . . . .3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 3. . . . . . . . . . . . . . . . 63 3. . . . 56 3. . . .3. . . . .3 Goodness of fit measures . . . . . .3 Plots of residuals against covariates . . . . . .2 Simple ANOVA . . .1 Introduction . . . . . . . . .3. 75 4. . . .8. . . .4 Influential observations and outliers . . . . . . . . . . . . . . . . . . . . . . .3. . . . . . . . . .1 The Chi-square distribution . . . . . . . . 73 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 3. . . . . . . . . . . . . . . . . . . . . . . . . . . 60 3. . . . . . . . . . . .4 The inverse Gaussian distribution . . . . . . 61 3. . 72 4. . . 62 3.2 The Exponential distribution . . . . . . . . . .9 Exercises . .6. . . . .8. . . . . . . . . . . 58 3. . . . 71 4. . . . . . .6 Overdispersion . . .3 Link function diagnostics . . . . .5 Partial leverage . . . . .3 An application with a gamma distribution . . . . . . 60 3. . 66 3. . . . 56 3. . . . . .3 Score residuals . . . . . . . . . . . . . . .1 Residual plots . . . .

. . . . . . 92 5. . . 97 5. .7. . . . . . . . . .7 Overdispersion in binary/binomial models . . . . . . . . 96 5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3 The complementary log-log link . . . . . . . . .1. . . . . . . . . . . 88 5. . . .3 An example of over-dispersed data . . . . . . .5 Multiple logistic regression . .2 Distributions for binary and binomial data . . . . . . 85 5. . 101 5. . . . . . . . . . . . . . . .1 The Bernoulli distribution . . .2 Modeling as a beta-binomial distribution . .1 Model building . . . . 104 c Studentlitteratur ° . . . . . . .8 Exercises . . . .5. . . . . .3 Model diagnostics .3 Probit analysis . . .2 The Binomial distribution . . . 102 5. . . . .7. . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 5. . . . . . . .4 Logit (logistic) regression . 86 5. . . . . . . . . . .5. . . . . . . . . . . . . . . 89 5. . . . . . . . 87 5. . . . .5. . . . . . . 101 5. .1 The probit link .2. . . . . .1. .6 Odds ratios . 86 5. . . . . . . . . .7. . . . . . . . . . . . .1 Estimation of the dispersion parameter . . .1. . . . . 92 5. .1 Link functions . . . . . . . . . . .2 The logit link . . . . . . . . . . .vi Contents 5 Binary and binomial response variables 85 5.2 Model building tools . . . . . . . . . 85 5. . . . . . . 87 5. . . . . . . . . . . 91 5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2. . 98 5. . . . .

. . . . . .5 Repeated measures: the GEE approach . 122 6. . .2 Types of independence . . . . . . . . . . . . . . . . . . .5. . . . . .5. . . . . . . . . . .2. . . .1 The multinomial distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 8 Additional topics 157 8. . . . .11. . . . . . . . .1. . . . . . . . 118 6. . . . . .13 Exercises . . . 114 6.12 Diagnostics . . . 118 6. . . . . . . . . . . . . . . 162 8. . . .2. . . . . 159 8. . .11. . . . . . . . . . . . . . . . 148 7. . . .11 Overdispersion in Poisson models . .6. . . .2. . . . . . . .2 When independence does not hold . . . . . . .6 Relation to logistic regression . . . . . . . . . . . . . . . . . . . 163 8. . . . . .2. . . . . . . . . . . . . . . . . . . . . . . .3 Quasi-likelihood . . . . . . . . . . 122 6. . . 121 6. . . . . . . . . . . . . 113 6. . . 114 6. . . . . . . . . . . . . . . 145 7. . . . . . . . . .8 Poisson regression models . . . . . . . . . . . . . . .1 Arbitrary scoring . 119 6. . . . . . . . . . . . . . . . . . . . . . . . 137 7 Ordinal response 145 7.5. . . . . . . . . . . . . . . 158 8. . . . . . . . . . . . 112 6. . . .4 Quasi-likelihood for modeling overdispersion . . . .5 A Genmod example . 157 8. . . . . . . . . .3 Proportional odds .1. . . 150 7. . . . . . . . . . . . . . .4 Relation to contingency tables . . . . . . .2 RC models . 119 6. . . . . . .2 Distributions for count data . . . . .1 A three-way table . . . . .1 Binary response . . . . . . . . . . . .2 The product multinomial distribution . . . . . . . . . . . . . 113 6. . 148 7. . 165 c Studentlitteratur ° . . . . . . . . . . . . . .9 A designed experiment with a Poisson distribution . . .2 Modeling as a Negative binomial distribution . . .6.3 Genmod analysis of the drug use data . . .6 Exercises . . . . . . . . . . 126 6. . . . . . . . .5 Higher-order tables . . . . . . . . . . 153 7. . . . . . . . . . . . . . . . . . . . . . . . . . 115 6. . . .1 Variance heterogeneity . .5. . . . . . . . 131 6. . . . . . .3 Analysis of the example data . . . . . . . . . 129 6. . . . . . . . . . . . . . . . . . . . . . . . . 112 6.10 Rate data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Nominal logistic regression . . . . . . 135 6. . . . . . . . . . . . . . . . . . . . . 120 6. . . . . .2 Survival models . . . . .1 A log-linear model for independence . . . . . 111 6.7 Capture-recapture data . . . . . . . . . . 133 6. . . . . . .Contents vii 6 Response variables as counts 111 6. . . . . . . . . . . . . . . .1 An example . . . . .3 The Poisson distribution . . . . 114 6. . . . . . . 133 6. . . . 134 6. . . . . . . . . . .4 Interpretation through Odds ratios . .4 Latent variables . . . . . . . . . . . . . 121 6. . . .1 Modeling the scale parameter . 117 6. . . .1 Log-linear models: introductory example .4 Testing independence in an r×c crosstable . . . . . .2.

. . . . . . . 189 Fisher’s scoring . . . . . . 188 Distributions with many parameters . . . . 186 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180 Some special types of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185 Eigenvalues and eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186 Appendix B: Inference using likelihood methods 187 The likelihood function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7 Exercises . . . . . . . . . . . . 184 Determinants . . . . . . . . . . . . . . . 185 Some statistical formulas on matrix form . . . . .6 Mixed Generalized Linear Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184 The rank of a matrix . . 187 The Cramér-Rao inequality . . . . . . . . 183 The inverse of a matrix . . . . . . . . . . . . . . . 183 Idempotent matrices . 180 Calculations on matrices . . 182 Multiplication by a matrix . . . . . . . . . . . . . . . . . 179 The dimension of a matrix . . . . . . . . . . . . . . 172 Appendix A: Introduction to matrix algebra 179 Some basic definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183 Generalized inverses . . . . . . . . . . . .viii Contents 8. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182 Multiplication by a scalar . . . . . . . . 189 Numerical procedures . . . . . . . . . . . . . . . . . . . . . 181 Matrix multiplication . . . . . . . . . . . . 180 The transpose of a matrix . 190 Bibliography 191 Solutions to the exercises 197 c Studentlitteratur ° . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182 Calculation rules of multiplication . . . . . . . 188 Properties of Maximum Likelihood estimators . . . . . . . . . . . . . . . . . . . . 168 8. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189 The Newton-Raphson method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

regression analysis and analysis of variance. is a special case of GLIM:s. The models are formulated in ix .. Particularly important is a knowledge of statistical model building. and much more. * Log-linear models * Generalized estimating equations In this book we will make an overview of generalized linear models and present examples of their use. and knowledge of basic calculus.. GLIM:s also include log-linear models for analysis of contingency tables. are mathematical prerequisites. Since many of the examples are based on analyses using SAS. We assume that the reader has a basic understanding of statistical principles. Generalized linear models General linear models Models for counts. Poisson regression. In Chapter 1 we summarize some results on general linear models. analysis of variance and analysis of covariance.Preface Generalized Linear Models (GLIM:s) is a very general class of statistical models that includes many commonly used models as special cases. * Regression analysis proportions etc * Analysis of Variance * Probit/logit regression * Covariance analysis * Poisson regression . Some knowledge of matrix algebra (which is summarized in Appendix A). probit/logit regression. some knowledge of the SAS system is rec- ommended. For example the class of General Linear Models (GLM:s) that includes linear regression. assuming equal variances and normal distributions.

In chapters 4—7 we consider applications for diﬀerent types of response variables. and the pioneering GLIM software. Anna Gunsjö. Chapter 3 covers model checking. Esbjörn Ohlsson. mixed models and analysis of survival data. and we discuss Maximum Likelihood estimation and ways of assessing the fit of the model. I would like to thank Gunnar Ekbohm. Tomas Pettersson and Birgitta Vegerfors for giving many useful comments. as counts and as ordinal response variables are discussed. the GLM procedure of the SAS package. like quasi-likelihood procedures. In this book we will let the acronym GLM denote ”General Linear Models”.x matrix terms. repeated measures models.se. which includes systematic ways of assessing whether the data deviates from the model in some systematic way. Jan-Eric Englund. c Studentlitteratur ° . Generalized linear models are introduced in Chapter 2. as binary/binomial variables. Several students and colleagues have read and commented on earlier versions of the book. Finally. Terminology in this area of statistics is a bit confused. The exponential family of distributions is discussed. Carolyn Glynn. They can be downloaded from the publishers home page which has address http://www. In particular. while we will let GLIM denote ”Generalized Linear Models”.studentlitteratur. Most of the data sets for the examples and exercises are available on the Internet. in Chapter 8 we discuss theory and applications of a more complex nature. and practical examples using the Genmod software of the SAS package are given. Response variables as continuous variables. This is also a way of paying homage to two useful computer procedures. This chapter provides the basic theory of generalized linear models.

Statistical packages like SAS. Some models can be seen as a theory about how the data were generated. GLM:s include regression analysis. Models play an important role in statistical inference. General Linear Models 1.1 The role of models Many of the methods taught during elementary statistics courses can be collected under the heading general linear models.. Minitab and others have standard procedures for general linear models. in fact. I'm only doing a few t tests. A model is a mathe- matical way of describing the relationships between a response variable and a set of independent variables. I'm not using any model. in the form of residuals. Some applied researchers are not aware that even their simplest analyses are.. analysis of variance.1. GLM. account for the possibility that the relationship is not perfect. Statistical models. But . This is done by allowing for unexplained variation. Other models are only intended to provide a convenient summary of the data. as opposed to determin- istic models. model based. 1 . and analysis of covariance.

2 1.   1 x11 · · · x1(p−1)  1 x21    X = .1) Models of type (1.   β0  β1    β = .   .3) In (1.  1 xn1 xn(p−1) is a known matrix of dimension n × p. + β p−1 xi(p−1) + ei (1.3). . called a design matrix that contains the values of the independent variables and one column of 1:s corresponding to the intercept. . ..1) are. . the observed value of the dependent variable y for observation number i (i = 1. xp−1 as yi = β 0 + β 1 xi1 + . approximations of the actual conditions. . .. at best.2) or in matrix terms y = Xβ + e. A model is seldom “true” in any real sense. 2. The best we can look for may be a model that can provide a reasonable approximation to reality.   y1  y2    y = .. x2 . 1..2. . However.. .2 General Linear Models In a general linear model (GLM).   .. some models are certainly better than others.  β p−1 c Studentlitteratur ° . . n) is modeled as a linear function of (p − 1) so called independent variables x1 . The role of the statistician is to find a model that is reasonable.   ..  yn is a vector of observations on the dependent variable. while at the same time it is simple enough to be interpretable. General Linear Models A way of describing a frequently used class of statistical models is Response = Systematic component + Residual component (1. (1.

e2i . In such models.3 Estimation Estimation of parameters in general linear models is often done using the method of least squares. β (1. we can still find a solution. estimation. It is common to assume that the residuals in e are independent. as estimators of the parameters of the model. prediction. such as Montgomery (1984) or Christensen (1996). b = (X0 X)−1 X0 y. to symbolize an estimator. We will briefly summarize some results on estimation and hypothesis testing in general linear models. (1. such as Draper and Smith (1998) or Sen and Srivastava (1990). is minimal. Some models do not contain any intercept term β 0 .  en is a vector of residuals.   . and textbooks in analysis of variance. General Linear Models 3 is a vector containing p parameters to be estimated (including the intercept). (1. or a combination of these. 1.. this sum of squares is 0 e0 e = (y − Xβ) (y − Xβ) . In matrix i terms.4) Minimizing (1. b. this yields. For normal theory models this is equivalent to Maximum Likelihood estimation. For a more complete description reference is made to standard textbooks in regression analysis. although the c Studentlitteratur ° .6) Throughout this text we will use a “hat”. normally distributed and that the variances are the same for all ei . The purpose of the analysis may be model building. The parameters Pare estimated with those values for which the sum of the squared residuals.5) If the matrix X0 X is nonsingular.4) with respect to the parameters in β gives the normal equa- tions X0 Xβ = X0 y. the leftmost column of the design matrix X is omitted. If the inverse of X0 X does not exist. and   e1  e2    e = . hypothesis testing.1.

9) The predicted values are the values that we would get on the dependent variable if the model had been perfect.1 Predicted values and residuals When the parameters of a general linear model have been estimated you may want to assess how well the model fits the data. X SST = (yi − y)2 . Assessing the fit of the model solution may not be unique.7) Alternatively we can restrict the number of parameters in the model by in- troducing constraints that lead to a nonsingular X0 X. i.4 Assessing the fit of the model 1. (1. (1.4 1.8) j=0 or in matrix terms y b b = Xβ.4. Formally. The diﬀerence between the observed value and the predicted value is the observed residual: ebi = yi − ybi . β (1. if all residuals had been zero.e.4. this is done as follows. i This can be subdivided as X X (yi − y)2 = (yi − ybi + ybi − y)2 (1. 1.2 Sums of squares decomposition The total variation in the data can be measured as the total sum of squares. We define the predicted value (or fitted value) of the response variable as p−1 X ybi = bj xij β (1. i i i c Studentlitteratur ° . We can use generalized inverses (see Appendix A) and find a solution as b = (X0 X)− X0 y. This is done by subdividing the variation in the data into two parts: systematic variation and unexplained variation.10) 1.11) i i X X X = (yi − ybi )2 + yi − y)2 + 2 (b (yi − ybi ) (b yi − y) .4.

R2 would be 1. (1. for example econometric model building. i SSe . the total sum of squares SST can be subdivided into two parts: X SSModel = yi − y)2 (b i and X SSe = (yi − ybi )2 . called the residual (or error) sum of squares. since R2 increases (or is unchanged) when new terms are c Studentlitteratur ° . However. models often have values of R2 very close to 1. When several models have been fitted to the same data.1. i X SSModel = (b 2 b 0 X0 y−ny2 with p − 1 df . For data where the predicted values ybi all are equal to the corresponding observed values yi . will be small if the model fits the data well. which is the variance of the residuals. It holds that X SST = (yi − y)2 = y0 y−ny 2 with n − 1 degrees of freedom (df). A descriptive measure of the fit of the model to data can be calculated as SSModel SSe R2 = =1− . It holds that 0 ≤ R2 ≤ 1. yi − y) = β i X SSe = b 0 X0 y with n − p df . In some applications. The sum of squares can also be written in matrix terms. R2 can be used to judge which model to prefer. M Se provides an estimator of σ2 . General Linear Models 5 The last term can be shown to be zero.12) SST SST R2 is called the coeﬃcient of determination. (yi − ybi )2 = y0 y−β i The subdivision of the total variation (the total sum of squares) into parts is often summarized as an analysis of variance table: Source Sum of squares (SS) df M S = SS/df Model SSModel = βb 0 X0 y−ny 2 p−1 M SModel Residual SSe = y0 y−βb 0 X0 y n−p b2 M Se = σ Total SST = y0 y−ny 2 n−1 These results can be used in several ways. In other applications models can be valuable and interpretable although R2 is rather small. Thus. It is not possible to judge a model based on R2 alone.

14) M Se This is compared to appropriate percentage points of the F distribution with (p − 1. Inference on single parameters added to the model. a test of the hypothesis that β 1 ..5 Inference on single parameters Parameter estimators in general linear models are linear functions of the observed data. .. The adjusted R2 decreases when irrelevant terms are added to the model.e. β p−1 are all zero) can be obtained as MSModel F = . this makes it possible to obtain the variance of any parameter estimator as ³ ´ X V ar βb = w2 σ 2 . Variance estimated without any model A formal test of the full model (i. (1.17) n−p b can now be estimated as The variance of a parameter estimator β j ³ ´ X Vdar β bj = 2 2 wij b .15) j i where wij are known weights.6 1. (1. n − p) degrees of freedom.. the estimator of any parameter β j can be written as X b = β wij yi (1. (1.16) j ij i The variance σ2 can be estimated from data as P 2 ebi b2 = i σ = M Se .5. Thus. model comparisons are often based on the adjusted R2 . If we assume that all yi :s have the same variance σ2 .13) n−p SST / (n − 1) This can be interpreted as 2 Variance estimated from the model Radj =1− . σ (1.18) i c Studentlitteratur ° . 1. (1. It is defined as n−1 ¡ ¢ M Se R2adj = 1 − 1 − R2 = 1 − .

e. with (n − p) degrees of freedom. β q = 0 by the F test (SSe2 − SSe1 ) /q F = (1.. in such models the diﬀerent parameter estimates are c Studentlitteratur ° .6 Tests on subsets of the parameters In some cases it is of interest to make simultaneous inference about several parameters. The diﬀerence SSe2 − SSe1 will be related to a χ2 distribution with q degrees of freedom. This will give an error sum of squares. A test of the hypothesis that the parameter β j is zero can be made by comparing b β j t= r ³ ´ (1. Similarly. in a model with p parameters one may wish to simultaneously test if q of the parameters are zero. the model with fewer parameters. where q is the number of parameters that are included in model 1.1.20) j j would provide a (1 − α) · 100% confidence interval for the parameter β j .21) SSe1 / (n − p) with (q.. We can now test hypotheses of type H0 : β 1 = β 2 =. Now estimate the parameters of the smaller model. 1. n − p) degrees of freedom. 1. For example.. r ³ ´ b ± t(1−α/2.7 Diﬀerent types of tests Tests of single parameters in general linear models depend on the order in which the hypotheses are tested. . SSe2 . Tests in balanced analysis of variance de- signs are exceptions.19) Vd ar β b j with the appropriate percentage point of the t distribution with n − p degrees of freedom. General Linear Models 7 This makes it possible to calculate confidence intervals and to test hypotheses about single parameters.n−p) Vd β ar β b (1. This can be done in the following way: Estimate the parameters of the full model. This will give an error sum of squares. but not in model 2. i. SSe1 . with (n − p − q) degrees of freedom.

while the remaining co- c Studentlitteratur ° . SS(B|A) and SS(AB|A. Type 2 means that the SS for each parameter is calculated as if the fac- tor had been added last to the model except that. loosely speaking. B). Type I SS are sometimes called sequential sums of squares. These SS cannot in general be computed by comparing model SS from several models. and finally the interaction SSAB|A. Type 1 means that the test for each parameter is calculated as the change in SSe when the parameter is added to the model.8.8 1. 1. Some applications independent. Unfortunately. SSA is calculated first as if the experiment had been a one-factor experiment. SAS handles this problem by allowing the user to select among four diﬀerent types of tests. In other cases there are several ways to test hypotheses. all these SS will be equal. B).e. this is not an infallible method. i. Type 4 diﬀers from Type 3 in the method of handling empty cells.1 Simple linear regression In regression analysis. Minitab gives the Type 3 SS as “Adjusted Sum of Squares”. If the experiment is balanced. all main eﬀects that are part of the interaction should also be included. an attempt to calculate what the SS would have been if the experiment had been balanced.B is obtained as the reduction in SSe when we also add the interaction to the model.8 Some applications 1. Type 3 is.8. incomplete experiments. In practice. tests in unbalanced situations are often done using Type 3 SS (or “Adjusted Sum of Squares” in Minitab). for interactions. in the order given in the MODEL statement. One problem with them is that the sum of the SS for all factors and interactions is generally not the same as the Total SS. If we have the model Y = A B A*B. SS(B|A) and SS(AB|A. This can be written as SS(A). (model: Y=A). Then SSB|A is calculated as the reduction in SSe when we run the model Y=A B. For the model Y = A B A*B this gives the SS as SS(A|B). These are often called partial sums of squares. the design matrix X often contains one column that only contains 1:s (corresponding to the intercept). The Type 3 SS are generally preferred when experiments are unbalanced.

The linear function fitted to these data is yb = −36. c Studentlitteratur ° .1 An experiment has been made to study the emission of CO2 from the root zone of Barley (Zagal et al.22) y4 1 x4 e4 Example 1.200 30 Emission 30 34.5 % 11. The emission of CO2 was measured on a number of plants at diﬀerent times after planting.1 units per time unit.830 35 20 41. including ANOVA table.1.069 24 45 15. The graph suggests that a linear trend may provide a reasonable approximation to the data. (1. is given below. General Linear Models 9 lumns contain the values of the independent variables. It can be concluded that the emission of CO2 increases significantly with time.09776X Emission Time R-Sq = 97. the rate of increase being about 2.351 38 10 24 29 34 39 Time One purpose of the experiment was to describe how y=CO2 -emission develops over time.765 30 35 28.730 35 25 35.7+2. the small re- gression model yi = β 0 + β 1 xi + ei with n = 4 observations can be written in matrix form as       y1 1 x1 µ ¶ e1  y2   1 x2  β 0  e2         y3  =  1 x3  β 1 +  e3  .255 24 40 26.7443 + 2.1x. 1993). A SAS regression output. A small part of the data is given in the following table and graph: Emission of CO2 as a function of time Y = -36. over the time span covered by the experiment. Thus.677 38 15 45.

85963 T for H0: Pr > |T| Std Error of Parameter Estimate Parameter=0 Estimate INTERCEPT -36.63 0.V.32 0. For example.10 1.8.8. Root MSE EMISSION Mean 0.40858691 TIME 2. The regression model is then yi = β 0 + β 1 xi1 + β 2 xi2 + ei .7126999 R-Square C.056555 29. He bases his advice on multiple regression of y = Price of the wine at wine auctions with meteorological data as predictors. Base material was taken from “Departures” magazine.0001 Error 6 25. i = 1.3361798 992. The New York Times used the head- line “Wine Equation Puts Some Noses Out of Joint” on an article about Prof. Some applications Dependent Variable: EMISSION Sum of Mean Source DF Squares Square F Value Pr > F Model 1 992. Ashenberger. 6. .09776164 15. and that we have made n = 6 observations. In matrix terms this model is       y1 1 x11 x12 e1  y2   1 x21 x22     e2      β0    y3   1 x31 x32     =   β 1  +  e3  . . The variables in the data set below are: • Rain_W=Amount of rain during the winter. but the data are invented.2 Professor Orley Ashenfelter issues a wine magazine. c Studentlitteratur ° . • Av_temp=Average temperature.3361798 234. giving advice about good years.975065 6. (1.23)  y4   1 x41 x42   e4      β    y5   1 x51 x52  2  e5  y6 1 x61 x62 e6 Example 1.3765201 4. suppose that y may depend on two variables.2294200 Corrected Total 7 1017.0002 4.13695161 ¤ 1.0001 0. .2 Multiple regression Generalization of simple linear regression models of type (1. September/October 1990.1) to include more than one independent variable is rather straightforward. “Liquid assets”. .887412 2.74430710 -8.33 0.

91 10.25 0.1: Data for prediction of the quality of wine. A set of data of this type is reproduced in Table 1.17 9.008 Rain_H -0.05937 0. General Linear Models 11 Table 1.02767 2.11773 0.055 Av_temp 1.014 S = 3.94 0.001 Rain_W 0.18 0.14 40.3603 0. A multiple regression output from Minitab based on these data is as follows: Regression Analysis The regression equation is Quality = 48.4187 3.092 R-Sq = 91.9 + 0.60 c Studentlitteratur ° .36 Av_temp .6% R-Sq(adj) = 89.04010 -2. • y=Quality.118 Rain_H Predictor Coef StDev T P Constant 48.43 384.41 4.000 Residual Error 11 105.1.70 0.0.1.56 Total 14 1257. Year Rain_W Av_temp Rain_H Quality 1975 123 23 23 89 1976 66 21 100 70 1977 58 20 27 77 1978 109 26 33 87 1979 46 22 102 73 1980 40 19 77 70 1981 42 18 85 60 1982 167 25 14 92 1983 99 28 17 87 1984 48 24 47 79 1985 85 24 28 84 1986 177 27 11 93 1987 80 22 45 75 1988 64 25 40 82 1989 75 25 16 88 • Rain_H=Rain in the harvest season. which is an index based on auction prices.4% Analysis of Variance Source DF SS MS F P Regression 3 1152.0594 Rain_W + 1.15 0.

3 t tests and dummy variables Classification variables (non-numeric variables). would produce good wine. 2. i. a simple t test on data with two groups and three observations per group can be formulated as yij = µ + βdi + eij i = 1. di is a dummy variable that has value di = 1 if observation i belongs to group 1 and di = 0 if it belongs to group 2. while rats in the control group were coded as 0. To illustrate that the t test is actually a special case of a general linear model. ¤ 1. a high average temperature. Rats in the toluene group were given the value 1 on the dummy variable. and only a small amount of rain at harvest time. we analyzed these data with Minitab using regression analysis with Group as a dummy variable. Here. and eij is a residual. 3. The concentrations in the striatum region of the brain are given in Table 1. µ is a general mean value. It appears that years with much winter rain. researchers mea- sured the concentration of Dopamine in the brains of six control rats and of six rats that had been exposed to toluene. The interest lies in comparing the two groups with respect to average Dopa- mine level. as measured by the price. The variable Rain_W is not quite significant but would be included in a predictive model. whether β is diﬀerent from 0. In the t test situation we want to examine whether µ1 is diﬀerent from µ2 . This model can be written in matrix terms as       y11 1 1 e11  y12   1 1       µ ¶  e12   y13   1 1  µ  e13         y21  =  1 0  β +  e21  .2. The Minitab output of the regression analysis is: c Studentlitteratur ° . groups or blocks can be included in the model as so called dummy variables. i. j = 1.3 In a pharmacological study (Rea et al.12 1. as variables that only take on the values 0 or 1. such as treatments. For example. According to this model. (1.24)        y22   1 0   e22  y23 1 0 e23 Example 1. 2. The size and direction of this relationship is given by the estimated coeﬃcients of the regression equation. Some applications The output indicates that the three predictor variables do indeed have a relationship to the wine quality.e. This is often done as a two sample t test. the population mean value for group 1 is µ1 = µ + β and the population mean value for group 2 is simply µ2 = µ.e.8. 1984).8.

7168 0.2587 2. c Studentlitteratur ° .8.7168. The size of this group eﬀect is estimated as the coeﬃcient β b = 0. a one-way ANOVA model with three treatments.397 2.717 Group Predictor Coef StDev T P Constant 1.820 2.843 1.781 2.4 One-way ANOVA The generalization of models of type (1.4% R-Sq(adj) = 37.020 Residual Error 10 2.68 0.90 + 0.539 2.5500 The output indicates a significant Group eﬀect (t = 2.38 0. and that the t test given by the regression routine does actually give the same results as a t test performed according to textbook formulas.77. General Linear Models 13 Table 1.4482 R-Sq = 43.000 Group 0.803 1.911 1.8% Analysis of Variance Source DF SS MS F P Regression 1 1.772 = 7.8987 0. This leads to a simple oneway analysis of variance (ANOVA) model.990 Regression Analysis The regression equation is Dopamine level = 1.420 1.1. This 1 means that the toluene group has an estimated mean value that is 0. Also note that the F test in the output is related to the t test through t2 = F : 2. The reader might wish to check that this calculation is correct. p = 0.020 S = 0.5416 1.77 0.0084 0.803 2. ng/kg Toluene group Control group 3.7168 units higher than the mean value in the control group.020). Thus.1830 10.5416 7. Dopamine.2: Dopamine levels in the brains of rats under two treatments. These two tests are identical.68.2008 Total 11 3.314 1. we would need one more column in X (one new dummy vari- able) for each new group.24) to more than two groups is rather straightforward.464 1. ¤ 1.

can be written as yij = µ + β i + eij . can be obtained as y i − y i0 t= r ³ ´ MSe n1i + 1 ni0 c Studentlitteratur ° .26) = µ + β i di + eij . If this test is significant. a test of the hypothesis that two groups have equal mean values. i. If we know the values of d1 and d2 the group membership is known so d3 is redundant and can be removed from the model. 3. d2 and d3 such that di = ½ 1 for group i . i = 1. 2. 2 (1. j = 1.14 1. The model can now be written as 0 otherwise yij = µ + β 1 d1 + β 2 d2 + β 3 d3 + eij (1. In fact. this estimator should be used in such group comparisons if the assumption of equal variance seems tenable. 2. j = 1. A pairwise comparison between two group means.e. Since the ANOVA provides an estimator σ b2e = MSe of the residual variance 2 σe .8. 3. the model can be written in matrix terms as       y11 1 1 0 e11  y12   1 1 0     e12      µ    y21   1 0 1     =   β 1  +  e21  (1.27)  y22   1 0 1   e22      β    y31   1 0 0  2  e31  y32 1 0 0 e32 Although there are three treatments we have only included two dummy vari- ables for the treatments.e. we have chosen the restriction β 3 = 0. any combination of two of the dummy variables is suﬃcient for identifying group membership so the choice to delete one of them is to some extent arbitrary. i.25) We can introduce three dummy variables d1 . Follow-up analyses One of the results from a one-way ANOVA is an over-all F test of the hy- pothesis that all group (treatment) means are equal. 2 Note that the third dummy variable d3 is not needed. Some applications each with two observations per treatment. it can be followed up by various types of comparisons between the groups. After removing d3 . i = 1.

Problems when the number of comparisons is large After you have obtained a significant F test. Thus. For example.28) i P where we define the weights hi such that hi = 0.29) i b is and the estimated variance of L ³ ´ X h2 Vdar L b = M Se i . In some cases it may be of interest to do comparisons which are not simple pairwise comparisons. 3 and 4. (1. we may want to compare treatment 1 with the average of treatements 2. General Linear Models 15 with degrees of freedom taken from M Se . A confidence interval for the dif- ference between the mean values can be obtained analogously.5 or 3. 5 significant results. For example. the 5% level you may end up with a number of significant results even if all the null hypotheses are true. the over-all significance level of all tests (the experimentwise error rate). The general solution is to apply a stricter limit on what we should declare “sig- nificant”. The SAS procedure GLM includes 16 diﬀerent c Studentlitteratur ° . Nelder (1971) states “In my view.1.30) i ni This can be used for tests and confidence intervals on contrasts.0 instead. multiple comparison methods have no place at all in the interpretation of data”. say. i. If you make many tests at. (1. there may be many pairwise com- parisons or other contrasts to examine.0. the probability to get at least one significant result given that all null hypotheses are true. on the average. other authors have suggested various methods to protect against mass significance. If a single t test would be significant for |t| > 2. If you make 100 such tests you would expect. The contrast can be i estimated as X b= L hi y i . even if the significance level of each individual test is 5% (the so called comparisonwise error rate). For example. This is the problem of mass significance. (1. we could use the limit 2.e. However. is larger. We can then define a contrast in the treatment means as L = µ1 − µ2 +µ33 +µ4 . There is some controversy whether mass significance is a real problem. in a one-way ANOVA with seven treatments you can make 21 pairwise comparisons. A general way to write a contrast is X L= hi µi .

Example 1.19 Diatrizoate 20.10 Ringer 0.87 Ringer 0.40 Diatrizoate 20.22 Mannitol 9. One variable that was studied was the urine production.47 Omnipaque 16. This means that each individual test is made at the significance level α/c.03 Ultravist 15.30 Hexabrix 4.22 Hexabrix 6.3: Change in urine production following treatment with diﬀerent contrast media (n = 57).94 Hexabrix 1.78 Hexabrix 5.90 Ultravist 15.64 Hexabrix 3.03 Mannitol 6.75 Isovist −0.98 Hexabrix 0.79 Diatrizoate 7.77 Mannitol 9.11 Mannitol 7.37 Omnipaque 6.51 Mannitol 10.96 Omnipaque 10.51 Hexabrix 2.80 Ringer 0.3 shows the change in urine production of each rat before and after treatment with each medium.4 Liss et al (1996) studied the eﬀects of seven contrast media (used in X-ray investigations) on diﬀerent physiological functions of 57 rats.51 Ultravist 12.44 Ringer 0. A simple but reasonably powerful method is to use Bonferroni adjustment. The procedure GLM in SAS produced the following result: c Studentlitteratur ° .16 1.38 Isovist 1.99 Ultravist 7.92 Isovist 2.8.08 Omnipaque 9.75 Isovist 2. This analysis is a oneway ANOVA situation. Medium Diﬀ Medium Diﬀ Medium Diﬀ Diatrizoate 32.07 Ultravist 6.55 Hexabrix 7.34 Ultravist 3.85 Isovist 0.11 Ultravist 12.00 Omnipaque 1. where α is the desired over-all level and c is the number of comparisons you want to make. Some applications Table 1.48 Isovist 4.08 Ultravist 5.77 Ringer −0.04 Ringer 0.22 Mannitol 14.14 Omnipaque 4.53 Hexabrix 4.88 Omnipaque 16.68 Isovist −0.18 methods for deciding which limit to use. Table 1.74 Ringer 0.63 Omnipaque 7.35 Isovist 2.52 Mannitol 0.16 Mannitol 5. It is of interest to compare the contrast media with respect to the change in urine production.11 Mannitol 4.06 Omnipaque 8.11 Omnipaque 3.10 Diatrizoate 25.58 Isovist 0.

0877969 R-Square C.0001). For example.9722541 297.24365278 B -3.0001 There are clearly significant diﬀerences between the media (p < 0. and are not unique estimators of the parameters.0001 2. Root MSE DIFF Mean 0. General Linear Models 17 General Linear Models Procedure Dependent Variable: DIFF DIFF Sum of Mean Source DF Squares Square F Value Pr > F Model 6 1787.01817243 Ringer -9. This is as expected for an ANOVA model: not all dummy variables can be included in the model.73 0. To find out more about the nature of these diﬀerences we requested Proc GLM to print estimates of the parameters.69787500 B -4.46 0. with the mean value for Ultravist.06740338 Omnipaque -1.0001 Error 50 905.06740338 Mannitol -2. some method for protection against mass significance might be considered. i.9953757 16.1155428 18.95963 4.75 0.8668596 Source DF Type III SS Mean Square F Value Pr > F MEDIUM 6 1787.90787500 B 6.2882 2.1023109 Corrected Total 56 2693. The procedure excludes the last dummy variable. we should take a look at how the data behave.06740338 Isovist -8.46 0.48412500 B 4.20200665 Ultravist 0.0001 2.50425691 MEDIUM Diatrizoate 11.e.1. Estimates followed by the letter 'B' are biased.2546811 6. . All other estimates are comparisons of the estimated mean value for that medium. we can prepare a boxplot of the distributions for the diﬀerent c Studentlitteratur ° .42554139 Hexabrix -5.V.0059 2. Since this can result in a large number of pairwise comparisons (in this case. The following results were obtained: T for H0: Pr > |T| Std Error of Parameter Estimate Parameter=0 Estimate INTERCEPT 9.663912 61.99 0.00000000 B . estimates of the coeﬃcients β i for each of the dummy variables.94731944 B -2. Note that Proc GLM reports the X0 X matrix to be singular. Before we close this example.51817500 B -0.59 0.40 0. NOTE: The X'X matrix has been found to be singular and a generalized inverse was used to solve the normal equations.21920833 B -1. Least squares estimates of the mean values for the media can be calculated and compared.88 0.4 along with indications of significant pairwise diﬀerences using Bonferroni adjustment.0002 2. .0001 1.9953757 16. The least squares means are given in Table 1. setting the parameter for Ultravist to 0. 7 · 6/2 = 21 comparisons).9722541 297.4554 2.07 0.

In matrix terms. Ultra. n. − Ringer * * * * n.s.8. the assumption of equal variance. − Mannitol * n. The model is yijk = µ + αi + β j + (αβ)ij + eijk . This suggests that one assumption underlying the analysis. (1.1.66 0.31) i = 1.69 3.s. This feature can be illustrated by considering a factorial experiment with factor A (two levels) and factor B (three levels).e.8. Isovist Ringer zoate vist paque tol brix Mean 21. j = 1. The plot indicates that the variation is quite diﬀerent for the diﬀerent media. n. Some applications Table 1. 2. 3. for the contrast media experiment.39 9. may be violated.s. c Studentlitteratur ° . This boxplot is given in Figure 1.s. k = 1.96 1.4: Least squares means.91 8.18 1. n. Omni. and pairwise comparisons between treatments. The number of non-redundant dummy variables equals the number of degrees of freedom for the eﬀect. with a large variation for Diatrizoate and a small variation for Ringer (which is actually a placebo). 2.21 Diatrizoate − Ultravist * − Omnipaque * n. The dummy variables that correspond to the interaction terms would then be constructed by multiplying the corresponding main eﬀect dummy variables with each other. i. 2 The number of dummy variables that we have included for each factor is equal to the number of factor levels minus one. Hexa.s.5 ANOVA: Factorial experiments The ideas used above can be extended to factorial experiments that include more than one factor and possible interactions. n. − Hexabrix * n.s.s.39 7. We will return to these data later to see if we can make a better analysis. − Isovist * * * n.s. Diatri.s. − media. ¤ 1. n. Manni. the last dummy variable for each factor has been excluded.s. and where we have two observations for each factor combination.

1. General Linear Models 19

30

20
Diff

10

0
Diatrizoate Hexabrix Isovist MannitolOmnipaque Ringer Ultravist

Medium

Figure 1.1: Boxplot of change in urine production for diﬀerent contrast media.

     
y111 1 1 1 0 1 0 e111
 y112   1 1 1 0 1 0   e112 
     
 y121   1 1 0 1 0 1   e121 
      
 y122   1 1 0 1 0 1  µ  e122 
     
 y131   1 1 0 0 0 0   α1   e131 
      
 y132   1 1 0 0 0 0   β1   e132 
 =  + . (1.32)
 y211   1 0 1 0 0 0   β2   e211 
      
 y212   1 0 1 0 0 0   (αβ)   e212 
    11  
 y221   1 0 0 1 0 0  (αβ)  e221 
    12  
 y222   1 0 0 1 0 0   e222 
     
 y231   1 0 0 0 0 0   e231 
y232 1 0 0 0 0 0 e232

Example 1.5 Lindahl et al (1999) studied certain reactions of fungus myce-
liae on pieces of wood by using radioactively labeled 32 P . In one of the
experiments, two species of fungus (Paxillus involutus and Suillus variegatus)
were used, along with two sizes of wood pieces (Large and Small); the response
was a certain chemical measurement denoted by C. The data are reproduced
in Table 1.5.

These data were analyzed as a factorial experiment with two factors. Part of
the Minitab output was:

c Studentlitteratur
°

20 1.8. Some applications

Table 1.5: Data for a two-factor experiment.

Species Size C Species Size C
H Large 0.0010 S Large 0.0021
H Large 0.0011 S Large 0.0001
H Large 0.0017 S Large 0.0016
H Large 0.0008 S Large 0.0046
H Large 0.0010 S Large 0.0035
H Large 0.0028 S Large 0.0065
H Large 0.0003 S Large 0.0073
H Large 0.0013 S Large 0.0039
H Small 0.0061 S Small 0.0007
H Small 0.0010 S Small 0.0011
H Small 0.0020 S Small 0.0019
H Small 0.0018 S Small 0.0022
H Small 0.0033 S Small 0.0011
H Small 0.0015 S Small 0.0012
H Small 0.0040 S Small 0.0009
H Small 0.0041 S Small 0.0040

General Linear Model: C versus Species; Size
Analysis of Variance for C, using Adjusted SS for Tests

Species 1 0.0000025 0.0000025 0.0000025 0.93 0.342
Size 1 0.0000002 0.0000002 0.0000002 0.09 0.772
Species*Size 1 0.0000287 0.0000287 0.0000287 10.82 0.003
Error 28 0.0000742 0.0000742 0.0000027
Total 31 0.0001056

The main conclusion from this analysis is that the interaction Species ×
Size is highly significant. This means that the eﬀect of Size is diﬀerent for
diﬀerent species. In such cases, interpretation of the main eﬀects is not
very meaningful. As a tool for interpreting the interaction eﬀect, a so called
interaction plot can be prepared. Such a plot for these data is as given in
Figure 1.2. The mean value of the response for species S is higher for large
wood pieces than for small wood pieces. For species H the opposite is true:
the mean value is larger for small wood pieces. This is an example of an
interaction. ¤

c Studentlitteratur
°

1. General Linear Models 21

Interaction Plot - Data Means for C

Species
H
0.0035 S
Mean

0.0025

0.0015

Large Small
Size

Figure 1.2: Interaction plot for the 2-factor experiment.

1.8.6 Analysis of covariance
In regression analysis models the design matrix X contains quantitative vari-
ables. In ANOVA models, the design matrix only contains dummy variables
corresponding to treatments, design structure and possible interactions. It
is quite possible to include a mixture of quantitative variables and dummy
variables in the design matrix. Such models are called covariance analysis,
or ANCOVA, models.
Let us look at a simple case where there are two groups and one covariate.
Several diﬀerent models can be considered for the analysis of such data even
in the simple case where we assume that all relationships are linear:

1. There is no relationship between x and y in any of the groups and the
groups have the same mean value.
2. There is a relationship between x and y; the relationship is the same in
the groups.
3. There is no relationship between x and y but the groups have diﬀerent
levels.
4. There is a relationship between x and y; the lines are parallel but at
diﬀerent levels.

c Studentlitteratur
°

22 1.8. Some applications

5. There is a relationship between x and y; the lines are diﬀerent in the
groups.

These five cases correspond to diﬀerent models that can be represented in
graphs or in formulas:
14 14

12 12

10 10

8 8

6 6

4 4

2 2

0 0

Model 1: yij = µ + eij Model 2: yij = µ + βx + eij
14 14

12 12

10 10

8 8

6 6

4 4

2 2

0 0

Model 3: yij = µ + αi + eij Model 4: yij = µ + αi + βx + eij
14

12

10

8

6

4

2

0

Model 5: yij = µ + αi + βx + γ · di · x + eij
Model 5 is the most general of the models, allowing for diﬀerent intercepts
(µ + αi ) and diﬀerent slopes β + γdi , where d is a dummy variable indicating
group membership. If it can be assumed that the term γdi is zero for all i,
then we are back at model 4. If, in addition, all αi are zero, then model 2 is
correct. If, on the other hand. β is zero, we would use model 3. If finally β
is zero in model 2, then model 1 describes the situation. This is an example
of a set of models where some of the models are nested within other models.
The model choice can be made by comparing any model to a simpler model
which only diﬀers in terms of one factor.

c Studentlitteratur
°

The same kind of ambiguity holds also for other parameters of the model. Such functions. (1. Thus. α1 . β 3 . (αβ)22 .9 Estimability In some types of general linear models it is impossible to estimate all model parameters.33) can be written as yi = β 0 + β 1 xi + β 2 ui + β 3 vi + ei (1. This model contains a total of 12 parameters: µ. k = 1. 2. β 1 . We will not consider such models. α2 . As an example. 2.33) Such models are simple to analyze using general linear models. β 2 . are called estimable functions. A linear combination of model parameters is estimable if it can be written as a linear combination of expected values of the observations. which are called intrinsically nonlinear. c Studentlitteratur ° . three levels of factor B and two replications can be written as yijk = µ + αi + β j + (αβ)ij + eijk . Formally.36) In this model it would be possible to replace µ with µ + c and to replace each αi with αi − c. but only 6 of the parameters can be estimated. 3. Some models can be transformed into a linear form by a suitable choice of transformation. if they exist. the model y = eβ 0 +β 1 x can be made linear by using a log transformation: log (y) = β 0 + β 1 x. like y = β 0 + β 1 eβ 2 x + e. (αβ)13 . Other models can be linear in the parameters. or nonlinear in the parameters. (αβ)21 . As noted above.34) which is a standard multiple regression model.35) i = 1. General Linear Models 23 1. j = 1. Models of this type can be handled using standard GLM software. For example. (1. computer programs often solve this problem by restricting some parameters to be zero.1. A model can contain non-linear functions of the parameters. (1.7 Non-linear models Models can be non-linear in diﬀerent ways. 1. (αβ)12 . like yi = β 0 + β 1 xi + β 2 x2i + β 3 exi + ei . if we denote ui = x2i and vi = exi then the model (1. (αβ)11 . each transformation of x is treated as a new variable.8. but nonlinear in the variables. or to use some other restriction on the parameters. However. where c is some constant. It is then necessary to restrict some parameters to be zero. 2. it may be possible to estimate certain functions of the parameters in a unique way. a two-factor ANOVA model with two levels of factor A. and (αβ)23 .

1. µi·· is also estimable. It holds that µij· = E (yijk ) = µ + αi + β j + (αβ)ij (1.e. For example. the residuals are homoscedastic. 3 This is a linear function of cell means. The residuals are assumed to follow a Normal distribution. c Studentlitteratur ° . Assumptions in General linear models Let us denote with µij· the mean value for the treatment combination that has factor A at level i and factor B at level j.1 Computer software for GLM:s There are many options for fitting general linear models to data.10.24 1. The residuals are assumed to be independent. Since the cell means are estimable. independent of X. i.11. Since similar tools are used for generalized linear models. the expected value of all observations with factor A at level i can be written as µ11· + µ12· + µ13· µi·· = . any linear function of the µij :s is also estimable. most statistical packages have routines for general linear models that automatically construct the appropri- ate set of dummy variables. In addition. One option is to use a regression package and leave it to the user to construct appropriate dummy variables for class variables.11 Model building 1. reference is made to Chapter 3 for details. This function is estimable. Diﬀerent diagnostic tools have been developed to detect departures from these assumptions.37) which is a linear function of the parameters.10 Assumptions in General linear models The classical application of general linear models rests on the following set of assumptions: The model used for the analysis is assumed to be correct. The residuals are assumed to have the same variance σ 2e . However. 1.

treatments. Operator Explanation.11. and at the same time they should capture most of the information in the data. blocks. This means that it is impossible to state simple rules for model building: there will certainly be cases where the rules are not c Studentlitteratur ° . Y . In the following table we list the operators used by SAS. they should be simple. Also used for polynomials: X*X (none) Both eﬀects present: A B | All main eﬀects and interactions: A|B=A B A*B () Nested factor: A(B). oneway ANOVA Y=A Two-way ANOVA with interaction Y = A B A*B or Y=A|B Covariance analysis model 1 Y= Covariance analysis model 2 Y=X Covariance analysis model 3 Y=A Covariance analysis model 4 Y=AX Covariance analysis model 5 Y = A X A*X 1.) Computer software requires the user to state the model in symbolic terms. The kinds of models that we have discussed in this chapter can symbolically be written in SAS language as indicated in the following table. A good model is a compromise between parsimony and completeness. General Linear Models 25 Let us use letters at the end of the alphabet (X.1. Letters in the beginning of the alphabet (A.2 Model building strategy Statistical model building is an art as much as it is a science. “A nested within B” @ Order operator: A|B|C @ 2 means that all main eﬀects and all interaction up to and including second order in- teractions are included. The model statement contains operators that specify diﬀerent aspects of the model. Y will be used for the dependent variable. Examples of the use of the operators are given below. SAS example * Interaction: A*B. Model Computer model (SAS) Simple linear regression Y=X Multiple regression Y=XZ t tests. B) will symbolize class variables (groups. There are many requirements on models: they should make sense from a subject-matter point of view. Z) to denote numeric variables. etc.

xa−1 . Model building relevant. CLASS group. The t test (or the oneway ANOVA) can be modelled as PROC GLM DATA=Anova.11. RUN. . A significance level in the range 15-25% may be used in- stead. MODEL y = x. . if the interaction A*B*C is included. • Alternatively..26 1. p.3 A few SAS examples In SAS terms.11. the model should also include A. For example. B. However. The diﬀerence between the two programs is that in the t test. • The conventional 5% significance level is often too strict for model build- ing purposes. This is discussed on page 46 in connection with generalized linear models.1) using Proc GLM. even those that are not significant. The analy- sis could be done with a program that does not include any CLASS variables: PROC GLM DATA=Regression. • Covariates that do not have any detectable eﬀect should be excluded. This asks SAS to build appropriate dummy variables. C.. criteria like the Akaike information criterion can be used. the independent variable ( “group”) is given as a CLASS variable. • If an interaction is included. As examples of SAS programs for a few of the models discussed above we can consider the regression model (1. A*C and B*C. • A model that contains polynomial terms of type xa should also contain the lower-degree terms x. c Studentlitteratur ° . A*B. MODEL y = group. the model should also include all main eﬀects and interactions it comprises. grouping variables (classification variables) are called CLASS variables. 1. are useful in many cases: • Include all relevant main eﬀects in the model. x2 . RUN. the following suggestions. 89). partially based on McCullagh and Nelder (1989.

test statistic.02 10 0.25 20 0. A complete solution should contain hypotheses.1 8.8 7. The following data.1.8 4.6 3. At various times after injection. radioactivity measurements were made.8 1.69 40 1.1 Cicirelli et al (1983) studied protein synthesis in developing egg cells of the frog Xenopus laevis.7 13.2 1. All egg cells were taken from the same female.12 Exercises Exercise 1. Radioactively labeled leucine was injected into egg cells. B.2 3.7 6. C.0 3. From these measurements it was possible to calculate how much of the leucine had been incorporated into protein.8 9. quoted from Samuels and Witmer (1999). and a conclusion.1 9. Time Leucine (ng) 0 0. General Linear Models 27 1. Plot the data and the regression line.50 60 1. are mean values of two egg cells. A c Studentlitteratur ° .07 50 1.74 A.54 30 0.8 53.9 15.6 A.9 3.5 5. The results are summarized in the following table: a b c 3. Exercise 1. Use linear regression to estimate the rate of incorporation of the labeled leucine. Prepare an ANOVA table.3 10.9 9.4 7. Make an analysis of these data that can answer the question whether there are any diﬀerences in cortisol level between the groups.1 15. calculations.2 The level of cortisol has been measured for three groups of patients with diﬀerent syndromes: a) adenoma b) bilateral hyperplasia c) cardinoma.

481 28. D. Graph the data. ii) Fit a model that does not assume that the regression lines are parallel. According to your best analysis above.200 32.862 41. Make the calculation by hand. 1993). Two levels of nitrogen were used. Exercise 1. are not fulfilled.351 A.28 1. 35 and 38 days after germination.414 35. Analyze the data in a way that treats Days from germination as a quanti- tative factor. and B. above. B. and assume that all regressions are linear. 30. i) Fit a model that assumes that the two regression lines are parallel.479 31.688 21. 35 days after germination? The same question for a plant with a low level of nitrogen? Use the model you consider the best of the models you have fitted under A. No new ANOVA is needed. is there any significant eﬀect of: i) Interaction ii) Level of nitrogen ii) Days from germination c Studentlitteratur ° .069 26.448 10. Exercises graphical display (for example a boxplot) may help in the interpretation of the results.186 12. Treat level of nitrogen as a dummy variable.255 28. Examine this. B. The data were as follows: Days from germination Level of Nitrogen 24 30 35 38 High 8.403 Low 15.12. and suggest what can be done to improve the analysis. C.677 11.237 11. What is the expected rate of CO2 emission for a plant with a high level of nitrogen.830 45.296 25.594 31.115 34. iii) Test the hypothesis that the regressions are parallel.3 Below are some data on the emission of carbon dioxide from the root system of plants (Zagal et al.891 20.220 19. Include both the observed data and the fitted Y values in your graph. using the computer printouts of model equations.301 18. indicate what the problems are.730 43.765 34. and samples of plants were analyzed 24.951 39. There are some indications that the assumptions underlying the analysis in A.

A. A convenient way to estimate the parameters of such a function is to make a linear regression of log(y) on x. B. General Linear Models 29 Exercise 1. quoted from Snedecor and Cochran (1980). The results were: Exposure Count 0 271 15 108 30 59 45 29 60 12 It was assumed that the Count (y) depends on the exposure time (x) through an exponential relation of type y = Ae−Bx .1. Perform a linear regression of log(y) on x. c Studentlitteratur ° .4 Gowen and Price. What assumptions are made regarding the residuals in your analysis in A. Plot the data and the fitted function in the same graph.? C. counted the number of lesions of Aucuba mosaic virus after exposure to X- rays for various times.

.

In general linear models. the classical applications of GLM:s rest on the assumptions of normality. the response variable Y is often assumed to be quantitative and normally dis- tributed. In these models.1 Introduction In Chapter 1 we briefly summarized the theory of general linear models (GLM:s). 2. GLM:s are limited in many ways.2.1 Types of response variables This book is concerned with statistical models for data. GLM:s are very useful for data analysis. In later chapters we will apply the theory in diﬀerent situations. The generalization of GLM:s that we will present in this chapter will allow us to model our data using other distributions than the Normal. Generalized Linear Models 2. For example. and on the class of distributions called the exponential family. But this is by no means the only type of response variables that we might meet in practice. Some examples of diﬀerent types of response variables are: 31 . However. The choice of distribution aﬀects the assumptions we make regarding variances.1. Formally. since the relation between the variance and the mean is known for many distributions. the Poisson distribution has the property that µ = σ2 . the concept of a response variable is crucial. This chapter is the most theoretical chapter in the book. It builds on the theory of Maximum Likelihood estimation (see Appendix B). linearity and homoscedasticity.

Since a probability p is limited by 0 ≤ p ≤ 1. Introduction • Continuous response variables. In fact. since measurements cannot be made to infinite precision. • Response variables in the form of proportions. c Studentlitteratur ° .1.1 In the pharmacological study discussed in Example 1. 2.1. We will here give a few examples of these types of response variables. A common approach to modeling this type of data is to model the probability that the event will occur. Example 2. models for the data should use this restriction. • Ordinal response. but continu- ous models are still often used as approximations.3 the concentration of Dopamine was measured in the brains of six control rats and of six rats that had been exposed to toluene.3 Response as a binary variable Binary response.32 2. Binary data are often modeled using the Bernoulli distribution. often assuming normality and homoscedasticity. • Response variables in the form of counts. Many response variables of this type are modeled as general linear models.1. few response variables are truly continuous. is the result of measurements where it has only been recorded whether an event has occurred (Y = 1) or not (Y = 0). other examples will be discussed in later chapters. In this example the response variable may be regarded as essentially continuous. We may illustrate the concept of continuous response using data of a type often used in general linear models. • Response in the form of rates. and one has to revert to approximations. ¤ 2. It is common for response variables to be restricted to pos- itive values. the normality assumption cannot hold exactly for such data. The results were given on page 13. Since the Normal distribution is defined on [−∞. ∞].2 Continuous response Models where the response variable is considered to be continuous are com- mon in many application areas. • Binary response variables. often called quantal response in earlier literature. Physical measurements in cm or kg are examples of this.

The data are of the type given in the following table. Generalized Linear Models 33 which is a special case of the Binomial distribution where n = 1. taking on only the values 0 or 1. . Thus.46 1 0 0 0 61 0. The response of the individuals might be to improve from a certain medical treatment. The issue of concern was to find indicators whether the cancer had spread to the surrounding lymph nodes.50 0 1 0 0 58 0. reports some data on the treatment of prostatic cancer.84 1 1 1 1 . or for a piece of equipment to fail.48 0 0 0 0 65 0. when sprayed on the insect Macrosi- c Studentlitteratur ° . the response Y has value 1 if nodal involvement has occurred and 0 otherwise. Age Acid X-ray Tumour Tumour Nodal level result size grade involvement 66 0. Even some of the independent variables (X-ray results.2. one purpose of the modeling is to formulate a model that can predict whether or not the lymph nodes have been aﬀected.3 Finney (1947) reported on an experiment on the eﬀect of Rotenone. In this type of data.. A proportion corresponds to a probability. Surgery is needed to ascertain the extent of nodal involvement.2 Collett (1991). This is called a binary response. f out of the n individuals respond in one way (Y = 1) while the remaining n−f individuals respond in some other way (Y = 0). in diﬀerent concentrations.1. and modeling of the response probability is an important part of the data analysis... Tumour size and Tumour grade) are binary variables. . . . . ¤ 2. The response is the proportion pb = nf . .. . quoting data from Brown (1980). . the actual data set contained 53 patients. . Example 2.4 Response as a proportion Response in the form of proportions (binomial response) is obtained when a group of n individuals is exposed to the same conditions. The binomial distribution is further discussed on page 88. Example 2.. These data will be analyzed in Chapter 5. to die from a specified dose of an insecticide. Some vari- ables that can be measured without surgery may be indicators of nodal in- volvement. Binary response is a special case of binomial response with n = 1.. . Only a portion of the data is listed.48 1 1 0 1 65 0. In such models the fact that 0 ≤ p ≤ 1 should be allowed to influence the choice of model. .

01 50 44 88 7.6 0. The functions g and f should be chosen such that the model cannot produce a predicted probability that is larger than 1. Example 2. The results are given in the following table.1.34 2. in general terms.41 50 6 12 One aim with this experiment was to find a model for the relation between the probability p that an insect is aﬀecteded and the dose.71 46 24 52 3. ¤ 2. Counts are often recorded in the form of fre- quency tables or crosstabulations. We will return to the analysis of these data later (page 117).8 0. Introduction phoniella sanborni. These data will be discussed later on page 89. The re- sults are: Season Red Other Total Early spring 29 11 40 Late spring 273 191 464 Early summer 8 31 39 Late summer 64 64 128 Total 374 297 671 The data may be used to study how the color of the beetle depends on season.58 48 16 33 2.89 49 42 86 5. as g(p) = f(Concentration). Such a model can be written. aﬀected % aﬀected 10.e. i. ¤ c Studentlitteratur ° .1.5 Response as a count Counts are measurements where the response indicates how many times a specific event has occurred.4 Sokal and Rohlf (1973) reported some data on the color of Tiger beetles (Cicindela fulgida) collected during diﬀerent seasons. Conc Log(Conc) No. Models for counts should take this limitation into account.1 0. of insects No. Count data are restricted to integers ≥ 0. A common approach is to test whether there is independence between season and color through a χ2 test.7 0. in batches of about fifty.2 1. the concentration.

Examples of such variables are ratings of patients.1. of person years (’000) 17. The data refer to 16262 Medicaid enrollees. the number of birds of a certain species that have been sighted may depend on the area of the habitat that has been surveyed. we have to account for the fact that males and females have diﬀerent observation periods. Generalized Linear Models 35 2. subdivided by sex. ¤ 2. which is an ordinal variable. of accidents 175 320 No. The amount of snoring was assessed on a scale ranging from “Never” to “Always”. The data are: Snoring Heart problems Never Some.5 The data below. Example 2. one has to account for diﬀerences in size between objects. answers to attitude items. of accidents)/(no. i.7 Ordinal response Response variables are sometimes measured on an ordinal scale.2.6 Response as a rate In some cases. In this case.1. An interesting question is whether there is any relation between snoring and heart problems.4 Accident data can often be modeled using the Poisson distribution. Accident rate can be measured as (no. in terms of number of person years. the response can be assumed to be proportional to the size of the object being measured. quoted from Agresti (1996). Females Males No. on a scale where the categories are ordered but where the distance between scale steps is not constant. we will discuss how this type of data can be modelled. For each sex the number of person years (in thousands) is also given. In a later chapter (page 131).3 21. Often Always Total times Yes 24 35 21 30 110 No 1355 603 192 224 2374 Total 1379 638 213 254 2484 c Studentlitteratur ° . In the analysis of data of this type. In this case the response may be measured as “number of sightings per km2 ”. For example.e. of person years). are accident rates for elderly drivers. The data were obtained through interviews with the patients. Example 2. and school marks.6 Norton and Dunn (1985) studied the relation between snoring and heart problems for a sample of 2484 patients. which we will call a rate.

starting with the distribution.3): y = Xβ + e (2. Generalized linear models are a generalization of general linear models in the following ways: 1. ¤ 2. is called a link function. Thus.1) Let us denote η = Xβ (2. specification of the linear predictor Xβ. we model some function g (µ) of µ. Instead of modeling µ =E (y) directly as a function of the linear predictor Xβ. We can relax this assumption to permit the distribution to be any distribution that belongs to the exponential family of distributions. This includes distri- butions such as Normal. 2.2 Generalized linear models Generalized linear models provide a unified approach to modelling of all the types of response variables we have met in the examples above. Let us return to the general linear model (1. specification of the distribution 2. The specification of a generalized linear model thus involves: 1. gamma and binomial distributions. In this section we will summarize the theory of generalized linear models. In later sections we will return to the examples and see how the theory can be applied in specific cases. Analysis of ordinal data is discussed in Chapter 7.36 2. specification of the link function g (·) 3. We will discuss these issues.3).2. An assumptions often made in a GLM is that the components of y are independently normally distributed with constant variance.3) The function g (·) in (2. Poisson.2) as the linear predictor part of the model (1. the model becomes g (µ) = η = Xβ. (2. c Studentlitteratur ° .3). Generalized linear models The main interest lies in studying possible dependence between snoring and heart problems.

It has probability function µy e−µ f (y. φ) (2. see Jørgensen (1987). (2. p) = p (1 − p) (2.4). we will demonstrate that some well-known distributions are. µ) = exp [yθ − exp (θ) − log (y!)] Thus. 1−p y c Studentlitteratur ° .4) with θ = log (µ). We insert this into (2. 2.5) is a special case of (2.5) y! = exp [y log (µ) − µ − log (y!)] .3 The exponential family of distributions The exponential family is a general class of distributions that includes many well known distributions as special cases. φ) = − log(y!) and a(φ) = 1. p.4) assuming that a (φ) is unity. 10f). b(θ) = exp(θ).3.6) y · µ ¶ µ ¶¸ p n = exp y log + n log (1 − p) + log . which include the function a (φ) while assuming that the so called dispersion parameter φ is a constant. µ) = (2. φ) = exp + c (y. and exponential dispersion family. θ.2 The binomial distribution The binomial distribution can be written as µ ¶ n y n−y f (y.4) a (φ) where a (·) . It can be written in the form · ¸ (yθ − b (θ)) f (y. which is (2. As examples of the usefulness of the exponential family.2. b (·) and c (·) are some functions. We note that θ = log (µ) which means that µ = exp (θ). Generalized Linear Models 37 2. The so called canonical param- eter θ is some function of the location parameter of the distribution.3. 2. special cases of the exponential family. Lindsey (1997.1 The Poisson distribution The Poisson distribution can be written as a special case of an exponential family distribution. c(y. in fact. Some authors diﬀer between exponential family. We can compare this expression with (2.5) and get f (y.

3. and c (y. it is an exponential dispersion family distribution.3.e. (In fact.3 The Normal distribution The Normal distribution can be written as ¡ ¢ 1 −(y−µ)2 f y. b (θ) = n log [1 + exp (θ)]. φ. σ2 2σ 2 2 This is an exponential family distribution £ 2 with θ =¤ µ. see above. φ) = − y /φ + log (2πφ) /2. φ). This can be inserted into 2.4 The function b (·) The function b (·) is of special importance in generalized linear models be- cause b (·) describes the relationship between the mean value and the variance in the distribution. µ. (2.9) ∂θ 2 ∂θ c Studentlitteratur ° . b (θ) = θ /2. 2.) 2. σ2 = √ e 2σ2 (2. The first derivative: b0 We denote the log likelihood function with l (θ. Accord- ing to likelihood theory it holds that µ ¶ ∂l E =0 (2. p = 1+exp(θ) .7) 2πσ2 ³ 2 ´  yµ − µ2 y 2 1 ¡ ¢ = exp  − 2 − log 2πσ 2  . y) = log f (y. 1 + exp (θ) y It follows that ³ the´binomial distribution is an exponential family distribution p ¡n¢ with θ = log 1−p . φ) = log y and a(φ) = 1. θ. a (φ) = 2 φ. The exponential family of distributions ³ ´ p exp(θ) We use θ = log 1−p i.38 2. p) = exp yθ + n log + log . c (y.8) ∂θ and that µ ¶ "µ ¶2 # ∂ 2l ∂l E +E = 0. For a brief introduction to Maximum Likelihood estimation reference is made to Appendix B. φ = σ . To show how this works we consider Maximum Likeli- hood estimation of the parameters of the model.6 to give · µ ¶ µ ¶¸ 1 n f (y.3.

10) ∂θ and ∂ 2l = −b00 (θ) /a (φ) (2. these derivatives are: Poisson : b (θ) = exp (θ) gives b0 (θ) = exp (θ) = µ exp (θ) Binomial : b (θ) = n log (1 + exp (θ)) gives b0 (θ) = n = np 1 + exp (θ) θ2 Normal : b (θ) = gives b0 (θ) = θ = µ 2 For each of the distributions the mean value is equal to b0 (θ). φ). The second derivative: b00 From (2. ∂l = [y − b0 (θ)] /a (φ) (2.12) ∂θ so that E (y) = µ = b0 (θ) .9) and (2.15) We see that the variance of y is a product of two terms: the second derivative of b (·). For the distributions we have discussed so far. and the function a (φ) which is independent of θ.2.4) we obtain that l (θ.8) and (2.14) a (φ) a (φ) so that V ar (y) = a (φ) · b00 (θ) . (2.10) we get µ ¶ ∂l E = E {[y − b0 (θ)] /a (φ)} = 0 (2.11) we get b00 (θ) V ar (y) − + 2 =0 (2. of b with respect to θ. (2. φ.13) Thus the mean value of the distribution is equal to the first derivative of b with respect to θ. y) = (yθ − b (θ)) /a (φ)+c (y. From (2.11) ∂θ2 where b0 and b00 denote the first and second derivative. respectively. The parameter φ is called the dispersion parameter and b00 (θ) is called the variance function. Therefore. c Studentlitteratur ° . Generalized Linear Models 39 From (2.

1: . Generalized Linear Models 41 c Studentlitteratur ° Figure 2.2.

From the expression for the linear predictor η = P ∂η xj β j we obtain ∂β = xj . Generalized Linear Models 43 that maximize the log likelihood.18. the variance function. (2.17) ∂β j ∂θ dµ dη ∂β j We have shown earlier that b0 (θ) = µ. c Studentlitteratur ° . Putting things together. Thus. using the chain rule. By summing over the observations. Thus. functions of θ. (2.2. yields ∂l ∂l dθ dµ ∂η = . ∂µ∂θ = V . and that b00 (θ) = V . in turn. Asymptotic variances and covariances of the parameter esti- mates are obtained through the inverse of the Fisher information matrix (see Appendix B). a (φ) dµ In 2. the likelihood equation for one parameter β j is given by X Wi (yi − µ ) dη i i xij = 0. φ. (2. we have written the likelihood for one single observation. W is defined from µ ¶2 −1 dη W = V.16) a (φ) The parameters of the model is a p × 1 vector of regression coeﬃcients β which are. (2.20) with respect to β j since the µi :s are functions of the parameters β j .19) dµ So far.18) ∂β j a (φ) V dη W dη = (y − µ) xj . φ) . j j ∂l (y − µ) 1 dµ = xj (2. Diﬀerentiation of l with respect to the elements of β. which for a single observation can be written yθ − b (θ) l = log [L (θ. y)] = + c (y.20) i a (φ) dµ i We can solve (2.

this algorithm works as follows: 1.16). . 4.. . . 5.44 2. dη Form the adjusted dependent variate z0 = b b0 ) dµ where the η0 + (y − µ 0 b0 . Briefly. from which a new estimate b η 1 of the linear predictor is calculated. and let µb0 be the corresponding fitted value derived from the link function ³ ´ η = g (µ).  .β b   Cov β 1 0 V ar β 1   =  . is done using numerical methods. Repeat steps 1−4 until the changes are suﬃciently small.β b Cov β b ...7 Numerical procedures Maximization of the log likelihood (2.20). . . Linearize the link function g (·) by using the first order Taylor series ap- proximation g(y) ≈ g (µ) + (y − µ) g 0 (µ) = z. see Mc- Cullagh and Nelder (1989).βb 0 0 1 0 p−1  ³ ´ ³ ´   b b .21) ∂ l ∂l ∂l ∂l ∂l ∂β 20 ∂β 0 ∂β 1 ∂β 0 ∂β p−1  ∂2 l   ∂l ∂l   ∂β 1 ∂β 0 ∂β 21  = −E  .   ³ ´ ³ ´  b b Cov β p−1 .  . .7. where V is the variance function. . This gives new estimates β b of the param- 1 eters. Define the weight matrix W from W0−1 = dµ V0 .   . β 0 ··· b V ar β p−1  2 −1 (2. . Numerical procedures  ³ ´ ³ ´ ³ ´  V ar βb Cov β b . which is equivalent to solving the likelihood equations (2. x2 . A commonly used procedure is the iteratively reweighted least squares approach. Let b η 0 be the current estimate of the linear predictor.    ∂l ∂l ∂2l ∂β p−1 ∂β 0 ··· ∂β 2p−1 2.. derivative of the link is evaluated at µ ³ ´2 dη 3. xp using weights W0 . c Studentlitteratur ° . 2. Perform a weighted regression of dependent variable z on predictors x1 .

of course. It is actually the scaled deviance that is used for inference. The Genmod procedure in SAS presents two deviance statistics: the deviance and the scaled deviance. The null model has only one parameter that represents a common mean value µ for all obser- vations. The deviance D is defined as D = 2 (l(y. The full model is used as a benchmark for assessing the fit of any model to the data. The simpler model would then have a larger deviance and more df. the deviance is just the resid- ual sum of squares. Suppose that a certain model gives a deviance D1 on df1 degrees of freedom (df). φ. For the saturated model. φ. y) be the log likelihood of the current model at the Maximum Likelihood estimate. φ. y) be the log likelihood of the full model. This is done by calculating the deviance. and relate this to the χ2 distribution with (df2 − df1 ) degrees of freedom. A second. the full (or saturated) model has n parameters. (D2 − D1 ).2. φ. i. This would give a large-sample test of the significance of the parameters that are included in model 1 but not in model 2. For distributions such as Binomial and Poisson. requires that the parameters included in model 2 is a subset of the parameters of model 1. This. i. and let l(y.8 Assessing the fit of the model 2. For distributions that have a scale parameter φ. and perhaps more important use of the deviance is in comparing competing models.22) It can be noted that for a Normal distribution. that the models are nested.1 The deviance The fit of a generalized linear model to data may be assessed through the deviance.e. and that a simpler model produces deviance D2 on df2 degrees of freedom. y)) . Diﬀerent models can have diﬀerent degrees of complexity. the deviance and the scaled deviance are identical. the deviance will asymptotically tend towards a χ2 dis- tribution as n increases. The deviance is also used to compare nested models. one for each observation. (2.e. Generalized Linear Models 45 2. each observation fits the model perfectly. This can be used as an over-all test of the adequacy of the model. y) − l(b µ. To compare the two models we can calculate the diﬀerence in deviance. The deviance is defined as follows: Let l(b µ. c Studentlitteratur ° . the scaled deviance is D∗ = D/φ.8. y = yb. If the model is true. The degree of approximation to a χ2 distribution is diﬀerent for diﬀerent types of data. In contrast.

The AIC is not very useful in itself since the scale is somewhat arbitrary. and φ is the dispersion parameter.2 The generalized Pearson χ2 statistic An alternative to the deviance for testing and comparing models is the Pear- son χ2 . 2. so in this case. Vb (b µ) is the estimated variance function. q is the number of parameters in the model. (2. For the Normal distribution. If φ is constant. large values of AIC would then indicate a good model. Another reason is that the Pearson χ2 does not have the same additive properties as the deviance for comparing nested models. Assessing the fit of the model 2.3 Akaike’s information criterion An idea that has been put forward by several authors is to “penalize” the likelihood functions such that simpler models are being preferred. which can be defined as X χ2 = b)2 /Vb (b (yi − µ µ) .8. In other cases. which may be one reason to prefer the deviance over the Pearson χ2 . (2. This is the information criterion (AIC) suggested by Akaike (1973).8. Computer packages for generalized linear models often produce both the deviance and the Pearson χ2 .46 2. The main use is to compare the AIC of competing models in order to decide which model to prefer. It can be shown that α ≈ 2 would lead to prediction errors near the minimum. Maximum likelihood estimation of the parameters in generalized linear mod- els seeks to minimize the deviance. Akaike’s information criterion is often used for model selection: the model with the smallest value of DC would then be the preferred model. the de- viance and Pearson’s χ2 coincide. it can be shown that α ≈ 4 is roughly equivalent to testing one parameter at the 5% level. the deviance and Pearson’s χ2 have diﬀerent asymptotic properties and may produce diﬀerent results. c Studentlitteratur ° . A general expression of this idea is to measure the fit of a model to data by a measure such as DC = D − αqφ. Note that some computer program report the AIC with the opposite sign. Large diﬀerences between these may be an indication that the χ2 approximation is bad.8.23) i Here. D is the deviance.24) Here. this is again the residual sum of squares of the model.

is that it is a likelihood-based test that is useful for comparing nested models. but with z replaced by t. 2. we can test hypotheses about single parameters by using βb z= (2.25). Collett (1991) concludes that for binary data with all ni = 1. Note that the Wald tests for single parameters are related to the Pearson χ2 . This is based on the fact that if z is standard Normal. For Normal models the deviance is equal to the residual sum of squares which is not a model test by itself. the deviance cannot be used to assess the over-all fit of the model (p. In some cases the Wald tests are presented as χ2 tests.9 Diﬀerent types of tests Hypotheses on single parameters or groups of parameters can be tested in diﬀerent ways in generalized linear models. as compared to the Pearson χ2 .1 Wald tests Maximum likelihood estimation of the parameters of some model results in estimates of the parameters and estimates of the standard errors of the esti- mators. 65).9. then z 2 follows a χ2 distribution on 1 degree of freedom. The usefulness of these tests depends on the kind of data being analyzed. and the Pearson χ2 . c Studentlitteratur ° . are exact.8.25) bβb σ where z is a standard Normal variate. without necessarily making any formal inference. Akaike’s information criterion is often used as a way of comparing several competing models. can provide large-sample tests of the fit of the model. 2.4 The choice of measure of fit The deviance. Generalized Linear Models 47 2. In other cases the Wald tests are asymptotic tests that are valid only in large samples. For example. tests based on (2. This is called a Wald test. The estimates of standard errors are often asymptotic results that are valid for large samples. If the large-sample conditions are valid.2. Let us denote the asymptotic standard error of the estimator βb with σ bβb . In normal theory models. The advantage of the deviance.

26) Under rather general conditions. The likelihood ratio test uses information on the log likelihood both at β b and at 0. the LR statistic uses more information than the Wald and score statistics. while a derivative close to 0 would be a sign that we are close to the maximum. c Studentlitteratur ° . It compares the likelihoods L1 and L0 using the asymptotic χ2 distribution of −2 (log L0 − log L1 ). (2. Denote with L1 the likelihood function maximized over the full parameter space.48 2. The score test is based on the behavior of the likelihood function close to 0. L1 and L0 denote the likelihood under H1 and H0 . In Figure 2.9. If the derivative at H0 is “large”. The likelihood ratio statistic is −2 log (L0 /L1 ) = −2 [log (L0 ) − log (L1 )] = −2 (l0 − l1 ) . The Wald test uses the behavior of the likelihood function at the ML estimate b The asymptotic standard error of β β. We are testing a hypothesis H0 : β = 0.2 Likelihood ratio tests Likelihood ratio tests are based on the following principle. 2. Generally. we illustrate a hypothetical likelihood function. It can be treated as an asymptotic χ2 variate on 1 df . Thus.9. β b is the Maximum Likelihood estimator of some pa- rameter β. the value stated in H0 . Exceptions occur if there are linear constraints on the parameters. b depends on the curvature of the b likelihood function close to β. the likelihood ratio tests are related to the deviance.3 Score tests We will illustrate score tests (also called eﬃcient score tests) based on argu- ments taken from Agresti (1996). it can be shown that the distribution of the likelihood ratio statistic approaches a χ2 distribution as the sample size grows. Diﬀerent types of tests 2. this would be an indication that H0 is wrong. In the same way as the Wald tests are related to the Pearson χ2 . in a sense. The score test is calculated as the square of the ratio of this derivative to its asymptotic standard error. the number of degrees of freedom of this statistic is equal to the number of parameters in model 1 minus the number of parameters in model 0. and denote with L0 the likelihood function maximized over parameter values that correspond to the null hypothesis being tested. Agresti (1996) suggests that the likelihood ratio statistic may be the most reliable of the three.9. For this reason. respectively.2.

The interpretation of these tests is the same as in general linear models. Proc Genmod in SAS oﬀers Type 1 or Type 3 tests.10 Descriptive measures of fit In general linear models. LR and score tests. In a Type 1 analysis the result of a test depends on the order in which terms are included in the model. 2. In general linear models the Type 1 and Type 3 tests are obtained through sums of squares.2.9. A value of R2 close to 1 would indicate a good fit. Generalized Linear Models 49 L(b ) L1 L0 0 bˆ b Figure 2. the fit of the model to data can be summarized as R2 = SSSS Model T . c Studentlitteratur ° . see Chapter 1.4 Tests of Type 1 or 3 Tests in generalized linear models have the same sequential property as tests in general linear models. A type 3 analysis does not depend on the order in which the Model statement is written: it can be seen as an attempt to mimic the analysis that would be obtained if the data had been balanced. 2. In Generalized linear models the tests are Likelihood ratio tests. It holds that 0 ≤ R2 ≤ 1. Similar measures of fit have been proposed also for generalized linear models. An adjusted version of R2 has been proposed to account for the fact that R2 increases even when irrelevant factors are added to the model.2: A likelihood function indicating information used in Wald. but there is an option in Genmod to use Wald tests instead. See Chapter 1 for a discussion on Type 1 and Type 3 tests.

2 Rmax = 1 − (L0 )2/n . (2. but has the disadvantage that it is always smaller than 1.28) For example. in a binomial model with half of the observations in each cat- egory. An application Cox and Snell (1989) suggested to use µ ¶2/n L0 R2 = 1 − (2. The coeﬃcients by Cox and Snell. This measure equals the usual R2 for Normal models. and by Nagel- kerke.7 Samuels and Witmer (1999) report on a study of methods for producing sheep’s milk for use in the manufacture of cheese. Ewes were randomly assigned to either mechanical or manual milking. Nagelkerke (1991) therefore suggested the modification 2 R = R2 /Rmax 2 .75 even if there is perfect agreement between the variables. It was suspected that the mechanical method might irritate the udder and thus producing a higher concentration of somatic cells in the milk. Mechanical Manual 2966 186 269 107 59 65 1887 126 3452 123 189 164 93 408 618 324 130 548 2493 139 y 1216 219 s 1343 156 c Studentlitteratur ° . this maximum equals 0.27) LMax where L is the likelihood.50 2.11. 2. are available in the Logistic procedure in SAS.29) Similar coeﬃcients have been suggested by Ben-Akiva and Lerman (1985). In fact.11 An application Example 2. (2. and by Horowitz (1982). The data in the following table show the counts of somatic cells for each animal.

The SAS programs for analysis of these data were of the following type: PROC GENMOD DATA=sheep. The output for this model is as follows: Model Information Description Value Data Set WORK.0507 .SHEEP Distribution POISSON Link Function LOG Dependent Variable COUNT Observations Used 20 Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 18 14862. The diﬀerence between these p-values is explained by the fact that the Wald test essentially approximates the t distribution with a Normal distribution. p < 0. Cell counts often conform to Poisson distributions. A model assuming a Normal distribution and with milking method as a factor was fitted.7182 825.12 on 1 df . the program then chooses the canonical link which. for the Normal distribution. some kind of relationship between the mean and the variance may be at hand.5077 797.5282 Log Likelihood . The Poisson model gives a deviance of 14863 on 18 df.2.044.5282 Scaled Pearson X2 18 14355. is the identity link. By default. a Normal distribution and an identity link. 83800.014).7066 Scaled Deviance 18 14862. MODEL count = milking / dist = normal. In this model. CLASS milking.7182 825. An analysis similar to a standard two-sample t test can be ob- tained by using a generalized linear model with a dummy variable for group membership. this gives p = 0. ¤ We will illustrate the analysis of these data by attempting several diﬀerent analyses. RUN.7066 Pearson Chi-Square 18 14355. closer scrutiny of the data reveals that the variation is quite diﬀerent in the two groups. c Studentlitteratur ° . This means that a Poisson distribution with the canonical (log) link is another option. If we use a standard t test assuming equal variances.5077 797. The group diﬀerence is highly significant: χ2 = 5451. Generalized Linear Models 51 This is close to a textbook example that may be used for illustrating two- sample t tests.0001. the Wald test of group diﬀerences is significant (p = 0. However. In fact.

The non-parametric Wilcoxon test is not significant.0001 Poisson with φ 0.11. which is related to a phenomenon called overdispersion. Two other distributions were also tested: the gamma distribution and the inverse gaussian distribution. the p values are rather diﬀerent. To decide which of the results to use we need to assess the diﬀerent models. Glim 0.1620 Although most models seem to indicate significant group diﬀerences. These were used with their respective canonical links. . c Studentlitteratur ° .0405 Gamma 0. However. An application Analysis Of Parameter Estimates Parameter DF Estimate Std Err ChiSquare Pr>Chi INTERCEPT 1 7.52 2. SCALE 0 1.0086 Inverse Gaussian 0.010. In the analysis where the scale parameter was used.0 and the p value for milking was 0.1030 0.0000 0. the scaled deviance was 18.0232 5451.0610 Poisson <0. Methods for statistical model diagnostics in generalized linear models is the topic of the next chapter. In addition. .0001 MILKING Man 1 -1.0000 .0440 Log normal 0. is discussed in the next chapter.7139 0.0091 613300.717 0.1201 0.0001 MILKING Mech 0 0.05 limit.0000 0. t test 0. NOTE: The scale parameter was held fixed. since the ratio deviance/df is so large.0000 .0140 Normal. This approach. a second analysis was made where the program estimated the dispersion parameter φ.0102 Wilcoxon 0. a Wilcoxon-Mann-Whitney test was made. The results for all these models can be summarized as follows: Model p value Normal. based both on statistical consideration and on subject-matter knowledge. This illustrates the fact that significance testing is not a mechanical procedure. The first Poisson model gives a strongly signif- icant result while the standard t test is only just below the magical 0.

but where it is impossible to observe the value y = 0. The survival time and the weight were recorded for 13 sheep that had been poisoned.. Derive the variance function. The density of the exponential distribution is f(x) = λe−λx for x > 0.2. Exercise 2. c Studentlitteratur ° . 3. For example.. Does the exponential distribution belong to the exponential family? If so. what are the functions b (·) and c (·)? What is the variance function? Exercise 2. A. It has probability function e−λ λyi p (yi |λ) = (1 − e−λ ) yi ! for yi = 1. Generalized Linear Models 53 2.3 Aanes (1961) studied the eﬀect of a certain type of poisoning in sheep.1 The distribution of waiting times (for example the time you wait in line at a bank) can sometimes be approximated by an exponential distribution.2 Sometimes data are collected which are “essentially Poisson”. The truncated Poisson distribution can sometimes be used to model such data. Investigate whether the truncated Poisson distribution is a member of the Exponential family. Do not use only the canonical link for each distribution. Results: Weight Survival Weight Survival 46 44 59 44 55 27 64 120 61 24 67 29 75 24 60 36 64 36 63 36 75 36 66 36 71 44 Find a generalized linear model for the relationship between survival time and weight. . 2. Plot the data and the fitted models. Try several diﬀerent distributions and link functions.12 Exercises Exercise 2. B. if data are collected by interviewing occupants in houses on “how many occupants are there in this house”. it would be impossible to get an answer from houses that are not occupied.

.

• Overdispersion means that the variance is larger than would be expected for the chosen distribution. We will discuss ways to detect over-dispersion and to modify the model to account for over-dispersion. the model should be modified.1 Introduction In general linear models. For example. In this chapter we will discuss diﬀerent model diagnostic tools for generalized linear models. The diagnostic tools that we consider are the following: • Residual plots (similar to the residual plots used in GLM:s) can be used to detect various deviations between data and model: outliers. a residual is the diﬀerence between the observed value of y and the fitted value yb that would be obtained if the model were 55 . dependence. Model diagnostics in Glim:s can be performed in similar ways. • Some observations may have an unusually large impact on the results. problems with distributions or variances. Our discussion will be fairly general. the fit of the model to data can be explored by using residual plots and other diagnostic tools. If there are indications of systematic deviations between data and model. Model diagnostics 3. and so on. We will discuss tools to identify such influential observations. the normality assumption can be examined using normal probability plots. We will return to these issues in later chapters when we consider analysis of diﬀerent types of response variables.2 The Hat matrix In general linear models. 3.3. The purpose of model diagnostics is to examine whether the model provides a reasonable approximation to the data. the assumption of homoscedasticity can be checked by plotting residuals against yb. and so on.

56 3. i (yi −b Example: In a Poisson model.3. ei. most computer software for Glim:s have options to print the hat matrix. Residuals in generalized linear models perfectly true: e = y − yb. H is idempotent (i.e. b = Xβ Example: In simple linear regression.P earson . the variance of the residuals is often related to the size of yb. The Pearson residual is the raw residual standardized with the standard deviation of the fitted value: yi − ybi ei. We will not give general formulae here.1 Pearson residuals The raw residual for an observation yi can be defined as ebi = yi − ybi . In generalized linear models. it holds that b = Hy y (3. it is possible to compute the hat matrix. Therefore. For linear predictors.P earson = √(yi −ni pbi ) . The hat matrix may have a more complex form in other models than GLM:s. (3.3. ei. Except in degenerate cases. HH = H) and symmetric. the observed residuals are simply the diﬀerence between the observed values of y and the values yb that are predicted from the model: eb = y − yb.P earson = √ yi ) . bi (1−b ni p pi ) c Studentlitteratur ° . The fitted values are linear functions of the (Yi ) = µ observed values. the hat matrix is H = X (X0 X) X0 . bi y Example: In a binomial model. y b = X (X0 X)−1 X0 y. Several suggestions have been made on how to achieve this.3 Residuals in generalized linear models In general linear models. The concept of a “residual” is not quite as clear-cut in generalized linear models.1) where H is known as the “hat matrix”. 3. The estimated expected value (“fitted value”) of the response in a general \ linear model is E bi = ybi . however. 3. some kind of scaling mechanism is needed if we want to use the residuals for plots or other model diagnostics. In this −1 case.P earson = q .2) Vd ar (b yi ) P The Pearson residuals are related to the Pearson χ2 through χ2 = e2i.

P earson ei. residuals outside ±2 will occur in about 5% of the cases. such that their variance is close to unity. Still. 3. even when they are standardized with the standard error of yb. the standard errors of Pearson residuals can be estimated. Likelihood ratio tests and Score tests. This is obtained as ei. The score residuals are related to the score tests. properly standardized.3. In Maximum Likelihood estimation. be interpreted c Studentlitteratur ° .3.e.Deviance = sign (yi − ybi ) di . which means that e.3 Score residuals The Wald tests. one for each observation.adj. the parameter estimates are obtained by solving the score equations. as a measure of fit of the model: D = di . This can be used to detect possible outliers in the data. provide diﬀerent ways of testing hypotheses about parameters of the model. Pearson residuals can often be considered to be approxi- mately normally distributed with a constant variance.4) The deviance residuals can also be written in standardized form. This is since we have standardized the residuals using estimated standard errors.D = √ (3.3. Model diagnostics 57 If the model holds.g.3) 1 − hii where hii are diagonal elements from the hat matrix. How- ever. presented in the pre- vious chapter. It can be shown that adjusted Pearson residuals can be obtained as ei. the variance of Pearson residuals cannot be assumed to be 1. which are of type ∂l U= =0 (3. in large samples.P = √ (3. i.5) 1 − hii where hii are again diagonal elements from the hat matrix.adj. These terms can.2 Deviance residuals Observation number i contributes P an amount di to the deviance. The adjusted Pearson residuals can often be considered to be standard Normal.Deviance ei.6) ∂θ where θ is some parameter. The score equations involve sums of terms Ui . (3. We define the deviance residuals as i p ei. 3.

3. the Anscombe residuals are rather diﬃcult to calculate. Anscombe (1953) suggested that the residuals may be defined based on some transformation of observed data and fitted values. Residuals in generalized linear models as residuals. An approxima- tion to the residuals that would be obtained using this procedure is q ei.Anscombe = q (3.3.9) Vd ar (A (yi ) − A (byi )) The function A (·) is chosen depending on the type of data. in samples of the sizes we often meet in practice.Deviance )2 (3. which may explain why they have not reached widespread use.4 Likelihood residuals Theoretically it would be possible compare the deviance of a model that com- prises all the data with the deviance of a model with observation i excluded.Likelihood = sign (yi − ybi ) hii (ei. However. i.8) where hii are diagonal elements of the hat matrix. and vi are elements of a certain weight matrix. In general. 3.6 The choice of residuals The types of residuals discussed above are related to the types of tests and other model building tools that are used: c Studentlitteratur ° . 3.3.adj. The transformation would be chosen such that the calculated residuals are approximately standard Nor- mal.Anscombe = 3 2/3 1/6 2 y − b y /b y .Score )2 + (1 − hii ) (ei.58 3.7) (1 − hii ) vi where hii are diagonal elements of the hat matrix. this procedure would require heavy computations. Anscombe defined the residuals as A (yi ) − A (b yi ) ei.e.S = p (3. The standardized score residuals are obtained from Ui ei.3.3. For exam- ple. for Poisson ¡ 2/3 ¢ data the Anscombe residuals take the form ei.5 Anscombe residuals The types of residuals discussed so far have distributions that may not al- ways be close to Normal. This is a kind of weighted average of the deviance and score residuals. as the contribution from each observation to the score.

3. the estimates may change drastically.e. The Pearson residuals are related to the Pearson χ2 and to the Wald tests. 3. The likelihood residuals are a compromise between score and deviance resid- uals The Anscombe residuals. c Studentlitteratur ° . The score residuals are related to score tests. Since H is idempotent it holds that tr (H) = p. Note that influential observations are not necessarily outliers. although theoretically appealing. Observations with a leverage of. i. An outlier is an observation for which the model does not give a good approx- imation. 2000a) have options to store the hat matrix in a file for further processing and plotting. the deviance residuals may be the preferred type of residuals to use for model diagnostics. If such so called influential observations are changed by a small amount. say. This derivative is the corresponding diagonal element hii of the hat matrix H. 3. Computer software like the Insight procedure in SAS (2000b).1 Leverage The leverage of observation i on the fitted value µ bi is the derivative of µ bi with respect to yi . or if they are deleted. Outliers can often be detected using diﬀerent types of plots.4 Influential observations and outliers Some of the observations in the data may have an unduly large impact on the parameter estimates. are not often used in practice in programs for fitting generalized linear models. The average leverage of all observations is therefore p/n. Diagnostic tools are needed to detect influential observations and outliers. An observation can be influential and still be close to the main bulk of the data. Model diagnostics 59 The deviance residuals are related to the deviance as a measure of fit of the model and to Likelihood ratio tests. the number of parameters. and the related JMP program (SAS. In the previous chapter we suggested that the likelihood ratio tests may be preferred over Wald tests and score tests for hypothesis testing in Glim:s. By extending this argument.4. Collett (1991) suggested that either the deviance residuals or the likelihood residuals should be used. twice this amount may need to be examined.

4.4. of course. Let X[j] be the design matrix with the column corresponding to variable xj removed. the calculation of Ci (or Di ) requires extensive re-fitting of the model which may take time even on fast computers. Thus. In principle.5 Partial leverage In models with several explanatory variables it may be of interest to study the impact of a variable.4 Eﬀect on data analysis Computer programs for generalized linear models often include options to calculate the measures of influence discussed above.4. data analysis in generalized linear models should contain both an analysis of the residuals. and an analysis of influential observations.2 Cook’s distance and Dfbeta Dfbeta is the change in the estimate of a parameter when observation i is deleted. or the change in Pearson’s χ2 . The Dfbetas can be combined over all parameters as 1 ³b b ´0 0 ³b b ´ Di = β − β(i) X X β − β (i) . on the results. an approximation to Ci can be obtained as hii (ei. 3. be doubted. A statistical result that may be attributed to very few observations should.10) p (1 − hii ) where p is the number of parameters and hii are elements of the hat matrix. and others. p It can be shown that this yields the so called Cook’s distance Ci . Partial leverage 3. when the observation is deleted. Fit the generalized c Studentlitteratur ° . A large change in the measure of fit may indicate an influential observation.5. 3. say variable xj . discussed above. 3.3 Goodness of fit measures Another type of measure of the influence of an observation is to compute the change in deviance. It belongs to good data analytic practice to use such program options to investigate influ- ential observations and outliers. However.P earson )2 Ci ≈ (3.60 3. The partial leverage of variable j can be obtained in the following way.

however. It shows how much the residuals change between models with and without variable xj . that a poor model fit can also be caused by the wrong choice of linear predictor or wrong choice of distribution or link. Model diagnostics 61 linear model to this design matrix and calculate the residuals. is theoretically possible but rather unusual in practice.6 Overdispersion A generalized linear model can sometimes give a good summary the data. Similarly. consider a dose-response experiment where the same dose of an insecticide is given to two batches of insects. Thus. the variance is some function of the mean: σ2 = V (µ). Under-dispersion. Interesting examples of under-dispersion can be found in the analysis of Mendel’s classical genetic data. for data that are modelled using a binomial distribution. Over-dispersion occurs when the variance of the response is larger than would be expected for the chosen distribution.3. a poor fit does not necessarily mean that we have over-dispersion. and within individuals. Formally. Calculate the residuals from this model as well. between individuals. Note. A partial leverage plot is a plot of these two sets of residuals. fit a model with variable xj as the response and the remaining variables X[j] as regressors. for many distributions it is possible to infer what the variance “should be”. in the sense that both the linear predictor and the distribution are correctly chosen. One possible reason for this may be a phenomenon called over-dispersion. In Chapter 2 we noted that for distributions in the exponential family. Over-dispersion may have many diﬀerent reasons. given the mean value. In one of the batches.6. i. Also. if we use a Poisson distribution to model the data we would expect the variance to be equal to the mean value: µ = σ2 . This lack of homogeneity may occur between groups of individuals. as measured by deviance/df . Partial leverage plots can be produced in procedure Insight (SAS. For example. while in the other batch 65 out of 100 insects die. and still the fit of the full model may be poor. over-dispersion can be de- tected as a poor model fit.e. This may indicate that the batches of insects are c Studentlitteratur ° . In models that do not contain any scale parameter. the main reason is often some type of lack of homogeneity. a “too small” variance. 50 out of 100 insects die. Thus. However. As an example. the variance is a function of the response probability: σ2 = np (1 − p). p = 0. these data are better than would be expected by chance. 3. this means that the response probabilities in the two batches are significantly diﬀerent (the reader may wish to confirm that a “textbook” Chi-square test gives χ2 = 4.032). 2000b).

62 3.1 Models for overdispersion Before any attempts are made to model the over-dispersion. such as new covariates. If these data are part of some larger dose-response experiment. Overdispersion not homogenous with respect to tolerance to the insecticide. thus causing a poor model fit. For example. given the mean value. The parameter φ is often called the over-dispersion parameter. Williams (1982) suggested a more sophisticated iterative procedure for estimating φ. or Pearson χ2 /df . One possible model is to assume that the mean pa- rameter has a separate value for each individual. using more batches of animals and more doses. this type of inhomogeneity would result in a poor model fit because of overdispersion. A simple way to model over-dispersion is to introduce a scale parameter φ into the variance function. the assumptions underlying the large-sample theory may not be fulfilled. and for Poisson data we would use φµ as variance. Deviance/df ).6. Thus. from that model as an estimator of φ. c Studentlitteratur ° . the mean parameter would be assumed to follow some random distribution over individuals while the response follows a second distribution. the ”maximal model” is a somewhat subjectively chosen ”large” model which includes all eﬀects that can reasonably be included. and to use the mean deviance (i. which has φ = 0. you may have to add terms to the predictor. you have to examine all other possible reasons for poor model fit. For binomial data this means that we would use the variance np (1 − p) φ. This leads to test statistics which are too large: it becomes too easy to get a significant result. interaction terms or nonlinear terms. • Wrong choice of link function. 3. we would assume that V ar (Y ) = φσ 2 . but somewhat rough. A simple. We can then re-fit the model. Thus. • There may be outliers in the data. A common eﬀect of over-dispersion is that estimates of standard errors are under-estimates. using the obtained value of the over-dispersion parameter. Instead. A more satisfactory approach would be to model the over-dispersion based on some specific model. This would 1 Note that this ”maximal model” is not the same as the saturated model.e. • When the data are sparse.6. These include: • Wrong choice of linear predictor. way to estimate φ is to fit a “maximal model”1 to the data. see Collett (1991) for details.

the average number of observations per cell will be less than 1. It is then easy to imagine that many of the cells will be empty.7 Non-convergence When using packages like Genmod for fitting generalized linear models. Non-convergence occurs because of the structure of the data in relation to the model that is being fitted. The model is then under-identified. interaction terms. We will return to the topic of over-dispersion as we discuss fitting of gener- alized linear models to diﬀerent types of data. As a general advice: when the procedure does not converge. the error messages are given in the program log. in particular.3. If the sample size is moderate. 3. Make tables and other summaries of the data to find out the reasons for the failure to converge. is discussed in Chapter 8. it may happen that the program reports that the procedure has not converged. a crosstable of dimension 4·3·3·3 contains 108 cells. The convergence is questionable. the procedure may fail to converge when it tries to fit estimated proportions close to 0 or 1. try to simplify the model as much as possible by removing. based on so-called Quasi-likelihood estimation. WARNING: The procedure is continuing but the validity of the model fit is questionable. Model diagnostics 63 lead to compound distributions. You can get some output even if these warnings have been given. Convergence problems are likely in such cases. Another approach to over- dispersion. Note that in SAS. WARNING: The specified model did not converge. Sometimes the convergence is slow and the procedure reports estimates of standard errors that are very large. This may happen when many observed proportions are 0 or 1. c Studentlitteratur ° . A few examples of compound distributions are discussed in Chapters 5 and 6. This can easily happen in the analysis of multidimensional crosstables. See also Lindsey (1997) for details. say n = 100. When the data are binomial. Typical error messages might be WARNING: The negative of the Hessian is not positive definite. A common problem is that the number of observed data values is small relative to the number of parameters in the model. For example.

3. The plots of residuals against predicted values indicate that the variation is larger for larger predicted values. This plot should yield a straight line. using the statistics we have discussed in this chapter. 4.8.8. Deviations from this pattern may indicate the wrong link function. using the same model as for y.1 and Figure 3. A plot of residuals against covariates should show the same pattern as the previous plot. Any systematic pattern in this plot may indicate that u should be used as a covariate. As illustrations of the various plots we use the example on somatic cells in the milk of sheep.2 for Normal and Poisson distributions. A normal probability plot of the residuals plots the sorted residuals against their expected values. Plotting the residuals in the order the observations are given in the data may help to detect possible dependence between observations. Plots of residuals against predicted values for the data in the example on Page 50 are given in Figure 3. c Studentlitteratur ° . A plot of residuals against the fitted values b η should show a pattern where the residuals have a constant mean value of 0 and a constant range. The residuals can also be plotted to detect an omitted covariate u. as long as we can assume that the residuals are approximately Normal. Applications 3. and n is the sample size. Deviations from this “random” pattern may arise because of incorrect link function. This is done as follows: fit a model with u as response. This tendency is strongest for the Normal model. respectively.1 Residual plots In this section we will discuss a number of useful ways to check models.8 Applications 3. These are given by Φ−1 [(i − 3/8) / (n + 1/4)] where Φ−1 is the inverse of the standard Normal distribution function. discussed in the previous chapter (page 50). i is the order of the observation. The following types of residual plots are often useful: 1. 2. and plot these against each other. 5. or omission of non-linear terms in the linear predictor. Obtain unstandardized residuals from both these models. For the illustrations we use a model with a Normal distribution and a unit link. incorrect choice of scale or omission of non-linear terms. and a model with a Poisson distribution and a log link. wrong choice of scale of the covariates.64 3.

3. Normal distribution and identity link.2: Plot of residuals against predicted values for example data. Figure 3. Model diagnostics 65 Figure 3.1: Plot of residuals against predicted values for example data. c Studentlitteratur ° . Poisson distribution and log link.

2 Variance function diagnostics McCullagh and Nelder (1989) suggest the following procedure for checking the variance function. where ζ is some constant.3: Normal probability plot for the example data. Normal probability plots for these data are given in Figures 3. PROBPLOT resdev / NORMAL (MU=est SIGMA=est color=black w=2 ) height=4. for the Normal and Poisson models. by Proc Univariate in SAS. PROC UNIVARIATE normal data=ut. SAS code for the normal probability plots presented here was as follows.8. Fit the model for diﬀerent values of ζ. Normal probability plots can be produced i. LABEL resdev="Deviance residual". respectively. c Studentlitteratur ° . Assume that the variance is proportional to µζ . VAR resdev.4. A Normal probability plot is a plot of the residuals against their normal quantiles. Applications Figure 3. The value of ζ for which the deviance is as small as possible is suggested by the data. 3. The deviance residuals were stored in the file ut under the name resdev. but the fit is not perfect. The distribution of the residuals is closer to Normal for the Poisson model.a.3 and 3. Normal distribution with an identity link. RUN.66 3. and plot the deviance against ζ.8.

4 Transformation of covariates So called partial residual plots can be used to detect whether any of the covari- ates need to be transformed.8. This can be plotted against b z. 3.4: Normal probability plot for the example data. b η is the fitted linear predictor. Poisson distribution with a log link. 3.8. where z is the adjusted dependent variable. x is a covariate and γb is the parameter estimate for the covariate. The plot should be approximately linear if no transformation is needed. This is defined as zi = g (b η. The partial residual is defined as u = z −b η +b γ x. Curvature in the plot is an indication that x may need to be transformed. The partial residuals can be plotted against x.3 Link function diagnostics To check the link function we need the so called adjusted dependent variable µi ).3. c Studentlitteratur ° . If the link is correct this should result in an essentially linear plot. Model diagnostics 67 Figure 3.

68 3. Plot the residuals against the predicted values C. Plot the residuals against the predicted values C. Exercises 3. Calculate predicted values and residuals B. Calculate the leverage of all observations Comment on the results Exercise 3.3: A.3 to: A. Prepare a Normal probability plot D.9 Exercises Exercise 3.2 For the data in Exercise 1. Plot the residuals against the predicted values C. Calculate predicted values and residuals B.9. Calculate the leverage of all observations Comment on the results Exercise 3. Calculate predicted values and residuals B.1: A. Calculate the leverage of all observations Comment on the results c Studentlitteratur ° .1 For the data in Exercise 1. Prepare a Normal probability plot D.3 Use one or two of your “best” models for the data in Exercise 2. Prepare a Normal probability plot D.

We used this program on the regression data given on page 9. The results are: 69 . t tests etc. RUN.1 GLM:s as GLIM:s General linear models. such as regression models. We will illustrate this on some of the GLM examples we discussed in Chapter 1. MODEL y = x / DIST=Normal LINK=Identity . ANOVA. 4.1 Simple linear regression A simple linear regression model can be written in Genmod as PROC GENMOD. The identity link is the default link for the Normal distribution. can be stated as generalized linear models by using a Normal distribution and an identity link.4. Models for continuous data 4. we will use the SAS (2000b) procedure Genmod for data analysis. Throughout this chapter.1.

1.0978 · T ime.0001 TIME 1 2.2294 Scaled Pearson X2 6 8.7443 + 2. The t test in the regression output was obtained as bj − 0 β t= r ³ ´ Vd b ar β j c Studentlitteratur ° .3765 4. The regression model is estimated as yb = −36.0000 1.2294 Scaled Deviance 6 8.6233 0. The Wald test of the hypothesis that the parameter β j is zero is given by b −0 β j z=r ³ ´. the scaled deviance is 8. . Also.0000 1.3333 Log Likelihood . Note that the deviance reported by the Genmod procedure is equal to the error sum of squares in the output on page 10.7443 3. These tests are equivalent only in large samples.70 4.1186 312. GLM:s as GLIM:s The GENMOD Procedure Model Information Description Value Data Set WORK.8179 92. -15.0978 0. Analysis Of Parameter Estimates Parameter DF Estimate Std Err ChiSquare Pr>Chi INTERCEPT 1 -36.3333 Pearson Chi-Square 6 25.9690 . NOTE: The scale parameter was estimated by maximum likelihood.4453 . which is equal to the sample size. The tests. are not the same as in a standard regression analysis: the Genmod tests are Wald tests while the tests in the regression output are t tests.7810 0. d V ar βb j where z is a standard Normal variate.3765 4.0001 SCALE 1 1.EMISSION Distribution NORMAL Link Function IDENTITY Dependent Variable EMISSION Observations Used 8 Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 6 25. This is the same estimate as given by a standard regression routine. however.8360 0.

c Studentlitteratur ° .0000 1. CLASS medium.1023 Scaled Pearson X2 50 57.2 Simple ANOVA A Genmod program for an ANOVA model (of which the simple t test is a special case) can be written as PROC GENMOD DATA=lissdata.1155 18.1721 = 1.7810. SCALE gives the estimated scale parameter as 1. RUN.6823 .0566.2294 = 2. For these data we get 2 6 √ bML = 8 · 4.2294 = 3. An√ unbiased estimate b2 = Deviance/df = 4. MODEL diff = medium / DIST=normal LINK=identity . 4.1023 Scaled Deviance 50 57.1.781. -159. Note that the ML estimator of σ 2 is biased. This is the ML esti- mate of σ. The relation between these two estimates is that the ML estimate does not b2ML = n−p account for the degrees of freedom: σ n σ b2 .4.1155 18.2294 giving σ of σ 2 is given by σ b = 4.0000 1. using the data on page 16.1400 Log Likelihood . The output from this program. Models for continuous data 71 with n − p degrees of freedom. contains the fol- lowing information: Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 50 905.1721 so σ σ bML = 3.1400 Pearson Chi-Square 50 905.

9363 18. The choice of distribution Parameter DF Estimate Std Err ChiSquare Pr>Chi INTERCEPT 1 9.0001 MEDIUM Mannitol 1 -2.5554 0. In fact.4219 MEDIUM Ringer 1 -9. LR Statistics For Type 3 Analysis Source DF ChiSquare Pr>Chi MEDIUM 6 62.4089 49. Note that the parameter estimates are the same as for the ANOVA output given on page 17. As noted above.1116 0. Experience with the type of data at hand can often suggest a suitable distribution.g. can alternatively be run using Proc Genmod.1023. However. pairwise comparisons between the media.4340 0.0001 MEDIUM Ultravist 0 0.0001 The scale parameter. . The examples given in this section show that many analyses that can be run as general linear models. The parameter estimates for the diﬀerent media are given.0624 22. NOTE: The scale parameter was estimated by maximum likelihood.4563 0.4841 2. while the tests of single parameters are Wald tests.6979 2.2 The choice of distribution One advantage of the generalized linear model approach is that it is not necessary to limit the models to Normal distributions.3732 . In many cases there are theoretical justifications for assuming other distributions than the Nor- mal. SCALE 1 3. the JMP program (SAS Institute.72 4.0001).1257 0. while the Genmod analysis gives the type 3 test as a χ2 approximations to the likelihood ratio test. while an unbiased estimate of σ2 is given by Deviance/df =18.0001 MEDIUM Diatrizoate 1 11. using e.9079 1.2. The eﬀect of Medium is significant (p<0. 4.0021 MEDIUM Isovist 1 -8.2437 1. The scaled deviance is 57.9849 0.1517 0.9363 9.9473 1. . Proc GLM in SAS. which is the ML estimator of σ.9363 1.2717 25.8902 0.g.2192 1.1 (based on Leemis. take a generalized linear model approach to all model fitting. is estimated as 3.0000 .3136 0. 1986) summarizes the relation- c Studentlitteratur ° .98.0001 MEDIUM Hexabrix 1 -5.2518 MEDIUM Omnipaque 1 -1.0000 0. the scaled deviance is equal to n and the deviance is equal to the residual sum of squares in Normal theory models. the ANOVA gives the tests as an over-all F test and as t tests for single parameters. 1993). 2000a). Figure 4. which makes it possible to make e. as well as the related procedure Insight in SAS. This also holds for the pioneering Glim software (Francis et al.6451 0.5182 1.

4.2) and that Γ (n) = (n − 1)! (4. in fact. α. that not all these distributions are members of the exponential family.2. using the gamma function.4.. If α is positive. The parameter β mostly influences the spread of the distribution and is called the scale parameter. The parameter α describes the shape of the distribution. it holds that Γ (α + 1) = αΓ (α) for α > 0 (4. Note that the gamma distribution is a member of the exponential family. A few examples of gamma distributions are illustrated in Figure 4. It has a reciprocal canonical link. Note. The gamma distribution is defined. a particularly useful class of distributions is the gamma distribution. These relations also hold for the special cases of the gamma distribution that are described below. as 1 f (y.3..4) Γ (α) β α The gamma distribution has two parameters. A χ2 distri- bution with p degrees of freedom can be obtained as a gamma distribution c Studentlitteratur ° . The variance function of the gamma distribution is V (µ) = µ2 .1) 0 is called a gamma function. α and β. mainly the peakedness and is often called the shape parameter. Models for continuous data 73 ships between some common distributions. 4.3) where (n − 1)! = (n − 1)·(n − 2)·. however.3 The Gamma distribution Among all distributions in the exponential family. β) = y α−1 e−y/β . g (y) = − y1 . For the gamma distribution it holds that E (y) = αβ and V ar (y) = αβ 2 .·2·1. For the gamma function. (4.1 The Chi-square distribution The χ2 distribution is a special case of the gamma distribution. then the integral Z∞ Γ (α) = tα−1 e−t dt (4.

1: Relationships among common distributions.74 4. c Studentlitteratur ° . Adapted from Leemis (1986).3. The Gamma distribution Figure 4.

2 0.9 0. An example of this is given on page 157. 4. The exponential distribution is sometimes used as a simple model for lifetime data. The χ2 distribution has mean value E χ = p and variance V ar χ2 = 2p. Models for continuous data 75 1 0. respectively.4.8 0.7 0.1 Hurn et al (1945). For this reason.5 0.2: Gamma distributions with parameters α = 1 and β =1.6 0.2 The Exponential distribution The exponential distribution has density 1 −y/β f (y.3. studied the clotting time of blood. β) = e (4. 3 and 5.3 An application with a gamma distribution Example 4. (n−1)s σ2 ∼ χ2 with (n − 1) degrees of freedom.3. Note that.1 0 0 2 4 6 8 10 12 14 Figure 4. Two diﬀerent clotting agents were com- c Studentlitteratur ° . for data from a Normal 2 distribution. 4. with ¡ 2parameters ¢ α = p/2 and¡ β ¢= 2. quoted from McCullagh and Nelder (1989). the gamma distribution is sometimes used for modelling of variances.5) β It can be obtained as a gamma distribution with α = 1.3 0. 2.4 0.

0013 359.9825 0. A Genmod analysis of the full model gave the following output: Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 14 0.0006 164.0298 0. Thus.9927 0. .0021 Scaled Pearson X2 14 18.3.0074 0.9674 1.0083 0.0001 AGENT 2 0 0.0452 0.0001 LC*AGENT 1 1 -0.0001 AGENT 1 1 0.0015 24.1058 203.0001 LC*AGENT 2 0 0. c Studentlitteratur ° . The Gamma distribution pared for diﬀerent concentrations of plasma.0294 0. -26. SCALE 1 611.3015 Log Likelihood . −1/µ.6464 .0704 0.2205 1. LC 1 0.0000 . The data are: Clotting time Conc Agent 1 Agent 2 5 118 69 10 58 35 15 42 26 20 35 21 30 27 18 40 25 16 60 21 13 80 19 12 100 18 12 Duration data can often be modeled using the gamma distribution. . Analysis Of Parameter Estimates Parameter DF Estimate Std Err ChiSquare Pr>Chi INTERCEPT 1 -0.0239 0. The canonical link of the gamma distribution is minus the inverse link.2834 Pearson Chi-Square 14 0. Preliminary analysis of the data suggested that the relation between clotting time and concentration was better approximated by a linear function if the concentrations were log-transformed.0000 0. NOTE: The scale parameter was estimated by maximum likelihood.0000 0.0021 Scaled Deviance 14 17.0005 1855. the models that were fitted to the data were of type 1 = β 0 + β 1 d + β 2 x + β 3 dx µ where x = log (conc) and d is a dummy variable with d = 1 for lot 1 and d = 0 for lot 2. . This is a kind of covariance analysis model (see Chap- ter 1).76 4.0236 0.0000 .5976 .

the time T it takes for the particle to reach a barrier for the first time has an inverse Gaussian distribution. ¤ 4.97 on 14 df. called Brownian motion after the British botanist Robert Brown. A graph of the shape of inverse Gaussian distributions is given in Figure 4.4. In a so called Wiener process for a particle. but McCullagh and Nelder note that the lowest concentration value might have been misrecorded. The scaled deviance is 17. is given in Figure 4. µ and λ. Models for continuous data 77 Figure 4. has its roots in models for random movement of particles. The c Studentlitteratur ° . λ) = exp (4. It has mean value µ and vari- ance µ3 /λ. A plot of the fitted model.6) 2πx3 2µ2 x The distribution has two parameters. along with the data. and resembles the log- normal and gamma distributions.3. The fit is good. µ.4. also called the Wald distribution. The density function is µ ¶1/2 " # λ −λ (x − µ)2 f (x. It belongs to the exponential family and is available in procedure Genmod.3: Relation between clotting time and log(concentration) We can see that all parameters are significantly diﬀerent from zero.4 The inverse Gaussian distribution The inverse Gaussian distribution. which means that we cannot simplify the model any further. The distribution is skewed to the right.

See Armitage and Colton (1998) for references.78 4. 4. Model diagnostics 0 0. The plot does not show the even.1 Plot of residuals against predicted values The residuals can be plotted against the predicted values.5 Model diagnostics For the example data on page 76. µ = 3 and µ = 4. distribution has also been used to model the length of time a particle remains in the blood. Two observations in the lower right corner are possible outliers. Such a plot for our example data is given in Figure 4. and length of stay in hospitals.5 1 1. maternity data.5 2 2. respectively.4: Inverse Gaussian distributions with λ = 1 and mean values µ = 1 (lowest curve). random pattern that would be expected. c Studentlitteratur ° . crop field size. In Normal theory models this kind of plot can be used to detect heteroscedasticity. µ = 2. 4. In this section we will present some examples of model diagnostics based on these data.5. the deviance residuals and the predicted values were stored in a file for further analysis.5.5 Figure 4.5.

However. or that some observations may be outliers. systematic patterns in this kind of plot may indicate that non-linear terms are needed. A plot of deviance residuals against log(concentration) for the gamma regression data is given in Figure 4.5. In simple linear models.5.5. c Studentlitteratur ° .7. A Normal probability plot for the gamma regression data is given in Figure 4. Still.3 Plots of residuals against covariates A plot of residuals against quantitative covariates can be used to detect whether the assumed model is too simple. These are the same two observations that stand out in figure 4.6.4. the normal probability plot is a useful tool for detecting anomalies in the data.5: Plot of residuals against predicted values for the Gamma regression data 4. The figure may indicate that the two observations in the lower left corner are outliers. the distributional properties of the residuals in finite samples depend upon the type of model. Models for continuous data 79 Figure 4. For most generalized linear models the residuals can be regarded as asymptotically Normal. 4.2 Normal probability plot The Normal probability plot can be used to assess the distributional proper- ties of the residuals.

6: Normal probability plot for the gamma regression data Figure 4.80 4.5. Model diagnostics Figure 4.7: Plot of deviance residuals against Log(conc) for the gamma regression data c Studentlitteratur ° .

c Studentlitteratur ° .22 = 0. these are the observations that have been noted in the other diagnostic plots.8. Models for continuous data 81 4.9. The diagonal elements of the Hat matrix were computed using Proc Insight (SAS.44. 2000b). observation with a leverage above twice that amount. should be examined. the average leverage is 4/18 = 0. here above 2 · 0.5. Since there are four parameters and n = 18. As noted in Chapter 3.e.222.4.4 Influence diagnostics The value of Dfbeta with respect to log(conc) was calculated for all observa- tions and plotted against log(conc). i. These values are plotted against the sequence numbers of the observations in Figure 4. The resulting plot is given in Figure 4. For these data the first two observations have a high leverage. The figure shows that observations with the lowest value of log(conc) have the largest influence on the results. the two possible outliers noted earlier are actually placed on top of each other in this plot. These are the same observations that were noted in other diagnostic plots above.

Figure 4. c Studentlitteratur ° .8: Dfbeta plotted against log(conc) for the gamma regression data.82 4.9: Leverage plot. Model diagnostics Figure 4. Gamma regression data.5.

92 0. Fit an exponential distribution to these data by applying a generalized linear model with an appropriate distribution and link.36 0.24 0.71 0.6 Exercises Exercise 4. we use the first 200 observations only.72 0.4.66 0.45 1.62 II 0. The analysis suggested by Box and Cox was a standard twoway ANOVA on the data transformed as z = 1/y.31 0.61 0.1 The following data.18 0.63 0.71 0.76 0. and where the linear pre- dictor only contains an intercept.21 0.33 Analyze these data to find possible eﬀects of poison. and also make a generalized linear model analysis assuming that the data can be approximated with a gamma distribution. Data were extracted from Cox and Lewis (1966). In both cases.43 0.37 0. who gave credit to Drs. P.22 0.40 0. and inter- actions.56 0. If pulses arrive in a completely random fashion one would expect the distri- bution of waiting times between pulses to follow an exponential distribution.38 III 0.31 0.23 1.29 0.46 0.23 0.29 0.82 0.30 0.30 0. Make this analysis. Compare the observed data with the fitted distribution using diﬀerent kinds of plots.24 0.44 0. Exercise 4.35 1.38 0.43 0.22 0.49 0. Treatment Poison A B C D I 0. The original data set consists of 799 observations.10 0. Fatt and B.36 0.45 0.40 0. The ex- periment is a two-way factorial experiment with factors Poison (three levels) and Treatment (four levels).02 0.25 0.45 0. make residual diagnostics and influence diagnostics. treatment.31 0. The data are as follows: c Studentlitteratur ° . show the survival times (in 10 hour units) of a certain variety of animals.2 The data given below are the time intervals (in seconds) be- tween successive pulses along a nerve fibre.88 0. Models for continuous data 83 4. taken from Box and Cox (1964).23 0. Katz.

21 0.14 0.17 0.14 1.15 0.35 0.35 1.08 0.44 0.06 0.27 0.04 0.40 0.05 0.26 0.15 0.04 0.04 0.01 0.27 0.05 0.03 0.09 0.14 0.26 0.01 0.16 0.23 0.19 0.06 0.15 0.96 0.11 0.10 0.51 0.14 0.07 0.25 0.64 0.03 0.07 0.16 0.11 0.04 0.25 0.51 0.07 0.13 0.02 0.43 0.37 0.01 0.01 0.35 0.05 0.15 0.06 0.70 0.05 0.55 0.24 0.28 0.19 0.18 0.07 0.12 0.84 4.38 0.09 0.15 0.06 0.78 0.19 0.49 0.08 0.04 0.05 0. Exercises 0.28 0.04 0.38 0.49 0.68 0.23 0.10 0.06 0.10 0.29 0.23 0.74 0.04 0.12 0.15 0.06 0.33 0.15 0.09 0.38 0.11 0.30 0.40 0.32 0.11 0.01 0.47 0.32 0.15 0.17 0.93 0.01 0.26 0.61 0.08 0.27 0.08 0.09 0.01 0.55 0.16 0.08 0.29 0.07 0.10 0.29 c Studentlitteratur ° .08 0.05 0.09 0.56 0.12 0.09 0.02 0.20 0.03 0.26 0.02 0.05 0.10 0.15 0.04 0.10 0.06 0.05 0.06 0.45 0.01 0.16 0.11 0.12 0.24 0.38 0.01 0.09 0.05 0.15 1.59 0.07 0.07 0.02 0.16 0.16 0.06 0.24 0.20 0.31 0.15 0.02 0.71 0.16 0.07 0.50 0.39 0.07 0.26 0.34 0.36 0.21 0.30 0.01 0.21 0.13 0.73 0.14 0.15 0.21 0.58 0.05 0.6.94 0.21 0.05 0.25 0.09 0.06 0.05 0.11 0.29 0.51 0.03 0.74 0.38 0.01 0.14 0.03 0.

The proba- bility that a plate will contain no organisms. then pi = 1 − e−µi .5. for each plate. c Studentlitteratur ° . this function is asymmetric around 0. if pi is the probability that growth occurs under dilution i. by taking logarithms. is said to follow a Bernoulli distribution.2 Distributions for binary and binomial data 5. This means that dilution by a factor of two gives a solution with N2 individuals per unit volume.2. The probability function of a Bernoulli random variable y is ½ 1 − p if y = 0 f (y) = py (1 − p)1−y = . and has density h i f (y) = β exp (α + βy) − e(α+βy) .2. The probit. Therefore. log µi = log N − i log 2. or a Gumbel distribution. Binary and binomial response variables 87 active organisms in a solution. µi = − log (1 − pi ) which gives log µi = log [− log (1 − pi )] . µi = N 2i or. After some time it is possible to record. 5. Thus. (5.1 The Bernoulli distribution A binary random variable that takes on the values 1 and 0 with probabilities p and 1 − p. Samples from each dilution are applied to plates that contain some growth medium. assuming a Poisson distribution.4) p if y = 1 The Bernoulli distribution has mean value E (y) = p and variance V ar (y) = p (1 − p). logit and complementary log-log links are compared in Figure 5. If the organisms are randomly distributed one would expect the number of individuals per unit volume to follow a Poisson distribution with mean µi .3) This is the complementary log-log link: log [− log (1 − pi )]. The method works as follows. The solution containing the organisms is progressively diluted. Thus. After i dilutions the concentration is N 2i . The tolerance distribution that corresponds to the complementary log-log link is called the extreme value distribution. (5. respectively. whether it has been infected by the or- ganism or not. is e−µi . As opposed to the probit and logit links. Suppose that the original solution contained N individuals per unit volume.

6 0. The proportion of successes.4 0.88 5. c Studentlitteratur ° .2 0. Log-log -3 -4 Figure 5. follows the same distribution. Since the Bernoulli distribution is a special case of the binomial distribution with n = 1.6) on page 37. As was demonstrated in formula (2. The probability function of the binomial distribution is µ ¶ n y f (y) = p (1 − p)n−y .2.2 The Binomial distribution If a Bernoulli trial is repeated n times such that the trials are independent. It holds that E (b p) = p(1−p) p) = p and V ar (b n . (5. then y = the number of successes (1:s) among the n trials follows a bino- mial distribution with parameters n and p. pb = ny .2: The probit. except for a scale factor: f (y) = f (b p).5) y The binomial distribution has mean E (y) = np and variance V ar (y) = np (1 − p) .8 1 Probability -1 -2 Probit Logit Compl. logit and complementary log-log links 5.2. the binomial distribution is a member of the exponential family. Distributions for binary and binomial data 4 Transformed value 3 2 1 0 0 0. even the Bernoulli distribution is an exponential family distribution.

8 0.58 48 16 33 2.41 50 6 12 A plot of the relation between the proportion of aﬀected insects and Log(Conc) is given below.3 0.6 0. 5.6 0.8 0.2 1. A fitted distribution is also included.9 1. and p is the proportion aﬀected in the population.1 Finney (1947) reported on an experiment on the eﬀect of Rotenone.4 0. Binary and binomial response variables 89 When the binomial distribution is applied for modeling real data.6 0. the crucial assumption is the assumption of independence.01 50 44 88 7. in diﬀerent concentrations. in batches of about fifty. this can often be diagnosed as over-dispersion.5 1. This can be achieved as a generalized linear model by specifying a binomial distribution for the response and using a probit link.1 0.89 49 42 86 5. aﬀected % aﬀected 10. The results were: Conc Log(Conc) No. If independence does not hold.7 0. The probit analysis approach is to assume a linear relation between Φ−1 (p) and log(dose). where Φ−1 is the inverse of the cu- mulative Normal distribution (the so called probit). The dependent variable is a proportion. 1 0.71 46 24 52 3.5.2 1. when sprayed on the insect Macrosi- phoniella sanborni. The following SAS program was used to analyze these data using Proc Gen- mod: c Studentlitteratur ° .3 Probit analysis Example 5.8 Relation between log(Conc) and proportion aﬀected This situation is an example of a “probit analysis” setting. of insects No.2 0 0 0.

10. indicating a good fit.6 50 6 . CARDS. In this case. Collett (1991) states that “a useful rule of thumb is c Studentlitteratur ° .7390 0.7390 0.PROBIT Distribution BINOMIAL Link Function PROBIT Dependent Variable X Dependent Variable N Observations Used 5 Number Of Events 132 Number Of Trials 243 This part simply gives us confirmation that we are using a Binomial distri- bution and a Probit link.5763 Log Likelihood . if the sample is large.0516 .3.7289 0. PROC GENMOD DATA=probit. -120. the value is 1.2 50 44 7.7 49 42 5. logconc=log10(conc).74 which is clearly non-significant.5797 Scaled Deviance 3 1. Probit analysis DATA probit. INPUT conc n x.7289 0. The deviance can be interpreted as a χ2 variate on 3 degrees of freedom.90 5. MODEL x/n = logconc / LINK = probit DIST = bin . Part of the output is as follows: The GENMOD Procedure Model Information Description Value Data Set WORK.5797 Pearson Chi-Square 3 1.1 46 24 3.5763 Scaled Pearson X2 3 1. This section gives information about the fit of the model to the data. Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 3 1.8 48 16 2.

4747 Scaled Deviance 3 1. 66).0001 LOGCONC 1 4.2132 · log (conc) . -119. The output finally contains an Analysis of Parameter estimates.8875 + 4.4241 0.4218 0.1 using a binomial distribution with a logit link function. The fit of the model is excellent.68535 = 4.0001 SCALE 0 1.2132 conc = 100.68535 giving b β1 4. This gives estimates of model parameters. .3501 68.5.2132 0. In this case.4739 Scaled Pearson X2 3 1. their standard errors.5919 0.0085 0.8875 0. The program and part of the output are similar to the probit analysis.0000 0.8875 0 log (conc) = − = = 0.4 Logit (logistic) regression Example 5.8942 .4783 77.8456.4241 0. The dose that aﬀects 50% of the animals (ED50 ) can be calculated: if p = 0.4218 0. the estimated model is Φ−1 (p) = −2.5 then Φ−1 (p) = 0 from which b β 2.4747 Pearson Chi-Square 3 1. we can al- ternatively analyze the data in Table 5.2 Since the logit and probit links are very similar.0000 . c Studentlitteratur ° . ¤ 5. as for the probit analysis case: Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 3 1.4739 Log Likelihood . the model is satisfactory” (p. Binary and binomial response variables 91 that when the deviance on fitting a linear logistic model is approximately equal to its degrees of freedom. and a Wald test of each parameter in the form of a χ2 test. Analysis Of Parameter Estimates Parameter DF Estimate Std Err ChiSquare Pr>Chi INTERCEPT 1 -2.

6429 57.0000 0. 5.1. .3 The data in Table 5. This is similar to the estimate provided by the probit analysis.1462 · log (conc)) pb = . Multiple logistic regression The parameter estimates are given by the last part of the output: Analysis Of Parameter Estimates Parameter DF Estimate Std Err ChiSquare Pr>Chi INTERCEPT 1 -4. Note that the estimated proportion aﬀected at a given concentration can be obtained from exp (−4. 1 + exp (−4. 1 − pb The estimated model parameters permits us to estimate e.5. Snedecor and Cochran.1462 which. this value is b β −4.8869 0 ED50 = − =− = 0. the dose that gives a 50% eﬀect (ED50 ) as the value of log (conc) for which p = 0.5.0000 .0001 LOGCONC 1 7.5 log 1−0. is 100.1462 · log (conc)) ¤ It can be mentioned that data in the form of proportions have previously often been analyzed as general ³p ´linear models by using the so called Arc sine transformation y = arcsin pb (see e.0744 0.5. Since 0.83.8869 + 7.8869 0. on the dose scale.g. were col- lected to explore whether it was possible to diagnose nodal involvement in c Studentlitteratur ° . The resulting estimated model is pb logit (b p) = log = −4.5 Multiple logistic regression 5.1 Model building Model building in multiple logistic regression models can be done in essen- tially the same way as in standard multiple regression. taken from Collett (1991).1462 log (conc) .6839 = 4.8869 + 7.8869 + 7.6839 b β1 7. 1980).1462 0.92 5.7757 0.0001 SCALE 0 1.8928 64. Example 5.g.5 = 0.

6 2 0 0 0 0 59 .4 8 0 0 0 0 64 .7 0 0 1 0 1 63 .94 5.6 2 1 0 0 0 57 .7 8 0 0 0 0 60 .9 9 0 0 1 1 63 .2 6 1 1 1 1 Table 5.5 1 1 1 1 1 50 .6 6 0 1 1 0 51 .5 6 0 0 1 1 65 .4 6 1 0 0 0 58 .4 0 0 1 1 0 68 .5 0 0 1 0 0 66 .4 9 0 1 0 1 49 .5 5 1 0 0 0 65 .6 7 1 0 1 1 67 .4 7 0 0 1 0 53 .5 9 0 1 1 0 65 .8 4 1 1 1 1 56 .9 5 0 1 0 0 67 .6 7 0 1 1 1 61 1 .7 8 1 1 1 1 67 .4 0 0 1 0 0 58 .6 3 1 1 1 0 58 .7 5 0 0 0 0 67 .8 7 0 0 0 0 57 .8 9 1 1 0 1 68 1 .6 7 0 1 0 1 59 .1: Predictors of nodal involvement on prostate cancer patients c Studentlitteratur ° .5 0 0 0 1 0 50 . Multiple logistic regression A ge A cid X ray S ize G rade I n v o lv A ge A cid X ray S ize G rade I n v o lv 66 .7 2 1 1 0 1 56 .7 0 0 1 1 1 56 .5 5 0 1 1 0 60 .6 5 0 0 0 0 53 .5.0 2 0 1 0 0 51 .5 0 0 0 0 0 64 .8 3 0 0 0 0 45 .8 2 0 1 0 1 64 1 .5 2 0 0 0 0 46 .9 8 0 0 0 0 56 .5 0 0 0 0 0 52 .5 0 0 1 1 0 56 .7 6 0 1 0 0 67 .8 1 1 1 1 1 60 .7 1 0 0 0 0 61 1 .4 9 0 0 0 0 66 .5 2 0 0 0 0 63 .5 6 0 0 0 0 61 .3 6 1 0 0 1 51 .4 9 0 0 0 0 65 .8 2 0 0 0 1 64 .7 6 1 1 1 1 52 .4 8 0 1 1 0 61 .4 8 1 1 0 1 60 .

5. Binary and binomial response variables 97

Residual Model Diagnostics
Normal Plot of Residuals I Chart of Residuals
2.5 3
1
2.0
2 11
1.5
5552 3.0SL=1.417

Residual
1.0
Residual

1
0.5 22
2
0.0 0 X =-0.05895
-0.5
-1.0 -1
5 5 -3.0SL=-1.535
-1.5 55
-2.0 -2 1

-2 -1 0 1 2 0 10 20 30 40 50
Normal Score Observation Number

Histogram of Residuals Residuals vs. Fits
20 2.5
2.0
1.5
Frequency

Residual
1.0
10 0.5
0.0
-0.5
-1.0
-1.5
0 -2.0
-2.0-1.5-1.0-0.5 0.0 0.5 1.0 1.5 2.0 2.5 0.0 0.5 1.0
Residual Fit

Figure 5.3: Residual plots for nodal involvement data

Stepwise selection: This is a modification of the forward selection model.
Variables are added to the model step by step. In each step, the procedure
also examines whether variables already in the model can be deleted.
Best subset selection: For k = 1, 2, ... up to a user-specified limit, the method
identifies a specified number of best models containing k variables. Tests
for this method are based on score statistics (see Chapter 2).

Although automatic variable selection methods may sometimes be useful for
“quick and dirty” model building, they should be handled with caution.
There is no guarantee that an automatic procedure will always come up
with the correct answer; see Agresti (1990) for a further discussion.

5.5.3 Model diagnostics
As an illustration of model diagnostics for logistic regression models, the pre-
dicted values and the residuals were stored as new variables for the multiple
logistic regression data (Table 5.1). Based on these, a set of standard di-
agnostic plots was prepared using Minitab. These plots are reproduced in
Figure 5.3.

c Studentlitteratur
°

98 5.6. Odds ratios

It appears that the distribution of the (deviance) residuals is reasonably
Normal. The “runs” that are shown in the I chart appear because of the
way the data set was sorted. Note that the Residuals vs. Fits plot is not
very informative for binary data. This is because the points scatter in two
groups: one for observations with y = 1 and another group for observations
with y = 0.

5.6 Odds ratios
If an event occurs with probability p, then the odds in favor of the event is
p
Odds = . (5.6)
1−p
For example, if an event occurs with probability p = 0.75, then the odds in
favor of that event is 0.75/ (1 − 0.75) = 3. This means that a “success” is
three times as likely as a “failure”. If the odds are known, the probability p
Odds
can be calculated as p = Odds+1 .
A comparison between two events, or a comparison between e.g. two groups
of individuals with respect to some event, can be made by computing the
odds ratio
p1 / (1 − p1 )
OR = . (5.7)
p2 / (1 − p2 )
If p1 = p2 then OR = 1. An odds ratio larger than 1 is an indication that
the event is more likely in the first group than in the second group. The odds
ratio can be estimated from sample data as long as the relevant probabilities
can be estimated. Estimated odds ratios are, of course, subject to random
variation.
If the probabilities are small, the odds ratio can be used as an approximation
to the relative risk, which is defined as
p1
RR = . (5.8)
p2
In some sampling schemes, it is not possible to estimate the relative risk but
it may be possible to estimate the odds ratio. One example of this is in
so called case-control studies. In such studies a number of patients with a
certain disease are studied. One (or more) healthy patient is selected as a
control for each patient in the study. The presence or absence of certain risk
factors is assessed both for the patients and for the controls. Because of the
way the sample was selected, the question whether the risk factor is related
to disease occurrence cannot be answered by computing a risk ratio, but it
may be possible to estimate the odds ratio.

c Studentlitteratur
°

5. Binary and binomial response variables 99

Example 5.4 Freeman (1989) reports on a study designed to assess the
relation between smoking and survival of newborn babies. 4915 babies to
young mothers were followed during their first year. For each baby it was
recorded whether the mother smoked and whether the baby survived the first
year. Data are as follows:

Survived
Smoker Yes No
Yes 499 15
No 4327 74

The probability of dying for babies to smoking mothers is estimated as
15/(499 + 15) = 0.02918 and for non-smoking mothers it is 74/(4327 + 74) =
0.01681. The odds ratio is 0.02918/(1−0.02918)
0.01681/(1−0.01681) = 1.758. The odds of death for
the baby is higher for smoking mothers. ¤
Odds ratios can be estimated using logistic regression. Note that in logistic
p
regression we use the model log 1−p = α + βx where, in this case, x is a
dummy variable with value 1 for the smokers and 0 for the nonsmokers. This
is the log of the odds, so the odds is exp (α + βx). Using x = 1 in the
numerator and x = 0 in the denominator gives the odds ratio as

exp (α + β)
OR = = eβ
exp (α)

Thus, the odds ratio can be obtained by exponentiating the regression coeﬃ-
cient β in a logistic regression. We will use the same data as in the previous
example to illustrate this. A SAS program for this analysis can be written
as follows:

DATA babies;
INPUT smoking \$ survival \$ n;
CARDS;
Yes Yes 499
Yes No 15
No Yes 4327
No No 74
;
PROC GENMOD DATA=babies order=data;
CLASS smoking;
FREQ n;
MODEL survival = smoking/
RUN;

Part of the output is as follow:

c Studentlitteratur
°

100 5.7. Overdispersion in binary/binomial models

Criteria For Assessing Goodness Of Fit

Criterion DF Value Value/DF

Deviance 4913 886.9891 0.1805
Scaled Deviance 4913 886.9891 0.1805
Pearson Chi-Square 4913 4914.9964 1.0004
Scaled Pearson X2 4913 4914.9964 1.0004
Log Likelihood -443.4945

The model fit, as judged by Deviance/df, is excellent.

Analysis Of Parameter Estimates

Standard Wald 95% Chi-
Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq

Intercept 1 -4.0686 0.1172 -4.2983 -3.8388 1204.34 <.0001
smoking Yes 1 0.5640 0.2871 0.0013 1.1267 3.86 0.0495
smoking No 0 0.0000 0.0000 0.0000 0.0000 . .
Scale 0 1.0000 0.0000 1.0000 1.0000

There is a significant relationship between smoking and the risk of dying
for the babies (p = 0.0495). The odds ratio can be calculated as e0.5640 =
1.7577 which is the same result as we got above by hand calculation. But
the Genmod procedure also gives a test and a confidence interval for the
parameter.

5.7 Overdispersion in binary/binomial
models
Overdispersion means that the variance of the response is larger than would
be expected for the chosen model. For binomial models, the variance of
y =“number of successes” is np (1 − p), and the variance of pb = ny is p(1−p)
n . A
simple way to illustrate over-dispersion is to consider a simple dose-response
experiment where the same dose has been used on two batches of animals.
Suppose that the chosen dose has eﬀect on 10 out of 50 animals in one
of the replications, and on 20 out of 50 animals in the other replication.
This means that there is actually a significant diﬀerence between the two
replications (p = 0.029). In other less extreme cases, there may be a tendency
for the responses to diﬀer, even if the results are not significantly diﬀerent
at any given dose. Still, when all replications are considered together, a
value of the Deviance/df statistic appreciably above unity may indicate that
overdispersion is present in the data.

c Studentlitteratur
°

where in the clustered case it can be assumed that σ2 = 1 + (k − 1) τ 2 .7.7. an alternative way of estimating the dispersion parameter is to use the observed value of Deviance/df (or Pearson χ2 /df ) as an estimate. This is discussed in Chapter 8. For exam- ple. However.5. that can be estimated from r 1 X (yj − nj pb) b2 = σ (5.2 Modeling as a beta-binomial distribution If one can suspect some form of clustering. If the structure of the clustering is unknown. Then. One way to model such over-dispersion is to assume that the mean value is still E (y) = np but that the variance takes the form V ar (y) = np (1 − p) σ2 . and thus be genetically diﬀerent. This value of the dispersion parameter is then kept constant in all later analyses of the data. this distribution is not at present available in the Genmod procedure.9) r − 1 j=1 nj pb (1 − pb) where r is the number of clusters. even if they are not significant. the distribution of y will follow a distribution called the Beta-binomial distribution. the distribution of y will be a so called compound distribution which can be derived. this can be done via the between-cluster variance. k is the cluster size. diﬀerent batches of animals may come from diﬀerent parents. This means that the observations are not independent. Estimation using Quasi-likelihood methods is an alternative approach to modeling overdispersion. that contains all relevant factors. another approach to the model- ing is to assume that y follows a binomial distribution within clusters but that the parameter p follows some random distribution over clusters. A rather simple case is obtained when the distribution of p is a Beta distribution. for example models in designed experiments.1 Estimation of the dispersion parameter One way to account for over-dispersion is to estimate the over-dispersion parameter from the data. Binary and binomial response variables 101 A common source of over-dispersion is that the data display some form of clustering. If the data have a known cluster structure. The dispersion parameter is estimated from this model. Here. 5. 5. and to re-run the analysis using this value. This option is present in Proc Genmod. c Studentlitteratur ° . For simple models. If the distribution of p is known. an alter- native is to ask the software to use a Maximum Likelihood estimate of the dispersion parameter. A useful recommendation is to run a “maximal model”.

Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 17 33. is given below. and also to compare the two types of host plants.9575 Pearson Chi-Square 17 31. are: O.0000 0.0000 0. The data taken from Collett (1991).0000 0.0000 0. aegyptiaca 73 Bean Cucumber Bean Cucumber y n y n y n y n 10 39 5 6 8 16 3 12 23 62 53 74 10 30 22 41 23 81 55 72 8 28 15 30 26 51 32 51 23 45 32 51 17 39 46 79 0 4 3 7 10 13 It was of interest to compare the two varieties.0000 c Studentlitteratur ° .0000 0.0000 Scale 0 1.3 An example of over-dispersed data Example 5.8618 Log Likelihood -543.102 5. Part of the output for a model containing Variety.0000 0.0000 host Bean 1 -1. The ratio Deviance/df is nearly 2.1775 host Cucumber 0 0. p = 0. The model does not fit well (Deviance=33.7600 0.1106 Algorithm converged. A number of batches of Orobanche seeds of two varieties were grown on extract from Bean or Cucumber roots.6511 1.0000 variety*host 75 Cucumber 0 0. indicating that overdispersion may be present. Analysis Of Parameter Estimates Standard Parameter DF Estimate Error Intercept 1 0.01). An analysis of these data using a binomial distribution with a logit link revealed that an interaction term was needed.6511 1.7781 0. aegyptiaca 75 O. Host and Variety*Host.7.3182 0.0000 variety*host 75 Bean 0 0. Overdispersion in binary/binomial models 5.2778 1. and the number of seeds germinating was recorded.8618 Scaled Pearson X2 17 31.7.9575 Scaled Deviance 17 33.2778 1.3064 variety*host 73 Cucumber 0 0.5 Orobanche is a parasital plant that grows on the roots of other plants.0000 variety*host 73 Bean 1 0.1250 variety 73 1 -0.6322 0.2778 on 17 df .2100 variety 75 0 0.

0000 Pearson Chi-Square 17 31.48 <.0001 variety*host 1 17 3.15 0.27 0.27 0.29 0. Analysis Of Parameter Estimates Standard Parameter DF Estimate Error Intercept 1 0.7600 0.0000 LR Statistics For Type 3 Analysis Chi- Source Num DF Den DF F Value Pr > F Square Pr > ChiSq variety 1 17 1.3182 0.41 0.0000 variety*host 75 Bean 0 0. but the estimated standard errors are larger when we include a scale parameter. Part of the output was as follows: Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 17 33. The parameter estimates are identical to those of the previous analysis.6322 0.4487 The procedure now uses a scaled deviance of 1.0000 variety*host 73 Bean 1 0.6511 1.53 0.1121 host 1 37.29 0.0114 As a second analysis.070 ¤ c Studentlitteratur ° .2483 host Cucumber 0 0.0000 0.9511 Log Likelihood -277.4287 variety*host 73 Cucumber 0 0.0001 variety*host 1 6.9575 Scaled Deviance 17 17.0000 0. Binary and binomial response variables 103 LR Statistics For Type 3 Analysis Chi- Source DF Square Pr > ChiSq variety 1 2.1748 variety 73 1 -0. This has the eﬀect that the Variety*Host interaction is no longer significant.2561 host 1 17 19.8618 Scaled Pearson X2 17 16.2938 variety 75 0 0.0000 host Bean 1 -1.3991 0.0004 19.0881 3.0000 Scale 0 1.0000 0.7781 0.0000 0.0000 0.5.1690 0.0000 1.15 <.2778 1.0000 variety*host 75 Cucumber 0 0. the data were analyzed using the automatic feature in Genmod to estimate the scale parameter from the data using the Maximum Likelihood method.00.2718 1.

0 15 0 20 B 1 60.8 15 2 20 A 2 65.5 15 7 20 A 4 70.5 15 3 20 B 4 70.8 15 2 20 B 3 65.8 20 3 20 B 4 75.8 20 7 20 The data set given above contains data from an experiment studying the survival of snails.5 10 0 20 B 2 70. Rel.5 15 0 20 A 2 70.0 20 1 20 B 2 60.0 15 7 20 B 4 60.5 10 3 20 B 4 70.8.0 10 12 20 A 4 60.8 15 4 20 B 4 65.0 10 0 20 B 2 60.8 10 4 20 B 4 65.5 20 6 20 A 3 75.0 20 16 20 A 4 65.8 10 0 20 A 2 65.8 Exercises Exercise 5.8 20 0 20 A 1 70.0 15 14 20 A 4 60.8 10 2 20 A 3 75.8 10 0 20 B 2 65.0 20 11 20 A 3 65.8 20 1 20 A 2 70. Exercises 5.5 20 0 20 B 1 70.8 20 9 20 A 3 70.8 20 7 20 B 4 65.8 20 1 20 A 3 60.0 10 0 20 A 1 60.0 20 0 20 A 1 65.8 15 0 20 A 2 75.0 10 7 20 A 3 60.8 15 3 20 B 4 75.8 20 0 20 B 2 75.8 15 0 20 A 1 65.5 15 0 20 A 1 70.8 10 0 20 A 1 75. Groups of 20 snails were held for periods of 1.8 10 1 20 A 2 75.8 10 0 20 B 2 75.8 15 1 20 B 2 65.5 20 1 20 A 2 75.104 5.8 15 1 20 B 3 75.8 10 0 20 B 1 65.8 15 0 20 B 2 75.0 10 7 20 B 4 60.8 10 0 20 A 1 65.5 10 0 20 A 1 70.0 15 11 20 A 3 60. 3 or 4 weeks under controlled conditions.8 10 0 20 B 3 65.8 10 0 20 B 3 75.5 10 5 20 A 4 70.0 20 0 20 B 1 60.0 20 7 20 B 4 60. 2.8 15 3 20 A 3 75.8 15 0 20 B 1 65.8 15 0 20 A 1 75.5 20 5 20 B 4 70. where temperature and humidity were kept at assigned levels.5 10 0 20 A 2 70.8 10 4 20 A 4 75.8 20 2 20 B 3 75.5 20 9 20 A 4 75.8 15 0 20 B 1 75.0 15 1 20 B 2 60.5 15 2 20 B 3 70.5 15 4 20 A 3 70.8 10 0 20 B 1 75.0 15 3 20 A 2 60. Temp Deaths N sure Hum sure Hum A 1 60.5 20 0 20 B 2 70.8 10 4 20 A 3 65. Temp Deaths N Species Expo. The experiment was c Studentlitteratur ° .5 15 0 20 B 2 70.8 15 5 20 A 4 75.8 20 5 20 A 4 60.8 15 5 20 A 3 65.5 10 0 20 B 1 70.8 20 12 20 A 4 70. Rel.0 10 1 20 B 3 60.0 15 4 20 B 3 60. The snails were of two species (A or B).8 20 0 20 B 1 75.5 20 0 20 A 1 75.8 10 2 20 B 4 75.5 20 3 20 B 3 70.8 20 0 20 A 2 60.8 20 0 20 B 1 65.8 20 0 20 B 2 65.5 10 2 20 A 3 70.0 20 2 20 A 2 65.8 15 12 20 A 4 65.5 15 0 20 B 1 70.0 15 0 20 A 1 60.0 20 5 20 B 3 60.0 10 0 20 B 1 60.5 10 0 20 B 3 70.8 20 4 20 B 3 65.8 10 10 20 A 4 65.0 10 0 20 A 2 60.1 Species Expo.

3 Finney (1947) reported some data on the relative potencies of Rotenone. and the number of dead insects was recorded. Temp. Batches of insects were subjected to these treatments. Exercise 5. 2. in diﬀerent concentrations. Binary and binomial response variables 105 a completely randomized design. 2nd or 3rd Age Age of the person Sex male or female Survived 1=survived. Note that some age data are missing. and a mixture of these. 0=died Find a model that can predict probability of survival as functions of the given covariates. Also. Humidity.org. and possible interactions. The raw data are: c Studentlitteratur ° .2 The file Ex5_2. The variables are as follows: Species Snail species A or B Exposure Exposure in weeks (1.5. make residual diagnostics and leverage diagnostics.encyclopedia-titanica. Name Name of the person PClass Passenger class: 1st. Exercise 5. Deguelin. Background material for the data can be found on http://www. 3 or 4) Humidity Relative humidity (four levels) Temp Temperature in degrees Celsius (3 levels) Deaths Number of deaths N Number of snails exposed Analyze these data to find whether Exposure. or inter- actions between these have any eﬀects on survival probability.dat gives the following information about pas- sengers travelling on the Titanic when it sank in 1912.

18 48 38 1. Exercise 5. In particular.4 Fahrmeir & Tutz (2001) report some data on the risk of in- fection from births by Caesarian section. risk=1. The variable wt is the number of observations with a given combination of the other variables.48 49 47 1.89 49 42 0. for example.58 48 16 0.61 50 47 1.40 47 7 Analyze these data. there were 17 un-infected cases (infection=0) with planned=1. and antibio=1. examine whether the regression lines can be assumed to be parallel. 0=no infection). c Studentlitteratur ° .01 50 44 0.106 5. Three dichotomous covariates that might aﬀect the risk of infection were studied: planned Was the Caesarian section planned (=1) or not (=0) risk Were risk factors such as diabetes.71 46 24 0.70 48 48 1.00 46 27 0.71 46 22 0.41 50 6 Deguelin 1. excessive weight or others present (=1) or absent (=0) antibio Were antibiotics given as a prophylactic (=1) or not (=0) The data are included in the following Sas program that also gives the value of the variable infection (1=infection. Thus. The response variable of interest is the occurrence of infections following the operation.31 48 34 1.71 49 16 Mixture 1.8. Exercises Treatment ln(dose) n x Rotenone 1.40 50 48 1.31 46 43 1.00 48 18 0.

5.3147 Scaled Deviance 8 226.4759 30.9251 Pearson Chi-Square 7 261. Some results: Model 1 Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 8 226.2196 Model 3 Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 7 216.3430 Log Likelihood -108. CARDS. Model 1 plus an interaction planned*antibio 3.7440 36.4759 30. INPUT planned antibio risk infection wt.3147 Pearson Chi-Square 8 257.2508 32.4393 32.4010 37.5177 28.7440 36.1563 Scaled Pearson X2 8 257. Model 1 plus an interaction planned*risk 4.9251 Scaled Deviance 7 216. Binary and binomial response variables 107 data cesarian. A binomial Glim model with a logit link.2380 c Studentlitteratur ° . The following analyses were run on these data: 1.4393 32.3430 Scaled Pearson X2 7 261.1563 Log Likelihood -113.5177 28.2588 Model 2 Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 7 226.4010 37. discussed below.3920 Log Likelihood -113.3485 Pearson Chi-Square 7 254. The same as model 3 but with some extra features. with only the main eﬀects 2.2508 32. 1 1 1 1 1 1 1 1 0 17 1 1 0 1 0 1 1 0 0 2 1 0 1 1 28 1 0 1 0 30 1 0 0 1 8 1 0 0 0 32 0 1 1 1 11 0 1 1 0 87 0 1 0 1 0 0 1 0 0 0 0 0 1 1 23 0 0 1 0 3 0 0 0 1 0 0 0 0 0 9 .3920 Scaled Pearson X2 7 254.3485 Scaled Deviance 7 226.

108 5.8. Exercises

One problem with Model 3 is that no standard error, and no test, of the
parameter for the planned*risk interaction is given by Sas. This is because
the likelihood is rather flat which, in turn, depends on cells with observed
count = 0. Therefore, Model 4 used the same model as Model 3, but with
all values where Wt=0 replaced by Wt=0.5.
Model 4
Criteria For Assessing Goodness Of Fit

Criterion DF Value Value/DF

Deviance 11 231.5512 21.0501
Scaled Deviance 11 231.5512 21.0501
Pearson Chi-Square 11 446.4616 40.5874
Scaled Pearson X2 11 446.4616 40.5874
Log Likelihood -115.7756

Algorithm converged.

Analysis Of Parameter Estimates

Standard Wald 95% Confidence Chi-
Parameter DF Estimate Error Limits Square Pr > ChiSq

Intercept 1 2.1440 1.0568 0.0728 4.2152 4.12 0.0425
planned 1 -0.8311 1.1251 -3.0363 1.3742 0.55 0.4601
antibio 1 3.4991 0.5536 2.4141 4.5840 39.95 <.0001
risk 1 -3.7172 1.1637 -5.9980 -1.4364 10.20 0.0014
planned*risk 1 2.4394 1.2477 -0.0060 4.8848 3.82 0.0506
Scale 0 1.0000 0.0000 1.0000 1.0000

Questions: Use these data to answer the following questions:
A. Compare models 1 and 2 to test whether the planned*antibio interaction
is significantly diﬀerent from zero.
B. Compare models 1 and 3 to test whether the planned*risk interaction is
significantly diﬀerent from zero.
C. Explain why the Deviance in Model 4 has more degrees of freedom than
in Model 3.
D. Based on the results for Model 4, estimate the odds ratios for infection for
the factors in the model. Note that the program has modeled the probability
of not being infected. Calculate the odds ratios for not being infected and,
from these, the odds ratios of being infected.
E. Calculate predicted values and raw residuals for the first four observations
in the data.

Exercise 5.5 An experiment has been designed in the following way: Two
groups of patients (A1 and A2) were used. The groups diﬀered regarding the
type of diagnosis for a certain disease. Each group consisted of nine patients.
The patients in the two groups were randomly assigned to three diﬀerent
treatments: B1, B2 and B3, with three patients for each treatment in each
group.

c Studentlitteratur
°

5. Binary and binomial response variables 109

The blood pressure (Z) was measured at the beginning of the experiment on
each patient.
A binary response variable (Y) was measured on each patient at the end
of the experiment. It was modeled as g(µ) = XB, using some computer
package. The final model included the main eﬀects of A and B and their
interaction, and the eﬀect of the covariate Z. The slope for Z was diﬀerent
for diﬀerent treatments, but not for the diﬀerent patient groups or for the
A*B interaction.
A. Write down the complete design matrix X. You should include all dummy
variables, even those that are redundant. Covariate values should be repre-
sented by some symbol. Also write down the corresponding parameter vector
B.
B. The link function used to analyze these data was the logit link g (p) =
p
log 1−p . What is the inverse g −1 of this link function?

c Studentlitteratur
°

6. Response variables as
counts

6.1 Log-linear models: introductory example
Count data can be summarized in the form of frequency tables or as contin-
gency tables. The data are then given as the number of observations with
each combination of values of some categorical variables. We will first look
at a simple example with a contingency table of dimension 2×2.

Example 6.1 Norton and Dunn (1985) studied possible relations between
snoring and heart problems. For 2484 persons it was recorded whether the
person had any heart problems and whether the person was a snorer. An
interesting question is then whether there is any relation between snoring
and heart problems. The data are as follows:
Snores
Heart problems Seldom Often Total
Yes 59 51 110
No 1958 416 2374
2017 467 2484

We assume that the persons in the sample constitute a random sample from
some population. Denote with pij the probability that a randomly selected
person belongs to row category i and column category j of the table. This
can be summarized as follows:
Snores
Heart problems Seldom Often Total
Yes p11 p12 p1·
No p21 p22 p2·
p·1 p·2 1

A dot in the subscript indicates a marginal probability. For example, p·1
denotes the probability that a person snores seldom, i.e. p·1 = p11 + p21 .
¤

111

Then the model becomes ¡ ¢ log µij = µ + αi + β j + (αβ)ij . (6. the eﬀect of variable A).  log (µ21 )  =  1 0 1  (6. for example α1 and β 1 .112 6. In our example. The terms (αβ)ij represent interac- tion between the factors A and B. and that the link function is log. This is analogous to ANOVA models. One way to constrain the parameters is to set the last parameter of each kind equal to zero. By taking the logs of both sides we get an additive model assuming independence: ¡ ¢ log µij = log (n) + log (pi· ) + log (p·j ) (6. This is a multiplicative model. i.1 A log-linear model for independence If snoring and heart problems were statistically independent. αi denotes the row eﬀect (i. j).e. the model for our example data can then be written as     log (µ11 ) 1 1 1    log (µ12 )   1 1 0  µ      α1  . This is a model that we would like to compare with the more general model that snoring and heart problems are dependent. r = c = 2 so we need only one parameter αi and one β j . eﬀects are often denoted with symbols like λX i . We can see that this model is a linear model (a linear predictor). where n is the total sample size and µij is the expected number in cell (i. but we keep a notation that is in line with the notation of previous chapters.2 When independence does not hold If independence does not hold we need to include in the model terms of type (αβ)ij that account for the dependence.2) β1 log (µ22 ) 1 0 0 6. In GLIM terms.1. Log-linear models: introductory example 6. Thus. In (6.1). we can state the models in terms of expected frequencies µij = npij . the independence model states that µij = npi· p·j .e.e. it would hold that pij = pi· p·j for all i and j. Note that a model for a crosstable of dimension r × c can include at most (r − 1) parameters for the row eﬀects and (c − 1) parameters for the column eﬀect. the eﬀect of variable B). and β j denotes the column eﬀect (i. Models of type (6. the eﬀect of one variable depends on the level of the other variable. Instead of modeling the probabilities.3) c Studentlitteratur ° .1) = µ + αi + β j .1. In log-linear model literature.1) are called log-linear models.1.

6. Response variables as counts 113

Any two-dimensional cross-table can be perfectly represented by the model
(6.3); this model is called the saturated model. We can test the restrictions
imposed by removing the parameters (αβ)ij by comparing the deviances: the
saturated model will have deviance 0 on 0 degrees of freedom, so the deviance
from fitting the model (6.1) can be used directly to test the hypothesis of
independence.

6.2 Distributions for count data
So far, we have seen that a model for the expected frequencies in a crosstable
can be formulated as a log-linear model. This model has the following prop-
erties:

The predictor is a linear predictor of the same type as in ANOVA.
The link function is a log function.

It remains to discuss what distributional assumptions to use.

6.2.1 The multinomial distribution
Suppose that a nominal variable Y has k distinct values y1 , y2 , ..., yk such
that no implicit ordering is imposed on the values. In fact, the values might
be observations on a single nominal variable, or they might be observations
on cell counts in a multidimensional contingency table. The probabilities
associated with the diﬀerent values of Y are p1 , p2 , ..., pk . We make n obser-
vations on Y . If the observations are independent, the probability to get n1
observations with Y = y1 , n2 observations with Y = y2 , and so on, is
n!
P (n1 , n2 , . . . , nk |n) = pn1 · pn2 2 · . . . · pnk k (6.4)
n1 ! · n2 ! · . . . · nk ! 1
Note that in (6.4), the total sample size n = n1 + n2 + . . . + nk is regarded
as fixed. Also note that the expression simplifies to the binomial distribution
for the case k = 2. The distribution given in (6.4) is called the multinomial
distribution. The multinomial distribution is a multivariate distribution since
it describes the joint distribution of y1 , y2 , ..., yk . It can be seen as a multi-
variate generalization of an exponential family distribution (see e.g. Agresti,
1990).

c Studentlitteratur
°

114 6.2. Distributions for count data

6.2.2 The product multinomial distribution
A contingency table may have some of its totals fixed by the design of the
data collection. For example, 500 males and 500 females might have been
interviewed in a survey. In such cases it is not meaningful to talk about the
random distribution of the “gender” variable. For such data each “slice” of
the table subdivided by gender may be seen as one realization of a multino-
mial distribution. The joint distribution of all cells of the table is then the
product of several multinomial distributions, one for each slice. This joint
distribution is called the product multinomial distribution.

6.2.3 The Poisson distribution
Suppose, again, that a nominal variable Y has k distinct values y1 , y2 , ..., yk .
We observe counts n1 , n2 , . . . , nk . The expected number of observations in
cell i is µi . If the observations arrive randomly, the probability to observe ni
observations in cell i is
e−µi µni i
p (ni ) = (6.5)
ni !
which is the probability function of a Poisson distribution. Note that in
this case, the total sample size n is not regarded as fixed. This is the main
diﬀerence between this sampling scheme and the multinomial case. Since
sums of Poisson variables follow a Poisson distribution, n in itself follows a
P
k
Poisson distribution with mean value µi .
i=1

6.2.4 Relation to contingency tables
Contingency tables can be of many diﬀerent types. In some cases, the total
sample size is fixed; an example is when it has been decided that n = 1000
individuals will be interviewed about some political question. In some cases
even some of the margins of the table may be fixed. An example is when
500 males and 500 females will participate in a survey. A table with a fixed
total sample size would suggest a multinomial distribution; if in addition one
or more of the margins are fixed we would assume a product multinomial
distribution. However, as noted by Agresti (1996), “For most analyses, one
need not worry about which sampling model makes the most sense. For the
primary inferential methods in this text, the same results occur for the Pois-
son, multinomial and independent binomial/multinomial sampling models”
(p. 19).

c Studentlitteratur
°

6. Response variables as counts 115

Suppose that we observe a contingency table of size i × j. The probability
that an observation will fall into cell (i, j) is pij . If the observations are
independent and arrive randomly, the number of observations falling into cell
(i, j) follows a Poisson distribution with mean value µij , if the total sample
size n is random. If the cell counts nij follow a Poisson distribution then the
conditional distribution of nij |n is multinomial. The Poisson distribution is
often used to model count data since it is rather easy to handle.
Note, however, there is no guarantee that a given set of data will adhere to
this assumption. Sometimes the data may show a tendency to “cluster” such
that arrival of one observation in a specific cell may increase the probabil-
ity that the next observation falls into the same cell. This would lead to
overdispersion. We will discuss overdispersion for Poisson models in a later
section; a distribution called the negative binomial distribution may be used
in some such cases. For the moment, however, we will see what happens if we
tentatively accept the Poisson assumption for the data on snoring and heart
problems.

6.3 Analysis of the example data
Example 6.2 We analyzed the data on page 111 using the Genmod proce-
dure with Poisson distribution and a log link. The program was:

DATA snoring;
INPUT snore heart count;
CARDS;
1 1 51
1 0 416
0 1 59
0 0 1958
;
PROC GENMOD DATA=snoring;
CLASS snore heart;
MODEL count = snore heart /
DIST = poisson
;
RUN;

c Studentlitteratur
°

116 6.3. Analysis of the example data

The output contains the following information:

Criteria For Assessing Goodness Of Fit

Criterion DF Value Value/DF

Deviance 1 45.7191 45.7191
Scaled Deviance 1 45.7191 45.7191
Pearson Chi-Square 1 57.2805 57.2805
Scaled Pearson X2 1 57.2805 57.2805
Log Likelihood . 15284.0145 .

Analysis Of Parameter Estimates

Parameter DF Estimate Std Err ChiSquare Pr>Chi

INTERCEPT 1 3.0292 0.1041 847.2987 0.0001
SNORE 0 1 1.4630 0.0514 811.6746 0.0001
SNORE 1 0 0.0000 0.0000 . .
HEART 0 1 3.0719 0.0975 992.0240 0.0001
HEART 1 0 0.0000 0.0000 . .
SCALE 0 1.0000 0.0000 . .

NOTE: The scale parameter was held fixed.

A similar analysis that includes an interaction term would produce a deviance
of 0 on 0 df . Thus, the diﬀerence between our model and the saturated model
can be tested; the diﬀerence in deviance is 45.7 on 1 degree of freedom which is
highly significant when compared with the corresponding χ2 limit with 1 df .
We conclude that snoring and heart problems do not seem to be independent.
Note that the Pearson chi-square of 57.28 on 1 df presented in the output is
based on the textbook formula
¡ ¢2
X nij − µ bij
2
χ = .
i,j
bij
µ

The conclusion is the same, in this case, but the tests are not identical.
The output also gives us estimates of the three parameters of the model:
b = 3.0292, α
µ b 1 = 1.4630 and βb = 3.0719. An analysis of the saturated model
1
[ = 1.4033. From
would give an estimate of the interaction parameter as (αβ) 11
this we can calculate the odds ratio OR as

OR = exp (1.4033) = 4.07.

Patients who snore have a four times larger odds of having heart problems.
Odds ratios in log-linear models is further discussed in a later section. ¤

c Studentlitteratur
°

Thus. Example 6. CARDS.4 Testing independence in an r×c crosstable The methods discussed so far can be extended to the analysis of cross-tables of dimension r × c. PROC GENMOD DATA=chisq. c Studentlitteratur ° . The corresponding GLIM approach is to model the expected number of beetles as a function of season and color. CLASS season color. The results are given as: Season Red Other Total Early spring 29 11 40 Late spring 273 191 464 Early summer 8 31 39 Late summer 64 64 128 Total 374 297 671 A standard analysis of these data would be to test whether there is indepen- dence between season and color through a χ2 test. Early_spring red 29 Early_spring other 11 Late_spring red 273 Late_spring other 191 Early_summer red 8 Early_summer other 31 Late_summer red 64 Late_summer other 64 . MODEL no = season color / DIST=poisson LINK=log . run. INPUT season \$ color \$ no. The observed numbers in each cell are assumed to be generated from an underlying Poisson distribution.3 Sokal and Rohlf (1973) presented data on the color of the Tiger beetle (Cicindela fulgida) for beetles collected during diﬀerent seasons. The canonical link for the Poisson distribution is the log link.6. Response variables as counts 117 6. a Genmod program for these data is DATA chisq.

1 A three-way table The arguments used above for the analysis of two-dimensional contingency tables can be generalized to tables of higher order.2280 Scaled Pearson X2 3 27.118 6. 2276 high school seniors were asked whether they had ever used Alcohol (A).6840 9.7264 .5321 Scaled Deviance 3 28. 2628. A general (saturated) model for a three-way table can be written as ¡ ¢ log µijk = µ + αi + β j + γ k + (αβ)ij + (αγ)ik + (βγ)jk + (αβγ)ijk (6. The Pearson chi- square is again the same value as would be obtained from a standard chi- square test. Thus.5321 Pearson Chi-Square 3 27.5.5 Higher-order tables 6. c Studentlitteratur ° .6 on 3 df which is highly significant.6) An important part of the analysis is to decide which terms to include in the model.00 on 0 df . Cigarettes (C) and/or Marijuana (M). Higher-order tables Part of the output is Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 3 28.2280 Log Likelihood .6 is a large-sample test of independence between color and season.6840 9.4 The table below contains data from a survey from Wright State University in 19921 .5. Alcohol Cigarette Marijuana use use use Yes No Yes Yes 911 538 ¤ No 44 456 No Yes 3 43 No 2 279 1 Data quoted from Agresti (1996) who credited the data to Professor Harry Khamis.5964 9. independence is tested by comparing the deviance of this model with the deviance that would be obtained if the Season*Color interaction was included in the model.5964 9. Formally. ¤ 6. This saturated model has deviance 0. This is a three-way contingency table of dimension 2 × 2 × 2. it is also highly significant. The deviance is 28. Example 6. the deviance 28.

6.4011 0. and between A and M. The pres- ence of an interaction.53). The fit of this model is good.3740 0.5. In this example this would mean that use of one drug does not change the risk of using any other drug.e.3740 0.3740 Pearson Chi-Square 1 0.6. for example A*C.2 Types of independence Models for data of the type given in the last example can include the main eﬀects of A.4011 Scaled Pearson X2 1 0. but not between C and M. is called a homogenous association model.4011 Log Likelihood .6124 .3740 Scaled Deviance 1 0.and threeway interactions was fitted to the data as a baseline. The parameter estimates for this model are as follows: c Studentlitteratur ° . A model that contains all interactions up to a certain level. Response variables as counts 119 6.4011 0. The output for the homogenous asso- ciation model containing all two-way interactions was as follows: Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 1 0. controlling for A. the model A C M is called a mutual independence model.3 Genmod analysis of the drug use data The saturated model that contains all main eﬀects and all two. we will return to this topic soon. a simple rule of thumb is that Value/df should not be too much larger than 1. C and M are then said to be conditionally independent. A model that only contains the main eﬀects. means that students who use alcohol have a higher (or lower) probability of also using cigarettes. C and M and diﬀerent interactions containing these.5. The three-way interaction A*C*M was not significant (p = 0. One way of in- terpreting interactions is to calculate odds ratios. 12010. i. A model of type A C M A*C A*M would permit interaction between A and C. but no higher- order interactions.

.0000 0. C*M Yes Yes 0 0.0000 . . it holds that.0001 A No 1 -5.0000 0.4647 41.0001 A*C No Yes 0 0.4518 0.3180 0. .0000 . In contrast.9860 0.0000 . C No 1 -3.0000 0.0000 0.4 Interpretation through Odds ratios Consider. All remaining interactions in the model are highly significant which means that no further simplification of the model is suggested by the data. . for the moment.8) µ12· µ21· where the dot indicates summation over all levels of Z. and only one parameter is estimable for each interaction.0001 M Yes 0 0.0000 0.4522 149. .0000 0.0532 0.0000 0. Thus.0000 .0000 .8479 0.2933 0.0000 .0543 93.0000 .0001 A Yes 0 0.0331 42312.0000 0.1516 395.0000 . a 2 × 2 × k cross-table of variables X.0001 C*M No Yes 0 0. C*M Yes No 0 0.0000 0. C*M No No 1 2.4854 0.0000 0.0000 . .0000 0.120 6. h i b θXY = exp (αβ)[ + (αβ) [ − (αβ) [ − (αβ) [ (6. the chosen model does not contain any three-way interaction.0001 C Yes 0 0. . A*M Yes Yes 0 0. Y and Z. . SCALE 0 1. M No 1 -0. . A*C Yes Yes 0 0. A*C Yes No 0 0.8139 0. The odds ratios can be estimated from the parameter estimates.6463 0.5283 0.5.0001 A*M No Yes 0 0.5249 0. the partial odds ratios for the two-way interactions can be estimated as: c Studentlitteratur ° . A*M Yes No 0 0.1409 0.0158 0.1741 139.9) 11 22 12 21 In our drug use example.0000 .0000 0.0545 0. Higher-order tables Analysis Of Parameter Estimates Parameter DF Estimate Std Err ChiSquare Pr>Chi INTERCEPT 1 6.1638 302. Within a fixed level j of Z. .0000 . . for example.5. .7) µ12j µ21j where µ denotes expected values.0000 0. . A*M No No 1 2. A*C No No 1 2. the conditional odds ratio for describing the relationship between X and Y is µ11j µ22j θXY (j) = (6.0000 .0000 . 6. in the marginal odds ratio the value of the variable Z is ignored and we calculate the odds ratio as µ µ θXY = 11· 22· (6.

PROC GENMOD DATA=snoring ORDER=data.1321 -3.0000 0. This is the same as the estimate of the interaction parameter for the saturated log-linear model.9860) = 19. c Studentlitteratur ° .1987 1. CLASS snoring. 51 467 Yes 59 2017 No RUN. RUN.0000 1. .0001 snoring Yes 1 1. Scale 0 1.89 <. A Genmod program for this analysis is: DATA snoring.80 A*M: exp (2. MODEL x/n = snoring / DIST=Binomial LINK=logit.6.5021 0. 25 As an example of an interpretation. INPUT x n snoring \$. a student who has tried alcohol has an odds of also having tried marijuana of 19. The corresponding output is Analysis Of Parameter Estimates Standard Wald 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr > ChiSq Intercept 1 -3. an alternative to the log-linear model would be to model the probability of response as a function of the other variables in the table.0000 0. 81 C*M: exp (2.0545) = 7.0000 . Response variables as counts 121 A*C: exp (2. CARDS.0001 snoring No 0 0.47 <.1 Binary response If one binary variable in a contingency table can be regarded as the response. This can be done using logistic regression methods as outlined in Chapter 5. As a comparison. we will analyze the data on page 111 as a logistic regression. regardless of reported cigarette use.0000 0.7611 -3.6.2432 702.0000 0.7927 49.8479) = 17. 6.81.6 Relation to logistic regression 6.0000 We note that the parameter estimate for snoring is 1.0000 1.0139 1.4033 0.4033.

7 Capture-recapture data Capture-recapture data provide an interesting application of log-linear mod- els.6.2 Nominal logistic regression In log-linear models there is no variable that is regarded as the “dependent” variable. nominal logit models are not available in Proc Genmod. Capture-recapture data The odds ratio is OR = exp (1. the results and the interpretations are identical.4033) = 4. We capture and mark n1 of the individuals. however. that the models are written in diﬀerent ways. These logit equations should be estimated simultaneously. 6. The idea is as follows: One category of the nominal response variable is selected as a baseline. At present.122 6. M is unknown and we want to estimate M . If category 1 is the baseline. compared with the first.7. It turns out that s of c Studentlitteratur ° . one for each category except for the baseline. The saturated log- linear model regards the counts as functions of the row and column variables and their interaction: count = snoring heart snoring*heart. In some cases it may be preferable to regard one nominal variable as the re- sponse. Note. Although the models are written in diﬀerent ways. The satu- rated logistic model regards the proportion of persons with heart problems as a function of snoring status: x/n = snoring. The treatment of the row and column variables is symmetric. 6. the logits for the other categories. In such cases data can be modeled using nominal logistic regression. or reference. are µ ¶ pj logit (pj ) = log = Xβ. which makes the problem multivatiate. After some time we capture another n2 individuals. Suppose that there are M individuals in a population.07 which is also the same as in the log-linear model. p1 Thus we write (j − 1) logit equations. category. except for the case of ordinal response which is discussed in Chapter 7. This suggests that contin- gency tables where one of the variables may be regarded as a binary response can be analyzed either as a log-linear model or using logistic regression.

13) M M M for M . p123 = p1 p2 p3 . We have to assume independence between occasions.12) M M M and an estimate of the unknown population size can be obtained by solving n1 n2 n3 M = n + M(1 − )(1 − )(1 − ) (6. Response variables as counts 123 these were marked.11) M M M Thus. an estimator of the number of individuals that have never been cap- tured is n1 n2 n3 ˆ ¯1¯2¯3 = M · pˆ¯1¯2¯3 = M · (1 − n )(1 − )(1 − ) (6. the data can be written as a three-way contingency table. it would hold that. and we only use the information in the margins of the table. There are eight diﬀerent “capture patterns”: Notation Captured at occasion n123 1.10) pˆ s If the individuals are captured on three occasions. The expected number of individuals in this cell would then be µ123 = M p1 p2 p3 (6. for example. It is now relatively straightforward to estimate M as c = n1 = n2 · n1 M (6.14) c Studentlitteratur ° . A more flexible analysis of this kind of data can be obtained by using log-linear models. There are some drawbacks with the method outlined so far. where n is the number of individuals that have been captured at least once. the probability that an indi- vidual is never captured is n1 n2 n3 pˆ¯1¯2¯3 = (1 − )(1 − )(1 − ) (6. 2 and 3 n¯123 2 and 3 n1¯23 1 and 3 n¯1¯23 3 n12¯3 1 and 2 n¯12¯3 2 n1¯2¯3 1 n¯1¯2¯3 None If we assume independence between occasions.6. If the occasions are independent.

7. customs x5 Others c Studentlitteratur ° . in a capture- recapture sense. model count=x1 x2 x3 x4 x5 / dist=Poisson obstats residuals. po- lice or customs.16) Log-linear models are now often used to model capture-recapture data.15) where the occasions correspond to α.1 summarizes information about persons who were heavy misusers of drugs in Sweden in 1979. The SAS programs that were used for analysis had the structure proc genmod. β and γ. The individuals could appear in registers within the health care system.5 Table 6. for example. Thus. It is reasonable to assume that some of these sources of information are related. Thus. respectively. weight w. an individual who has been taken in charge by police is quite likely to appear also in the penal system at some stage. ln(µ123 ) = ln M + ln p1 + ln p2 + ln p3 = µ + α1 + β 1 + γ 1 (6. see Olsson (2000). In a similar way. the number of individuals that have never been captured. x1 to x5 refer to the following sources of information: Code Source of information x1 Health care x2 Social authorities x3 Penal system x4 Police. interactions between some or all of these sources are likely. penal system. This is a log-linear model. If the occasions are not independent. we can include parameters like (αβ)ij that account for the dependence.124 6. or others. In this program. we could write the expected numbers in all cells as linear functions of parameters. a log link and a feature to account for the fact that it is impossible to observe n¯1¯2¯3 . These correspond to the “captures”. Thus. class x1 x2 x3 x4 x5. The model specification includes a Poisson distribution for the numbers in each cell of the table. run. a general log-linear model can be written as ln(µijk ) = µ + α1 + β 1 + γ 1 + (αβ)11 + (αγ)11 + (βγ)11 + (αβγ)111 (6. social authorities. Capture-recapture data Taking logarithms. Example 6.

1: Swedish drug addicts with diﬀerent capture patterns in 1979. Response variables as counts 125 Table 6.6. Hospital Social Penal Police. Others Count care authorities system customs 0 0 0 0 0 0 0 0 0 0 1 45 0 0 0 1 0 2080 0 0 0 1 1 11 0 0 1 0 0 1056 0 0 1 0 1 11 0 0 1 1 0 942 0 0 1 1 1 9 0 1 0 0 0 1011 0 1 0 0 1 59 0 1 0 1 0 381 0 1 0 1 1 15 0 1 1 0 0 245 0 1 1 0 1 18 0 1 1 1 0 345 0 1 1 1 1 13 1 0 0 0 0 828 1 0 0 0 1 7 1 0 0 1 0 179 1 0 0 1 1 1 1 0 1 0 0 191 1 0 1 0 1 3 1 0 1 1 0 137 1 0 1 1 1 1 1 1 0 0 0 264 1 1 0 0 1 18 1 1 0 1 0 132 1 1 0 1 1 9 1 1 1 0 0 144 1 1 1 0 1 16 1 1 1 1 0 133 1 1 1 1 1 15 c Studentlitteratur ° .

5 on 14 df ). Thus..8 Poisson regression models We have seen that log-linear models for cross-tabulations can be handled as generalized linear models. . which was 12000 (Socialdepartementet 1980). 2. In addition.. 18 months prior to the interview. the final model was count =x1|x2|x3|x4|x5 @2 x1*x3*x4.. in a similar way as for regression models. shows the distribution of stressful events reported by 147 subjects who have experienced exactly one stressful event. We want to model the occurrence of stressful events as a function of time. Months 1 2 3 4 5 6 7 8 9 Number 15 11 14 17 5 11 10 4 8 Months 10 11 12 13 14 15 16 17 18 Number 10 7 9 11 3 6 1 1 4 One approach to modelling the occurrence of stressful events as a function of X=months is to assume that the number of persons responding for any month is a Poisson variate. The linear predictor then consists of a design matrix that contains dummy variables for the diﬀerent margins of the table.126 6.8. This model had a good fit to the data (χ2 = 17.. In the program example it is implicitly assumed that the diﬀerent sources of information are independent. This would mean that the number of drug addicts in Sweden in 1979 was 8319 + 8878 = 17197 individuals. The table gives the number of persons reporting a stressful event 1. This is more than 5000 individuals higher than the published result.. x1*x2. Poisson regression models The weight w has been set to 1 for all combinations except the combination where all x1. This combination cannot be observed and is a structural zero. Example 6.x5 are 0. The model esti- mates the number of uncaptured individuals as 8878 individuals with confi- dence interval 7640 − 10317 individuals. interactions can be included in the model as e.6 The table below. The published result was obtained through capture-recapture methods but assuming that the diﬀerent sources of information are independent... taken from Haberman (1978). It is quite possible to introduce quantitative variables into the model. In this rather large data set all two-way interactions were significant. ¤ 6.g. The canonical link for the Poisson distribution c Studentlitteratur ° . However. the interaction x1*x3*x4 was also significant.

MAKE 'obstats' out=ut.8451 . the p-value is 0. RUN.5356 Scaled Deviance 16 24.6.9476 0. 1 15 2 11 3 14 4 17 5 5 6 11 7 10 8 4 9 8 10 10 11 7 12 9 13 11 14 3 15 6 16 1 17 1 18 4 .4197 Log Likelihood .53.078.1.084x.80 − 0. Response variables as counts 127 is log. .5356 Pearson Chi-Square 16 22. A plot of the data. along with the fitted regression line.8639 0.8032 0.4197 Scaled Pearson X2 16 22. MODEL number = months / DIST=poisson LINK=log OBSTATS RESIDUALS. 174. PROC GENMOD DATA=stress.0001 MONTHS 1 -0.0000 0. Part of the output is Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 16 24. INPUT months number @@. The data have a less than perfect fit to the model.1482 357. so a first attempt to modelling these data is to assume that log (µ) = β 0 + β 1 x (6. ¤ c Studentlitteratur ° .0168 24.0000 . We find that the memory of stressful events fades away as log (µ) = 2.7145 1.0001 SCALE 0 1. A SAS program for this model can be written as DATA stress. is given as Figure 6.17) This is a generalized linear model with a Poisson distribution.5704 1.0838 0.7145 1. a log link and a simple linear predictor. CARDS.5704 1.2 shows the data and regression line with a log scale for the y-axis. with Value/df=1. Analysis Of Parameter Estimates Parameter DF Estimate Std Err ChiSquare Pr>Chi INTERCEPT 1 2. Figure 6.

Poisson regression models Figure 6.8.128 6.2: Distribution of persons remembering stressful events. log scale c Studentlitteratur ° .1: Distribution of persons remembering stressful events Figure 6.

5 4 M 4 5 5 O 8 . CLASS row col treat. β j is a column eﬀect and τ k is the eﬀect of treatment k. 1 2 3 4 5 1 P3 O2 N5 K1 M4 2 M6 K0 O6 N4 P4 3 O4 M9 K1 P6 N5 4 N 17 P8 M8 O9 K0 5 K4 N4 P2 M4 O8 We may model the number of wireworms in a certain plot as a Poisson distri- bution..7 The number of wireworms counted in the plots of a Latin square experiment following soil fumigations in the previous year is given in the following table2 .. αi is a row eﬀect. CARDS. PROC GENMOD DATA=Poisson.18) where β 0 is a general mean.. Response variables as counts 129 6. RUN. 2 Data from Snedecor √ and Cochran (1980). an “ANOVA-like” model for these data can be written as g (µ) = β 0 + αi + β j + τ k (6. More data lines . A SAS program for analysis of these data using Proc Genmod is: DATA Poisson.. INPUT Row Col Treat \$ Count. MODEL Count = row col treat / Dist=Poisson Link=Log Type3. a Column eﬀect and a Treatment eﬀect.6. Thus. The original analysis was an Anova on data transformed as y = x + 1 c Studentlitteratur ° . The design includes a Row eﬀect. 1 1 P 3 1 2 O 2 1 3 N 5 .9 A designed experiment with a Poisson distribution Example 6.

97.6851 0.7704 COL 4 1 -0.0029 TREAT M 1 0. COL 1 1 0.0368 ROW 5 0 0.0451 0.0000 0. Value/df should be closer to 1.2924 COL 2 1 -0.5008 Scaled Pearson X2 12 18. the p value is 0. .8906 0.3207 0.6257 Pearson Chi-Square 12 18.2910 0. .3041 0.2980 0.077.3099 0.POISSON Distribution POISSON Link Function LOG Dependent Variable COUNT Observations Used 25 Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 12 19.0980 .0423 0.3045 0.2854 0.3093 0.6257 Scaled Deviance 12 19. The fit of the model is reasonable but not perfect. A designed experiment with a Poisson distribution The output is: The GENMOD Procedure Model Information Description Value Data Set WORK.2892 1.0000 .1751 0.2967 TREAT N 1 0.0000 .8703 COL 3 1 -0.0936 0.5080 1.0000 0. .8796 ROW 4 1 0.0636 0.4502 0.0888 0.9. TREAT K 1 -1.0000 0.5008 Log Likelihood .4827 TREAT P 0 0.0506 0. Ideally. Analysis Of Parameter Estimates Parameter DF Estimate Std Err ChiSquare Pr>Chi INTERCEPT 1 1.4419 0.2729 4.1087 0.0000 0.2789 1.3324 0.0852 0.4708 0.3519 17.0229 0.0096 1.5080 1.0001 ROW 1 1 -0. c Studentlitteratur ° .3175 0.2760 1.4627 8.4928 0.5813 ROW 3 1 0.2285 TREAT O 1 0.8370 COL 5 0 0.3618 0.3797 0.0000 .0096 1.5699 0. SCALE 0 1.2003 0.3404 1.1942 ROW 2 1 -0.0000 .130 6. .4670 0.0267 0.

of accidents 175 320 No. Females Males No. The oﬀset can easily be included in models analyzed with e. we can model this type of data as ³µ´ log = Xβ (6.1934 0.g.20) The adjustment term log (t) is called an oﬀset.3595 0.40 = 15. It is interesting to note that the GLM analysis of square root transformed data.30 = 10.8 The data below. Example 6. of person years (’000) 17. such that “crimes per 1000 inhabitants” is a meaningful measure of crime rate.30 21. We will return to these data later. The data refer to 16262 Medicaid enrollees.0 per 1000 person years for c Studentlitteratur ° . Proc Genmod. subdivided by sex. Response variables as counts 131 LR Statistics For Type 3 Analysis Source DF ChiSquare Pr>Chi ROW 4 14.6.1 per 1000 person years for females and 320/21. are accident rates for elderly drivers.19) t which means that log (µ) = log (t) + Xβ (6. Data of this type are called rate data. the number of crimes recorded in a number of cities depends on the size of the city. quoted from Agresti (1996).5880 TREAT 4 25.40 From the raw data we can calculate accident rates as 175/17. ¤ 6. If we denote the measure of size with t. For each sex the number of person years (in thousands) is also given.02) but no significant row or column eﬀects. For example.0062 COL 4 2.0001 We find a significant Row eﬀect and a highly significant Treatment eﬀect. This may be related to the fact that the model fit is not perfect.10 Rate data Events that may be assumed to be essentially Poisson are sometimes recorded on units of diﬀerent size. results in a significant treatment eﬀect (p = 0.8225 0. as suggested by Snedecor and Cochran (1980.

a log link. A simple way to model these data is to use a generalized linear model with a Poisson distribution. This diﬀerence is significant.132 6. MODEL accident = sex / LINK = log DIST = poisson OFFSET = logpy . This is done with the following program: DATA accident.3909 = 0. 2254.0000 .0001 SEX Female 1 -0.3269 0. Scaled Deviance 0 0. The model is a saturated model so we can’t assess the over-all fit of the model by using the deviance. CLASS sex. logpy=log(persyear). Scaled Pearson X2 0 0. ¤ c Studentlitteratur ° .10.3909 0. SCALE 0 1. are not included in this model.0000 .0000 . Thus the estimate can be interpreted such that the odds ratio is e−0.0559 2341. The output is Criterion DF Value Value/DF Deviance 0 0.0000 . however.0001 SEX Male 0 0. and to use the number of person years as an oﬀset. for example diﬀerences in driving distance. Rate data males.676.7049 0.400 Female 175 17. Pearson Chi-Square 0 0.7003 . The risk of having an accident for a female is 68% of the risk for men. PROC GENMOD DATA=accident. other factors that may aﬀect the risk of accident. INPUT sex \$ accident persyear. . Log Likelihood .0000 . RUN. CARDS.2824 0.300 . The parameter estimate for females is −0. Male 320 21.0000 . . Analysis Of Parameter Estimates Parameter DF Estimate Std Err ChiSquare Pr>Chi INTERCEPT 1 2.0940 17.0000 0. The model can be written as log (µ) = log (t) + β 0 + β 1 x where x is a dummy variable taking the value 1 for females and 0 for males.39.0000 0.

The ratio Deviance/df was 1. The eﬀects of overdispersion is that p- values for tests are deflated: it becomes “too easy” to get significant results.7784 0.63 . where the parameter φ can be estimated as (Deviance) /df or as χ2 /df .63. 6. RUN. Thus. The program is: PROC GENMOD data=poisson.5456 .63 as an estimate of the dispersion parameter φ. there were some indications of overdispersion.6. MODEL count = row col treat / dist=Poisson Link=log type3 scale=1. As noted earlier. the presence of overdispersion may be related to mistakes in the formulation of the generalized linear model: the distribution.5649 Log Likelihood . CLASS row col treat. where χ2 is the Pearson Chi-square.11.3424 0. c Studentlitteratur ° . Example 6. we would assume that V ar (Y ) = φσ2 . 36.6257 Scaled Deviance 12 7. One way to account for such heterogeneity is to introduce a scale parameter φ into the variance function. We re-analyze these data but ask the Gen- mod procedure to use 1. Response variables as counts 133 6.0096 1. overdispersion may be caused by heterogeneity among the observations.5080 1.5008 Scaled Pearson X2 12 6.1 Modeling the scale parameter If the model is correct.11 Overdispersion in Poisson models Overdispersion means that the variance of the response variable is larger than would be expected for the chosen distribution.6119 Pearson Chi-Square 12 18.9 In the analysis of data on number of wireworms in a Latin square experiment (page 129). Output: Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 12 19. For Poisson data we would expect the variance to be equal to the mean. the link function and/or the linear predictor.

Overdispersion in Poisson models LR Statistics For Type 3 Analysis Source DF ChiSquare Pr>Chi ROW 4 5. an alternative approach is to replace the Poisson distribution with a Negative binomial distribution. This idea can be traced back to Student (1907).63 has a rather dramatic eﬀect on the result..0623 0. The distribution has mean value E (y) = pr and variance r(1−p) V ar (y) = p2 . The negative binomial distribution has a higher probability for the zero count.4046 0. the distribution can be defined even for non-integer values of r.21) r−1 for y = r. The probability of success is p.11. and the row eﬀect was significant (p = 0. than a Poisson distribution with the same mean value.0001). This is the binomial waiting time distribution. In our new analysis even the treatment eﬀect is above the 0.. who studied counts of red blood cells.134 6. In the origi- nal analysis of these data (Snedecor and Cochran. 1980). For a series of Bernouilli trials. it is called the Pascal distribution. Note that the Genmod procedure has an automatic feature to base the analysis on a scale parameter estimated by the Maximum Likelihood method. It can be shown that the resulting compound distribution for y is a negative binomial distribution.4823 0. The distribution for y is µ ¶ y−1 r P (y) = p (1 − p)y−r (6. A second way to derive the negative binomial distribution is as a so called compund distribution. If r = 1. suppose that we are studying the number of trials (y) until we have recorded r successes.2482 COL 4 1.. .11.021). This distribution can be derived in two ways.05 limit. and a longer tail. the treatment eﬀect was highly signif- icant (p = 0. Using the Gamma function. (r + 1). ¤ 6.0062). When r is an integer.9002 TREAT 4 9. Suppose that the response for individual i can be modeled as a Poisson distribution with mean value µi . c Studentlitteratur ° .0501 Fixing the scale parameter to 1.2 Modeling as a Negative binomial distribution If a Poisson model shows signs of overdispersion. only the treatment eﬀect was significant (p = 0. see the SAS manual for details. In our previous analysis of these data. it is called a ge- ometric distribution. Suppose further that the distribution of the mean values µi over individuals follows a Gamma distribution.

The residuals and predicted values were stored in a file.12 Diagnostics Model diagnostics for Poisson models follows the same lines as for other generalized linear models. 6. ¤ c Studentlitteratur ° . version 8.3 and Figure 6. The results are given in Figure 6. Example 6. The negative binomial distribution is available in Proc Genmod in SAS. The standardized deviance residuals were plotted against the predicted values. and a normal probability plot of the residuals was prepared.10 We can illustrate some diagnostic plots using data from the Wireworm example on page 129.4. We cannot see any irregularities in the plot of residuals against fitted values. Both plots indicate a reasonable behavior of the residuals. and the normal plot is rather linear. and because of the relation to compound distributions.6. Response variables as counts 135 Because of this. it is often used as an alternative to the Poisson distribution when over-dispersion can be suspected.

12. c Studentlitteratur ° .136 6.4: Normal probability plot for the wireworm data.3: Plot of standardized deviance residuals against fitted values for the Wireworm example. Figure 6. Diagnostics Figure 6.

4 9 64.3 The following data was taken from the Statlib database (In- ternet address http://lib. Re-analyze the data using Poisson regression. For each observation. Response variables as counts 137 6. The results were: Minutes exposure Count 0 271 15 108 30 59 45 29 60 12 Exercise 6.4 are of a kind that can often be approximated by a Poisson distribution. Prepare a graph of the relation and compare the results with the results from Exercise 1.6 27 131. Mode1 is the time spent in one mode and Mode2 is the time spent in the other. Try this.2 14.stat.3 25.9 97. Mode1 Mode2 Failures 33. Which model seems to fit best? Exercise 6.0 20.2 The following data consist of failures of pieces of electronic equipment operating in two modes.1 The data in Exercise 1. In the original analysis (Jør- gensen 1961) an Identity link was used. The following variables c Studentlitteratur ° . using Failures as a dependent variable and Mode1 and Mode2 as predictors.5 24 125.3 18 91. The total number of failures recorded in each period is also recorded. The data are repeated here for your convenience: Gowen and Price counted the number of lesions of Aucuba mosaic virus after exposure to X-rays for various times. The purpose of the analysis is to find variables that are related to the number of damage incidents for ships.6 23 85 87.8 22 Fit a Poisson regression model to these data.5 14 137.13 Exercises Exercise 6.6.4.9 47.3 15 52.3 53.edu/datasets).7 32. They have also been used by McCullagh and Nelder (1989). The source of the data is the Lloyd Reg- ister of Shipping.6 27 116.7 56. but also try a log link.

1975-79 mon_serv Aggregate months service for ships in this cell incident Number of damage incidents Type yr_constr per_op mon_serv incident A 60 60 127 0 A 60 75 63 0 A 65 60 1095 3 A 65 75 1095 4 A 70 60 1512 6 A 70 75 3353 18 A 75 60 0 * A 75 75 2244 11 B 60 60 44882 39 B 60 75 17176 29 B 65 60 28609 58 B 65 75 20370 53 B 70 60 7064 12 B 70 75 13099 44 B 75 60 0 * B 75 75 7117 18 C 60 60 1179 1 C 60 75 552 1 C 65 60 781 0 C 65 75 676 1 C 70 60 783 6 C 70 75 1948 2 C 75 60 0 * C 75 75 274 1 D 60 60 251 0 D 60 75 105 0 D 65 60 288 0 D 65 75 192 0 D 70 60 349 2 D 70 75 1208 11 D 75 60 0 * D 75 75 2051 4 E 60 60 45 0 E 60 75 0 * E 65 60 789 7 E 65 75 437 7 E 70 60 1157 5 E 70 75 2161 12 E 75 60 0 * E 75 75 542 1 The number of damage incidents is reported as * if it is a “structural zero”. D or E yr_constr Year of construction in 5-year intervals per_op Period of operation: 1960-74. C. Use the following instructions: • Use a Poisson model with a log link. Include any necessary interactions.13. B. Fit a model predicting the number of damage incidents based on the other variables. c Studentlitteratur ° . • Use all predictors as class variables. • Use log (Aggregate months service) as an oﬀset. Exercises are available: type Ship type A.138 6.

8914 Log Likelihood 138. 7. The counts were treated as independent Poisson variates in a generalized linear model with a log link. 6 imperfections.2221 Analysis Of Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq Intercept 1 1.8914 Scaled Pearson X2 18 16. Parts of the results were: Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 18 16.6.6 The following table (from Agresti 1996) gives the number of train miles (in millions) and the number of collisions involving British Rail passenger trains between 1970 and 1984. 6.8 (page 131) are rather simple.2676 0.9038 Scaled Deviance 18 16. Response variables as counts 139 • If necessary. It is quite possible to calculate the parameter estimates by hand.6094 0.5878 0. A.2676 0. 8. 14. For example. Test the hypothesis H0 : µA = µB using (i) a Likelihood ratio test. a ship constructed in 1975-79 cannot operate during the period 1960-74. (ii) a Wald test. Is it plausible that the collision counts are independent Poisson variates? Respond by testing a model with only an intercept term. Construct a 95% confidence interval for µB /µA .1764 Scale 0 1. 13. 5. Make that calculation. Also. 3.4 The data in table 6.0000 0. 6. Exercise 6.1414 treat 1 0. try to model any overdispersion in the data.9038 Pearson Chi-Square 18 16.857 on 19 degrees of freedom. Exercise 6. 2.5 An experiment analyzes imperfection rates for two processes used to fabricate silicon wafers for computer chips. 3. For treatment A applied to 10 wafers the number of imperfections are 8. 7. 4. 4. 8.0444 0. Hint: What is the rela- tionship between the parameter β and µB /µA ? Exercise 6. B. 7. Treat- ment B applied to 10 wafers has 9. c Studentlitteratur ° . 11. examine whether inclusion of log(miles) as an oﬀset would improve the fit.0000 A model with only an intercept term gave a deviance of 27.0444 0. 9. Note: some of the observations are “structural zeros”.

Analyze these data using smoking and coﬀee drinking as qualitative vari- ables.8 Even before the space shuttle Challenger exploded on Jan- uary 20. Exercises Year Collisions Miles Year Collisions Miles 1970 3 281 1977 4 264 1971 6 276 1978 1 267 1972 4 268 1979 7 265 1973 7 269 1980 3 267 1974 6 281 1981 5 260 1975 2 271 1982 6 231 1976 2 265 1983 1 249 Exercise 6. NASA had collected data from 23 earlier launches. 1986.140 6. B. Compare the analyses in A. smoking and the risk for myocardial infarction in a case-control study for men under 55 years of age. C. The following data are available: c Studentlitteratur ° .13. 129 80 102 58 118 44 373 85 A. One might ask whether the probability that an O-ring is damaged is related to the temperature. One part of these data was the number of O-rings that had been damaged at each launch. Perform residual analyses. In total there were six such O-rings at the Challenger.7 Rosenberg et al (1988) studied the relationship between coﬀee drinking. and B. Assign scores to smoking and coﬀee drinking and re-analyze the data using these scores as quantitative variables. The data are as follows: Cigarettes per day Coﬀee 0 1-24 25-34 35- per day Case Control Case Control Case Control Case Control 0 66 123 30 52 15 12 36 13 1-2 141 179 59 45 53 22 69 25 3-4 113 106 63 65 55 16 119 30 5. On the fateful day when the Challenger exploded. The data included the number of damaged O-rings. O-rings are a kind of gaskets that will prevent hot gas from leak- ing during takeoﬀ. Exercise 6. the temperature was 31◦ F. in terms of fit. and the temperature (in Fahrenheit) at the time of the launch.

8337 0. binomial model). Deviances for “null models” that only include an intercept were 22.1745 1.8016 Scaled Deviance 21 16.0000 c Studentlitteratur ° . Poisson model) and 24.1034 0. Part of the output from these two analyses are presented below.6.2304 (22 df.8016 Pearson Chi-Square 21 28.6442 Analysis Of Parameter Estimates Standard Wald 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr > ChiSq Intercept 1 5.8337 0. No.9691 2. Response variables as counts 141 No.1745 1.434 (22 df. and another model with a binomial distribution and a logit link. Poisson model: Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 21 16.7628 Temp 1 -0.3416 Scaled Pearson X2 21 28. of Tempera. of Tempera- o o Defective ture F Defective ture F O-rings O-rings 2 53 0 70 1 57 1 70 1 58 1 70 1 63 0 72 0 66 0 73 0 67 0 75 0 67 2 75 0 67 0 76 0 68 0 76 0 69 0 78 0 70 0 79 0 81 A statistician fitted two alternative Generalized linear models to these data: one model with a Poisson distribution and a log link.3416 Log Likelihood -14.0430 Scale 0 1.0000 0.

13.1156 0.8613 Scaled Deviance 21 18. calculate the probability that three or more of the O-rings fail if the temperature is 31◦ F.8613 Pearson Chi-Square 21 29.4276 Log Likelihood -30. Exercise 6.0525 Temp 1 -0. All main eﬀects and two-way interactions were included. Which of the two models do you prefer? Explain why! D.0000 A.9802 1. Test whether temperature has any significant eﬀect on the failure of O- rings using i) the Poisson model ii) the binomial model B. A model with a three-way interaction fitted slightly better but is not discussed here. Predict the outcome of the response variable if the temperature is 31◦ F i) for the Poisson model ii) for the binomial model C. Part of the results were: c Studentlitteratur ° . Passengers in all traﬃc accidents during 1991 were classified by: Gender Gender of the person (F or M) Location Place of the accident: Urban or Rural Belt Whether the person used seat belt (Y or N) Injury Whether the person was injured in the accident (Y or N) A total of 68694 passengers were included in the data. Exercises Binomial model: Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 21 18.0863 0.0863 0.0470 Scale 0 1.9 Agresti (1996) discusses analysis of a set of accident data from Maine.142 6.0000 0.0850 3. Using your preferred model. A log-linear model was fitted to these data using Proc Genmod.9802 1.1982 Analysis Of Parameter Estimates Standard Wald 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr > ChiSq Intercept 1 5.4276 Scaled Pearson X2 21 29.

0000 0.0000 0.2906 0.0157 -0.0000 0.0000 0.0000 0.0000 . .0000 . gender*belt M N 0 0.0000 0.0000 0.0000 0.0001 gender*location F U 0 0. location*injury R N 1 -0. .0000 0. c Studentlitteratur ° .8681 -0.0000 . gender*location M U 0 0.0000 0.0000 0.0000 0.0000 0.0000 0.0288 0. .0000 0.0000 0. . location*belt R N 1 -0. gender*belt F N 1 -0.0000 0. .0 <.0000 0.0000 0.0000 0.0001 gender F 1 0. .0000 .8078 -0.0000 0.2099 0.3510 4.0000 .0000 .2415 -0.3309 0.3752 4.0000 0.4872 394. gender*belt M Y 0 0.0000 0.0000 1.7796 0.0000 .0001 location*belt R Y 0 0.89 <.0269 -0.0000 0.0310 3. .0000 0.36 <. .0000 0.0291 0.0000 .0314 5. gender*injury M Y 0 0.3475 100.0001 belt Y 0 0.16 <.8 <. .0000 0. belt*injury N N 1 -0.0000 0.0000 0.0000 0. belt*injury Y N 0 0.4599 0.0000 .0000 0.2337 0.0000 .0000 0. location*injury U Y 0 0.6702 Scaled Deviance 5 23.0000 0.0000 0. .0276 -0.0001 location*injury R Y 0 0.0000 0.0001 belt*injury N Y 0 0.94 <.0000 0.7550 0.0000 0.7022 784.0000 0. . Response variables as counts 143 Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 5 23.0000 .0000 . Scale 0 1.65 <.1167 -0.3510 4.0000 0.7225 0. location*belt U Y 0 0.8367 716.0000 0.0000 0.0000 .0000 .2702 3. belt N 1 0.50 <. . .0000 .0000 0. gender*location M R 0 0.0000 NOTE: The scale parameter was held fixed.0000 0.0000 . . gender*location F R 1 -0.0000 0.17 <. .0000 0.14 <.0000 .6750 Scaled Pearson X2 5 23.0000 0.0162 -0. location*belt U N 0 0.0000 .8984 6. . gender*injury F N 1 -0.0849 0.0000 0.0000 0.0000 0.4907 -0.0000 0.0161 -0.3916 11563.3752 4.0001 injury Y 0 0.0000 1.0000 0.0213 36133.0000 0.0000 0.6.0000 .0000 .50 <.0000 0.4292 860.0000 0.0000 0.9599 0.8140 0.6702 Pearson Chi-Square 5 23.5647 0. gender*injury M N 0 0.0000 .0001 gender*belt F Y 0 0.7599 868.1783 169.0000 0. belt*injury Y Y 0 0.5939 -0.0000 0. injury N 1 3.0000 0.0000 .0000 0. .0000 0. .0000 0.0290 0.0272 -0. .6777 463.0000 0.6750 Log Likelihood 536762. .0001 gender M 0 0. location*injury U N 0 0. . .0000 0.6212 0.6081 Analysis Of Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Limits Square Pr > ChiSq Intercept 1 5.0000 0.0532 27.0001 location U 0 0. location R 1 0. Calculate and interpret estimated odds ratios for the diﬀerent factors.5405 0.0001 gender*injury F Y 0 0.0000 0.0000 0.

.

which is an ordinal variable. this kind of analysis would disre- gard an important part of the information in the data. undecided. Often Always Total times Yes 24 35 21 30 110 No 1355 603 192 224 2374 Total 1379 638 213 254 2484 ¤ 145 . disagree. ordinal data are sometimes ana- lyzed as if the data had been numeric. However. We will briefly review some of these approaches from the point of view of generalized linear models. The data are given in the following table: Snoring Heart problems Never Some. Several suggestions on the modeling of or- dinal response data have been put forward in the literature. Examples of such variables are diagnos- tics of patients (improved. disagree completely). The amount of snoring was assessed on a scale ranging from “Never” to “Always”. The data were obtained through interviews with the patients. high quality. 7. worse). classification of potatoes (ordi- nary. extra high quality). using some scoring of the response. no change. This approach is often unsatisfactory since the data are then assumed to be “better” than they actually are. An interesting question is whether there is any relation between snoring and heart problems. Ordinal response variables can be analyzed as nominal response using the methods outlined in Chapter 6. Ordinal response Response variables in the form of judgements or other ordered classifications are called ordinal response variables. namely the fact that the categories are ordered.1 Arbitrary scoring Example 7. Alternatively.1 Norton and Dunn (1985) studied the relation between snoring and heart problems for a sample of 2484 patients. answers to opinion items (agree com- pletely. agree. and school marks.7.

Instead of entering the dependence between the variables as the interaction term (αβ)ij . in this context.0001) which is. of course. highly significant.1) of the percentage of persons with heart problems in each snoring category suggests that this percentage increases nearly linearly.97 on 3 df (p < 0.1). In this model. 3. does not use the fact that the snoring variable is ordinal. this gives a deviance of 21. the term γ · ui · vj captures the linear part of the dependence between the scores. This suggests a simple way of accounting for the ordinal nature of the data. For the data on page 145. the corresponding log-linear model ¡ ¢ log µij = µ + αi + β j + (αβ)ij (7. if we choose the arbitrary scores (1. 4) for the snoring categories. A plot (Figure 7. 1996). This analysis. A simple approach to analyzing these data is to ignore the ordinal nature of the data and use a simple χ2 test of independence or.1) This is a saturated model. Arbitrary scoring 12 Proportion with heart 7 problems 2 Score 1 2 3 4 Never Sometimes Often Always Figure 7. The test of the hypothesis of independence corre- sponds to testing the hypothesis that the parameter (αβ)ij is zero.1. This model is called a linear by linear association model (LL model. we write the model as ¡ ¢ log µij = µ + αi + β j + γ · ui · vj (7.1: Relation between heart problems and snoring The main interest lies in studying a possible dependence between snoring and heart problems. see Agresti. as in the saturated model (7.146 7. c Studentlitteratur ° . 2.2) where ui are arbitrary scores for the row variable and vj are scores for the column variable. however.

5545 0. .0000 0.1353 0. CLASS heart snore.0000 .0479 272.0001 SNORE Sometime 0 0.1820 Scaled Pearson X2 2 6.9833 0.2256 386. Parts of the output is Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 2 6.2247 .1199 Scaled Deviance 2 6.3640 3.0773 177.2398 3.0000 0. U*V 1 0.0001 SNORE Often 1 -1.1199 Pearson Chi-Square 2 6.7. . RUN.0001 HEART No 1 4.2258 77.3640 3.4822 0.8977 0. Analysis Of Parameter Estimates Parameter DF Estimate Std Err ChiSquare Pr>Chi INTERCEPT 1 1. CARDS.0000 .0188 0.1304 0.0000 0.6545 0. INPUT heart \$ snore \$ freq u v. MODEL freq = heart snore u*v/ DIST=poisson LINK=log .0001 SCALE 0 1.0000 .1820 Log Likelihood .0794 204. If we include the single parameter γ for the linear by linear association c Studentlitteratur ° .0289 0.0001 HEART Yes 0 0. The model request in Genmod can be written as PROC GENMOD DATA=snoring. Earlier we found that the independence model gives a deviance of 21. SNORE Always 1 -1.4319 0.97 on 3 df .0001 SNORE Never 1 0. Yes Never 24 1 1 Yes Sometimes 35 1 2 Yes Often 21 1 3 Yes Always 30 1 4 No Never 1355 0 1 No Sometimes 603 0 2 No Often 192 0 3 No Always 224 0 4 .2398 3. .7912 0. Ordinal response 147 In a Genmod analysis according to this model we need to arrange the data according to the following data step: DATA snoring.1306 0.0825 62. 13733. NOTE: The scale parameter was held fixed.

However. The row scores are kept fixed and the model is fitted for the column scores. the model is not formally a generalized linear model. but it is subjective in the sense that diﬀerent choices of scores for the ordinal variables may result in diﬀerent conclusions. Andersen.3) where µi and vj are now parameters to be estimated from the data. This method is iterative.2 RC models The method of arbitrary scoring is often useful. for a response variable with 5 categories we would get 5 − 1 = 4 diﬀerent cumulative logits. p < 0. Thus. The method seems to converge in most cases. the model can be written as ¡ ¢ log µij = µ + αi + β j + γ · µi · vj (7.9 on 1 df . where for simplicity we index the categories of the response variable with integers. 1980) is to include the row and column scores as parameters of the model.0001). The original analysis using a simple χ2 test indicated “some form of relationship” between snoring and heart problems. These column scores are then kept fixed and the row scores are estimated. The linear by linear association model suggests that snoring and heart problems may have a positive relationship. An approach that has been suggested (see e. Agresti (1985) suggested methods for fitting this model using standard software. This indicates that most of the dependence between snoring and heart problems is captured by the linear interaction term. These two steps are continued until convergence. called an RC model. . 7. .73 on 1 df which is highly significant.24. The diﬀerence is 15. Thus.g. RC models model we get a model with 2 df and a deviance of 6.2. 7. This model.4) 1 − P (Y ≤ j) The cumulative logits are defined for each of the categories of the response except the first one. is nonlinear since it includes a product term in the row and column scores. + pj . The Wald test of the parameter for the linear by linear association is also highly significant (χ2 = 62.148 7. c Studentlitteratur ° . The cumulative logits are defined as P (Y ≤ j) logit (P (Y ≤ j)) = log (7.3 Proportional odds The proportional odds model for an ordinal response variable is a model for cumulative probabilities of type P (Y ≤ j) = p1 + p2 + . Thus.

for the diﬀerent ordinal values of the response. The model states that the diﬀerent cumulative logits. a set of k − 1 related models if the response has k scale steps. the log odds is proportional to the diﬀerence between x2 and x1 . RUN. has the form P (Y ≤ j|x2 ) /P (Y > j|x2 ) . 68. the model gives.719 with 1 DF (p=0. The proportional odds model is not formally a (univariate) generalized linear model. A simple way of analyzing this type of data is to use the Logistic procedure of the SAS package: PROC LOGISTIC DATA=snoring.217 with 1 DF (p=0. can handle this type of models.2 We continue with the analysis of the data on page 145.7. for two diﬀerent values x1 and x2 of the predictor x. although it can be seen as a kind of multivariate Glim. the functions have diﬀerent intercepts αi but a common slope β. For the sake of illustration we use the ordinal snoring variable as the response. This means that the odds ratio. . (7.632 .351 5505.0001) c Studentlitteratur ° .351 5497. Example 7.e. Ordinal response 149 The proportional odds model for ordinal response suggests that all these cumulative logit functions can be modeled as logit (P (Y ≤ j)) = αj + βx (7.e.632 64.5733) Model Fitting Information and Testing Global Null Hypothesis BETA=0 Intercept Intercept and Criterion Only Covariates Chi-Square for Covariates AIC 5568. SC 5585. i.1127 with 2 DF (p=0.6) P (Y ≤ j|x1 ) /P (Y > j|x1 ) The log of this odds ratio equals β (x2 − x1 ). FREQ freq. are all parallel but with diﬀerent intercepts. This is why the model is called the proportional odds model. MODEL v = u. in a sense.903 . The following output is obtained: Score Test for the Proportional Odds Assumption Chi-Square = 1. as well as the Logistic procedure in SAS. The Genmod procedure in SAS version 8 (SAS 2000b).0001) Score . and analyze the data to explore whether the risk of snoring depends on whether the patient has heart problems.804 5528. Thus. -2 LOG L 5562.5) i.

CLASS heart.0001 Scale 0 1.6594 844.0001 . The program can be written as PROC GENMOD data=snoring order=data.2012 0.0000 1. the assumption of a common slope) cannot be rejected (p = 0.0686 2. FREQ freq.0000 ¤ 7.0001 Intercept2 1 1.0001 .57).0535 1.7624 -1. The hypothesis that β is zero is rejected (p < 0. The intercept for the last category is set equal to zero.1742 -1. The estimates of the parameters α1 . MODEL v = u /dist=multinomial link=cumlogit. along with an estimate of the common slope β.0001).40 <.0534 846. Analysis of Maximum Likelihood Estimates Parameter Standard Wald Pr > Standardized Odds Variable DF Estimate Error Chi-Square Chi-Square Estimate Ratio INTERCP1 1 0.4209 0. .150 7.0687 1101. The information is essentially the same as in the Logistic procedure but the standard error estimates are slightly diﬀerent.4497 1.4 Latent variables Another point of view when analyzing ordinal response variables is to assume that the observed ordinal variable Y is related to some underlying.5470 0.0001 .0414 46.53 <.242 A similar analysis can be done using Proc Genmod in SAS version 8 or later.4208 0.2824 0.4137 1102. α2 and α3 are given in the next part of the output. Also.0001 -0.2792 0.0000 1.5545 0. Latent variables The proportional odds assumption (i. Analysis Of Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq Intercept1 1 0. .0000 0.1619 0.0001 Intercept3 1 2. . INTERCP2 1 1.1446 2.88 <. c Studentlitteratur ° .2824 0.161188 0.3635 46.2792 0.5227 0. U 1 -1.1774 64.49 <.0414 0. Proc Genmod does not automatically test the common slope assumption.0001 u 1 -1. RUN.5545 0. latent. INTERCP3 1 2.0793 66.5871 0.e.4.

McCullagh and Nelder. This is the well-known proportional hazards model used in survival analysis (Cox 1972)..7. (7. In a similar way.7) . a proportional odds model using a complementary log- log link corresponds to a latent variable having a so called extreme value distribution. 1989) that the latent variable approach gives a model that is identical to the proportional odds model with a logit link. c Studentlitteratur ° . for example a logistic or a Normal distribution.7) can be formally seen as a kind of link function. it can be shown (see e. for the case where the latent variable has a logis- tic distribution. However. where the latent variable is assumed to have a symmetric distribution.2.2: Ordinal variable with three scale steps generated by cutting a continuous variable at two thresholds variable η through a relation of type y=1 if η < τ1 y=2 if τ1 ≤ η < τ2 .g. modelling the data by assuming a latent variable underlying the ordinal response is not formally a generalized linear model. y=s if τ s−1 ≤ η An example of this point of view is illustrated in Figure 7. The estimated intercepts would be the estimated thresholds for the latent variables. Ordinal response 151 y=1 t1 y=2 t2 y=3 Figure 7. Although (7.

152 7. see Figure 5. for example ordinal logit regression or or- dinal probit regression. else Y = 0.4. for example such that the distribution of η is standard Normal for one of the values of x. 8 ≤ η < 11.5. As a comparison with the results given on page 149 we have analyzed the data on page 145 using the Logistic procedure and a Normal link. In these cases it is assumed that the obser- vations are generated by a latent variable: if this latent variable is smaller than a threshold τ we observe Y = 1. Latent variables 20 16 y=3 12 y=2 8 y=1 4 0 0 1 2 3 4 Figure 7. approximately. y = 1 is observed if the latent variable η is smaller than the lowest threshold which has a value close to 8. complementary log-log. The scale of η can be chosen arbitrarily.5 and we observe y = 3 if η > 11. 2 or 3. This leads to a class of models called ordinal regression models. We observe the ordinal variable y that has values 1. In practice the scale of η cannot be determined. The concept of an ordinal regression model can be illustrated as in Figure 7.3. Note that probit models and logistic regression models can also be derived as models with latent variables.3: An ordinal regression model A similar approach can also be used for the case where the latent variable is assumed to follow a Normal distribution.1 on page 86. or Normal. The following results were obtained: c Studentlitteratur ° . We observe y = 2 if. In the Genmod or Logistic pro- cedures in SAS it is possible to specify the form of the link function to be logistic.

0258 46. INTERCP3 1 1. 0.1749 0.8415 so the probabilities are 0. The data are as follows: Response Gender Treatment Marked Some None Female Active 16 5 6 Female Placebo 6 7 19 Male Active 5 2 7 Male Placebo 1 0 10 The object is to model the response as a function of gender and treatment.0352 1420.916 with 1 DF (p=0.9355 0. 7.17. are similar to the logistic model.0151.1159. and 1.8408 0.0001 -0.3266 0. . 0.351 5507.5694. U 1 -0.2606) Model Fitting Information and Testing Global Null Hypothesis BETA=0 Intercept Intercept and Criterion Only Covariates Chi-Square for Covariates AIC 5568.0825 and 0.804 5530. -2 LOG L 5562.0923. 0. INTERCP2 1 0.1071 61.173150 The fit of the model. The three thresholds are estimated to be 0. Ordinal response 153 Score Test for the Equal Slopes Assumption Chi-Square = 2.0300 974.351 5499.0001 .6895 with 2 DF (p=0.706 .0001 .693 with 1 DF (p=0.0227 and 0.2558. For x = 1 the mean value of η is −0.8463. We will attempt a proportional odds model for the cumulative logits and the c Studentlitteratur ° .3 Koch and Edwards (1988) considered analysis of data from a clinical trial on the response to treatment for arthritis pain.8415 0.0001) Analysis of Maximum Likelihood Estimates Parameter Standard Wald Pr > Standardized Variable DF Estimate Error Chi-Square Chi-Square Estimate INTERCP1 1 0.6773 0.94. 0.0001 . 0. 70.33. For x = 0 this would give the probabilities as 0.5 A Genmod example Example 7. SC 5585.1155 0.0795 0.436 .7.0001) Score . and the conclusions.436 62.

0000 .0001 gender*treat 1 3.0000 LR Statistics For Type 3 Analysis Chi- Source DF Square Pr > ChiSq gender 1 18.0000 0. gender*treat F A 1 2. gender*treat M A 0 0.7826 1. The program was written as follows: PROC GENMOD data=Koch order=formatted.2358 1. The signs of the parameters indicate that patients on active treatment experienced a higher degree of pain relief and that the females experienced better pain relief than the males. Scale 0 1.17 Intercept2 1 4.6901 5.87 gender*treat F P 0 0.0001 treat 1 28.5503 11.0000 .15 gender F 1 -3.11).0000 1.6591 13. CLASS gender treat.1110 1. ¤ c Studentlitteratur ° .5.0000 0.0000 0.3350 -1. that there are significant gender diﬀerences and that the treatment has a significant eﬀect.0000 0. gender*treat M P 0 0.1390 -6. A Genmod example cumulative probits.0000 0. Part of the output was: Analysis Of Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Intercept1 1 3.0000 0.5251 1.1366 9. RUN.0000 .0000 0.154 7.0150 -1.0000 0.13 gender M 0 0.0000 0. MODEL response = gender treat gender*treat/ LINK=cumlogit aggregate=response TYPE3.3312 4.0341 2.0000 0. treat A 1 -3. using the Genmod procedure of SAS (2000b).0000 . FREQ count.0000 1.60 0.2461 -0. The cumulative probit model gave similar results except that the interaction term was further from being significant (p = 0..6746 1.0000 0. The data were input in a form where the data lines had the form F A 3 16 F A 2 5 .0125 1.0710 -5.0000 0.0579 We can note that there is a slight (but not significant) interaction.4983 6.0000 0.0000 0.03 treat P 0 0.0000 0.5519 19.0000 0.01 <.5533 2..0000 .15 <.

You are also free to analyze the data using other methods that you may have met during your training.7. They were also asked about recent experience of mammography.1 Ezdinli et al (1976) studied two treatments against lympho- cytic lymphoma. from Hosmer and Lemeshow. Ordinal response 155 7. c Studentlitteratur ° . The women were asked the question “How likely is it that mammography could find a new case of breast cancer”. (1989). come from a survey on women’s attitudes towards mammography.2 The following data. After the experiment the tumour of each patient was graded on an ordinal scale from “Complete response” to “Progression”. Examine whether the treatments diﬀer in their eﬃciacy by fitting an appropriate or- dinal regression model. Treatment Total BP CP Complete response 26 31 57 Partial response 51 59 110 No change 21 11 32 Progression 40 34 74 Total 138 135 273 Exercise 7.6 Exercises Exercise 7. Results: Mammography Detection of breast cancer experience Not likely Somewhat likely Very likely Never 13 77 144 Over 1 year ago 4 16 54 Within the past year 1 12 91 Analyze these data.

.

Example 8. suggested by Aitkin (1987) works as follows.1 In our analysis of the data on page 16 we found that the data strongly suggested variance heterogeneity. z is a vector that contains some or all of the predictors x. The response for observation i is modeled using the linear predictor yi = xi β + ei (8. such that some distribution where the variance depends on the mean should be chosen instead of the Normal distribution.1) ¡ ¢ where we assume that ei ∼ N 0. and λ is a vector of parameters to be estimated. as well as the parameters in the model for the variance. Aitkin showed that this process produces the ML estimates. The procedure produces one set of estimates for the mean model and a second set of estimates for the variance model. Thus. This may indicate that the variance heterogeneity can be modeled using Aitkin’s method. This might indicate that the choice of distribution is wrong. Additional topics 8. on convergence. An alternative approach.1 Variance heterogeneity In general linear models. (8. λ) is to iterate between two generalized linear models.8.2) Here. the problem is to estimate the parameters of the linear predictor. A SAS macro for this process is given in the SAS (1997) manual. One model is a model with a Normal distribution and an identity link. The estimation procedure suggested by Aitkin (1987) to estimate the parameter vector (β. was rather symmetric. (8. it is not uncommon that diagnostic tools indicate that the variance is not constant. For these data we obtained the following results for the mean model: 157 . corresponding to (8. but that the distribution. for each contrast medium. The variance σ 2i is modeled as σ2i = exp (λzi ) . σ 2i .1).2). and the other model fits the squared residuals from this model to a Gamma distribution using a log link.

0000 .0000 0.4845 0.4859 20.6872 4.5175 0.g.9475 0.2431 0.0001 3 MEDIUM Hexabrix 1 -5. observations for which we do not know e.3536 785.3) c Studentlitteratur ° .0001 2 MEDIUM Diatrizo 1 1. ¤ 8.0000 .2. 9 SCALE 0 0. as well as in lifetime testing in industry.2 Survival models Survival data are data for which the response is the time a subject has sur- vived a certain treatment or condition.73 on 43 df .2685 0.2347 0.0000 0. Censoring means that the survival time is not known for all individuals when the study is finished. Survival models are used in epidemi- ology.5000 34.3196 0. .0001 6 MEDIUM Omnipaqu 1 -1.9524 0. i.0014 7 MEDIUM Ringer 1 -9.2197 0.4743 10.0905 0. . Survival models Mean model OBS PARM LEVEL1 DF ESTIMATE STDERR CHISQ PVAL 1 INTERCEPT 1 9.0883 0.0000 .0001 5 MEDIUM Mannitol 1 -2.6872 10.6708 0.1732 0.0016 0.0000 . .158 8.0001 8 MEDIUM Ultravis 0 0.8269 0.8927 7 MEDIUM Ringer 1 -6.8062 2.3517 0.0320 4 MEDIUM Isovist 1 -2. 9 SCALE 0 1.0001 2 MEDIUM Diatrizo 1 11.4859 149. .0001 8 MEDIUM Ultravis 0 0.9560 0.8867 0.8680 0.2914 0.e.5701 405.6788 0. the duration of disease when the study started. Denote the density function for the survival time with f (t).9075 0.0000 0. and let the Rt corresponding distribution function be F (t) = f (s) ds. The estimates for the variance model were as follows: Variance model OBS PARM LEVEL1 DF ESTIMATE STDERR CHISQ PVAL 1 INTERCEPT 1 2. The procedure converged after two iterations producing an overall deviance of 257.2620 0.5982 0. is also possible. For right censored observations we only know that the survival time is at least the time at which censoring occurred.1017 3 MEDIUM Hexabrix 1 -1. Censoring is a special feature of survival data.5000 0.0182 0.7796 0.7319 73.0001 4 MEDIUM Isovist 1 -8.4736 0.4859 287.6872 0.8140 0.0016 5 MEDIUM Mannitol 1 -0. The survival −∞ function is defined as S (t) = 1 − F (t) (8.6087 6 MEDIUM Omnipaqu 1 0. Left censoring.6975 0.5175 351.

e. i.4) S (t) dt The hazard function measures the instantaneous risk of dying. 2. A semiparametric approach is to leave the distribution unspecified but to assume that the hazard function changes in steps which occur at the observed events. This is the basis for the so called Kaplan-Meier estimates of the survival function. The exponential distribution. Additional topics 159 and the hazard function is defined as f (t) d log (S (t)) h (t) = =− (8. data where there is no censoring can be analyzed using standard GLIM software. Weibull distribution or extreme value distribution are often used to model survival times. However. which is equivalent. choosing a hazard function. the distribution of survival times is assumed to have some specified parametric form. The cumulative hazard function is Zt H (t) = h (s) ds (8. the treatment of censoring makes it more convenient to analyze gen- eral survival data using special programs. as long as the desired survival distribution belongs to the exponential family. The white cell count c Studentlitteratur ° . This can be done in diﬀerent ways: 1. For a more thor- ough description of analysis of survival data. 3.2 Feigl and Zelen (1965) analyzed the survival times for leukemia patients classified as AG positive or AG negative.8.1 An example Although survival models are often discussed in texts on generalized linear models.5) −∞ Modelling of survival data includes choosing a suitable distribution for the survival times or. We will here only give examples of the parametric approach. but is estimated nonparametrically through the observed survival distribution. the proba- bility of dying in the next small time interval of duration dt. reference is made to standard textbooks such as Klein and Moeschberger (1997). Example 8. the survival function is not specified. In parametric models. In nonparametric modelling. 8.2.

Survival models Table 8. we model the data using a Gamma distribution. The program is PROC GENMOD data=feigl. 2300 65 4400 56 750 156 3000 65 4300 100 4000 17 2600 134 1500 7 6000 16 9000 16 10500 108 5300 22 10000 121 10000 3 17000 4 19000 4 5400 39 27000 2 7000 143 28000 3 9400 56 31000 8 32000 26 26000 4 35000 22 21000 3 100000 1 79000 30 100000 1 100000 4 52000 5 100000 43 100000 65 (WBC) for each patient is also given. MAKE obstats out=ut.2. CLASS ag. The interaction ag*logwbc was not significant. The data are reproduced in Table 8. The log of the WBC was used.1: Survival of leukemia patients AG + AG − WBC Surv. MODEL survival = ag logwbc / DIST=gamma obstats residuals.160 8. RUN. WBC Surv.1. c Studentlitteratur ° . As a first attempt.

The eﬀects of WBC and AG are both significant.0440 1.0000 .0020 0.3 on 30 df. LOGWBC 1 0.5 -0.5 -1.0SL=2.2065 . Deviance/df =1.5 0 50 100 150 200 250 Residual Fit Figure 8. one very large fitted value stands out.9403 AG + 1 -0.1.0000 0. ¤ c Studentlitteratur ° . A residual plot based on these data is given in Figure 8.3348 Scaled Deviance 30 38. The distribution seems reasonable.0024 6.2985 1.2495 0. Additional topics 161 Residual Model Diagnostics Normal Plot of Residuals I Chart of Residuals 3 3.3884 -1 -1 -2 5 -2 -3 -3. Fits 6 1 5 Frequency Residual 4 0 3 2 -1 1 -2 0 -2.5 1.0SL=-3.5899 0.28.3814 .2766 Pearson Chi-Square 30 29.0220 AG .1: Residual plots for Leukemia data Parts of the output is Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 30 40. We asked the procedure to output predicted values and standardized Deviance residuals to a file.6222 0. .314 1 2 1 Residual Residual 0 0 X=-0.0 0.9444 Log Likelihood . .0056 0. The fit of the model is reasonable with a scaled deviance of 38. -146.3310 0.0344 0.9874 Scaled Pearson X2 30 28.0150 5.9564 0.0 -0.0061 0.0 -1.0 1.8.0262 0.091 -4 -2 -1 0 1 2 0 10 20 30 Normal Score Observation Number Histogram of Residuals Residuals vs.0103 SCALE 1 0. Analysis Of Parameter Estimates Parameter DF Estimate Std Err ChiSquare Pr>Chi INTERCEPT 1 -0. 0 0.

or robust.7) vi −∞ where f (yi ) is some arbitrary function of yi . The inverse I−1 θ is an estimator of the covariance matrix³of ´the parameter estimates. Quasi-likelihood can give a similar peace of mind to users of generalized linear models. 1983). Maximizing (8. for example.) when we fit generalized linear models. binomial.3 Quasi-likelihood In general linear models.162 8. Quasi-likelihood 8. regression models. QL estimators can be shown to have nice asymptotic properties. In principle. Estimation of parameters in GLM:s is of- ten done using some variation of Least squares. An alternative approach is to use the so called em- pirical. the mean µi and the variance vi . Wedderburn (1974) suggested that this can be used to define a class of estimators that do not require explicit expressions for the distributions. Q (yi . we need to specify a distribution (Poisson. QL estimators are asymptotically unbiased and eﬃcient among the class of estimating equations which are linear functions of the data (McCullagh. Estimators of the variances of QL estimators can be obtained in diﬀerent ways. This is called the model-based estimator of Cov β b . they are consistent. A type of generalized linear models can be constructed by specifying the linear predictor η and the way the variance v depends on µ. we can estimate pa- rameters in. The matrix Iθ of second order derivatives of (8. µi ) is called a quasi- likelihood. However. This integral is Zµi yi − t Q (yi . µi ) = dt + f (yi ) (8. the assumption that the observation come from a Normal distribution is not crucial. The integral of (8. estimator. as long as the linear predictor is correctly specified. Secondly. Wedderburn (1974) noted the following property of generalized linear models. for which certain optimality properties are valid even under non-normality.7) gives the QL equiva- lent of the Fisher information matrix. without being too much worried by non-normality.3. regardless of whether the variance assumption vi = V (µi ) is true. First.7) with respect to the parameters of the model yields quasi-likelihood (QL) estimators. Normal etc. which is less sensitive to assumptions regarding c Studentlitteratur ° .6) can be seen as a kind of likelihood function. The score equations for the regression coeﬃcients β have the form X ∂µ i −1 vi (yi − µi (β)) = 0 (8. Thus.6) i ∂β Note that this expression only contains the first two moments.

were fed diﬀerent diets during pregnancy and lactation. After three weeks it was recorded how many of the live born pups that still were alive. The data are given as x/n where x is the number of surviving pups and n is the total litter size. The control diet was a standard food whereas the treatment diet contained a certain chemical agent. It is also used in the GEE method for analysis of repeated measures data. Example 8. This is also called the sandwich estimator. Control 13/13 12/12 9/9 9/9 8/8 8/8 12/13 11/12 9/10 9/10 8/9 11/13 4/5 5/7 7/10 7/10 Treated 12/12 11/11 10/10 9/9 10/11 9/10 9/10 8/9 8/9 4/5 7/9 4/7 5/10 3/6 3/10 0/7 A standard logistic model has a rather bad fit with a deviance of 86.19 on 30 df. at least if the sample is reasonably large.8.4 Quasi-likelihood for modeling overdispersion The quasi-likelihood approach is sometimes useful when the data show signs of over-dispersion. We will illustrate this idea based on a set of data from Liang and Hanfelt (1994). each consisting of sixteen pregnant fe- males. In this model the treatment eﬀect is significant.0027). It has general form ³ ´ d β Cov b = I−1 I1 I−1 θ θ where Iθ is the information matrix and k X 0 0 ∂µ ∂µ I1 = i Vi−1 i . QL estimation is a viable alternative to the methods for modeling over-dispersion presented in earlier chapters. Additional topics 163 variances and covariances.3 Two groups of rats. c Studentlitteratur ° . and in the method for analysis of mixed generalized linear models discussed below.0001.0036) and when we use a likelihood ratio test (p = 0. p < 0. 8. both when we use a Wald test (p = 0. Since the emprical variance estimates obtained in QL estimation are rather robust against the variance assumption. i=1 ∂β ∂β Software supporting quasi-likelihood may have options to choose between model-based and robust variance estimators. The quasi-likelihood approach can be used in over-dispersed models.

4751 0. In the simulations. the overdispersion parameter was c Studentlitteratur ° .043) but the Score test is not (p = 0.0014 treat c 0. The methods included modeling as a beta-binomial distribution. The output is given in two parts.8925 2.4. . ¤ In the paper by Liang and Hanfelt (1994).) The quasi-likelihood estimates can be obtained in Proc Genmod by using the following trick. RUN. The program can be written as PROC GENMOD data=tera.164 8. The first part uses the model-based esti- mates of variances and these results are identical to the first output.0000 0.4747 1.0300 1. Such a model gives a non-significant treatment eﬀect (Wald test: p = 0. CLASS treat litter. MODEL x/n=treat / DIST=bin LINK=logit type3.89 0.0000 0. this can be modeled by including a dispersion parameter in the model.9693 3. REPEATED subject=litter.2220 0.0890 In these results the Wald test is significant (p = 0.9612 0. The second part presents the QL results: Analysis Of GEE Parameter Estimates Empirical Standard Error Estimates Standard 95% Confidence Parameter Estimate Error Limits Z Pr > |Z| Intercept 1. If it can be assumed that the dispersion parameter is the same in both groups.20 0.0431 treat t 0. as discussed in Chapter 5. a simulation study compared the performances of diﬀerent methods for allowing for overdispersion in this type of data.0000 0. Proc Genmod can use QL. Score Statistics For Type 3 GEE Analysis Chi- Source DF Square Pr > ChiSq treat 1 2.0890). We can then request a repeated-measures analysis but with only one measurement per female.3813 0. but only in repeated-measures models. and two QL approaches. LR test: p = 0.02 0.0000 . Quasi-likelihood for modeling overdispersion The bad fit may be caused by heterogeneity among the females: diﬀerent females may have diﬀerent ability to take care of their pups.0855.0765.

. Models for repeated measures data have the same basic components as other generalized linear models. 8.) Thus.. and we have a Pk total of ni measurements. The main problem with repeated measures data is that observations within one individual are correlated.8) i=1 ∂β This is similar to the quasi-likelihood equation (8.. Additional topics 165 diﬀerent for diﬀerent treatments.5 Repeated measures: the GEE approach Suppose that measurements have been made on the same individuals on k occasions. j = 1.. The GEE approach means that we estimate the parameters by solving the GEE equation k X ∂µ i Vi−1 (Yi − µi (β)) = 0 (8. the QL approach assuming constant overdispersion performed surprisingly well. It can be shown that the multivariate quasi-likelihood ap- proach provides consistent estimates of the parameters even if the covariance c Studentlitteratur ° . The GEE approach can be seen as an extension of the quasi-likelihood approach to a multivariate mean vector. The corresponding vector of mean values is £ ¤0 µi = µi1 . Suppose that we store all data for individual i in the vector Yi that has elements Yi = [Yi1 . But in addition we need to consider how observations within individuals are correlated. see also Diggle. The values of the independent variables for individual i at measurement (occasion) j are 0 collected in the vector xij = [xij1 . The vector β contains the parameters to be estimated. k. There are several ways to model this cor- relation. . We will here only consider the Generalized estimating equations approach of Liang and Zeger (1986). xijp ]0 . It was also concluded that results based on the beta-binomial distribution “can lead to severe bias in the estimation of the dose-response relationship” (p. but it is here written in matrix form. . The responses can then be represented as Yij . µini . .. 878. We need to specify a link function. . the QL approach seems to be a useful and rather robust tool for modeling overdispersed data. . . ni ... a distribution and a linear predictor. This approach is available in the Genmod procedure in SAS (2000b). . This type of data is called repeated measures i=1 data. Nevertheless.. i = 1. Let Vi be the covariance matrix of Yi ...8. Liang and Zeger (1994). Subject i has measurements on ni occasions. . . Yini ]0 . .6).

m where t is the time  0 if t > m span between the observations. Both approaches are available in Proc Genmod. The correlations between measurement occasions are modeled by a vector of parameters α. If we assume unstructured correlations we need to es- timate k (k − 1) /2 correlations. The children were from two diﬀerent cities. The structure of the data is given in Table 8. the choice of model for the covariance structure is a compromise between realism and parsimony. while the m-dependent correlation structure includes fewer correlations. The exchangeable structure includes only one parameter. Example 8. The fixed structure means that the user enters all correla- tions. The correlation is 0 for occasions more than m time units apart. c Studentlitteratur ° .166 8. The following correlation structures are available in Proc Genmod: Fixed (user-specified): Corr (Yij . 1994) were followed from the age of 9 to the age of 12. ni − j As usual. Repeated measures: the GEE approach structure is incorrectly specified.. A model with more parameters is often more realistic.   1 if t = 0 m-dependent: Corr (Yij .5. The model-based approach assumes that of β the model is correctly specified. so there are no parameters to estimate. AR(1): Corr (Yij . ½ 1 if j = k Exchangeable: Corr (Yij . the complete data set is available from the publishers home page as the file Wheezing. The robust approach provides consistent es- timates of variances and covariances even if Vi is incorrectly specified... 1. The binary response variable was the wheezing status of the child.dat.. but may be more diﬃcult to interpret and may give conver- gence problems. Yik ) = αt for t = 0. age.2. α if j 6= k ½ 1 if j = k Unstructured: Corr (Yij . . Yik ) = rjk . All correlations are equal.. The AR(1) structure also has only one param- eter but it is often intuitively appealing since the correlations decrease with increasing distance.. The explanatory variables were city. Yik ) = .4 Sixteen children (taken from the data of Lipsitz et al. and maternal smoking status. Estimates of the variances and covariances b can be obtained in two ways. Yik ) = αt if t = 1. . Yik ) = αjk if j 6= k Autoregressive.

RUN.0661 Scaled Pearson X2 60 63. -38. c Studentlitteratur ° . These are correlated.9380 1. The following output is obtained: Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 60 76. with an exchangeable correlation structure as described above.9651 1. CLASS child city.4690 . REPEATED subject=child / type = exch covb corrw. The Repeated statement indicates that there are several measurements for each child. This program models the probability of wheezing as a function of city. The eﬀects of age and smoking are assumed to be linear. age and maternal smoking.2823 Pearson Chi-Square 60 63. A binomial distribution with a logit link is used.9651 1.9380 1.0661 Log Likelihood .2: Structure of the data on wheezing status Child City Age Smoke Wheeze 1 Portage 9 0 1 1 Portage 10 0 1 1 Portage 11 0 1 1 Portage 12 0 0 2 Kingston 9 1 1 2 Kingston 10 2 1 2 Kingston 11 2 0 A Genmod program for analysis of these data can be written as PROC GENMOD DATA=wheezing.8. Additional output is also requested. Additional topics 167 Table 8. MODEL wheeze = city age smoke /dist=bin link=logit.2823 Scaled Deviance 60 76.

9991 .0928 0.96838 -0.2272 1.7971 Scale 0. may be related to the rather small sample size.3429 -.01553 0.01658 PRM2 -0.01877 PRM4 -0.3610 -0.002201 PRM5 -0.29893 0.6764 CITY Kingston 0. along with their empirical standard error estimates.0560 -4.2571 0.6147 -.2650 0.2754 3.2788 -0.0000 0.01587 0.8595 CITY Portage 0.83232 -0.01658 PRM5 0. of course.71511 -0.02187 0.85121 -0.16530 0.4652 SMOKE -0.002411 0.8003 0.45733 -0. . This.47378 0.33891 -0.0000 AGE -0.6 Mixed Generalized Linear Models Mixed models are models where some of the independent variables are as- sumed to be fixed.40467 0.0000 0.7501 0.6.4709 0. Covariance Matrix (Model-Based) Covariances are Above the Diagonal and Correlations are Below Parameter Number PRM1 PRM2 PRM4 PRM5 PRM1 5.06353 -0.7141 7. and using the “robust” method.53133 0. Analysis Of GEE Parameter Estimates Empirical Standard Error Estimates Empirical 95% Confidence Limits Parameter Estimate Std Err Lower Upper Z Pr>|Z INTERCEPT 1. .0000 0. ¤ 8. An example of an applica- tion of mixed models is when several measurements have been taken on the c Studentlitteratur ° .e.04007 PRM4 -0. . while others are seen as randomly sampled from some population or distribution. None of the parameters are significantly diﬀerent from 0.13032 The covariance matrix of β b is estimated in two ways: assuming that the model for V is correct.05737 0.0000 0.7302 0.2036 0. Mixed models have proven to be very useful in modeling diﬀerent phenomena.05268 -0.22386 -0. i.1219 0.6883 -1.16667 PRM2 -0.4174 0.19088 Covariance Matrix (Empirical) Covariances are Above the Diagonal and Correlations are Below Parameter Number PRM1 PRM2 PRM4 PRM5 PRM1 9.1771 0.13847 0.15108 0.0000 0. Mixed Generalized Linear Models The model fits the data reasonably well.07775 -0.168 8.16125 -0.97676 0. chosen beforehand. Estimates of the parameters of the model are given.

Desmopressin.9) are often the regression parameters β. In mixed models based on Normal theory it is often assumed that ui ∼ N (0. There are situations when the response is of a type amenable for GLIM estimation but where there would be a need to assume that some of the in- dependent variables are random.4 mg. and Wolfinger and O’Connell (1993) have explored a pseudo-likelihood approach to fitting models such as (8. 0. where a dry night was recorded as 1 and a wet night as 0. Only one patient is listed. Essentially. In general. A special SAS pro- cedure. D is a general covariance matrix of dimension q × q. yi is the ni × 1 response vector for individual i. The children received 0.9) In (8. and Olsson and Neveus (2000). Example 8. The structure of the data is given in Table 8. Additional topics 169 same individual. the original data set contained 33 patients. The nights were grouped into sets of five nights where the same treatment had been given. Proc Mixed. β is a p × 1 parameter vector for the fixed eﬀects. In such cases the eﬀect of individual can often be included in the model as a random eﬀect. as yi = Xi β + Zi ui + ei (8. A mixed linear model for a continuous response variable y can be written. Σi ). The method and the macro are described in Littell et al (1996). and where a link function is used to model the expected response as a function of the linear predictor. D) and that ei ∼ N (0. the actual eﬀects of the random factors is not of primary concern. and ui is a q × 1 vector of random eﬀects. The data consisted of nightly recordings. c Studentlitteratur ° . A SAS macro Glim- mix has been written to do the estimation. all suﬀering from monosymptomatic nocturnal enuresis.8.9). for more details about the study and its analysis see Neveus et al (1999). Xi is a ni × p design matrix that contains values for the fixed eﬀect variables. where Ini is the identity matrix of dimension ni . The study was carried out with a double-blind randomized three- period cross-over design. Wet and dry nights were documented.9) but where the distributions are free to be any member of the exponential family. The parameters of interest in a model such as (8. Zi is a ni × q matrix that contains the random eﬀects variables. Σi is often chosen to be equal to σ 2 Ini . and estimates of the variance components.3. the macro iterates between Proc Mixed and Proc Genmod.8 mg. or placebo tablets at bedtime for five consecutive nights with each dosage. were enrolled in a study. Desmopressin. A wash-out period of at least 48 hours without any medi- cation was interspersed between treatment periods.5 Thirty-three children between the ages of 6 and 16 years. can be used for fitting mixed linear models in cases where the response variable is continuous and approximately normally distributed. Breslow and Clayton (1993). for each individual i.

10).6.j−1] (8.j] + λd[i. the model assumes a binomial distribution of the observations. c Studentlitteratur ° .j] is the direct eﬀect of the treatment administered in period j of group i. sik is the random eﬀect of patient k in sequence i. The numbers in the table are p-values to assess the significance of the diﬀerent factors. πj is the eﬀect of period j. µ is a general mean. the linear predictor part of a general model for our data may be written as ηijk = µ + sik + πj + τ d[i.j−1] is the carry-over eﬀect of the treatment administered in period j − 1 of group i.170 8.10) In (8. The model further includes a logistic link function: µijk ηijk = log (8. τ d[i. Patient was included as a random factor in all models. Models containing diﬀerent combinations of model parameters were tested. The results are summarized in the following table. and λd[i.11) 1 − µijk Finally. Mixed Generalized Linear Models Table 8.3: Raw data for one patient in the enuresis study Patient Period Dose Night Dry 1 1 1 1 1 1 1 1 2 1 1 1 1 3 1 1 1 1 4 1 1 1 1 5 1 1 2 0 1 0 1 2 0 2 0 1 2 0 3 1 1 2 0 4 0 1 2 0 5 0 1 3 2 1 1 1 3 2 2 1 1 3 2 3 1 1 3 2 4 1 1 3 2 5 1 Following Jones and Kenward (1989).

Seq.0759 .0001 .0001 . .7442 Dose. provided an appropriate description of the data. Period . ¤ c Studentlitteratur ° .0001 . Further analyses using pairwise comparisons revealed that there were no significant diﬀerences between doses but that the drug had a significant eﬀect at both doses.0898 .0938 Dose. After eﬀ. it was concluded that a model containing a random Patient eﬀect.6272 Dose.8713 Dose. Period . Neither the sequence eﬀect nor the after eﬀect was anywhere close to being significant in any of the analyses. After eﬀ. Seq . Period.7762 .6573 Dose. Seq .. and fixed eﬀects of Dose and Period. .0001 Dose.0001 .8.0001 . After eﬀ. Additional topics 171 Eﬀects included Dose Period Sequence After eﬀect Dose .0001 .7577 Based on these results..

Re-analyze these data using the method discussed in Section 8.0 1 22 5.0 0 65 100.5 1 3 10. Time wbc AG Time wbc AG 65 2.0 1 2 27.4 1 3 28.0 1 7 1.4 1 1 100.0 1 8 31.0 0 156 0. Also note that there are no censored observations so a straight-forward application of generalized linear models is possible. Try a survival distribution based on the Gamma distribution.5 0 1 100. in thousands) and the AG factor was also recorded. Exercise 8.0 0 5 52.4 1 4 26. For each patient the white cell count (wbc.0 1 17 4.1 Survival times in weeks were recorded for patients with acute leukaemia.0 0 134 2.0 1 16 6. Note that the wbc value may need to be transformed.2 The data in Exercise 1.3 1 30 79.0 1 3 21.0 0 56 9. The c Studentlitteratur ° .3 0 143 7.0 0 100 4.7 Exercises Exercise 8.2 show some signs of being heteroscedas- tic.7.6 1 4 100. Exercises 8. Construct a model that can predict survival time based on wbc and ag.0 0 121 10.0 1 The four largest wbc values are actually larger than 100.3 1 65 3.0 0 22 35.0 0 39 5.0 0 4 17.0 1 43 100. The data are as follows: The level of cortisol has been measured for three groups of patients with diﬀerent syndromes: a) adenoma b) bilateral hyperplasia c) cardinoma.172 8. Patients with positive AG factor had Auer rods and/or granulate of the leukemia cells in the bone marrow at diagnosis while the AG negative patients had not.0 0 26 32.0 0 56 4.8 1 4 19.0 0 108 10.0 1 16 9.

5 5. The experiment was designed to answer the following types of questions: Are the eating preferences of the lice diﬀerent for diﬀerent varieties? Are the c Studentlitteratur ° . Air was allowed to flow through the boxes from Variety 1 to Variety 2.2.8.7 13.3 An experiment on lice preferences for diﬀerent varieties of plants (Nincovic et al.1 15.8 4.4 7. 10 lice were placed in each tube.1 8.8 1.9 3.2 1. Tubes were placed on four leaves of the plant of Variety 2. 2002) was preformed in the following way: Plants of one variety (Variety 1) were placed in a box.1 9.0 3. After about two hours it was recorded how many of the 10 lice that were eating from the plant.8 7.7 6.8 53. see Figure 8.2 3.3 10. Additional topics 173 Air flow Variety 1 Variety 2 Figure 8.9 9.6 3.6 Exercise 8.2: Experimental layout for lice experiment results are summarized in the following table: a b c 3.9 15.8 9. An adjacent box contained plants of some other variety (Variety 2).

7. The complete dataset contains 320 observations and is listed at the end of the exercise. only a few observations are listed. n2 is the number of lice in the tube. Pot Tube x2 n2 Var1 Var2 1 1 9 10 F K 1 2 7 10 F K 1 3 7 10 F K 1 4 8 10 F K 2 5 9 10 F K 2 6 10 10 F K 2 7 10 10 F K 2 8 10 10 F K 3 9 10 10 F K 3 10 10 10 F K 3 11 9 10 F K Pot indicates pot number and Tube indicates tube number.174 8. Exercises eating preferences aﬀected by the smell from Variety 1? The structure of the raw data was as follows. Variety 2 or a combination of these. c Studentlitteratur ° . Hint: Since repeated observations are made on the same plants it may be reasonable to include Pot as a random factor in the model. and x2 is the number of lice eating after two hours. respectively. Formulate a model for these data that can answer the question whether the eating preferences of the lice depends on Variety 1. Var1 and Var2 are codes for Variety 1 and Variety 2.

8. Additional topics 175 4 15 8 10 K H Pot Tube x2 n2 Var1 Var2 4 16 9 10 K H 1 1 9 10 F K 5 17 6 10 K H 1 2 7 10 F K 5 18 9 10 K H 1 3 7 10 F K 5 19 7 10 K H 1 4 8 10 F K 5 20 8 10 K H 2 5 9 10 F K 1 1 9 10 F H 2 6 10 10 F K 1 2 7 10 F H 2 7 10 10 F K 1 3 9 10 F H 2 8 10 10 F K 1 4 8 10 F H 3 9 10 10 F K 2 5 7 10 F H 3 10 10 10 F K 2 6 9 10 F H 3 11 9 10 F K 2 7 8 10 F H 3 12 7 10 F K 2 8 8 10 F H 4 13 8 10 F K 3 9 7 10 F H 4 14 9 10 F K 3 10 7 10 F H 4 15 8 10 F K 3 11 7 10 F H 4 16 9 10 F K 3 12 6 10 F H 5 17 9 10 F K 4 13 8 10 F H 5 18 9 10 F K 4 14 8 10 F H 5 19 10 10 F K 4 15 8 10 F H 5 20 10 10 F K 4 16 7 10 F H 1 1 8 10 K F 5 17 7 10 F H 1 2 7 10 K F 5 18 7 10 F H 1 3 10 10 K F 5 19 2 10 F H 1 4 7 10 K F 5 20 7 10 F H 2 5 9 10 K F 1 1 9 10 H F 2 6 8 10 K F 1 2 10 10 H F 2 7 10 10 K F 1 3 9 10 H F 2 8 8 10 K F 1 4 9 10 H F 3 9 10 10 K F 2 5 7 10 H F 3 10 10 10 K F 2 6 9 10 H F 3 11 10 10 K F 2 7 8 10 H F 3 12 10 10 K F 2 8 7 10 H F 4 13 7 10 K F 3 9 7 10 H F 4 14 10 10 K F 3 10 7 10 H F 4 15 7 10 K F 3 11 10 10 H F 4 16 10 10 K F 3 12 7 10 H F 5 17 8 10 K F 4 13 8 10 H F 5 18 7 10 K F 4 14 9 10 H F 5 19 10 10 K F 4 15 10 10 H F 5 20 7 10 K F 4 16 8 10 H F 1 1 9 10 H K 5 17 10 10 H F 1 2 7 10 H K 5 18 9 10 H F 1 3 7 10 H K 5 19 8 10 H F 1 4 8 10 H K 5 20 7 10 H F 2 5 7 10 H K 1 1 10 10 A K 2 6 10 10 H K 1 2 8 10 A K 2 7 9 10 H K 1 3 9 10 A K 2 8 8 10 H K 1 4 7 10 A K 3 9 8 10 H K 2 5 8 10 A K 3 10 9 10 H K 2 6 6 10 A K 3 11 7 10 H K 2 7 8 10 A K 3 12 9 10 H K 2 8 7 10 A K 4 13 10 10 H K 3 9 8 10 A K 4 14 8 10 H K 3 10 8 10 A K 4 15 6 10 H K 3 11 10 10 A K 4 16 9 10 H K 3 12 9 10 A K 5 17 9 10 H K 4 13 7 10 A K 5 18 8 10 H K 4 14 8 10 A K 5 19 8 10 H K 4 15 10 10 A K 5 20 9 10 H K 4 16 8 10 A K 1 1 8 10 K H 5 17 7 10 A K 1 2 8 10 K H 5 18 10 10 A K 1 3 8 10 K H 5 19 8 10 A K 1 4 8 10 K H 5 20 7 10 A K 2 5 7 10 K H 1 1 7 10 K A 2 6 8 10 K H 1 2 9 10 K A 2 7 8 10 K H 1 3 10 10 K A 2 8 7 10 K H 1 4 8 10 K A 3 9 9 10 K H 2 5 8 10 K A 3 10 8 10 K H 2 6 8 10 K A 3 11 9 10 K H 2 7 8 10 K A 3 12 6 10 K H 2 8 7 10 K A 4 13 7 10 K H 4 14 8 10 K H c Studentlitteratur ° .

176 8.7. Exercises 3 9 6 10 K A 1 3 9 10 H A 3 10 3 10 K A 1 4 8 10 H A 3 11 7 10 K A 2 5 9 10 H A 3 12 8 10 K A 2 6 10 10 H A 4 13 4 10 K A 2 7 8 10 H A 4 14 8 10 K A 2 8 5 10 H A 4 15 9 10 K A 3 9 5 10 H A 4 16 8 10 K A 3 10 9 10 H A 5 17 6 10 K A 3 11 8 10 H A 5 18 5 10 K A 3 12 8 10 H A 5 19 10 10 K A 4 13 10 10 H A 5 20 7 10 K A 4 14 9 10 H A 1 1 8 10 A F 4 15 9 10 H A 1 2 5 10 A F 4 16 9 10 H A 1 3 5 10 A F 5 17 8 10 H A 1 4 6 10 A F 5 18 7 10 H A 2 5 8 10 A F 5 19 9 10 H A 2 6 10 10 A F 5 20 5 10 H A 2 7 3 10 A F 1 1 9 10 K K 2 8 7 10 A F 1 2 9 10 K K 3 9 6 10 A F 1 3 9 10 K K 3 10 9 10 A F 1 4 5 8 K K 3 11 6 10 A F 2 5 10 10 K K 3 12 8 10 A F 2 6 8 10 K K 4 13 8 10 A F 2 7 9 10 K K 4 14 5 10 A F 2 8 7 10 K K 4 15 8 10 A F 3 9 8 10 K K 4 16 6 10 A F 3 10 10 10 K K 5 17 5 10 A F 3 11 7 10 K K 5 18 6 10 A F 3 12 10 10 K K 5 19 9 10 A F 4 13 9 10 K K 5 20 7 10 A F 4 14 7 10 K K 1 1 8 10 F A 4 15 8 10 K K 1 2 10 10 F A 4 16 8 10 K K 1 3 9 10 F A 5 17 8 10 K K 1 4 9 10 F A 5 18 7 10 K K 2 5 8 10 F A 5 19 10 10 K K 2 6 8 10 F A 5 20 9 10 K K 2 7 7 10 F A 1 1 9 10 A A 2 8 7 10 F A 1 2 10 10 A A 3 9 7 10 F A 1 3 10 10 A A 3 10 8 10 F A 1 4 10 10 A A 3 11 10 10 F A 2 5 10 10 A A 3 12 6 10 F A 2 6 6 10 A A 4 13 10 10 F A 2 7 9 10 A A 4 14 9 10 F A 2 8 8 10 A A 4 15 8 10 F A 3 9 8 10 A A 4 16 9 10 F A 3 10 10 10 A A 5 17 9 10 F A 3 11 10 10 A A 5 18 8 10 F A 3 12 8 10 A A 5 19 8 10 F A 4 13 4 10 A A 5 20 9 10 F A 4 14 8 10 A A 1 1 7 10 A H 4 15 8 10 A A 1 2 7 10 A H 4 16 8 10 A A 1 3 6 10 A H 5 17 7 10 A A 1 4 4 10 A H 5 18 9 10 A A 2 5 8 10 A H 5 19 9 10 A A 2 6 7 10 A H 5 20 8 10 A A 2 7 5 10 A H 1 1 9 10 F F 2 8 7 10 A H 1 2 8 10 F F 3 9 7 10 A H 1 3 9 10 F F 3 10 5 10 A H 1 4 8 10 F F 3 11 6 10 A H 2 5 9 10 F F 3 12 6 10 A H 2 6 9 10 F F 4 13 10 10 A H 2 7 8 10 F F 4 14 9 10 A H 2 8 9 10 F F 4 15 8 10 A H 3 9 8 10 F F 4 16 8 10 A H 3 10 8 10 F F 5 17 6 10 A H 3 11 8 10 F F 5 18 7 10 A H 3 12 7 10 F F 5 19 5 10 A H 4 13 8 10 F F 5 20 8 10 A H 4 14 8 10 F F 1 1 7 10 H A 4 15 8 10 F F 1 2 6 10 H A 4 16 9 10 F F c Studentlitteratur ° .

Additional topics 177 5 17 10 10 F F 5 18 9 10 F F 5 19 10 10 F F 5 20 8 10 F F 1 1 9 10 H H 1 2 7 10 H H 1 3 8 10 H H 1 4 7 10 H H 2 5 8 10 H H 2 6 6 10 H H 2 7 8 10 H H 2 8 7 10 H H 3 9 10 10 H H 3 10 7 10 H H 3 11 9 10 H H 3 12 8 10 H H 4 13 8 10 H H 4 14 10 10 H H 4 15 8 10 H H 4 16 9 10 H H 5 17 10 10 H H 5 18 10 10 H H 5 19 8 10 H H 5 20 9 10 H H c Studentlitteratur ° .8.

.

Appendix A: Introduction to matrix algebra Some basic definitions Definition: A vector is an ordered set of numbers. µ ¶ 1 2 4 Example: A = is a matrix with two rows and three columns. Both matrices and vectors are written in bold.  is a column vector with n elements. The general element of the matrix B is bij . 1 6 3   b11 b12 b1c  b21 b22    Example: B =  ..  br1 br2 brc columns. Each number has a given position.  yn Definition: A matrix is a two-dimensional (rectangular) ordered set of num- bers.  . while matrices are often written using uppercase letters like A.  is a matrix with r rows and c  . 8   y1  y2    Example: y =  . The first index de- notes row. 179 . the second index denotes column.   5 Example: x =  3  is a column vector with 3 elements.. Vectors are often written using lowercase symbols like x.

Example: The transpose of the matrix µ ¶ 1 2 4 A= 1 6 3 is   1 1 A0 =  2 6  .e.. is a matrix that has dimension 1 × 1. i.. r = c) is a square matrix. 0 For the elements aij of A0 it holds that 0 aij = aji   x1  x2  ¡ ¢   If x =  . a number.  xn vector with n elements. c Studentlitteratur ° .180 The dimension of a matrix The dimension of a matrix Definition: A matrix that has r rows and c columns is said to have dimension r × c. Definition: A row matrix with m columns has dimension 1 × m. If A is a matrix of dimension r × c then the transpose of A is a matrix of dimension c × r.e.  is a column vector. Definition: A column matrix with n rows has dimension n × 1. then x0 = x1 x2 . 4 3 Some special types of matrices Definition: A matrix where the number of rows = number of columns (i. xn is a row  . The transpose operator is denoted with a prime. Definition: A scalar. 0 . The transpose of a matrix Transposing a matrix means to interchange rows and columns. so the transpose of A is denoted with A0 (Some textbooks indicate a transpose by using the letter T)..

 . Definition: Equality: Two matrices A and B with the same dimension r × c are equal if and only if aij = bij for all i and j. −1 2 4 Definition: The elements aii in a square matrix are called the diagonal elements. . Definition: An identity matrix I is a symmetric matrix  where all elements  1 0 0  0 1 0    are 0. 5 8 4 c Studentlitteratur ° . µ ¶ µ ¶ 1 2 4 3 9 6 Example: If A = and B = then 1 6 3 4 2 1 µ ¶ 4 11 10 A+B= . i. subtraction and multiplication can be defined for matrices. except a1 0 0  0 a2 0    for the diagonal elements: D (ai ) =  .. Definition: Addition: The sum of two matrices A and B that have the same dimension is the matrix that consists of the sum of the elements of A and B. if all elements are equal. Calculations on matrices Addition. except that the diagonal elements are 1: I =  .  ..  0 0 ar   1  1    Definition: A unit vector is a vector where all elements are 1: 1 =  .  1 ¡ ¢ The transpose is 10 = 1 1 ··· 1 ..   .  0 0 1 Definition: A diagonal matrix is amatrix where all elements  are 0.e.   3 0 −1 Example: The matrix A =  0 1 2  is square and symmetric.Appendix A: Introduction to matrix algebra 181 Definition: A square matrix that is unchanged when transposed is symmet- ric. .

The elements of C are calculated as r X cij = aik bkj . If A has dimension p × r and B has dimension r × q then the product A · B will have dimension p × q.   k · ar1 k · ar2 k · arc Multiplication by a matrix Matrix multiplication of type C = A · B is defined only if the number of columns in A is equal to the number of rows in B.182 Matrix multiplication     a11 a12 a1c b11 b12 b1c  a21 a22   b21 b22  Example: If A =    and B =     then  ar1 ar2 arc br1 br2 brc   a11 + b11 a12 + b12 a1c + b1c  a21 + b21 a22 + b22  A + B =  . Matrix multiplication Multiplication by a scalar To multiply a matrix A by a scalar (= a number) c means that all elements in A are multiplied by c. k=1 c Studentlitteratur ° . µ ¶ µ ¶ 1 2 4 4 8 16 Example: If A = then 4 · A = 1 6 3 4 24 12   k · a11 k · a12 k · a1c  k · a21 k · a22  Example: k · A =  .  ar1 + br1 ar2 + br2 arc + brc Definition: Subtraction: Matrix subtraction is defined in an analogous way. It holds that A+B = B+A A + (B + C) = (A + B) + C A − (B − C) = (A − B) + C For matrices that do not have the same dimensions. addition and subtraction are not defined.

1 0. Note that in general.Appendix A: Introduction to matrix algebra 183   µ ¶ 6 5 4 1 2 3 Example: If A = and B = −1 1 −1  then AB = −1 0 1 µ 0 2 0 ¶ 1 · 6 + 2 · (−1) + 3 · 0 1·5+2·1+3·2 1 · 4 + 2 · (−1) + 3 · 0 = µ −1 · 6 + 0 · (−1) ¶ + 1 · 0 −1 · 5 + 0 · 1 + 1 · 2 −1 · 4 + 0 · (−1) + 1 · 0 4 13 2 . µ ¶ 5 10 Example: The inverse of the matrix A = is 3 2 µ ¶ −1 −0. The inverse of a matrix Definition: The inverse of a square matrix A is the unique matrix A−1 for which it holds that AA−1 = A−1 A = I. In the expression BA the matrix A has been pre-multiplied with 0 the matrix B. In the expression AB the matrix A has been post-multiplied with the matrix B. AB 6= BA. −6 −3 −4 Calculation rules of multiplication It holds that A (B + C) = A · B + A · C A (B · C) = (A · B) ·C.25 c Studentlitteratur ° . (Note that the same rule holds for scalars: 3 · 3−1 = 3 · 13 = 33 = 1). Note that (AB) = B0 A0 . That is: the matrix multiplied with its inverse results in the unit matrix.15 −0.5 A = . The order has importance for multiplica- tion. Idempotent matrices Definition: A matrix A is idempotent if A · A = A. 0.

25 µ ¶ 5 · (−0. The rank of a matrix Definition: Two vectors are linearly dependent if the elements of one vector are proportional to the elements of the other vector.0 It is possible that the inverse A−1 does not exist.15 3 · 0. The following relations hold for inverses: The inverse of a symmetric matrix is symmetric −1 ¡ ¢0 (A0 ) = A−1 . ¡ ¢ ¡ ¢ Example: If x0 = 1 0 1 and y0 = 4 0 4 then the vectors x and y are linearly dependent.15 −0.5 + 10 · (−0. c Generalized inverses A matrix B is said to be a generalized inverse of the matrix A if ABA = A.184 Generalized inverses To verify this we calculate µ ¶µ ¶ −1 5 10 −0.5 + 2 · (−0. A− is not unique. If A is nonsingular then A− = A−1 .25) µ ¶ 1. A is then said to be singular.25) = 3 · (−0. The inverse of a product of several matrices is obtained by taking the product of the inverses. then 1 (cA)−1 = A−1 . When A is singular.0 0 = = I. If c is a scalar diﬀerent from zero.1) + 2 · 0.1 0. in opposite order: (ABC)−1 = C−1 B−1 A−1 . A generalized inverse of a matrix A can be calculated as −1 A− = (A0 A) A0 .15 5 · 0.5 A·A = 3 2 0. c Studentlitteratur ° . The generalized inverse of a matrix A is denoted with A− .1) + 10 · 0. 0 1.

#π (n) de- notes the number of inversions of a permutation π (n). which is the sum of the diagonal elements of A. but for larger matrices we prefer to leave the work to computers. These are solutions to the equation |A−λI| = 0. The rank of A0 A is equal to the rank of A (It is also true that the rank of AA0 is equal to the rank of A). The sum of all eigenvalues of A is equal to tr (A). The eigenvalues have the following properties: The product of all eigenvalues of A is equal to |A|. Eigenvalues and eigenvectors To each symmetric square matrix A of dimension n × n belongs n scalars that are called the eigenvalues of A. . If A is singular. . Determinants of small matrices can be calculated by hand. The symbol tr (A) can be read as ”the trace of A”. This is the number of exchanges of pairs of the numbers in π (n) that are needed to bring them back into natural order. u0 = 0 1 0 and v0 = 0 0 1 are linearly independent.i . Determinants To each square matrix A belongs a unique scalar that is called the deter- minant of A. c Studentlitteratur ° . 2. The following properties hold for the rank of a matrix: The rank of A−1 is equal to the rank of A. then the determinant |A| = 0. Definition: The degree of linear independence among a set of vectors is called the rank of the matrix that is composed by the vectors. The determinant of P Q n a matrix of dimension n can be calculated as |A| = (−1)#(π(n)) aπi .Appendix A: Introduction to matrix algebra 185 Definition: A set of vectors are linearly independent if it is impossible to write any one of the vectors as a linear combination of the others. . 0 ¡ ¢ ¡ ¢ ¡Example: ¢The vectors t = 1 0 0 . n. i=1 Here.or postmultiplied with a nonsingular matrix. The determinant of A is written as |A|. The rank of a matrix A does not change if A is pre. π (n) denotes any permutation of the numbers 1.

 i=1 yn n X 0 −1 1y= yi 10 1 =n 10 yn−1 = (10 1) 10 y = y i=1 Further reading This chapter has only given a very brief and sketchy introduction to matrix algebra.. xn  . A more complete treatment can be found in textbooks such as Searle (1982).. xn  ..  i=1 i xn   y1 0 ¡ ¢  y2  X  n xy= x1 x2 . = xi yi  . c Studentlitteratur ° ..186 Some statistical formulas on matrix form Some statistical formulas on matrix form   x1 ¡ ¢  x2  X  n x0 x = x1 x2 . = x2  ...

Appendix B: Inference using likelihood methods The likelihood function Suppose that we want to estimate the (single) parameter θ in some distri- bution. we use the term ”density function” whether x is continuous or discrete. . the log function is monotone which means that L and l = log (L) have their maxima for the same param- eter values. θ) (B. if we take logs. The likelihood function of our sample is defined as n Y L= f (xi . Secondly. xn . We take a random sample of¡ size n from the distribution ¢ and end up with the observation vector x0 = x1 x2 . In either case. given the value of θ. maximizing the likelihood (B. In many cases it is more convenient to work with the log of the likelihood function. For continuous distributions we use the term ”likelihood” rather than probability since the probability of obtaining any specified value of x is zero.1) is equivalent to maximizing the log likelihood n X l = log (L) = log (f (xi . Thus. L is the probability of obtaining our sample. Thirdly.1) i=1 For discrete distributions. L indicates how likely our sample is. First. We assume that the distribution has some density function f (x. There are three reasons for this. θ).2) i=1 187 . θ)) (B. we will replace the product sign with a summation sign which makes derivations somewhat easier. The Maximum Likelihood estimator of θ is the value b θ which maximizes the likelihood function L. the behavior of L can often be such that it is diﬃcult numerically to find the maximum. . This seems intuitively sensible: we choose as our estimator the value of θ for which our sample of observations is most likely.

The asymptotic eﬃciency means that the variance of ML estimators ap- proaches the Cramer-Rao lower bound as n increases. The connection between variance and information is that an estimator that has small variance gives us more information about the parameter.5) dθ dθ I−1 θ is called the Cramér-Rao lower bound. I−1 .4) where "µ ¶2 # "µ ¶2 # d log (L (θ. in large samples. This gives the so called score equation µn ¶ P d log (f (xi . Iθ is called the Fisher infor- mation about θ. Maximum Likelihood estimators are consistent. θ c Studentlitteratur ° .188 The Cramér-Rao inequality with respect to θ. (B. we can regard b θ as normally distributed with mean θ and variance I−1 θ : ¡ ¢ b θ ∼ N θ. x)) dl Iθ = E =E (B. θ) The Cramér-Rao inequality We state without proof the following theorem: The variance of any unbiased estimator of θ must follow the Cramér-Rao inequality ³ ´ V ar b θ ≥ I−1 θ (B. Properties of Maximum Likelihood estimators The following properties of Maximum Likelihood estimators hold under fairly weak regularity conditions: Maximum Likelihood estimators can be biased or unbiased. Maximum Likelihood estimators are asymptotically normally distributed. Maximum Likelihood estimators are asymptotically eﬃcient. θ) = = = 0. This means that. This is done by diﬀerentiating (B.3) dθ dθ i=1 f (xi . θ)) Xn dl i=1 f 0 (xi .2) with respect to θ.

the (j.Appendix B: Inference using likelihood methods 189 Distributions with many parameters So far. k):th element ·µ ¶µ ¶¸ ∂l ∂l Ij. we have discussed Maximum Likelihood estimation of a single pa- rameter θ. c Studentlitteratur ° . θ)) ∂l i=1 = =0 (B. but not always.k = E (B. the expressions we have given so far must be written as vectors and matrices. θ)) . and denote the matrix of second derivatives with H (θ). The matrix H is known as the Hessian matrix. Numerical procedures have been developed that mostly. If we have an observation vector x of dimension n · 1 and a parameter vector θ of dimension p · 1 the log likelihood equation can be written as n X l = log (L) = log (f (xi . In the case where the distribution has. k):th element of H is ∂ 2 l/∂θ j ∂θ k . Thus.8) ∂θj ∂θk The Maximum likelihood estimator θ b of the parameter vector θ is asymptot- ically multivariate Normal with mean θ and covariance matrix I−1θ . x). The set of p score equations is µn ¶ P ∂ log (f (xi .6) i=1 l should be maximized with respect to all elements of θ. Two commonly used procedures are the Newton-Raphson method and Fisher’s method of scoring. (B.7) ∂θj ∂θ j The asymptotic covariance matrix of θ is the inverse of the Fisher information matrix Iθ that has as its (j. Numerical procedures For complex distributions the score equations may be diﬃcult to solve an- alytically. Denote the vector of first derivatives of the log likelihood with respect to the elements of θ with g (θ). The Newton-Raphson method We wish to maximize the log likelihood l (θ. converge to the solution. say. p parameters.

the Newton-Raphson method and Fisher’s scoring method are equivalent. θ2 .190 Numerical procedures b The method works by a Suppose that we guess an initial estimate θ0 of θ. to calculate the expected Hessian we do not need to evaluate the second ∂l order derivatives. for distributions in the exponential family. First. the method is also called Iteratively reweighted least squares.9).10) ∂θ j ∂θk ∂θj ∂θk Thus. It holds that E [H (θ)] = −Iθ . the Fisher information matrix. On the other hand. In the generalized linear model context.9) We can now substitute θ1 for θ 0 in (B. Fisher’s scoring method often converges more slowly than the Newton-Raphson method. Fisher’s scoring Fisher’s scoring method is a variation of the Newton-Raphson method. as a kind of weighted least squares procedure. it can be shown that µ ¶ ·µ ¶µ ¶¸ ∂2l ∂l ∂l E = −E (B. and so on until the process has converged. The basic idea is to replace the Hessian matrix H with its expected value. (B. c Studentlitteratur ° . this leads to a new approximation Since g θ θ 1 = θ0 − g (θ0 ) H−1 (θ 0 ) . Fisher’s scoring method can be regarded. b Taylor series expansion of g (θ) around θ: ³ ´ ³ ´ g θb = g (θ0 ) + θ b − θ 0 H (θ 0 ) . at each step. There are two advantages to using the expected Hessian rather than the Hessian itself. A second advantage is that the expected Hessian is guaranteed to be positive definite so some non-convergence problems with the Newton-Raphson method do not occur. We get a series of estimates θ1 . ³ ´ b = 0. However. it suﬃces to calculate the first-order derivatives of type ∂θ j .

and Lerman.. (eds): Second inter- national symposium on inference theory. 383-430. Hotelling’s paper. American J. A. New York. 191 . 267-281. [11] Box. J. A. [8] Anscombe. and Csàki.Bibliography [1] Aanes. M. (1996): An introduction to categorical data analysis. (1984): Analysis of ordered categorical data. E. H. 15. JASA. B. D. (1998): Encyclopedia of Biostatistics. and Clayton. Wiley. Wiley. pp. 332-339. R. Roy. J. A. Soc. Wiley. (1985): Discrete choice analysis: The- ory and application to travel demand. M. [9] Armitage. N. North-Holland. 9-25. G. [3] Agresti. W. 22. New York. N. Stat. F. and Cox. Stat. (1980): Discrete statistical models with social science applications. R. [10] Ben-Akiva. F. WI- ley. [2] Agresti. G. B. MIT press. Budapest. (1973): Information theory and an extension of the maximum likelihood principle. Amsterdam. R. 36. (1964): An analysis of transformations. (1993): Approximate inference in generalized linear mixed models. 143. (1990): Categorical data analysis. D. E. 88. Akadèmiai Kiadó. Cambridge. (1987): Modelling variance heterogeneity in normal regression using GLIM. Chich- ester. [4] Agresti. (1953): Contribution to the discussion of H. [6] Akaike. T. S. 229-30. A. P. 47-52. and Colton. (1961): Pingue (Hymenoxys richardsonii) poisoning in sheep. New York. A. Applied statistics. of Veterinary Research. J. B. In: Petrov. P. Roy. [12] Breslow. [7] Andersen. [5] Aitkin. Soc.

J. and Zeger. Sage publications. and Lewis. [16] Collett. 2001): Multivariate statistical modeling based on generalized linear models. 3rd Ed. F. (1966): The statistical analysis of series of events. R. Berard. In: Biostatis- tics Casebook. Miller. K. Biometrics. (1994. C. S. D. [18] Cox. 1060-1068. D. [28] Gill. (1972): Regression models and life tables. London. P. Wiley. (1987): Applied categorical data analysis. 34. Soc. D. W. L. New York. Cancer. R. (2000): Generalized linear models: a unified approach. 2nd ed. 187-220. A. New York. J. N. Efron. (1998): Applied regression analysis. J. et al (1976): Comparison of in- tensive versus moderate chemotherapy of lymphocytic lymphomas: a progress report. sec- ond edition.. E. S. and Snell. (1989): The analysis of binary data. 100. 38. E. H.. [21] Dobson. D. R. B. J. [15] Cicirelli. B. [25] Feigl. Wiley. M. A. [20] Diggle. D. Eds. Chapman & Hall. New York. W. design and regression. E. Cambridge University Press. London: Chapman & Hall/CRC Press. (1983): Internal pH of Xenopus oocytes: a study of the mechanism and role of pH changes during meotic maturation. Robinson. D. 1952): Probit analysis. A statistical treatment of the sigmoid response curve. Chapman and Hall. Chapman & Hall. [23] Ezdinli. J. G. Stat. and Zelen. Roy. Marcel Dekker. c Studentlitteratur ° . [24] Fahrmeir. [27] Freeman. P. [19] Cox. L. R.. B. J. D. R. Cambridge. London. (1996): Analysis of variance. Developmental Biology. (2002): An introduction to generalized linear models. [22] Draper. and Smith. Chapman and Hall. New York. (1994): Analysis of longitu- dinal data.192 Bibliography [13] Brown. Moses. D. and Smith. H. L. London. B. (1991): Modelling binary data. Oxford: Clarendon press. Lon- don. and Tutz. and Laughton. [17] Cox. R. [14] Christensen. Y. (1947. [26] Finney. K.: (1980): Prediction analysis for binary data. J. New York.. 133-146. Springer. W. Brown and L. 21. M. C. R. (1965): Estimation of exponential survival prob- abilities with concomitant information. Liang. Pocock. P. 826-838.

L. London. and Magath. 73. Trans. New York. pp. [42] Liang. (1945): The determina- tion of prothrombin time following the administration of dicumarol with specific reference to thromboplastin. [35] Hutcheson. Biometrika. S. 874. K. J. New York. Lab.) (1993): The GLIM system manual. Clarendon press. New York. (1986): Longitudinal data analysis using generalized linear models. W. [33] Hosmer. In: Biopharmaceutical statistics for drug development. London. B. M. 872-880. 134-146. E. Peace.. F.. Chapman and Hall. [38] Klein.. J. Stenlid. W. S. K-Y. 40. and Payne. London. Vol. and Edwards. W.: Design and analysis of cross-over trials. J. Barker. [43] Lindahl. Wiley. S. Clin. B.. S. 50. [36] Jones. and Ostrowski. (1978): Analysis of qualitative data. 432-447. (1988): Clinical eﬃciacy trials with cate- gorical data. [32] Horowitz. (1999): Translocation of 32 P between interacting mycelia of a wood-decomposing fungus and ectomycorrhizal fungi in microcosm systems. New Phytol. J. Sage publications. Med. and Lemeshow. McConway. [40] Leemis. E. [30] Haberman. 1: Introductory topics. K. M. G. Research Record. [31] Hand. K-Y and Hanfelt. and Finlay. 183-193. G. D. New York. Lunn. 403-451. Chapman & Hall. J.. c Studentlitteratur ° . D. 127-162. B. R. G. B49. American Statistician. 19-25. J. B. (1989): Applied logistic regression.Bibliography 193 [29] Francis. L. New York. Journal of the Royal Statistical Society. D. (1982): An evaluation of the usefulness of two standard goodness-of-fit indicators for comparing non-nested random utility mod- els. (1986): Relationships among common univariate distribu- tions. and Kenward. J. [39] Koch. and Moeschberger. 144. Release 4. (1999): Introductory statistics using Generalized Lin- ear Models.. M. [41] Liang. Biometrics. S. Academic Press. (1994): On the use of the quasi-likelihood method in teratological experiments. 13-22. M. Springer.. (1997): Survival analysis: techniques for censored and truncated data. A. 30. N. Green. Olsson.. and Zeger. Marcel Dekker. (1994): A handbook of small data sets. D. M. ed.. (Eds. [34] Hurn. D. [37] Jørgensen. (1987): Exponential dispersion models. T. C. Daly. G.

162. (1998): Minitab User’s Guide. Orav.an application of capture-recapture methodology. G. 33. 11. [55] Ninkovic. London. K. British Medical Journal. 59-67. Olsson U. and Nelder. A.. R.. Entomologia Experimentalis et Applicata. U. [52] Nagelkerke. C.. Nygren.. [57] Olsson. Stroup. P. Mixing barley culti- vars aﬀect aphid host plant acceptance in field experiments. C. in press. [45] Lipsitz. A. D. H. SAS Institute Inc. [56] Norton. and Stenberg A. New York. U. E.: Eﬀects of contrast media and Mannitol on renal medullary blood flow and renal blood cell aggregation in the rat kidney. (2002). Journal of Urology. Wiley. W. P. M. Ulfendahl. Olsson. P. 1999. Tuvemo T.. Cary. c Studentlitteratur ° . JRSS (B). and Dunn.. (1971): Discussion on papers by Wynn. G. J. U. Minitab Inc. Annals of Statistics. Chapman and Hall. [53] Nelder. Olsson. 1268-1275. Biometrika. M. [51] Montgomery. (1996): SAS system for mixed models. [47] Littell. O´Neill and Wetherill.. Biometrics. Bloomfield. 78. [46] Liss. V. [50] Minitab Inc. J. N. 49. (1994): Performance of generalized estimating equations in practical situations. 291. State College. [54] Neveus T. N. (1997): Applying generalized linear models. 244-246. Report 55.. (1991): A note on a general definition of the coeﬃ- cient of determination. and Wolfinger. 630-632. R. and Eriksson. New York. (1984): Design and analysis of experiments. Läckgren G. Springer. [49] McCullagh.. (1989): Generalized Linear Models. J. D. (1985): Snoring as a risk factor for disease. (2000): Estimation of the number of drug addicts in Sweden .194 Bibliography [44] Lindsey. C. Release 12.. A. V. Kidney International. J. Fitzmaurice. H. U. R. P. Department of Statistics. E.. 270-278. N. Milliken. J. 2136. Swedish University of Agricultural Sciences. S. W.. 1996. [48] McCullagh. and Laird. and Pettersson. J. 691-692. (1983): Quasi-likelihood functions. 50. G. (1999): Desmopressin-resistant Enuresis: Pathogenetic and Therapeutic Consid- erations. D.

F. V. Stockholm: Socialdepartementet (Ds S 1980:5). 439-477. Report 54. E. M.: Matrix Algebra Useful for Statistics. (1990): Regression analysis. Biometrika. Rapport från utredningen om narkotikamissbrukets omfattning (UNO). 31. S.Bibliography 195 [58] Olsson. Am J. (1982): Extra-binomial variation in linear logistic mod- els. Cary. [72] Williams. 570-578. and Shapiro. Gossett) (1907): On error of counting with an haemocy- tometer. S. (1984): Eﬀects of Toulene inhalation on brain biogenic amines in the rat. 128. G. [62] SAS Institute Inc. (1999): Statistics for the life sciences. 5. 31. T.a comprehensive survey. [69] Sokal.en totalundersökning 1979. [64] SAS Institute Inc. W. J. c Studentlitteratur ° . T. M. J.. 144-148. methods and applications. (1980): Statistical methods.. 61. SAS Institute Inc. (1988): Coﬀee drinking and nonfatal myocaridal infarction in men under 55 years of age. New York. [65] Searle. W. R. generalized linear models and the Gauss-Newton method. Iowa. and Rohlf. M. A. R. Swedish University of Agricultural Sciences. Freeman. J. Cary. in Swedish). 143-150. NC. Biometrika. G. (1997): SAS/Stat software: Changes and enhance- ments through release 6. U. S. Palmer. NC.. W. [71] Wedderburn. (1973): Introduction to biostatistics. San Fransisco. Department of Statistics. and Neveus. (Heavy drug use . (1974): Quasi-likelihood function. A.. [68] Socialdepartementet: Tungt narkotikamissbruk . SAS Institute Inc. Ames. Wiley. NJ: Prentice-Hall. 351-360. D. [66] Sen. SAS Institute Inc. [59] Rea. [67] Snedecor. W. Applied Statistics. (2000): Generalized Linear Mixed Models used for Evaluating Enuresis Therapy. G. D.12.. [70] Student (W. (2000b): SAS/Stat user’s guide. R. The Iowa State University Press. and Kessler. (2000a): JMP software. and Witmer. L. 7th ed. Kelly. A. R.. and Srivastava. 1982. Born. Theory. P. M. [60] Rosenberg. S. NC. F. [61] Samuels. W. J. Cary. Nash. Toxicology. Kaufman. [63] SAS Institute Inc. New York. Upper Saddle River. Version 8. J. Springer. R. Epidemiol. J. and Cochran. version 4.. Zabik.

Statist. S. Bjarnason. Comput. R. and O’Connell..196 Bibliography [73] Wolfinger. M. Plant and Soil. [74] Zagal. 48. (1993): Generalized linear models: a pseudo-likelihood approach. U. J. 51-63. E. 233-243. (1993): Carbon and nitrogen in the root-zone of Barley supplied with nitrogen fertilizer at two rates.. c Studentlitteratur ° . 157. and Olsson. Simul.

43080000 197 .39557500 340.00158621 18. using the GLM procedure of the SAS package. The Anova table is Dependent Variable: leucine Leucine level Sum of Source DF Squares Mean Square F Value Pr > F Model 1 2.0001 B. C.02925t.1 ¡ ¢ A.83 0.0292500000 0.0475000000 0.0001 Error 5 0. The model can be written as yi = α + βti + ei . ei ∼ N 0.00704500 Corrected Total 6 2.44 <.05719172 -0.04 <.4441 time 0. The estimated regression equation is yb = −0.03522500 0. A regression analysis.39557500 2.Solutions to the exercises Exercise 1. gives the following results: Standard Parameter Estimate Error t Value Pr > |t| Intercept -.0475 + 0. σ 2 . A plot of the data and the regression line is as follows.

0271). We wishPto test the null hypothesis that there are no group diﬀerences. The increase is nearly linear.8460952 4.3306 9. This is a one-way Anova model: yij = µ + αi + eij .44 0.e.0271 Error 18 1614. in the studied time range.469299 9.709524 R-Square Coeff Var Root MSE cortisol Mean 0.667630 Corrected Total 20 2409.2 ¡ ¢ A. eij ∼ N 0.9244818 b 10 8.7200000 19.1800000 3.7891072 c 5 19.9666667 0.692190 397. The Anova table produced by Proc GLM is as follows: Dependent Variable: cortisol Sum of Source DF Squares Mean Square F Value Pr > F Model 2 795.44 0.017333 89.0271 The results suggest that there are significant diﬀerences between the groups (p = 0.846095 4. To study these diﬀerences we prepare a table of mean values. σ2 . i.330203 100.6921905 397.2388149 c Studentlitteratur ° . and a box plot: The GLM Procedure Level of -----------cortisol---------- group N Mean Std Dev a 6 2.438095 Source DF Type III SS Mean Square F Value Pr > F group 2 795. Exercise 1. H0 : ni α2i = 0.198 It can be concluded that the leucine level increases with time.

767421 26.0001 Error 21 549.614389 79.843235 18. Exercise 1.154046 Corrected Total 23 2899.Solutions to the exercises 199 B.659255 R-Square Coeff Var Root MSE co2 Mean 0.0001 days*level 1 94. The corresponding GLM outputs are presented below.566014 22. We want to compare two competing models: i) yijk = µ + αi + βxj + eijk . Model i) Dependent Variable: co2 Sum of Source DF Squares Mean Square F Value Pr > F Model 2 2350.659255 R-Square Coeff Var Root MSE co2 Mean 0.1626 days 1 2085.424299 1175. Equal slopes (no interaction) ii) yijk = µ + αi + βxj + (αβ)ij xj + eijk .031080 35.728301 Corrected Total 23 2899.18513 Source DF Type III SS Mean Square F Value Pr > F level 1 47.0001 Model ii) Dependent Variable: co2 Sum of Source DF Squares Mean Square F Value Pr > F Model 3 2445.614389 2085. the analysis presented above may not be the optimal one. Since the model assumes that the population variances are equal.093241 815.785031 2. Diﬀerent slopes (interaction exists).668942 94.17 0.212150 44.3 A.53056 5.809910 10.0547 c Studentlitteratur ° .10 0.614389 2085.93 <.234956 26.0045 days 1 2085.809910 264.86 <.614389 91.76 <. The box plot suggests that one or two observations may be outliers.13 0.668942 4.810586 19.20660 4.18513 Source DF Type III SS Mean Square F Value Pr > F level 1 264. The sample standard deviations are rather diﬀerent in the diﬀerent groups.785031 47.0001 Error 20 454.114103 26.74 <.

45 0.0002 level High 17.36658913 -2.11095713 B 11. C. I would use Model 2 for interpretation and plotting.25921766 8.00000000 B . days 2.429. .00000000 B . the predicted value for High nitrogen level is −38.7482 · 35 = 27. see the discussion on model building strategy in the text.1180+17.11803843 B 8.80081087 1. a regression output is as follows: c Studentlitteratur ° .1110+ 2.57 0. . Thus.74816925 B 0.34443339 -4.0547 days*level Low 0. . After taking logs of the count variable. There are strongly significant eﬀects of time and of nitrogen level.1180 + 2.4 A.0001 days*level High -0. Exercise 1. for model building purposes. I would prefer to retain the interaction term in the model. Still.353. indicates that the increase of CO2 emission may be somewhat faster for the low nitrogen treatment.200 The test of parallelism is not significant (p = 0.1299 · 35 = 36.12991722 B 0. .04 0. Thus. Estimates of model parameters for model ii) are as follows: Standard Parameter Estimate Error t Value Pr > |t| Intercept -38.1299 · 35 − 0. B.22 <.1626 level Low 0. A graph of the model that does not assume parallel regression lines is: D. The predicted value for Low level is similarly −38. The interaction.0547). although not formally significant.

we get: y = Ae−Bx · ².Solutions to the exercises 201 Dependent Variable: logcount Sum of Source DF Squares Mean Square F Value Pr > F Model 1 5. If we use θ = −λ.02563161 0.050328398 0.55 <.72476387 R-Square Coeff Var Root MSE logcount Mean 0.04 0.07159833 77. then b (θ) = log (−θ).092433 4.04 0.1 We write the density as f (x) = λe−λx = elog λ−λx which is an ex- ponential family distribution.0001 Error 3 0.69913226 667.00854387 Corrected Total 4 5. The corresponding graph is: Exercise 2. C.83 0. We prefer to make the graph on the original scale. after taking logs.0001 Standard Parameter Estimate Error t Value Pr > |t| Intercept 5. Thus.286363 0. This gives.0001 minutes -0. If we assume that there is a multiplicative residual in the original model.69913226 667.042798 Source DF Type III SS Mean Square F Value Pr > F minutes 1 5. c Studentlitteratur ° .995523 2. we calculate predicted values and take the anti-logs of these.552649942 0.00194866 -25. There is a strong relationship between log(count) and time.0001 B.69913226 5.69913226 5. log (y) = log (A) − Bx + e (where e = log ²) which is a linear model.

2 A.0956 Analysis Of Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq Intercept 1 0.07 0. Exercise 2. Since we have no other information about the data.0647 0. However. we get b (θ) = − exp (θ) − θ ln 1 − e−e .0001 0. c Studentlitteratur ° .0006 0.0291 0.8557 2.0008 0. b (θ) = −λ − ln 1 − e−λ and c (y.0239 -0. φ) = − ln (yi !).0154 2.4560 weight 1 0.0178 0. and c (y.2189 Pearson Chi-Square 11 4.3923 Scaled Pearson X2 11 22.1655 A graph of the data and the fitted line may explain why: one ob- servation has an unusually long survival time. We write the distribution as −λ e−λ λyi (1−e−λ )yi ! = = e[−λ+yi ln λ−ln(1−e )−ln(yi !)] which is an e−λ eyi ln λ −λ eln(1−e ) eln(yi !) ¡ ¢ Exponential family with θ = ln λ. For¡the ¢ variance function. Exercise 2. we find that d b0 = dθ (log (−θ)) = 1θ and b00 = dθ d 1 θ = − θ 1 2 .202 a (φ) = 1.5315 0. B. The derivatives of b with respect to θ are: ³ ³ ´´ d θ eθ b0 = dθ − exp (θ) − ln 1 − e−e = −1+e−eθ ³ ´ θ θ d eθ θ −eθ−e −e2θ−e b00 = dθ −1+e−eθ = −e 2 which is the required variance (−1+e−eθ ) function. φ) = 0.2301 Scaled Deviance 11 13.3154 0.0004 -0.7878 Scale 1 5. deletion of this observation cannot be justi- fied.4077 1. One of the best models is probably the one with a Gamma distribution and an inverse link.3 It is not easy to find a well-fitting model for these data.0778 Log Likelihood -55.5123 11.56 0. most models we have tried do not indicate any significant relation between weight and survival: Criterion DF Value Value/DF Deviance 11 2. a (φ) = 1. but other models might also be considered.2964 2. If ³we insert ´λ = eθ into the expression for b (·).

0025 4 30 0.0050 3 20 0.07 1.74 1.0475 0.25 0.4150 0.0325 B.0850 7 60 1.8300 -0.1 A.1225 -0.0675 2 10 0.1400 5 40 1. but this is diﬃcult to see in such a small data set.2450 0.0525 6 50 1.50 1.02 -0.54 0.7075 0.69 0. Data. c Studentlitteratur ° .Solutions to the exercises 203 Exercise 3. predicted values and residuals are: Obs time leucine pred res 1 0 0.5375 0. A plot of residuals against fitted values indicates no serious deviations from homoscedasticity.

and D. None of the observations have a “Hat Diag” value above this limit.9014 0.2244 -1.3026 -0. The influence diagnostics can be obtained from Proc Reg: Hat Diag Cov ------DFBETAS----- Obs Residual RStudent H Ratio DFFITS Intercept time 1 0. and leverage values (diagonal elements from the Hat matrix) are as follows: c Studentlitteratur ° .3772 Since there are n = 7 observations and p = 2 parameters the average leverage is 2/7 = 0.0391 -0.286.1429 0.1116 0.5570 -0.0850 1.4643 1.0504 -0.6490 0.8740 2 0.4534 -0.4870 0. Data.0061 4 -0.2 A.0399 0.0525 -0.0000 5 -0.1400 -2.0119 -0.0504 1.0325 0.1744 0.0137 0.0282 3 0.1284 0.1832 0.002500 0.1574 0.2694 0.0675 1.5677 7 0.0631 0. Exercise 3.005000 0.1786 1.1353 6 0.6161 0. The rule of thumb that an observation is influ- ential if h > 2 · p/n would suggest that observations with h > 0.2857 2.6783 1.0375 -0.571 are influential.8028 -0.5993 0.1786 1.204 C.7205 0.4643 2.0294 0. The Normal probability plot was obtained using Proc Univariate in SAS: D.2857 1.1106 -0. predicted values.

4291 -3.5544 0.09239 9 H 30 18.11456 15 H 35 20.730 36.5190 0.6991 0.8188 -1.5991 0.1549 -0.2550 0.8188 2.09239 10 L 30 28.26090 5 L 24 11.19882 24 L 38 45.403 31.11456 17 L 35 34.677 42.9349 0.301 12.4391 0. The Normal probability plot has a slight “bend”: c Studentlitteratur ° .830 36.255 13.594 12.26090 7 H 30 19.8751 0.220 12.09239 8 H 30 31.7377 0.414 25.7795 2.3541 7.26090 6 L 24 10.069 13.5322 0.3541 -6.8188 0.19882 23 L 38 43.4454 -1.4454 10.951 27.6696 0.9310 0.4454 -1.11456 16 L 35 32.186 31.200 25.479 27.26090 3 H 24 11.1549 -3.19882 21 H 38 21.0000 2.862 36.448 42.4993 -10.26090 4 L 24 15.6661 0.1494 0.6292 0.4291 -1.5969 0.19882 B.6345 0.09239 12 L 30 28.0000 -2.09239 13 H 35 25.19882 20 H 38 39.351 42.0963 0.8539 0.26090 2 H 24 12.1418 0.19882 22 L 38 41.3133 0.09239 11 L 30 26.3541 -1. The plot of residuals against fitted values shows no large diﬀerences in variance: C.688 27.11456 18 L 35 35.Solutions to the exercises 205 Obs LEVEL days Co2 pred res hat 1 H 24 8.891 20.5671 0.1549 0.296 20.115 20.765 25.7795 0.9855 0.4993 -0.4291 -0.0000 -1.7795 2.11456 14 H 35 34.481 13.4993 7.237 31.4205 0.11456 19 H 38 31.

12188 B.206 The limit for influential observations is 2 · p/n = 2 · 4/24 = 0.42609 3 61 24 41. The Hat values of all observations are below this limit.31864 11 60 36 41.09831 8 59 44 41. Predicted values and deviance residuals are as follows: Obs weight survival pred res 1 46 44 44.1089 -0.12984 6 75 36 39. Exercise 3.3067 -0.3 A.6013 -0.1089 1.04727 9 64 120 41.4434 -0.2810 -0.6295 -0. one observation stands out as a possible outlier: c Studentlitteratur ° .30216 10 67 29 40.50452 4 75 24 39.13383 13 66 36 40.7112 -0. In the plot of residuals against fitted values.3067 -0.333.9435 0.8060 -0.14589 12 63 36 41.08661 7 71 44 39.01001 2 55 27 42.7691 -0.45590 5 64 36 41.9839 0.

The long-living sheep is an outlier in the Normal probability plot as well: D. The influence of each observation can be obtained via the Insight pro- cedure.Solutions to the exercises 207 C. A plot of hat diagonal values against observation number is as follows: c Studentlitteratur ° .

34 <.57077226 0.868055 18.09 0. The first observation is influential according to this criterion.43855991 72.24008564 Corrected Total 47 65.489985 2.3867 Error 36 8. there is no e against yb.622376 The eﬀects of treatment and of poison are highly significant.1 An analysis on the transformed data using a two-factor model with interaction gives the following edited output: Source DF Squares Mean Square F Value Pr > F treatment 3 20.64308307 0.63 <.41428935 6. The residual plots (b plot) seem to indicate that the data agree fairly well with the assumptions: c Studentlitteratur ° .0001 poison 2 34. Exercise 4.50526450 R-Square Coeff Var Root MSE z Mean 0.68478 0.80476312 28.26179538 1.0001 treatment*poison 6 1. Normal probability significant interaction.308.87711982 17.208 The “influence limit” is 2·p/n = 2·2/13 = 0.

Solutions to the exercises 209 The same model analyzed as a generalized linear model with a Gamma distribution gives the following results: Criterion DF Value Value/DF Deviance 36 1. no significant interaction.0573 LR Statistics For Type 3 Analysis Chi- Source DF Square Pr > ChiSq treatment 3 43.0001 treatment*poison 6 10. The residual plots for this model are: c Studentlitteratur ° .3422 Pearson Chi-Square 36 1.31 <.0533 Scaled Deviance 36 48.3107 Log Likelihood 50.3179 1.1232 The conclusions are the same: significant eﬀects of treatment and poi- son.0521 Scaled Pearson X2 36 47.1866 1.8755 0.76 <.9205 0.04 0.0001 poison 2 59.

0584 Scaled Deviance 199 210.6205 1.3507 1.0972 Scaled Pearson X2 199 218.3507 1. but the Gamma model seems to produce residuals for which the variance increases slightly with increasing yb.0972 Log Likelihood 98. according to the fit statistics: Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 199 210.210 The distribution of the deviance residuals is close to normal. with scale parameter equal to 1.8650 c Studentlitteratur ° . Exercise 4.0584 Pearson Chi-Square 199 218. Such a model fits these data reasonably well.2 The exponential distribution is a special case of the gamma distribution.6205 1.

Solutions to the exercises 211 An easy way of judging the fit to an exponential distribution is to ask Proc Univariate to produce an exponential probability plot: It seems that the data deviate somewhat from an exponential distribu- tion in the upper tail of the distribution. Exercise 5.865 86 0.9834). Species*exposure (p = 0.2108 = 0.0698 91 0.0698 − 32.6625).6052 Treating temperature as numeric costs 31.3279).1 One questions with these data is whether we should include the factors Exposure. which is clearly nonsignificant. Temp*humidity (p = 0.1509 − 31. Temp*exposure (p = 0. Many models for these data that include interactions lead to a Hessian matrix that is not positive definite.3582 Temperature numeric 31. We could use a quadratic term for expo- sure. so this approx- imation is not worthwhile. when we treat Exposure as numeric and linear. most interactions can indeed be estimated. clearly an unsignificant loss. adding Humidity as a nu- meric factor gives 32.3458 on 1 df.919 on 2 df. but we might as well keep it as a class variable. Similarly.2108 − 30. On the other hand.3587 Humidity also numeric 32.1509 = 22. when some factors are included as numeric variables. c Studentlitteratur ° .3612 Also Exposure numeric 55.9401 on 2 df.2108 87 0. we lose 55. for a main eﬀects model: Types of terms Deviance df D/df All “class” 30.9676). However.865 = 0. p-values for two- way interactions are Exposure*Humidity (p = 0. One approach is to compare deviances for the diﬀerent ap- proaches.1509 89 0. Temperature and Humidity as “class” variables or as numeric variables.

1509 0.0000 0. The generalized linear model approach may suﬀer from the fact that 41 of the 96 observations have pb = 0.0784 58. ³p ´ It is interesting to note that an “old-fashioned” Anova on arcsin pb suggests that the interactions species*exposure.9172 Analysis Of Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq Intercept 1 5.1616 -1.0001 Exposure 3 1 -0.6733 0.3006).0001 Exposure 3 385.0000 0.95 0.7649 -2. and with humidity and temperature used as numeric variables.1324 -0.38 <. The model with only main eﬀects.9636 3. Residual plots for this model are as follows: c Studentlitteratur ° .2635 -0.67 <. Exposure 1 1 -26. gives the following results: Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 89 32.0930 0.0001 Species A 1 -1.00 <.59 <. .3143 Log Likelihood -534.6232 33.00 0.212 Temp*species (p = 0.9434 0.34 <.3143 Scaled Pearson X2 89 27.0000 0.1793 0. .52 <.0001 Scale 0 1. Humidity 1 -0.6038 -0.2988 -3.0000 0.1509 0.2871 0.5937 113.0000 0.9761 0. There does not seem to be any need to include interactions.0000 .1633 -1.0192 0.0001 Temp 1 24.24 <.0001 Species B 0 0.9091).5618 34.0001 Exposure 4 0 0.1054 0.0000 0.10 -63355.0000 LR Statistics For Type 3 Analysis Chi- Source DF Square Pr > ChiSq Species 1 69.0001 Temp 1 0.9761 0. and Humidity*species (p = 0.0001 The survival is highly related to all four factors.0001 Humidity 1 63.0138 -0.56 <.3612 Scaled Deviance 89 32.3612 Pearson Chi-Square 89 27.6441 32311.7847 7.0555 0.2 63301.1305 23.0000 1.0000 0.9993 Exposure 2 1 -3.0000 1. temp*exposure and humidity*exposure are indeed significant.9703 63.35 <.0000 .44 <.

Solutions to the exercises 213 Leverage diagnostics. One question in this data set is how to model Age. we set this question aside.g. The resulting plot is as follows: c Studentlitteratur ° . Exercise 5.2 The inferential aspects of this exercise are interesting: to which popula- tion could we generalize the results? However. in terms of diagonal elements of the Hat matrix. The relation between age and survival can be explored by plotting proportion survival against sex and age (in 10-year intervals). can be obtained e. from the Insight procedure but are not listed here in order to save space.

and to use a linear age relation for ages above 10 years. This model fits fairly well to the data.214 It seems that survival probability for women is high.8209 0.9850 Scaled Pearson X2 744 732. a model for these data can be written as logit(b p) = β 0 + β 1 · sex + β 2 · pclass +d (β 2 + β 3 · age + β 4 · sex + β 5 · age · sex + β 6 · pclass · sex) .9224 0.8209 0.9612 c Studentlitteratur ° . and a linear change in survival probability (diﬀerent for males and females) for persons above 10 years. One possibility to modeling is to use a dummy variable for children under 10.9224 0. and increases with age. This model assumes a separate survival probability for boys and girls below 10. whereas only the young boys were rescued (“women and children first”).9850 Log Likelihood -309. If the dummy variable for childhood is d.8332 Pearson Chi-Square 744 732.8332 Scaled Deviance 744 619. as judged by Deviance/df: Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 744 619.

8722 Log Likelihood -368.00 <. 24% of second class passengers but 55% of third class passengers.3 A binomial model with treatment.0055 d*Age*Sex 1 2.7228 2.94 0. Exercise 5.1601 lnDose 1 287.72 0.0001 Sex 1 0.5940 1. However.20 <.Solutions to the exercises 215 LR Statistics For Type 3 Analysis Chi- Source DF Square Pr > ChiSq Pclass 2 20. but this time allowing the program to estimate the scale parameter.19 <.0098 This analysis suggests that there is a significant interaction between treatment and ln(dose).25 0.0705 d*Age 1 4. The fact that d*Sex is significant means that there are diﬀerences in survival for males and females above 10 years.07. If we fit the same model. that the slopes may be diﬀerent.4777 d 1 3. i.0657 Scaled Deviance 11 22.50 0.e.5940 1.27 0.0001 lnDose*treatment 2 9. passengers with missing age data have been excluded. the fit of the model is not perfect: Deviance/df=2.47 0. there seems to be a relation between missing age and passenger class: age data are missing for 30% of first class passengers. as well as an interaction between class and sex for persons above 10 years.1158 d*Pclass*Sex 4 33.0262 d*Sex 1 7.66 0. Sex (which actually should be interpreted as sex of a child) is not significant: young boys and girls have similar survival probabilities. However.8722 Scaled Pearson X2 11 20.7886 LR Statistics For Type 3 Analysis Chi- Source DF Square Pr > ChiSq treatment 2 3. ln(dose) and their interaction as factors produces the following results: Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 11 22.0657 Pearson Chi-Square 11 20. In this analysis. so the analysis should be interpreted with care. we get: c Studentlitteratur ° .7228 2.0001 As an interpretation of the parameter estimates: there is a highly signif- icant eﬀect of passenger class.

The condition that model 1 is nested within model 2 is fulfilled.841.77 0. Assumptions: Independent observations. Assumptions: Indepen- dent observations. Our result is clearly non-significant. large sample.828.5177 − 226.1529 4. H0 cannot be rejected. The test statistic is (D1 − D3 ) / (df1 − df3 ) which.1% c Studentlitteratur ° .0784 which should be compared with χ2 on 1 d. is asymptot- ically distributed as χ2 on (df1 − df2 ) degrees of freedom. The 5% limit is 3.635 and the 0.1% limit is 10. under H0 .48 0. The 5% limit is 3.0001 lnDose*treatment 2 11 2. is asymptotically distributed as χ2 on (df1 − df3 ) degrees of freedom. Result: (226. the 1% limit is 6. B.5177 − 216. H0 : β p∗r = 0 H1 : β p∗r 6= 0 can similarly be tested using the deviances. that the slopes may be equal. The condition that model 1 is nested within model 3 is fulfilled.4759) / (8 − 7) = 10.03 <.24 0.216 LR Statistics For Type 3 Analysis Chi- Source Num DF Den DF F Value Pr > F Square Pr > ChiSq treatment 2 11 0. This analysis sug- gests that the interaction is not significant. H0 : β p∗a = 0 against H1 : β p∗a 6= 0 can be tested using the deviances.4 A.841.03 <.f. The observed proportions (on a logit scale) plotted against ln(dose) are: Exercise 5. under H0 .635 and the 0.4120 lnDose 1 11 139. the 1% limit is 6. large sample.f.4395 1. i.4393) / (8 − 7) = 0. The test statistic is (D1 − D2 ) / (df1 − df2 ) which.0001 139.1066 The p-values are rather sensitive to overdispersion.042 which should be compared with χ2 on 1 d.e.89 0. Result: (226.

030 Risk: OR=e−3.086. The four cells with observed count =0 do not contribute to the likelihood. OR of infection=1/11.09 0 0.5 in these cells they are included. The cor- responding odds ratios of being infected are the inverses of these. The remaining cell odds ratios (of no infection) compared to this baseline are: Planned 1 0 Risk Risk Antibio 1 0 1 0 1 4. For the second observation the predictors have the same value but y = 0 so the raw residual is 0 − 0. i. p C.41 0.7172 = 3. Using the inverse logit transformation 1+e x this corre- 3.4991 = 33.9919 = 0.9716 = −0.812 = 0. When p is zero (or one) this is not defined.9716 = 0. Antibio=1. D. The raw residual is y − yb = 1 − 0.0284. Remember that the observations. All odds ratios take one cell as the baseline.0081 and 0 − 0. Our result is significant at the 1% level but not at the 0. This gives: Planned: OR=e−0.5342 e sponds to yb = 1+e 3.e.4991 = 4. One might consider to calculate odds ratios separately for each cell of the 2 · 2 · 2 cross-table.9919 and raw residuals 1 − 0. we get for this observation logit(µ) = 2. In the presence of interactions.7172 = 0.466.4394 = 11. The first data line has Planned=1.8311 = 0.1440 − 0. H0 is rejected.00 E.80 33.812 predicted value yb = 1+e 4. respectively.436 = 2.086 = 0. y = 1 and y = 0 are the only possible values of y. in this example.9717 which is the predicted value.1440−0. The logit link is log (1−p) .294 An- tibio: OR=e3. Risk=0.024 = 41. OR of infection=1/0.Solutions to the exercises 217 limit is 10.667 Planned*Risk: e2.OR of infection=1/33.812 which gives e4.087.1% level. We might use the cell Planned=0.4991− ex 3.02 1. Risk=1 and Infection=1.02 14.12 0.9716.436.9919. with OR=1. raw Odds ratios are not very informa- tive. The third and fourth observations have the same predicted values. ob- tained through logit(µ) = 2.9919 = −0. OR of infection=1/0. c Studentlitteratur ° .8311 + 3. When we replace 0 with 0. The odds ratios of not being infected are calculated as eβ .f.43 0. Antibio=0 as a baseline. are binary. so we get an extra four d.466 = 0.024.5342 = 0. compared with model 3.828.8311+3. Using the parameter estimates.5342.

2. 2. The inverse of the logit link g (p) = log 1−p is g−1 = ep +1 .5 A. 3. This gives the model in matrix terms as y = XB + e. j = 1. 3.218 Note that the counts (Wt) are not the values to predict! Exercise 5. i = 1. The model is g = µ + αi + β j + (αβ)ij + γzijk + (βγ)j zijk + eijk . k = 1. 2. c Studentlitteratur ° .  1 0 1 1 0 0 0 0 0 1 0 0 z10 z10 0 0    1 0 1 1 0 0 0 0 0 1 0 0 z11 z11 0 0     1 0 1 1 0 0 0 0 0 1 0 0 z12 z12 0 0     1 0 1 0 1 0 0 0 0 0 1 0 z13 0 z13 0     1 0 1 0 1 0 0 0 0 0 1 0 z14 0 z14 0     1 0 1 0 1 0 0 0 0 0 1 0 z15 0 z15 0     1 0 1 0 0 1 0 0 0 0 0 1 z16 0 0 z16     1 0 1 0 0 1 0 0 0 0 0 1 z17 0 0 z17   1 0 1 0 0 1 0 0 0 0 0 1 z18 0 0 z18 µ  α1     α2     β1     β2     β3     (αβ)   11   (αβ)  B=  (αβ)  12   13   (αβ)   21   (αβ)   22   (αβ)   23   γ     (βγ)   1   (βγ)  2 (βγ)3 p ep B. where   1 1 0 1 0 0 1 0 0 0 0 0 z1 z1 0 0  1 1 0 1 0 0 1 0 0 0 0 0 z2 z2 0 0     1 1 0 1 0 0 1 0 0 0 0 0 z3 z3 0 0     1 1 0 0 1 0 0 1 0 0 0 0 z4 0 z4 0     1 1 0 0 1 0 0 1 0 0 0 0 z5 0 z5 0     1 1 0 0 1 0 0 1 0 0 0 0 z6 0 z6 0     1 1 0 0 0 1 0 0 1 0 0 0 z7 0 0 z7     1 1 0 0 0 1 0 0 1 0 0 0 z8 0 0 z8     1 1 0 0 0 1 0 0 1 0 0 0 z9 0 0 z9  X=  .

0572 -0.7635 Pearson Chi-Square 3 2.0567 5.2453 0.0513 0. as judged by the deviance/df criterion.2 Two Poisson models were fitted to the data: one with a log link.0030 -0.2906 0.5713 0.0000 A plot of observed counts along with the fitted function indicates a good fit: Exercise 6.0000 1. The model with a log link fitted marginally better.0455 298.7484 Log Likelihood 1911. another with an identity link.0000 1. First the log link results: c Studentlitteratur ° . The parameter estimates are as follows: Analysis Of Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq Intercept 1 5.Solutions to the exercises 219 Exercise 6.25 <.2453 0.0001 exposure 1 -0.0001 Scale 0 1.28 <.1 A model with a Poisson distribution and a log link gives the following model information: Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 3 2.2906 0.0000 0. as judged by deviance/df.7443 The fit of the model to the data is good.6825 9650.4602 5.7484 Scaled Pearson X2 3 2.7635 Scaled Deviance 3 2.

6745 2.6995 Scaled Deviance 6 4.0070 0.0000 1.81 0.6672 Scaled Deviance 6 4.9505 0.0118 8.6385 Both models show a good fit.1752 0.1567 0.220 Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 6 4.6672 Pearson Chi-Square 6 3.0000 1.0030 0.1971 0.0025 0.3685 Scale 0 1.0024 0.1971 0.0028 -0. Plots of residuals vs. fitted values are similar for the two models.1567 0.6995 Pearson Chi-Square 6 4.7354 Analysis Of Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq Intercept 1 2.34 0.0033 0.0033 0.6584 Log Likelihood 362.0023 0.6928 Scaled Pearson X2 6 4.0039 mode2 1 0.6584 Scaled Pearson X2 6 3.0000 0.6759 72.2555 1.50 <.0081 0. The normal probability plot is slightly better for the identity link model: c Studentlitteratur ° . slightly better for the log link.9505 0.0001 mode1 1 0.6928 Log Likelihood 362.0000 The model fit for the Identity link model: Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 6 4.

3 Models with a Poisson distribution.Solutions to the exercises 221 In all. β = log (320) − 0 0 0 log(21.587 13 1. The LR test of the hypothesis H0 : µA = µB is obtained by comparing the deviances of the two models: c Studentlitteratur ° .3) − 2. +type*yr_c 14.695 25 1.4 The model analyzed in the text is saturated. Exercise 6.756 21 1. +yr_c*per_op 36. plus the type*yr_c inter- action. Main eﬀects only 38.4) = 2. which means that the data should agree perfectly with the model.908 23 It seems that a model with main eﬀects. The model for males is log(b µ) = b which gives log (320) = log(21. Exercise 6. However.e. it is diﬃcult to judge which of the two models is “best”. Some of the models were: Model deviance df 1. The model for females is log (b b +β µ) = log(t) + β b . a log link and using log(mon_serv) as an oﬀset were fitted to the data. These results agree with the computer outputs in the text. 0 1 b We get β 1 = log(175) − log(17.5 A. Exercise 6.4) + β log(t) + β b i. this model produces a near-singular Hessian matrix.7049 = −0. would describe the data well.39082.7049. based on statistical criteria. +type*per_op 33.

Thus.0602 on 13 df. The model with all main eﬀects and two-way interactions gives deviance 11. In the full model. The model is g (µB ) = β 0 + β 1 x. 2. Taking antilogs of minus these limits. The model with scores 0. the limits are [0. 1. possibly because the values for miles are rather similar for the diﬀerent years. This is compared to appropriate limits of a standard 0. Inclu- sion of log(miles) as an oﬀset gives the deviance 16. a log link.2639). 1%: 2. de- viance/df=1.9371 on 13 df. C. Therefore log (µB ) − log (µA ) = −β 1 .1% level.1%: 3. .3457.1% limit: we reject the null hypothesis. approximate 95% limits for µµB are obtained as e−0. Inclusion of the oﬀset does not aﬀect the fit very much. 0.0001). A A 95% Wald confidence interval for β 1 is 0.2676) / (19 − 18) = 11.96.5878 = 3.1764 normal variate z. Our observed test statistic is (numerically) larger than the 0.828.2354.2421 = 0. the 1% limit is 6. b β−0 The Wald test of the same hypothesis uses the test statistic b) s.9335]. Our observed value is even larger than 10. 0. it holds that g (µB ) − g (µA ) = (β 0 ) − (β 0 ³ + β 1´) = −β 1 . Of the two models.08 on 25 df.785 A and e−0. B. and for treatment B the model is g (µB ) = β 0 . 1. 0. deviance/df=1.5878 ± 0.9335 = 0.635 and the 0. deviance/df=1.f.589 which is used as an asymptotic χ2 on 1 d. and no oﬀset vari- able gives a deviance of 15. B. Limits: 5%: 1. is to be preferred because of a much better fit.6 A model with a Poisson distribution. Residuals plots for this model are as follows: c Studentlitteratur ° .7 A.1764.(β = 0.3322.393. 2.841. the result is clearly significant and H0 can be rejected at the 0.222 (D1 − D2 ) / (df1 − df2 ) = (27. which means that log µµB = −β 1 .2259. 3 for cigarettes is not a good model: deviance=301. the three-way interaction is not significant (p = 0. The link function g is a log function.1% limit is 10.857 − 16.291. The 5% limit is 3.e.5878 ± 1. Exercise 6.2416. All main eﬀects and in- teractions are highly significant (p < 0. the model in A. where x is a dummy variable (x = 1 for treatment A).2421 . .1746 on 9 df.576. Exercise 6. 3 for coﬀee and 0.828.96 · 0. For treatment A the model is g (µA ) = β 0 + β 1 .

434 − 16. The Normal plot is rather straight.600.Solutions to the exercises 223 The Residual vs.f. c Studentlitteratur ° . The test of the hypothesis of no relation between temperature and prob- ability of failure is obtained by calculating the diﬀerence in deviance be- tween the null model and the estimated model. with a couple of deviating observations at each end. for which the 5% limit is 3. this is significant at the 5% level (p = 0.. i) Poisson model: χ2 = 22.841 and the 1% limit is 6.8337 = 5.635. with a decreasing variance for increasing yb.018). These diﬀerences can be interpreted as χ2 variates on 1 d. Fits plots shows some tendency towards an “inverse trumpet” shape. Exercise 6.8 A.

0863 = 6.0121 = 0.0850 − 0. g (·) is a ex e1.1441.7637. again significant at the 5% level (p = 0. For the diﬀerent interaction eﬀects in the model we get: c Studentlitteratur ° . P (x ≥ 3) = 1 − P (x ≤ 2) = 1 − 0. log link. We interpret the parameters by the ordered values in the SAS printout. the odds ratio for.9 ³ ´ The odds ratios can be calculated as exp β b .5014. The Poisson model has the disadvantage that the expected number of fail- ing O-rings is actually larger than the total number on board: we predict 16 failures among 6 O-rings. Injury=I and Belt use=B. B. the number of failures.9879. so we only consider the interactions. Predicted values at 31o F are i) Poisson model: g (bµ) = 5. of course. however.2304 − 18. Using the Binomial model with n = 6 and p = 0.5014 = 0.224 ii) Binomial model: χ2 = 24.1034 · 31 = 2.9691 − 0. In the table we abbreviate Gender=G. The Binomial model is more reasonable. In the presence of i interactions the main eﬀect odds ratios are not very illuminating. D. we would expect nb p = 6 · 0. Location=L.81778. Exercise 6. in particular. for example. Since N is (alphabetically) before Y.5014 logit link which has inverse 1+e b = 1+e x so p 1.1156 · 31 = 1. Both models indicate a significant relationships between failure risk and temperature.7637) = 15. is so small that the asymptotics may not work.8179. B*I means that persons with “B=No” have “I=No” less often.0132). This. 9 of them to fail! C. Since g (·) is a b = exp (2. this gives µ ii) Binomial model: g (bp) = 5. could be stated as “Users of seat belts are injured less often”. Note.81778 = 4. that the number of observations and. With n = 6 O-rings on board.858.

1462 treatment CP 0 0.1782 0.8140) = 0.32 0.2928 -1.2803 122. An ordinal model for prediction of the attitude towards detection of cancer gives a Type 3 χ2 = 25.11 0. Exercise 7.0924 mammo Never 0 0.2500 -3.0000 0.1117 -0.1817 0.4599) = 0.1462).0001 treatment BP 1 0.0000 1.1814 -1.0001 mammo > 1 year 1 -0.919 Belts are avoided less often in ru- ral areas.4926 0. The parameter estimates for this model are as follows: Analysis Of Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq Intercept1 1 -2.0000 According to this analysis.20.0001.0000 0.8388 20.0000 1.811 Females traveled in rural areas less often than males.66 0.0003 Intercept3 1 1.4759 0.2 The standard χ2 test of independence gives χ2 = 24.7380 -0.0000 0.Solutions to the exercises 225 Term OR Comment G*L exp (−0.5405) = 0.7703 0. A standard Chi-square test of independence gives χ2 = 4.0000 0. G*I exp (−0.0004 mammo < 1 year 1 -1. p < 0.0000 0.2137 12.64 <.0000 .7550) = 0.03 <.86 on 2 df.1705 -0.1607 0. L*I exp (−0.5163 -0. G*B exp (−0. . p = 0.0001. Scale 0 1.582 Females were uninjured less of- ten than males.8051 40.631 Females avoided belt use less of- ten than males.5344 42.4753 0.1124 0.2602 -2.0001 Intercept2 1 -0.0000 1.0000 0.1481 on 4 df.0849) = 0.470 Passengers are uninjured less of- ten in rural areas B*I exp (−0.443 Non-users of belts are uninjured less often Exercise 7.0664 0.0001 Intercept2 1 -0.0000 0. Scale 0 1.6 on 3 df.3219 0.8220 1. there is no significant association between treatment and response (p = 0.1 A generalized linear model with a multinomial distribution and a cu- mulative logit link gave the following result: Analysis Of Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq Intercept1 1 -1.93 <.3247 -2.9564 -0.7563 2.0812 2.0000 c Studentlitteratur ° .2216 -0.1337 -0. .0000 0. p < 0.0000 .6222 0.0000 1.80 <.2881 13. L*B exp (−0.83 0.2099) = 0.

83 <. lwbc 1 0.0357 0.1258 1079. along with the survival times predicted by the model.3052 -2.0000 0.226 An alternative model is the linear by linear association model.0000 0.9968 0.1036 -1.0000 0.0061 0. Exercise 8.17 0.0000 The c ∗ m association is highly significant.09 0.0348 0. is as follows: c Studentlitteratur ° .0024 0. .0000 0.0000 1.6518 1.0000 0.95 <.0000 0.0411 -0.0057 0.7119 343.0001 Scale 0 1. The output is: Analysis Of Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq Intercept 1 0. For these data.6821 0.0001 mammo > 1 year 1 -2.6763 cancer 2 0 0.0564 0.2160 0.8206 2711.0116 Scale 1 0.0128 2.5242 A plot of observed survival times for the two groups.0000 0. The sug- gested model produces a deviance of 38.0633 0.0089 0.8533 103.0688 -2.44 0.0000 0.7662 487.5756 0.0109 6.0001 mammo Never 0 0.5435 3.93 <.0000 0.0014 0.2342 on 30 df.0773 6.41 <.0136 ag 1 0 0.0014 0.0707 3.0000 .0001 cancer 0 1 -1.1180 ag 0 1 0.2745.37 0.06 <.2595 -0. .0171 0. deviance/df=1.0000 . this model gives: Analysis Of Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq Intercept 1 3.1 A gamma model with log-transformed wbc values was tried. .0000 1.3954 -2.0036 -0.0976 0.0174 0.0001 cancer 1 1 0.6437 0.0000 .1375 -3. The wbc*ag interaction was far from significant so it was excluded. c*m 1 0.0431 0. All three analyses suggest a strong relationship between mammography experience and attitude to- wards cancer detection.0000 0. mammo < 1 year 1 -3.2606 0.

0000 1.0000 0.0001 4 group c 0 0.57 0.0000 0.8514 -1.20 0.6370 34.Solutions to the exercises 227 Exercise 8.96 <.5400 7.6553 4.5000 0.0296) between groups a and c.0301 0. The results for the mean value model was as follows: Mean model Prob Obs Parameter Level1 DF Estimate StdErr LowerCL UpperCL ChiSq ChiSq 1 Intercept 1 19.9303 80. 5 Scale 0 0.0000 0.0104 2 group a 1 -16. 5 Scale 0 1.7790 -26.8563 -7.4511 6.7746 -4.3 A SAS program for analysis of these data using the Glimmix macro is as follows.6955 4.0000 1.0000 0. The results for the variance model are: Variance model Prob Obs Parameter Level1 DF Estimate StdErr LowerCL UpperCL ChiSq ChiSq 1 Intercept 1 5.6136 16.35 <.7032 -31.73 0.0000 .3517 49.5000 0.2 The data were run using the macro for variance heterogeneity listed in the Genmod manual. Note that the macro itself must be run before this program is c Studentlitteratur ° .0000 _ _ There is a significant diﬀerence (p = 0.58 <.6907 0. Exercise 8.6500 -1.7866 3.8030 6. .0296 3 group b 1 -11.5000 _ _ The results indicate significant diﬀerences in variance between the groups.0001 3 group b 1 -3.0000 0.0001 2 group a 1 -6. .1318 0.0000 0.7200 7.7533 7.7066 2.6325 4.0000 0.7085 -4.0000 0.0000 .1379 4 group c 0 0.

05 0. . . stmts=%str( class pot var1 var2. Type 3 Tests of Fixed Effects Num Den Effect DF DF F Value Pr > F Var1 3 64 3.33 <.51 0.5332 Var1 F 0.1333 0.4423 64 -2. Var1*Var2 K F 0 .25 0.6779 Var1 K 0 . . .55 0. .07 0.6849 0.2870 Var1*Var2 H F -0.0042 There is a significant interaction between varieties. error=binomial.06 0.4581 64 0. .12 0.4641 0. .1826 Var2 K 0 .46 0. . .3172 64 -0. . . . .0280 Var2 F 0. .4426 64 1. model x2/n2 = var1 var2 var1*var2.1351 Var1*Var2 F K 0 .3194 64 -0.e.4323 64 1.2612 Var1*Var2 F H -0. . Var1*Var2 K H 0 .6850 0.3074 0.228 submitted.9072 0. run.3324 64 0. .3447 64 1. .22 0.9360 0.6812 0.4266 64 -0. i. some combina- tions of varieties are more palatable than others to the lice.07013 0.21 0. c Studentlitteratur ° . Some of the output is as follows: Solution for Fixed Effects Standard Effect Var1 Var2 Estimate Error DF t Value Pr > |t| Intercept 1.4402 64 2. Var1*Var2 K A 0 .3047 64 -2.1817 0. ). .4186 0. Var1*Var2 A A 0. %glimmix( data=labexp.72 0.5865 Var2 H -0.6464 Var1*Var2 F F -0. . .35 0. link=logit ). .3030 Var1*Var2 H K 0 .4737 Var1*Var2 A K 0 . Var1*Var2 F A 0.04 0.0285 Var2 3 64 4. .4159 0.2112 0.0434 Var1*Var2 A F -0.63 0.2320 Var1 H -0.37 0.4600 64 -0. random pot*var1*var2. Var1*Var2 K K 0 . Var1*Var2 H A 0. .3107 64 -1.13 0.42 0.4596 0.4501 64 -1.8793 Var1*Var2 H H 0.2299 64 7. This conclu- sion may be followed up by comparing least squares mean values for the diﬀerent combinations.0001 Var1 A -0. . .1987 0.15 0. Var2 A -0.0382 Var1*Var2 A H -0.5441 0.4800 64 -1. . .0074 Var1*Var2 9 64 3.

21 contrast. 1 Bernoulli distribution. 7 analysis of variance. 12. 117 sandwich estimator. 57 computer software. 166 covariance analysis. 111 AR(1) structure. 5 166 comparisonwise error rate. 169 assumptions in general linear mod. 17 dilution assay. 37. 6 conditional independence. 26 estimable functions. 58 count data. 13 constraints. 42 dummy variable. 21 confidence interval. 2. 111 ANCOVA. cumulative logits. 23 classification variables. 24 adjusted R-square. 112 complementary log-log link. ix. 12 exchangeable correlation structure. 39 canonical link. 45 binomial distribution. 15 expected frequencies. 13 Cook’s distance. 5 contingency table. 119 Akaike’s information criterion. 92 censoring. 42 deterministic model. 158 empirical estimator chi-square distribution. 122 ED50. 57 compound distribution. 134 adjusted Pearson residual. 14 canonical parameter. 101. 60 boxplot. 21 arbitrary scores. ix. 148 els. 57 Bonferroni adjustment.Index adjusted deviance residual. 15 229 . 37 capture-recapture data. 120 analysis of covariance. dependent variable. 92 cross-over design. 15 ANOVA. 88. 4 analysis of variance table. 188 arcsine transformation. 40. 71 correlation structure. 24 autoregressive correlation structure. 86 dispersion parameter. 46 conditional odds ratio. 146 Cramér-Rao inequality. 87 deviance. 162 class variables. 60 ANOVA as GLIM. 2 166 design matrix. 73 robust estimator chi-square test. coeﬃcient of determination. 16 Dfbeta. 86 experimentwise error rate. 166 Anscombe residual. 113 deviance residual.

18 linear by linear association model. 168 homogenous association. 87 marginal probability. 36. 18. 1 homoscedasticity. ix. 44. 10 intercept. 2 logistic regression. 159 mixed generalized linear models. 40 gamma function. 3. 111 link function. 115 influential observations.230 Index exponential dispersion family. 112 GEE. 15 hat matrix. 86. 115. 48 F test. 91 generalized inverse. Fisher information. 190 linear predictor. 169 linear regression as GLIM. 159 nested models. 112 multiple regression. 87. 92 interaction. 36. 134 166 Glimmix. 3 Minitab. 31. 24 model building. 187 likelihood ratio test. 165 logistic distribution. 122 c Studentlitteratur ° . 36 m-dependent correlation structure. 3 multivariate quasi-likelihood. 121 Generalized estimating equations. logit link. 3 exponential family. negative binomial distribution. 45 LL model. 1. 151 likelihood function. 75 least squares. ix. ix fixed eﬀects. 190 134 Kaplan-Meier estimates. 53. 129 exponential distribution. 189 mixed models. 73 log link. 8 hazard function. 73 log-linear model. 25 identity link. 165 intrinsically nonlinear models. 45 Newton-Raphson’s method. 119 model. 119 iteratively reweighted least squares. 151 general linear model. 58 factorial experiments. 59 multiple logistic regression. 168 Hessian matrix. 37 latin square design. 4 generalized linear model. ix. 187 gamma distribution. 91. 6 likelihood residual. 59 extreme value distribution. 23 mutual independence. 162 independent variable. 56 Maximum Likelihood. 189 latent variable. 113. 4 linear regression. 37 leverage. 40. 55. 42 fitted value. 31. 69 frequency table. 151 nominal logistic regression. geometric distribution. 146 log likelihood. 169 marginal odds ratio. 111 mass significance. 36. 2 multinomial distribution. 40 model-based estimator. 120 Gumbel distribution. 42 hat notation. 40 full model. 86 165 logit regression. 188 146 Fisher’s scoring.

64 RC model. 133 parameter. 169 normal equations. 85 Proc Mixed. 131 response variable. 113 pairwise comparison. 151 Type 1 test. 120. 16 tests on subsets of parameters. 48 Pascal distribution. 162 nominal variable. 61. 45 relative risk. 114. 113 non-linear regression. 3. 116. 3 scaled deviance. 152 response variables. 98. 34 outlier. 112 Poisson distribution. 4 survival function. 8 Type 3 test. 8 score test. 26 tolerance distribution. 4. 115. 148 type 2 SS. 8. 8 proportional odds model. 12 PROC GLM. 158 probit analysis. 85 t test. 8 Pearson chi-square. 145 quasi-likelihood. continuous. 5 normal distribution. 23 R-square. 31 ordinal logit regression. rates. 169 total sum of squares. 53 114 type 1 SS. 117 statistical model. 55 odds ratio. 49 quantal response. 188 partial odds ratio. 8 proportional hazards. 46. binary. 35 overdispersion. 32 ordinal response. 126 sum of squares. 26 saturated model. 134 sequential sum of squares. 98 residual plots. 133 overdispersion parameter. 1. 49 proportional odds. 45. 152 response variables. 120 score residual. 98 repeated measures data. 59 response variables. 56 odds. 37. 32 ordinal probit regression. 149 type 3 SS. 158 predicted value. 122 residual sum of squares.Index 231 nominal response. 40. 16. 60 score equation. 40 survival data. 3 rate data. 165 observed residual. 131 normal probability plot. 57 partial sum of squares. binomial. 116 simple linear regression. 5 oﬀset. 32. 33 ordinal regression. 4 power link. 45 partial leverage. 14 scale parameter. counts. 32. 4 residual. 89 probit link. 15. 145 response variables. 32 type 4 SS. 148 null model. 38 random eﬀects. 7 Proc GLM. 1 Poisson regression. 56 statistical independence. 35. 8 c Studentlitteratur ° . 32. 62 SAS. 4 product multinomial distribution. truncated Poisson distribution. 32. 152 response variables. 55. 8 Pearson residuals.

61 variance function. 52 c Studentlitteratur ° .232 Index underdispersion. 47 Wilcoxon-Mann-Whitney test. 39 variance heterogeneity. 157 Wald test.