0 Up votes0 Down votes

4 views26 pagesgoal and objectives

Sep 11, 2015

© © All Rights Reserved

PDF, TXT or read online from Scribd

goal and objectives

© All Rights Reserved

4 views

goal and objectives

© All Rights Reserved

- Generalized Linear Models
- Multiple Linear Regression
- ch13_S06.pdf
- Agenda for SAS' 12th Annual Data Mining Conference, M2009
- Anova slides presentation
- analysis of teeth estimation
- DATA HUGO
- R Data - programação
- 572286c5fafbef32d44443e23defeefc1627
- General Linear Model
- Orm Project_Regression (3)
- Anova Reg Summary
- The Strength of a Magnet
- format 10 semoga fix.xlsx
- tutorial on variational inference.pdf
- out_40
- Manual.pdf
- After Statistics
- Cereal Example Notes
- regression of thermodynamic data

You are on page 1of 26

Simon Wood

Mathematical Sciences, University of Bath, U.K.

i = X i

yi N(i , 2 ).

where i = E (yi ).

g (i ) = Xi

yi EF(i , )

EF(i , ) is any exponential family distribution (e.g. Normal,

gamma, Poisson, binomial, Tweedie, etc. )

is a known or unknown scale parameter

X (= ) is the linear predictor

(log{i /(1 i )}).

g acts a little bit like the data transformations used before

GLMs. However note:

The link function does not transform yi itself.

model, without changing the distribution of the random

variability.

(density) function can be written in a particular general form.

exponential family distributions, then we can write:

var(y ) = V ()

where V is a known variance function of = E(y ), and is

a scale parameter (known or unknown).

knowledge of V , without needing to know the full distribution

of y , using quasi-likelihood theory.

GLMs: scope

example:

distribution.

distribution.

150

100

50

New Cases

200

250

AIDS in Belgium

1982

1984

1986

1988

Year

1990

1992

E(yi ) = N0 e 1 ti

yi Poi

Model is for exponential increase of the expected rate.

Observed number of cases, follows a Poisson distribution.

log (E(yi )) = log(N0 ) + 1 ti

= 0 + 1 ti ,

i.e. a GLM with a log link (0 log(N0 )).

yi Poi

GLM estimation

Model estimation is by maximum likelihood, via a Newton type method. e.g. for the

0.2

0.1

0.0

0.3

0.4

10

20

30

exp(0)

40

50

60

70

IRLS

as an Iteratively Re-weighted Least Squares scheme. . .

Initialize i = g (yi ), then iterate the following steps.

1. Form pseudodata zi = g (

i )(yi

i )/i + i and iterative

weights, wi = i / V (

i )g (

i )2 .

P

2. Minimize the weighted sum of squares i wi (zi Xi )2 w.r.t.

and hence new and .

to obtain a new ,

i = 1 + (yi

i )(Vi /Vi + gi /gi ) gives Newtons method.

likelihood replaces the actual Hessian in Newtons method.

which the Newton Fisher.

At convergence is the MLE (both methods!).

Distribution of

weighted least squares),

N(, (XT WX)1 ).

X

2 /(n dim())

=

wi (zi Xi )

i

Deviance

residual sum of squares of a linear model. This is the deviance.

the : l (). Then the deviance is

D = 2 {l (y) l ()}

for hypothesis testing we need the scaled deviance D = D/.

(When = 1, D = D).

Properties of deviance

If the model exactly fits the data then the deviance is zero.

As a rough approximation

D 2ndim()

if the model is correct. Approximation can be good in some

cases and is exact for the strictly linear model.

= D/{n dim()}.

(Since E (2p ) = p and D = D/.)

Model Comparison

compared using a generalized likelihood ratio test. . .

In terms of the scaled deviance, if model 0 is correct then

D0 D1 2p1 p0 .

2. If is unknown, then the GLRT leads to the approximate

result that, under model 0

(D0 D1 )/(p1 p0 )

Fp1 p0 ,np1 .

D1 /(n p1 )

rather than the simplest model supportable by the data.

For GLMs we need to check the assumptions that the data are

independent and have the assumed mean-variance

relationship, and are consistent with the assumed

distribution.

the mean variance relationship or distribution.

approximately constant variance, and behave something like

residuals for an ordinary linear model.

Pearson residuals

their scaled standard deviation, according to the model

yi

i

pi = p

V (

i )

variability of these residuals should appear to be fairly

constant, when they are plotted against fitted values or

predictors.

Deviance residuals

the squared residuals.

deviances for each datum:

X

D=

di

then by analogy we can view di as a residual.

from a linear model.

glm in R

and the structure of the linear predictor on the right.

referred to by the formula.

model with a log link assuming a Poisson response variable.

belg.aids <- data.frame(cases=c(12,14,33,50,67,74,123,

141,165,204,253,246,240),year=1:13)

am1 <- glm(cases ~ year,data=belg.aids,

family=poisson(link=log))

plot(am1)

13

3.5

4.5

5.5

Predicted values

13

1.5

1.5

Residuals vs Leverage

0.5

1.0

0.5

1

2

0.0 1.0

Theoretical Quantiles

13

0.5

0.5 0.5

2

ScaleLocation

2

0.0

2.0

2

0

Residuals

4 2

Normal QQ

Residuals vs Fitted

Cooks distance

3.5

4.5

5.5

0.0

Predicted values

points.

0.2

Leverage

13

0.4

Try a quadratic time dependence?

am2 <- glm(cases ~ year+I(year^2),data=belg.aids,

family=poisson(link=log))

plot(am2)

ScaleLocation

3.5

4.5

5.5

Predicted values

. . . much better.

1.5

0.0 1.0

Theoretical Quantiles

1.2

0.8

0.4

11

0.5

0.5

13

2.5

3.5

4.5

Predicted values

5.5

Cooks distance

2.5

6

2

0.0

2

1

0

1

1.0

0.0

Residuals

1.5

Residuals vs Leverage

11

11

Normal QQ

11

Residuals vs Fitted

0.0

0.2

0.4

Leverage

0.6

Now, examine the fitted model, first with the default print method

> am2

Call:

glm(formula=cases~year+I(year^2),

family=poisson(link=log),data=belg.aids)

Coefficients:

(Intercept)

1.90146

year

0.55600

I(year^2)

-0.02135

Null Deviance:

872.2

Residual Deviance: 9.24

AIC: 96.92

summary.glm (edited)

> summary(am2)

...

Deviance Residuals:

Min

1Q

-1.45903 -0.64491

Median

0.08927

3Q

0.67117

Max

1.54596

Coefficients:

Estimate Std. Error z value Pr(>|z|)

(Intercept) 1.901459

0.186877 10.175 < 2e-16 ***

year

0.556003

0.045780 12.145 < 2e-16 ***

I(year^2)

-0.021346

0.002659 -8.029 9.82e-16 ***

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 872.2058

Residual deviance:

9.2402

AIC: 96.924

on 12

on 10

degrees of freedom

degrees of freedom

Hypothesis testing

hypothesis that am1 is correct, against the alternative that am2 is

...

> anova(am1,am2,test="Chisq") ## NOT doing ANOVA!

Analysis of Deviance Table

Model 1:

Model 2:

Resid.

1

2

cases ~ year

cases ~ year + I(year^2)

Df Resid. Dev Df Deviance P(>|Chi|)

11

80.686

10

9.240 1

71.446 2.849e-17

> ## NOT doing ANOVA!

Analysis of Deviance Table

Model 1:

Model 2:

Resid.

1

2

cases ~ year + I(year^2) + I(year^3)

Df Resid. Dev Df Deviance P(>|Chi|)

10

9.2402

9

9.0081 1

0.2321

0.6300

AIC comparison

> AIC(am1,am2,am3)

df

AIC

am1 2 166.36982

am2 3 96.92358

am3 4 98.69148

So, both hypothesis testing and AIC agree that the quadratic

model, am2 is the most appropriate.

predict

predictions from a fitted model object.

in a newdata dataframe. If absent the values used for fitting

are employed.

The following predicts from am2, with standard errors.

fv <- predict(am2,newdata=data.frame(year=year),se=TRUE)

Now we can plot the data, fitted curve, and standard error

bands:

plot(belg.aids$year+1980,belg.aids$cases)

lines(year+1980,exp(fv$fit),col=2)

lines(year+1980,exp(fv$fit+2*fv$se),col=3)

lines(year+1980,exp(fv$fit-2*fv$se),col=3)

#

#

#

#

data

fit

upper c.l.

lower c.l.

150

100

50

cases

200

250

1982

1984

1986

1988

year

1990

1992

- Generalized Linear ModelsUploaded byBoris Polanco
- Multiple Linear RegressionUploaded byMarlene G Padigos
- ch13_S06.pdfUploaded bySuzain Nath
- Agenda for SAS' 12th Annual Data Mining Conference, M2009Uploaded byeTsir
- Anova slides presentationUploaded byCarlos Samaniego
- analysis of teeth estimationUploaded byAisyah Rieskiu
- DATA HUGOUploaded byRicardo Perez-Saavedra Carruitero
- R Data - programaçãoUploaded byBárbaraHelohá
- 572286c5fafbef32d44443e23defeefc1627Uploaded byDidi Agustino
- General Linear ModelUploaded byrahmah
- Orm Project_Regression (3)Uploaded byAnkit Goel
- Anova Reg SummaryUploaded byMuhammad Imran
- The Strength of a MagnetUploaded byOvidiu Rotariu
- format 10 semoga fix.xlsxUploaded byK Adrielle
- tutorial on variational inference.pdfUploaded byFERNANDO RODRÍGUEZ S.
- out_40Uploaded byVedaYanthi
- Manual.pdfUploaded byXimena Sanchez Duque
- After StatisticsUploaded byMichael Welker
- Cereal Example NotesUploaded byRohan Gupta
- regression of thermodynamic dataUploaded byamo
- Solved Papers-Oldpaper 1Uploaded byvignesh__m
- Improving Artificial Neural Networks With a Pruning Methodology and Genetic Algorithms for Their Application in Microbial Growth Prediction in FoodUploaded byShanmugaprakash Muthusamy
- Multiple Regression Paper 2Uploaded byRashid Ayubi
- SESSION_2_EXERCISESUploaded byrini_reen
- 1-s2.0-S016816050500334X-mainUploaded byCarlos Ceballos
- Ali Thesis Compilation.docxUploaded byheena zubair
- Becker et alUploaded byQiaonan Yang
- Topology Error Identification Using Branch Current State Estimation for Distribution SystemsUploaded byJesus Ortiz L
- Fast Decoupled State Estimation for Distribution Networks Considering Branch Ampere MeasurementsUploaded byJesus Ortiz L
- SN048Predictive Validity of Critical Power, The Onset of Blood Lactate and Anaerobic Capacity for Cross-Country Mountain Bike Race PerformanceUploaded byKlau Manriquez

- Hunza Gilgit Skardu Tour MapUploaded byOsama Raheel
- 10.1.1.561.2492Uploaded byweran
- 053Uploaded byweran
- 52Uploaded byweran
- 10633Uploaded byweran
- 63117Uploaded byweran
- 1Uploaded byweran
- 10-11-10-288cc41Uploaded byweran
- Urdu Homework Class 4Uploaded byweran
- Urdu Worksheet 1Uploaded byweran
- Unit 25 PPUploaded byweran
- Unit 26 PPUploaded byweran
- Unit 26.pdfUploaded byweran
- Unit 27 PPUploaded byweran
- Unit 28 PPUploaded byweran
- Unit 28Uploaded byweran
- Unit 27Uploaded byweran
- Unit 29.pdfUploaded byweran
- Unit 29 PPUploaded byweran
- Unit 31 PPUploaded byweran
- Unit 31Uploaded byweran
- Unit 32Uploaded byweran
- Unit 33 PPUploaded byweran
- Unit 33Uploaded byweran
- Unit 34 PPUploaded byweran

- Sprenger_ecu_0600M_10405.pdfUploaded byEd Pineda
- Resili4rt ProjectUploaded byFlávia Passos
- jesuswants to save summaryUploaded byapi-246559116
- Brand Architecture.pdfUploaded bySaurabh Das
- Abhi RathoreUploaded byAjay Singh
- Bawanyair Annual Report JUNE 2012-2013Uploaded byMansoor1991
- effects on students reading comprehension and parental involvementUploaded byapi-426081440
- Corrosion Protective Conversion Coatings on Magnesium DisksUploaded bythuron
- IB Chapter 1 AnswersUploaded byMostafa Barakat
- MENA Tax Insight - November 2013 EditionUploaded byMido1
- Python Web ScrapingUploaded byRimbun Ferianto Sr
- Examberry English Paper 2Uploaded bypantmukul
- Hassan Al BannaUploaded byPutranto Argi Noviantoko
- Plus ToolsUploaded byMarina
- Botswana Tax SummaryUploaded bydave
- Asimov's Guide to Halley's Comet - Isaac AsimovUploaded byAnonymous69
- Sanchez-Vives (2005) From Presence to Consciousness Through Virtual RealityUploaded bybobbyb
- دوائر التوقيت الزمنىUploaded byمعاذ المحمودى
- (a) Example 8-1. (1) What Are_ (2) What Would Have Been... _ Chegg.comUploaded byAnonymous Hzdnl0WN
- BACHandTUNING WerckmeisterUploaded byOnir García Cerda
- 64077173-Chapter-16Uploaded byamrdeck1
- 04 Ultra High Density Storage Using Nano TechnologyUploaded byDevi Pillai
- ea sports fifa 18 analysis-2Uploaded byapi-371922655
- coke UKUploaded byAmir Hamzah
- EssayUploaded byKevin Vaquiro
- Angel Robbie WilliamsUploaded byCésar Sepúlveda R.
- E560_23BA20_DSUploaded bySaqib Hasan
- SOCTEC 2 DevelopmentModernizationState BuildingUploaded byxheyyitskatex
- Sexual Harrassment MitologyUploaded bySiniša Glušac
- writing 1 syllabusUploaded byapi-260126569

## Much more than documents.

Discover everything Scribd has to offer, including books and audiobooks from major publishers.

Cancel anytime.