Attribution Non-Commercial (BY-NC)

The response variable is a binomial count, that is, a sum of independent binary variables. If $X_1, \dots, X_n$ are independent $\text{Bernoulli}(p)$ variables, then

$$Y = \sum_{i=1}^{n} X_i \sim \text{Binomial}(n, p).$$
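To see the sum-of-Bernoullis construction numerically, the following minimal MATLAB sketch simulates many such counts and compares their mean and variance with the binomial values np and np(1-p). The values of n, p, and the number of replications are arbitrary choices for illustration and are not taken from the case studies.

```matlab
% Assumed values, for illustration only (not from the notes).
n    = 20;       % number of Bernoulli trials in each count
p    = 0.3;      % success probability
reps = 10000;    % number of simulated binomial counts

rng(1);                    % fix the seed so the run is repeatable
X = rand(n, reps) < p;     % each column holds n independent Bernoulli(p) draws
Y = sum(X, 1);             % column sums: one Binomial(n, p) count per column

% Compare the simulated moments with the binomial values.
fprintf('mean of Y: %.3f (theory %.3f)\n', mean(Y), n*p);
fprintf('var  of Y: %.3f (theory %.3f)\n', var(Y),  n*p*(1-p));
```

The simulated mean and variance should be close to, but not exactly equal to, the theoretical values.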

Example 1, Case Study 21.1: Island size and bird extinctions. On each island we count the number of species that went extinct out of all the species on the island. What is the relationship between the area of an island and the probability of extinction of the birds present on the island?

Example 2, Case Study 21.2: Moth coloration and natural selection. At each distance from Liverpool we count the number of moths of each morph that were taken by predators. What is the relationship between the distance from Liverpool, where trees are dark from industrial soot, and the probability of predation on the light and dark (carbonaria) morphs of the moth?

Y = the number of successes in m binomial trials. For example, how many species went extinct on each island during the 10-year period of the study?

$Y_i \sim \text{Binomial}(m_i, \pi_i)$, where $i$ indexes the islands, $m_i$ is the number of species at risk on island $i$, and $\pi_i$ is the probability of extinction on island $i$.

$X_1, \dots, X_p$ = the explanatory variables. In the extinction example, X is the area of the island.

Y/m = the binomial proportion. Note that the sample size in the bird extinction study is the number of islands, not the number of species.

• $\operatorname{logit}(\pi) = \eta = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p$

• As before:

$$\pi = \frac{e^{\eta}}{1 + e^{\eta}}$$
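As a quick illustration of the logit and its inverse, here is a small MATLAB sketch. The coefficient values b0 and b1 and the predictor value x are hypothetical, chosen only so that the arithmetic is easy to follow.

```matlab
logit    = @(p)   log(p ./ (1 - p));            % probability -> log odds
invlogit = @(eta) exp(eta) ./ (1 + exp(eta));   % log odds -> probability

% Hypothetical coefficients and predictor value, for illustration only.
b0 = -1.0;
b1 = 0.5;
x  = 2;

eta = b0 + b1 * x;      % linear predictor
p   = invlogit(eta);    % implied probability

fprintf('eta = %.2f, pi = %.4f, logit(pi) = %.2f\n', eta, p, logit(p));
```

With these values the linear predictor is 0, so the implied probability is one half, and applying the logit returns the linear predictor.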

Continuous versus Counted Proportions:

Not all proportions are appropriate to model with logistic regression. Continuous proportions, such as fat calories/total calories, are usually modeled with normal theory. The only proportions appropriate in this context are those that result from an integer count of a certain outcome out of a total number of trials or outcomes.

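Mean and Variance

$$\mu(Y_i \mid X_{1i}, \dots, X_{pi}) = m_i \pi_i$$

$$SD(Y_i \mid X_{1i}, \dots, X_{pi}) = \sqrt{m_i \pi_i (1 - \pi_i)}$$

Equivalently, the binomial proportion $Y_i / m_i$ has mean $\pi_i$ and standard deviation $\sqrt{\pi_i (1 - \pi_i) / m_i}$.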
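For the count $Y_i$ itself, the binomial mean is $m_i \pi_i$ and the standard deviation is $\sqrt{m_i \pi_i (1 - \pi_i)}$; dividing the count by $m_i$ gives a proportion with mean $\pi_i$. The sketch below checks both values against the binomial probabilities directly. The number of trials and the probability are assumed values for illustration, and `binopdf` requires the Statistics Toolbox.

```matlab
m   = 67;      % number of trials (assumed, e.g. species at risk on one island)
pi0 = 0.10;    % assumed success probability, for illustration only

y   = 0:m;                       % all possible counts
pmf = binopdf(y, m, pi0);        % binomial probabilities of each count

mu = sum(y .* pmf);                     % mean computed from the probabilities
sd = sqrt(sum((y - mu).^2 .* pmf));     % SD computed from the probabilities

fprintf('mean: %.4f   m*pi             = %.4f\n', mu, m*pi0);
fprintf('SD:   %.4f   sqrt(m*pi*(1-pi)) = %.4f\n', sd, sqrt(m*pi0*(1-pi0)));
```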

As for the binary response model we use the maximum likelihood estimators (MLE’s).

Model Assessment

Estimated versus observed: One way to assess the appropriateness of the model and the efficacy of the estimation routine is to plot the estimated probability, $\hat{\pi}_i$, against the observed response proportion, $\bar{\pi}_i = Y_i / m_i$. Additionally, plots of the observed logits, $\log\{\bar{\pi}_i / (1 - \bar{\pi}_i)\}$, versus one or more of the explanatory variables are useful for visual examination, just as ordinary scatterplots are in linear regression. See Display 21.2.
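Both plots can be sketched in MATLAB as follows, using the bird extinction data listed in the MATLAB code at the end of these notes. This is only one possible layout; the variable names `pibar` and `obsLogit` are mine, and the dashed reference line y = x is an added visual aid.

```matlab
% Bird extinction data, as listed in the code at the end of these notes.
area    = [105.8 30.7 8.5 4.8 4.5 4.3 3.6 2.6 1.7 1.2 0.7 0.7 0.6 0.4 0.3 0.2 0.07]';
atrisk  = [67 66 51 28 20 43 31 28 32 30 20 31 16 15 33 40 6]';
extinct = [3 10 6 3 4 8 3 5 6 8 2 9 5 7 8 13 3]';

b     = glmfit(area, [extinct atrisk], 'binomial');   % logit link is the default
pihat = glmval(b, area, 'logit');                     % estimated probabilities

pibar    = extinct ./ atrisk;                % observed proportions Y_i / m_i
obsLogit = log(pibar ./ (1 - pibar));        % observed logits

subplot(1, 2, 1)
plot(pibar, pihat, 'o', [0 0.6], [0 0.6], 'k--')   % points near the line agree
xlabel('observed proportion'); ylabel('estimated probability');

subplot(1, 2, 2)
plot(area, obsLogit, 'o')                    % look for a roughly linear trend
xlabel('island area'); ylabel('observed logit');
```

The observed logit is undefined when $Y_i = 0$ or $Y_i = m_i$; neither occurs in these data.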

Residual analysis: As in the binary response case, we have two widely used residuals for

binomial counts models.

Residuals

There are two standard ways to define a residual in logistic regression.

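1. Pearson residual:

$$Pres_i = \frac{Y_i - m_i \hat{\pi}_i}{\sqrt{m_i \hat{\pi}_i (1 - \hat{\pi}_i)}}$$

2. Deviance residual:

$$Dres_i = \operatorname{sign}(Y_i - m_i \hat{\pi}_i) \sqrt{2 \left\{ Y_i \log\left( \frac{Y_i}{m_i \hat{\pi}_i} \right) + (m_i - Y_i) \log\left( \frac{m_i - Y_i}{m_i - m_i \hat{\pi}_i} \right) \right\}}$$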

The Pearson residual is more easily understood, but the deviance residual directly gives the

contribution of each point to the lack of fit of the model.
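The two residuals can be computed directly from their definitions. The sketch below does so for the bird extinction data listed in the MATLAB code at the end of these notes; the variable names `pres` and `dres` are mine.

```matlab
% Bird extinction data, as listed in the code at the end of these notes.
area    = [105.8 30.7 8.5 4.8 4.5 4.3 3.6 2.6 1.7 1.2 0.7 0.7 0.6 0.4 0.3 0.2 0.07]';
atrisk  = [67 66 51 28 20 43 31 28 32 30 20 31 16 15 33 40 6]';
extinct = [3 10 6 3 4 8 3 5 6 8 2 9 5 7 8 13 3]';

b     = glmfit(area, [extinct atrisk], 'binomial');
pihat = glmval(b, area, 'logit');       % estimated probabilities
fit   = atrisk .* pihat;                % fitted counts m_i * pihat_i

% Pearson residuals: (observed - fitted) / estimated binomial SD.
pres = (extinct - fit) ./ sqrt(atrisk .* pihat .* (1 - pihat));

% Deviance residuals. Every island here has 0 < Y_i < m_i, so no log(0) arises.
d2 = 2 * (extinct .* log(extinct ./ fit) + ...
          (atrisk - extinct) .* log((atrisk - extinct) ./ (atrisk - fit)));
d2   = max(d2, 0);                      % guard against tiny negative rounding error
dres = sign(extinct - fit) .* sqrt(d2);

disp([pres dres])                       % columns: Pearson, deviance
```

As far as I know, the third output of `glmfit` also carries Pearson and deviance residuals (fields `residp` and `residd`); comparing them with the hand-computed values is a reasonable check, assuming the definitions agree.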

Since the data are grouped, the residuals in a binomial counts logistic regression (either Pearson or deviance) are more useful than in the binary response regression.

The residuals should be plotted against the estimated probabilities, $\hat{\pi}_i$, and examined for outliers or remaining patterns.
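A residual plot of this kind might look as follows for the bird extinction data listed in the MATLAB code at the end of these notes. Pearson residuals are used here; the cutoff of 2 in absolute value for flagging points is a common rule of thumb that I have assumed, not something stated in these notes.

```matlab
% Bird extinction data, as listed in the code at the end of these notes.
area    = [105.8 30.7 8.5 4.8 4.5 4.3 3.6 2.6 1.7 1.2 0.7 0.7 0.6 0.4 0.3 0.2 0.07]';
atrisk  = [67 66 51 28 20 43 31 28 32 30 20 31 16 15 33 40 6]';
extinct = [3 10 6 3 4 8 3 5 6 8 2 9 5 7 8 13 3]';

b     = glmfit(area, [extinct atrisk], 'binomial');
pihat = glmval(b, area, 'logit');                      % estimated probabilities
pres  = (extinct - atrisk .* pihat) ./ ...
        sqrt(atrisk .* pihat .* (1 - pihat));          % Pearson residuals

plot(pihat, pres, 'o')
hold on
plot([min(pihat) max(pihat)], [0 0], 'k--')            % zero reference line
hold off
xlabel('estimated probability'); ylabel('Pearson residual');

% Assumed rule of thumb: look more closely at residuals beyond +/- 2.
flagged = find(abs(pres) > 2);
disp(flagged')                                         % row indices of flagged islands
```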

As in the binary response case, we use the value -2 ln(Maximized likelihood function) to compare models. Recall that the MLEs of the $\beta_j$'s are the values that maximize the likelihood function of the data. So we find the values of the $\beta_j$'s that maximize the likelihood function, evaluate the likelihood at those values, take the natural log, and multiply by -2.

• The quantity -2 ln(Maximized likelihood) is also called the deviance of a model, since larger values indicate greater deviation from the assumed model. Software usually reports the deviance relative to the saturated model, which differs from -2 ln(Maximized likelihood) only by a constant that is the same for every model fit to the same data, so differences between models are unaffected. Comparing two nested models by the difference in their deviances is a drop-in-deviance test.

• The difference between the values of -2 ln(Maximized likelihood function) for a full and a reduced model has approximately a chi-square distribution if the null hypothesis that the extra parameters are all 0 is true. The d.f. is the difference in the number of parameters between the two models.
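A drop-in-deviance test can be sketched in MATLAB as follows, using the bird extinction data listed in the MATLAB code at the end of these notes. Here the reduced model has only an intercept and the full model adds island area; this particular pair of models is my choice for illustration. The deviance returned by `glmfit` is used, and any additive constant cancels in the difference.

```matlab
% Bird extinction data, as listed in the code at the end of these notes.
area    = [105.8 30.7 8.5 4.8 4.5 4.3 3.6 2.6 1.7 1.2 0.7 0.7 0.6 0.4 0.3 0.2 0.07]';
atrisk  = [67 66 51 28 20 43 31 28 32 30 20 31 16 15 33 40 6]';
extinct = [3 10 6 3 4 8 3 5 6 8 2 9 5 7 8 13 3]';
n = numel(extinct);

% Reduced model: intercept only. A column of ones is supplied explicitly
% and the automatic constant term is switched off.
[bRed, devRed] = glmfit(ones(n, 1), [extinct atrisk], 'binomial', 'constant', 'off');

% Full model: intercept plus area.
[bFull, devFull] = glmfit(area, [extinct atrisk], 'binomial');

drop = devRed - devFull;               % drop in deviance
df   = numel(bFull) - numel(bRed);     % number of extra parameters (here 1)
pval = 1 - chi2cdf(drop, df);          % upper-tail chi-square probability

fprintf('drop in deviance = %.3f on %d d.f., p-value = %.4f\n', drop, df, pval);
```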

Model Selection

Both AIC and BIC can be used as model selection criteria. As with linear regression models, they are only relative measures of fit, not absolute measures of fit.

AIC = Deviance + 2p

BIC = Deviance + p ln(n)

where p is the number of parameters in the model and n is the sample size (the number of binomial observations, e.g. islands). Stepwise model selection methods are available in SPSS using likelihood ratio tests or Wald's test. The LR methods are preferred. Other software programs, like S-Plus, have stepwise procedures using AIC or BIC.
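As a sketch of how these criteria might be computed, the code below compares two candidate models for the bird extinction data listed in the MATLAB code at the end of these notes. The second candidate, with log(area) as the predictor, is my own assumption for illustration and is not fitted elsewhere in these notes. BIC is taken in its usual form, Deviance + p ln(n), with n the number of islands.

```matlab
% Bird extinction data, as listed in the code at the end of these notes.
area    = [105.8 30.7 8.5 4.8 4.5 4.3 3.6 2.6 1.7 1.2 0.7 0.7 0.6 0.4 0.3 0.2 0.07]';
atrisk  = [67 66 51 28 20 43 31 28 32 30 20 31 16 15 33 40 6]';
extinct = [3 10 6 3 4 8 3 5 6 8 2 9 5 7 8 13 3]';

y = [extinct atrisk];        % [successes trials]
n = numel(extinct);          % sample size = number of islands

[bA, devA] = glmfit(area,      y, 'binomial');   % candidate 1: area
[bL, devL] = glmfit(log(area), y, 'binomial');   % candidate 2: log(area) (assumed)

% Deviance plus a penalty on the number of parameters (intercept included).
aic = [devA + 2 * numel(bA),      devL + 2 * numel(bL)];
bic = [devA + log(n) * numel(bA), devL + log(n) * numel(bL)];

fprintf('area:      AIC = %.2f, BIC = %.2f\n', aic(1), bic(1));
fprintf('log(area): AIC = %.2f, BIC = %.2f\n', aic(2), bic(2));
```

Smaller values are preferred, and only differences between models fit to the same data are meaningful.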

Goodness-of-fit Tests

Since we have multiple counts per cell, there is a goodness-of-fit test similar to the lack-of-fit test in linear regression. We can compare the model in which the log odds, or logit, is linear in the parameters to the model in which each cell has its own separate mean. So we are comparing the logistic regression model with p parameters to a model with n different parameters, where p is the number of regression coefficients (including the intercept) in the logistic regression model and n is the number of cells (binomial observations, or treatment combinations). That is, we are testing

Logistic regression model: $\operatorname{logit}(\pi_i) = \eta_i$, linear in the coefficients (p parameters)

against

Saturated model: $\operatorname{logit}(\pi_i) = \alpha_i$ (n parameters)

The test statistic is

$$D^2 = \sum_i D_i^2 = \sum_i 2 \left\{ Y_i \log\left( \frac{Y_i}{m_i \hat{\pi}_i} \right) + (m_i - Y_i) \log\left( \frac{m_i - Y_i}{m_i - m_i \hat{\pi}_i} \right) \right\}$$

Both of the denominators, $m_i \hat{\pi}_i$ and $m_i - m_i \hat{\pi}_i$, need to be large for the distribution of the test statistic to be approximately $\chi^2_{n-p}$. The p-value for the test is then $\Pr\{\chi^2_{n-p} \ge D^2\}$.
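The goodness-of-fit test might be carried out as follows for the bird extinction data listed in the MATLAB code at the end of these notes. In this sketch p is taken to be the number of fitted coefficients including the intercept, so the degrees of freedom are the number of islands minus two; the deviance returned by `glmfit` is used as $D^2$.

```matlab
% Bird extinction data, as listed in the code at the end of these notes.
area    = [105.8 30.7 8.5 4.8 4.5 4.3 3.6 2.6 1.7 1.2 0.7 0.7 0.6 0.4 0.3 0.2 0.07]';
atrisk  = [67 66 51 28 20 43 31 28 32 30 20 31 16 15 33 40 6]';
extinct = [3 10 6 3 4 8 3 5 6 8 2 9 5 7 8 13 3]';

[b, dev] = glmfit(area, [extinct atrisk], 'binomial');

n    = numel(extinct);          % number of binomial observations (islands)
p    = numel(b);                % fitted coefficients, intercept included
df   = n - p;                   % degrees of freedom of the test
pval = 1 - chi2cdf(dev, df);    % Pr(chi-square_df >= D^2)

fprintf('D^2 = %.3f on %d d.f., p-value = %.4f\n', dev, df, pval);

% The chi-square approximation relies on large fitted counts in both
% categories, so report the smallest of each.
fit = atrisk .* glmval(b, area, 'logit');
fprintf('smallest fitted extinct count: %.2f\n', min(fit));
fprintf('smallest fitted surviving count: %.2f\n', min(atrisk - fit));
```

A small p-value would point to lack of fit of the logistic regression model relative to the saturated model.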

Wald Test and Confidence Intervals for Single Coefficients.

The Wald test works the same way as in the binary response case. The normal approximation used by this test is adequate so long as n is moderately large and $m_i \pi_i$ is greater than 5.
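A Wald test and an approximate 95% confidence interval for each coefficient can be sketched as follows, using the bird extinction data listed in the MATLAB code at the end of these notes. The standard errors are taken from the `se` field of the third output of `glmfit`; the multiplier 1.96 and the labels in `names` are my choices for the illustration.

```matlab
% Bird extinction data, as listed in the code at the end of these notes.
area    = [105.8 30.7 8.5 4.8 4.5 4.3 3.6 2.6 1.7 1.2 0.7 0.7 0.6 0.4 0.3 0.2 0.07]';
atrisk  = [67 66 51 28 20 43 31 28 32 30 20 31 16 15 33 40 6]';
extinct = [3 10 6 3 4 8 3 5 6 8 2 9 5 7 8 13 3]';

[b, dev, stats] = glmfit(area, [extinct atrisk], 'binomial');

se   = stats.se;                          % standard errors of the coefficients
z    = b ./ se;                           % Wald z statistics
pval = 2 * (1 - normcdf(abs(z)));         % two-sided p-values, normal approximation
ci   = [b - 1.96 * se, b + 1.96 * se];    % approximate 95% confidence intervals

names = {'intercept', 'area'};
for j = 1:numel(b)
    fprintf('%-9s est = %8.4f  se = %.4f  z = %6.2f  p = %.4f  95%% CI = (%.4f, %.4f)\n', ...
        names{j}, b(j), se(j), z(j), pval(j), ci(j, 1), ci(j, 2));
end

% Exponentiating the interval for the slope gives an interval for the
% multiplicative change in the odds of extinction per unit increase in area.
disp(exp(ci(2, :)))
```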

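Below is the MATLAB code for fitting the bird extinction model.

% Columns: island area, number of species at risk, number of species extinct.
case2101 = [
105.8000 67.0000 3.0000
30.7000 66.0000 10.0000
8.5000 51.0000 6.0000
4.8000 28.0000 3.0000
4.5000 20.0000 4.0000
4.3000 43.0000 8.0000
3.6000 31.0000 3.0000
2.6000 28.0000 5.0000
1.7000 32.0000 6.0000
1.2000 30.0000 8.0000
0.7000 20.0000 2.0000
0.7000 31.0000 9.0000
0.6000 16.0000 5.0000
0.4000 15.0000 7.0000
0.3000 33.0000 8.0000
0.2000 40.0000 13.0000
0.0700 6.0000 3.0000];

extinct = case2101(:,3);   % Y_i: number of species that went extinct
atrisk  = case2101(:,2);   % m_i: number of species at risk
area    = case2101(:,1);   % explanatory variable

% Fit the binomial logistic regression; the response is [successes trials].
[b,dev,stats] = glmfit(area,[extinct atrisk],'binomial');

% Evaluate the fitted curve on a grid of areas (column vector, one row per value).
x = (1:10:180)';
y = glmval(b,x,'logit');

% Observed proportions (crosses) with the fitted probability curve (red line).
plot(area,extinct./atrisk,'x',x,y,'r-')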
