
Twisted relationships

Generalised linear models

Adolfo Amézquita

Departamento de Ciencias Biológicas

Universidad de Los Andes

Bogotá, Colombia

http://gecoh.uniandes.edu.co

aamezqui@uniandes.edu.co

The topics

• Logistic and Poisson regressions are particular cases of generalised models

• Generalised models (GLM)

• Distribution functions

• How to model a relationship between two variables without assuming a specific mathematical model

• What the hell is the odds ratio

Counting
Examples of count data

• Number of colony-forming units

• Number of birds sighted along a transect

• Number of cars passing along a street in one minute

• Number of seedlings in a quadrat

• Number of offspring in each female's litter

Count data
need a different approach

• “The linear model might lead to the prediction of negative counts.

• The variance of the response variable is likely to increase with the mean.

• The errors will not be normally distributed.

• Zeros are difficult to handle in transformations.”

Crawley 2013
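The first two points in the quote can be checked on simulated data. The sketch below is only an illustration under stated assumptions: the data are invented (the names `x1` and `y` and all coefficients are hypothetical, not taken from the slides), and it compares what `lm` and a Poisson `glm` predict at the low end of the predictor.

```r
# Simulated counts (hypothetical values, not from the slides):
# the expected count rises with a continuous predictor x1.
set.seed(1)
x1 <- seq(0, 10, length.out = 100)
y  <- rpois(100, lambda = exp(-2 + 0.4 * x1))

linFit  <- lm(y ~ x1)                      # general linear model
poisFit <- glm(y ~ x1, family = poisson)   # Poisson regression, log link

newX <- data.frame(x1 = 0)
linPred  <- predict(linFit, newX)                      # may fall below zero
poisPred <- predict(poisFit, newX, type = "response")  # always positive

# The spread of the counts grows with their mean:
# compare the lower and upper halves of x1.
lowVar  <- var(y[x1 < 5])
highVar <- var(y[x1 >= 5])

print(c(linear = unname(linPred), poisson = unname(poisPred)))
print(c(lowVar = lowVar, highVar = highVar))
```

Because the Poisson model works on the log scale, its back-transformed predictions cannot be negative, whatever the data look like.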

On Poisson regression

[Figure: count response Y(n) plotted against a continuous predictor X1; the relationship is non-linear]

Increasing complexity of models

Linear models → Models that can be linearised → Really non-linear models

Models that can be linearised: e.g. logistic regression, Poisson regression, other GLIMs, etc.

Two ways to deal with non-linear models that can be linearised

• I learn each possible statistical test: me… lost

• I learn generalised models (GLM)

Two kinds of models

General linear models: lm
Generalised linear models: glm

> # general linear model: normal errors, no family or link to choose
> wdistModel <- lm(wDistance ~ glucolevel + preydensity)
> summary(wdistModel)

We must define family and link

> # generalised linear model: binomial errors (default link: logit)
> SurvModel <- glm(Surv ~ glucolevel + age, family = binomial)
> summary(SurvModel)

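The two calls can be tried end to end with the sketch below. The data set behind the slides is not available, so the data frame `dat` and every coefficient used to simulate it are assumptions made for illustration. Only the variable names follow the slides.

```r
# Hypothetical data: variable names follow the slides, values are simulated.
set.seed(42)
n <- 200
dat <- data.frame(
  glucolevel  = rnorm(n, mean = 5, sd = 1),
  preydensity = runif(n, min = 0, max = 10),
  age         = runif(n, min = 1, max = 10)
)
dat$wDistance <- 2 + 0.5 * dat$glucolevel + 0.3 * dat$preydensity + rnorm(n)
dat$Surv <- rbinom(n, size = 1,
                   prob = plogis(-1 + 0.4 * dat$glucolevel - 0.2 * dat$age))

# General linear model: continuous response, normal errors
wdistModel <- lm(wDistance ~ glucolevel + preydensity, data = dat)

# Generalised linear model: binary response, family and link stated explicitly
SurvModel <- glm(Surv ~ glucolevel + age,
                 family = binomial(link = "logit"), data = dat)

print(summary(wdistModel))
print(summary(SurvModel))

# lm() matches glm() with gaussian errors and the identity link
sameFit <- glm(wDistance ~ glucolevel + preydensity,
               family = gaussian, data = dat)
print(all.equal(coef(wdistModel), coef(sameFit)))
```

The last comparison shows why the general linear model can be read as one particular case of the generalised one.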
Error distribution in a linear statistical model

response = deterministic part + stochastic part

wDistance = A*glucolevel + B*preydensity + C + ε

Candidate distributions for the error ε:

• normal
• uniform: error will be platykurtic
• binomial: error is bounded within 0 and 1
• Poisson: error leads only to positive values


Error distribution in a generalised statistical model

response = deterministic part + stochastic part

wDistance = A*glucolevel + B*preydensity + C + ε

ε: other distribution

Link function in a generalised statistical model

response = deterministic part + stochastic part

wDistance = LINK [A*glucolevel + B*preydensity + C + ε]

ε: other distribution
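Stated a little more precisely, the link function connects the expected value of the response to the linear predictor: link(E[response]) = A*glucolevel + B*preydensity + C. The sketch below looks at the link functions that base R stores inside its family objects. The names `logit`, `expit`, `eta` and `mu` are chosen here for illustration.

```r
# Family objects in base R carry the link function and its inverse.
logit <- binomial()$linkfun   # eta = log(mu / (1 - mu))
expit <- binomial()$linkinv   # mu  = 1 / (1 + exp(-eta))

eta <- c(-5, 0, 5)            # linear predictor: any real number
mu  <- expit(eta)             # expected response: always inside (0, 1)
print(mu)
print(logit(mu))              # back on the linear-predictor scale

logLink <- poisson()$linkfun  # eta = log(mu)
expInv  <- poisson()$linkinv  # mu  = exp(eta), always positive
print(expInv(eta))
```

The linear predictor can take any value, and the inverse link maps it back into the range the response is allowed to occupy.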

On distributions
event space, probability functions, descriptors

Event space of the variable Sex: binary

Event space of the variable Size of egg clusters: count

[Figure: frequency against count]

Probability distribution of sex

Probability distribution of Size of egg clusters


Cumulative distribution function (CDF) of Size of
egg clusters

What would the CDF of the variable Sex look like?

A continuous distribution, such as the normal

Continuous

work.thaslwanter.at
Descriptors depend on the kind of distribution

No words needed: the normal distribution (continuous)
descriptors: mean, standard deviation
[Figure: PDF]

Uniform distribution (discrete or continuous)
descriptors: lower limit, upper limit
[Figure: PDF and CDF]

Count of binary occurrences in a given number of trials: the binomial distribution (discrete)
descriptors: p of event occurrence, sample size (N)
[Figure: PDF]

Number of occurrences in a unit of time or space: the Poisson distribution (discrete)
descriptor: λ = mean count
[Figure: PDF]
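Each of these distributions has a density (or probability) function and a cumulative function in base R, and their arguments are the descriptors listed above. The parameter values in this sketch are arbitrary and chosen only for illustration.

```r
# Normal (continuous): descriptors are the mean and the standard deviation
normDens <- dnorm(0, mean = 0, sd = 1)     # height of the PDF at 0
normCum  <- pnorm(0, mean = 0, sd = 1)     # CDF at 0: 0.5

# Uniform (continuous version): descriptors are the lower and upper limits
unifCum <- punif(2.5, min = 0, max = 10)   # CDF at 2.5: 0.25

# Binomial (discrete): descriptors are p and the number of trials
binomProb <- dbinom(0:10, size = 10, prob = 0.5)

# Poisson (discrete): the single descriptor is lambda, the mean count
poisProb <- dpois(0:5, lambda = 3)
poisCum  <- ppois(5, lambda = 3)           # P(count <= 5)

print(round(binomProb, 3))
print(round(poisProb, 3))
```

For the discrete distributions the `d` functions return probabilities of exact values, so the binomial probabilities over all possible outcomes sum to one.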

What family do I use?
It depends on the type of variable and its distribution

Error structure is defined by the family of the model

type of data               error                          default link
continuous data            family = gaussian (normal)     link = identity
count data                 family = poisson               link = log
binary, proportions        family = binomial              link = logit
survival, time to event    family = exponential           link = ?
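The default links in the table can be read directly from R's family objects, as in the short sketch below. Two caveats apply: in R the normal error family is called `gaussian`, and base R's `glm` has no family named `exponential`. The `Gamma` family shown in the last line is one option that is sometimes used for positive, time-like responses. It is included here as an assumption, not as the slides' recommendation.

```r
# Default link of each family, as stored in the family object
print(gaussian()$link)   # "identity"
print(poisson()$link)    # "log"
print(binomial()$link)   # "logit"

# Assumption: Gamma as a stand-in for positive, time-like responses
print(Gamma()$link)      # "inverse"
```

A non-default link can be requested explicitly, for example `family = poisson(link = "sqrt")`.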

And what the hell is OR


The Odds Ratio
What is an Odds Ratio (the effect size in GLM)
Do sexually transmitted infections (STI) increase the risk of an ectopic pregnancy?

           Normal   Ectopic   Odds (ectopic/normal)
No STI       90       10        0.11
STI          30       70        2.33

OR = 2.33/0.11 ≈ 21
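The arithmetic can be reproduced in R from the four counts in the table. The sketch below computes the odds and their ratio by hand, then recovers the same value as the exponentiated coefficient of a logistic regression. The object names (`tab`, `dat`, `fit`) are chosen here for illustration.

```r
# Counts from the table above
tab <- matrix(c(90, 10,
                30, 70),
              nrow = 2, byrow = TRUE,
              dimnames = list(STI = c("No", "Yes"),
                              Pregnancy = c("Normal", "Ectopic")))

odds <- tab[, "Ectopic"] / tab[, "Normal"]   # 0.111 without STI, 2.333 with STI
OR   <- odds[["Yes"]] / odds[["No"]]         # 21 with unrounded odds
print(OR)

# The same odds ratio from a logistic regression on the grouped counts
dat <- data.frame(sti = c(0, 1), ectopic = c(10, 70), normal = c(90, 30))
fit <- glm(cbind(ectopic, normal) ~ sti, family = binomial, data = dat)
orFromGlm <- exp(coef(fit)[["sti"]])
print(orFromGlm)
```

Dividing the rounded odds (2.33/0.11) gives a slightly larger figure than dividing the unrounded ones, which is why the ratio is best computed from the counts.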

How should OR be translated into Spanish? "Oportunidad Relativa"

Is OR significant?
The climax: calculate the OR as the effect size of the continuous variable Age on the risk of suffering onychomycosis
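One possible way to approach this exercise is sketched below. The onychomycosis data set is not part of these slides, so the data are simulated and the assumed effect of age is invented. Only the workflow is meant to carry over: fit a logistic regression, exponentiate the coefficient of age, and check whether its confidence interval excludes 1.

```r
# Simulated data only: the onychomycosis data set is not available here.
set.seed(7)
n   <- 500
age <- runif(n, min = 18, max = 80)
# Assumed true effect: the log-odds rise by 0.05 per year of age
onycho <- rbinom(n, size = 1, prob = plogis(-4 + 0.05 * age))

fit <- glm(onycho ~ age, family = binomial)

orAge <- exp(coef(fit)[["age"]])             # odds ratio per one-year increase
ciAge <- exp(confint.default(fit)["age", ])  # Wald 95% interval on the OR scale
pAge  <- summary(fit)$coefficients["age", "Pr(>|z|)"]

print(orAge)
print(ciAge)
print(pAge)
```

For a continuous predictor the OR is multiplicative per unit: an increase of ten years multiplies the odds by `orAge^10`.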

We should know now

• Non-normal response variables can be modelled by

• ...explicitly stating the kind of error structure / probability distribution (binomial, Poisson, exponential, others)

• ...transforming them with a non-linear link function

• ...testing a generalised linear model (GLM)

• When response variables are not continuous, the effect size is additionally expressed as an odds ratio
