Httpsemas2.Ui - Ac.idpluginfile - Php2375826mod Resourcecontent1kuliah1 2 PDF

GENERALIZED LINEAR MODEL
INTRODUCTION TO GENERALIZED
LINEAR MODELS
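Undergraduate Program in Actuarial Science (Program Studi S1 Ilmu Aktuaria)
Department of Mathematics
Faculty of Mathematics and Natural Sciences (FMIPA), Universitas Indonesia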
ATA 2021/2022
Dr. Fevi Novkaniza, S.Si, M.Si SCAK603104 - Advanced Linear Models (Model Linier Lanjut)

GENERALIZED LINEAR MODEL Introduction
Course Description
This course presents students with an introduction to the GLM methodology and its applications of interest to actuarial science.
- A GLM is a methodology for modeling relationships between the response and the explanatory variables, also called predictors or covariates (in some contexts these are called risk factors or rating factors).
- GLMs extend the linear modeling framework to variables that are not normally distributed, which insurance analysts typically encounter.
- GLMs are most commonly used to model binary or count data, so we will focus on models for these types of data.
- GLMs also allow nonnormal errors such as Binomial, Poisson and Gamma errors.

Course Outline
1. Introduction to Generalized Linear Models
2. Model Fitting
3. Exponential Family and Generalized Linear Models
4. Estimation and Inference
5. Binary Variables and Logistic Regression
6. Nominal and Ordinal Logistic Regression
7. Poisson Regression, Log-Linear Models, Negative Binomial Regression
8. GLM and its Application

Text Books and Software
1. An Introduction to Generalized Linear Models (Dobson & Barnett; CRC Press, 2018)
2. Foundations of Linear and Generalized Linear Models (Alan Agresti; Wiley, 2015)
3. Generalized Linear Models for Insurance Data (de Jong & Heller; Cambridge University Press, 2008)
Software: R packages / Python

GLM: Model Fitting and Inference Models
1. Review of Linear Models
2. Exponential Dispersion Family Distributions for a GLM
3. Likelihood and Asymptotic Distributions for GLMs
4. Likelihood-Ratio/Wald/Score Methods for GLM Parameters
5. Deviance of a GLM, Model Comparison, and Model Checking
6. Fitting GLM
7. Selecting Explanatory Variables for a GLM
8. Example: Building a GLM
9. Exercises

Models for Binary Data
1. Link Function for Binary Data
2. Logistic Regression
3. Inference About Parameters of Logistic Regression
4. Logistic Regression Model Fitting
5. Deviance and Goodness of Fit for Binary GLMs
6. Probit and Complementary Log-Log Models
7. Examples
8. Exercises

Multinomial Response Models
1. Nominal Responses: Baseline-Category Logit Models
2. Ordinal Responses: Cumulative Logit and Probit Models
3. Examples
4. Exercises

Models for Count Data
1. Poisson GLMs for Counts and Rates
2. Poisson/Multinomial Models for Contingency Tables
3. Negative Binomial GLMs
4. Example
5. Exercises

Review : The General Linear Models
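In a general linear model
$$y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_p x_{pi} + \epsilon_i \tag{1}$$
the response $y_i$, $i = 1, \dots, n$, is modelled by a linear function of the explanatory variables $x_j$, $j = 1, \dots, p$, plus an error term.
General and Linear
- General refers to the dependence on potentially more than one explanatory variable, as opposed to the simple linear model.
- The model is linear in the parameters, e.g.
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{1i}^2 + \epsilon_i \tag{2}$$
$$y_i = \beta_0 + \gamma_1 \delta_1 x_{1i} + \exp(\beta_2) x_{2i} + \epsilon_i \tag{3}$$
  (equation (3) is linear in the reparameterized coefficients $\gamma_1 \delta_1$ and $\exp(\beta_2)$)
- but not, e.g., $y_i = \beta_0 + \beta_1 x_{1i}^{\beta_2} + \epsilon_i$
Error Structure
- We assume that the errors $\epsilon_i$ are independent and identically distributed such that $E(\epsilon_i) = 0$ and $Var(\epsilon_i) = \sigma^2$.
- Typically we assume $\epsilon_i \sim N(0, \sigma^2)$ as a basis for inference (e.g. t-tests on parameters).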
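As a minimal sketch of the review above, the simplest case of model (1), with a single explanatory variable, can be fitted by least squares in closed form. The function names and the toy data are illustrative assumptions, not part of the course material.

```python
def fit_simple_linear_model(x, y):
    """Least-squares estimates (b0, b1) for y_i = b0 + b1 * x_i + e_i."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residual_variance(x, y, b0, b1):
    """Unbiased estimate of sigma^2 from the residuals (n - 2 degrees of freedom)."""
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return sum(r ** 2 for r in residuals) / (len(x) - 2)


# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = fit_simple_linear_model(x, y)
print(b0, b1, residual_variance(x, y, b0, b1))
```

The estimate of $\sigma^2$ is what the t-tests on the parameters would be built on under the normality assumption.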

Restrictions of Linear Models
Although a very useful framework, there are some situations where general linear models are not appropriate:
- the range of $Y$ is restricted (e.g. binary, count)
- the variance of $Y$ depends on the mean
Generalized linear models extend the general linear model framework to address both of these issues.

Generalized Linear Model (GLM)
A generalized linear model is made up of a linear predictor
$$\eta_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_p x_{pi} \tag{4}$$
and two functions:
- a link function $g$ that describes how the mean, $E(Y_i) = \mu_i$, depends on the linear predictor: $g(\mu_i) = \eta_i$
- a variance function $V$ that describes how the variance depends on the mean: $Var(Y_i) = \phi V(\mu_i)$, where the dispersion parameter $\phi$ is a constant

Components of GLM
Response $Y_i$ and independent variables $X_i = (x_{1i}, \dots, x_{pi})$ for $i = 1, \dots, n$
1. Random Component: $Y_i$, $1 \le i \le n$, are independent with density from an exponential family distribution, i.e.
$$f(y; \theta, \phi) = \exp\left[\frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)\right] \tag{5}$$
where $\phi$ is a dispersion parameter and the functions $b(\cdot)$, $a(\cdot)$ and $c(\cdot)$ are known.
2. Systematic Component: the linear predictor
$$\eta_i(\beta) = x_i^T \beta = \beta_0 + \beta_1 x_{1i} + \dots + \beta_p x_{pi} \tag{6}$$
with regression parameters $\beta = (\beta_0, \dots, \beta_p)$.
3. Parametric Link Component: The link function $g(\mu_i) = \eta_i = x_i^T \beta$ connects the linear predictor with the mean $\mu_i$ of $y_i$. The link is called canonical if $\theta_i = \eta_i$.
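The three components can be sketched in code as follows. The `Family` container and its field names are hypothetical, introduced only for illustration; the link and variance functions for the Binomial and Poisson cases are the ones given later in these notes.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Family:
    """Hypothetical container for the link and variance function of a GLM."""
    name: str
    link: Callable[[float], float]          # g: mean -> linear predictor
    inverse_link: Callable[[float], float]  # g^{-1}: linear predictor -> mean
    variance: Callable[[float], float]      # V(mu)


GAUSSIAN = Family("gaussian", lambda mu: mu, lambda eta: eta, lambda mu: 1.0)
POISSON = Family("poisson", math.log, math.exp, lambda mu: mu)
BINOMIAL = Family(
    "binomial",
    lambda mu: math.log(mu / (1.0 - mu)),
    lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    lambda mu: mu * (1.0 - mu),
)


def linear_predictor(beta: Sequence[float], x: Sequence[float]) -> float:
    """Systematic component: eta = beta_0 + beta_1 * x_1 + ... + beta_p * x_p."""
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))


def fitted_mean(family: Family, beta: Sequence[float], x: Sequence[float]) -> float:
    """Link component: mu = g^{-1}(eta)."""
    return family.inverse_link(linear_predictor(beta, x))


# Hypothetical coefficients and covariate value.
for family in (GAUSSIAN, POISSON, BINOMIAL):
    mu = fitted_mean(family, beta=[0.5, -1.0], x=[0.8])
    print(family.name, mu, family.variance(mu))
```

The same linear predictor gives a different mean under each family, because only the inverse link changes.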
Exponential Family
Most of the commonly used statistical distributions, e.g. Normal, Binomial and Poisson, are members of the exponential family of distributions, whose densities can be written in the form
$$f(y; \theta, \phi) = \exp\left[\frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)\right] \tag{7}$$
where $\phi$ is the dispersion parameter and $\theta$ is the canonical or natural parameter.
Often $a(\phi) = 1$ and $c(y_i, \phi) = c(y_i)$, giving the natural exponential family of the form
$$f(y; \theta) = h(y) \exp[y\theta - b(\theta)] \tag{8}$$
with $h(y) = \exp[c(y)]$.

It can be shown that the mean and variance in a GLM can be represented as
$$\mu_i = E(Y_i) = b'(\theta_i) \tag{9}$$
$$Var(Y_i) = a(\phi)\, b''(\theta_i) \tag{10}$$
$V(\theta) := b''(\theta)$ is called the variance function of the GLM.
Prove: ....
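One standard way to obtain (9) and (10), sketched here under the usual regularity conditions (an assumption; the notes leave the proof open), uses the likelihood identities $E(\partial L_i/\partial\theta_i) = 0$ and $-E(\partial^2 L_i/\partial\theta_i^2) = E[(\partial L_i/\partial\theta_i)^2]$, where $L_i = \log f(y_i; \theta_i, \phi)$:

```latex
\begin{align*}
L_i &= \frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi), \\
\frac{\partial L_i}{\partial \theta_i} &= \frac{y_i - b'(\theta_i)}{a(\phi)}, \qquad
\frac{\partial^2 L_i}{\partial \theta_i^2} = -\frac{b''(\theta_i)}{a(\phi)}, \\
E\left(\frac{\partial L_i}{\partial \theta_i}\right) = 0
  &\;\Longrightarrow\; E(Y_i) = b'(\theta_i), \\
\frac{b''(\theta_i)}{a(\phi)}
  = E\left[\left(\frac{Y_i - b'(\theta_i)}{a(\phi)}\right)^2\right]
  = \frac{Var(Y_i)}{a(\phi)^2}
  &\;\Longrightarrow\; Var(Y_i) = a(\phi)\, b''(\theta_i).
\end{align*}
```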

Canonical Link Function
For a GLM where the response follows an exponential family distribution we have
$$\eta_i = g(\mu_i) = g(b'(\theta_i)) = \beta_0 + \beta_1 x_{1i} + \dots + \beta_p x_{pi} \tag{11}$$
The link function of a GLM connects the random component and the linear predictor. That is, a GLM states that a linear predictor $\eta_i = \sum_{j=1}^p \beta_j x_{ij}$ is related to $\mu_i$ by $\eta_i = g(\mu_i)$, for a link function $g$.
The link function $g$ that transforms the mean $\mu_i$ to the natural parameter $\theta_i$ is called the canonical link:
$$g = (b')^{-1} \implies g(\mu_i) = \theta_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_p x_{pi} \tag{12}$$
This direct relationship equates the natural parameter to the linear predictor.
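The relation $g = (b')^{-1}$ can be checked numerically. The sketch below assumes the standard cumulant functions $b(\theta) = e^\theta$ (Poisson) and $b(\theta) = \log(1 + e^\theta)$ (Bernoulli), differentiates them by finite differences, and verifies that applying the log or logit link to $b'(\theta)$ recovers $\theta$. All function names are illustrative.

```python
import math


def derivative(f, t, h=1e-6):
    """Central finite-difference approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2.0 * h)


def b_poisson(theta):
    return math.exp(theta)


def b_bernoulli(theta):
    return math.log(1.0 + math.exp(theta))


def log_link(mu):
    return math.log(mu)


def logit_link(mu):
    return math.log(mu / (1.0 - mu))


def canonical_link_error(b, link, thetas):
    """Largest |g(b'(theta)) - theta| over a grid; near 0 if g = (b')^{-1}."""
    return max(abs(link(derivative(b, t)) - t) for t in thetas)


grid = [-2.0, -0.5, 0.0, 0.7, 1.5]
print(canonical_link_error(b_poisson, log_link, grid))
print(canonical_link_error(b_bernoulli, logit_link, grid))
```

Both printed errors should be at the level of the finite-difference approximation, whereas a non-canonical pairing (for example the identity link with the Poisson $b$) gives a clearly nonzero value.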
Normal General Linear Model as a Special Case
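- For the general linear model with $\epsilon_i \sim N(0, \sigma^2)$ we have the linear predictor
$$\eta_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_p x_{pi} \tag{13}$$
- the link function $g(\mu_i) = \mu_i$
- the variance function $V(\mu_i) = 1$
Proof:
$$Y_i = x_i^T \beta + \epsilon_i = \mu_i + \epsilon_i, \quad \epsilon_i \sim N(0, \sigma^2) \text{ iid}, \quad i = 1, \dots, n \tag{14}$$
The density of $Y_i$ has exponential family form since
$$f(y_i; \mu_i, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{1}{2\sigma^2}(y_i - \mu_i)^2\right\} \tag{15}$$
$$= \exp\left\{\frac{y_i \mu_i - \mu_i^2/2}{\sigma^2} - \frac{1}{2}\left[\ln(2\pi\sigma^2) + \frac{y_i^2}{\sigma^2}\right]\right\} \tag{16}$$
This implies, for $\theta_i = \mu_i$ and $\phi = \sigma^2$,
$$b(\theta_i) = \frac{\mu_i^2}{2} = \frac{\theta_i^2}{2}, \quad a(\phi) = \sigma^2, \quad c(y_i, \phi) = -\frac{1}{2}\left[\ln(2\pi\phi) + \frac{y_i^2}{\phi}\right] \tag{17}$$
Since $\theta_i = \mu_i$, the canonical link is the identity, i.e. $g(\mu_i) = \mu_i$, and $b''(\theta_i) = 1$ gives $V(\mu_i) = 1$.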

Some canonical link functions:
- log link for the Poisson distribution
- logit link for the Binomial distribution
Proof:
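A sketch of how these two canonical links arise, obtained by writing each probability function in the exponential family form (5). The binomial case is written for a count $Y \sim \text{Binomial}(n, p)$ with $n$ known, which is an assumption about the parameterization intended here.

```latex
\begin{align*}
\text{Poisson: } f(y; \lambda) &= \frac{e^{-\lambda}\lambda^{y}}{y!}
  = \exp\left[y\log\lambda - \lambda - \log y!\right], \\
&\theta = \log\lambda,\quad b(\theta) = e^{\theta},\quad a(\phi) = 1,\quad
  c(y, \phi) = -\log y!, \\
&\mu = b'(\theta) = e^{\theta} \;\Longrightarrow\; g(\mu) = \theta = \log\mu. \\[4pt]
\text{Binomial: } f(y; p) &= \binom{n}{y} p^{y}(1-p)^{n-y}
  = \exp\left[y\log\frac{p}{1-p} + n\log(1-p) + \log\binom{n}{y}\right], \\
&\theta = \log\frac{p}{1-p},\quad b(\theta) = n\log\left(1 + e^{\theta}\right),\quad
  a(\phi) = 1,\quad c(y, \phi) = \log\binom{n}{y}, \\
&\mu = b'(\theta) = \frac{n e^{\theta}}{1 + e^{\theta}} = np
  \;\Longrightarrow\; \theta = \log\frac{\mu/n}{1 - \mu/n} = \operatorname{logit}(p).
\end{align*}
```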

Modelling Binomial Data
- Suppose $Y_i \sim \text{Binomial}(n_i, p_i)$ and we wish to model the proportions $Y_i/n_i$; then
$$E(Y_i/n_i) = p_i, \quad Var(Y_i/n_i) = \frac{1}{n_i}\, p_i (1 - p_i) \tag{18}$$
- The variance function is $V(\mu_i) = \mu_i (1 - \mu_i)$.
- The link function must map from $(0, 1) \to (-\infty, \infty)$.
- A common choice is
$$g(\mu_i) = \text{logit}(\mu_i) = \log\left(\frac{\mu_i}{1 - \mu_i}\right) \tag{19}$$
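A small sketch of the logit link and its inverse, showing that any real linear predictor maps back to a valid probability, together with the variance of a proportion from (18). The function names and the value $n = 50$ are illustrative assumptions.

```python
import math


def logit(mu):
    """Map a probability in (0, 1) to the whole real line."""
    return math.log(mu / (1.0 - mu))


def inverse_logit(eta):
    """Map any real linear predictor back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def proportion_variance(p, n):
    """Var(Y / n) for Y ~ Binomial(n, p), i.e. V(mu) / n with V(mu) = mu * (1 - mu)."""
    return p * (1.0 - p) / n


for eta in (-4.0, 0.0, 4.0):
    p = inverse_logit(eta)
    print(eta, p, logit(p), proportion_variance(p, n=50))
```

The variance is largest at $p = 0.5$ and shrinks towards the boundaries, which is exactly the mean-variance dependence a general linear model cannot represent.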

Modelling Poisson Data
- Suppose $Y_i \sim \text{Poisson}(\lambda_i)$; then
$$E(Y_i) = \lambda_i, \quad Var(Y_i) = \lambda_i \tag{20}$$
- The variance function is $V(\mu_i) = \mu_i$.
- The link function must map from $(0, \infty) \to (-\infty, \infty)$.
- A natural choice is
$$g(\mu_i) = \log(\mu_i) \tag{21}$$
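The equality of mean and variance in (20) can be checked directly from the Poisson probability function, and the inverse of the log link shows why every real linear predictor yields a valid positive mean. This is a minimal sketch; the truncation point `y_max` is an assumption that is adequate only for moderate means.

```python
import math


def poisson_pmf(y, lam):
    """P(Y = y) for Y ~ Poisson(lam), computed on the log scale for stability."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))


def poisson_mean_and_variance(lam, y_max=200):
    """Mean and variance computed from the pmf, truncating the sums at y_max."""
    mean = sum(y * poisson_pmf(y, lam) for y in range(y_max + 1))
    second_moment = sum(y * y * poisson_pmf(y, lam) for y in range(y_max + 1))
    return mean, second_moment - mean ** 2


def poisson_mean_from_predictor(eta):
    """Inverse of the log link: mu = exp(eta) is positive for every real eta."""
    return math.exp(eta)


for eta in (-1.0, 0.0, 2.0):
    mu = poisson_mean_from_predictor(eta)
    print(eta, mu, poisson_mean_and_variance(mu))
```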

Exercise 1
Prove that the following distributions belong to the exponential family of distributions:
1. Binomial Distribution
2. Poisson Distribution
3. Normal Distribution
4. Gamma Distribution
5. Negative Binomial Distribution

Exercise 2
Data are generated from the exponential distribution with density $f(y) = \lambda \exp(-\lambda y)$, where $\lambda, y > 0$. The distribution is a member of the exponential family.
1. Identify the specific form of $\theta$, $\phi$, $a(\cdot)$, $b(\cdot)$ and $c(\cdot)$ for the exponential distribution.
2. What are the canonical link and variance function for a GLM with a response following the exponential distribution?
3. Identify a practical difficulty that may arise when using the canonical link in this instance.

Transformation vs. GLM
In some situations a response variable can be transformed to improve linearity and homogeneity of variance so that a general linear model can be applied. This approach has some drawbacks:
- the response variable has changed!
- the transformation must simultaneously improve linearity and homogeneity of variance
- the transformation may not be defined on the boundaries of the sample space
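Two of these drawbacks can be illustrated with a log transformation of count data. The sketch below uses hypothetical counts that include a zero: the transform fails on the boundary value, and even on the positive values the mean of the logs differs from the log of the mean, so a model for $E[\log Y]$ is not a model for $\log E[Y]$.

```python
import math

# Hypothetical count data that includes a zero.
counts = [0, 1, 1, 2, 4, 7]


def try_log_transform(values):
    """Return the log-transformed values, or None if any value is on the boundary."""
    try:
        return [math.log(v) for v in values]
    except ValueError:
        return None


def mean(values):
    return sum(values) / len(values)


positive_counts = [v for v in counts if v > 0]
mean_of_logs = mean([math.log(v) for v in positive_counts])
log_of_mean = math.log(mean(positive_counts))

print(try_log_transform(counts))   # the transform is undefined at y = 0
print(mean_of_logs, log_of_mean)   # the two quantities differ
```

A GLM with a log link avoids both problems because the link is applied to the mean $\mu_i$, not to the observations themselves.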

Likelihood and Asymptotic Distributions for GLMs
For $n$ independent observations, the log-likelihood for the sample $y_1, \dots, y_n$ is
$$L(\beta) = \sum_{i=1}^n L_i = \sum_{i=1}^n \log f(y_i; \theta_i, \phi) = \sum_{i=1}^n \frac{y_i \theta_i - b(\theta_i)}{a(\phi)} + \sum_{i=1}^n c(y_i, \phi) \tag{22}$$
The notation $L(\beta)$ reflects the dependence of $\theta$ on the model parameters $\beta$.
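As a concrete instance of (22), the sketch below evaluates the log-likelihood of a Poisson GLM with the canonical log link, for which $\theta_i = \eta_i$, $b(\theta) = e^\theta$, $a(\phi) = 1$ and $c(y, \phi) = -\log y!$. The data and the candidate values of $\beta$ are hypothetical.

```python
import math


def poisson_log_likelihood(beta, xs, ys):
    """L(beta) = sum_i [y_i * theta_i - exp(theta_i) - log(y_i!)] with theta_i = eta_i."""
    total = 0.0
    for x, y in zip(xs, ys):
        theta = sum(b * xj for b, xj in zip(beta, x))  # canonical link: theta_i = eta_i
        total += y * theta - math.exp(theta) - math.lgamma(y + 1)
    return total


# Hypothetical data; each row of xs starts with 1 for the intercept.
xs = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
ys = [1, 2, 4, 9]
print(poisson_log_likelihood((0.0, 0.7), xs, ys))
print(poisson_log_likelihood((0.0, 0.0), xs, ys))
```

The dependence on $\beta$ enters only through $\theta_i$, which is what the notation $L(\beta)$ expresses.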

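Likelihood Equations for a GLM
- For a GLM $\eta_i = \sum_{j=1}^p \beta_j x_{ij} = g(\mu_i)$ with link function $g$, the likelihood equations are
$$\frac{\partial L(\beta)}{\partial \beta_j} = \sum_{i=1}^n \frac{\partial L_i}{\partial \beta_j} = 0, \quad \forall j \tag{23}$$
- To differentiate the log-likelihood (22), we use the chain rule:
$$\frac{\partial L_i}{\partial \beta_j} = \frac{\partial L_i}{\partial \theta_i} \cdot \frac{\partial \theta_i}{\partial \mu_i} \cdot \frac{\partial \mu_i}{\partial \eta_i} \cdot \frac{\partial \eta_i}{\partial \beta_j} \tag{24}$$
- Since $\partial L_i / \partial \theta_i = [y_i - b'(\theta_i)]/a(\phi)$, and since $\mu_i = b'(\theta_i)$ and $Var(y_i) = b''(\theta_i)\, a(\phi)$, then
$$\frac{\partial L_i}{\partial \theta_i} = \frac{y_i - \mu_i}{a(\phi)}, \quad \frac{\partial \mu_i}{\partial \theta_i} = b''(\theta_i) = \frac{Var(y_i)}{a(\phi)} \tag{25}$$
- We also know that $\eta_i = \sum_{j=1}^p \beta_j x_{ij}$, so $\partial \eta_i / \partial \beta_j = x_{ij}$.
Finally, since $\eta_i = g(\mu_i)$, $\partial \mu_i / \partial \eta_i$ depends on the link function for the model. Then
$$\frac{\partial L_i}{\partial \beta_j} = \frac{\partial L_i}{\partial \theta_i} \cdot \frac{\partial \theta_i}{\partial \mu_i} \cdot \frac{\partial \mu_i}{\partial \eta_i} \cdot \frac{\partial \eta_i}{\partial \beta_j} \tag{26}$$
$$= \frac{y_i - \mu_i}{a(\phi)} \cdot \frac{a(\phi)}{Var(y_i)} \cdot \frac{\partial \mu_i}{\partial \eta_i} \cdot x_{ij} = \frac{(y_i - \mu_i)\, x_{ij}}{Var(y_i)} \cdot \frac{\partial \mu_i}{\partial \eta_i} \tag{27}$$
Summing over the $n$ observations yields the likelihood equations for a GLM:
$$\frac{\partial L(\beta)}{\partial \beta_j} = \sum_{i=1}^n \frac{(y_i - \mu_i)\, x_{ij}}{Var(y_i)} \cdot \frac{\partial \mu_i}{\partial \eta_i} = 0, \quad j = 1, 2, \dots, p \tag{28}$$
where $\eta_i = \sum_{j=1}^p \beta_j x_{ij} = g(\mu_i)$ for link function $g$.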
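The left-hand side of (28) can be checked against a numerical derivative of the log-likelihood. The sketch below does this for a Poisson response with log link, where $\partial\mu_i/\partial\eta_i = \mu_i$ and $Var(y_i) = \mu_i$; the data, the evaluation point and the function names are hypothetical.

```python
import math


def poisson_log_likelihood(beta, xs, ys):
    """Log-likelihood of a Poisson GLM with log link."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = sum(b * xj for b, xj in zip(beta, x))
        total += y * eta - math.exp(eta) - math.lgamma(y + 1)
    return total


def poisson_score(beta, xs, ys):
    """Left-hand side of (28): sum_i (y_i - mu_i) x_ij / Var(y_i) * d mu_i / d eta_i."""
    score = [0.0] * len(beta)
    for x, y in zip(xs, ys):
        mu = math.exp(sum(b * xj for b, xj in zip(beta, x)))
        dmu_deta = mu  # log link: mu = exp(eta)
        var_y = mu     # Poisson: Var(y) = mu
        for j, xj in enumerate(x):
            score[j] += (y - mu) * xj / var_y * dmu_deta
    return score


def numerical_score(beta, xs, ys, h=1e-6):
    """Central finite differences of the log-likelihood, one coordinate at a time."""
    grads = []
    for j in range(len(beta)):
        up = list(beta)
        down = list(beta)
        up[j] += h
        down[j] -= h
        diff = poisson_log_likelihood(up, xs, ys) - poisson_log_likelihood(down, xs, ys)
        grads.append(diff / (2.0 * h))
    return grads


# Hypothetical data; each row of xs starts with 1 for the intercept.
xs = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
ys = [1, 2, 4, 9]
beta = (0.2, 0.5)
print(poisson_score(beta, xs, ys))
print(numerical_score(beta, xs, ys))
```

For this canonical link the factors $\partial\mu_i/\partial\eta_i$ and $Var(y_i)$ cancel, so each equation reduces to $\sum_i (y_i - \mu_i) x_{ij} = 0$.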

- Let $V$ denote the diagonal matrix of variances of the observations.
- Let $D$ denote the diagonal matrix with elements $\partial \mu_i / \partial \eta_i$.
- For the GLM $\eta = X\beta$, these likelihood equations have the form
$$X^T D V^{-1} (y - \mu) = 0 \tag{29}$$
- Different link functions yield different sets of equations.
- The likelihood equations are nonlinear functions of $\beta$ that must be solved iteratively.
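One common iterative scheme is Fisher scoring, written as iteratively reweighted least squares (IRLS). The notes do not say which algorithm the course uses, so the sketch below is only an illustration for a Poisson response with log link; the starting values $\mu_i = y_i + 0.5$, the tolerance and the toy data are assumptions.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_poisson_glm(xs, ys, max_iter=25, tol=1e-10):
    """IRLS for a Poisson GLM with log link; returns the estimate of beta."""
    p = len(xs[0])
    mus = [y + 0.5 for y in ys]  # starting values (an assumption)
    etas = [math.log(mu) for mu in mus]
    beta = [0.0] * p
    for _ in range(max_iter):
        # Working response z_i = eta_i + (y_i - mu_i) * d eta_i / d mu_i, with d eta / d mu = 1 / mu.
        zs = [eta + (y - mu) / mu for eta, y, mu in zip(etas, ys, mus)]
        # Weights w_i = (d mu_i / d eta_i)^2 / Var(y_i) = mu_i for this model.
        xtwx = [[sum(mu * x[j] * x[k] for x, mu in zip(xs, mus)) for k in range(p)]
                for j in range(p)]
        xtwz = [sum(mu * x[j] * z for x, mu, z in zip(xs, mus, zs)) for j in range(p)]
        new_beta = solve(xtwx, xtwz)
        change = max(abs(nb - b) for nb, b in zip(new_beta, beta))
        beta = new_beta
        etas = [sum(b * xj for b, xj in zip(beta, x)) for x in xs]
        mus = [math.exp(eta) for eta in etas]
        if change < tol:
            break
    return beta


# Hypothetical data; each row of xs starts with 1 for the intercept.
xs = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
ys = [1, 2, 4, 9]
print(fit_poisson_glm(xs, ys))
```

At convergence the fitted means satisfy (29); for an intercept-only model the solution is simply $\hat\beta_0 = \log \bar{y}$.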

Asymptotic Distribution of $\hat{\beta}$ for the GLM $\eta = X\beta$
- $\hat{\beta}$ has an approximate $N[\beta, (X^T W X)^{-1}]$ distribution, where $W$ is the diagonal matrix with elements $w_i = (\partial \mu_i / \partial \eta_i)^2 / Var(y_i)$.
- The asymptotic covariance matrix is estimated by
$$\widehat{Var}(\hat{\beta}) = (X^T \hat{W} X)^{-1} \tag{30}$$
where $\hat{W}$ is $W$ evaluated at $\hat{\beta}$.
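A minimal sketch of (30) for a Poisson GLM with log link, where $w_i = \mu_i^2/\mu_i = \mu_i$. The helper names and the data are hypothetical; the example uses an intercept-only model, for which the maximum likelihood estimate is $\hat\beta_0 = \log\bar{y}$.

```python
import math


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * pv for v, pv in zip(m[r], m[col])]
    return [row[n:] for row in m]


def poisson_covariance(beta_hat, xs):
    """Estimated Var(beta_hat) = (X^T W X)^{-1} with w_i = mu_i evaluated at beta_hat."""
    p = len(beta_hat)
    mus = [math.exp(sum(b * xj for b, xj in zip(beta_hat, x))) for x in xs]
    xtwx = [[sum(mu * x[j] * x[k] for x, mu in zip(xs, mus)) for k in range(p)]
            for j in range(p)]
    return invert(xtwx)


def standard_errors(cov):
    """Square roots of the diagonal of the covariance matrix."""
    return [math.sqrt(cov[j][j]) for j in range(len(cov))]


# Hypothetical counts with an intercept-only design.
ys = [2, 4, 6]
xs = [(1.0,)] * len(ys)
beta_hat = [math.log(sum(ys) / len(ys))]
cov = poisson_covariance(beta_hat, xs)
print(beta_hat, cov, standard_errors(cov))
```

In this simple case the estimated variance is $1/(n\bar{y})$, so the standard error shrinks as either the number of observations or the mean count grows.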

Exercise 3
Consider a study intended to investigate racial discrimination in the calling of fouls by referees in the NBA. $n$ black referees and $n$ white referees were randomly selected ($n > 0$). For each referee, $k$ foul calls were randomly selected, and the numbers of calls given to black players and to white players were counted. Therefore, we have the dataset $(Y_i, X_i)$, $i = 1, \dots, n$, where $Y_i$ is the number of foul calls given to black players by the $i$th referee, and $X_i$ indicates whether the $i$th referee is black.
1. Construct a GLM to serve the study goal:
   - Specify $\theta_i$, $\phi$, $b(\theta_i)$, $a$ and $c(y_i)$, which define the exponential family distribution of $Y_i$
   - State the canonical link function and variance function
2. Using the canonical link function, suppose we have $\hat{\theta}_i = 1.2 - 0.5 X_i$ estimated from the data. Interpret the result.

Exercise 4
- You have studied how many emails 20 smartphone users (Device A and Device B) sent from their mobile devices during the period of the study.
- You have the average number of emails per day for each participant; Device and DataPlan are treated as categorical variables.
- Construct a GLM for testing the hypothesis that the number of emails sent from mobile devices is influenced by the type of device and by whether the participant had an unlimited data plan.

