

!"#$%&'()%**+%,-.)/012'34$/05
67$4-(%89/(70:5%;%*)-(')7<-=>
2-.-7:4)-?%"-'$)%8@A'(-$
B*2"8C%D%E%27:/(/A$%F<-(<7-.
@61A*B61046#)8*+0:610"=,)6=/)CB9#*B*=1610"=)"D)C1*+610:*)!0110=E
F*4A=0G7*,)$01A)<"B971610"=6#)H0B7#610"=,I

Andrew Rothman

3A"1")>K)@"A6BB6/)J6AB6=0)"=)O=,9#6,A

!"#$%&'()*+,-#%,-#.*/01%/0*,
Generalized Linear Models (GLMs) play a critical role in fields including
Statistics, Data Science, Machine Learning, and other computational
sciences.

In Part I of this series, we provided a thorough mathematical overview (with proofs) of common GLMs in both Canonical and Non-Canonical forms. The next problem to tackle is: how do we actually fit data to GLM models?

When we look at GLMs in their historical context, there are three important data-fitting procedures that are closely connected:

Newton-Raphson

Fisher Scoring

Iteratively Reweighted Least Squares (IRLS)

I have found that the relationships and motivations of these techniques are often poorly understood, with the terms above sometimes used interchangeably in an incorrect manner. This piece provides a rigorous overview of these
three important iterative numerical fitting procedures, a discussion of the
history connecting them, and a detailed computational simulation
implementing these methods on example Canonical and Non-Canonical
GLMs. In a future piece we will cover derivations of Neural Networks
through the lens of multi-stage recursive GLM models.

The Table of Contents for this piece is as follows:

[Image by Author]

With that said, let’s jump in!

2"#34)01%/0*,#*5#6/4)%/014#7+84)0&%9#:0//0,(
;4&<,0=+4>
2?!"#:0//0,(#%#.*-49#/*#3%/%#@#%,#6,/)*-+&/0*,
Let’s first lay the groundwork for how we can think about “fitting a model to data”, and what we mean by this mathematically.

[Images by Author]
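
As a minimal text restatement of this set-up (assuming the standard maximum-likelihood framing used throughout this series): given observations (y_1, x_1), …, (y_n, x_n) and a model indexed by a parameter vector β, fitting the model to data means choosing the β that maximizes the log-likelihood,

\hat{\beta} = \arg\max_{\beta} \, \ell(\beta) = \arg\max_{\beta} \sum_{i=1}^{n} \log f(y_i \mid x_i, \beta),

which in practice means solving the score equations ∇ℓ(β) = 0. For most GLMs these equations have no closed-form solution, hence the need for iterative numerical procedures.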

Before constructing our first iterative numerical fitting procedure, we first need to take a detour through the Taylor Expansion.

2?2"#;%A9*)#BCD%,>0*,#%,-#2,-#E)-4)#FDD)*C08%/0*,>
Before constructing our first iterative numerical fitting procedure, we first
need to understand the basics of the Taylor Expansion.

[Images by Author]
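
For reference, the second-order Taylor approximation of a scalar function f about a point x_0 is

f(x) \approx f(x_0) + f'(x_0)(x - x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2,

and in the multivariate case the first and second derivatives become the gradient and the Hessian:

f(x) \approx f(x_0) + \nabla f(x_0)^\top (x - x_0) + \frac{1}{2}(x - x_0)^\top H(x_0)(x - x_0).

It is this quadratic approximation, applied to the log-likelihood, that drives all three fitting procedures below.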

We are now ready to construct our three iterative numerical fitting procedures, starting with Newton-Raphson.

2?G"#;<)44#6/4)%/014#7+84)0&%9#:0//0,(#H)*&4-+)4>
2.3.1: Newton-Raphson

[Images by Author]
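
The full derivation is in the images above; the resulting update rule is the standard one. Maximizing the second-order Taylor approximation of the log-likelihood ℓ(β) around the current iterate gives the Newton-Raphson step

\beta^{(t+1)} = \beta^{(t)} - \big[ H\big(\beta^{(t)}\big) \big]^{-1} \nabla \ell\big(\beta^{(t)}\big),

where ∇ℓ is the score (gradient) and H is the observed Hessian of the log-likelihood; we iterate until β (or ℓ) stops changing beyond some tolerance.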

We are now ready to discuss our second iterative numerical fitting procedure, Fisher Scoring, and its connections with Newton-Raphson.

2.3.2: Fisher Scoring

Applying Newton-Raphson to our example from section 2.1, we have:

[Image by Author]

Is there a means to ease this computational burden? This is where Fisher Scoring comes into play.

[Images by Author]
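
Again, the derivation is in the images above, but the standard form of the update is worth having in text. Fisher Scoring replaces the observed Hessian with its expectation, via the Fisher information I(β) = -E[H(β)], giving

\beta^{(t+1)} = \beta^{(t)} + \big[ \mathcal{I}\big(\beta^{(t)}\big) \big]^{-1} \nabla \ell\big(\beta^{(t)}\big).

The expected information is typically simpler and cheaper to evaluate than the observed Hessian, and it is guaranteed to be positive semi-definite, which is precisely how it eases the computational burden described above.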

Later in this piece we will see that in the case of GLMs that can be
parameterized in the canonical form, Newton-Raphson and Fisher Scoring
are mathematically equivalent.

Having discussed Newton-Raphson and Fisher Scoring, we’re ready to turn to our last iterative numerical fitting procedure, Iteratively Reweighted Least Squares (IRLS).

2.3.3: Iteratively Reweighted Least Squares (IRLS)

To understand our last iterative numerical fitting procedure, Iteratively Reweighted Least Squares (IRLS), and its relation to Fisher Scoring, we need a quick refresher on the Weighted Least Squares (WLS) estimator.

[Image by Author]
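
As a one-line reminder of the result shown above: for the linear model y = Xβ + ε with a known (diagonal) weight matrix W, the Weighted Least Squares estimator is

\hat{\beta}_{WLS} = \big( X^\top W X \big)^{-1} X^\top W y.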

(For more in-depth derivations of the Weighted Least Squares (WLS), Generalized Least Squares (GLS), and Ordinary Least Squares (OLS) estimators, see my previous pieces.)

[Image by Author]

You may be asking…

“well… that’s nice… but what does this have to do with GLMs? Why would I
want my Fisher Scoring algorithm to ‘sort of’ look like the WLS estimator
anyway? What is the point of this???”

To appreciate the connection between Fisher Scoring and WLS, we need some historical context on GLMs.

If you recall from Part I of this series, GLMs were developed as a unifying theory in the early 1970s. With this theory developed, practitioners wanted a means of fitting their data to GLM models with computational software they could run on a computer. Well, the 1970s were still the relatively early days of computing, in particular scientific computing. There weren’t linear algebra and numerical libraries at one’s fingertips (e.g., NumPy with Python). Instead, practitioners had to write all of their own libraries from scratch. And to say computers at the time had little RAM and hard drive space is an understatement; compared to today, the memory in 1970s-era computers was laughably small. Software had to be written with care, and had to be highly optimized with respect to memory. Writing software back then, particularly scientific computing software, was difficult. Very, very difficult.

However, prior to the publication of GLM theory, researchers and practitioners in this space had already written computer software to fit the Weighted Least Squares estimator to data. In other words, they had already written software to recover an empirical estimate of the following estimator:

[Image by Author]

So, with the establishment of GLM theory and the need for software to fit
data to GLMs using Fisher Scoring, practitioners had a thought:

“You know… part of the terms in our Fisher Scoring algorithm look a lot like
the WLS estimator. And we already wrote software that solves for the WLS
estimator, and it seems to work quite well. So… what if instead of writing
Fisher Scoring software completely from scratch, we instead make our Fisher
Scoring software a simple wrapper-function around our WLS software! For
each iterative step of the Fisher Scoring algorithm we can reparametrize our
problem to look like the WLS estimator, and call our WLS software to return
the empirical values.”

In other words, at each iterative step of our Fisher Scoring algorithm, we would like to reparametrize the following:

[Image by Author]

Hence, Iteratively Reweighted Least Squares (IRLS) was born. The term
“reweighted” refers to the fact that at each iterative step of the Fisher
Scoring algorithm, we are using a new updated weight matrix.
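
In the standard presentation, this reparametrization works as follows (stated here as a sketch consistent with the derivation in the image above). At iteration t, form a “working response” z and weight matrix W from the current linear predictor η^{(t)} = Xβ^{(t)} and fitted means μ^{(t)} = g^{-1}(η^{(t)}):

z_i = \eta_i^{(t)} + \big( y_i - \mu_i^{(t)} \big) \frac{\partial \eta}{\partial \mu} \Big|_{\mu_i^{(t)}}, \qquad W_{ii}^{(t)} = \frac{ (\partial \mu / \partial \eta)^2 }{ \operatorname{Var}(y_i) } \Big|_{\beta^{(t)}}.

Then one Fisher Scoring step is exactly one WLS solve,

\beta^{(t+1)} = \big( X^\top W^{(t)} X \big)^{-1} X^\top W^{(t)} z^{(t)},

with W recomputed at every step.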

In section 3, we will show how to operationalize Newton-Raphson, Fisher Scoring, and IRLS for Canonical and Non-Canonical GLMs with computational examples. First, however, a short aside on Quasi-Newton Methods and Gradient Descent.

2?I"#J<*)/#F>0-4#*,#K+%>0L74M/*,#.4/<*->#%,-#N)%-04,/#34>&4,/
Before jumping into the implementation of our three main iterative
numerical fitting procedures for GLMs, I want to make short mention of two
other families of related iterative methods, Quasi-Newton Methods and
Gradient Descent.

[Images by Author]
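
Since the details above are in the images, a one-line summary of each family (standard results, not specific to this piece): Gradient Descent (ascent, for a log-likelihood) takes steps directly along the score with a step size α,

\beta^{(t+1)} = \beta^{(t)} + \alpha \, \nabla \ell\big(\beta^{(t)}\big),

avoiding second derivatives entirely at the cost of slower convergence. Quasi-Newton methods (e.g., BFGS) sit between Gradient Descent and Newton-Raphson: they keep the Newton-style update but build up an approximation to the (inverse) Hessian from successive gradient evaluations instead of computing it exactly.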

G"#ED4)%/0*,%90O4#:0//0,(#H)*&4-+)4>#5*)#NP.>
We are now ready to explore how to operationalize Newton-Raphson,
Fisher Scoring, and IRLS for Canonical and Non-Canonical GLMs. Let’s start
with a GLM refresher:

G?!"#NP.#Q45)4><4)
A Generalized Linear Model (GLM) has three main components:

A random component: the response Y follows a distribution from the exponential family

A systematic component: a linear predictor η = Xβ

A link function g relating the two: g(E[Y | X]) = η
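
To make the pieces above concrete, here is a minimal, self-contained IRLS sketch for a Canonical Poisson GLM (log link) in Python/NumPy. This is an illustrative sketch rather than the simulation code used later in this section, and the function name irls_poisson is purely for exposition; for the canonical link, Newton-Raphson and Fisher Scoring coincide, so this single loop realizes all three procedures at once.

import numpy as np

def irls_poisson(X, y, max_iter=25, tol=1e-8):
    # Minimal IRLS sketch for a canonical Poisson GLM (log link).
    # Illustrative only; for the canonical link, Newton-Raphson,
    # Fisher Scoring, and IRLS produce identical iterates.
    _, p = X.shape
    beta = np.zeros(p)
    for _ in range(max_iter):
        eta = X @ beta                   # linear predictor eta = X beta
        mu = np.exp(eta)                 # inverse link: mu = g^{-1}(eta)
        z = eta + (y - mu) / mu          # working response
        W = mu                           # weights: (dmu/deta)^2 / Var(y) = mu
        XtW = X.T * W                    # X^T W (diagonal W stored as a vector)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)   # one WLS solve per step
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta

# Quick sanity check on simulated data
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
beta_true = np.array([0.5, -0.3])
y = rng.poisson(np.exp(X @ beta_true))
print(irls_poisson(X, y))   # should land close to [0.5, -0.3]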
