GLMs Part II: Newton-Raphson, Fisher Scoring, & Iteratively Reweighted Least Squares (IRLS) - A Rigorous Overview
Mathematical Derivations and Implementation of Iterative Fitting
Techniques with Computational Simulations.
Photo by Mohammad Rahmani on Unsplash
1: Background and Motivation
Generalized Linear Models (GLMs) play a critical role in fields including
Statistics, Data Science, Machine Learning, and other computational
sciences.
When looking at GLMs from a historical context, there are three important
data-fitting procedures which are closely connected:
Newton-Raphson
Fisher Scoring
Iteratively Reweighted Least Squares (IRLS)
2: Derivation of Iterative Numerical Fitting Techniques
2.1: Fitting a Model to Data - an Introduction
Let’s first lay the groundwork for how we can think about “fitting a model
to data”, and what we mean by this mathematically.
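One common way to make this precise, sketched here under the assumption that the model is fit by maximum likelihood: the symbols \ell, \beta, and f below are notation chosen for this illustration and may differ from the notation used elsewhere in this series. Fitting then means choosing the parameter vector that maximizes the log-likelihood of the observed data, and at an interior maximum that estimate solves the score equation.

```latex
\hat{\beta} \;=\; \arg\max_{\beta}\; \ell(\beta),
\qquad
\ell(\beta) \;=\; \sum_{i=1}^{n} \log f(y_i \mid x_i;\, \beta),
\qquad
\nabla_{\beta}\, \ell(\hat{\beta}) \;=\; 0 .
```

When the score equation has no closed-form solution, it has to be solved iteratively, which is what the procedures discussed in this piece do.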
2.2: Taylor Expansion and 2nd Order Approximations
Before constructing our first iterative numerical fitting procedure, we
need to understand the basics of the Taylor Expansion.
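As a brief reference, the standard 2nd order Taylor approximation around a point can be written as follows. The symbols x_0, \beta_0, and H (the Hessian matrix of second partial derivatives) are notation assumed for this sketch.

```latex
% Scalar case: 2nd order approximation of f around x_0
f(x) \;\approx\; f(x_0) + f'(x_0)\,(x - x_0) + \tfrac{1}{2}\, f''(x_0)\,(x - x_0)^2

% Vector case: gradient \nabla f and Hessian H evaluated at \beta_0
f(\beta) \;\approx\; f(\beta_0)
  + \nabla f(\beta_0)^{\top} (\beta - \beta_0)
  + \tfrac{1}{2}\, (\beta - \beta_0)^{\top} H(\beta_0)\, (\beta - \beta_0)

% Setting the gradient of the quadratic approximation to zero
\beta \;=\; \beta_0 - H(\beta_0)^{-1}\, \nabla f(\beta_0)
```

The last line is the stationary point of the quadratic approximation, and it is the step that Newton-type methods repeat until convergence.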
2.3: Three Iterative Numerical Fitting Procedures
2.3.1: Newton-Raphson
Later in this piece we will see that in the case of GLMs that can be
parameterized in the canonical form, Newton-Raphson and Fisher Scoring
are mathematically equivalent.
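To make the Newton-Raphson update concrete, here is a minimal sketch on a deliberately tiny problem: an intercept-only Poisson model with the canonical log link. The function name, data, and tolerances are hypothetical choices for illustration. Each step moves the parameter by minus the inverse Hessian times the score. In this example the Hessian does not depend on the observed counts, so the observed and expected information coincide and the Fisher Scoring step would be identical.

```python
import math


def newton_raphson_poisson_intercept(y, beta0=0.0, tol=1e-10, max_iter=50):
    """Fit log(mu) = beta for Poisson counts y by Newton-Raphson.

    Log-likelihood (up to a constant): sum(y) * beta - n * exp(beta).
    """
    n = len(y)
    total = sum(y)
    beta = beta0
    for _ in range(max_iter):
        mu = math.exp(beta)
        score = total - n * mu      # first derivative of the log-likelihood
        hessian = -n * mu           # second derivative (free of y here)
        step = -score / hessian     # Newton-Raphson step: -H^{-1} * score
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical count data, used only to exercise the sketch.
counts = [2, 3, 0, 4, 1, 5, 2, 3]
beta_hat = newton_raphson_poisson_intercept(counts)
mean_count = sum(counts) / len(counts)
```

For this model the maximum likelihood estimate is known in closed form, namely the log of the sample mean, so `beta_hat` can be compared against `math.log(mean_count)` as a sanity check.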
“well… that’s nice… but what does this have to do with GLMs? Why would I
want my Fisher Scoring algorithm to ‘sort of’ look like the WLS estimator
anyway? What is the point of this???”
If you recall from Part I of this series, GLMs were developed as a unifying
theory in the early 1970s. With this theory developed, practitioners wanted
a means of fitting GLMs to their data with software they could run on a
computer. The 1970s were still the relatively early days of computing, and
of scientific computing in particular. There were no linear algebra and
numerical libraries at one’s fingertips (e.g. NumPy with Python). Instead,
one had to write all of one’s own libraries from scratch. And
to say computers at the time had little RAM and hard drive space is an
understatement; compared to today, the memory in 1970s-era computers
was laughably small. Software had to be written with care, and had to be
highly optimized with respect to memory. Writing software back then,
particularly scientific computing software, was difficult. Very, very difficult.
So, with the establishment of GLM theory and the need for software to fit
data to GLMs using Fisher Scoring, practitioners had a thought:
“You know… part of the terms in our Fisher Scoring algorithm look a lot like
the WLS estimator. And we already wrote software that solves for the WLS
estimator, and it seems to work quite well. So… what if instead of writing
Fisher Scoring software completely from scratch, we instead make our Fisher
Scoring software a simple wrapper-function around our WLS software! For
each iterative step of the Fisher Scoring algorithm we can reparametrize our
problem to look like the WLS estimator, and call our WLS software to return
the empirical values.”
Hence, Iteratively Reweighted Least Squares (IRLS) was born. The term
“reweighted” refers to the fact that at each iterative step of the Fisher
Scoring algorithm, we are using a new updated weight matrix.
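The wrapper idea can be sketched in a few lines. The example below is an illustration under stated assumptions, not the historical software: it fits a Poisson regression with a log link and a single covariate, and the names `wls_fit` and `irls_poisson`, the data, and the tolerances are all hypothetical. At each iteration the current fit is used to build a working response and a weight vector, and the existing WLS routine is simply called again with the updated weights.

```python
import math


def wls_fit(x, z, w):
    """Weighted least squares for z ~ b0 + b1 * x via the 2x2 normal equations."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swz = sum(wi * zi for wi, zi in zip(w, z))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
    det = sw * swxx - swx * swx
    b0 = (swxx * swz - swx * swxz) / det
    b1 = (sw * swxz - swx * swz) / det
    return b0, b1


def irls_poisson(x, y, tol=1e-10, max_iter=100):
    """Poisson regression (log link) fitted by repeated calls to wls_fit."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(max_iter):
        eta = [b0 + b1 * xi for xi in x]     # linear predictor
        mu = [math.exp(e) for e in eta]      # fitted means
        w = mu                               # updated weights for this iteration
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        new_b0, new_b1 = wls_fit(x, z, w)    # reuse the WLS solver unchanged
        change = max(abs(new_b0 - b0), abs(new_b1 - b1))
        b0, b1 = new_b0, new_b1
        if change < tol:
            break
    return b0, b1


# Hypothetical data, used only to exercise the sketch.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1, 2, 2, 4, 6, 9]
b0_hat, b1_hat = irls_poisson(x, y)
```

All of the GLM-specific logic lives in the two lines that compute `w` and `z`; the solver itself never needs to know it is being used for anything other than ordinary weighted least squares.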
2.4: Short Aside on Quasi-Newton Methods and Gradient Descent
Before jumping into the implementation of our three main iterative
numerical fitting procedures for GLMs, I want to make short mention of two
other families of related iterative methods, Quasi-Newton Methods and
Gradient Descent.
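For contrast with the second-order methods, here is a minimal first-order sketch. It applies gradient ascent to the average log-likelihood (equivalently, gradient descent on its negative) for an intercept-only Poisson model with a log link. The function name, the fixed learning rate of 0.1, and the data are assumptions made for this illustration. Only the gradient is used, and no Hessian or information matrix is ever formed.

```python
import math


def gradient_ascent_poisson_intercept(y, learning_rate=0.1, tol=1e-12,
                                      max_iter=10000):
    """Fit log(mu) = beta for Poisson counts y using only first derivatives."""
    mean_y = sum(y) / len(y)
    beta = 0.0
    for iteration in range(1, max_iter + 1):
        # Gradient of the average log-likelihood: mean(y) - exp(beta)
        gradient = mean_y - math.exp(beta)
        step = learning_rate * gradient   # fixed step size, no curvature used
        beta += step
        if abs(step) < tol:
            return beta, iteration
    return beta, max_iter


# Hypothetical count data, used only to exercise the sketch.
counts = [2, 3, 0, 4, 1, 5, 2, 3]
beta_gd, n_iter_gd = gradient_ascent_poisson_intercept(counts)
```

With a fixed step size the iterates approach the same estimate, the log of the sample mean, but they typically need many more iterations than a Newton-type method, since no curvature information is used to scale the step.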
3: Operationalize Fitting Procedures for GLMs
We are now ready to explore how to operationalize Newton-Raphson,
Fisher Scoring, and IRLS for Canonical and Non-Canonical GLMs. Let’s start
with a GLM refresher:
3.1: GLM Refresher
A Generalized Linear Model (GLM) has three main components: