
SPLINES AND GENERALIZED ADDITIVE MODELS (GAMs)

By Seif Abdul
Overview
 Introduction
 Simple approaches
    Polynomials
    Step functions (piecewise)
 Splines
    Regression splines
    Natural splines
    Splines for classification
 Smoothing splines
    Definition
    Computation
    Nonparametric logistic regression
 Generalized additive models (GAMs)
    Principle
    Fitting GAMs
GAM (introduction)
 GAMs are an extension of GLMs
 Developed by Hastie and Tibshirani (1990)
 Useful when the relationship between Y and X is non-linear and we have no theory or mechanistic model to suggest a particular functional form
 A more flexible approach in which Y is linked to each X by a smooth function instead of a single coefficient β
Intro (cont.)
 GAMs are data driven rather than model driven; that is, the fitted values do not come from an a priori model
 All of the error families allowed by GLMs are available with GAMs (binomial, Poisson, gamma, etc.)
GAMs in a nutshell

 A GAM is an extension of a GLM in which linear terms are replaced by smoothing functions.
 Start with the equation for a Gaussian linear model:
y = β0 + β1x1 + ε
 What changes in a GAM is the presence of a smoothing term:
y = β0 + f(x1) + ε
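 As a minimal sketch of that one-term change (R with the mgcv package; y, x1 and dat are hypothetical stand-ins for your own data):

library(mgcv)

# Gaussian linear model: y = b0 + b1*x1 + e
lin_fit <- glm(y ~ x1, family = gaussian, data = dat)

# GAM: the linear term becomes a smooth, y = b0 + f(x1) + e
gam_fit <- gam(y ~ s(x1), family = gaussian, data = dat)

summary(gam_fit)  # edf > 1 for s(x1) suggests a non-linear effect
plot(gam_fit)     # plot the estimated smooth f(x1)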
SIMPLE APPROACHES IN GAMs
 Polynomials
 Step functions (piecewise)
Polynomial regression
 A regression in which the relationship between the independent variable (X) and the dependent variable (Y) is modeled as an nth-degree polynomial
 Fits a polynomial regression model in the powers of a single predictor by the method of linear least squares

Y = b0 + b1X + b2X^2 + ... + bkX^k + ε
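 A minimal sketch in R (x, y and dat are hypothetical; degree 3 is an arbitrary choice):

# Cubic polynomial regression fitted by linear least squares;
# poly() uses orthogonal polynomials by default
fit <- lm(y ~ poly(x, degree = 3), data = dat)
summary(fit)

# raw = TRUE gives coefficients directly on x, x^2, x^3
fit_raw <- lm(y ~ poly(x, degree = 3, raw = TRUE), data = dat)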


(Figures) Polynomial fits of different degrees, e.g. a quadratic (n = 2)
Principles of polynomial regression
 Some general principles are:
 The fitted model is more reliable when it is built on a large number of observations.
 Do not extrapolate beyond the limits of the observed values.
Polynomial regression…

Advantages:
 Provides a good approximation of the relationship between X and Y
 Can fit a broad range of functions

Disadvantages:
 Too sensitive to outliers
Step function (Piecewise)
 A non-linear model is most appropriate when a straight line is not sufficient to model the data.
 Sometimes there is a clear break point demarcating two different linear relationships.
 Piecewise (step function) regression is a form of regression that fits separate models over different ranges of X.
Example 1.
 Suppose you have a random sample of 200 kids and you ask them how old they are and how many minutes they spend talking on the phone. You want to assess the relationship between how much a child talks on the phone and the age of the child (use talk.dta)
twoway (scatter talk age) (lfit talk age)
twoway (scatter talk age) (lfit talk age) (qfit talk age)
Taking into account piecewise regression at age 14
Piecewise Regression in STATA
 There are several ways to perform piecewise regression in Stata. The following are some of them:
First option (by centering age)
generate age14 = age - 14

regress talk age14 if age < 14

      Source |       SS       df       MS              Number of obs =      62
-------------+------------------------------           F(  1,    60) =    3.19
       Model |  175.387138     1  175.387138           Prob > F      =  0.0791
    Residual |  3297.59673    60  54.9599456           R-squared     =  0.0505
-------------+------------------------------           Adj R-squared =  0.0347
       Total |  3472.98387    61  56.9341618           Root MSE      =  7.4135

------------------------------------------------------------------------------
        talk |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       age14 |   .6820981   .3818309     1.79   0.079    -.0816775    1.445874
       _cons |   17.62425   1.752455    10.06   0.000     14.11882    21.12968
------------------------------------------------------------------------------
Cont…
regress talk age14 if age >= 14

      Source |       SS       df       MS              Number of obs =     138
-------------+------------------------------           F(  1,   136) =  144.88
       Model |  11570.8699     1  11570.8699           Prob > F      =  0.0000
    Residual |  10861.5142   136  79.8640747           R-squared     =  0.5158
-------------+------------------------------           Adj R-squared =  0.5123
       Total |  22432.3841   137   163.74003           Root MSE      =  8.9367

------------------------------------------------------------------------------
        talk |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       age14 |   3.629046   .3014985    12.04   0.000     3.032814    4.225277
       _cons |   25.83397   1.626457    15.88   0.000     22.61755    29.05039
------------------------------------------------------------------------------
Combined model, separate slope & intercept

generate age1 = age - 14
replace age1 = 0 if age >= 14
generate age2 = age - 14
replace age2 = 0 if age < 14

generate int1 = 1
replace int1 = 0 if age >= 14
generate int2 = 1
replace int2 = 0 if age < 14
regress talk int1 int2 age1 age2, hascons

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =  210.66
       Model |  45655.2691     3   15218.423           Prob > F      =  0.0000
    Residual |  14159.1109   196  72.2403617           R-squared     =  0.7633
-------------+------------------------------           Adj R-squared =  0.7597
       Total |    59814.38   199  300.574774           Root MSE      =  8.4994

------------------------------------------------------------------------------
        talk |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        int1 |   17.62425   2.009156     8.77   0.000     13.66191    21.58659
        int2 |   25.83397    1.54688    16.70   0.000      22.7833    28.88464
        age1 |   .6820981   .4377618     1.56   0.121    -.1812301    1.545426
        age2 |   3.629046   .2867473    12.66   0.000     3.063539    4.194552
------------------------------------------------------------------------------
Graph of the results
 To test the difference in intercepts:

lincom int2 - int1

 ( 1)  - int1 + int2 = 0

------------------------------------------------------------------------------
        talk |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    8.20972   2.535655     3.24   0.001     3.209051    13.21039
------------------------------------------------------------------------------

So on turning 14, the time a child talks on the phone jumps by 8.2 minutes.
 To test the difference in slopes:

lincom age2 - age1

 ( 1)  - age1 + age2 = 0

------------------------------------------------------------------------------
        talk |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   2.946947   .5233158     5.63   0.000     1.914895       3.979
------------------------------------------------------------------------------

 The slope after age 14 is greater by 2.95 minutes per year, and the difference is statistically significant.
Alternative combined model
regress talk age14 age2 int2

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =  210.66
       Model |  45655.2691     3   15218.423           Prob > F      =  0.0000
    Residual |  14159.1109   196  72.2403617           R-squared     =  0.7633
-------------+------------------------------           Adj R-squared =  0.7597
       Total |    59814.38   199  300.574774           Root MSE      =  8.4994

------------------------------------------------------------------------------
        talk |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       age14 |   .6820981   .4377618     1.56   0.121    -.1812301    1.545426
        age2 |   2.946947   .5233158     5.63   0.000     1.914895       3.979
        int2 |    8.20972   2.535655     3.24   0.001     3.209051    13.21039
       _cons |   17.62425   2.009156     8.77   0.000     13.66191    21.58659
------------------------------------------------------------------------------
Explanation of the output
 age14 is the slope when age is less than 14.
 age2 is the change in the slope as a result of becoming
age 14 or higher (as compared to being less than 14).
 _cons is the predicted mean for someone who is just
infinitely close to being 14 years old (but not quite 14).
 int2 is the predicted mean for someone who just turned
14 years old minus the predicted mean for someone
who is infinitely close to being 14 years old
 The coefficients for age2 and int2 now focus on
the change that results from becoming 14 years old.
SPLINES
 This is the technique of fitting smooth curves through data points.
 The smooth curves are built from pieces joined at points called knots.
 Types of splines
 Regression splines
 Natural splines
Regression splines
 With highly variable data, polynomial curves tend to over-fit.
 A regression spline is best suited to this kind of situation.
 It uses a combination of linear and polynomial functions, joined at knots, to fit the data.
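 A minimal sketch in R (base splines package; x, y and dat are hypothetical, and df = 6 is an arbitrary basis size): a regression spline is just a linear model in a spline basis:

library(splines)

# Cubic regression spline: piecewise cubics joined at knots chosen by bs()
fit_bs <- lm(y ~ bs(x, df = 6), data = dat)

# Plot the fitted curve over a grid of x values
x_grid <- seq(min(dat$x), max(dat$x), length.out = 200)
plot(dat$x, dat$y)
lines(x_grid, predict(fit_bs, newdata = data.frame(x = x_grid)))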
Regression splines graphs
Natural splines
 The behavior of polynomials fit to data tends to be erratic near the boundaries, and extrapolation can be dangerous.
 These problems are even worse with splines.
 A natural cubic spline adds additional constraints, namely that the function is linear beyond the boundary knots.
Natural splines
 A price is paid in bias near the boundaries, but assuming the function is linear near the boundaries (where we have less information anyway) is often considered reasonable.
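 A hedged sketch in R (splines package; same hypothetical x, y and dat as above, with df = 4 an arbitrary choice): ns() builds the natural cubic spline basis with the linear boundary constraints built in:

library(splines)

# Natural cubic spline: constrained to be linear beyond the boundary knots,
# so the fit is better behaved at the edges of the data
fit_ns <- lm(y ~ ns(x, df = 4), data = dat)
summary(fit_ns)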
Smoothing splines
 What do we mean by smoothness?
 Some things are fairly clearly smooth:
 A constant
 A straight line
 In smoothing splines, what we really want to do is eliminate the small "wiggles" in the data.
 Smoothing splines help with interpolation and extrapolation of the data (e.g., the straight line in linear regression).
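 A minimal sketch in base R (hypothetical numeric vectors x and y): smooth.spline() penalizes wiggliness and picks the penalty by cross-validation:

# Smoothing spline: the roughness penalty is chosen by (generalized)
# cross-validation unless df or spar is supplied explicitly
fit_ss <- smooth.spline(x, y)
plot(x, y)
lines(fit_ss)

# Force a smoother fit by fixing the effective degrees of freedom
fit_df5 <- smooth.spline(x, y, df = 5)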
Example of smoothing
GAMs - Principles
 Regression models play an important role in many data analyses, providing prediction and classification rules and data-analytic tools for understanding the importance of different inputs.
 Traditional linear models often fail in some situations: in real life, effects are often not linear.
 More automatic, flexible statistical methods may be used to identify and characterize nonlinear regression effects; these methods are called generalized additive models (GAMs).
 In the regression setting, a generalized additive
model has the form
E(Y | X1, X2, . . . , Xp) = α + f1(X1) + f2(X2) + . . . + fp(Xp)
 As usual X1, X2, . . . , Xp represent the predictors and Y is the outcome. The fj's are unspecified smooth (nonparametric) functions.
GAM for binary classification
 For two-class classification, recall the logistic regression model for binary data discussed previously. We relate the mean of the binary response μ(X) = P(Y = 1|X) to the predictors via a linear model and the logit link function:
log[ μ(X) / (1 − μ(X)) ] = α + β1X1 + . . . + βpXp
 The additive logistic regression model replaces each linear term with a more general functional form:
log[ μ(X) / (1 − μ(X)) ] = α + f1(X1) + . . . + fp(Xp)
 where again each fj is an unspecified smooth function.
 While the nonparametric form of the functions fj makes the model more flexible, additivity is retained, which allows us to interpret the model in much the same way as before.
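 A hedged sketch of the additive logistic model in R (mgcv package; binary outcome y and predictors x1, x2 in a hypothetical dat):

library(mgcv)

# Additive logistic regression: logit of P(Y = 1 | X) is a sum of smooths
fit_bin <- gam(y ~ s(x1) + s(x2), family = binomial, data = dat)
summary(fit_bin)

# Predicted probabilities on the response scale
p_hat <- predict(fit_bin, type = "response")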
Fitting GAMs

Example in R
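 A self-contained sketch of fitting a GAM in R with the mgcv package, using simulated data so it runs as written:

library(mgcv)

# Simulate data with one non-linear effect and one linear effect
set.seed(1)
n  <- 200
x1 <- runif(n)
x2 <- runif(n)
y  <- 2 + sin(2 * pi * x1) + 0.5 * x2 + rnorm(n, sd = 0.3)
dat <- data.frame(y, x1, x2)

# Fit: smooth in x1, linear in x2
m <- gam(y ~ s(x1) + x2, data = dat)

summary(m)          # edf for s(x1) well above 1 flags non-linearity
plot(m, pages = 1)  # plot the estimated smooth f(x1)
gam.check(m)        # basis-dimension and residual diagnostics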
“GAM is not a silver bullet but it is
a technique you should add to your
arsenal”
