You are on page 1of 50

Outline The issue Splines Penalized Comparison An extension Software Comments

Smoothing with penalized splines


A brief introduction and an illustrative application

Antonio Gasparrini

Department of Social and Environmental Health Research


London School of Hygiene and Tropical Medicine (LSHTM)

Centre for Statistical Methodology – LSHTM


29 May 2015

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Outline
1 The issue

2 Splines

3 A penalized approach

4 A comparison

5 An extension

6 Software

7 Some comments

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Outline
1 The issue

2 Splines

3 A penalized approach

4 A comparison

5 An extension

6 Software

7 Some comments

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Relationships between x and y

Scatterplot of x and y

15
10
5
y
0
−5

0 2 4 6 8 10
x

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Non-linearity

Scatterplot of x and y

15
10
5
y
0
−5

0 2 4 6 8 10
x

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Regression models

A relationship between a predictor x and an outcome y is usually


estimated through regression models, controlling for potential
confounders
In the simple linear case:

P
X
yi = α + f (xi ) + γp zip (1)
p=1

A number of alternative options are available for representing f (x),


describing the relationship as a smooth shape

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Smoothing methods

Parametric
Polynomials, fractional polynomials, regression splines

In between
Penalized splines

Non-parametric
Lowess, kernel, smoothing splines

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Outline
1 The issue

2 Splines

3 A penalized approach

4 A comparison

5 An extension

6 Software

7 Some comments

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Splines: basis representation

A spline is a numeric function composed by piecewise-connected


polynomial functions
The advantage of using regression splines is that f (x) can be
represented in a basis form:

d
X
f (xi ; β) = βj bj (xi ) = xT β (2)
j=1

where bj (x) are a series of d (known) basis transformations of x

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Regression splines

This type of splines allow the use of standard estimation


methods, derived by minimizing the usual least square objective:

 2
N
X P
X
y − α − f (x; β) + γp zp  = ||y − α − Xβ − Zγ||2 (3)
i=1 p=1

Several different transformations for bj (x) (e.g. B-splines, natural


splines), determining the mathematical properties

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Graphical representation - I

Knots and splines

1.0
Splines
0.5
0.0

0 2 4 6 8 10
x

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Graphical representation - II

Splines and coefficients

1.01 −0.49 1.09 4.45 13.29 4.26


1.0
Splines
0.5
0.0

0 2 4 6 8 10
x

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Graphical representation - III

Sum of linear terms

15
10
Terms
5
0
−5

0 2 4 6 8 10
x

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Graphical representation - IV

Estimated relationship

15
10
5
y
0
−5

0 2 4 6 8 10
x

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Graphical representation - V

Estimated and true

15
10
5
y
0
−5

0 2 4 6 8 10
x

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Problems and limitations

In regression splines, the smoothness of the fitted curve is


determined by:
the degree of the spline
the specific parameterization
the number of knots
the location of knots
No general selection method for number and position of knots

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Number of knots

Degree of smoothness

15 df = 1
df = 3
df = 5
df = 7
10
5
y
0
−5

0 2 4 6 8 10
x

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Knots location

Best fitting models

15 5k at eq.−spaced val
8k at eq.−spaced per
10
5
y
0
−5

0 2 4 6 8 10
x

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Outline
1 The issue

2 Splines

3 A penalized approach

4 A comparison

5 An extension

6 Software

7 Some comments

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Generalized additive models

A general framework of smoothing methods is offered by


generalized additive models (GAMs)
GAMs extends traditional GLMs by allowing the linear predictor to
depend linearly on unknown smooth functions. In the linear case:

P
X
yi = α + f (xi ) + f (zip ) (4)
p=1

where f are traditionally represented by non-parametric terms such


as smoothing splines of lowess

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Penalty

The idea is to define a flexible function and control the smoothness


through a penalty term, usually on the second derivative
The objective in (3) is modified to:

 2
N
X P
X Z
y − α − f (x; β) + f (zip ) + λ [f 00 (x)]2 dx (5)
i=1 p=1

with λ as smoothing parameter

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Penalized splines

However, traditional GAMs are limited by complex and


computationally-heavy estimation methods
Penalized splines offer an flexible and efficient version of GAM,
based on low-rank basis transformations
The objective in (5) can be re-written in matrix terms as:

||y − α − Xβ − Zγ||2 + λβ T Sβ (6)

where S is a penalty matrix

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Smoothers
Alternative smoothers available, differing by parameterization and
penalty:
Thin-plate splines
Cubic splines
P-splines
Random-effects
Markov random fields
Soap film smooths
...

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Graphical representation - I
Increasing knots and splines

1.0
Splines

0.5

0.0

0 2 4 6 8 10
x
Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Graphical representation - II

Splines and coefficients

0.39 0.26 1.70 4.52 9.25 7.42


0.79 0.11 1.07 3.25 7.62 9.43 2.19
1.0

1.20 0.11 0.54 2.20 5.88 10.07 4.82


Splines
0.5
0.0

0 2 4 6 8 10
x

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Graphical representation - III


Sum of linear terms
15

10
Terms

−5

0 2 4 6 8 10
x

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Estimation

Estimation concerns coefficients of penalized and unpenalized


terms (α, β, γ) and smoothing parameters (λs)
For the former, a penalized iteratively reweighted least squares
(P-IRLS) scheme is used
Estimation of λs is integrated through either outer iteration or
performance iteration, using GCV, UBRE/AIC or REML

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Advantages

Relatively low-rank basis and simplified penalties


Completely parametric form
Number and location of knots not critical
Automatic smoothing selection
Efficient computational methods
Well-grounded theoretical framework

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Outline
1 The issue

2 Splines

3 A penalized approach

4 A comparison

5 An extension

6 Software

7 Some comments

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Comparison
Comparison of smoothing methods

Fractional polynomials
Regression splines
10 Penalized splines

5
y

0 2 4 6 8 10
x

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Simulations
Comparing alternative methods:
Fractional polynomials
Regression splines
Penalized splines
Different shapes:
Linear
Decay
Peak
Complex

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Simulated shapes
Linear Decay
10 10
8 8
6 6
4 4
y

y
2 2
0 0
−2 −2
0 2 4 6 8 10 0 2 4 6 8 10
x x

Peak Complex
10 10
8 8
6 6
4 4
y

y
2 2
0 0
−2 −2
0 2 4 6 8 10 0 2 4 6 8 10
x x

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Simulation results - I
Fractional polynomials

Linear Decay
10 10
8 8
6 6
4 4
y

y
2 2
0 0
−2 −2
0 2 4 6 8 10 0 2 4 6 8 10
x x

Peak Complex
10 10
8 8
6 6
4 4
y

2 2
0 0
−2 −2
0 2 4 6 8 10 0 2 4 6 8 10
x x

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Simulation results - II
Regression splines

Linear Decay
10 10
8 8
6 6
4 4
y

y
2 2
0 0
−2 −2
0 2 4 6 8 10 0 2 4 6 8 10
x x

Peak Complex
10 10
8 8
6 6
4 4
y

2 2
0 0
−2 −2
0 2 4 6 8 10 0 2 4 6 8 10
x x

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Simulation results - III


Penalized splines

Linear Decay
10 10
8 8
6 6
4 4
y

y
2 2
0 0
−2 −2
0 2 4 6 8 10 0 2 4 6 8 10
x x

Peak Complex
10 10
8 8
6 6
4 4
y

2 2
0 0
−2 −2
0 2 4 6 8 10 0 2 4 6 8 10
x x

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Simulation results - IV
Statistics

Fun (e)df Bias Cov RMSE


Linear GLM 2.21 0.01 0.90 0.82
Linear GAM 1.28 0.00 0.95 0.66
Decay GLM 3.45 0.05 0.87 0.89
Decay GAM 2.61 0.13 0.94 0.76
Peak GLM 4.82 0.05 0.89 0.93
Peak GAM 4.06 0.19 0.94 0.84
Complex GLM 6.65 0.11 0.87 1.02
Comples GAM 5.87 0.31 0.91 0.94

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Outline
1 The issue

2 Splines

3 A penalized approach

4 A comparison

5 An extension

6 Software

7 Some comments

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Distributed lag non-linear models

Statistical tools to model non-linear and lagged dependencies


Defined by a cross-basis function of lagged exposures:
L
X
s(xt−`0 , . . . , xt−` ) = f ·w (xt−` , `) (7)
`=`0

The function is composed of an exposure response function f (x)


and a lag-response function w (`)

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

An example

1.10

1.05
RR

1.00 0

5
0.95
10

g
La
20
15 15
Tem 10
p erat 5
ure 20
(C) 0
Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Tensor product basis

Parameterized by a special tensor product:

s(xt−`0 , . . . , xt−` ) = (1T T


L−`0 +1 At )η = wt η (8)
with
At = (1T T
v` ⊗ Rt ) (C ⊗ 1vx ) (9)

where Rt and C are basis matrices for x and `, respectively

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Penalized DLNMs

Question: what about a penalized version of DLNMs?


Modify objective in (6) to:
    
||y−α−Wη −Zγ||2 +η T λx 1T v` ⊗ S x +λ ` S ` ⊗ 1T
vx η (10)

with λx , λ` and Sx , S` as smoothing parameters and penalty


matrices for each dimension

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Simulated surfaces

Scenario 1 Scenario 2 Scenario 3

0.6 0.5

0.4
0.10
0.4

Outcome
0.3

Outcome
Outcome

0.2
0.05
0.2
0.1

0.00 0.0
0.0

10 10 10
8 0 8 0 8 0
6 10 6 10 6 10
x 4 20 x 4 20 x 4 20
g g g
2 30 La 2 30 La 2 30 La

0 40 0 40 0 40

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Simulation results

Scenario 1 Scenario 2 Scenario 3

GAMps GAMps GAMps

0.5
0.20

0.3

0.3
Outcome

Outcome

Outcome
0.05

0.1

0.1
−0.10

−0.1

−0.1
0 10 20 30 40 0 10 20 30 40 0 10 20 30 40
Lag Lag Lag

GAMps GAMps GAMps

0.5
0.20

0.3

0.3
Outcome

Outcome

Outcome
0.05

0.1

0.1
−0.10

−0.1

−0.1
0 2 4 6 8 10 0 2 4 6 8 10 0 2 4 6 8 10
x x x

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

An application

Exposure−lag−response Lag−response for 29C Overall cumulative exposure−response

3.0
1.3

2.5
1.2

2.0
RR

RR
1.2

1.1

1.5
1.1 1.0
RR

1.0
1.0
0
0.9

0.5
0.9 5

10
0 5 10 15 20 0 5 10 15 20 25 30
g

20
La

15
Tem 10
pera Lag Temperature
ture 0 20

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Outline
1 The issue

2 Splines

3 A penalized approach

4 A comparison

5 An extension

6 Software

7 Some comments

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

The R package mgcv

Collection of functions implementing GAMs with penalized splines


Written by Simon Wood, extensively documented
Example of code:
library(mgcv)
model <- gam(y ~ s(x,bs="ps") + z, data, family=gaussian,
method="REML")

The function s determines the spline transformations and penalties

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

The R package dlnm

Collection of functions implementing DLNMs


Example of code:
library(dlnm)
cb <- crossbasis(x,lag=c(10),
argvar=list(fun="bs",degree=2,knots=5,cen=0),
arglag=list(fun="ns",knots=c(3,6),int=F))
model <- glm(y ~ s(x,bs="ps") + z, data, family=gaussian)
pred <- crosspred(cb,model)
plot(pred,"3d",xlab="x",ylab="Lag",zlab="Effect")

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Embedding dlnm and mgcv

Example of code:
library(dlnm) ; library(mgcv)
cb <- crossbasis(x,lag=c(10), argvar=list(fun="ps"),
arglag=list(fun="ps"))
pen <- cbPen(cb)
model <- model <- gam(y ~ cb + z, data, family=gaussian,
method="REML",parapen=list(cb=list(pen)))
pred <- crosspred(cb,model)
plot(pred,"3d",xlab="x",ylab="Lag",zlab="Effect")

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Outline
1 The issue

2 Splines

3 A penalized approach

4 A comparison

5 An extension

6 Software

7 Some comments

Gasparrini A LSHTM
Smoothing with penalized splines
Outline The issue Splines Penalized Comparison An extension Software Comments

Some comments

Penalized splines combine the flexibility of non-parametric


methods with stability and simplicity of parametric smoothers
Based on theoretically-grounded and computationally-efficient
estimators
Well implemented in the package mgcv in R
Research still ongoing

Gasparrini A LSHTM
Smoothing with penalized splines

You might also like