Flexible Regression - Lecture 6
Marnie McLean
Room 344 Mathematics and Statistics Building,
marnie.mclean@glasgow.ac.uk
October 2018
Chapter 2 part 1 - Smoothing Methods
Smoothing methods
Smoothing methods are used for description and/or estimation. Examples are:

loess
local linear regression
smoothing splines
regression splines
For now we are only interested in how the smoothing is done.
The selection of smoothing parameters will be considered in the next section.

2.5 Splines
Splines represent the fit of f(x) as a piecewise polynomial.
Spline functions consist of polynomial segments which are joined together smoothly at the boundaries of pre-defined subintervals.
The points at which the joins occur are called breakpoints, or knots, of the spline.

Figure: Spline with 6 interior knots

The function (and at least its first derivative) is constrained to be continuous everywhere.
The aim of spline smoothing is to fit a smooth, flexible function which minimizes the residual sum of squares.

Degree of the polynomial

Degree = 0: a discontinuous step function.
Degree = 1: a chain of line segments.
For larger values of the degree the spline is increasingly smooth.
A cubic spline (degree = 3) is a common choice in many situations for practical and computational reasons.

Number and position of knots
In an (unpenalised) spline the number of knots determines the level of smoothing.
The more knots used, the more flexible the regression function can become.
The positioning of the knots can be important, especially when the number of knots is small.
One approach is to use "too many" knots (one knot per observation in the most extreme case) and a penalty term to control the smoothness (section 2.5.1).

2.5.1 Smoothing Splines
For data $(x_i, y_i)$ and the model
\[ Y_i = f(x_i) + \varepsilon_i, \quad i = 1, \ldots, n, \]
the solution $\hat{f}(x_i) = y_i$ minimises $\sum_{i=1}^{n} (y_i - f(x_i))^2$.
This is a regression which interpolates the points.

Therefore, to avoid this, a second term is added to the expression to give:
\[ \sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \int_a^b [f''(x)]^2 \, dx, \]
where $\lambda$ is a fixed constant, $a \le x_1 \le \cdots \le x_n \le b$, and we choose $\hat{f}$ to minimise this modified least squares criterion.

The term $\lambda \int_a^b [f''(x)]^2 \, dx$ is referred to as a roughness penalty.
$\sum_{i=1}^{n} (y_i - f(x_i))^2$ measures closeness to the data.
$\lambda \int_a^b [f''(x)]^2 \, dx$ penalizes curvature in the function.
Other choices of roughness penalty can be considered, with penalties on higher order derivatives.
The function $\hat{f}$ which minimises the penalised criterion is the smoothing spline fit.
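The trade-off between the two terms of the penalised criterion can be checked numerically. The following is a minimal sketch, not part of the lecture: the function name `penalised_criterion`, the three data points and the two candidate curves are all made up for illustration, and the integral is approximated with the trapezoidal rule rather than computed exactly.

```python
def penalised_criterion(f, f_second, x, y, lam, a, b, n_grid=1000):
    """Residual sum of squares plus lam * integral over [a, b] of f''(x)^2.

    f        : candidate function
    f_second : its second derivative (supplied by hand in this sketch)
    """
    # Closeness to the data
    rss = sum((yi - f(xi)) ** 2 for xi, yi in zip(x, y))
    # Trapezoidal approximation of the roughness integral
    h = (b - a) / n_grid
    sq = [f_second(a + k * h) ** 2 for k in range(n_grid + 1)]
    roughness = h * (sum(sq) - 0.5 * (sq[0] + sq[-1]))
    return rss + lam * roughness


# Illustrative (made-up) data on [0, 1]
x = [0.0, 0.5, 1.0]
y = [0.0, 0.3, 1.0]

# Candidate 1: a straight line, which has zero curvature but misses the middle point
line = (lambda t: t, lambda t: 0.0)
# Candidate 2: the quadratic that interpolates all three points
quad = (lambda t: 0.8 * t ** 2 + 0.2 * t, lambda t: 1.6)

for lam in (0.001, 1.0):
    c_line = penalised_criterion(*line, x, y, lam, 0.0, 1.0)
    c_quad = penalised_criterion(*quad, x, y, lam, 0.0, 1.0)
    print(lam, round(c_line, 5), round(c_quad, 5))
```

With the small value of the smoothing parameter the interpolating quadratic has the lower criterion; with the larger value the straight line is preferred, because the curvature of the quadratic is penalised more heavily than the line's lack of fit.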

For values of $\lambda > 0$, $\lambda$ is the smoothing parameter.
Increasing $\lambda$ penalises fluctuations more heavily, and so produces a smoother curve.
Hence, $\lambda$ here plays a similar role to the standard deviation and span used in earlier sections.
The knots are the observed unique x values and $\lambda$ is used to control the smoothing.

For this choice of roughness penalty, $\hat{f}$ is a natural cubic spline.
This means that $\hat{f}$ is a piecewise cubic polynomial in each interval $(x_i, x_{i+1})$ (for ordered $x_i$).
The functions $\hat{f}$, $\hat{f}'$ and $\hat{f}''$ are continuous.
Cubic smoothing splines are among the most commonly used.

Natural cubic splines
The second and third derivatives of f are both equal to zero at the start and end points a and b.
This implies that the function is linear beyond the boundary knots.
Cubic smoothing splines can be fitted using the smooth.spline function in R.

Figure: Simulated data with a smoothing spline; the smoothing parameter has been automatically chosen and the dashed line is the underlying true curve.

Figure: Radiocarbon data with a smoothing spline; the smoothing parameter has been automatically chosen.

What choices have to be made?
The main choice to be made is the size of the smoothing parameter $\lambda$ for the roughness penalty.
Drawbacks
Since there is a knot at every unique x value, there are as many parameters as there are observations.
This large number of parameters can make the fit computationally inefficient, particularly if there are multiple covariates.

(Penalised) regression splines are often used instead, as they combine the flexibility of splines with computational efficiency.

2.5.2 Basis functions
Regression splines are underpinned by a set of known functions called basis functions.
Basis functions are another common way to build a smooth function.
Smooth functions can be approximated using weighted sums of the individual functions.
The choice of basis system is often dependent on the data (or type of variable) to which the smooth functions are to be fitted.

For the general model
\[ Y_i = f(x_i) + \varepsilon_i, \quad i = 1, \ldots, n, \]
a curve estimate can be produced by fitting the regression
\[ Y_i = \beta_0 b_0(x_i) + \beta_1 b_1(x_i) + \beta_2 b_2(x_i) + \cdots + \beta_p b_p(x_i) + \varepsilon_i, \]
where the $b_j$ are referred to as basis functions.
Therefore,
\[ f(x) = \sum_{j=0}^{p} \beta_j b_j(x). \]

Examples of basis expansions for simple models:
Simple linear regression
\[ Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \]
with design matrix
\[ X = B(x) = B = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}. \]
The basis functions are $b_0(x) = 1$, $b_1(x) = x$.

More generally, for a polynomial of degree p:
\[ X = B(x) = B = \begin{pmatrix} 1 & x_1 & \cdots & x_1^p \\ 1 & x_2 & \cdots & x_2^p \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & \cdots & x_n^p \end{pmatrix}. \]
The basis functions are $b_0(x) = 1$, $b_1(x) = x$, \ldots, $b_p(x) = x^p$.
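As a small illustration of how the polynomial design matrix is assembled, the sketch below builds B row by row by evaluating each basis function at each observation. The function name `polynomial_design_matrix` and the example x values are assumptions made for this illustration only.

```python
def polynomial_design_matrix(x, p):
    """Return B as a list of rows, where row i is [1, x_i, x_i**2, ..., x_i**p]."""
    return [[xi ** j for j in range(p + 1)] for xi in x]


def evaluate(beta, row):
    """f(x) as the weighted sum of basis functions: sum_j beta_j * b_j(x)."""
    return sum(bj * vj for bj, vj in zip(beta, row))


x = [1, 2, 3]

# p = 1 gives the simple linear regression design matrix with columns 1 and x
B_linear = polynomial_design_matrix(x, 1)

# p = 2 adds a column of squared values
B_quadratic = polynomial_design_matrix(x, 2)

# With beta = (1, 0, 2) the fitted curve is f(x) = 1 + 2 x^2
fitted = [evaluate([1, 0, 2], row) for row in B_quadratic]
```

Setting p = 1 recovers the two-column matrix of the simple linear regression case, so the linear model is just the smallest member of this family.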

Figure: A simple linear regression example

Figure: The basis functions for simple linear regression: 1, x

Truncated power basis (Ruppert et al. 2003)
The following is a simple type of polynomial spline of degree p:
\[ \beta_0 + \beta_1 x + \cdots + \beta_p x^p + \sum_{k=1}^{K} \beta_{pk} (x - \kappa_k)_+^p, \]
where $p \ge 1$, $\kappa_1, \ldots, \kappa_K$ are the knots and $(u)_+^p = (\max\{u, 0\})^p$, i.e. $u^p$ when u is positive and zero otherwise.
The model is constructed as a linear combination of the basis functions $1, x, \ldots, x^p, (x - \kappa_1)_+^p, \ldots, (x - \kappa_K)_+^p$.
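The truncated power basis can be evaluated directly from its definition. The sketch below is illustrative only: the names `truncated_power` and `truncated_power_basis`, and the example x values and knots, are assumptions and not taken from the lecture. The truncation is applied before raising to the power, so the term is zero to the left of the knot for every degree p.

```python
def truncated_power(u, p):
    """(u)_+^p: equal to u**p when u is positive and 0 otherwise."""
    return u ** p if u > 0 else 0.0


def truncated_power_basis(x, p, knots):
    """Rows [1, x, ..., x**p, (x - k_1)_+^p, ..., (x - k_K)_+^p] for each x value."""
    rows = []
    for xi in x:
        polynomial_part = [xi ** j for j in range(p + 1)]
        truncated_part = [truncated_power(xi - k, p) for k in knots]
        rows.append(polynomial_part + truncated_part)
    return rows


# Linear spline (p = 1) with a single knot at 0.5
B_linear = truncated_power_basis([0.0, 0.5, 1.0], 1, [0.5])

# Cubic spline (p = 3) with two knots: 4 polynomial columns plus 2 truncated columns
B_cubic = truncated_power_basis([0.0, 0.25, 0.5, 0.75, 1.0], 3, [0.25, 0.75])
```

Each knot contributes exactly one extra column, so a degree p spline with K knots has p + 1 + K basis functions.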

Truncated power basis
A simple case is a linear spline basis with one knot at x = 0.5, e.g.
\[ Y_i = \beta_0 + \beta_1 x_i + \beta_{11} (x_i - 0.5)_+ + \varepsilon_i, \quad i = 1, \ldots, n, \]
where $(x_i - 0.5)_+$ is the positive part of $x_i - 0.5$.
The + subscript sets this function to zero for those values of x where x − 0.5 is negative.

The basis functions are $1$, $x$ and $(x - 0.5)_+$.
This simple example has one knot at 0.5.

For the linear spline with one knot at 0.5, the design matrix is
\[ X = B(x) = B = \begin{pmatrix} 1 & x_1 & (x_1 - 0.5)_+ \\ 1 & x_2 & (x_2 - 0.5)_+ \\ \vdots & \vdots & \vdots \\ 1 & x_n & (x_n - 0.5)_+ \end{pmatrix}. \]
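Once the design matrix for the one-knot linear spline is available, the coefficients can be estimated by ordinary least squares. The following is a minimal sketch under stated assumptions: the data are made up and noise-free (they are not the simulated data shown in the lecture), the helper names are hypothetical, and the normal equations are solved by plain Gaussian elimination to keep the example free of external libraries.

```python
def positive_part(u):
    """(u)_+ : u when u is positive, 0 otherwise."""
    return u if u > 0 else 0.0


def design_row(xi, knot=0.5):
    """Row of B for the basis functions 1, x, (x - knot)_+."""
    return [1.0, xi, positive_part(xi - knot)]


def solve(A, b):
    """Solve A beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (M[r][n] - s) / M[r][r]
    return beta


def fit_least_squares(B, y):
    """Least squares coefficients from the normal equations (B'B) beta = B'y."""
    p = len(B[0])
    BtB = [[sum(row[i] * row[j] for row in B) for j in range(p)] for i in range(p)]
    Bty = [sum(row[i] * yi for row, yi in zip(B, y)) for i in range(p)]
    return solve(BtB, Bty)


# Made-up, noise-free data: slope 2 before the knot, slope 2 - 3 = -1 after it
x = [k / 10 for k in range(11)]
y = [1.0 + 2.0 * xi - 3.0 * positive_part(xi - 0.5) for xi in x]

B = [design_row(xi) for xi in x]
beta_hat = fit_least_squares(B, y)  # close to (1, 2, -3) up to rounding error
```

The third coefficient is the change in slope at the knot, which is why a single extra column is enough to let the fitted line bend at x = 0.5.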

Figure: Basis functions for linear spline: 1, x, (x − 0.5)+

Figure: Linear spline fitted to simulated data
