
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/226282200

Smoothing Techniques for Visualisation

Chapter · January 2008


DOI: 10.1007/978-3-540-33037-0_20 · Source: OAI


4 authors:

Adrian Bowman, University of Glasgow
Chun-houh Chen, Academia Sinica

Wolfgang Karl Härdle, Humboldt-Universität zu Berlin
Antony Unwin, Universität Augsburg

All content following this page was uploaded by Adrian Bowman on 01 September 2014.



Smoothing techniques for visualisation

Adrian W. Bowman
Dept. of Statistics, University of Glasgow, Glasgow, U.K.
adrian@stats.gla.ac.uk

April 12, 2005

Abstract for the Computational Statistics Handbook on Visualisation.

1 Introduction

In exploring data graphically, principal interest often lies in identifying the nature of the underlying
trends. This is particularly true in a regression setting, where the focus of attention is the
relationship between the explanatory variables and the mean value of a response.
Nonparametric smoothing techniques are extremely useful in this setting, particularly when there
is a large amount of data, a substantial amount of variation, or both, so that underlying patterns
are obscured in plots of the raw data. This is illustrated in Figure 1, where the left-hand panel
plots a catch score, representing the abundance of marine life on the sea bed at various sampling
points in a region near the Great Barrier Reef, against longitude. A clear drop in level is apparent
as longitude increases. However, considerably more insight is gained by adding a smooth curve to
the plot, estimating the mean value of the response as a function of longitude. A non-linear pattern
emerges, indicating rapid movement between two different levels rather than a constant linear
decline. The aim of the chapter is to discuss this kind of graphical insight and its uses in exploring
regression data. A brief overview of the techniques to be discussed will be given in this
introductory section.

2 Smoothing in one dimension

An article on smoothing by Loader is available in the first Computational Statistics Handbook,
and this will be referenced heavily here to avoid repeating too many technical details.
However, the basic mechanism of the local linear approach to smoothing will be outlined, together
with enough technical detail to derive standard errors. These provide the basis for variability bands,
which indicate the precision of smooth estimates, and for reference bands for natural models, as
illustrated in the middle and right-hand panels of Figure 1. These bands give very helpful graphical
representations of the evidence for or against the suitability of candidate models.
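The local linear mechanism and the standard errors behind bands of the kind shown in Figure 1 can be sketched in a few lines. This is a minimal illustration in Python with NumPy, not the chapter's own software. It assumes a Gaussian kernel with a fixed, user-chosen bandwidth `h`, a simple first-difference estimate of the error standard deviation, and bands drawn at plus or minus two standard errors. The names `local_linear_weights`, `linear_weights` and `smooth_with_bands` are hypothetical. Because each estimate is a weighted sum of the responses, its standard error follows directly from the weights. The reference band for a linear model is centred on the fitted line and uses the standard error of the difference between the smooth and linear fits.

```python
import numpy as np


def local_linear_weights(x, x0, h):
    """Weights w_i such that m_hat(x0) = sum_i w_i * y_i.

    Local linear regression with a Gaussian kernel of bandwidth h: the fitted
    intercept of a kernel-weighted least-squares line centred at x0.
    """
    d = x - x0
    k = np.exp(-0.5 * (d / h) ** 2)                  # kernel weights
    s1, s2 = np.sum(k * d), np.sum(k * d ** 2)
    w = k * (s2 - s1 * d)                            # closed-form local linear weights
    return w / np.sum(w)


def linear_weights(x, x0):
    """Weights p_i such that the least-squares line evaluated at x0 is sum_i p_i * y_i."""
    xc = x - x.mean()
    return 1.0 / len(x) + (x0 - x.mean()) * xc / np.sum(xc ** 2)


def smooth_with_bands(x, y, xgrid, h):
    """Local linear fit plus a variability band and a linear-model reference band."""
    order = np.argsort(x)
    # First-difference estimate of the error standard deviation (one option of several).
    sigma = np.sqrt(np.sum(np.diff(y[order]) ** 2) / (2 * (len(y) - 1)))
    fit, se, centre, ref_se = [], [], [], []
    for x0 in xgrid:
        w = local_linear_weights(x, x0, h)
        p = linear_weights(x, x0)
        fit.append(w @ y)                                     # m_hat(x0)
        se.append(sigma * np.sqrt(np.sum(w ** 2)))            # s.e. of m_hat(x0), bias ignored
        centre.append(p @ y)                                  # p_hat(x0), the fitted line
        ref_se.append(sigma * np.sqrt(np.sum((w - p) ** 2)))  # s.e. of m_hat(x0) - p_hat(x0)
    fit, se, centre, ref_se = map(np.array, (fit, se, centre, ref_se))
    return {
        "fit": fit,
        "band": (fit - 2 * se, fit + 2 * se),                     # variability band
        "reference": (centre - 2 * ref_se, centre + 2 * ref_se),  # reference band for a line
    }


# Usage on simulated data with a step-like mean (not the Reef data).
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(142.8, 143.8, 200))
y = np.where(x < 143.3, 1.5, 0.3) + rng.normal(0.0, 0.3, x.size)
result = smooth_with_bands(x, y, np.linspace(142.9, 143.7, 40), h=0.05)
```

The variability band ignores bias, so it indicates precision rather than acting as a formal confidence band. A local linear smoother reproduces a straight line exactly. For that reason the difference between the smooth and linear fits has mean zero when the linear model holds, which is what justifies centring the reference band on the fitted line.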

Smoothing techniques for other types of response variables, such as binary and survival
data, will also be discussed and illustrated in this section.
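For a binary response, one standard route is local likelihood. At each point of interest x0, a logistic model that is linear in x - x0 is fitted by kernel-weighted maximum likelihood. The sketch below is a hedged illustration of that idea only. It uses a Gaussian kernel, a user-chosen bandwidth and Newton-Raphson iterations. The function name `local_logistic` is hypothetical, and survival data are not covered.

```python
import numpy as np


def local_logistic(x, y, x0, h, max_iter=50, tol=1e-10):
    """Local linear logistic estimate of P(y = 1 | x = x0) by local likelihood.

    Maximises sum_i K((x_i - x0)/h) * [y_i * eta_i - log(1 + exp(eta_i))]
    with eta_i = a + b * (x_i - x0), using Newton-Raphson, and returns expit(a).
    """
    d = x - x0
    k = np.exp(-0.5 * (d / h) ** 2)                 # Gaussian kernel weights
    Z = np.column_stack([np.ones_like(d), d])       # local linear design
    beta = np.zeros(2)
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(Z @ beta)))      # current fitted probabilities
        score = Z.T @ (k * (y - mu))                # gradient of the local log-likelihood
        info = Z.T @ (Z * (k * mu * (1.0 - mu))[:, None])  # negative Hessian
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return 1.0 / (1.0 + np.exp(-beta[0]))


# Usage on simulated binary data (not from the chapter).
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 400)
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-x))).astype(float)
curve = [local_logistic(x, y, x0, h=1.0) for x0 in np.linspace(-2.5, 2.5, 11)]
```

Fitting on the logistic scale keeps every estimate inside (0, 1), which a local linear fit applied directly to 0/1 responses would not guarantee.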

3 Smoothing in two dimensions

This topic will be introduced by a simple extension of the local linear method, although other
approaches to the construction of regression surfaces will also be indicated. The addition of further
information to a plot of the estimated surface will again be a major theme. Figure 2 illustrates this
in several ways. The first panel shows a surface which has been painted to indicate the variation
in standard error across different locations. The second panel is painted according to the size of the
standardised difference $(\hat m - \hat p)/\mathrm{s.e.}(\hat m - \hat p)$, to assess graphically the
plausibility of a linear model, with fitted surface $\hat p$, in two covariates. This idea can be extended
to the comparison of two different surfaces, as in the final two panels of the figure, where painting
by the values of the standardised difference $(\hat m_1 - \hat m_2)/\mathrm{s.e.}(\hat m_1 - \hat m_2)$
assesses the evidence for differences between the two underlying surfaces $m_1$ and $m_2$. Here the
surfaces correspond to two different years of sampling, and the painting indicates that there are only
relatively small differences between the catch score patterns in the two years.
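The standardised differences used to paint the surfaces in Figure 2 follow from the same weighted-sum argument in two dimensions. This minimal sketch makes several assumptions: a product Gaussian kernel with one bandwidth per covariate, a known (or separately estimated) error standard deviation, independent samples for the two-surface comparison, and negligible smoothing bias. All function names are hypothetical. The linear fit $\hat p$ is also a weighted sum of the responses, so $\hat m - \hat p$ has standard error $\sigma \{\sum_i (w_i - p_i)^2\}^{1/2}$.

```python
import numpy as np


def local_linear_weights_2d(X, x0, h):
    """Weights w_i with m_hat(x0) = sum_i w_i * y_i for a bivariate local linear fit.

    X is (n, 2), x0 is a length-2 point and h holds one bandwidth per covariate
    (product Gaussian kernel).
    """
    D = X - x0
    k = np.exp(-0.5 * np.sum((D / h) ** 2, axis=1))  # product kernel weights
    Z = np.column_stack([np.ones(len(X)), D])         # local plane: 1, dx1, dx2
    ZtK = (Z * k[:, None]).T
    # The intercept of the kernel-weighted plane fit is the estimate at x0.
    return np.linalg.solve(ZtK @ Z, ZtK)[0]


def linear_weights_2d(X, x0):
    """Weights p_i such that the least-squares plane evaluated at x0 is sum_i p_i * y_i."""
    Z = np.column_stack([np.ones(len(X)), X])
    z0 = np.concatenate([[1.0], x0])
    return z0 @ np.linalg.solve(Z.T @ Z, Z.T)


def std_diff_linear(X, y, x0, h, sigma):
    """(m_hat - p_hat) / s.e.(m_hat - p_hat) at x0, for error standard deviation sigma."""
    d = local_linear_weights_2d(X, x0, h) - linear_weights_2d(X, x0)
    return (d @ y) / (sigma * np.sqrt(np.sum(d ** 2)))


def std_diff_two(X1, y1, X2, y2, x0, h, sigma1, sigma2):
    """(m1_hat - m2_hat) / s.e.(m1_hat - m2_hat) at x0 for two independent samples."""
    w1 = local_linear_weights_2d(X1, x0, h)
    w2 = local_linear_weights_2d(X2, x0, h)
    se = np.sqrt(sigma1 ** 2 * np.sum(w1 ** 2) + sigma2 ** 2 * np.sum(w2 ** 2))
    return (w1 @ y1 - w2 @ y2) / se


# Usage on simulated data with a curved mean surface (not the Reef data).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(120, 2))
y = 10 * ((X[:, 0] - 0.5) ** 2 + (X[:, 1] - 0.5) ** 2) + rng.normal(0.0, 0.1, 120)
z = std_diff_linear(X, y, np.array([0.5, 0.5]), np.array([0.2, 0.2]), sigma=0.1)
```

To produce a painted surface of the kind described above, `std_diff_linear` can be evaluated over a grid of longitude and latitude values and the results mapped to a colour scale. Values beyond about 2 in absolute size mark the locations where the linear model fits poorly.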

Surfaces are three-dimensional objects, and software to display such objects in high quality
is now widely available. The OpenGL system is a good example, and these powerful facilities can
now be accessed from statistical computing environments such as R. Some illustrations of lighting,
multiple surface display and interactive rotation will be given. The first panel of Figure 3 gives
one example, showing a regression surface for the Reef data produced by two-dimensional smoothing,
with additional wire mesh surfaces that define a reference region for a linear model. The protrusion
of the estimated surface through this reference region indicates substantial lack of fit of the
linear model.

4 Additive models

Additive models provide a very efficient method of extending the smoothing approach to the
estimation of the effects of larger numbers of simultaneous covariates. The backfitting algorithm
for fitting additive models will be described and illustrated, and the construction of standard errors
will also be discussed. Figure 3 shows the results of this in terms of graphical displays of the two
components for longitude and latitude, together with a display of the resulting fitted surface.
Cross-validation and other methods of determining the smoothing parameter will be revisited in
this section as a practical issue. The mgcv and gam libraries in R will be highlighted as accessible
means of fitting models such as these.
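The backfitting algorithm can be sketched by applying the same local linear smoother to each covariate in turn. This is a minimal illustration in Python with NumPy, not the code used by mgcv or gam. It assumes a Gaussian kernel with a fixed bandwidth for each component, centres each component to keep the model identifiable, and omits standard errors. A leave-one-out cross-validation score for a single smoother is included as one way of choosing a bandwidth. The names `smoother_matrix`, `backfit` and `cv_score` are hypothetical.

```python
import numpy as np


def smoother_matrix(x, h):
    """n x n local linear smoother matrix (Gaussian kernel), evaluated at the data."""
    D = x[None, :] - x[:, None]                        # D[j, i] = x_i - x_j
    K = np.exp(-0.5 * (D / h) ** 2)
    s1 = np.sum(K * D, axis=1, keepdims=True)
    s2 = np.sum(K * D ** 2, axis=1, keepdims=True)
    W = K * (s2 - s1 * D)                              # local linear weights; row j is the fit at x_j
    return W / W.sum(axis=1, keepdims=True)


def backfit(X, y, h, tol=1e-8, max_iter=200):
    """Fit y = alpha + sum_j f_j(X[:, j]) by backfitting; return alpha and the components."""
    n, p = X.shape
    S = [smoother_matrix(X[:, j], h[j]) for j in range(p)]
    alpha = y.mean()
    f = np.zeros((n, p))
    for _ in range(max_iter):
        f_old = f.copy()
        for j in range(p):
            # Partial residuals: remove the intercept and every component except f_j.
            partial = y - alpha - f.sum(axis=1) + f[:, j]
            f[:, j] = S[j] @ partial
            f[:, j] -= f[:, j].mean()                  # centre for identifiability
        if np.max(np.abs(f - f_old)) < tol:
            break
    return alpha, f


def cv_score(x, y, h):
    """Leave-one-out cross-validation score for a single local linear smoother."""
    S = smoother_matrix(x, h)
    loo_resid = (y - S @ y) / (1.0 - np.diag(S))
    return np.mean(loo_resid ** 2)


# Usage on simulated data (not the Reef data).
rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(150, 2))
y = 1.0 + np.sin(2.0 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0.0, 0.1, 150)
alpha, f = backfit(X, y, h=[0.2, 0.2])
fitted = alpha + f.sum(axis=1)
```

The cross-validation shortcut relies on the smoother being linear in the responses. For a local linear fit, the leave-one-out residual equals the ordinary residual divided by one minus the corresponding diagonal element of the smoother matrix, so no refitting is needed.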

5 A case study

In this section, an application will be discussed in reasonable detail. This is likely to come from an
environmental setting, where several sets of data on water or air quality are available. The emphasis
will be on the use of the univariate, bivariate and additive models discussed in the previous sections
to create graphical insight into the patterns and trends in the data.

6 Discussion

This section will contain further pointers to the literature to guide further reading. Reference will
also be made to the web site associated with the e-book version of the Handbook, to indicate
the material which is available by that route. I intend this to be mostly R software, using
appropriate libraries to allow the reader to experiment with different datasets, including those used
to illustrate the Handbook article.

References

A good selection of references to the literature will be provided.

[Figure 1: three panels, each plotting Catch score against Longitude.]

Figure 1: The left panel shows data on catch score and longitude from the Great Barrier Reef.
The middle panel adds a smooth curve as an estimate of the underlying regression function, with
variability bands to indicate the precision of estimation. The right panel shows a reference band
which indicates where a smooth curve is likely to lie if the underlying relationship is linear.

[Figure 2: four surface plots of Catch score over Longitude and Latitude.]

Figure 2: The first panel shows a smooth surface to illustrate an estimate of the regression function
of catch score on latitude and longitude simultaneously. The surface is painted to add information
on the relative size of the standard error of estimation across the surface. The second panel
is painted to indicate the size of the standardised difference between the estimate and a linear
regression function. The final two panels show the standardised differences between two estimated
surfaces corresponding to different years of data collection.
[Figure 3: a regression surface over Longitude and Latitude; component plots s(Longitude,4.52) against Longitude and s(Latitude,1) against Latitude; fitted additive surface (linear predictor) over Longitude and Latitude.]

Figure 3: The first panel shows a regression surface for the Reef data, relating the mean catch
score to latitude and longitude. The upper and lower wire mesh surfaces indicate the region within
which a smooth surface is likely to lie if the true regression surface is linear. The next two panels
show the estimated components for an additive model for the Reef data while the final panel shows
the fitted surface from this additive model.
