
9. Interpolation

Spatial Interpolation
Interpolation is the method of predicting the value of an attribute at unsampled sites from measurements made at known point locations within the same space. The rationale behind interpolation is that values at points lying close together in space are more likely to be similar than values at points that lie far apart.

Figure 1: Interpolation

Interpolation finds use in many areas, such as estimating rainfall and temperature at places where no weather stations are available; contouring, where it is necessary to know where to place contours between measured locations; and resampling raster data, where a raster is changed from one grid to another.
An interpolation method which predicts, at a sampled location, a value equal to the value measured at that location is called an exact interpolator. The remaining methods are termed inexact interpolators. The statistics of difference (the absolute or squared differences between measured and predicted values at the data points) serve as an indicator of the quality of an inexact interpolator.
There are two methods of interpolation:

Global method
Local method

Global methods make predictions using the entire dataset. Local methods, on the other hand, make predictions using the measured points within neighborhoods (smaller spatial areas within the larger study area). Global methods usually create smooth surfaces, but they can be very sensitive to outliers (values that differ markedly from the other values). Compared with global methods, local methods yield less smooth surfaces but are not sensitive to outliers, since the effect of an outlier does not influence the entire interpolated surface, only local parts of it.

Global Method: Trend Surface Analysis


When an attribute varies continuously over a landscape, it can be modeled using a mathematical surface. A polynomial equation is fitted to the observations made at the sampled points, and the value of the attribute at unsampled locations is then calculated by substituting their coordinates into this equation.
A simple way to model long-range spatial variation is regression analysis. A multiple regression of attribute values against geographic location is carried out, where the geographic coordinates are the independent variables and the attribute value, assumed normally distributed, is the dependent variable. A polynomial equation or surface is fitted through the data points by least squares so as to minimize the sum of squared residuals.
Consider the values of an attribute z measured along a transect at points $x_1, x_2, \dots, x_n$. If the value of z increases linearly with location x, its long-range variation can be approximated as:

$$z(x) = b_0 + b_1 x + \varepsilon$$

where $b_0$ and $b_1$ are the polynomial coefficients known as the intercept and slope respectively, and $\varepsilon$ is the residual, which is normally distributed and independent of the x values.

Sometimes z is not a simple linear function of x but varies as a more complicated higher-order polynomial, such as:

$$z(x) = b_0 + b_1 x + b_2 x^2 + \dots + b_n x^n + \varepsilon$$
In two dimensions, a multiple regression on x and y derives a polynomial surface of the following form:

$$z(x, y) = \sum_{r+s \le p} b_{rs}\, x^r y^s + \varepsilon$$

The integer p is the order of the trend surface.


Commonly used polynomials include the linear surface:

$$z = b_0 + b_1 x + b_2 y$$

Figure 2: Trend Surface Analysis

(Figure adapted from Lo & Yeung, Concepts and Techniques of Geographic Information Systems)
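As a concrete sketch of how such a surface is fitted, the example below estimates the coefficients of a first-order (linear) trend surface by least squares using NumPy; the coordinates and attribute values are made-up illustration data.

```python
import numpy as np

# Example coordinates (x, y) of sampled points and measured attribute z
# (made-up data for illustration only).
x = np.array([0.0, 1.0, 2.0, 0.5, 1.5, 2.5])
y = np.array([0.0, 0.5, 1.0, 2.0, 2.5, 3.0])
z = np.array([10.0, 12.1, 14.2, 13.9, 16.0, 18.2])

# Design matrix for the linear trend surface z = b0 + b1*x + b2*y.
A = np.column_stack([np.ones_like(x), x, y])

# Least-squares fit minimizes the sum of squared residuals.
coeffs, residuals, rank, sv = np.linalg.lstsq(A, z, rcond=None)
b0, b1, b2 = coeffs

# Predict the attribute at an unsampled location (1.0, 1.0).
z_pred = b0 + b1 * 1.0 + b2 * 1.0
print(f"intercept={b0:.3f}, slopes=({b1:.3f}, {b2:.3f}), z(1,1)={z_pred:.3f}")
```

Higher-order surfaces are fitted the same way by adding columns such as $x^2$, $xy$, and $y^2$ to the design matrix.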

Interpolation using Geostatistics


Kriging is a method based on a statistical model. Statistical methods do not predict values perfectly: a probability has to be associated with each prediction, and as one predicts one has to assess the error of prediction as well.
Kriging relies on autocorrelation. Correlation is the tendency of two different variables to be related. The term autocorrelation means self-correlation, or correlation of a variable with itself. For example, the values of soil moisture on two consecutive days tend to be more similar than the values on days that lie a month apart. The rate at which the correlation decays can be expressed as a function of distance. Autocorrelation is a function of distance and is a key feature of geostatistics. In classical statistics, on the other hand, observations are assumed to be independent, meaning no correlation exists between observations.
Geostatistical data are modeled as a trend plus an error term, expressed by the following formula:

$$Z(s) = \mu(s) + \varepsilon(s)$$

where $Z(s)$ is the variable of interest, $\mu(s)$ is a deterministic trend, and $\varepsilon(s)$ is the random autocorrelated error. The symbol s indicates the location (longitude, latitude). A few assumptions are made about the error term:

Its expected value is zero (on average).

The autocorrelation between the two error terms $\varepsilon(s)$ and $\varepsilon(s + h)$ at two different locations separated by a distance h depends on the displacement h between them and not on the actual location s.
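In symbols, these two assumptions read:

$$E[\varepsilon(s)] = 0, \qquad \operatorname{Cov}\big(\varepsilon(s), \varepsilon(s + h)\big) = C(h)$$

where the covariance function C depends only on the displacement h and not on the location s.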

Kriging does not require the data to be normally distributed, but transformations and trend removal can help justify assumptions of normality and stationarity. Prediction using ordinary, simple, or universal kriging with general Box-Cox or arcsine transformations is called trans-Gaussian kriging; prediction with a log transformation is called lognormal kriging.

Semivariogram

Semi-variance is half the squared difference between the values at a pair of locations separated by a distance h. The plot of semi-variance (half the squared difference, on the y-axis) against the distance separating the locations (on the x-axis) is called the semivariogram cloud. Plotting all the pairs becomes unmanageable, so the pairs are grouped into lag bins; this process of grouping is called binning. Binning forms the pairs of points and then groups them so that the pairs in a bin share a common distance and direction. For each bin, only the average distance and the average semi-variance of all the pairs in that bin are plotted, as a single point on the empirical semivariogram. The plotted semivariogram is then fitted with a curve known as a semivariogram model. The distance at which the curve flattens out is called the range; for distances greater than the range, it is assumed that no autocorrelation exists between points. The value (on the y-axis) that the semivariogram attains at the range is known as the sill. The intercept of the semivariogram model on the y-axis is called the nugget.
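For a lag bin containing N(h) pairs separated by roughly the distance h, the empirical semi-variance is

$$\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \big[z(s_i) - z(s_i + h)\big]^2$$

The sketch below computes such an empirical semivariogram from scattered points; the purely isotropic binning (distance only, ignoring direction) and the made-up data are simplifying assumptions.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_width, n_bins):
    """Average semi-variance per distance bin (isotropic: direction ignored)."""
    n = len(values)
    dists, gammas = [], []
    # Form every pair of points once and record its separation distance
    # and half the squared difference of the attribute values.
    for i in range(n):
        for j in range(i + 1, n):
            dists.append(np.linalg.norm(coords[i] - coords[j]))
            gammas.append(0.5 * (values[i] - values[j]) ** 2)
    dists, gammas = np.array(dists), np.array(gammas)

    # Binning: group pairs by distance and average within each bin.
    bin_centers, bin_gammas = [], []
    for b in range(n_bins):
        lo, hi = b * bin_width, (b + 1) * bin_width
        mask = (dists >= lo) & (dists < hi)
        if mask.any():
            bin_centers.append(dists[mask].mean())
            bin_gammas.append(gammas[mask].mean())
    return np.array(bin_centers), np.array(bin_gammas)

# Made-up sample data for illustration.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(50, 2))
values = coords[:, 0] * 0.1 + rng.normal(0, 1, 50)
h, gamma = empirical_semivariogram(coords, values, bin_width=10.0, n_bins=10)
print(np.round(h, 1), np.round(gamma, 2))
```

A semivariogram model (spherical, exponential, Gaussian, etc.) would then be fitted to the (h, gamma) points to read off the nugget, sill, and range.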

Figure 3: Semivariogram

Local Method: Thiessen Polygons


The Thiessen polygon method assumes that the values at unsampled locations are equal to the value of the nearest sampled point. The space is divided into polygons, each containing a single sample point. The polygons are generated in such a manner that any location inside a polygon is closer to that polygon's point than to any other sample point. The size of the polygons varies inversely with point density.
How are Thiessen polygons created? Connect all the points in sequential order with dashed lines. Draw the perpendicular bisectors of these dashed lines. Extend the perpendicular bisectors (solid lines) until they merge to create polygons. Erase the points and the dashed lines. Finally, build the polygon attribute table, as in the example below.

Input Point Theme:

Point-ID   Attribute
10         200
20         550
30         475
40         525

Output Polygon Theme:

Area   Poly-ID   Attribute
       10        200
       20        550
       30        475
       40        525

The major disadvantage of this method is that the attribute within a polygon is assumed to have a homogeneous value, which changes only at the polygon boundary. Since each polygon contains a single point, the variation within a polygon cannot be estimated.
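Since Thiessen interpolation simply assigns each location the value of its nearest sample point, it can be sketched as a nearest-neighbour query. The snippet below uses scipy.spatial.cKDTree; the point coordinates are made up, while the attribute values echo the table above.

```python
import numpy as np
from scipy.spatial import cKDTree

# Sample points and their attribute values (coordinates are made-up
# illustration data; attributes follow the table above).
points = np.array([[10.0, 10.0], [30.0, 40.0], [55.0, 20.0], [70.0, 60.0]])
attributes = np.array([200.0, 550.0, 475.0, 525.0])

# Thiessen (nearest-neighbour) interpolation: every unsampled location
# takes the value of the closest sample point.
tree = cKDTree(points)
query_locations = np.array([[20.0, 20.0], [60.0, 50.0]])
_, nearest_idx = tree.query(query_locations)
print(attributes[nearest_idx])  # value of the enclosing Thiessen polygon
```

The polygon boundaries themselves can be generated with scipy.spatial.Voronoi, since Thiessen polygons are exactly Voronoi cells.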

Figure 4: Thiessen Polygons

Inverse Distance Weighting (IDW)


This method works on the rationale that things which are close to each other are more similar than those which are far apart. To predict the value at an unsampled location, known values at surrounding locations are used, with closely lying points having a greater influence on the predicted value than those which lie farther away. IDW assumes that each measured point has a local influence that diminishes with distance, hence the name inverse distance weighting: the points closer to the prediction location are given higher weights.
The weights are proportional to the inverse of the distance raised to the power p.
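In the standard form of the predictor, the value at an unsampled location $s_0$ is

$$\hat{z}(s_0) = \sum_{i=1}^{n} \lambda_i\, z(s_i), \qquad \lambda_i = \frac{d_{i0}^{-p}}{\sum_{j=1}^{n} d_{j0}^{-p}}$$

where $d_{i0}$ is the distance from the sample point $s_i$ to $s_0$ and the weights $\lambda_i$ sum to one.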

As the distance increases, the weight decreases. The rate at which the weights decrease depends on the value of p. If p is zero there is no decrease with distance, and because each weight is the same, the predicted value is the average of all the data values in the search neighborhood. As p increases, the weights for distant points decrease rapidly.
An approximately correct value of p can be determined by minimizing the root mean square prediction error (RMSPE), a statistic which quantifies the error of the prediction surface and is calculated by cross validation. Cross validation involves removing one or more data locations and predicting their values using the data at the remaining locations. In this way, the predicted value can be compared with the observed value, giving useful information about the quality of the model.
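A minimal sketch of IDW together with the leave-one-out cross-validation described above is given below; the sample data and the candidate powers are illustrative assumptions.

```python
import numpy as np

def idw_predict(coords, values, targets, p=2.0, eps=1e-12):
    """Inverse-distance-weighted prediction at target locations."""
    preds = []
    for t in targets:
        d = np.linalg.norm(coords - t, axis=1)
        if np.any(d < eps):                     # target coincides with a sample:
            preds.append(values[np.argmin(d)])  # exact-interpolator behaviour
            continue
        w = d ** (-p)                           # weights fall off as distance^-p
        preds.append(np.sum(w * values) / np.sum(w))
    return np.array(preds)

def rmspe_loo(coords, values, p):
    """Root mean square prediction error by leave-one-out cross-validation."""
    errs = []
    for i in range(len(values)):
        mask = np.arange(len(values)) != i      # drop one observation...
        pred = idw_predict(coords[mask], values[mask], coords[i:i+1], p=p)[0]
        errs.append(pred - values[i])           # ...and predict it from the rest
    return np.sqrt(np.mean(np.square(errs)))

# Made-up sample data; compare candidate powers by RMSPE.
rng = np.random.default_rng(1)
coords = rng.uniform(0, 100, size=(30, 2))
values = np.sin(coords[:, 0] / 20) + 0.1 * rng.normal(size=30)
for p in (1.0, 2.0, 3.0):
    print(f"p={p}: RMSPE={rmspe_loo(coords, values, p):.4f}")
```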
Splines
Spline functions are piecewise functions which are fitted exactly to a small number of data points while ensuring that the joins between one part of the curve and the next are continuous. Unlike a trend surface, a spline can be modified in one part without recomputing the whole, which is an advantage.
A piecewise polynomial function is written as:

$$P(x) = P_i(x), \qquad x_i \le x < x_{i+1}, \quad i = 0, 1, \dots, k-1$$

The points $x_1, \dots, x_{k-1}$ are called break points; they divide an interval into k sub-intervals. The points of the curve at these values of x are called knots. The functions $P_i(x)$ are polynomials of degree m or less. The term r denotes the constraints on the spline:

At r = 0, the function has no constraints.
At r = 1, the function is continuous with no constraints on its derivatives.
At r = m + 1, the interval $[x_0, x_k]$ can be represented by a single polynomial.
r = m is the maximum number of constraints for a piecewise solution.


For m = 1, 2, and 3 the splines are called linear, quadratic, and cubic respectively. The continuity constraints apply to derivatives of order m − 1, so that a quadratic spline has one continuous derivative at each knot and a cubic spline has two continuous derivatives at each knot.
There are difficulties in calculating simple splines over a range of separate sub-intervals, so most practical applications use a special kind of spline known as the B-spline. B-splines are sums of other splines that have the value zero outside the interval of interest, and so allow local fitting with low-order polynomials. B-splines are used for smoothing digitized lines, such as boundaries on soil and geological maps.
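As an illustration of B-spline smoothing, the sketch below fits a cubic smoothing spline to noisy transect data with SciPy's splrep; the data and the smoothing factor s=0.5 are assumed example values.

```python
import numpy as np
from scipy.interpolate import splrep, splev

# Noisy samples along a transect (made-up data).
x = np.linspace(0, 10, 30)
y = np.sin(x) + 0.1 * np.random.default_rng(2).normal(size=x.size)

# Fit a cubic (k=3) smoothing B-spline; s controls the trade-off
# between closeness of fit and smoothness of the curve.
tck = splrep(x, y, k=3, s=0.5)

# Evaluate the fitted spline on a finer grid.
x_fine = np.linspace(0, 10, 200)
y_smooth = splev(x_fine, tck)
print(y_smooth[:5])
```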

References:
Lo, C.P. & Yeung, A.K.W. 2009, Concepts and Techniques of Geographic Information Systems, PHI Learning Private Limited, New Delhi.
Goodchild, M.F., Longley, P.A., Maguire, D.J. & Rhind, D.W. 2001, Geographic Information Systems and Science, John Wiley & Sons Ltd, England.
