# Modeling Spatially Continuous Data

Lecture 12
October 17, 2002
Bailey and Gatrell, Chapter 5

## Objectives

- Understanding and describing the nature of the spatial variation in an attribute
  - Knowledge of the trend and of the covariance structure is sufficient
- Prediction of the attribute value at locations where it has not been sampled
  - We want to predict a value for the attribute at an unsampled location s
  - How to use the derived model for prediction purposes

## Generalized Least Squares

- First fit the model to the observed data using ordinary least squares (OLS)
- Then estimate a variogram model using the residuals from the OLS fit
- Then compute a covariogram from the variogram model
- From the covariogram, estimate the covariance matrix and refit the model using generalized least squares (GLS)

The validity of the final model depends on:

- an appropriate choice of trend surface model
- an appropriate choice of variogram model
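The OLS → variogram → covariogram → GLS loop outlined above can be sketched in a few lines of linear algebra. This is a minimal illustration, not the lecture's own code: the simulated data, the exponential covariogram, and its sill and range values are all assumptions standing in for a fitted variogram model.

```python
import numpy as np

# Hypothetical illustration of the OLS -> variogram -> GLS refit loop.
# The simulated data and the exponential covariogram are assumptions.

rng = np.random.default_rng(0)
n = 60
coords = rng.uniform(0, 100, size=(n, 2))        # sample locations s_i
X = np.column_stack([np.ones(n), coords])        # trend surface design matrix
beta_true = np.array([10.0, 0.5, -0.2])
y = X @ beta_true + rng.normal(0, 2.0, n)        # observed attribute values

# Step 1: ordinary least squares fit of the trend
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_ols

# Steps 2-3: in practice, fit a variogram model to the residuals and convert
# it to a covariogram C(h) = sill - gamma(h); here we simply assume an
# exponential covariogram with a guessed sill and range.
sill, range_par = np.var(resid), 20.0
h = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
Sigma = sill * np.exp(-h / range_par)            # covariance matrix from covariogram

# Step 4: generalized least squares refit, beta = (X' S^-1 X)^-1 X' S^-1 y
Si = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)
print(beta_gls)
```

In a real application steps 2-4 are iterated: the GLS residuals give an updated variogram, which gives an updated covariance matrix, until the estimates stabilize.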

## Developing Models for Prediction

Assume we have the model

$$Y(s) = x^{T}(s)\,\beta + U(s)$$

where $U(s)$ is a zero-mean process with covariance function $C(\cdot)$. With this model we can do better than a prediction based on the mean value alone,

$$\hat{Y}(s) = x^{T}(s)\,\hat{\beta}$$

We can add a local component to the mean, based on knowledge of the covariance structure and the observed values at sampled points. This approach is known as kriging.

## Kriging

- An optimal spatial linear prediction method
- Matheron coined the term in honor of D. G. Krige, a South African mining engineer
- Optimal in the sense that it is unbiased and minimizes the mean squared prediction error
- Based primarily on the second-order properties of the process $Y$
- Prediction weights are based on the spatial dependence between observations, modeled by the variogram

## Kriging Models

- Simple kriging: the mean is known and constant throughout the study region
- Ordinary kriging: the mean is fixed but unknown and needs to be estimated
- Universal kriging: the mean varies and is an unknown linear combination of known functions

## Simple Kriging

- Assumes the mean is known and does not need to be estimated
- Subtract the mean from the sample observations to derive a set of residuals:

$$u_i = y_i - \mu(s_i)$$

- We assume these residuals, with zero mean, also have a known variance $\sigma^2$ and covariance function $C(\cdot)$
- Find an estimate $\hat{u}(s)$ for the value $u(s)$ of the random variable $U(s)$ at the location $s$, given observed values $u_i$ of the random variables $U(s_i)$ at the $n$ sample locations $s_i$
- With an estimate $\hat{u}(s)$, the predicted value of the random variable $Y(s)$ is derived by adding $\hat{u}(s)$ to the known trend at the point $s$

## Simple Kriging

Form the estimate as a weighted sum of the observed residuals:

$$\hat{U}(s) = \sum_{i=1}^{n} \lambda_i(s)\, U(s_i)$$

- $\lambda_i(s)$ are different weights applied to the different locations; $s_1, \ldots, s_n$ are the observed point locations
- $\hat{U}(s)$, as a sum of random variables, is a random variable itself
- $\hat{U}(s)$ has a mean value of zero for any set of weights, since by assumption the mean of each $U(s_i)$ is zero and the weights are constants

[Figure: the prediction location $s_0$ surrounded by observed point locations $s_1, \ldots, s_6$, each residual entering the sum with its own weight $\lambda_i$.]

## Simple Kriging

Assuming $\hat{U}(s)$ and $U(s)$ both have zero mean, the expected mean square error is:

$$E\big((\hat{U}(s) - U(s))^2\big) = \sigma^2 + \sum_{i=1}^{n}\sum_{j=1}^{n} \lambda_i(s)\,\lambda_j(s)\, C(s_i, s_j) - 2\sum_{i=1}^{n} \lambda_i(s)\, C(s, s_i)$$

$$= \sigma^2 + \lambda^{T}(s)\, C\, \lambda(s) - 2\,\lambda^{T}(s)\, c(s)$$

- $C$ is an $(n \times n)$ matrix of covariances between all possible pairs of the $n$ sample points
- $c(s)$ is an $(n \times 1)$ column vector of covariances between the prediction point $s$ and each of the $n$ sample points

Differentiating with respect to $\lambda(s)$ in order to minimize gives

$$\lambda(s) = C^{-1} c(s)$$

$$\hat{U}(s) = \lambda^{T}(s)\, U = c^{T}(s)\, C^{-1}\, U$$

The minimized expected mean square error corresponding to this choice of weights is

$$\sigma_e^2 = E\big((\hat{U}(s) - U(s))^2\big) = \sigma^2 - c^{T}(s)\, C^{-1}\, c(s)$$

This is the mean square prediction error, or kriging variance.

## Worked Example: Simple Kriging

Data, with known mean $\mu = 160.37$ and residuals $u_i = z_i - 160.37$:

| ID | X  | Y   | Z   | u      |
|----|----|-----|-----|--------|
| 1  | 29 | 82  | 122 | -38.37 |
| 2  | 73 | 105 | 183 | 22.63  |
| 3  | 82 | 65  | 148 | -12.37 |
| 4  | 91 | 52  | 160 | -0.37  |
| 5  | 22 | 22  | 176 | 15.63  |

Distances between the sample points:

| Distances | 1      | 2    | 3     | 4    | 5 |
|-----------|--------|------|-------|------|---|
| 1         | 0      |      |       |      |   |
| 2         | 49.2   | 0    |       |      |   |
| 3         | 53.9   | 40.3 | 0     |      |   |
| 4         | 65.2   | 55.9 | 15.8  | 0    |   |
| 5         | 60.795 | 97.1 | 72.09 | 69.6 | 0 |

Covariances between the sample points ($\sigma^2 = 20$ on the diagonal):

| Covariances | 1     | 2     | 3      | 4     | 5  |
|-------------|-------|-------|--------|-------|----|
| 1           | 20    |       |        |       |    |
| 2           | 4.571 | 20    |        |       |    |
| 3           | 3.970 | 5.970 | 20     |       |    |
| 4           | 2.828 | 3.739 | 12.450 | 20    |    |
| 5           | 3.228 | 1.086 | 2.300  | 2.479 | 20 |

Covariances between the prediction point and the sample points:

$$c(s) = (6.895,\; 5.032,\; 10.15,\; 8.083,\; 4.079)^{T}$$

The inverse covariance matrix:

C⁻¹ =
 0.054914  -0.01004   -0.00646   -0.00095   -0.00746
-0.01004    0.056751  -0.01508    0.000168   0.000252
-0.00646   -0.01508    0.087403  -0.05044   -0.00194
-0.00095    0.000168  -0.05044    0.082023  -0.00422
-0.00746    0.000252  -0.00194   -0.00422    0.051936

The weights are

$$\lambda^{*}(s) = C^{-1} c(s) = (0.225,\; 0.066,\; 0.351,\; 0.128,\; 0.107)^{T}$$

Their sum is 0.877; simple kriging weights need not sum to 1.

The predicted residual is

$$\hat{u}(s) = \sum_{i=1}^{n} \lambda_i(s)\, u_i = 0.225(-38.37) + 0.066(22.63) + 0.351(-12.37) + 0.128(-0.37) + 0.107(15.63) = -9.86$$

Adding back the mean gives the prediction

$$\hat{y}(s) = 160.37 - 9.86 \approx 150.5$$

The kriging variance is

$$\sigma_e^2 = \sigma^2 - c^{T}(s)\, C^{-1}\, c(s) = 20.0 - 6.92 = 13.08$$

The 95 percent confidence interval is $\hat{y}(s) \pm 1.96\,\sigma_e = 150.5 \pm 7.09$.
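The worked example above can be verified with a few lines of linear algebra. This sketch simply reuses the tabulated covariances and residuals from the slides; the only computation is solving for the weights and assembling the prediction.

```python
import numpy as np

# Reproduces the simple-kriging worked example (numbers from the slides).
C = np.array([
    [20.000,  4.571,  3.970,  2.828,  3.228],
    [ 4.571, 20.000,  5.970,  3.739,  1.086],
    [ 3.970,  5.970, 20.000, 12.450,  2.300],
    [ 2.828,  3.739, 12.450, 20.000,  2.479],
    [ 3.228,  1.086,  2.300,  2.479, 20.000],
])
c = np.array([6.895, 5.032, 10.15, 8.083, 4.079])   # covariances to prediction point
u = np.array([-38.37, 22.63, -12.37, -0.37, 15.63]) # residuals z_i - 160.37

lam = np.linalg.solve(C, c)          # weights lambda = C^-1 c
u_hat = lam @ u                      # local residual estimate, about -9.86
y_hat = 160.37 + u_hat               # prediction, about 150.5
var_e = 20.0 - c @ lam               # kriging variance, about 13.08
print(lam.round(3), round(y_hat, 1), round(var_e, 2))
```

Note that the weights sum to about 0.877, not 1 — simple kriging imposes no sum-to-one constraint because the mean is treated as known.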

[Figure: map of the prediction point $s_0$ and the sample locations $s_1, \ldots, s_5$, annotated with the simple kriging weights 0.225, 0.066, 0.351, 0.128, 0.107 and the resulting prediction 150.5.]

## Ordinary Kriging

- Optimal local spatial prediction
- First-order effects are implicitly estimated as part of the prediction process
- Prediction of $y$ occurs in one step, using a weighted linear combination of the observed values $y_i$:

$$\hat{Y}(s) = \sum_{i=1}^{n} \lambda_i(s)\, Y(s_i)$$

- In this case we choose the weights so that the mean value of $\hat{Y}(s)$ is constrained to equal the mean $\mu$ of $Y(s)$

## Ordinary Kriging

The mean of $\hat{Y}(s)$ will also be $\mu$ as long as the weights sum to one. With this constraint we minimize the mean squared error between values of $Y(s)$ and $\hat{Y}(s)$. The expected mean square error is

$$E\big((Y(s) - \hat{Y}(s))^2\big) = \lambda^{T}(s)\, C\, \lambda(s) + \sigma^2 - 2\,\lambda^{T}(s)\, c(s)$$

where $C$ is the $(n \times n)$ matrix of covariances $C(s_i, s_j)$ between all possible pairs of the $n$ sample sites, and $c(s)$ is the $(n \times 1)$ column vector of covariances $C(s, s_i)$ between the prediction point $s$ and each of the $n$ sample sites.

To carry out the optimization with the constraint, use the method of Lagrange multipliers: introduce the multiplier $v(s)$ and minimize

$$\lambda^{T}(s)\, C\, \lambda(s) + \sigma^2 - 2\,\lambda^{T}(s)\, c(s) + 2\big(\lambda^{T}(s)\,\mathbf{1} - 1\big)\, v(s)$$

## Ordinary Kriging

Differentiating and setting the result to zero gives the kriging equations

$$C\,\lambda(s) + \mathbf{1}\, v(s) = c(s)$$

$$\lambda^{T}(s)\,\mathbf{1} = 1$$

These equations can be expressed with an augmented matrix $C_+$ and augmented vectors $\lambda_+$ and $c_+$:

$$C_+ = \begin{pmatrix} C(s_1,s_1) & \cdots & C(s_1,s_n) & 1 \\ \vdots & \ddots & \vdots & \vdots \\ C(s_n,s_1) & \cdots & C(s_n,s_n) & 1 \\ 1 & \cdots & 1 & 0 \end{pmatrix}, \qquad \lambda_+(s) = \begin{pmatrix} \lambda_1(s) \\ \vdots \\ \lambda_n(s) \\ v(s) \end{pmatrix}, \qquad c_+(s) = \begin{pmatrix} C(s,s_1) \\ \vdots \\ C(s,s_n) \\ 1 \end{pmatrix}$$

with the solution

$$\lambda_+(s) = C_+^{-1}\, c_+(s)$$

## Ordinary Kriging

To obtain the prediction $\hat{y}(s)$:

- Solve the kriging equations for $\lambda_+(s)$
- Extract from this the vector of weights $\lambda(s)$
- Then compute

$$\hat{y}(s) = \lambda^{T}(s)\, y$$

where $y$ is the original set of observations $y_i$.
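The recipe above translates directly into code: assemble the augmented matrix, solve one linear system, and take a weighted sum. This is a hedged sketch — the exponential covariogram, its sill and range, and the sample data are all assumptions chosen for illustration; any valid covariance model could be substituted.

```python
import numpy as np

# Ordinary kriging sketch: build C+, solve C+ lambda+ = c+, then predict.
# The exponential covariogram and all data values are illustrative assumptions.

def cov(h, sill=10.0, a=25.0):
    return sill * np.exp(-h / a)           # assumed covariogram C(h)

rng = np.random.default_rng(2)
coords = rng.uniform(0, 50, size=(6, 2))   # sample locations
y = rng.uniform(10, 20, size=6)            # observed values
s0 = np.array([25.0, 25.0])                # prediction location
n = len(y)

h = np.linalg.norm(coords[:, None] - coords[None, :], axis=2)
C_plus = np.ones((n + 1, n + 1))
C_plus[:n, :n] = cov(h)
C_plus[n, n] = 0.0                         # row/column enforcing sum(lambda) = 1

c_plus = np.ones(n + 1)
c_plus[:n] = cov(np.linalg.norm(coords - s0, axis=1))

lam_plus = np.linalg.solve(C_plus, c_plus)
lam, v = lam_plus[:n], lam_plus[n]         # weights and Lagrange multiplier
y_hat = lam @ y                            # ordinary kriging prediction
var_e = cov(0.0) - lam_plus @ c_plus       # kriging variance sigma^2 - lambda+' c+
```

By construction the extracted weights `lam` sum to one, which is exactly the unbiasedness constraint the augmented row enforces.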

## Worked Example: Ordinary Kriging

Source: http://www.msu.edu/~ashton/466/2002_notes/3-18-02/ok_ill.html

Data points (the fifth row is recovered from the distance matrix and the final estimate below):

| ID | x | y | z |
|----|---|---|---|
| 1  | 2 | 4 | 3 |
| 2  | 4 | 7 | 4 |
| 3  | 8 | 9 | 2 |
| 4  | 7 | 4 | 4 |
| 5  | 6 | 4 | 6 |

The distance matrix for the 5 data points is:

    [1]   [2]   [3]   [4]   [5]
[1] 0.000 3.605 7.810 5.000 4.000
[2] 3.605 0.000 4.472 4.243 3.605
[3] 7.810 4.472 0.000 5.099 5.385
[4] 5.000 4.243 5.099 0.000 1.000
[5] 4.000 3.605 5.385 1.000 0.000

The distance vector between the data points and the unknown point is:

[1] 3.162 2.236 5.000 2.236 1.414

The vector of covariances from the data points to the unknown point, with the constraint entry 1 in position 6, is:

4.061023 5.026350 2.343750 5.026350 5.919616 1.000000

The vector of weights $\lambda$, with the Lagrange multiplier in position 6, is:

[1]  0.17289193
[2]  0.26523729
[3]  0.05887157
[4]  0.16986833
[5]  0.33313088
[6] -0.13471033

## Worked Example: Ordinary Kriging

The variogram model is a spherical model with nugget = 2.5, sill = 7.5, range = 10.0:

$$\gamma(h) = \begin{cases} 0 & h = 0 \\[4pt] 2.5 + (7.5 - 2.5)\left(\dfrac{3h}{20} - \dfrac{h^3}{2000}\right) & 0 < h \le 10 \\[4pt] 7.5 & \text{otherwise} \end{cases}$$

The corresponding covariogram is obtained by subtracting from the sill:

$$C(h) = 7.5 - \left[2.5 + (7.5 - 2.5)\left(\frac{3h}{20} - \frac{h^3}{2000}\right)\right]$$

The covariance matrix for the data points under the given model (row and column 6 hold the unbiasedness constraint):

         [1]       [2]       [3]       [4]       [5] [6]
[1] 7.5000000 3.6200650 0.5001733 2.3437500 3.2400000   1
[2] 3.6200650 7.5000000 2.8043796 3.0130760 3.6200650   1
[3] 0.5001733 2.8043796 7.5000000 2.2607737 2.0274579   1
[4] 2.3437500 3.0130760 2.2607737 7.5000000 6.3787500   1
[5] 3.2400000 3.6200650 2.0274579 6.3787500 7.5000000   1
[6] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000   0
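As a cross-check, the tabulated entries can be reproduced with the short function below. One caveat, offered as an observation about the numbers rather than a statement from the slides: the tabulated values match $C(h) = 7.5\,(1 - 3h/2a + h^3/2a^3)$, i.e. the full sill scaled by the spherical structure, rather than the sill-minus-variogram formula stated above.

```python
import numpy as np

# Covariogram reproducing the tabulated matrix entries. Note: these numbers
# scale the full sill (7.5) by the spherical structure; the nugget of 2.5
# is not subtracted, which differs from C(h) = sill - gamma(h) as stated.
def spherical_cov(h, sill=7.5, a=10.0):
    h = np.asarray(h, dtype=float)
    structure = 1.0 - (3.0 * h / (2.0 * a) - h**3 / (2.0 * a**3))
    return np.where(h <= a, sill * structure, 0.0)

# e.g. distance 1.0 (points 4 and 5) gives 6.37875,
# and distance 4.0 (points 1 and 5) gives 3.24
print(spherical_cov([1.0, 4.0]))
```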

## Worked Example: Ordinary Kriging

Multiply the weight for each data point by the attribute value of that point to determine the ordinary kriging estimate:

$$\hat{y}(s) = \sum_{i} \lambda_i\, z_i = 4.375627$$

Calculate the variance as the transpose of $c$ times the inverse of $C$ times $c$, subtracted from the sill:

$$\sigma_e^2 = 5.135453$$

The kriging standard error is the square root of this: 2.266154.

## Ordinary versus Simple Kriging

- Ordinary kriging is primarily a local neighborhood estimator
- There is no need to estimate a first-order trend; instead, the mean is estimated from nearby data only
- Estimates are not as sensitive to non-stationarity (though the covariance model may be affected, it is not as strongly affected at short ranges as at long ones)

## Universal Kriging

Includes a first-order trend component:

$$Y(s) = \beta_0 + \beta_1 x_1(s) + \beta_2 x_2(s) + \cdots + \beta_p x_p(s) + \varepsilon(s)$$

As before, the prediction is formed in one step as a weighted linear combination:

$$\hat{Y}(s) = \sum_{i=1}^{n} \lambda_i(s)\, Y(s_i)$$

As for ordinary kriging, the weights are chosen to minimize the mean squared error

$$\mathrm{MSE} = E\big(|Y(s) - \hat{Y}(s)|^2\big)$$

subject to the constraint that $\hat{Y}(s)$ is unbiased for $Y(s)$, that is

$$E\big(\hat{Y}(s)\big) = E\big(Y(s)\big) \quad \text{for all } \beta_0, \beta_1, \ldots, \beta_p$$

$\hat{Y}(s)$ is unbiased if and only if

$$\sum_{i=1}^{n} \lambda_i\, x_k(s_i) = x_k(s_0), \quad k = 1, 2, \ldots, p \qquad \text{and} \qquad \sum_{i=1}^{n} \lambda_i = 1$$

To obtain the weights that minimize the mean square error subject to these constraints, again use the method of Lagrange multipliers. Taking the derivatives with respect to $\lambda$ and $v$, setting the expressions to zero, and rearranging terms, we get the kriging equations (using the variogram):

$$\sum_{i=1}^{n} \lambda_i\, \gamma(s_i - s_k) + v_0 + \sum_{j=1}^{p} v_j\, x_j(s_k) = \gamma(s_k - s_0), \quad k = 1, 2, \ldots, n$$

$$\sum_{i=1}^{n} \lambda_i\, x_k(s_i) = x_k(s_0), \quad k = 1, 2, \ldots, p$$

$$\sum_{i=1}^{n} \lambda_i = 1$$
## Universal Kriging

In matrix notation, the variogram-based kriging system is

$$\begin{pmatrix}
0 & \gamma(s_1,s_2) & \cdots & \gamma(s_1,s_n) & 1 & x_1(s_1) & \cdots & x_p(s_1) \\
\gamma(s_2,s_1) & 0 & \cdots & \gamma(s_2,s_n) & 1 & x_1(s_2) & \cdots & x_p(s_2) \\
\vdots & & \ddots & \vdots & \vdots & \vdots & & \vdots \\
\gamma(s_n,s_1) & \gamma(s_n,s_2) & \cdots & 0 & 1 & x_1(s_n) & \cdots & x_p(s_n) \\
1 & 1 & \cdots & 1 & 0 & 0 & \cdots & 0 \\
x_1(s_1) & x_1(s_2) & \cdots & x_1(s_n) & 0 & 0 & \cdots & 0 \\
\vdots & & & \vdots & \vdots & \vdots & & \vdots \\
x_p(s_1) & x_p(s_2) & \cdots & x_p(s_n) & 0 & 0 & \cdots & 0
\end{pmatrix}
\begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_n \\ v_0 \\ v_1 \\ \vdots \\ v_p \end{pmatrix}
=
\begin{pmatrix} \gamma(s_1 - s_0) \\ \gamma(s_2 - s_0) \\ \vdots \\ \gamma(s_n - s_0) \\ 1 \\ x_1(s_0) \\ \vdots \\ x_p(s_0) \end{pmatrix}$$

## Universal Kriging

The same system can be written with covariances, using the augmented matrix $C_+$ and augmented vectors $\lambda_+$ and $c_+$:

$$C_+ = \begin{pmatrix}
C(s_1,s_1) & \cdots & C(s_1,s_n) & 1 & x_1(s_1) & \cdots & x_p(s_1) \\
\vdots & \ddots & \vdots & \vdots & \vdots & & \vdots \\
C(s_n,s_1) & \cdots & C(s_n,s_n) & 1 & x_1(s_n) & \cdots & x_p(s_n) \\
1 & \cdots & 1 & 0 & 0 & \cdots & 0 \\
x_1(s_1) & \cdots & x_1(s_n) & 0 & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
x_p(s_1) & \cdots & x_p(s_n) & 0 & 0 & \cdots & 0
\end{pmatrix}, \qquad
\lambda_+(s) = \begin{pmatrix} \lambda_1(s) \\ \vdots \\ \lambda_n(s) \\ v_0(s) \\ v_1(s) \\ \vdots \\ v_p(s) \end{pmatrix}, \qquad
c_+(s) = \begin{pmatrix} C(s,s_1) \\ \vdots \\ C(s,s_n) \\ 1 \\ x_1(s) \\ \vdots \\ x_p(s) \end{pmatrix}$$

with the solution

$$\lambda_+(s) = C_+^{-1}\, c_+(s)$$

The prediction is

$$\hat{y}(s) = \lambda^{T}(s)\, y$$

and the minimized mean square error (kriging variance) is

$$\sigma_e^2 = \sigma^2 - c_+^{T}(s)\, C_+^{-1}\, c_+(s)$$
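The augmented universal kriging system can be assembled the same way as the ordinary kriging one, with extra rows and columns for the drift functions. This is a hedged sketch under assumed inputs: the exponential covariogram and the synthetic data are illustrative. It also exercises a useful consequence of the drift constraints: when the data are exactly a linear combination of the drift functions, the predictor reproduces the trend exactly.

```python
import numpy as np

# Universal kriging sketch with a linear drift (1, x1, x2), built directly
# from the augmented covariance system. Covariogram and data are assumptions.

def cov(h, sill=4.0, a=15.0):
    return sill * np.exp(-h / a)

rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(8, 2))
X = np.column_stack([np.ones(8), coords])     # drift functions evaluated at s_i
beta = np.array([2.0, 3.0, -1.0])
y = X @ beta                                  # data that are exactly the trend

s0 = np.array([5.0, 5.0])                     # prediction location
x0 = np.array([1.0, s0[0], s0[1]])            # drift functions at s0

n, m = X.shape
h = np.linalg.norm(coords[:, None] - coords[None, :], axis=2)
A = np.zeros((n + m, n + m))
A[:n, :n] = cov(h)                            # covariance block
A[:n, n:] = X                                 # drift / constraint columns
A[n:, :n] = X.T                               # constraint rows
b = np.concatenate([cov(np.linalg.norm(coords - s0, axis=1)), x0])

sol = np.linalg.solve(A, b)
lam = sol[:n]                                 # weights (multipliers in sol[n:])
y_hat = lam @ y                               # reproduces the trend value here
```

Because the constraint rows force the weights to reproduce every drift function at $s_0$, `y_hat` equals $x_0^T \beta = 12$ for this purely deterministic trend, and the weights sum to one via the constant drift term.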

## Universal Kriging

- It may make more sense to estimate the trend explicitly
- The trend needs to be estimated anyway, to derive the residuals for variogram modeling, since it is only safe to estimate the variogram model from y when the mean is assumed constant
- It does not make sense to use universal kriging for local neighborhoods