
# Generalized Least Squares

Modeling Spatially Continuous Data
Lecture 12
October 17, 2002
Bailey and Gatrell, Chapter 5

- First fit the trend surface model to the observed data using ordinary least squares
- Then estimate a variogram model using the residuals from the ordinary least squares fit
- Then compute a covariogram from the variogram model
- From the covariogram, estimate the covariance matrix and refit the model using generalized least squares

The validity of the final model depends on:

- an appropriate trend surface model choice
- an appropriate variogram model choice
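The four-step procedure above can be sketched in code. This is a minimal illustration on synthetic data; the variogram-fitting step is schematic (an exponential covariogram with a fixed range is assumed rather than fitted to an empirical variogram):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: linear trend surface over n locations plus spatially correlated noise
n = 50
coords = rng.uniform(0, 10, size=(n, 2))
X = np.column_stack([np.ones(n), coords])          # trend surface design matrix
beta_true = np.array([2.0, 0.5, -0.3])
D = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
Sigma_true = np.exp(-D / 3.0)                      # exponential covariogram
y = X @ beta_true + np.linalg.cholesky(Sigma_true) @ rng.standard_normal(n)

# Step 1: fit the trend surface by ordinary least squares
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_ols

# Steps 2-3 (schematic): a covariogram model for the residuals.
# Here we simply assume an exponential model with range 3 and
# sill taken from the residual variance, instead of fitting it.
s2 = resid.var()
Sigma_hat = s2 * np.exp(-D / 3.0)

# Step 4: refit the trend by generalized least squares,
# beta_gls = (X' S^-1 X)^-1 X' S^-1 y
Si = np.linalg.inv(Sigma_hat)
beta_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)
print(beta_gls.round(2))
```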

## Objectives

- Understanding and describing the nature of the spatial variation in an attribute: knowledge of the trend and covariance structure is sufficient
- Prediction of the attribute values at locations where the attribute has not been sampled: how to use the derived model for prediction purposes

We want to predict a value for the attribute at an unsampled location s.

## Developing Models for Prediction

Assume we have the model

$$Y(s) = x^T(s)\beta + U(s)$$

with $U(s)$ a zero-mean process with covariance function $C(\cdot)$, so that the trend is $\mu(s) = x^T(s)\beta$. From this model we can do better than a prediction based on the mean value alone: we can add a local component to the mean, based on knowledge of the covariance structure and the observed values at the sampled points. This approach is known as kriging.

## Kriging

- An optimal spatial linear prediction method: optimal in the sense that it is unbiased and minimizes the mean squared prediction error
- Based primarily on the second-order properties of the process Y
- Prediction weights are based on the spatial dependence between observations, modeled by the variogram
- Matheron coined the term in honor of D. G. Krige, a South African mining engineer

## Kriging Models

- Simple kriging: the mean is known and constant throughout the study region
- Ordinary kriging: the mean is fixed but unknown and needs to be estimated
- Universal kriging: the mean varies and is an unknown linear combination of known functions

## Simple Kriging

Assumes the mean is known and does not need to be estimated.

- Subtract the mean from the sample observations to derive a set of residuals $u_i = y_i - \mu(s_i)$
- We assume these zero-mean residuals also have a known variance $\sigma^2$ and covariance function $C(\cdot)$
- Find an estimate $\hat{u}(s)$ for the value $u(s)$ of the random variable $U(s)$ at the location s, given the observed values $u_i$ of the random variables $U(s_i)$ at the n sample locations $s_i$
- With an estimate $\hat{u}(s)$, the predicted value of the random variable $Y(s)$ is derived by adding $\hat{u}(s)$ to the known trend at the point s

## Simple Kriging

Take the estimator to be a weighted linear combination of the observed residuals:

$$\hat{U}(s) = \sum_{i=1}^{n} \lambda_i(s)\, U(s_i)$$

- $\lambda_i(s)$ are different weights applied to the different sampled locations, where $s_1, \ldots, s_n$ are the observed point locations
- $\hat{U}(s)$, as a sum of random variables, is a random variable itself
- $\hat{U}(s)$ has a mean value of zero for any set of weights, since by assumption the mean of $U(s)$ is zero and the weights are constants

[Figure: prediction location $s_0$ surrounded by observed point locations $s_1, \ldots, s_6$, each contributing a weighted residual.]

## Simple Kriging

Assuming $\hat{U}(s)$ and $U(s)$ both have zero mean, the expected mean square error is

$$E\big((\hat{U}(s) - U(s))^2\big) = \sum_{i=1}^{n}\sum_{j=1}^{n} \lambda_i(s)\lambda_j(s)\, C(s_i, s_j) + \sigma^2 - 2\sum_{i=1}^{n} \lambda_i(s)\, C(s, s_i)$$

$$= \lambda^T(s)\, C\, \lambda(s) + \sigma^2 - 2\lambda^T(s)\, c(s)$$

where C is the (n × n) matrix of covariances between all possible pairs of the n sample points, and c(s) is the (n × 1) column vector of covariances between the prediction point s and each of the n sample points.

Differentiating with respect to $\lambda(s)$ in order to minimize gives

$$\lambda(s) = C^{-1} c(s), \qquad \hat{U}(s) = \lambda^T(s)\, u = c^T(s)\, C^{-1} u$$

The minimized expected mean square error corresponding to this choice of weights is

$$\sigma_e^2 = E\big((\hat{U}(s) - U(s))^2\big) = \sigma^2 - c^T(s)\, C^{-1} c(s)$$

the mean square prediction error, or kriging variance.
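These formulas translate directly into a few lines of linear algebra. A minimal sketch (the function name and the two-point numbers below are illustrative, not from the lecture):

```python
import numpy as np

def simple_kriging(C, c, u, mu, sigma2):
    """Simple kriging prediction at one location.

    C      -- (n, n) covariance matrix between the sample points
    c      -- (n,) covariances between the prediction point and the samples
    u      -- (n,) residuals y_i - mu at the sample points
    mu     -- known mean (trend) at the prediction point
    sigma2 -- process variance
    """
    lam = np.linalg.solve(C, c)     # lambda(s) = C^{-1} c(s)
    u_hat = lam @ u                 # predicted residual
    var_e = sigma2 - c @ lam        # sigma^2 - c(s)' C^{-1} c(s)
    return mu + u_hat, var_e

# Toy two-point illustration: symmetric geometry gives equal weights 1/3,
# so the opposite-signed residuals cancel and the prediction is the mean.
C = np.array([[1.0, 0.5], [0.5, 1.0]])
c = np.array([0.5, 0.5])
y_hat, var_e = simple_kriging(C, c, np.array([1.0, -1.0]), mu=10.0, sigma2=1.0)
print(y_hat, var_e)   # 10.0 and 2/3
```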

## Worked Example: Simple Kriging

Sample data (coordinates Y, Z; attribute y; known trend $\mu = 160.37$; $\sigma^2 = 20$), with residuals $u = y - 160.37$:

| ID | Y  | Z   | y   | u      |
|----|----|-----|-----|--------|
| 1  | 29 | 82  | 122 | -38.37 |
| 2  | 73 | 105 | 183 | 22.63  |
| 3  | 82 | 65  | 148 | -12.37 |
| 4  | 91 | 52  | 160 | -0.37  |
| 5  | 22 | 22  | 176 | 15.63  |

Distances between the sample points (lower triangle):

|   | 1      | 2    | 3     | 4    | 5 |
|---|--------|------|-------|------|---|
| 1 | 0      |      |       |      |   |
| 2 | 49.2   | 0    |       |      |   |
| 3 | 53.9   | 40.3 | 0     |      |   |
| 4 | 65.2   | 55.9 | 15.8  | 0    |   |
| 5 | 60.795 | 97.1 | 72.09 | 69.6 | 0 |

Covariances C between the sample points (lower triangle, diagonal $\sigma^2 = 20$):

|   | 1     | 2     | 3      | 4     | 5  |
|---|-------|-------|--------|-------|----|
| 1 | 20    |       |        |       |    |
| 2 | 4.571 | 20    |        |       |    |
| 3 | 3.970 | 5.970 | 20     |       |    |
| 4 | 2.828 | 3.739 | 12.450 | 20    |    |
| 5 | 3.228 | 1.086 | 2.300  | 2.479 | 20 |

Covariances between the prediction point and the sample points:

$$c(s) = (6.895,\ 5.032,\ 10.15,\ 8.083,\ 4.079)^T$$

The inverse covariance matrix:

C-1 =
0.054914 -0.01004 -0.00646 -0.00095 -0.00746
-0.01004 0.056751 -0.01508 0.000168 0.000252
-0.00646 -0.01508 0.087403 -0.05044 -0.00194
-0.00095 0.000168 -0.05044 0.082023 -0.00422
-0.00746 0.000252 -0.00194 -0.00422 0.051936

The weights (note that simple kriging weights need not sum to 1):

$$\lambda(s) = C^{-1} c(s) = (0.225,\ 0.066,\ 0.351,\ 0.128,\ 0.107)^T, \quad \text{sum} = 0.877$$

The predicted residual:

$$\hat{u}(s) = 0.225(-38.37) + 0.066(22.63) + 0.351(-12.37) + 0.128(-0.37) + 0.107(15.63) \approx -9.86$$

The predicted value:

$$\hat{y}(s) = 160.37 - 9.86 \approx 150.5$$

The kriging variance:

$$\sigma_e^2 = \sigma^2 - c^T(s)\, C^{-1} c(s) = 20.0 - 6.92 = 13.08$$

The 95 percent confidence interval is $\hat{y}(s) \pm 1.96\,\sigma_e$, i.e. $150.5 \pm 7.09$.
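The worked example can be checked numerically. The sketch below solves $\lambda = C^{-1}c$ directly from the tabulated covariances rather than using the printed inverse:

```python
import numpy as np

# Covariance matrix between the 5 sample points (diagonal = sigma^2 = 20)
C = np.array([
    [20.0,   4.571,  3.970,  2.828, 3.228],
    [ 4.571, 20.0,   5.970,  3.739, 1.086],
    [ 3.970,  5.970, 20.0,  12.450, 2.300],
    [ 2.828,  3.739, 12.450, 20.0,  2.479],
    [ 3.228,  1.086,  2.300,  2.479, 20.0],
])
c = np.array([6.895, 5.032, 10.15, 8.083, 4.079])   # covariances to s
u = np.array([-38.37, 22.63, -12.37, -0.37, 15.63]) # residuals y_i - 160.37
mu = 160.37                                          # known trend

lam = np.linalg.solve(C, c)      # simple kriging weights
y_hat = mu + lam @ u             # prediction
var_e = 20.0 - c @ lam           # kriging variance
print(lam.round(3), round(y_hat, 1), round(var_e, 2))
```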

[Figure: the simple kriging weights $\lambda_i(s)$ plotted at the sample locations $s_1, \ldots, s_5$ around the prediction point $s_0$, where the predicted value is 150.5.]

## Ordinary Kriging

An optimal local spatial prediction method:

- First-order effects are implicitly estimated as part of the prediction process
- Prediction of y(s) occurs in one step, using a weighted linear combination of the observed values $y_i$:

$$\hat{Y}(s) = \sum_{i=1}^{n} \lambda_i(s)\, Y(s_i)$$

In this case we choose the weights so that the mean value of $\hat{Y}(s)$ is constrained to be the mean $\mu$ of $Y(s)$: the mean of $\hat{Y}(s)$ will also be $\mu$ as long as the weights sum to one.

## Ordinary Kriging

With this constraint we minimize the mean squared error between values of $Y(s)$ and $\hat{Y}(s)$. The expected mean square error is

$$E\big((\hat{Y}(s) - Y(s))^2\big) = \lambda^T(s)\, C\, \lambda(s) + \sigma^2 - 2\lambda^T(s)\, c(s)$$

where C is the (n × n) matrix of covariances $C(s_i, s_j)$ between all possible pairs of the n sample sites, and c(s) is the (n × 1) column vector of covariances $C(s, s_i)$ between the prediction point s and each of the n sample sites.

To carry out the optimization subject to the constraint, use the method of Lagrange multipliers with multiplier $v(s)$ and minimize

$$\lambda^T(s)\, C\, \lambda(s) + \sigma^2 - 2\lambda^T(s)\, c(s) + 2(\lambda^T(s)\mathbf{1} - 1)\, v(s)$$

## Ordinary Kriging

Differentiating with respect to $\lambda(s)$ and $v(s)$ and setting the results to zero gives the ordinary kriging equations:

$$C\lambda(s) + \mathbf{1}\, v(s) = c(s), \qquad \mathbf{1}^T \lambda(s) = 1$$

These equations are expressed with a modified matrix $C_+$ and vectors $\lambda_+$ and $c_+$:

$$C_+ = \begin{pmatrix} C(s_1, s_1) & \cdots & C(s_1, s_n) & 1 \\ \vdots & \ddots & \vdots & \vdots \\ C(s_n, s_1) & \cdots & C(s_n, s_n) & 1 \\ 1 & \cdots & 1 & 0 \end{pmatrix}, \quad \lambda_+(s) = \begin{pmatrix} \lambda_1(s) \\ \vdots \\ \lambda_n(s) \\ v(s) \end{pmatrix}, \quad c_+(s) = \begin{pmatrix} C(s, s_1) \\ \vdots \\ C(s, s_n) \\ 1 \end{pmatrix}$$

with solution

$$\lambda_+(s) = C_+^{-1} c_+(s)$$
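The augmented system can be solved with a few lines of linear algebra. A sketch (the helper name and the two-point numbers below are illustrative, not from the lecture):

```python
import numpy as np

def ordinary_kriging(C, c, y):
    """Ordinary kriging prediction at one location via the augmented
    system [[C, 1], [1', 0]] [lambda; v] = [c; 1]."""
    n = len(y)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = C
    A[n, n] = 0.0
    b = np.append(c, 1.0)
    sol = np.linalg.solve(A, b)
    lam, v = sol[:n], sol[n]
    return lam @ y, lam, v

# Toy two-point illustration: symmetric geometry gives equal weights,
# so the prediction is the average of the two observations.
pred, lam, v = ordinary_kriging(np.array([[2.0, 1.0], [1.0, 2.0]]),
                                np.array([1.0, 1.0]),
                                np.array([3.0, 5.0]))
print(pred, lam, v)   # 4.0, [0.5, 0.5], -0.5
```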

## Ordinary Kriging

To obtain the prediction $\hat{y}(s)$: solve the equation $\lambda_+(s) = C_+^{-1} c_+(s)$, extract from this the vector $\lambda(s)$, then

$$\hat{y}(s) = \lambda^T(s)\, y$$

where $y = (y_1, \ldots, y_n)^T$ is the original set of observations.

Source: http://www.msu.edu/~ashton/466/2002_notes/3-18-02/ok_ill.html

## Worked Example: Ordinary Kriging

The data points (coordinates x, y and attribute z):

| ID | x | y | z |
|----|---|---|---|
| 1  | 2 | 4 | 3 |
| 2  | 4 | 7 | 4 |
| 3  | 8 | 9 | 2 |
| 4  | 7 | 4 | 4 |
| 5  | 6 | 4 | 6 |

The distance matrix for the 5 data points is:

0.000 3.605 7.810 5.000 4.000
3.605 0.000 4.472 4.243 3.605
7.810 4.472 0.000 5.099 5.385
5.000 4.243 5.099 0.000 1.000
4.000 3.605 5.385 1.000 0.000

The distance vector between the data points and the unknown point is:

3.162 2.236 5.000 2.236 1.414

The variogram model is a spherical model with nugget = 2.5, sill = 7.5, range = 10.0:

$$\gamma(h) = \begin{cases} 0 & h = 0 \\ 2.5 + (7.5 - 2.5)\left(\dfrac{3h}{20} - \dfrac{h^3}{2000}\right) & 0 < h \le 10 \\ 7.5 & h > 10 \end{cases}$$

The covariances in this example are computed from the full sill as

$$C(h) = 7.5 - 7.5\left(\frac{3h}{20} - \frac{h^3}{2000}\right), \quad 0 \le h \le 10$$

and $C(h) = 0$ for $h > 10$ (note that the worked example folds the nugget into the structured component, so the sill 7.5 multiplies the spherical term).

The covariance matrix for the data points under this model, augmented with the unbiasedness constraint, is

C+ =
7.500000 3.620065 0.500173 2.343750 3.240000 1
3.620065 7.500000 2.804380 3.013076 3.620065 1
0.500173 2.804380 7.500000 2.260774 2.027458 1
2.343750 3.013076 2.260774 7.500000 6.378750 1
3.240000 3.620065 2.027458 6.378750 7.500000 1
1.000000 1.000000 1.000000 1.000000 1.000000 0

The vector of covariances from the data points to the unknown point, augmented with a 1, is:

$$c_+(s) = (4.061023,\ 5.026350,\ 2.343750,\ 5.026350,\ 5.919616,\ 1.000000)^T$$

Solving $\lambda_+(s) = C_+^{-1} c_+(s)$ gives the vector of weights $\lambda$, with the Lagrange multiplier in the last entry:

0.17289193
0.26523729
0.05887157
0.16986833
0.33313088
-0.13471033

Then multiply the weight for each data point by the attribute value of that point to determine the ordinary kriging estimate:

$$\hat{y}(s) = 0.172892(3) + 0.265237(4) + 0.058872(2) + 0.169868(4) + 0.333131(6) = 4.375627$$

Calculate the variance as the transpose of c times the inverse of C times c, subtracted from the sill:

$$\sigma_e^2 = 5.135453$$

The kriging standard error is the square root of this: 2.266154.

## Ordinary versus Simple Kriging

Ordinary kriging is primarily a local neighborhood estimator:

- There is no need to estimate a first-order trend; instead, the mean is estimated from nearby data only.
- Estimates are not as sensitive to non-stationarity (though the covariance model may be affected, it is not as strongly affected at short ranges as at long ones).

## Universal Kriging

Includes a first-order trend component:

$$Y(s) = \beta_0 + \beta_1 x_1(s) + \beta_2 x_2(s) + \cdots + \beta_p x_p(s) + \varepsilon(s)$$

The prediction for y is formed in one step:

$$\hat{Y}(s) = \sum_{i=1}^{n} \lambda_i(s)\, Y(s_i)$$

As for ordinary kriging, the weights are chosen to minimize the mean squared error

$$MSE = E\big(|\hat{Y}(s) - Y(s)|^2\big)$$

subject to the constraint that $\hat{Y}(s)$ is unbiased for $Y(s)$, that is, $E\{\hat{Y}(s)\} = E\{Y(s)\}$ for all $\beta_0, \beta_1, \ldots, \beta_p$. $\hat{Y}(s)$ is unbiased if and only if $\sum_{i=1}^{n} \lambda_i = 1$ and $\sum_{i=1}^{n} \lambda_i x_j(s_i) = x_j(s)$ for $j = 1, \ldots, p$.

To obtain the weights that minimize the mean square error subject to these constraints, again use the method of Lagrange multipliers.

## Universal Kriging

Taking the derivatives with respect to $\lambda$ and $v$, setting the expressions to zero, and rearranging terms, we get the kriging equations (using the variogram):

$$\sum_{i=1}^{n} \lambda_i\, \gamma(s_i - s_k) + v_0 + \sum_{j=1}^{p} v_j\, x_j(s_k) = \gamma(s_k - s_0), \quad k = 1, 2, \ldots, n$$

$$\sum_{i=1}^{n} \lambda_i = 1$$

$$\sum_{i=1}^{n} \lambda_i\, x_k(s_i) = x_k(s_0), \quad k = 1, 2, \ldots, p$$

## Universal Kriging

In matrix notation, using the variogram form, the kriging equations are $\Gamma_+ \lambda_+(s) = \gamma_+(s)$ with

$$\Gamma_+ = \begin{pmatrix}
0 & \gamma(s_1, s_2) & \cdots & \gamma(s_1, s_n) & 1 & x_1(s_1) & \cdots & x_p(s_1) \\
\gamma(s_2, s_1) & 0 & \cdots & \gamma(s_2, s_n) & 1 & x_1(s_2) & \cdots & x_p(s_2) \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & & \vdots \\
\gamma(s_n, s_1) & \gamma(s_n, s_2) & \cdots & 0 & 1 & x_1(s_n) & \cdots & x_p(s_n) \\
1 & 1 & \cdots & 1 & 0 & 0 & \cdots & 0 \\
x_1(s_1) & x_1(s_2) & \cdots & x_1(s_n) & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
x_p(s_1) & x_p(s_2) & \cdots & x_p(s_n) & 0 & 0 & \cdots & 0
\end{pmatrix}$$

$$\lambda_+(s) = \big(\lambda_1(s), \ldots, \lambda_n(s), v_0(s), v_1(s), \ldots, v_p(s)\big)^T$$

$$\gamma_+(s) = \big(\gamma(s_1 - s_0), \ldots, \gamma(s_n - s_0), 1, x_1(s_0), \ldots, x_p(s_0)\big)^T$$

Equivalently, in terms of covariances, the augmented matrix $C_+$ replaces the $\gamma(s_i, s_j)$ entries with covariances $C(s_i, s_j)$ (and the entries of $c_+(s)$ with $C(s, s_i)$), bordered by the same trend functions $x_j(s_i)$. The solution is

$$\lambda_+(s) = C_+^{-1} c_+(s)$$

The prediction is

$$\hat{y}(s) = \lambda^T(s)\, y$$

with mean square prediction error

$$\sigma_e^2 = \sigma^2 - c_+^T(s)\, C_+^{-1} c_+(s)$$
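The universal kriging system above can be sketched in code under assumed inputs (random sample locations, a linear trend basis $1, x, y$, and the spherical variogram from the earlier example); the check at the end verifies that the solved weights satisfy the unbiasedness constraints:

```python
import numpy as np

gen = np.random.default_rng(0)
pts = gen.uniform(0, 10, size=(8, 2))   # assumed sample locations
s0 = np.array([5.0, 5.0])               # assumed prediction location

def gamma(h):
    """Spherical variogram: nugget 2.5, sill 7.5, range 10."""
    h = np.asarray(h, dtype=float)
    g = 2.5 + 5.0 * (1.5 * h / 10.0 - 0.5 * (h / 10.0) ** 3)
    g = np.where(h <= 10.0, g, 7.5)
    return np.where(h == 0.0, 0.0, g)

n = len(pts)
X = np.column_stack([np.ones(n), pts])  # trend basis: 1, x_1, x_2 (p = 2)
x0 = np.concatenate([[1.0], s0])        # trend basis at the prediction point

# Bordered system: [[Gamma, X], [X', 0]] [lambda; v] = [gamma_0; x(s0)]
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
m = n + X.shape[1]
A = np.zeros((m, m))
A[:n, :n] = gamma(D)
A[:n, n:] = X
A[n:, :n] = X.T
b = np.concatenate([gamma(np.linalg.norm(pts - s0, axis=1)), x0])

sol = np.linalg.solve(A, b)
lam, v = sol[:n], sol[n:]
print(lam.sum(), X.T @ lam)   # constraints: weights sum to 1 and reproduce x(s0)
```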

## Universal Kriging

- It may make more sense to estimate the trend explicitly
- The trend needs to be estimated to derive residuals for variogram modeling in any case, since it is only safe to estimate the variogram model from y when the mean is assumed constant
- It does not make sense to use universal kriging for local neighborhoods